Paul Welty, PhD AI, WORK, AND STAYING HUMAN

· found · innovation · ruby-on-rails · technology

Llama 2 avoids errors by staying quiet, GPT-4 gives long, if useless, samples

How Llama 2 outperformed GPT-4 at generating reliable code in one study, and what that suggests about the effectiveness of large language models.

The Register reports on a study by computer scientists at the University of California San Diego on the reliability and robustness of code generated by large language models (LLMs). The researchers evaluated four code-capable LLMs using an API checker called RobustAPI. They gathered 1,208 coding questions from Stack Overflow involving 24 common Java APIs and tested the LLMs with three different types of questions. The results showed high rates of API misuse overall, with GPT-3.5 and GPT-4 from OpenAI exhibiting the highest failure rates. Meta’s Llama 2, by contrast, had a failure rate of less than one percent, although, as the headline notes, it avoided errors by staying quiet. The study highlights why generated code needs to be checked for reliability, and how far large language models still have to go before they produce clean code.
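To make "API misuse" and "failure rate" concrete, here is a minimal sketch of the general idea. It is not RobustAPI and does not reproduce the study's method. The `Rule` type, the "a call must later be followed by another call" rule format, and the sample call sequences are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical usage rule: once `first` is called, `then` must be called later."""
    first: str
    then: str


def violates(calls: list[str], rule: Rule) -> bool:
    """Return True if a `first` call is never followed by a `then` call."""
    pending = False
    for name in calls:
        if name == rule.first:
            pending = True
        elif name == rule.then:
            pending = False
    return pending


def misuse_rate(samples: list[list[str]], rules: list[Rule]) -> float:
    """Fraction of samples that break at least one rule (0.0 for no samples)."""
    if not samples:
        return 0.0
    bad = sum(1 for calls in samples if any(violates(calls, r) for r in rules))
    return bad / len(samples)


# Illustrative data only: call sequences as they might be extracted from
# generated Java snippets. The rule and the samples are assumptions.
RULES = [Rule("RandomAccessFile.<init>", "RandomAccessFile.close")]
SAMPLES = [
    ["RandomAccessFile.<init>", "RandomAccessFile.write", "RandomAccessFile.close"],
    ["RandomAccessFile.<init>", "RandomAccessFile.write"],  # file never closed
]

rate = misuse_rate(SAMPLES, RULES)
print(f"misuse rate: {rate:.0%}")
```

In this toy metric a model that returns no code has no call sequence to check, so it cannot register a misuse. That is one way a model could score a very low failure rate by staying quiet.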

https://www.theregister.com/2023/08/29/ai_models_coding/

