Paul Welty, PhD · AI, WORK, AND STAYING HUMAN

· found · innovation · ruby-on-rails · technology

Llama 2 avoids errors by staying quiet, GPT-4 gives long, if useless, samples

See why Llama 2 made fewer API errors than GPT-4 when generating Java code, and what that reveals about how reliable large language models are as coding assistants.

The article discusses a study by computer scientists at the University of California San Diego on the reliability and robustness of code generated by large language models (LLMs). The researchers evaluated four code-capable LLMs using an API checker called RobustAPI. They gathered 1,208 coding questions from StackOverflow involving 24 common Java APIs and posed them to each model in three different question formats. All of the models showed high rates of API misuse, meaning code that violates the usage rules of the API it calls, with OpenAI's GPT-3.5 and GPT-4 showing the highest failure rates. Meta's Llama 2 had a failure rate below one percent, although, as the headline puts it, Llama 2 avoided errors largely by staying quiet. The study underlines the importance of assessing the reliability of generated code and the need for LLMs to improve at using APIs correctly.
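The study's core measure is API misuse: code that compiles but calls a library API without the checks or cleanup the API expects. As a minimal sketch of that idea, here are two patterns of the kind an API checker looks for, each next to a guarded version. The specific patterns (calling `Iterator.next()` without `hasNext()`, and reading from a `Scanner` that is never closed) and the class and method names are illustrative assumptions, not rules taken from RobustAPI or questions from the study's StackOverflow dataset.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Scanner;

class ApiMisuseDemo {

    // Misuse: calls next() without checking hasNext(), so an empty
    // list throws NoSuchElementException.
    static String firstItemUnchecked(List<String> items) {
        Iterator<String> it = items.iterator();
        return it.next();
    }

    // Guarded: checks hasNext() before next() and returns null when empty.
    static String firstItemChecked(List<String> items) {
        Iterator<String> it = items.iterator();
        return it.hasNext() ? it.next() : null;
    }

    // Misuse: the Scanner is never closed, and nextLine() is called
    // without hasNextLine(), so empty input throws NoSuchElementException.
    static String firstLineUnchecked(String text) {
        Scanner sc = new Scanner(text);
        return sc.nextLine();
    }

    // Guarded: try-with-resources closes the Scanner on every path,
    // and hasNextLine() protects the call to nextLine().
    static String firstLineChecked(String text) {
        try (Scanner sc = new Scanner(text)) {
            return sc.hasNextLine() ? sc.nextLine() : null;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstItemChecked(List.of("a", "b"))); // a
        System.out.println(firstItemChecked(List.of()));         // null
        System.out.println(firstLineChecked("hello\nworld"));    // hello
        System.out.println(firstLineChecked(""));                // null
    }
}
```

The unchecked versions compile and return the right answer on typical input, so neither the compiler nor a quick manual test is likely to catch them. That gap is what a dedicated API checker is meant to close.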

https://www.theregister.com/2023/08/29/ai_models_coding/
