Paul Welty, PhD AI, WORK, AND STAYING HUMAN


Large language models struggle with generating clean code


A study of LLM-generated code shows why large language models struggle with clean code: API misuse rates are high, and reliability needs better assessment.

The article discusses a study of the reliability and robustness of code that large language models (LLMs) generate in response to Java coding questions. The study evaluated four code-capable LLMs, including OpenAI's GPT-3.5 and GPT-4, and found high rates of API misuse. It also argued that code reliability should be assessed beyond semantic correctness, and that static analysis is needed to ensure full coverage. Llama 2, an open model, performed best, with a failure rate of less than one percent.
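
To make "API misuse" concrete, here is a minimal Java sketch. It is a hypothetical illustration, not an example taken from the study: the class and method names are invented for this post, and the iterator case is just one common pattern of the kind of misuse described above, where code compiles and looks plausible but skips a check the API expects.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration; names are assumptions, not from the study.
class ApiMisuseSketch {

    // Misuse: calls next() without first checking hasNext().
    // Works for non-empty lists, throws NoSuchElementException on an empty one.
    static String firstUnchecked(List<String> items) {
        Iterator<String> it = items.iterator();
        return it.next();
    }

    // Safer usage: guard next() with hasNext() and return a fallback otherwise.
    static String firstChecked(List<String> items, String fallback) {
        Iterator<String> it = items.iterator();
        return it.hasNext() ? it.next() : fallback;
    }

    public static void main(String[] args) {
        List<String> empty = Collections.emptyList();

        // The guarded version handles the empty case.
        System.out.println(firstChecked(empty, "none"));

        // The unguarded version fails only at runtime, on the edge case.
        try {
            firstUnchecked(empty);
        } catch (NoSuchElementException e) {
            System.out.println("unchecked call failed on empty input");
        }
    }
}
```

Both methods return the same result for a non-empty list, so a test that only checks the happy path would pass either one. That is the gap between semantic correctness and reliability that the summary above points to.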

Original article: Perhaps AI is going to take away coding jobs of those who trust this tech too much
