Digital transformation, higher education, innovation, technology, professional skills, management, and strategy

Large language models struggle with generating clean code

The article discusses a study on the reliability and robustness of code generated by large language models (LLMs) for Java coding questions. The study evaluated four code-capable LLMs, including GPT-3.5 and GPT-4 from OpenAI, and found that they exhibited high rates of API misuse. The study also highlighted the importance of assessing code reliability beyond semantic correctness and emphasized the need for static analysis to ensure full coverage. Llama 2, an open model, performed the best with a failure rate of less than one percent.

Original article: Perhaps AI is going to take away coding jobs of those who trust this tech too much

Leave a Reply

Your email address will not be published. Required fields are marked *

About Me

Digital transformation, including agile and devops, across many industries, most recently in higher education. Designed and built the Emory faculty information system. Working in continuing education to improve and expand career-focused learning, esp. in workforce development. Expanding the role of innovation and entrepreneurship. Designed, built, and launched the Emory Center for Innovation.

Favorite sites

  • Daring Fireball

Favorite podcasts

  • Manager Tools