Building voice-driven AI applications using LLMs

Discover how to create voice-driven AI applications using large language models, focusing on essential components and best practices for success.

The article explores voice-driven AI applications built on large language models (LLMs). It identifies three basic components of a voice-driven LLM application: speech-to-text, text-to-speech, and the LLM itself. It also covers the benefits of running application logic in the cloud, the challenges of phrase detection and endpointing (deciding when a speaker has finished talking), and the considerations for audio buffer management. Throughout, it emphasizes that voice-driven LLM apps need reliable, low-latency data flow.
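To make the three components and the endpointing step concrete, here is a minimal Python sketch. It is not the original article's implementation. It assumes audio arrives as fixed-length frames of PCM samples and uses a simple energy threshold to decide when a phrase has ended. The names `collect_phrase` and `voice_turn`, the threshold, and the silence window are all hypothetical, and the speech-to-text, LLM, and text-to-speech stages are passed in as plain functions so that any real service could be substituted.

```python
import math
from typing import Callable, Iterable, List, Optional

Frame = List[int]  # one fixed-length chunk of PCM samples (assumed format)


def rms(frame: Frame) -> float:
    """Root-mean-square energy of a frame; 0.0 for an empty frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def collect_phrase(
    frames: Iterable[Frame],
    threshold: float = 500.0,   # assumed energy cutoff between speech and silence
    trailing_silence: int = 3,  # assumed count of quiet frames that ends a phrase
) -> Optional[List[Frame]]:
    """Buffer frames from the first loud frame until enough trailing silence."""
    buffered: List[Frame] = []
    quiet_run = 0
    for frame in frames:
        loud = rms(frame) >= threshold
        if not buffered and not loud:
            continue  # still waiting for speech to start
        buffered.append(frame)
        quiet_run = 0 if loud else quiet_run + 1
        if quiet_run >= trailing_silence:
            # Endpoint reached: return the phrase without the trailing silence.
            return buffered[: len(buffered) - trailing_silence]
    # Stream ended before an endpoint: return what was buffered, if anything.
    return buffered or None


def voice_turn(
    frames: Iterable[Frame],
    speech_to_text: Callable[[List[Frame]], str],
    llm: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> Optional[bytes]:
    """Run one conversational turn: audio in, synthesized audio out."""
    phrase = collect_phrase(frames)
    if phrase is None:
        return None  # nobody spoke
    transcript = speech_to_text(phrase)
    reply = llm(transcript)
    return text_to_speech(reply)
```

In a real application each stage would likely stream its output to the next rather than wait for it to finish, since the summary stresses low latency. The batch-style functions here only show how the pieces connect.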

Original article: How to talk to an LLM (with your voice)

