Coding Challenge 188: Voice Chatbot

Programming | April 27, 2026 | 20.2 thousand views | 39:28

TL;DR

Daniel Shiffman builds a fully local voice chatbot in p5.js using Whisper for speech-to-text and Kokoro TTS for text-to-speech, demonstrating how to process audio entirely in the browser while advocating for creative, lightweight alternatives to large language models for the bot's 'brain'.

🎯 Local AI Architecture 3 insights

Browser-based speech processing with Whisper

Implements OpenAI's Whisper model via Transformers.js to convert speech to text locally using WebGPU acceleration, ensuring no audio data leaves the computer.

Open-source text-to-speech with Kokoro

Integrates the Kokoro TTS model to generate natural speech from text responses, loading the model directly from HuggingFace into the browser.

Zero-cloud audio privacy

All audio processing happens client-side. The models themselves are downloaded from the cloud, but microphone input is never transmitted to an external server.

💻 Technical Implementation 3 insights

Native Web Audio API workflow

Uses navigator.mediaDevices to access the microphone and MediaRecorder to capture audio chunks, which are combined into a blob for processing. The p5.sound library is deliberately avoided so the underlying browser audio mechanics stay visible.
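
The capture step can be sketched roughly as follows. This is a minimal sketch, not the code from the video: `collectRecording`, `createMicRecorder`, and `onBlob` are hypothetical names, and the helper only assumes an object that behaves like a MediaRecorder (it fires `ondataavailable` and `onstop`).

```javascript
// Hypothetical helper: gathers the chunks a MediaRecorder emits and hands
// back a single Blob each time recording stops.
function collectRecording(recorder, onBlob) {
  let chunks = [];

  recorder.ondataavailable = (event) => {
    // Skip empty chunks, which some browsers emit.
    if (event.data && event.data.size > 0) chunks.push(event.data);
  };

  recorder.onstop = () => {
    const blob = new Blob(chunks, { type: recorder.mimeType || "" });
    chunks = []; // reset so the next utterance starts clean
    onBlob(blob);
  };

  return recorder;
}

// Browser-only wiring (defined but not called here): request the
// microphone, then wrap the stream in a MediaRecorder.
async function createMicRecorder(onBlob) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  return collectRecording(new MediaRecorder(stream), onBlob);
}
```

Keeping the chunk handling separate from the microphone request means the blob logic can be exercised without a real audio device.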

16kHz waveform preparation

Decodes the recorded audio through an AudioContext configured for a 16,000 Hz sample rate, then extracts a single channel of waveform data to match the input format Whisper expects.
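
A minimal sketch of that preparation step, under stated assumptions: the function names (`decodeRecording`, `toWhisperWaveform`, `durationSeconds`) are hypothetical, and taking channel 0 is one simple way to get a single channel, not necessarily the exact choice made in the video.

```javascript
const WHISPER_SAMPLE_RATE = 16000; // samples per second

// Browser-only (defined but not called here): decode a recorded Blob.
// An AudioContext created with a sampleRate option decodes to that rate.
async function decodeRecording(blob) {
  const audioContext = new AudioContext({ sampleRate: WHISPER_SAMPLE_RATE });
  const encoded = await blob.arrayBuffer();
  const audioBuffer = await audioContext.decodeAudioData(encoded);
  await audioContext.close();
  return audioBuffer;
}

// Works on any AudioBuffer-like object: returns the first channel as a
// Float32Array, refusing input at the wrong sample rate.
function toWhisperWaveform(audioBuffer) {
  if (audioBuffer.sampleRate !== WHISPER_SAMPLE_RATE) {
    throw new Error(`expected ${WHISPER_SAMPLE_RATE} Hz, got ${audioBuffer.sampleRate} Hz`);
  }
  return audioBuffer.getChannelData(0);
}

// Length of a waveform in seconds at the 16 kHz rate.
function durationSeconds(waveform) {
  return waveform.length / WHISPER_SAMPLE_RATE;
}
```

At 16,000 samples per second, a two-second utterance becomes a Float32Array of 32,000 values.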

Async/await pipeline setup

Leverages p5.js 2.0 features to asynchronously import Transformers.js and initialize machine learning pipelines within the sketch setup function.
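
A rough sketch of what such a setup might look like. The CDN URL and the model id below are assumptions chosen for illustration and may differ from what the video uses; `loadTranscriber` is a hypothetical helper name. The code only runs in a browser with p5.js 2.0 loaded.

```javascript
// Assumed values: swap in whichever URL and model the sketch actually uses.
const TRANSFORMERS_URL = "https://cdn.jsdelivr.net/npm/@huggingface/transformers";
const WHISPER_MODEL = "onnx-community/whisper-tiny.en";

let transcriber;

// Dynamically import Transformers.js and build a speech-recognition
// pipeline that requests WebGPU acceleration.
async function loadTranscriber() {
  const { pipeline } = await import(TRANSFORMERS_URL);
  return pipeline("automatic-speech-recognition", WHISPER_MODEL, {
    device: "webgpu",
  });
}

// p5.js 2.0 allows setup() to be async, so the model can be awaited here.
async function setup() {
  createCanvas(400, 400);
  transcriber = await loadTranscriber();
}
```

Once loaded, the pipeline is expected to accept a 16 kHz Float32Array and resolve to an object with a `text` field, as in `const { text } = await transcriber(waveform);`.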

🧠 Creative Bot Intelligence 3 insights

Alternatives to large language models

Demonstrates that chatbot 'brains' need not be LLMs, suggesting pattern-matching systems like ELIZA, RiveScript, or context-free grammars for simpler responses.
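
To make the idea concrete, here is a minimal ELIZA-style sketch. It is an illustration of pattern matching in general, not the bot from the video: the `rules` table, the `fallback` line, and the `respond` function are all hypothetical.

```javascript
// Each rule pairs a regular expression with a reply template.
// "$1" in the template is filled with the first captured group.
const rules = [
  { pattern: /\bmy name is (\w+)/i, reply: "Nice to meet you, $1." },
  { pattern: /\bi feel (.+)/i, reply: "Why do you feel $1?" },
  { pattern: /\b(hello|hi|hey)\b/i, reply: "Hello! What would you like to talk about?" },
];

const fallback = "Tell me more.";

function respond(input) {
  // Transcripts often carry stray whitespace and trailing punctuation.
  const text = input.trim().replace(/[.!?]+$/, "");

  // The first matching rule wins.
  for (const { pattern, reply } of rules) {
    const match = text.match(pattern);
    if (match) {
      return reply.replace(/\$(\d)/g, (_, index) => match[Number(index)]);
    }
  }
  return fallback;
}
```

For example, `respond(" I feel tired.")` returns `"Why do you feel tired?"`. The whole "brain" is a short table that can be edited by hand, which is the kind of creative control the summary describes.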

Demystifying AI through creative coding

Advocates learning AI through hands-on building with open-source models on consumer hardware, using creative play to understand and critique emerging technologies.

Push-to-talk interaction design

Implements a mouse-press interface to control when the bot listens, managing audio state between recording start and stop events to capture discrete utterances.
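
The state handling can be sketched as a small guard around the recorder. This assumes a hold-to-talk interaction (press starts, release stops), which may not match the video exactly; `createPushToTalk` is a hypothetical name, and `recorder` is any object with `start()` and `stop()` methods, such as a MediaRecorder.

```javascript
// Hypothetical push-to-talk controller: tracks whether the bot is
// listening so that start() and stop() are never called out of order.
function createPushToTalk(recorder) {
  let listening = false;

  return {
    press() {
      if (listening) return false; // already recording, ignore
      listening = true;
      recorder.start();
      return true;
    },
    release() {
      if (!listening) return false; // nothing to stop
      listening = false;
      recorder.stop();
      return true;
    },
    isListening: () => listening,
  };
}

// In a p5.js sketch the handlers would be wired up roughly like this:
//   const talk = createPushToTalk(recorder);
//   function mousePressed() { talk.press(); }
//   function mouseReleased() { talk.release(); }
```

The guard matters because calling `start()` on a MediaRecorder that is already recording throws an error.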

Bottom Line

You can build sophisticated voice interfaces with open-source AI models that run entirely in the browser on consumer hardware, and you can choose simple pattern-matching systems over massive LLMs to gain creative control and privacy.
