Coding Challenge 188: Voice Chatbot

| Programming | April 27, 2026 | 10.1K views | 39:28

TL;DR

Daniel Shiffman builds a fully local voice chatbot in p5.js using Whisper for speech-to-text and Kokoro TTS for text-to-speech, demonstrating how to process audio entirely in the browser while advocating for creative, lightweight alternatives to large language models for the bot's 'brain'.

🎯 Local AI Architecture (3 insights)

Browser-based speech processing with Whisper

Implements OpenAI's Whisper model via Transformers.js to convert speech to text locally using WebGPU acceleration, ensuring no audio data leaves the computer.
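The loading step might be sketched as below; the model id (onnx-community/whisper-tiny.en), the CDN URL, and the pickDevice helper are illustrative assumptions, not details taken from the video.

```javascript
// Pick a Transformers.js backend: WebGPU when the browser exposes it,
// WASM otherwise. Pure function, so it can be tested outside the browser.
function pickDevice(hasWebGPU) {
  return hasWebGPU ? 'webgpu' : 'wasm';
}

// Load Whisper in the browser and get a transcription pipeline.
async function loadTranscriber() {
  const { pipeline } = await import(
    'https://cdn.jsdelivr.net/npm/@huggingface/transformers'
  );
  return pipeline(
    'automatic-speech-recognition',
    'onnx-community/whisper-tiny.en',
    { device: pickDevice('gpu' in navigator) }
  );
}

// waveform is a mono Float32Array sampled at 16 kHz.
async function transcribe(transcriber, waveform) {
  const { text } = await transcriber(waveform);
  return text;
}
```

Because the model is fetched once and cached by the browser, the only network traffic is the initial download; the audio itself never leaves the page.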

Open-source text-to-speech with Kokoro

Integrates the Kokoro TTS model to generate natural speech from text responses, loading the model directly from HuggingFace into the browser.
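The synthesis side might look like this sketch, assuming the kokoro-js package's KokoroTTS.from_pretrained/generate API; the model id, dtype, and voice name are assumptions to verify against the package README. chunkSentences is a hypothetical helper for playing long replies incrementally.

```javascript
// Split a reply into sentence-sized chunks so long responses can be
// synthesized and played back incrementally. Pure, testable helper.
function chunkSentences(text) {
  const matches = text.match(/[^.!?]+[.!?]*/g) || [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Synthesize speech with kokoro-js in the browser (API details assumed).
async function speak(text) {
  const { KokoroTTS } = await import('https://cdn.jsdelivr.net/npm/kokoro-js');
  const tts = await KokoroTTS.from_pretrained(
    'onnx-community/Kokoro-82M-v1.0-ONNX',
    { dtype: 'q8' }
  );
  for (const sentence of chunkSentences(text)) {
    const audio = await tts.generate(sentence, { voice: 'af_heart' });
    // audio holds the generated samples; turn it into a Blob and play
    // it through an <audio> element (playback details are an assumption).
  }
}
```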

Zero-cloud audio privacy

All audio processing happens client-side: microphone input is never transmitted to external servers, even though the models themselves are downloaded from the cloud.

💻 Technical Implementation (3 insights)

Native Web Audio API workflow

Uses navigator.mediaDevices and MediaRecorder to capture audio chunks into blobs for processing, avoiding the p5.sound library in order to demonstrate the underlying web audio mechanics.
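A minimal version of that capture flow, assuming the standard MediaStream and MediaRecorder APIs; mergeChunks is a hypothetical helper, not the video's code.

```javascript
// Merge recorded chunks into a single Blob. Blob is a global in both the
// browser and Node 18+, so this helper is testable outside the browser.
function mergeChunks(chunks, mimeType) {
  return new Blob(chunks, { type: mimeType });
}

// Capture microphone audio with the raw MediaStream/MediaRecorder APIs
// (no p5.sound). Chunks accumulate until stop() fires, then they are
// merged into one Blob ready for decoding.
async function startRecording(onBlob) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];
  recorder.ondataavailable = (event) => chunks.push(event.data);
  recorder.onstop = () => onBlob(mergeChunks(chunks, recorder.mimeType));
  recorder.start();
  return recorder; // call recorder.stop() to finish the utterance
}
```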

16 kHz waveform preparation

Decodes audio data through an AudioContext at a 16,000 Hz sample rate and extracts single-channel waveform data to match Whisper's specific input requirements.
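One way to sketch that decoding step: the video reportedly extracts a single channel, while the downmixToMono helper below averages all channels as a slightly more general (assumed) variant, written against plain arrays so it is testable outside the browser.

```javascript
// Average an AudioBuffer's channels down to a single Float32Array,
// since Whisper expects mono input.
function downmixToMono(channels) {
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Decode a recorded Blob at Whisper's required 16,000 Hz sample rate.
async function blobToWaveform(blob) {
  const ctx = new AudioContext({ sampleRate: 16000 });
  const audioBuffer = await ctx.decodeAudioData(await blob.arrayBuffer());
  const channels = [];
  for (let c = 0; c < audioBuffer.numberOfChannels; c++) {
    channels.push(audioBuffer.getChannelData(c));
  }
  return downmixToMono(channels);
}
```

Creating the AudioContext with sampleRate: 16000 makes the browser resample the recording, so no manual interpolation is needed.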

Async/await pipeline setup

Leverages p5.js 2.0 features to asynchronously import Transformers.js and initialize machine learning pipelines within the sketch setup function.
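A sketch of that setup pattern, assuming p5.js 2.0's support for an async setup(); the CDN URL and model id are assumptions, and statusMessage is a hypothetical helper.

```javascript
// Hypothetical status helper so draw() has something testable.
function statusMessage(ready) {
  return ready ? 'ready, press and hold to talk' : 'loading model...';
}

// p5.js 2.0 lets setup() be async, so the sketch can await the dynamic
// import of Transformers.js and the pipeline before drawing begins.
let transcriber;

async function setup() {
  createCanvas(400, 400);
  const { pipeline } = await import(
    'https://cdn.jsdelivr.net/npm/@huggingface/transformers'
  );
  transcriber = await pipeline(
    'automatic-speech-recognition',
    'onnx-community/whisper-tiny.en',
    { device: 'webgpu' }
  );
}

function draw() {
  background(220);
  text(statusMessage(Boolean(transcriber)), 20, 20);
}
```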

🧠 Creative Bot Intelligence (3 insights)

Alternatives to large language models

Demonstrates that chatbot 'brains' need not be LLMs, suggesting pattern-matching systems like ELIZA, RiveScript, or context-free grammars for simpler responses.
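A bot 'brain' in that spirit can be a handful of regex rules; everything below is an invented ELIZA-style example, not the video's code.

```javascript
// A tiny ELIZA-style brain: ordered [pattern, responder] rules, first
// match wins, with a fallback when nothing matches.
const rules = [
  [/\bmy name is (\w+)/i, (m) => `Nice to meet you, ${m[1]}.`],
  [/\b(hello|hi)\b/i, () => 'Hello! What would you like to talk about?'],
  [/\bwhy\b/i, () => 'Why do you think that is?'],
];

function reply(input) {
  for (const [pattern, responder] of rules) {
    const match = input.match(pattern);
    if (match) return responder(match);
  }
  return 'Tell me more.';
}
```

The transcribed text goes in, the reply string comes out, and the reply is handed to the TTS step; swapping in RiveScript or a context-free grammar changes only this function.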

Demystifying AI through creative coding

Advocates learning AI through hands-on building with open-source models on consumer hardware, using creative play to understand and critique emerging technologies.

Push-to-talk interaction design

Implements a mouse-press interface to control when the bot listens, managing audio state between recording start and stop events to capture discrete utterances.
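That interaction can be sketched as a small state machine; the state names and transitions here are illustrative assumptions, not the video's exact code.

```javascript
// Push-to-talk as a state machine: idle -> recording on press,
// recording -> processing on release, processing -> idle when the
// reply has been spoken. Pure and testable.
function nextState(state, event) {
  if (state === 'idle' && event === 'press') return 'recording';
  if (state === 'recording' && event === 'release') return 'processing';
  if (state === 'processing' && event === 'done') return 'idle';
  return state; // ignore events that do not apply in the current state
}

// In the p5.js sketch, the transitions would drive the recorder:
let state = 'idle';

function mousePressed() {
  state = nextState(state, 'press'); // start the MediaRecorder here
}

function mouseReleased() {
  state = nextState(state, 'release'); // stop it and hand off the blob
}
```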

Bottom Line

You can build sophisticated voice interfaces using open-source AI models that run entirely in the browser on consumer hardware, choosing simple pattern-matching systems over massive LLMs for more creative control and privacy.
