Coding Challenge Session: Local Browser Conversational Chatbot (STT, TTS, and more?)
TL;DR
Daniel Shiffman builds a local browser-based conversational chatbot using p5.js and Transformers.js, demonstrating how to run lightweight open-source AI models (Whisper for speech-to-text, Kokoro for text-to-speech) entirely in the browser without cloud dependencies.
🎨 Creative AI Philosophy
Prioritize local open-source over cloud AI
Shiffman emphasizes running models locally on consumer hardware rather than using closed cloud-based systems from big tech companies to maintain data privacy and agency.
AI for artistic expression, not productivity
The goal is demystifying AI through creative coding, social commentary, and weird art projects rather than building useful assistants.
Simple brains beat complex LLMs
You don't need a large language model for the bot's brain: Markov chains, context-free grammars, or pattern matching can power creative chatbots effectively.
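As an illustration of how small such a "brain" can be, here is a minimal sketch of a word-level Markov chain in plain JavaScript. It is not code from the session; the function names `buildChain` and `generate`, the `order` parameter, and the injectable `random` argument are all hypothetical choices made for this example.

```javascript
// Hypothetical word-level Markov chain, sketched as a stand-in chatbot "brain".

// Map each run of `order` consecutive words to the list of words that followed it.
function buildChain(text, order = 1) {
  const words = text.split(/\s+/).filter(Boolean);
  const chain = new Map();
  for (let i = 0; i + order < words.length; i++) {
    const key = words.slice(i, i + order).join(' ');
    if (!chain.has(key)) chain.set(key, []);
    chain.get(key).push(words[i + order]);
  }
  return chain;
}

// Walk the chain from `start`, appending at most `maxWords` new words.
// `random` is injectable so the output can be made deterministic.
function generate(chain, start, maxWords = 20, random = Math.random) {
  const output = start.split(' ');
  const order = output.length;
  for (let i = 0; i < maxWords; i++) {
    const options = chain.get(output.slice(-order).join(' '));
    if (!options) break; // dead end: nothing ever followed this key
    output.push(options[Math.floor(random() * options.length)]);
  }
  return output.join(' ');
}
```

Training the chain on a transcript or a favorite text and seeding `generate` with a word from the user's utterance would be one way to produce a reply, under the assumption that the seed appears in the training text.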
💻 Technical Architecture
Speech recognition with Whisper
Uses the Tiny, English-specific variant of OpenAI's open-source Whisper model, converted to ONNX format so that Transformers.js can run it in the browser.
Lightweight voice synthesis
Implements Kokoro TTS, a recent small-footprint open-source text-to-speech model distributed on Hugging Face that runs fast on local hardware without cloud APIs.
Browser-based ML stack
Leverages the Hugging Face Transformers.js library imported via CDN to run ML pipelines directly in p5.js sketches using modern async/await patterns.
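The speech-to-text step described above can be sketched roughly as follows. This is a sketch under stated assumptions, not the code from the session: the model id `Xenova/whisper-tiny.en`, the CDN URL in the comment, and the helper names `loadTranscriber`, `transcribe`, and `blobToSamples` are all assumptions made for illustration. The Transformers.js `pipeline` function is passed in as a parameter so the sketch does not depend on how it was imported.

```javascript
// In a browser sketch, `pipeline` would be imported from a CDN, for example:
//   import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';
// (The exact URL and version pin are assumptions; check the Transformers.js docs.)

// Assumed model id: an ONNX export of Whisper Tiny (English) on the Hugging Face Hub.
const MODEL_ID = 'Xenova/whisper-tiny.en';

// Create the speech-to-text pipeline once; model files are fetched on first use.
async function loadTranscriber(pipeline) {
  return pipeline('automatic-speech-recognition', MODEL_ID);
}

// Browser-only helper: decode a recorded Blob into mono samples at 16 kHz,
// the sample rate Whisper expects. decodeAudioData resamples to the
// AudioContext's sample rate.
async function blobToSamples(blob) {
  const ctx = new AudioContext({ sampleRate: 16000 });
  const buffer = await ctx.decodeAudioData(await blob.arrayBuffer());
  await ctx.close();
  return buffer.getChannelData(0); // Float32Array of the first channel
}

// Run transcription on a Float32Array of samples and return the trimmed text.
async function transcribe(transcriber, samples) {
  const result = await transcriber(samples);
  return result.text.trim();
}
```

In a p5.js sketch, `loadTranscriber(pipeline)` would typically be awaited once during setup, and `transcribe` would be called each time a recording finishes.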
⚙️ Implementation Details
Push-to-talk interface
Creates a minimal UI where holding the mouse button starts recording (red) and releasing triggers transcription (green).
Model transparency checks
Stresses the importance of reading a model's card before using it, citing Margaret Mitchell's framework for transparent model reporting.
ONNX standard format
Uses Open Neural Network Exchange (ONNX) format to ensure model weights are compatible with JavaScript and browser-based inference.
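The push-to-talk interface described above can be reduced to a small piece of state logic that a p5.js sketch calls from its mouse handlers. The following is a minimal sketch, not the session's code: `createPushToTalk`, its callbacks, and the gray idle color are hypothetical; only the red-while-recording and green-after-release behavior comes from the summary.

```javascript
// Hypothetical push-to-talk controller: tracks state and picks a background color.
function createPushToTalk({ onStart, onStop }) {
  let state = 'idle'; // 'idle' | 'recording' | 'released'
  return {
    // Mouse button went down: begin recording unless already recording.
    press() {
      if (state !== 'recording') {
        state = 'recording';
        onStart();
      }
    },
    // Mouse button came up: stop recording and hand off to transcription.
    release() {
      if (state === 'recording') {
        state = 'released';
        onStop();
      }
    },
    // RGB background: red while recording, green after release,
    // gray before the first press (the gray is an assumption).
    color() {
      if (state === 'recording') return [255, 0, 0];
      if (state === 'released') return [0, 255, 0];
      return [128, 128, 128];
    },
    state() {
      return state;
    },
  };
}

// p5.js wiring (browser only, shown as comments):
//   const talk = createPushToTalk({
//     onStart: () => recorder.start(),  // e.g. a MediaRecorder instance
//     onStop: () => recorder.stop(),
//   });
//   function mousePressed() { talk.press(); }
//   function mouseReleased() { talk.release(); }
//   function draw() { background(...talk.color()); }
```

Keeping the state logic separate from the drawing and recording calls makes it possible to exercise it without a browser or microphone.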
Bottom Line
Build privacy-preserving voice interfaces entirely in the browser using lightweight open-source models like Whisper and Kokoro TTS to create artistic AI projects without cloud dependencies.