Stanford CS153 Frontier Systems | Mati Staniszewski from ElevenLabs on The Future of Voice Systems
TL;DR
ElevenLabs CEO Mati Staniszewski explains how the company narrowed its original AI dubbing vision to perfecting text-to-speech first. It did so by staying close to its Discord community, building on open-source research, and running lean, all in pursuit of solving the 'one voice' dubbing problem he experienced growing up in Poland.
🎭 Origin and Problem Definition
The Polish single-narrator dubbing inspiration
Growing up in Poland, where a single monotone narrator voiced every character in a movie, inspired the mission to preserve a voice's emotional characteristics across languages.
Pivoting from dubbing to voiceover
Customer research revealed creators urgently needed simple voiceover corrections more than full dubbing, shifting early focus to text-to-speech generation.
Narrowing scope to English first
Despite multilingual ambitions, the team focused initially on perfecting emotional English text-to-speech in 2022 rather than the full transcription-translation pipeline.
🔬 Technical Architecture Decisions
Learning voice characteristics organically
They abandoned manually programmed gender, age, and accent variables in favor of transformer models that learn these characteristics automatically.
Leveraging open-source breakthroughs
The early architecture drew inspiration from Tortoise, a model James Betker built as a side project while working at Google, which achieved human-like short-form speech.
Applying LLM context awareness
They utilized next-token prediction breakthroughs to help models understand broader textual context for appropriate emotional delivery.
🚀 Community-Driven Execution
Operating on Discord initially
The founders ran the entire company on Discord with custom bots to avoid meetings and email, creating tight feedback loops with early creator communities.
Maximizing limited compute budgets
They trained their first checkpoints using under $100,000 in free GPU credits from programs like NVIDIA Inception, and skipped a $6,000 patent filing to preserve cash.
Staying problem-obsessed with users
Through product-led growth and community proximity, they discovered high-demand use cases like audiobook creation that weren't in the original roadmap.
Bottom Line
Solve one narrow technical problem exceptionally well while embedding deeply with your user community to discover real demand, rather than building the full vision from day one.