Stanford CS153 Frontier Systems | Mati Staniszewski from ElevenLabs on The Future of Voice Systems

| Podcasts | May 04, 2026 | 4.17 Thousand views | 1:06:26

TL;DR

ElevenLabs CEO Mati Staniszewski explains how the company set out to solve the 'one voice' dubbing problem he experienced growing up in Poland, then pivoted from that AI dubbing vision to perfecting text-to-speech first. The team got there by staying close to Discord communities, leveraging open-source research, and running lean.

🎭 Origin and Problem Definition 3 insights

The Polish single-narrator dubbing inspiration

Growing up with monotone single-narrator voiceovers for all movie characters in Poland inspired the mission to preserve emotional voice characteristics across languages.

Pivoting from dubbing to voiceover

Customer research revealed creators urgently needed simple voiceover corrections more than full dubbing, shifting early focus to text-to-speech generation.

Narrowing scope to English first

Despite multilingual ambitions, the team focused initially on perfecting emotional English text-to-speech in 2022 rather than the full transcription-translation pipeline.

🔬 Technical Architecture Decisions 3 insights

Learning voice characteristics organically

They abandoned manually programming gender, age, and accent variables in favor of transformer-based models that learn these characteristics automatically.

Leveraging open-source breakthroughs

Early architecture drew inspiration from James Betker's open-source Tortoise model, which he built as a side project while working at Google and which achieved human-like short-form speech.

Applying LLM context awareness

They utilized next-token prediction breakthroughs to help models understand broader textual context for appropriate emotional delivery.

🚀 Community-Driven Execution 3 insights

Operating on Discord initially

The founders ran the entire company on Discord with custom bots to avoid meetings and email, creating tight feedback loops with early creator communities.

Maximizing limited compute budgets

They trained their first checkpoints using under $100,000 in free GPU credits from programs like NVIDIA Inception, and skipped a $6,000 patent filing to preserve cash.

Staying problem-obsessed with users

Through product-led growth and community proximity, they discovered high-demand use cases like audiobook creation that weren't in the original roadmap.

Bottom Line

Solve one narrow technical problem exceptionally well while embedding deeply with your user community to discover real demand, rather than building the full vision from day one.
