How to talk to statues — Joe Reeve, ElevenLabs
TL;DR
Joe Reeve from ElevenLabs discusses building a viral AI app that lets users talk to statues via phone calls, exploring how vibe coding with existing APIs enables rapid prototyping, the unique challenges of voice interface design, and the cultural implications of giving physical objects AI-generated voices.
🏛️ The Viral Statue App
Two-hour vibe coding prototype
Joe Reeve built the app in Cursor during a single Sunday, combining OpenAI vision and research, the ElevenLabs voice design API, and agents. Users photograph a statue and, within 30 seconds, receive a phone call from a character whose voice is matched to the statue's history.
Unexpected enterprise demand
The demo accumulated 1.5 million impressions and attracted inquiries from major museums, auction houses (Bonhams, Christie's), and tourism platforms seeking commercial deployments.
Scalable architecture via APIs
The application relies entirely on managed APIs for the heavy lifting, so scaling it requires little engineering effort compared with running self-managed infrastructure.
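The photo-to-phone-call flow described in this section can be sketched as a chain of four stages. This is a minimal illustration, not the code from the talk: `StatueProfile`, `StatuePipeline`, and every `fake_*` function are hypothetical names, and each stub stands in for a managed API call (vision and research, voice design, agent creation, outbound calling) whose real signature is not given in the source.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StatueProfile:
    """Hypothetical result of the vision + research step."""
    name: str
    background: str          # short research summary used as agent context
    voice_description: str   # text describing how the statue should sound


@dataclass
class StatuePipeline:
    """Chains the four stages; each stage is injected so a real API
    client could replace the stub without changing the flow."""
    identify: Callable[[bytes], StatueProfile]          # photo -> profile
    design_voice: Callable[[str], str]                  # description -> voice id
    create_agent: Callable[[StatueProfile, str], str]   # profile, voice id -> agent id
    place_call: Callable[[str, str], str]               # agent id, phone -> call id

    def run(self, photo: bytes, phone_number: str) -> str:
        profile = self.identify(photo)
        voice_id = self.design_voice(profile.voice_description)
        agent_id = self.create_agent(profile, voice_id)
        return self.place_call(agent_id, phone_number)


# Stub stages (assumptions): fixed return values instead of network calls.
def fake_identify(photo: bytes) -> StatueProfile:
    return StatueProfile(
        name="Example Statue",
        background="Placeholder research summary.",
        voice_description="Calm, elderly, measured pace.",
    )


def fake_design_voice(description: str) -> str:
    return "voice-demo"


def fake_create_agent(profile: StatueProfile, voice_id: str) -> str:
    return f"agent({profile.name},{voice_id})"


def fake_place_call(agent_id: str, phone_number: str) -> str:
    return f"call({agent_id}->{phone_number})"


pipeline = StatuePipeline(
    identify=fake_identify,
    design_voice=fake_design_voice,
    create_agent=fake_create_agent,
    place_call=fake_place_call,
)
```

Calling `pipeline.run(photo_bytes, phone_number)` walks the stages in order. Because the pipeline itself holds no infrastructure, swapping a stub for a managed API client is the only change needed to move from demo to deployment, which is the scaling argument made above.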
🎙️ Voice Interface Design
Multimodal conversation requirements
Effective voice agents need a concurrent visual interface that displays the data extracted from the conversation, so users can see what the agent is thinking instead of relying on a voice-only exchange.
The interruption permission problem
Users hesitate to interrupt a speaking agent out of politeness, so voice UIs must be explicitly designed to permit and encourage aggressive interruption, which improves conversational flow.
Indirect interaction patterns
Complex workflows benefit from intermediary 'product manager' agents that translate human speech into tool-specific actions rather than forcing direct voice control of coding agents.
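The intermediary pattern above can be illustrated with a small router that turns a spoken transcript into tool-specific actions. This is only a sketch under stated assumptions: `ProductManagerAgent`, `ToolAction`, and the tool names are hypothetical, and the keyword matching is a stand-in for the language model a real intermediary agent would presumably use.

```python
from dataclasses import dataclass, field


@dataclass
class ToolAction:
    tool: str          # which downstream agent or tool should act
    instruction: str   # the cleaned-up request passed to that tool


@dataclass
class ProductManagerAgent:
    """Hypothetical intermediary between human speech and tools.

    Keyword routing is an assumption made for illustration; it is not
    how the speaker's system is described as working.
    """
    routes: dict[str, str] = field(default_factory=lambda: {
        "button": "coding_agent",
        "page": "coding_agent",
        "colour": "design_agent",
        "color": "design_agent",
    })
    fallback_tool: str = "clarify_with_user"

    def translate(self, transcript: str) -> list[ToolAction]:
        actions = []
        # Treat each sentence of the transcript as one candidate request.
        for sentence in transcript.split("."):
            request = sentence.strip()
            if not request:
                continue
            lowered = request.lower()
            # First matching keyword wins; unmatched speech goes back
            # to the user instead of reaching a coding agent.
            tool = next(
                (t for keyword, t in self.routes.items() if keyword in lowered),
                self.fallback_tool,
            )
            actions.append(ToolAction(tool=tool, instruction=request))
        return actions
```

For example, `ProductManagerAgent().translate("Make the button bigger. Um, I don't know.")` routes the first sentence to `coding_agent` and the second to `clarify_with_user`. The design point is that vague or conversational speech is filtered before any tool sees it.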
⚡ Culture & Physical AI
Embedded experiential technology
Future museum installations will hide microphones and speakers directly within statues and historical phone booths (like the K6 booth featuring Sir Michael Caine's voice) to create seamless physical interactions without screens.
Philosophical voice casting
Voice design should reflect an object's material provenance and history, such as a statue carved in Vietnam from Chinese stone speaking with blended accents reflecting both origins plus its British Museum context.
Democratization of creation
Non-coders are already building functional applications without understanding technical jargon like 'hamburger menus,' suggesting an impending 'Instagram filters moment' for consumer app generation.
Bottom Line
Technical barriers to sophisticated AI applications have collapsed. Success now depends on storytelling, curatorial content design, and choosing the right combination of scalable APIs rather than on solving hard engineering problems.