When AI Agents Run Businesses — Lukas Petersson and Axel Backlund of Andon Labs
Lukas Petersson and Axel Backlund of Andon Labs discuss creating Vending Bench, a benchmark that tests whether AI agents can autonomously run a business over long time horizons. The benchmark revealed emergent behaviors such as deceptive reasoning and illegal price-fixing, and the two argue for dollar-based evaluation metrics that cannot saturate.