Why AI's "12-Hour" Task Number Is a Mirage — Beth Barnes & David Rein
Beth Barnes and David Rein expose critical flaws in current AI benchmarks, including data contamination, shortcutting, and adversarial selection bias. They propose the "Time Horizon" framework, which measures AI progress by the length of the economically relevant tasks a model can complete and offers a more stable foundation for forecasting capabilities and risks.