The "secret sauce" of recent AI breakthroughs: Post-training with RLVR (and RLHF) | Lex Fridman
Recent AI breakthroughs in reasoning models stem from Reinforcement Learning with Verifiable Rewards (RLVR). RLVR trains a model by rewarding correct solutions to problems whose answers can be checked objectively, such as math and coding. Performance then improves at scale through iterative trial and error, rather than through optimizing toward human preferences as in RLHF.
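The sketch below shows what a "verifiable reward" can look like: a programmatic check of the answer, not a learned model of human preferences. It assumes two hypothetical conventions: math completions end with a line `Answer: <number>`, and coding tasks define a function named `solve`. All function names (`math_reward`, `code_reward`, `train_toy_policy`) are illustrative. The trial-and-error loop is plain REINFORCE over three fixed candidate completions, standing in for sampling from a real model. It illustrates the idea and is not the training recipe of any particular lab.

```python
import math
import random
import re
from typing import List, Optional, Sequence, Tuple

# Hypothetical convention: the model is prompted to end with "Answer: <number>".
ANSWER_PATTERN = re.compile(r"Answer:\s*(-?\d+(?:\.\d+)?)\s*$")


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the number after the trailing 'Answer:' marker, or None if absent."""
    match = ANSWER_PATTERN.search(completion.strip())
    return match.group(1) if match else None


def math_reward(completion: str, ground_truth: str) -> float:
    """Binary verifiable reward: 1.0 if the extracted answer equals the reference."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0  # unparseable output earns nothing
    return 1.0 if float(answer) == float(ground_truth) else 0.0


def code_reward(program: str, tests: List[Tuple[int, int]]) -> float:
    """Binary reward: 1.0 only if the hypothetical `solve` function passes every test.

    A real system would run untrusted code in a sandbox, not with exec().
    """
    namespace: dict = {}
    try:
        exec(program, namespace)
        solve = namespace["solve"]
        return 1.0 if all(solve(x) == y for x, y in tests) else 0.0
    except Exception:
        return 0.0  # syntax errors, a missing `solve`, or runtime errors all score 0


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_toy_policy(candidates: List[str], ground_truth: str,
                     steps: int = 500, lr: float = 0.5, seed: int = 0) -> List[float]:
    """Toy REINFORCE: a softmax 'policy' over fixed candidate completions
    learns by trial and error to prefer the one the checker rewards."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    rewards = [math_reward(c, ground_truth) for c in candidates]
    for _ in range(steps):
        probs = softmax(logits)
        i = rng.choices(range(len(candidates)), weights=probs)[0]  # sample one attempt
        # Exact expected reward as a baseline (computable only in this toy setting).
        baseline = sum(p * r for p, r in zip(probs, rewards))
        advantage = rewards[i] - baseline
        # Gradient of log softmax(logits)[i] w.r.t. logits[j] is 1[j == i] - probs[j].
        for j in range(len(logits)):
            logits[j] += lr * advantage * ((1.0 if j == i else 0.0) - probs[j])
    return softmax(logits)


if __name__ == "__main__":
    attempts = ["6 * 7 = 42. Answer: 42", "Answer: 41", "I am not sure."]
    final_probs = train_toy_policy(attempts, "42")
    for text, p in zip(attempts, final_probs):
        print(f"{p:.3f}  {text}")
```

Because the reward comes from a checker, each sampled attempt can be scored without a human label, which is what lets this style of training scale. The trade-off is that only tasks with a reliable automatic checker qualify.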