Inference, Diffusion, World Models, and More | YC Paper Club
At the inaugural YC Paper Club, Stanford researcher Tanishk presented Speculative Speculative Decoding (SSD), arguing that inference speed is becoming the primary constraint on AI capabilities rather than merely a cost factor. The technique achieves 300 tokens per second on Llama 3 70B by running the drafting and verification steps of speculative decoding in parallel: the next draft is prepared against a predicted verification outcome, which hides latency whenever the prediction turns out to be correct.
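To make the idea concrete, here is a minimal toy sketch of drafting against a predicted verification outcome. It is not the implementation presented in the talk. Everything in it is an assumption made for illustration: the two "models" (`target_next`, `draft_next`) are hypothetical deterministic functions over integer tokens, verification is greedy, only one outcome is guessed (every drafted token accepted), and the two steps run one after the other rather than truly in parallel.

```python
from typing import List, Tuple

Token = int


def target_next(ctx: List[Token]) -> Token:
    # Hypothetical "large" model: a fixed deterministic rule (illustration only).
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[Token]) -> Token:
    # Hypothetical "small" model: agrees with the target except when the
    # context length is a multiple of 7, where it is deliberately wrong.
    t = target_next(ctx)
    return t if len(ctx) % 7 else (t + 1) % 50


def draft(ctx: List[Token], k: int) -> List[Token]:
    # Propose k tokens autoregressively with the small model.
    out: List[Token] = []
    for _ in range(k):
        out.append(draft_next(ctx + out))
    return out


def verify(ctx: List[Token], proposal: List[Token]) -> List[Token]:
    # Greedy verification: keep the longest prefix the target agrees with,
    # then append one token chosen by the target itself.
    accepted: List[Token] = []
    for tok in proposal:
        expected = target_next(ctx + accepted)
        if tok != expected:
            return accepted + [expected]
        accepted.append(tok)
    return accepted + [target_next(ctx + accepted)]


def greedy(prompt: List[Token], n_new: int) -> List[Token]:
    # Baseline: plain token-by-token decoding with the target model.
    ctx = list(prompt)
    for _ in range(n_new):
        ctx.append(target_next(ctx))
    return ctx[len(prompt):]


def generate(prompt: List[Token], n_new: int, k: int = 3) -> Tuple[List[Token], int, int]:
    ctx = list(prompt)
    proposal = draft(ctx, k)
    hits = rounds = 0
    while len(ctx) - len(prompt) < n_new:
        # Guess the verification outcome: all k drafted tokens accepted, and
        # the target's extra token equal to the drafter's own next token.
        guess = proposal + [draft_next(ctx + proposal)]
        # Pre-draft the next round for that guess. In a real system this work
        # would overlap with verification (assumption); here it is sequential.
        cache = {tuple(guess): draft(ctx + guess, k)}

        emitted = verify(ctx, proposal)
        ctx += emitted
        rounds += 1

        if tuple(emitted) in cache:
            proposal = cache[tuple(emitted)]  # prediction was right: draft is ready
            hits += 1
        else:
            proposal = draft(ctx, k)  # prediction was wrong: draft from scratch
    return ctx[len(prompt):len(prompt) + n_new], hits, rounds


if __name__ == "__main__":
    tokens, hits, rounds = generate([1, 2, 3], 20)
    print(tokens == greedy([1, 2, 3], 20), hits, rounds)
```

Because verification always has the final say, the output matches plain greedy decoding with the target model no matter how often the guess is wrong. A correct guess only means the next draft is already waiting when verification finishes. The toy counts those cases as `hits`; it does not measure real speed.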