{"version":"1.0","type":"rich","provider_name":"Acast","provider_url":"https://acast.com","height":250,"width":700,"html":"<iframe src=\"https://embed.acast.com/$/68470ba8d911dedd6501609c/6a30c6540592e82545da3db2?\" frameBorder=\"0\" width=\"700\" height=\"250\"></iframe>","title":"Why Tejal Patwardhan stopped underestimating the models - Episode 21","thumbnail_width":200,"thumbnail_height":200,"thumbnail_url":"https://open-images.acast.com/shows/68470ba8d911dedd6501609c/1781581356135-2cbdafb7-21c6-4372-806a-4a4c8617b0b1.jpeg?height=200","description":"<p>The old tests are getting too easy. Tejal Patwardhan leads OpenAI’s frontier evals team, which is finding new ways to measure and forecast progress as models become more capable. She and host Andrew Mayne discuss why evals matter for research, how benchmarks can break or get gamed, and what models need to be judged on next.</p><p><br></p><p><strong>Chapters</strong></p><p><br></p><p>00:00:24 Growing up at OpenAI</p><p>00:03:10 Why reasoning changed everything</p><p>00:06:28 What made o1 surprising</p><p>00:11:20 Why old benchmarks stopped working</p><p>00:14:45 What makes a good benchmark</p><p>00:17:35 Why evals are getting harder</p><p>00:22:09 Measuring voice and vision models</p><p>00:24:48 Testing models on real science</p><p>00:33:23 How OpenAI tracks frontier progress</p><p>00:40:47 What AI means for work</p>","author_name":"OpenAI"}