38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future

AXRP - the AI X-risk Research Podcast - A podcast by Daniel Filan

In this episode, I chat with David Duvenaud about two topics he's been thinking about: firstly, a paper he wrote about evaluating whether or not frontier models can sabotage human decision-making or monitoring of those same models; and secondly, the difficult situation humans find themselves in in a post-AGI future, even if AI is aligned with human intentions.

Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
Transcript: https://axrp.net/episode/2025/03/01/episode-38_8-david-duvenaud-sabotage-evaluations-post-agi-future.html
FAR.AI: https://far.ai/
FAR.AI on X (aka Twitter): https://x.com/farairesearch
FAR.AI on YouTube: @FARAIResearch
The Alignment Workshop: https://www.alignment-workshop.com/

Topics we discuss, and timestamps:
01:42 - The difficulty of sabotage evaluations
05:23 - Types of sabotage evaluation
08:45 - The state of sabotage evaluations
12:26 - What happens after AGI?

Links:
Sabotage Evaluations for Frontier Models: https://arxiv.org/abs/2410.21514
Gradual Disempowerment: https://gradual-disempowerment.ai/

Episode art by Hamish Doodles: hamishdoodles.com