“AI for AI safety” by Joe_Carlsmith

EA Forum Podcast (All audio) - A podcast by EA Forum Team

This is a link post.

(Audio version here (read by the author), or search for "Joe Carlsmith Audio" on your podcast app.

This is the fourth essay in a series that I’m calling “How do we solve the alignment problem?”. I’m hoping that the individual essays can be read fairly well on their own, but see this introduction for a summary of the essays that have been released thus far, and for a bit more about the series as a whole.)

1. Introduction and summary

In my last essay, I offered a high-level framework for thinking about the path from here to safe superintelligence. This framework emphasized the role of three key “security factors” – namely:

  • Safety progress: our ability to develop new levels of AI capability safely,
  • Risk evaluation: our ability to track and forecast the level of risk that a given sort of AI capability development [...]

---

Outline:

(00:29) 1. Introduction and summary

(04:12) 2. What is AI for AI safety?

(12:12) 2.1 A tale of two feedback loops

(14:47) 2.2 Contrast with "need human-labor-driven radical alignment progress" views

(17:24) 2.3 Contrast with a few other ideas in the literature

(19:51) 3. Why is AI for AI safety so important?

(23:14) 4. The AI for AI safety sweet spot

(27:44) 4.1 The AI for AI safety spicy zone

(29:51) 4.2 Can we benefit from a sweet spot?

(31:59) 5. Objections to AI for AI safety

(32:17) 5.1 Three core objections to AI for AI safety

(34:03) 5.2 Other practical concerns

The original text contained 78 footnotes, which were omitted from this narration.

The original text contained 14 images, which were described by AI.

---

First published:
March 14th, 2025

Source:
https://forum.effectivealtruism.org/posts/pS7NDzfgydmyjEF3g/ai-for-ai-safety

---

Narrated by TYPE III AUDIO.
