EA - Racing through a minefield: the AI deployment problem by Holden Karnofsky

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Racing through a minefield: the AI deployment problem, published by Holden Karnofsky on December 31, 2022 on The Effective Altruism Forum.

In previous pieces, I argued that there's a real and large risk of AI systems developing dangerous goals of their own and defeating all of humanity - at least in the absence of specific efforts to prevent this from happening. I discussed why it could be hard to build AI systems without this risk and how it might be doable.

The "AI alignment problem" refers to a technical problem: how can we design a powerful AI system that behaves as intended, rather than forming its own dangerous aims? This post is going to outline a broader political/strategic problem, the "deployment problem": if you're someone who might be on the cusp of developing extremely powerful (and maybe dangerous) AI systems, what should you do?

The basic challenge is this: If you race forward with building and using powerful AI systems as fast as possible, you might cause a global catastrophe (see links above). If you move too slowly, though, you might just be waiting around for someone else less cautious to develop and deploy powerful, dangerous AI systems. And if you can get to the point where your own systems are both powerful and safe ... what then? Other people still might be less-cautiously building dangerous ones - what should we do about that?

My current analogy for the deployment problem is racing through a minefield: each player is hoping to be ahead of others, but anyone moving too quickly can cause a disaster. (In this minefield, a single mine is big enough to endanger all the racers.)

This post gives a high-level overview of how I see the kinds of developments that can lead to a good outcome, despite the "racing through a minefield" dynamic. It is distilled from a more detailed post on the Alignment Forum.

First, I'll flesh out how I see the challenge we're contending with, based on the premises above. Next, I'll list a number of things I hope that "cautious actors" (AI companies, governments, etc.) might do in order to prevent catastrophe.

Many of the actions I'm picturing are not the kind of things normal market and commercial incentives would push toward, and as such, I think there's room for a ton of variation in whether the "racing through a minefield" challenge is handled well. Whether key decision-makers understand things like the case for misalignment risk (and in particular, why it might be hard to measure) - and are willing to lower their own chances of "winning the race" to improve the odds of a good outcome for everyone - could be crucial.

The basic premises of "racing through a minefield"

This piece is going to lean on previous pieces and assume all of the following things:

Transformative AI soon. This century, something like PASTA could be developed: AI systems that can effectively automate everything humans do to advance science and technology. This brings the potential for explosive progress in science and tech, getting us more quickly than most people imagine to a deeply unfamiliar future. I've argued for this possibility in the Most Important Century series.

Misalignment risk. As argued previously, there's a significant risk that such AI systems could end up with misaligned goals of their own, leading them to defeat all of humanity. And it could take significant extra effort to get AI systems to be safe.

Ambiguity. As argued previously, it could be hard to know whether AI systems are dangerously misaligned, for a number of reasons. In particular, when we train AI systems not to behave dangerously, we might be unwittingly training them to obscure their dangerous potential from humans, and take dangerous actions only when humans would not be able to stop them. At the same time, I expect powerful AI systems will present massive opportunities...
