Specification Gaming: How AI Can Turn Your Wishes Against You

When we specify goals for AIs, we must ensure that our specifications truly capture what we want. Otherwise, AI systems will behave differently from how we want them to. This can be catastrophic in high-stakes situations and at high levels of AI capability. If you watched our video “The Hidden Complexity of Wishes”, you’ll recognize these problems as the same kind of failure.

If you’d like to skill up on AI Safety, we highly recommend the AI Safety Fundamentals courses by BlueDot Impact. There are three courses: AI Alignment, AI Governance, and AI Alignment 201. You can follow AI Alignment and AI Governance even without a technical background in AI. AI Alignment 201, by contrast, presupposes that you have taken the AI Alignment course first and have knowledge equivalent to university-level courses on deep learning and reinforcement learning.

The courses consist of a selection of readings curated by experts in AI safety. The readings are available to all, so you can simply read them if you can’t formally enroll. If you want to participate rather than go through the readings by yourself, BlueDot Impact runs live courses that you can apply to. The courses are remote and free of charge. They take a few hours per week for the readings, plus a weekly call with a facilitator and a group of people learning from the same material. At the end of each course, you can complete a personal project, which may help you kickstart your career in AI Safety.

BlueDot Impact receives more applications than it can accept, so if you’d still like to follow the courses alongside other people, you can go to the study-buddy channel in the AI Alignment Slack. You could also join Rational Animations’ Discord server and see if anyone is up to be your partner in learning.
#ai #aisafety #alignment

▀▀▀▀▀▀▀▀▀SOURCES & READINGS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

9 Examples of Specification Gaming by @RobertMilesAI
Specification gaming: the flip side of AI ingenuity by Victoria Krakovna, Jonathan Uesato, Vladimir Mikulik et al. (2020)
Learning from Human Preferences by Paul Christiano, Alex Ray and Dario Amodei (2017)
Learning to Summarize with Human Feedback by Jeffrey Wu, Nisan Stiennon, Daniel Ziegler et al. (2020)
What failure looks like by Paul Christiano (2019)
The alignment problem from a deep learning perspective by Richard Ngo, Soeren Mindermann and Lawrence Chan (2022)

▀▀▀▀▀▀▀▀▀PATREON, MEMBERSHIP, KO-FI▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

🟠 Patreon
🔵 Channel membership
🟤 Ko-fi, for one-time and recurring donations

▀▀▀▀▀▀▀▀▀PATRONS & MEMBERS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

Alcher Black RMR Kristin Lindquist Nathan Metzger Monadologist Glenn Tarigan NMS James Babcock Colin Ricardo Long Hoang Tor Barstad Gayman Crothers Stuart Alldritt Chris Painter Juan Benet Falcon Scientist Jeff Christian Loomis Tomarty Edward Yu Ahmed Elsayyad Chad M Jones Emmanuel Fredenrich Honyopenyoko Neal Strobl bparro Danealor Craig Falls Vincent Weisser Alex Hall Ivan Bachcin joe39504589 Klemen Slavic Scott Alexander noggieB Dawson John Slape Gabriel Ledung Jeroen De Dauw Craig Ludington Jacob Van Buren Superslowmojoe Michael Zimmermann Nathan Fish Bleys Goodson Ducky Bryan Egan Matt Parlmer Tim Duffy rictic marverati Luke Freeman Dan Wahl leonid andrushchenko Alcher Black Rey Carroll William Clelland ronvil AWyattLife codeadict Lazy Scholar Torstein Haldorsen Supreme Reader Michał Zieliński

▀▀▀▀▀▀▀CREDITS▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀

Writer: :3
Producer: :3
Line Producer and production manager: Kristy Steffens
Animation director: Hannah Levingstone
Quality Assurance Lead: Lara Robinowitz
Animation: Michela Biancini, Owen Peurois, Zack Gilbert, Jordan Gilbert, Keith Kavanagh, Ira Klages, Colors Giraldo, Renan Kogut
Background Art: Hané Harnett, Zoe Martin-Parkinson, Hannah Levingstone
Compositing: Renan Kogut, Patrick O’Callaghan, Ira Klages
Voices: Robert Miles - Narrator
VO Editing: Tony Di Piazza
Sound Design and Music: Johnny Knittle