The goal-guarding hypothesis (Section 2.3.1.1 of "Scheming AIs")

Joe Carlsmith Audio

Audio versions of essays by Joe Carlsmith. Philosophy, futurism, and other topics. Text versions at joecarlsmith.com.

November 16, 2023 • Joe Carlsmith

This is section 2.3.1.1 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Text of the report here: https://arxiv.org/abs/2311.08379

Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power