Is scheming more likely if you train models to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of "Scheming AIs")

Chapters

Info

Joe Carlsmith Audio

Nov 16, 2023

Joe Carlsmith

This is sections 2.2.4.1-2.2.4.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Text of the report here: https://arxiv.org/abs/2311.08379

Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

This is sections 2.2.4.1-2.2.4.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Joe Carlsmith Audio

Is scheming more likely if you train models to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of "Scheming AIs")

Listen to this podcast on