Joe Carlsmith Audio

The counting argument for scheming (Sections 4.1 and 4.2 of "Scheming AIs")

November 16, 2023 Joe Carlsmith

This episode covers Sections 4.1 and 4.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Text of the report here: https://arxiv.org/abs/2311.08379
 
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
 
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

4. Arguments for/against scheming that focus on the final properties of the
4.1 Contributors to reward vs. extra criteria
4.2 The counting argument