Joe Carlsmith Audio

Empirical work that might shed light on scheming (Section 6 of "Scheming AIs")

November 16, 2023 Joe Carlsmith

This is section 6 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” 

Text of the report here: https://arxiv.org/abs/2311.08379
 
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
 
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

6. Empirical work that might shed light on scheming
6.1 Empirical work on situational awareness
6.2 Empirical work on beyond-episode goals
6.3 Empirical work on the viability of scheming as an instrumental strategy
6.4 The “model organisms” paradigm
6.5 Traps and honest tests
6.6 Interpretability and transparency
6.7 Security, control, and oversight
6.8 Other possibilities