Joe Carlsmith Audio

Speed arguments against scheming (Section 4.4-4.7 of "Scheming AIs")

November 16, 2023 Joe Carlsmith

This episode covers sections 4.4 through 4.7 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Text of the report here: https://arxiv.org/abs/2311.08379
 
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
 
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

4.4 Speed arguments
4.4.1 How big are the absolute costs of this extra reasoning?
4.4.2 How big are the costs of this extra reasoning relative to the simplicity benefits of
4.4.3 Can we actively shape training to bias towards speed over simplicity?
4.5 The “not-your-passion” argument
4.6 The relevance of “slack” to these arguments
4.7 Takeaways re: arguments that focus on the final properties of the model