Joe Carlsmith Audio

Simplicity arguments for scheming (Section 4.3 of "Scheming AIs")

November 16, 2023 Joe Carlsmith

This is section 4.3 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” 

Text of the report here: https://arxiv.org/abs/2311.08379
 
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
 
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

4.3 Simplicity arguments
4.3.1 What is “simplicity”?
4.3.2 Does SGD select for simplicity?
4.3.3 The simplicity advantages of schemer-like goals
4.3.4 How big are these simplicity advantages?
4.3.5 Does this sort of simplicity-focused argument make plausible predictions about the sort
4.3.6 Overall assessment of simplicity arguments