Joe Carlsmith Audio

Non-classic stories about scheming (Section 2.3.2 of "Scheming AIs")

November 16, 2023 Joe Carlsmith

This is section 2.3.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” 

Text of the report here: https://arxiv.org/abs/2311.08379
 
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
 
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

2.3.2 Non-classic stories
2.3.2.1 AI coordination
2.3.2.2 AIs with similar values by default
2.3.2.3 Terminal values that happen to favor escape/takeover
2.3.2.4 Models with false beliefs about whether scheming is a good strategy
2.3.2.5 Self-deception
2.3.2.6 Goal-uncertainty and haziness
2.3.2.7 Overall assessment of the non-classic stories
2.4 Take-aways re: the requirements of scheming
2.5 Path dependence