Joe Carlsmith Audio
Audio versions of essays by Joe Carlsmith. Philosophy, futurism, and other topics. Text versions at joecarlsmith.com.
Is scheming more likely if you train models to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of "Scheming AIs")
Joe Carlsmith
This is a reading of sections 2.2.4.1–2.2.4.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”
Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
2.2.4 What if you intentionally train models to have long-term goals?
2.2.4.1 Training the model on long episodes
2.2.4.2 Using short episodes to train a model to pursue long-term goals