Joe Carlsmith Audio

Is Power-Seeking AI an Existential Risk?

January 24, 2023 Joe Carlsmith
Chapters
0:13
Abstract
2:31
1 Introduction
6:30
1.1 Preliminaries
10:40
1.2 Backdrop
11:10
1.2.1 Intelligence
13:14
1.2.2 Agency
14:38
1.2.3 Playing with fire
17:04
1.2.4 Power
20:49
2 Timelines
21:13
2.1 Three key properties
21:32
2.1.1 Advanced capabilities
23:28
2.1.2 Agentic planning
30:22
2.1.3 Strategic awareness
31:56
2.2 Likelihood by 2070
34:22
3 Incentives
38:49
3.1 Usefulness
46:11
3.2 Available techniques
47:28
3.3 Byproducts of sophistication
49:39
4 Alignment
50:05
4.1 Definitions and clarifications
57:32
4.2 Power-seeking
1:13:15
4.3 The challenge of practical PS-alignment
1:14:32
4.3.1 Controlling objectives
1:16:41
4.3.1.1 Problems with proxies
1:21:34
4.3.1.2 Problems with search
1:27:08
4.3.1.3 Myopia
1:30:17
4.3.2 Controlling capabilities
1:31:15
4.3.2.1 Specialization
1:36:01
4.3.2.2 Preventing problematic improvements
1:37:43
4.3.2.3 Scaling
1:39:11
4.3.3 Controlling circumstances
1:42:41
4.4 Unusual difficulties
1:44:10
4.4.1 Barriers to understanding
1:47:38
4.4.2 Adversarial dynamics
1:49:40
4.4.3 Stakes of error
1:53:40
5 Deployment
1:57:08
5.1 Timing of problems
2:01:06
5.2 Decisions
2:05:19
Image: assessment of expected value of deployment
2:07:33
5.3 Key risk factors
2:08:02
5.3.1 Externalities and competition
2:12:27
5.3.2 Number of relevant actors
2:15:25
5.3.3 Bottlenecks on usefulness
2:20:40
5.3.4 Deception
2:23:48
5.4 Overall risk of problematic deployment
2:25:38
6 Correction
2:26:32
6.1 Take-off
2:29:10
6.2 Warning shots
2:34:37
6.3 Competition for power
2:48:21
6.4 Corrective feedback loops
2:54:50
6.5 Sharing power
2:56:13
7 Catastrophe
2:58:31
Marker 53
3:01:21
8 Probabilities
3:19:56
Acknowledgments
More Info

Audio version of my report on existential risk from power-seeking AI. Text here: https://arxiv.org/pdf/2206.13353.pdf. Narration by Type III audio. 
