Counterarguments to the basic AI x-risk case

LessWrong (Curated & Popular)
Nov 04, 2022

Chapters
0:46  I. If superhuman AI systems are built, any given system is likely to be ‘goal-directed’
1:23  II. If goal-directed superhuman AI systems are built, their desired outcomes will probably be about as bad as an empty universe by human lights
3:07  III. If most goal-directed superhuman AI systems have bad goals, the future will very likely be bad
5:04  Counterarguments
5:11  A. Contra “superhuman AI systems will be ‘goal-directed’”
5:18  Different calls to ‘goal-directedness’ don’t necessarily mean the same concept
17:00  Ambiguously strong forces for goal-directedness need to meet an ambiguously high bar to cause a risk
18:59  B. Contra “goal-directed AI systems’ goals will be bad”
19:08  Small differences in utility functions may not be catastrophic
22:21  Differences between AI and human values may be small
25:35  Maybe value isn’t fragile
28:44  Short-term goals
30:27  C. Contra “superhuman AI would be sufficiently superior to humans to overpower humanity”
30:37  Human success isn’t from individual intelligence
44:33  AI agents may not be radically superior to combinations of humans and non-agentic machines
48:35  Trust
50:12  Headroom
55:51  Intelligence may not be an overwhelming advantage
1:01:09  Unclear that many goals realistically incentivise taking over the universe
1:03:35  Quantity of new cognitive labor is an empirical question, not addressed
1:05:35  Speed of intelligence growth is ambiguous
1:08:07  Key concepts are vague
1:09:17  D. Contra the whole argument
1:09:22  The argument overall proves too much about corporations
1:09:38  I. Any given corporation is likely to be ‘goal-directed’
1:10:17  II. If goal-directed superhuman corporations are built, their desired outcomes will probably be about as bad as an empty universe by human lights
1:11:57  III. If most goal-directed corporations have bad goals, the future will very likely be bad
1:14:14  Conclusion