"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland

LessWrong (Curated & Popular)

Chapters
0:21 - Podcast note
1:39 - (Text resumes)
2:18 - Introduction
4:18 - A summary of our understanding of each approach
11:05 - (Text resumes) Previous related overviews include:
11:43 - Main Article Text
11:45 - Aligned AI / Stuart Armstrong
14:48 - Alignment Research Center (ARC)
14:54 - Eliciting Latent Knowledge / Paul Christiano
18:38 - Evaluating LM power-seeking / Beth Barnes
21:06 - Anthropic
21:51 - Interpretability
23:06 - Graph: Strength vs Interpretability
23:43 - (Text resumes)
24:43 - Scaling laws
25:47 - Brain-Like-AGI Safety / Steven Byrnes
27:10 - Center for AI Safety (CAIS) / Dan Hendrycks
29:41 - Center for Human Compatible AI (CHAI) / Stuart Russell
31:35 - Center on Long Term Risk (CLR)
33:13 - Conjecture
33:47 - Epistemology
35:07 - Scalable LLM Interpretability
36:09 - Refine
36:49 - Simulacra Theory
39:55 - David Krueger
42:13 - DeepMind
43:55 - Dylan Hadfield-Menell
45:26 - Encultured
47:16 - Externalized Reasoning Oversight / Tamera Lanham
48:48 - Diagram: SSL vs RL
49:29 - (Text resumes)
51:11 - Future of Humanity Institute (FHI)
52:12 - Fund For Alignment Research (FAR)
54:41 - MIRI
55:46 - Communicate their view on alignment
56:41 - Deception + Inner Alignment / Evan Hubinger
58:42 - Agent Foundations / Scott Garrabrant and Abram Demski
59:13 - Infra-Bayesianism / Vanessa Kosoy
1:03:31 - Visible Thoughts Project
1:03:52 - Jacob Steinhardt
1:04:58 - OpenAI
1:08:33 - Ought
1:12:02 - Redwood Research
1:12:06 - Adversarial training
1:12:16 - Diagram: The plan if we had to align AGI right now
1:13:57 - (Text resumes)
1:15:38 - LLM interpretability
1:16:13 - Sam Bowman
1:17:03 - Selection Theorems / John Wentworth
1:20:55 - Team Shard
1:23:50 - Truthful AI / Owain Evans and Owen Cotton-Barratt
1:25:23 - Other Organizations
1:26:31 - Appendix
1:26:32 - Visualizing Differences
1:27:12 - Diagram: Different approaches
1:28:46 - (Text resumes) Conceptual vs. applied
1:29:21 - Thomas’s Alignment Big Picture
Sep 10, 2022

https://www.lesswrong.com/posts/QBAjndPuFbhEXKcCr/my-understanding-of-what-everyone-in-technical-alignment-is

Despite a clear need for it, a good source explaining who is doing what and why in technical AI alignment doesn't exist. This is our attempt to produce such a resource. We expect to be inaccurate in some ways, but it seems better to get it out there and let Cunningham’s Law do its thing.[1]

The main body contains our understanding of what everyone is doing in technical alignment and why, as well as at least one of our opinions on each approach. We include supplements visualizing differences between approaches and Thomas’s big-picture view on alignment. The opinions written are Thomas’s and Eli’s independent impressions, many of which have low resilience. Our all-things-considered views are significantly more uncertain.

This post was mostly written while Thomas was participating in the 2022 iteration of the SERI MATS program under mentor John Wentworth. Thomas benefited immensely from conversations with John Wentworth, other SERI MATS participants, and many others he met this summer.