"(My understanding of) What Everyone in Technical Alignment is Doing and Why" by Thomas Larsen & Eli Lifland

LessWrong (Curated & Popular)

Chapters
0:21 - Podcast note
1:39 - (Text resumes)
2:18 - Introduction
4:18 - A summary of our understanding of each approach
11:05 - (Text resumes) Previous related overviews include:
11:43 - Main Article Text
11:45 - Aligned AI / Stuart Armstrong
14:48 - Alignment Research Center (ARC)
14:54 - Eliciting Latent Knowledge / Paul Christiano
18:38 - Evaluating LM power-seeking / Beth Barnes
21:06 - Anthropic
21:51 - Interpretability
23:06 - Graph: Strength vs Interpretability
23:43 - (Text resumes)
24:43 - Scaling laws
25:47 - Brain-Like-AGI Safety / Steven Byrnes
27:10 - Center for AI Safety (CAIS) / Dan Hendrycks
29:41 - Center for Human Compatible AI (CHAI) / Stuart Russell
31:35 - Center on Long Term Risk (CLR)
33:13 - Conjecture
33:47 - Epistemology
35:07 - Scalable LLM Interpretability
36:09 - Refine
36:49 - Simulacra Theory
39:55 - David Krueger
42:13 - DeepMind
43:55 - Dylan Hadfield-Menell
45:26 - Encultured
47:16 - Externalized Reasoning Oversight / Tamera Lanham
48:48 - Diagram: SSL vs RL
49:29 - (Text resumes)
51:11 - Future of Humanity Institute (FHI)
52:12 - Fund For Alignment Research (FAR)
54:41 - MIRI
55:46 - Communicate their view on alignment
56:41 - Deception + Inner Alignment / Evan Hubinger
58:42 - Agent Foundations / Scott Garrabrant and Abram Demski
59:13 - Infra-Bayesianism / Vanessa Kosoy
1:03:31 - Visible Thoughts Project
1:03:52 - Jacob Steinhardt
1:04:58 - OpenAI
1:08:33 - Ought
1:12:02 - Redwood Research
1:12:06 - Adversarial training
1:12:16 - Diagram: The plan if we had to align AGI right now
1:13:57 - (Text resumes)
1:15:38 - LLM interpretability
1:16:13 - Sam Bowman
1:17:03 - Selection Theorems / John Wentworth
1:20:55 - Team Shard
1:23:50 - Truthful AI / Owain Evans and Owen Cotton-Barratt
1:25:23 - Other Organizations
1:26:31 - Appendix
1:26:32 - Visualizing Differences
1:27:12 - Diagram: Different approaches
1:28:46 - (Text resumes) Conceptual vs. applied
1:29:21 - Thomas’s Alignment Big Picture
Sep 10, 2022

https://www.lesswrong.com/posts/QBAjndPuFbhEXKcCr/my-understanding-of-what-everyone-in-technical-alignment-is

Despite a clear need for it, a good source explaining who is doing what and why in technical AI alignment doesn't exist. This is our attempt to produce such a resource. We expect to be inaccurate in some ways, but it seems better to get it out there and let Cunningham’s Law do its thing.[1]

The main body contains our understanding of what everyone is doing in technical alignment and why, as well as at least one of our opinions on each approach. We include supplements visualizing differences between approaches and Thomas’s big-picture view on alignment. The opinions written are Thomas’s and Eli’s independent impressions, many of which have low resilience. Our all-things-considered views are significantly more uncertain.

This post was mostly written while Thomas was participating in the 2022 iteration of the SERI MATS program under mentor John Wentworth. Thomas benefited immensely from conversations with John Wentworth, other SERI MATS participants, and many others he met this summer.