LessWrong (Curated & Popular)

“Trust me bro, just one more RL scale up, this one will be the real scale up with the good environments, the actually legit one, trust me bro” by ryan_greenblatt

I've recently written about how I've updated against seeing substantially faster than trend AI progress due to quickly massively scaling up RL on agentic software engineering. One response I've heard is something like: RL scale-ups so far...

September 04, 2025 • 14:02

“⿻ Plurality & 6pack.care” by Audrey Tang

(Cross-posted from speaker's notes of my talk at DeepMind today.) Good local time, everyone. I am Audrey Tang, 🇹🇼 Taiwan's Cyber Ambassador and first Digital Minister (2016-2024). It is an honor to be here with you all at DeepMind.

September 03, 2025 • 23:57

[Linkpost] “The Cats are On To Something” by Hastings

This is a link post. So the situation as it stands is that the fraction of the light cone expected to be filled with satisfied cats is not zero. This is already remarkable. What's more remarkable is that this was orchestrated starting nearly 5000 ...

September 03, 2025 • 4:45

[Linkpost] “Open Global Investment as a Governance Model for AGI” by Nick Bostrom

This is a link post. I've seen many prescriptive contributions to AGI governance take the form of proposals for some radically new structure. Some call for a Manhattan project, others for the creation of a new international organization, etc. The ...

September 03, 2025 • 2:13

“Will Any Old Crap Cause Emergent Misalignment?” by J Bostock

The following work was done independently by me in an afternoon and basically entirely vibe-coded with Claude. Code and instructions to reproduce can be found here. Emergent Misalignment was discovered in early 2025, and is a phenomenon w...

August 28, 2025 • 8:39

“AI Induced Psychosis: A shallow investigation” by Tim Hua

“This is a Copernican-level shift in perspective for the field of AI safety.” - Gemini 2.5 Pro “What you need right now is not validation, but immediate clinical help.” - Kimi K2 Two Minute Summary

August 27, 2025 • 56:46

“Before LLM Psychosis, There Was Yes-Man Psychosis” by johnswentworth

A studio executive has no beliefs / That's the way of a studio system / We've bowed to every rear of all the studio chiefs / And you can bet your ass we've kissed 'em / Even the birds in the Hollywood hills

August 27, 2025 • 5:26

“Training a Reward Hacker Despite Perfect Labels” by ariana_azarbal, vgillioz, TurnTrout

Summary: Perfectly labeled outcomes in training can still boost reward hacking tendencies in generalization. This can hold even when the train/test sets are drawn from the exact same distribution. We induce this surprising effect via a form of co...

August 25, 2025 • 13:19

“Banning Said Achmiz (and broader thoughts on moderation)” by habryka

It's been roughly 7 years since the LessWrong user-base voted on whether it's time to close down shop and become an archive, or to move towards the LessWrong 2.0 platform, with me as head-admin. For roughly equally long have I spent around one hu...

August 23, 2025 • 51:47

“Underdog bias rules everything around me” by Richard_Ngo

People very often underrate how much power they (and their allies) have, and overrate how much power their enemies have. I call this “underdog bias”, and I think it's the most important cognitive bias for understanding modern society. I’l...

August 23, 2025 • 13:26

“Epistemic advantages of working as a moderate” by Buck

Many people who are concerned about existential risk from AI spend their time advocating for radical changes to how AI is handled. Most notably, they advocate for costly restrictions on how AI is developed now and in the future, e.g. the Pause AI...

August 22, 2025 • 5:59

“Four ways Econ makes people dumber re: future AI” by Steven Byrnes

(Cross-posted from X, intended for a general audience.) There's a funny thing where economics education paradoxically makes people DUMBER at thinking about future AI. Econ textbooks teach concepts & frames that are great for most thin...

August 21, 2025 • 14:01

“Should you make stone tools?” by Alex_Altair

Knowing how evolution works gives you an enormously powerful tool to understand the living world around you and how it came to be that way. (Though it's notoriously hard to use this tool correctly, to the point that I think people mostly shouldn'...

August 21, 2025 • 6:02

“My AGI timeline updates from GPT-5 (and 2025 so far)” by ryan_greenblatt

As I discussed in a prior post, I felt like there were some reasonably compelling arguments for expecting very fast AI progress in 2025 (especially on easily verified programming tasks). Concretely, this might have looked like reaching 8 hour 50%...

August 21, 2025 • 7:26

“Hyperbolic model fits METR capabilities estimate worse than exponential model” by gjm

This is a response to https://www.lesswrong.com/posts/mXa66dPR8hmHgndP5/hyperbolic-trend-with-upcoming-singularity-fits-metr which claims that a hyperbolic model, complete with an actual singularity in the near future, is a better fit for the MET...

August 20, 2025 • 8:16

“My Interview With Cade Metz on His Reporting About Lighthaven” by Zack_M_Davis

On 12 August 2025, I sat down with New York Times reporter Cade Metz to discuss some criticisms of his 4 August 2025 article, "The Rise of Silicon Valley's Techno-Religion". The transcript below has been edited for clarity. ZMD: In accord...

August 18, 2025 • 10:06

“Church Planting: When Venture Capital Finds Jesus” by Elizabeth

I’m going to describe a Type Of Guy starting a business, and you’re going to guess the business: The founder is very young, often under 25. He might work alone or with a founding team, but when he tells the story o...

August 18, 2025 • 31:18

Episodes

“‘If Anyone Builds It, Everyone Dies’ release day!” by alexvermeer

“Obligated to Respond” by Duncan Sabien (Inactive)

“Chesterton’s Missing Fence” by jasoncrawford

“The Eldritch in the 21st century” by PranavG, Gabriel Alfour

“The Rise of Parasitic AI” by Adele Lopez

“High-level actions don’t screen off intent” by AnnaSalamon

[Linkpost] “MAGA populists call for holy war against Big Tech” by Remmelt

“Your LLM-assisted scientific breakthrough probably isn’t real” by eggsyntax
