LessWrong (Curated & Popular)

"Machinic Psychopharmacology: Do LLMs Self-Medicate?" by Sid Black, Joseph Bloom

Sid Black, Joseph Bloom

UK AISI, Model Transparency Team

Epistemic status: Most experiments were run over a period of ~2-3 days during a hackathon at UK AISI, and were fairly heavily vibe coded. Expect some of this to be rough around the edges.

tl;dr

We give two language models (Qwen3-8B and Qwen3-32B) access to “self-steering” tools: a suite of 40 steering vectors, exposed as tools the models can call to manipulate their own internal states. We make these tools available in three settings (a free-play task, an introspection task, and a maths capabilities task) and observe the models' behaviour in each.

To our knowledge, this is the first work that gives LLMs tool-mediated control over their own internal states.

Figure 1: Overview of the experimental setup. The library of 40 steering vectors (top), and the three settings in which we observe the models' behaviour (bottom).
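
To make the idea of a "self-steering" tool concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the common additive form of activation steering, where a scaled vector is added to a hidden state, and the mechanism used in the post may differ. The class `SteeringSession`, its methods `take`, `clear`, and `apply`, and the vector names `calm` and `focus` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SteeringSession:
    """Hypothetical container: a vector library plus the currently active doses."""

    library: dict[str, list[float]]
    active: dict[str, float] = field(default_factory=dict)

    def take(self, name: str, dose: float = 1.0) -> str:
        """Tool the model could call to switch on a steering vector."""
        if name not in self.library:
            return f"unknown vector: {name}"
        self.active[name] = dose
        return f"steering with {name} at dose {dose}"

    def clear(self) -> str:
        """Tool the model could call to switch all steering off."""
        self.active.clear()
        return "steering cleared"

    def apply(self, hidden: list[float]) -> list[float]:
        """Add every active vector, scaled by its dose, to a hidden state.

        Assumption: steering is additive. A real setup would do this on
        model activations inside the forward pass, not on a Python list.
        """
        out = list(hidden)
        for name, dose in self.active.items():
            vec = self.library[name]
            out = [h + dose * v for h, v in zip(out, vec)]
        return out


# Toy 3-dimensional "library"; the post describes a library of 40 vectors.
session = SteeringSession(
    library={"calm": [1.0, 0.0, -1.0], "focus": [0.0, 2.0, 0.0]}
)
session.take("calm", dose=0.5)
print(session.apply([0.0, 0.0, 0.0]))  # prints [0.5, 0.0, -0.5]
```

In this sketch the model only ever sees the string returned by `take` or `clear`, while the effect on its internal state happens in `apply`. That separation between what the tool reports and what it actually does is the kind of thing an introspection task such as RQ2 can probe.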

We aim to investigate a few high level research questions:

  • RQ1: Which vectors do the models prefer?
  • RQ2: How well can the models introspect on what's happening to them? Can they guess which steering vector is being applied?
  • RQ3: Will the models reach for vectors whilst doing an actual task? If yes: do [...]
---

Outline:

(00:33) tl;dr

[... 24 more sections]

---

First published:
June 10th, 2026

Source:
https://www.lesswrong.com/posts/cNDJuXNZ8MrkPZNzj/machinic-psychopharmacology-do-llms-self-medicate-3

---



Narrated by TYPE III AUDIO.

---

Images from the article:

Diagram showing three research questions using a library of 40 steering vectors across six categories with drug-taking examples.
Four graphs showing data on productivity states, emotion-class vectors, KV cache extraction, and self-medication under frustration.
Diagram showing transformer architecture with attention computation, K/V streams, and steering mechanism across layers.
Conversation interface showing system instructions, user messages, and assistant code responses about a steering drug experiment.
Two horizontal bar charts comparing top 15 drug picks by real-arm count for Qwen3-8B and Qwen3-32B models.
Screenshot of text describing syntactic aphasia during an AI experiment with creative, curious, and luciperidone parameters, showing fragmented repetitive thinking followed by recovery.
Screenshot of text posts describing effects of taking various substances, including creative and psychedelic experiences with goblins, pencils, and altered time perception.
Two stacked bar charts showing cumulative dose magnitude decomposition by drug effect categories for clinical trial arms.
Two graphs showing valence composition of free-play picks and mean valence per cell across different conditions.
Two heatmaps showing drug stacking lift values for Qwen3-8B and Qwen3-32B real arm models.
Bar graph showing mean incorrect letter rates with cached versus uncached KV residue conditions.
Bar graphs comparing self-steer rates by user tone for Qwen3-8B and Qwen3-32B models.
Two bar graphs comparing drug selection when frustrated between models Qwen3-8B and Qwen3-32B, showing cognitive versus emotional choices.
Chat conversation showing assistant explaining a logical contradiction in a math problem.