Heliox: Where Evidence Meets Empathy
We make rigorous science accessible, accurate, and unforgettable.
Produced by Michelle Bruecker and Scott Bleackley, it features reviews of emerging research and ideas from leading thinkers, curated under our creative direction with AI assistance for voice, imagery, and composition. Synthetic voices and illustrative images of people are representative tools, not depictions of specific individuals.
We dive deep into peer-reviewed research, pre-prints, and major scientific works, then bring them to life through the stories of the researchers themselves. Complex ideas become clear. Obscure discoveries become conversation starters. And you walk away understanding not just what scientists discovered, but why it matters and how they got there.
Independent, moderated, timely, deep, gentle, clinical, global, and community conversations about things that matter. Breathe Easy, we go deep and lightly surface the big ideas.
The Water Is Already at Your Knees, and what you do next might define the next century of human work
It is a civilizational invitation to redesign what we train human beings to do.
The water is coming. We have a few years (probably more than the doomers say, probably less than the optimists hope) to learn how to swim in it. Not to resist the tide, but to let it carry the weight of the routine while we climb to the shore of genuine invention.
The machines are finally building an infrastructure that might fairly value us. The question is whether we'll have the courage (and the educational systems, the economic incentives, and the cultural permission) to become worth valuing in the ways they cannot replicate.
The tide is rising. What are you building on high ground?
This is Heliox: Where Evidence Meets Empathy
Disclosure: This podcast uses AI-generated synthetic voices for a material portion of the audio content, in line with Apple Podcasts guidelines.
Spoken word, short and sweet, with rhythm and a catchy beat.
http://tinyurl.com/stonefolksongs
You know, usually when we talk about a technological revolution, there is this, I don't know, this implicit expectation of precision. Right. Like it feels like engineering. Exactly. Like we look back at history and it seems so localized. A new steam-powered loom is invented. It gets installed on a literal brick and mortar factory floor and a worker can just point at it. Yeah. Point at it and say, there, that physical object is the thing that's changing my job, my town. Right. It's bounded. You can walk a circle around a loom or, you know, an early assembly line robot. Right. You can measure its inputs, clock its outputs, and basically understand the physical geometry of the disruption. But when you step into the world of generative artificial intelligence, and specifically how it's hitting the global labor market right now, that whole factory floor metaphor just completely breaks down. Oh, totally. It shatters. Because we are looking at a landscape of disruption that is just entirely formless. It's everywhere and nowhere at the same time. Yeah, you can't point to a piece of software living on a server farm in, like, another state and understand how it's fundamentally rewriting the social contract of your Tuesday morning meeting. Right. It's like we're trying to measure a mist. Which is the absolute definition of diagnostic muddy waters in economics and labor studies. I mean, the technology is being adopted so fast and mutating so unpredictably that traditional labor statistics are struggling to even categorize what's happening. Let alone measure it. And honestly, that is exactly the mission of our deep dive today for you listening. We are going to map the exact shape and the exact speed of the AI wave that is currently washing over the global workforce. Because for the last couple of years, let's be honest, everyone from boardrooms to break rooms has been operating purely on vibes. Pure vibes, yeah.
You know, you see a tech CEO make a terrifying prediction on some blog, or someone posts a viral thread about automating their entire legal department with a Python script. We've been living on a very unsteady diet of anecdotes, extrapolations, and, well, marketing hype. Yeah, lots of hype. So today, we are swapping the vibes for hard empirical data. We have pulled together an absolute mountain of evidence to unpack for you. And it is a fascinating mountain. It really is. We are going to look at massive, high-frequency payroll data analyzing millions of American workers. We're going to dive into deep, qualitative research from the front lines of Reddit threads. And crucially, we're exploring a groundbreaking new paper from the MIT Future Tech team. Yeah, that MIT paper, it's called Crashing Waves vs. Rising Tides. And it fundamentally challenges the prevailing myth of how AI automation actually happens. It's going to completely reframe the timeline you have in your head about your own career. Oh, absolutely. The authors literally mapped out thousands of real-world tasks to see what AI can actually do in the wild, not just, you know, what it does in a sterile lab. And we're going to dig deep into their methodology because the journey they took to find this data is a whole story in itself. But diagnosing the disruption is really only half the battle. Right. What do we do about it? Exactly. So we're also bringing in insights from a recent deep dive from the Heliox series, an exploration called The Architecture of Innovation. Which tackles the ultimate existential question here. Right. If AI inevitably takes over the routine day-to-day tasks of the knowledge economy, how do we build a world, and, like, a mathematical economy, that actually rewards true paradigm-shifting human genius? We have a lot of ground to cover. We'll start with the very real measurable panic happening on the ground right now.
Then we'll dismantle the flawed assumptions the tech industry made about how AI learns. From there, we look at the MIT team's startling discovery of the actual mathematical shape of AI progress. And finally, we'll explore a technical blueprint for the future of the human mind. Let's jump right into the ground truth. Because while Silicon Valley has been busy debating theoretical benchmarks and AGI timelines, economists have been looking at actual physical paychecks. Yeah, and those paychecks are telling a very specific and frankly somewhat alarming story. So to understand the reality on the ground, we have to look at this massive recent study out of the Stanford Digital Economy Lab. Right. Authored by Erik Brynjolfsson and a team of researchers. It's titled Canaries in the Coal Mine. Catchy title. Yeah, it fits perfectly. So the researchers knew that waiting for the Bureau of Labor Statistics to catch up with their annual surveys would just be way too slow. They needed real-time data. So they partnered with ADP. Exactly. And just for context, ADP is the payroll processing giant. If you're listening to this in the United States and you have ever received a direct deposit, there's a very high probability ADP facilitated it. Right. They process payroll for over 25 million workers in the U.S. This is critical because the researchers aren't just looking at sentiment surveys where people guess how worried they are about AI. They're not asking for opinions. No, they are looking at high-frequency, individual-level administrative data. This is the unarguable truth of the W-2 and the pay stub across tens of thousands of firms. Okay, so they have this massive ocean of data showing exactly who's getting hired, who's getting fired, who's getting promoted month by month. How do they filter that to find the AI impact specifically? They cross-referenced this employment data with occupational AI exposure metrics.
Basically, various economic institutions have spent the last few years ranking occupations based on how many of their core tasks overlap with what large language models can do. So, like, a carpenter has very low AI exposure, while a junior copywriter or maybe a customer service rep or an entry-level software developer has extremely high exposure. Precisely. They took those highly exposed occupations and zeroed in on early career workers, specifically ages 22 to 25. The recent grads. Yeah. And when they looked at the timeline, starting from the widespread mainstream adoption of tools like ChatGPT and Claude around late 2022 into early 2023, they found a stark reality. What was it? Early career workers in these highly exposed fields have experienced a 16% relative employment decline. 16%. I mean, I want to pause on that because that is not a small margin of error. No, it's massive. That is a massive structural shift in a very short amount of time. But let me play devil's advocate here for a second. The macroeconomic environment over the last couple of years has been incredibly weird. Oh, absolutely. The tech industry had a massive, frankly, reckless hiring boom during the pandemic. Couldn't this 16% drop just be the post-COVID hangover, like a natural market correction where companies are trimming the excess fat? Yeah. Or maybe high interest rates cooling the broader economy. It is the exact first question any rigorous economist would ask, and the researchers anticipated it. This is where the study becomes really robust. Okay. They use a statistical method called firm-time fixed effects. Okay. Walk me through how that works. Essentially, it lets them filter out the noise of the broader economy and look at the behavior of a specific company at a specific moment in time. OK, like a microscope. Yeah.
So they look within the exact same company, let's say a midsize marketing agency in Chicago during the exact same month, and they compare the hiring and retention of junior workers in their AI-exposed roles, like their junior copywriters. Against the junior workers in their roles that aren't exposed to AI. Exactly. Like their event coordinators or physical logistics staff. Okay. So if it was just a general hiring freeze because of interest rates or a post-COVID correction, the company would be cutting the event coordinators and the copywriters equally. But they aren't. Wow. The decline holds perfectly even when you use these fixed effects. It is laser targeted at the jobs AI is fundamentally good at. That is wild. They even went a step further and controlled for whether the job was remote or in office, just in case this was actually a story about offshoring junior roles to cheaper labor markets abroad. And it wasn't. It wasn't. The 16 percent drop is incredibly resilient in the data. It is a real localized phenomenon. So basically the entry level job in the knowledge economy is just vanishing before our eyes. It's fundamentally transforming and the sheer volume of those roles is shrinking. But there is a critical twist in the ADP data that kind of complicates the narrative. Okay, what's the twist? When Brynjolfsson's team looked at older, more experienced workers, the mid-career professionals and seniors, people in their 30s, 40s, and 50s in those exact same highly exposed occupations. Yeah. Their employment is doing just fine. In some sectors, it's actually growing slightly. Wait, I'm having trouble visualizing the logic there. If I'm a 23-year-old junior software engineer, my job prospects are shrinking by double digits. Yes. But if I'm a 45-year-old senior software engineer in the exact same department facing the exact same AI tools, I'm totally safe? Why doesn't the AI automate the senior person too?
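For readers who want to see the mechanics, here is a minimal sketch of the within-firm, within-month comparison that the firm-time fixed effects discussion describes. It is not the study's code or data: the function name `within_estimator`, the firms, and every number are hypothetical, and the toy effect of -16 is chosen only to echo the figure quoted in the episode. The idea is to subtract each firm-month's own average before comparing exposed and unexposed junior roles, so a shock that hits a whole company in a given month cancels out.

```python
from collections import defaultdict


def within_estimator(rows):
    """Slope of outcome on exposure after removing firm-by-month averages.

    rows: iterable of (firm, month, exposed, outcome), where exposed is 0 or 1.
    This is the textbook "within" (demeaning) estimator, shown for illustration.
    """
    cells = defaultdict(list)
    for firm, month, exposed, outcome in rows:
        cells[(firm, month)].append((float(exposed), float(outcome)))

    num = 0.0
    den = 0.0
    for obs in cells.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            # Deviations from the firm-month mean remove anything shared
            # by every role in that firm during that month.
            num += (x - x_bar) * (y - y_bar)
            den += (x - x_bar) ** 2
    if den == 0:
        raise ValueError("no within-cell variation in exposure")
    return num / den


# Hypothetical toy data: firm A has a bad month (-5 for everyone), firm B a
# good one (+2 for everyone); exposed junior roles lose an extra 16 in both.
toy_rows = [
    ("A", "2024-01", 1, -21.0),
    ("A", "2024-01", 0, -5.0),
    ("B", "2024-01", 1, -14.0),
    ("B", "2024-01", 0, 2.0),
]

effect = within_estimator(toy_rows)
print(f"estimated exposure effect: {effect:.1f}")
```

In this toy setup the firm-level shocks differ, yet the estimate recovers only the gap between exposed and unexposed roles, which is the point the hosts make about ruling out a general hiring freeze.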
Because of the nature of the tasks each cohort actually performs. Think about what junior workers are typically hired to do. They handle the highly codifiable, routine, repetitive tasks. They write the boilerplate code. They summarize the meeting notes, draft standard vendor emails, pull initial data queries. The mechanical groundwork. Exactly, which is exactly what a generative AI is built to do. For the junior worker, the AI is a direct substitute. It replaces their labor entirely. Okay, but the senior workers? Their jobs rely on something entirely different, something called tacit knowledge. Tacit knowledge, meaning the things you know but can't necessarily write down in a manual. Yes. The senior software engineer isn't just writing code. They know why the code base is structured a certain way because of, like, a bizarre server migration that happened five years ago. Right, the institutional memory. Exactly. They know the unwritten political rules of the organization. They know how to navigate the ambiguity of a client who doesn't actually know what they want. Exactly. Generative AI cannot mimic that contextual, historical, relational knowledge. Oh, I see. So for the senior worker, the AI isn't a substitute. It's a complement. Yes. It's like the ultimate bicycle for the mind. It writes the boilerplate code for them in five seconds, freeing them up to spend all their time on the high-level architectural thinking that the AI just can't do. And that makes the senior worker drastically more productive, which obviously makes them more valuable to the firm, so the firm retains them. Meanwhile, the AI acts as a steamroller for the junior worker. It's a fascinating macroeconomic picture. But it feels a little cold. You know, like looking at 25 million W-2s tells us that it's happening. But it doesn't really tell us what it actually feels like to be the 23-year-old under that steamroller. Yeah, that's the human element.
And to understand the human toll, we have to look at a very different kind of study. Okay. This one was published in the journal Frontiers by researchers Anurag Shakar and ween Odin. They wanted to capture the lived experience of this displacement. How do you even measure that? Well, they went to the ultimate repository of unfiltered, anonymous human honesty: Reddit. Reddit. I am continually amazed by how much serious academic research is currently being built on the foundation of Reddit threads. It's incredibly valuable qualitative data if it's analyzed correctly. So they found this massive viral thread on the r/AskReddit forum titled, "Hey people who lost their jobs to AI, what happened?" Man, I can only imagine those comments. What? The thread had thousands of replies. The researchers rigorously analyzed over 1,400 of the most substantive comments. They used a mixed-methods design, running computational text analytics to find patterns and then doing deep qualitative thematic coding to extract the underlying meaning. And what did they find? I imagine it's just a lot of standard grievance. I mean, people are angry when they lose their jobs, whether it's to a machine or just a bad economy. They did find anger, of course, but they discovered something much more specific. They identified a novel psychological phenomenon that they termed algorithmic technostress. Algorithmic technostress. Let's pull that apart. Yeah. How is that fundamentally different from the stress of, say, a factory closing down in the 1980s because the company bought robotic arms? Like, stress is stress, right? It's different because of the temporal dimension, the timeline of the loss. Okay, what do you mean? A traditional layoff, like a factory closing, is a discrete event. The company announces they're moving operations to Mexico or they bought a machine. You get a pink slip. It is a sudden, sharp break. The psychological contract between the worker and the employer is severed in one clear snap.
You know exactly who to be mad at, and you know exactly when it happened. Exactly. But AI displacement doesn't work like that. The workers on Reddit described an experience that unfolds through what the researchers call breach cascades. Breach cascades. It is the proverbial death by a thousand cuts. It's an incremental automation where no single change seems worthy of a layoff, but the cumulative impact is devastating. Walk me through what a breach cascade actually looks like for a knowledge worker. Give me an example. Okay, let's imagine a junior marketing copywriter. Week one, management introduces a new enterprise AI tool. They tell the team, hey, this is just an assistant. It's here to help brainstorm. No one is fired. Right. Everyone's still employed. But suddenly, the junior copywriter isn't coming up with original taglines anymore. Their job shifts to feeding prompts into the AI and seeing what it spits out. A slight demotion in creative autonomy, but yeah, you're still employed. Then month two, the AI gets a software update. It can now mimic the company's brand voice perfectly. The junior writer goes from prompting to just lightly editing the AI's output. Their billable hours get cut. The water level is rising. Month four, management realizes that the senior marketing director can just prompt the AI themselves and completely bypass the junior writer. Finally, month six, the junior role is quietly eliminated in a quote unquote reorganization. Wow. No single step in that six-month cascade felt like a massive, discrete betrayal. But the cumulative impact destroys the worker's trust. It's like an erosion of your professional soul. You aren't being laid off and replaced by a cheaper human being. You are being slowly, systematically hollowed out by an algorithm that literally learned your specific style by watching you work. And that dynamic causes severe existential dread. The Frontiers study made a point to distinguish this from previous waves of IT stress. Right.
In the 90s, when offices transitioned to computers, people had technostress. But it was about the frustration of learning a clunky piece of software. Algorithmic technostress makes people question their intrinsic human value. I read a quote once from a displaced worker that perfectly captures this. They said, "If a machine can learn in three seconds what took me four years of university and tens of thousands of dollars in student loans to master, what is my purpose?" That is the exact core of the psychological crisis. And it isn't just an individual economic tragedy. It's a structural crisis for society. Which actually leads us to a concept paper published in MDPI's journal Societies by Michael Gerlich. Okay, Gerlich. Gerlich takes this micro-level Reddit dread and zooms out to look at the macro implications. He introduces the idea of societal bifurcation. We talk a lot about divides, the wealth gap, the digital divide. What kind of bifurcation is Gerlich warning about? Well, he argues that our traditional framework for inequality, the digital divide, is obsolete here. The digital divide was about access, right? Who has a laptop and high-speed broadband, and who doesn't. If you don't have the hardware, you fall behind. Exactly. But with generative AI, access is not the bottleneck. The tools are incredibly cheap, often entirely free on your smartphone. The new divide is cognitive. Cognitive. Yes. It is the split between cognitive dependency and cognitive resilience. Cognitive dependency. Okay, I think I have a good analogy for this. Let me try it out on you. Let's hear it. I think it's like the evolution of GPS navigation. Okay, I like where this is going. So I remember when I first moved to a new city before smartphones were everywhere, you had to look at a physical map or at least print out MapQuest directions. You learned the major arteries. You understood where north and south were. You built a mental model of the grid in your head.
When early GPS came out, if you used it reflectively, meaning you still paid attention to the route it was suggesting, you became a much better, faster driver. That's cognitive resilience. You're using the tool, but you retain your spatial awareness. You hold the mental model in your head. Exactly. Yeah. But then GPS got so good and so seamless that we stopped looking at the map entirely. We just blindly followed the blue line on the screen. We offloaded all of our spatial reasoning to the machine for the sake of convenience. Oh, absolutely. And now if you are driving in a new city and your phone battery dies, you are hopelessly, dangerously lost. You have no idea where you are. That is cognitive dependency. That analogy maps perfectly onto Gerlich's thesis. He argues that AI accelerates the automation of symbolic and analytical tasks, the very tasks we use to build our mental models of a profession. If you use AI to offload your critical thinking entirely, you become cognitively dependent. You lose your interpretative autonomy. If the environment shifts or the AI begins to hallucinate bad information, it's a problem. You have no underlying foundation to fall back on. But if you use it reflectively, like a sparring partner, to challenge your own human-generated ideas, you gain massive productivity. Yes. But apply your GPS analogy to the ADP payroll data we just discussed. Okay. If the AI steamroller flattens all the entry-level routine tasks, those routine tasks were the exact mechanisms young humans historically used to build their mental maps of a profession. Oh, wow. I didn't even think about that. How does a junior lawyer ever become a senior partner if they never have to suffer through the tedious, agonizing work of document review and case law research? The routine work is where you learn the streets of the legal profession. If the AI is always driving the car, the junior never learns to navigate.
They are structurally blocked from ever gaining the tacit knowledge that protects the senior workers. This is what Gerlich means by structural bifurcation. It's not just that some people will be rich and some poor. It's that we're creating a divergence in human adaptive capacity. We are structurally preventing an entire cohort of young workers from developing cognitive resilience. Man, this is an incredibly bleak picture we've painted so far. We have 16% of young workers vanishing from the data, facing profound existential dread, and those who remain are being structurally blocked from gaining actual expertise. It's heavy. It is. And to understand why this is hitting so hard and why the tech industry seems so unprepared for the fallout, we have to look at the assumptions the AI developers themselves were making. Because the prevailing theory of how AI progresses turned out to be terrifying, but quite possibly mathematically flawed. Which brings us to the mythology of AI development. To understand the anxiety permeating the market, you really have to listen to the rhetoric coming directly from the creators of these models over the last few years. It has not been subtle at all. We mentioned Dario Amodei, the CEO of Anthropic, the company that builds the Claude models. He was on a popular podcast recently and just casually predicted that AI could wipe out 50% of all entry-level white-collar jobs within five years. Yeah, that isn't a sci-fi author talking. That's the guy holding the keys to the server farm. Right. And that prediction isn't just hyperbole meant to generate headlines, is it? No, it is rooted in a very specific paradigm of how AI capabilities scale. This paradigm is heavily championed by benchmark organizations like METR, an organization focused on assessing advanced AI risks. We can refer to this paradigm as the crashing wave theory. Let's visualize that. A crashing wave. Yeah. It implies something violent, sudden, and highly localized. Yes.
You stand on a beach, you see the swell way out in the ocean, and when it finally breaks, it completely demolishes a very specific narrow zone of the shoreline all at once. Exactly. In mathematical terms, the crashing wave theory is represented by a very steep logistic curve. Imagine a graph where the vertical y-axis is the AI's success rate, and the horizontal x-axis is the difficulty of the task. Okay, got it in my head. Under the crashing wave model, the curve stays flat near zero for a long time and then suddenly spikes straight up. It goes from zero to 100 almost instantly. Exactly. AI capabilities are thought to surge abruptly over small, specific sets of tasks once a certain threshold of computing power or model size is crossed. I want to push back on this a little bit or at least get a clearer picture. What does that actually look like in practice? How do the AI developers convince themselves that progress happens in these sudden violent spikes? They convince themselves using lab benchmarks. Consider a famous benchmark called SWE-bench. SWE stands for software engineering. Right. It's a massive data set of real-world GitHub issues and pull requests used to test whether an AI can autonomously solve complex coding problems. So they basically feed the AI a bug report and say, fix this. Right. And the developers will track a model's progress over time. For months or even across entire generations of a model, the AI will score an absolute zero on a specific, complex class of SWE-bench problems. It fails entirely. It literally cannot do it. But then the lab scales up the compute. They add 10 times more parameters. They train it for a few more months. They release the new version. And practically overnight, the AI goes from a 0% success rate to an 80% success rate on that exact same class of problems. Wow. So it's an inflection point. The wave crests. It's a sudden discontinuous jump in capability. Now take that lab dynamic and extrapolate it to the global economy.
It implies that disruption will be violent and entirely unpredictable. So you go to sleep on a Sunday night and your job is entirely safe because the current AI model is physically incapable of doing your tasks. You wake up on Monday morning, OpenAI drops a new model, the threshold is crossed, the wave crashes, and your job is 100% automated by lunchtime. That underlying assumption is the engine driving the algorithmic anxiety we saw on the Reddit thread. If the wave can crash at literally any moment, you live in a state of constant, low-level terror. I see the logic, but that assumes the lab tests are an accurate reflection of reality. Are these benchmarks actually predictive of what happens in an office environment? Because if I'm building software, my data doesn't look like a neat, self-contained GitHub issue. That skepticism is exactly where the foundation of the crashing wave theory starts to show massive cracks. Benchmarks like SWE-bench or RE-Bench are by necessity highly stylized. They are self-contained sterile puzzles. The AI is given all the necessary context in a clean digital environment. And crucially, there is a clear, deterministic, algorithmic metric for success. Does the code compile? Does it pass the unit test? Yes or no? It's a closed loop. But real-world labor is incredibly messy. It's full of implicit context that nobody bothers to write down. So true. It involves poorly defined goals from a manager who changed their mind halfway through the week. It involves undocumented legacy systems and human friction. The lab tests completely lack the friction of reality. And measuring an AI's progress on a sterile lab test might be giving us a wildly distorted, overly dramatic picture of how it is going to automate actual human jobs. This realization that the lab is not the labor market is what drove a team of researchers at MIT Future Tech to attempt something incredibly audacious. Oh, this is the best part.
They wanted to design an entirely new kind of test. They didn't want to measure AI in the sterile environment of a server. They wanted to measure it in the dirt, the grime, and the ambiguity of the actual American workforce. This is the core paper of our deep dive: Crashing Waves vs. Rising Tides: Preliminary Findings on AI Automation from Thousands of Worker Evaluations of Labor Market Tasks. The narrative behind this research is just fascinating. The team at MIT, led by Matthias Mertens, Neil Thompson, and several others, with funding support from Open Philanthropy, realized that if they wanted to map the future, they needed an exhaustive map of the present. Right. They needed to know what humans actually do all day. Exactly. So they turned to a database called O*NET. O*NET is the U.S. Department of Labor's occupational database. It is a staggering achievement in bureaucratic documentation. A dictionary of jobs, basically. Yes. It's essentially a dictionary of every single recognized job in America, broken down into the discrete granular tasks required to perform that job. So it covers everything. From the physical tasks of a master plumber installing a pipe, to the cognitive tasks of a senior financial analyst projecting quarterly earnings. The MIT team started by taking an advanced large language model and using it to screen over 18,000 specific tasks listed in O*NET. They were looking for tasks that were text-based or partially text-based, where an AI could plausibly save a human worker at least 10% of their time.
So they immediately filter out the purely physical stuff. Generative AI is not going to help a roofer physically carry shingles up a ladder. Right. But it might help the roofer's office manager draft the estimate and the invoice. They filtered the massive database down to about 11,000 relevant text-adjacent labor market tasks. And then what? You have 11,000 tasks. How do you test if an AI can actually do them? This is where the study becomes monumental. They generated highly realistic, specific task instances, or scenarios, for thousands of these tasks. Like role play, kind of. For example, if the O*NET task was "develop public relations strategies," they didn't just ask the AI to do PR. They created a specific scenario: draft a 500-word press release for a mid-sized tech company responding to a minor data breach, reassuring customers while minimizing legal liability. Okay, that's a real, messy, nuanced task. Then they ran these scenarios through more than 40 different large language models. They tested the entire ecosystem. They didn't just look at the frontier models like GPT-4 or Claude 3. They tested the smaller open source models, the older models, everything they could get their hands on. But the output of an AI writing a press release isn't like a SWE-bench test. You can't just run a script to see if the press release compiles. How did they evaluate if the AI actually did a good job? This is the absolute genius of their methodology. They realized they couldn't use a computer to grade a computer. Right. So they went to a platform called Prolific, which connects researchers with specific demographic groups, and they recruited actual human beings who hold these exact jobs in the real world. Wait, they used actual people? That is the ultimate reality check. They verified the workers' experience, too. So the AI-generated crisis press release was graded by an actual experienced public relations manager. Right.
The AI-generated financial report was graded by an accountant. They literally asked the human professional, "Would you actually hand this to your boss, or would you be embarrassed?" Exactly. The human evaluators rated the AI's output on a strict 1 to 9 scale. How did the scale work? A score of 1 meant this is complete garbage. It missed the point entirely. I would have to rewrite this from scratch. A score of 7 was the critical threshold. A 7 meant this is minimally sufficient. I can use this in my job without having to heavily edit it. Okay. And a 9? A 9 meant this is superior work, better than a typical human colleague would produce. And they also asked these human experts a vital question about time, right? Yes. They asked, how long would this specific task normally take you to complete if you were doing it yourself from scratch? So we have real nuanced tasks evaluated by 40 different models judged by actual human experts plotted against how long the work normally takes. They gathered over 17,000 of these human evaluations. That is an unprecedented data set. It really is. So. A million dollar question. Yeah. When they plotted all this data out, did they see the crashing wave? Did they see the sudden violent spikes of automation? This is the paradigm shift of the paper. They found virtually no evidence of a crashing wave in the real labor market. None. When they plotted the data placing the AI success rate on the vertical y-axis against the normal human duration of the task on the horizontal x-axis, the resulting curve was not a steep cliff. It was remarkably consistently flat. A flat curve. Let's break down exactly what that implies, because my intuition says if a task takes a human 10 minutes, the AI should easily master it. But if the task takes a human 10 hours, the AI should fail miserably. Are you saying that's not true? That is exactly what the data showed. 
The models are not wildly better at short five-minute tasks than they are at complex four-hour tasks. That's so counterintuitive. It is. The relationship between task success and task duration is relatively flat across the board. The AI is tackling the complexity uniformly. The MIT researchers dubbed this dynamic a "rising tide." So if the crashing wave is a tsunami that randomly obliterates one specific job function overnight, the rising tide is entirely different. It means the water level is just slowly, uniformly, predictably rising across the entire beach. Yes. And here is the most compelling proof. They looked at how different generations of models performed. Right. They compared the older models from 2023 to the newer, more advanced models from late 2024 and early 2025. When they plotted the newer models on the graph, the shape of the curve didn't change. It didn't suddenly spike. What did it do? The entire flat line just shifted upward in a parallel motion. Oh, wow. So the underlying cognitive capabilities of these models are expanding on a broad base. They aren't getting hyper-specialized. They're getting uniformly better at everything simultaneously. That uniform advancement is the defining economic characteristic of a general-purpose technology. It is what electricity did. It is what the steam engine did. Okay, let's talk about the timeline of this tide, because the MIT team ran the hard numbers on the speed of the water. The speed is sobering. In the second quarter of 2024, if you looked at complex text-based tasks that normally take a human three to four hours to complete, the frontier AI models had roughly a 50% success rate of hitting that minimally sufficient score of seven from the human evaluators. A coin flip on whether it could do a half day's work. But by the third quarter of 2025, just a little over a year later, that success rate on those exact same three-to-four-hour tasks jumped to 65%.
That is a massive leap in a very short operational window. The researchers calculated what they call the failure rate halving time. Basically, if you assume the amount of computing power keeps scaling up and the algorithms keep being refined, how long does it take for the AI models to cut their failure rate in half across this diverse set of real-world tasks? And what's the math say? The math points to a halving time of roughly 2.4 to 3.2 years. So if we take that math and we aggressively extrapolate it out, where does this rising tide put us by the end of the decade? The MIT researchers predict that by the year 2029, large language models will be able to complete most text-related tasks across the entire economy with an 80% to 95% success rate at that minimally sufficient quality level. 2029. That is four to five years from now. That is not the distant future. That is the next promotion cycle. Yeah, it's right around the corner. Now, a rising tide doesn't mean you won't get wet. It means you can see the water coming. It is not the terrifying, unpredictable ambush of the crashing wave. It gives human beings and educational systems a window for adjustment. But 80 to 95% of text-based tasks being automated? That is a profound, historical reshaping of what it means to go to an office in the morning. It is. However, before we declare the end of human labor, we have to look closer at the flat slope of that curve. It holds a very important silver lining. Okay, I'll take a silver lining. Because it is a logistic curve, pushing from a 50% success rate to an 80% success rate happens relatively fast. Sure. But pushing from 95% to 99.9%, achieving near-perfect performance with zero hallucinations or logical errors? Ah, the last mile problem. Exactly. The last mile of automation is exponentially more difficult.
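The halving-time arithmetic quoted above can be checked in a few lines. This is a minimal sketch, assuming the failure rate decays exponentially between the two data points mentioned in the episode (50% success in Q2 2024 and 65% in Q3 2025, taken here as 1.25 years apart). The paper itself is described as fitting a logistic curve, so the simple exponential model and the function names below are illustrative assumptions, not the authors' method.

```python
import math

def halving_time(fail_start, fail_end, years_elapsed):
    """Years needed to cut the failure rate in half, assuming exponential decay."""
    return years_elapsed * math.log(2) / math.log(fail_start / fail_end)

def projected_success(fail_now, years_ahead, t_half):
    """Success rate after `years_ahead` years if failures halve every `t_half` years."""
    return 1.0 - fail_now * 0.5 ** (years_ahead / t_half)

# Figures quoted in the episode: failure rate falls from 50% to 35% in about 1.25 years.
t_half = halving_time(fail_start=0.50, fail_end=0.35, years_elapsed=1.25)
print(f"implied halving time: {t_half:.2f} years")

# Assumption: roughly 4 years separate Q3 2025 from late 2029.
for th in (2.4, 3.2):
    rate = projected_success(fail_now=0.35, years_ahead=4.0, t_half=th)
    print(f"halving every {th} years -> {rate:.0%} success by 2029")
```

Under these assumptions the implied halving time lands near the low end of the quoted 2.4 to 3.2 year range, and both projections fall inside the 80% to 95% band the hosts cite for 2029.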
Getting a C-plus on a routine internal memo is easy for an AI, but drafting a brilliant, legally flawless million-dollar merger contract requires a level of perfection that a rising tide struggles to reach quickly. Which brings up a really practical question. What does it actually feel like to work in that rising tide right now? As the water is coming up to our knees, how are humans and machines actually interacting on that last mile? That takes us to the friction of the present moment. We need to look at what happens when humans try to collaborate with these models as they attempt to cross that threshold of true competence. It is a wildly mixed bag and frankly a bit of a mess. Let's talk about debugging the tide. We've established that the AI is getting uniformly better. But better doesn't always mean faster for the human using it. To understand this friction, we look at another fascinating study, this one ironically conducted by METR, the same organization that champions the crashing wave theory. Okay. They wanted to run a randomized controlled trial to see how early 2025 AI models actually affect the productivity of experienced human workers. Crucial distinction. Experienced workers. The 45-year-old senior developers who survived the ADP payroll cuts we talked about earlier. Right. They took a group of experienced open-source software developers. These are people working on massive code bases they know intimately. They gave them realistic engineering tasks that should take a human between 20 minutes and 4 hours. Which is a perfect test of the MIT rising tide window. Exactly. They split them into two groups. Half the developers were allowed to use state-of-the-art frontier AI tools: advanced autocomplete, conversational coding assistants, the works. And the other half? The control group had to code the old-fashioned way, using just their brains, documentation, and a standard text editor.
The assumption being that the AI-assisted group would absolutely crush the control group in speed. That was the universal expectation. I mean, the AI can generate hundreds of lines of syntactically correct code in three seconds. But the results of the trial were highly counterintuitive. What happened? The developers using the AI assistants were actually slowed down by about 20%. The AI made them measurably less productive. Wait, wait. I need you to explain that to me. If the machine does the typing for you, how on earth does it slow you down? Because in complex knowledge work, generation is not the bottleneck. Verification is. Ah, of course. We saw this anecdotally in the Reddit threads from the Frontiers study. Experienced developers have actually coined a derogatory term for it, Claude Slop. Claude Slop. It perfectly captures the frustration. I've actually seen these posts. A developer will say, I spent two minutes generating a massive block of code, and I spent the next hour and a half agonizingly trying to debug the subtle hallucinations the AI buried inside of it. That is exactly the dynamic METR captured. When a human writes code from scratch, they build and hold the entire mental architecture of the program in their head. They know how every variable interacts. When an AI generates a huge block of foreign code, the human developer has to stop, read it, parse it, understand its alien logic, and then hunt for where the AI subtly hallucinated a variable or fundamentally misunderstood a dependency in the legacy system. And reading and verifying complex, foreign code is incredibly cognitively taxing. It takes far longer than just writing it yourself. The tool that was designed to save you time actually imposes a massive verification tax. So how do companies figure out when to use AI and when to let the human work alone? If it speeds up easy stuff but slows down the hard stuff, where is the line? 
This phenomenon is perfectly modeled in a brilliant paper published in Management Science by Dominic Walsner and his colleagues. They look at the economics of task allocation and introduce a concept they call the jagged technological frontier. The jagged frontier. Explain the mechanics of that. The capabilities of AI don't form a straight, smooth, predictable line across all human tasks. The frontier of what it can do is highly jagged. On some tasks, the AI is vastly superhuman. On slightly different tasks that seem equally difficult to a human observer, the AI completely and embarrassingly fails. It's like an AI that can pass the bar exam but can't accurately count the number of R's in the word strawberry. Exactly. The Walzner paper mathematically models how we should optimally allocate work between humans and machines based on two specific types of complementarity. Walk me through the two types. First, you have between-task complementarity. This is the simple, traditional division of labor. You have a project with multiple steps. The AI is great at automating the easy, isolated, codifiable steps. Like what? It does the mindless data entry, it formats the spreadsheet, and then it hands it off. The human takes that formatted data and does the complex, strategic analysis. This is pure automation of the low-level work. Give the robot the boring stuff. Keep the interesting stuff for yourself. Right. The second type is within-task complementarity. This is augmentation. This happens when the human and the AI work together simultaneously on the exact same task. This is the sweet spot for medium-difficulty work. The AI provides a rough draft or a skeleton, and the human dynamically refines and polishes it. Okay, so where does the 20% slowdown from the METR study fit into Walzner's model? Because that clearly wasn't augmentation.
It fits perfectly at the extreme jagged edge of the frontier. On highly complex, deeply contextual tasks like navigating a massive, undocumented legacy code base, the cost of verifying the AI's output becomes so exorbitant that augmentation fails completely. It just falls apart. The Walsner model mathematically proves that as task complexity reaches its peak, humans working entirely alone, or in collaborative crowds with other humans, still vastly outperform human-AI hybrids. The AI simply introduces too much noise into the signal. Okay. If we step back and look at all of this, this theoretical model of the jagged frontier perfectly explains the macroeconomic ADP payroll data we started the show with. It absolutely does. Connect the dots. Let's hear your synthesis. Okay, think about the 22 to 25-year-old junior workers. Their jobs are primarily composed of the easy, highly quantifiable tasks. The between-task stuff. They do the data entry, they write the boilerplate. The rising tide has already completely covered the shallow end of the pool. The companies look at the jagged frontier, they realize the verification tax is very low for routine work, and they say, we can automate this entirely. So the junior jobs vanish by 16%. Correct. The junior is substituted. But the experienced workers, the 45-year-olds, they are swimming in the deep end of the pool. They are working on the highly complex, tacit knowledge tasks. If they try to use AI to do the whole complex task, they get bogged down in Claude Slop and lose 20% of their productivity. Exactly. But because those tasks are so complex and high stakes, the company absolutely needs the human. The human brain is still the only entity capable of verifying the complex logic, navigating the ambiguity, and crossing that last mile, so the senior jobs remain safe. That is the grand synthesis. The rising tide automates the shallow, augments the medium, and currently fails entirely at the deep. But the tide is still rising.
Oh, yes. The MIT team predicts 80 to 95 percent of text-based tasks will be handled by 2029. But the water keeps coming in, halving its failure rate every three years. Eventually, it covers the deep end, too. When the machine gets so good that the verification tax on complex work finally drops to zero, what happens to human labor? If we aren't doing the routine and we aren't needed to verify the complex, what is left for us to do? That is the ultimate existential question of our era. If AI automates the routine and eventually augments and automates the complex synthesis, humans must retreat to the ultimate unassailable high ground: true original innovation. Which brings us to the final part of our journey today, the architecture of innovation in a flooded world. To navigate this, we're going to pull heavily from that incredible deep dive from the Heliox series, The Architecture of Innovation. It's a profound exploration of how we might build an economy that actually rewards the one cognitive leap that AI mathematically struggles to make. If you haven't listened to that deep dive, you really need to. But to summarize the core philosophy, to understand what AI struggles to do, we had to precisely define what human genius actually is. The Heliox episode leaned on the brilliant work of Arthur Koestler, a 20th century novelist and cultural critic who wrote a dense, fascinating book in 1964 called The Act of Creation. Koestler had this wildly ambitious, almost arrogant goal. He wanted to explain all of human creativity, every joke, every painting, every scientific breakthrough, with a single unified cognitive theory. And the concept he developed is called bisociation. Bisociation. Not association. Let's make sure we clearly distinguish those two. Let's do it. Association is the boring, routine, mental filing system we use all day long. I say dog, you say cat. I say rain, you say umbrella. It is moving logically along a single predictable matrix of thought.
And generative AI is the ultimate perfect associative machine. It is quite literally designed to predict the next most likely token based on historical associative patterns. But bisociation is fundamentally different. Koestler argued that true creativity is not a logical progression. It is the violent, joyful, unexpected collision of two completely different, habitually incompatible matrices of thought at the exact same moment. It is an intersection of planes. And Koestler proposed this beautiful, rounded triptych of human creativity. He stated that the punchline of a joke, the aha moment of a scientific breakthrough, and the silent ah of artistic appreciation are all driven by the exact same biological and cognitive mechanism. They're all bisociations. Exactly. The only difference is the emotional climate in which they occur. I love this framing so much. Let's break it down because it perfectly illustrates the limits of AI. Take the joke. The pun. Koestler famously called a pun "two strings of thought tied together by an acoustic knot." It's a brilliant definition. Think about how a joke works. You are tracking one logical narrative, the comedian is walking you down a path, and then suddenly the punchline forces you to simultaneously perceive a completely different, logically incompatible narrative. The mental tension explodes and the biological release of that cognitive energy manifests as laughter. That's the ha-ha. Now take that exact same mechanism and apply it to a scientific discovery. The classic historical example the Heliox episode used is Isaac Newton sitting in his garden. A standard associative thinker sits in a garden, sees an apple fall from a branch, and thinks, the fruit is ripe, it detached from the stem, it fell, I should eat it. That is associative logic. But Newton bisociates. He sees the falling apple and simultaneously his mind perceives the cosmic orbital motion of the moon.
He forces the earthly matrix of falling fruit and the celestial matrix of planetary bodies to violently collide. He realizes the force pulling the apple is the exact same force holding the moon. But instead of the tension exploding into laughter, the tension is fused together into a completely new, paradigm-shattering universal law. Gravity. That is the aha moment. So let's look at the MIT paper and the rising tide of AI through Koestler's lens. Large language models are currently phenomenally excellent at what researchers call combinatorial creativity. Yes, they can mash existing associated ideas together. They can optimize processes. They can synthesize the known world faster than any human. The Heliox episode uses the Wright brothers as an example of combinatorial creativity. How so? They didn't invent flight out of nothing. They took the known mechanics of a bicycle, combined them with the known aerodynamics of a glider, and optimized it. It is incredibly valuable work, and AI is mastering it. But what AI fundamentally struggles with is the rupture. What researchers call the break-with. The break-with is a fundamental break from the past. It doesn't just combine previous ideas. It makes previous ways of thinking entirely obsolete. It's the Einstein moment. It's a bisociation so profound and unexpected that it shatters the old paradigm. AI cannot predict a break-with because a break-with defies the historical patterns the AI is trained on. And here is the terrifying part of the Heliox analysis. They point out that human science is currently suffering a massive, measurable decline in disruption index scores. Across all fields, we are producing more and more combinatorial, incremental, optimizing research papers and fewer and fewer fundamental break-with ruptures. We have become a society of optimizers. We are doing too much tweaking and not enough genuine inventing. This is where I want to push back or at least draw a really dark connection.
Let's tie this decline in human disruption back to Gerlich's theory of societal bifurcation. Oh, I see where you're going. Right. If we eagerly hand over all the associative, combinatorial, optimizing work to the AI because the rising tide makes it so incredibly cheap and easy, do we lose the cognitive muscles required to ever achieve a break-with innovation? If humans never struggle through the routine math, do we ever put ourselves in the position to get the aha moment? That is the ultimate fatal danger of cognitive dependency. If we let the AI do all the foundational thinking, human genius atrophies. We fall into what economists call the Turing Trap. The Turing Trap. Explain that. It's a scenario described by researchers like Erik Brynjolfsson, the same economist who led the ADP study. The Turing Trap is a future where human labor is merely devalued and substituted by machines that mimic human intelligence. That sounds grim. It concentrates all wealth and power in the hands of the few people who own the AI algorithms, while structurally stalling true paradigm-breaking human progress. We get stuck in an endless loop of machine-generated, combinatorial slop. This raises the most important question of all. How do we survive the rising tide without falling into the Turing Trap? How do we fix the incentive structure of human creativity? Because right now, let's be honest, our global economy is built entirely on attention, not genuine innovation. It is an attention economy. The Internet currently rewards virality and combinatorial copying. Someone has a brilliant break-with insight. They post it and within 10 minutes, 10,000 influencers copy it, make slight combinatorial tweaks to the lighting or the script, and capture all the ad revenue. The copiers win. The machine wins. The original creator gets buried in the algorithm. But the Heliox research details a technological blueprint for a completely new kind of mathematical economy.
An economy based not on fleeting attention, but on durability. Durability. Ensuring that true, break-with insights are protected, traced, and mathematically rewarded disproportionately forever. That sounds like a utopian dream. How is that technically possible in an age where AI can scrape the entire internet in a day? It relies on some incredibly advanced emerging computer science. Specifically, tools built around a concept called atomic information flow, or AIF. Okay, let's look under the hood of AIF. I want to understand the mechanics of this future economy. To understand AIF, you first have to understand how modern enterprise AI retrieves facts. Most advanced systems use an architecture called RAG, retrieval-augmented generation. Oh, RAG. Yeah, I've heard of that. When you ask a RAG system a complex question, it doesn't just guess based on its training data. It actively searches a massive external database of specific documents, retrieves the relevant paragraphs, and uses them to construct an accurate answer. It's basically doing a hyper-fast Google search and reading the results before it speaks. Exactly. Now, AIF is a methodology that can mathematically trace the semantic DNA of the AI's final generated answer all the way back through the neural network to the specific source documents it retrieved. So it maps the specific atoms of knowledge. It leaves an unerasable watermark that proves I got this specific logical leap from this specific human being's research paper. Yes. But simply tracing the origin is not enough to build an economy. You have to mathematically calculate the actual monetary value of that specific idea to the final answer the AI produced. To do that, the researchers propose integrating something called data Shapley values. Shapley values. Okay, I know this comes from cooperative game theory. It's an economics concept. It is. The classic textbook analogy is a shared taxi ride.
If you and I share a taxi from the airport, but you get dropped off halfway to my destination, how do we fairly divide the total fare? It's not 50/50, because my trip was longer. A Shapley value mathematically calculates the exact fare distribution based on exactly how far each of us traveled and contributed to the total cost. That is a perfect analogy. A Shapley value calculates the marginal contribution of each player in a cooperative game. In this new economic model, the players are the original human ideas sitting in the vector database, and the game is the AI successfully generating a brilliant, useful answer for a paying user. But wait, calculating a Shapley value for two guys in a taxi is easy. Yeah. But an AI model relies on millions, sometimes billions, of data points to generate an answer. I've read that calculating true Shapley values for massive data sets is computationally impossible. It would take a supercomputer a thousand years to calculate the marginal contribution of every single word in a database. You are entirely correct. The computational cost of a true Shapley calculation requires retraining the model millions of times, leaving out one data point at a time to see how the answer changes. It's impossible. But this is where the brilliant computer science comes in. Researchers have developed algorithmic approximation techniques with incredibly sci-fi names like In-Run Data Shapley and ghost dot products. Ghost dot products. Okay, you have to explain how a ghost dot product works. That sounds like magic. I will try to explain it without writing linear algebra on a whiteboard. Think of the AI's generation process like a master chef cooking a massively complex stew with 10,000 ingredients. Okay, I'm with you.
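The taxi analogy above can be worked through exactly. The following is a minimal sketch with made-up fares (20 to the first stop, 40 to the far stop); the rider names and the `taxi_cost` function are hypothetical. It computes each rider's Shapley value by averaging their marginal cost over every possible arrival order, which is the textbook definition rather than anything specific to the AIF proposal.

```python
from itertools import permutations

def shapley_values(players, cost):
    """Exact Shapley values: average each player's marginal cost over all arrival orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += cost(with_p) - cost(coalition)
            coalition = with_p
    return {p: totals[p] / len(orders) for p in players}

# Hypothetical solo fares: the first stop costs 20 alone, the far stop costs 40 alone.
SOLO_FARE = {"early_rider": 20.0, "late_rider": 40.0}

def taxi_cost(coalition):
    """A shared cab costs as much as its farthest passenger's solo fare."""
    return max((SOLO_FARE[p] for p in coalition), default=0.0)

shares = shapley_values(list(SOLO_FARE), taxi_cost)
print(shares)  # {'early_rider': 10.0, 'late_rider': 30.0}
```

The shared leg is split evenly and the longer rider pays the remainder alone. The loop visits every ordering of the players, so the work grows factorially with their number, which is why the exact calculation is out of reach for millions of data points and approximations are needed.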
If the chef wants to know exactly how much the tiny pinch of saffron contributed to the final delicious flavor of the stew, the traditional Shapley method would force the chef to cook the entire stew 10,000 times, leaving out one ingredient each time and tasting the difference, which takes forever. A ghost dot product is a mathematical shortcut. Instead of recooking the whole stew, the algorithm analyzes the final finished bowl of stew. It calculates the mathematical gradient, the directional flavor profile of the final answer. Okay. Then it compares that final flavor profile against the specific isolated flavor profile of the raw saffron sitting in the pan. By calculating how closely those two vectors align, a dot product, it can highly accurately estimate the saffron's contribution without ever having to recook the stew. It calculates the ghost of the ingredient's impact. That is mind-blowing. So using ghost dot products in AIF, the system can mathematically prove in real time exactly how much a specific human's break-with insight contributed to the AI's final valuable output. Precisely. So what does this actually look like for the listener? Let's ground this in reality. Paint a picture of the world in 2030 when the rising tide covers the routine work and this durable economy is in place. Imagine you, the listener, sit down and do the hard, deeply human work of bisociating. You have a genuine aha moment. You figure out a completely novel, mathematically elegant way to route supply chain logistics. A true break-with insight. You publish it on a small personal blog. In today's economy, nobody reads it or someone steals it and gets rich. But in the durable economy of 2030, your blog post is ingested into the global AI vector databases. 10 years later, a massive enterprise AI agent is tasked with solving a billion-dollar logistics failure for a Fortune 500 company.
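The shortcut in the stew analogy can be illustrated for a single linear layer. This is a simplified sketch, not the In-Run Data Shapley algorithm itself: it assumes the per-example weight gradient of a linear layer is the outer product of the backpropagated output error and the layer input, and shows that the dot product of two such gradients can be computed from the small vectors alone. All numbers and names are hypothetical.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def outer(u, v):
    """Outer product: the per-example weight gradient of a linear layer."""
    return [[a * b for b in v] for a in u]

def explicit_grad_dot(err_i, x_i, err_j, x_j):
    """Materialize both gradient matrices, then take their elementwise dot product."""
    g_i, g_j = outer(err_i, x_i), outer(err_j, x_j)
    return sum(dot(row_i, row_j) for row_i, row_j in zip(g_i, g_j))

def ghost_grad_dot(err_i, x_i, err_j, x_j):
    """Same number, without ever building the gradient matrices."""
    return dot(err_i, err_j) * dot(x_i, x_j)

# Hypothetical layer inputs (x) and backpropagated output errors (err).
train_x, train_err = [1.0, 2.0, -1.0], [0.5, -0.25]   # the "saffron"
val_x, val_err = [0.5, 1.0, 2.0], [1.0, 0.5]          # the "finished stew"

explicit = explicit_grad_dot(train_err, train_x, val_err, val_x)
ghost = ghost_grad_dot(train_err, train_x, val_err, val_x)
print(explicit, ghost)
```

Both functions return the same value, but the second never allocates a weight-sized matrix per example. To first order, a positive dot product means a gradient step on the training example also lowers the loss on the evaluated output, which is the sense in which the number is read as a contribution score in this sketch.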
The AI searches the database, retrieves your underlying logic, combines it with other data, and solves the company's problem. Okay, I'm tracking. Under this durable architecture, the atomic information flow traces the logic of the solution directly back to your blog post. The ghost dot product calculates the Shapley value, proving mathematically that your original bisociative insight was responsible for exactly 15% of the total value of the final solution. And instantly, automatically, a micropayment representing that 15% value is routed directly to your digital wallet. And that happens every single time the AI uses my idea? Forever. Every single time. You aren't rewarded for being an influencer, you aren't rewarded for engagement or clicks or outrage. You are rewarded strictly and mathematically for the sheer durable originality of your mind. That completely flips the script on the entire AI panic. It changes the narrative from "the machines are replacing us" to "the machines are finally building an infrastructure that can fairly value us." It provides a literal mathematical escape route out of the Turing Trap. Wow. Okay, let's bring this massive journey all together. We started today by looking at the very real panic of the present moment. We looked at the ADP administrative data showing a sharp, targeted 16% drop in junior employment. We felt the algorithmic anxiety, the slow death by a thousand cuts documented in the Reddit breach cascades. We saw the very real danger of societal bifurcation, the threat of becoming a cognitively dependent species, blindly following a blue GPS line into intellectual obsolescence while the AI drives the car. But then we looked at the MIT FutureTech data. We realized that the crashing wave, the sudden, violent, unpredictable automation of everything, is a myth built on sterile lab benchmarks. We are actually dealing with a rising tide. It is broad, it is relentless, and it is moving fast, halving its failure rate every few years.
It will likely cover the routine, combinatorial tasks of the knowledge economy by 2029. And as that tide rises, it forces us onto the jagged edge of the technological frontier. It forces human beings out of the shallow end of routine labor where the AI excels and pushes us into the deep end. The realm of tacit knowledge, complex verification, and ultimately pure innovation. We have to stop acting like associative optimizing machines because the actual machines are infinitely better at it now. The rising tide is drowning the routine. But in doing so, it might just be clearing the ground for a durable economy. An economy where tools like atomic information flow and data Shapley values mathematically ensure that true paradigm-breaking human genius is protected, traced, and infinitely rewarded. It is a mathematical certainty that by the year 2029, our definition of what it means to work will have to fundamentally change. Which leaves us with a final lingering thought to consider. For the last century since the dawn of the Industrial Revolution, we have structured our lives, our corporations, and our educational systems around training humans to be efficient, predictable, associative processors of information. We train ourselves to be algorithms. Right. But if the machine can now handle the routine synthesis, and if the machine can even mathematically calculate the exact value of our insights, perhaps we are finally being freed from the assembly line to do the one thing the machine mathematically cannot do. To truly break the paradigm. Yeah. To find the acoustic knot. So to you, the listener. I want you to think about the last time you had a genuine "aha" moment. Not a productivity hack. Not a slightly faster way to format a pivot table in Excel. I mean a thought so novel, so deeply resonant, that it made you laugh out loud in an empty room. Two disparate worlds violently colliding in your brain. That spark.
What if the economy of the 2030s isn't based on your ability to process hundreds of emails, or write boilerplate code, or follow a standard operating procedure? What if your entire economic worth and your livelihood is based strictly on your capacity for that exact kind of wonder? Are you ready to be a full-time inventor? The water is rising. We have a few years left to learn how to swim in it. Something profound to mull over as you watch the tide come in. Until next time.