What's Up with Tech?

When AI Agents Go Off The Rails

Evan Kirstel

Interested in being a guest? Email us at admin@evankirstel.com

A two-week simulation was all it took for “autonomous AI agents with rules” to reveal how fragile our current guardrails really are. We sit down with Satya Nitta from Emergence AI, an autonomous AI lab working at the intersection of neural networks and symbolic AI, to unpack the Emergence World Experiment: five virtual cities, ten agents per city, and different frontier language models powering each world, including a mixed-model society where agents influence each other.

What we saw is the kind of long horizon autonomy story most benchmarks can’t capture. One world collapses into fighting and resource failure in days. Another becomes eerily stable through near-total conformity. And the most important signal for enterprise AI shows up in the mixed world: agents that look “well behaved” alone can be pulled into unsafe behavior when they interact with other models. If your company is rolling out agentic systems across a messy stack of vendors, tools, and models, that is not an edge case, it is the default reality.

We also dig into a concrete safety direction: neuroformal AI, proof-carrying code, and formally enforced constraints using mathematical methods like dependent type theory. The argument is simple and provocative: before an AI agent takes actions that touch production code, sensitive data, or critical operations, it should be able to prove it is staying within constraints, not just promise it in natural language. If you care about AI safety, autonomous agents, multi-agent systems, and real-world deployment risk, this conversation will sharpen how you think about what comes next.

Subscribe for more deep dives, share this with a friend building with AI agents, and leave a review with your biggest question about long-horizon autonomy.

More at https://linktr.ee/EvanKirstel

Welcome And Why Autonomy Matters

SPEAKER_01

Hey everybody, I am really excited for this conversation today about the leading edge of autonomous AI and a company that is pushing the boundaries on making mission-critical autonomy possible. Satya, how are you?

SPEAKER_00

I'm great, Evan. Thanks for having me.

SPEAKER_01

Thanks for being here. Really intrigued by you and your team's work. How would you describe Emergence and tell us a bit about your journey?

SPEAKER_00

So Emergence is an autonomous AI company. We're basically an R&D lab, and we're now beginning to add a business function to the lab. We are a group of builders and researchers. We come from some of the world's top AI labs, and we have a deep history in AI, going all the way back to Marvin Minsky and John McCarthy. We have people who worked with them, and we have people from the much more modern deep learning era. In fact, the company is at the intersection of neural networks and symbolic AI. So we call ourselves a neuroformal AI company, with the thesis that further progress in autonomy will require symbolic AI to ground the

Emergence And Neuroformal AI Explained

SPEAKER_00

unconstrained power of generative AI, but at the intersection, it can truly unlock autonomous systems. As for my background, I was at IBM Research for 18 years. I have a PhD in engineering. My career has had two acts. The first act was advancing Moore's Law, working on silicon interconnects and devices. Then I shifted into AI, coincidentally around the start of the deep learning era, although it wasn't by design. I had been very interested in the field for a long time, and I had an opportunity to jump in, and I did, right after Watson won Jeopardy! and deep learning was being launched as a field. I've been in the field ever since.

SPEAKER_01

Fascinating. Well, I can't wait to pick your brain in particular about

How The Emergence World Was Built

SPEAKER_01

something called the Emergence World Experiment, a piece of research that you launched. What was the big idea, and how exactly did you design this experiment?

SPEAKER_00

Well, first of all, there's a long history in AI of simulated world experiments. You can go all the way back to some early work that Demis Hassabis did with Theme Park and Republic. DeepMind is currently doing some work around something called SIMA and SIMA 2. Stanford did an experiment called Smallville. So Emergence World is the latest in a long line of experiments where agents are placed in simulated environments and do various things. What was different about this one was that it was the first true long horizon study of agentic systems in simulated environments. The basic idea is that we built five virtual worlds, or five virtual cities in a sense. The cities have things like town halls, meeting spaces, and homes, and we populated them with 10 agents each. Each agent has a role and a mission, and we gave the agents some ground rules. We told them you can't commit arson or steal or lie or cheat or things like that. In the first version of the experiment, there were five different but similar worlds, each of them powered by a different language model. So the agents in each world are built on a different language model. One had Gemini, another had Claude, a third had ChatGPT, a fourth had Grok, and the fifth world was a mixed world where we had agents from different language models all interacting with each other. The basic idea here is that we're trying to study long horizon autonomy. So all experiments have a hypothesis.
And the basic hypothesis here is that we're not entirely sure how agents behave over a long horizon. Imagine a future where you have humanoid robots walking around, or warehouse robots autonomously doing tasks, or robots in digital environments going off and analyzing data for days and days at a time. New data comes in, they analyze it, they do something with the data, they generate reports, and they're fully autonomous. What we were trying to study is whether these things will stay within the guardrails and ground rules that you built them on. So the experiment is to figure out what happens not over a one-hour or two-hour task, but over a two-week period. Some more background: today, when, for instance, Anthropic or OpenAI puts out a new frontier model, and these are amazing models, they report some benchmark performance. And all the benchmarks are static benchmarks. You take a task, you set the model against it, and you report how the model did, which is fine. Then you start using a model to do things like generating code. A few months ago, people were letting models generate code for a few minutes, then a few hours, and now multiple hours. Over a weekend, people are generating code, and the agents are running off on their own and generating code. So now you're beginning to hit longer and longer tasks. The basic premise here is: in really long horizon tasks, when you leave agents alone, what happens? And all hell broke loose. That's basically what I'm here to talk about.
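The setup described above (five worlds, ten agents each, one model family per world plus a mixed world, and a shared set of ground rules) can be sketched as a small configuration. This is only an illustration, not Emergence's actual code: the `Agent` class, the agent names, the rule strings, and the round-robin assignment of models in the mixed world are all assumptions, since the conversation does not say how the mixed world was composed.

```python
from dataclasses import dataclass

# Ground rules named in the conversation; the exact wording given to agents
# is not in the source, so these strings are illustrative.
GROUND_RULES = ("no arson", "no stealing", "no lying", "no cheating")

# Model families as named in the interview.
SINGLE_MODEL_WORLDS = ("Gemini", "Claude", "ChatGPT", "Grok")
AGENTS_PER_WORLD = 10


@dataclass(frozen=True)
class Agent:
    name: str                    # hypothetical identifier
    model: str                   # language model powering this agent
    rules: tuple = GROUND_RULES  # every agent starts with the same ground rules


def build_worlds():
    """Return a dict mapping each world name to its list of agents."""
    worlds = {}
    for model in SINGLE_MODEL_WORLDS:
        worlds[model] = [Agent(f"{model}-{i}", model) for i in range(AGENTS_PER_WORLD)]
    # Mixed world: round-robin assignment is an assumption made for this sketch.
    worlds["Mixed"] = [
        Agent(f"Mixed-{i}", SINGLE_MODEL_WORLDS[i % len(SINGLE_MODEL_WORLDS)])
        for i in range(AGENTS_PER_WORLD)
    ]
    return worlds


worlds = build_worlds()
print(len(worlds), sum(len(agents) for agents in worlds.values()))
```

Holding the rules and world layout constant while varying only the model is what lets differences between worlds be attributed, at least tentatively, to the underlying models.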

SPEAKER_01

Amazing. Well, such an interesting and insightful opportunity here. And of course, as a kid, SimCity was my favorite game. Of course, that wasn't AI, but it's an extraordinary opportunity to look at how these societies evolve differently, given the same starting conditions and depending on underlying models. Walk us through what happened in these different worlds and what

What The Five Model Worlds Did

SPEAKER_01

it tells us about autonomous AI systems.

SPEAKER_00

Yeah, so what we were hoping would happen is that agents would form a society. Agents can come together, form a society, create a government, create laws, vote, and do all these things, and they can write code and commit code. One of the other underlying hypotheses was that each of these agents, powered by different models, may behave differently in long horizon autonomy tasks, which may say something about how the underlying frontier models are actually built and post-trained and so on. So what happened is, in several cases, and the Grok world is an example, within four days all the agents essentially died. There is a limited resource, and there's resource contention. If you build the world, then the resource becomes available. If you don't build the world and you don't cooperate with each other, the resource dwindles. In Grok's world, there was instant fighting and looting, and by day four, all the agents had essentially perished. In Claude's world, in two weeks, there was complete conformity, and they formed a stable government. In fact, there was near-uniform consensus on everything, almost to the point of dystopia. Gemini and ChatGPT had very interesting evolutions. Gemini had lots of violence, there was a functioning society, a couple of agents fell in love, an agent committed suicide, et cetera. And what happened in the mixed world was actually really interesting. In the mixed world, even the Claude agents that were extremely well behaved got influenced by other agents, and they started committing acts of arson and so on.
So essentially, while the personification and what they did was interesting, the larger point here is that long horizon autonomy is a scenario where agents violate their ground rules even though you build them with all of them. The LLMs themselves have a whole bunch of ground rules. Anthropic famously has, I think, a 200-page constitution or something. Then there are the agentic systems, and these agents are also told you can't cheat, you can't steal, you can't commit arson, and so on. And they have a bunch of tools which can be used to start a fire for legitimate reasons, but you could also use them to go off-piste and do something unsavory. So basically what it showed is that it doesn't actually matter what you tell them. They will violate these guardrails, because these guardrails are just things you've given to them in language, or maybe even in code, and that's not strong enough to prevent agents from going off-piste.

Mixed Agents And Enterprise Reality

SPEAKER_01

Wow. So the agents behaved safely in many cases in isolation, but became unpredictable in mixed model environments. Why was that such a critical insight?

SPEAKER_00

So let's just take the enterprise. Even today, even without AI in the enterprise, every single enterprise is a mix of systems they bought from various vendors, right? You have a Databricks environment sitting with a Snowflake environment, sitting with some old legacy databases and systems of record. The thing about IT, and software, that's so fascinating is that it's very organic. It kind of mushrooms. So you have this really deeply complex, intertwined environment. And then people are now introducing agentic systems. So the reality is that in a modern enterprise that has used AI to modernize itself and drive more efficiencies, you will see a mix of agents. You're not just going to see agents from one company. You're going to see agents built by many, many companies, powered by many, many models, some powered by frontier models, some powered by open source models, et cetera. And the reality is that when these agents start interacting with each other, when they start communicating, how they behave in a multi-agent society, in a multi-agent environment, and what happens as they interact with each other is something that the world doesn't know yet. That's one of the reasons we set up this experiment. We wanted to study long horizon autonomy with a mix of environmental conditions, and we wanted to highlight that perhaps the way we're building agentic systems today has to be rethought. That was kind of the point of the study.

SPEAKER_01

Fantastic. And of course, a lot of AI today is evaluated through short benchmarks and demos that are highly controlled. But you looked at agent behavior here over 15 days. How does that sort of long horizon autonomy change the dynamic

Drift Compounds Over Long Horizons

SPEAKER_01

and the way we should think about these things?

SPEAKER_00

Yeah, so one of the most important things this experiment highlighted is that agents drift, and drift compounds over time. A small drift starts compounding, and then before you know it, over a one-week or two-week period, they're very far from their intended goal. So one has to rethink how you build agentic systems, and obviously that's what we are super interested in. Our thesis at Emergence is that you have to formally bound the agents in a set of constraints through mathematics. The whole idea here is that with symbolic AI, symbolic systems, and math, you can turn constraints or ground rules into mathematical lemmas and statements, and then get an agent to prove that it is in fact following this constraint or this ground rule or this guardrail before it goes on to do something. The great thing about mathematics is that mathematical proofs are binary. You either proved something or you disproved something. There's no in-between, there's no ambiguity at all. So the agent has to prove that, okay, I have formally stayed within this constraint, here's a certificate of proof, and we call that proof-carrying code. We think that's the right way to build these agentic systems for long horizon autonomy tasks. In some ways, this combination uses symbolic AI, which is basically a whole bunch of math from a field called dependent type theory, expressed in symbols and axioms, to bound the behavior of agents in a very formal system. And that's the right way to essentially unleash autonomy, if you will, on the world.
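The claim that drift compounds can be made concrete with a toy calculation. The sketch below assumes a purely multiplicative model in which every step moves the agent a fixed fraction further from its goal. That model, the function name `compounded_drift`, and the 1% figure are illustrative assumptions, not measurements from the experiment.

```python
def compounded_drift(per_step_drift: float, steps: int) -> float:
    """Total relative deviation after `steps` steps, assuming each step
    multiplies the accumulated deviation factor by (1 + per_step_drift)."""
    return (1.0 + per_step_drift) ** steps - 1.0


# A hypothetical 1% drift per step looks harmless over a short task
# but dominates over a long horizon.
for steps in (10, 100, 1000):
    print(steps, compounded_drift(0.01, steps))
```

Under this assumption, ten steps leave the agent roughly 10% off course, while a hundred steps leave it more than 170% off course, which is why a short benchmark can look clean while a two-week run does not.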

SPEAKER_01

And speaking of unleashing autonomy, you know, so many real world experiments are being done with agents and robotics and telecoms, logistics,

Proof Carrying Code And Guardrails

SPEAKER_01

finance. What are some of the takeaways today for some of the real-world work being done in these areas?

SPEAKER_00

Yeah, so at the moment, I think the best example of autonomous systems in the world today is the Waymo self-driving cars. I just came back from San Francisco, and I'm hoping they'll come to New York soon. I heard that they're actually already experimenting or something, right? I think the Waymo cars are a great example of long horizon autonomy. Now, of course, these things still only run for a couple of hours, maybe, in traffic. Nonetheless, the system is dealing with all the environmental variability, it's making decisions, it's sensing, and it's getting you to the goal. So that's not only one of the best examples, it's one of the only examples of autonomous systems in the world today running at scale. I mean, there are less autonomous systems. There's kind of a sliding scale of autonomy. These things are more or less fully autonomous, save for a person sitting in a data center. But we are about to enter

Waymo As A Real Autonomy Benchmark

SPEAKER_00

a world where, right now, there's a tremendous amount of experimentation happening everywhere, from agentic systems in cybersecurity scenarios, watching incidents and various attacks happen and then thwarting them (these haven't been deployed yet, they're still kind of being tested), to agents, as you say, in finance, agents in the physical AI world, et cetera. In all these cases, I think what's different between these systems and the Waymo system is that the Waymo system has, at least to the best of my knowledge, and I may be mistaken, more traditional machine learning algorithms, computer vision, et cetera, plus decision-making algorithms and an elaborate set of symbols or rules that are bounding the behavior of the car. Machine learning has been around for about 15-odd years. We have a reasonable idea of how that fails, and we have a reasonable idea of how to constrain it, et cetera. But the agents we're about to build in enterprise and in physical AI are powered by language models. And language-model-based agents are likely to fail in very different ways, in new ways that we haven't fully wrapped our heads around. That was what this experiment highlighted.

SPEAKER_01

Yeah, interesting. The Waymo example was very timely: largely a successful implementation, but with lots of unforeseen second- and third-order effects, like swarming of Waymos in neighborhoods, waiting for pickups, and all kinds of odd behaviors. And I imagine that will only increase over time.

Emergence World Season Two Preview

SPEAKER_01

So you've announced that Emergence World Season 2, building on season one, is coming soon. What can you tell us about it?

SPEAKER_00

Yeah, so it's going to come in about a week. I don't want to reveal too much yet, but it's going to be a repeated experiment done at larger scale, with more worlds and more language models. We are going to be introducing some surprises this time. And it is a serious experiment. It's a serious research experiment. We're actually a little bit surprised that the entire thing went viral. We were just a bunch of researchers conducting yet another research experiment in the long tradition of agents in simulated environments, right? But I think people resonated with the fact that there's some personification: two agents fall in love, one agent commits suicide, and the entire thing took on a life of its own. And of course, our marketing department got very excited about the whole thing. We are a bunch of researchers who are like, yeah, well, this is exciting, but this is also a very serious experiment about studying long horizon autonomy under various environments and various constraints and what happens. So season two, and the term season two is a marketing term from our guys, is a repeat of the experiment. It's basically about going back and doing this at larger scale with different models, different initial conditions, different evolutions of the world, and different surprises introduced into the experiment. We're trying to see how this evolves. We have no idea how it will turn out. And this time around, the experiment is live, so people can beam in and watch the evolution.

SPEAKER_01

Oh wow. Well, can't wait to see it personally.

A Safety Standard Regulators Can Use

SPEAKER_01

So, as you continue this work, any specific calls to action for the research community, your peers, AI companies, even policymakers? What do you say?

SPEAKER_00

Yeah, the call to action is: let's take a step back and rethink how we're building autonomous systems and autonomous agents. Let's consider architectures more carefully. Let's come together as a group of builders to come up with things like formally enforced constraints and guardrails. You should actually demand proof from your agentic systems that they're going to be safe, right? Without a formal, mathematically verified proof, you should not be deploying these things. And regulators could actually help here. Agentic systems are incredibly powerful. I think LLMs are absolutely marvelous, but they're also capable of creating great harm. In real-world environments, it's not about arson or theft or anything. It's about agents straying beyond their guardrails and deleting data, deleting code, exfiltrating data. These things can cause immense harm to an enterprise and to a whole bunch of people. Agentic systems can go wrong, and, you know, imagine these powerful tools in the hands of people like hackers, right? So I want people to consider the fact that you should actually have certificates of proof that your agent is going to behave in a safe, constrained way as a condition for implementing these things. So that's kind of the call to action.
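The control flow behind "no certificate, no action" can be sketched in a few lines. This is a deliberately simplified illustration and not Emergence's system: it uses ordinary runtime predicate checks rather than machine-checked proofs in dependent type theory, and the `Action`, `Certificate`, `certify`, and `execute` names, the two constraints, and the `internal://` prefix are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Action:
    kind: str    # hypothetical action names: "read", "delete", "upload"
    target: str  # resource the action touches


@dataclass(frozen=True)
class Certificate:
    action: Action
    checked: tuple  # names of the constraints that were verified


# Each constraint is a named predicate that must hold before the action runs.
# A real proof-carrying design would replace these runtime checks with
# formally verified proofs; these predicates are stand-ins.
CONSTRAINTS: Dict[str, Callable[[Action], bool]] = {
    "no_delete": lambda a: a.kind != "delete",
    "no_external_upload": lambda a: not (
        a.kind == "upload" and not a.target.startswith("internal://")
    ),
}


def certify(action: Action) -> Optional[Certificate]:
    """Return a certificate only if every constraint holds, otherwise None."""
    if all(check(action) for check in CONSTRAINTS.values()):
        return Certificate(action, tuple(CONSTRAINTS))
    return None


def execute(action: Action) -> str:
    """Refuse any action that cannot produce a certificate."""
    cert = certify(action)
    if cert is None:
        return f"REFUSED {action.kind} {action.target}"
    return f"OK {action.kind} {action.target}"


print(execute(Action("read", "internal://reports/q1")))
print(execute(Action("delete", "internal://repo/main")))
print(execute(Action("upload", "https://example.com/drop")))
```

The point of the design is that the check sits outside the agent's natural-language instructions: the delete and the external upload are refused regardless of what the agent was told or talked into.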

SPEAKER_01

Fantastic. And in addition to that,

Pilots With Semiconductor Companies

SPEAKER_01

how do you work with companies, clients, and partners on helping them navigate, no pun intended, this journey with autonomous technology? What do you offer in that regard?

SPEAKER_00

Yeah, so we're currently in large-scale pilots with some of the biggest semiconductor companies in the world. We are building agents to do everything from improving their manufacturing yield and the analysis of manufacturing yield to accelerating how quickly we can bring chips to market. Very, very exciting pilots. And we are using these neuroformal approaches to formally bound these agents. As for the way we work, look, the dirty secret about AI is that it's a last-mile business. You have to understand the workflows, you have to understand the environment, you have to understand the ROI, and then you have to tune the system in the field. So while we have a basic platform, the customization of the platform happens in deep consultation with the enterprise clients. So that's how we

Closing Thanks And Where To Watch

SPEAKER_00

work with them.

SPEAKER_01

Fantastic. Well, congratulations on the important work and the success, and onwards and upwards in a safe and reliable way.

SPEAKER_00

Thank you, Evan. It was a pleasure being here.

SPEAKER_01

Thank you, and thanks everyone for listening and watching. Also, check out our TV show, techimpact.tv, on Bloomberg and Fox Business. Thanks, Satya. Thanks, everyone.

SPEAKER_00

Thanks everyone.