BRSL Weekly Brief

The AI-DOD Standoff and the Future of Autonomous Weapons at the Pentagon

Berkeley Risk and Security Lab Season 1 Episode 2


On this week's episode, BRSL Faculty Director Professor Andrew Reddie joins BRSL Senior Research Scholar Sarah Shoker for a discussion covering the recent standoff between AI companies Anthropic and OpenAI and the U.S. Government, as well as how autonomous weapons are being used in military contexts. 

Vivian Bossieux-Skinner:

Welcome back to the Berkeley Risk and Security Lab's new podcast, the BRSL Weekly Brief, where we bring you the latest information on current events from our lab experts. Today, we're going to be talking about the recent standoff between OpenAI, Anthropic, and the DoD, and about autonomous weapons in military contexts in general. I'm the lab's communications manager, Vivian Bossieux-Skinner, and I'm here with BRSL Faculty Director Professor Andrew Reddie and BRSL Senior Research Scholar Sarah Shoker. Welcome to the podcast. Could you both first start off by introducing yourselves, your work at the lab, and your background in this area?

Sarah Shoker:

My name is Sarah Shoker. I am a Senior Research Scholar in artificial intelligence at the Berkeley Risk and Security Lab. Previously, I led the geopolitics team at OpenAI, where I looked at the intersection of AI and international stability, and I continue that work here at the lab.

Professor Andrew Reddie:

I am Andrew Reddie. I'm not as smart as Sarah, but I am a professor of public policy, and I work primarily on AI-military integration issues, hence my interest in this conversation.

Sarah Shoker:

I don't believe that for a moment. By the way, Andrew, I feel like I'm constantly learning from you.

Professor Andrew Reddie:

Likewise, Sarah.

Vivian Bossieux-Skinner:

Sarah, can you start by giving us a quick overview of where we are right now? How did we get here, and what's the latest today?

Sarah Shoker:

So I think you're referring to what I'll call the drama, or very public conflict, between the Pentagon, Anthropic, and OpenAI around the role of large language models in military use. I think it broke out and became really salient in public awareness in mid-to-late February, but in actuality this conflict has its roots earlier than that. I'd point the audience to the memorandum that was released by the Secretary of War on January 9, 2026; the subject of that memorandum was the artificial intelligence strategy for the Department of War. In that memo, towards the end, there is a requirement, or a plan, to get AI model providers, the companies, to agree to language around "all lawful use," which, of course, became a very dominant term that permeated throughout the media, and also to ensure that their usage policies would not unnecessarily interfere with Department of War operations. I think the public probably knows the rest of that story. Anthropic decided to set two red lines that are technically on the books as U.S. policy: no mass surveillance of American citizens and no fully autonomous lethal weapons systems. OpenAI followed suit with the same two red lines. I believe there was a disintegration of trust; there seems to be a clash of very strong personalities at work here. Anthropic has now been designated a supply chain security risk, which they are fighting in court, and OpenAI has, in fact, agreed to the "all lawful uses" language, as has Google, as has xAI, which makes Grok. So that's where we are today.

Vivian Bossieux-Skinner:

Okay, yeah. And then Andrew, do you have anything to add to that from your angle?

Professor Andrew Reddie:

Yeah, no. I mean, I think Sarah gave a really nice introduction. I am curious, Sarah. You probably saw that there's a countersuit from Anthropic on the back of being deemed a supply chain security risk, primarily on procedural grounds if you go and look at the actual lawsuit, and that Microsoft actually filed an amicus brief with Anthropic as they fight against that designation. So I'm just curious whether you have any thoughts about the countersuit, the likelihood of its success, and whether trying to address this on the basis of legalese is the appropriate response.

Sarah Shoker:

I mean, I think they have a very narrow range of options. My understanding is that everyone was kind of expecting Anthropic to rapidly respond with a lawsuit. I'll also note that, in addition to Microsoft, a number of researchers from other labs, including OpenAI, have filed an amicus brief too. So there's been a supportive coalition for Anthropic. I think the general perception, and I don't think I've even seen anyone really argue to the contrary, is that this is an example of government overreach. Even people like myself, who are critical of the red lines articulated by OpenAI and Anthropic as being insufficiently rigorous and, frankly, not a high bar to clear at all, have argued that this is a case of government overreach, and that in that regard we should be supportive of Anthropic's lawsuit. Because I think this does, in fact, have some potentially drastic consequences for the relationship between companies and government down the line. It is unprecedented, right? There has been no other U.S. company that has been labeled a supply chain security risk. Anthropic is the first.

Professor Andrew Reddie:

Yeah, right, it's the first. Usually it's companies like ZTE and Huawei, emanating from foreign countries, that receive that designation. I mean, one of the things that wrinkles me, or leads to a lot of questions on my side, is that ultimately we have ways in which we manage contracts between private-sector vendors and the government that are very well worn. And ultimately, if either party, on the private-sector side or the public-sector side, were unhappy with the contract, there are effectively three options. One is to renegotiate that contract, and both sides come to some sort of agreement vis-à-vis usage policy, because that's really what we're talking about here: how Claude Gov is being used inside of the DoD. Failing that, they can either let the contract run to its term and renegotiate, you know, tabula rasa, or there are, generally speaking, break clauses. I don't think anybody's got their hands on what the original contract actually looked like beyond the sense of the terms, so $200 million passing from DoD to Anthropic. But ultimately, if one side is not happy with that, there would be a break clause, there'd be a financial penalty to pay, likely in this case in the tens of millions of dollars, and then you walk away from the contract. But instead, you had this kind of strange attempt to arbitrage against Anthropic by going to all the other AI companies and saying, will you agree to all lawful purposes, and trying to push their hand a little bit. Which is difficult to do when, to the best of my knowledge, and maybe, Sarah, you can shed some light here if I've got this incorrect, Claude Gov is, I believe, the only tool that's actually been approved for classified use inside of DoD. So it's not as if, in the near term, you actually could have xAI or OpenAI tools being used on classified compute, at least not until they receive some sort of testing and evaluation for that use.

Sarah Shoker:

Yeah, I'm actually going to check right now if OpenAI has managed to... yeah, no, they did, in fact, reach a deal with the U.S. Department of War to deploy its models on classified cloud networks. The State Department, you know, I was reading earlier today, has essentially been instructed to use GPT-4.1, which is a much less capable model in every respect. It's also less safe, right? One of the things the machine learning community has been trying to do, tackling this issue as a research problem, is to reduce hallucinations, or what is commonly referred to as hallucinations, which is when the large language model outputs something that is just factually incorrect. I know that some philosophers and linguists take issue with the word hallucination because we risk attributing human-like qualities to the model, and that's certainly not what I am trying to do here. I only mention that this is the common language inside industry to refer to this particular issue. But basically, the safeguards are just, across the board, not as rigorous for 4.1. And yeah, I am very curious to know what the average State Department employee is going to make of this after they've been experiencing, you know, the latest Claude models. It's a bit of an absurd turn of events, being forced to use a much less capable model that is, like, several years old at this point.

Vivian Bossieux-Skinner:

When we're trying to put this in the context of AI tools used within the military today, Andrew, can you contextualize those things and where you see AI models being used in military contexts in the future?

Professor Andrew Reddie:

Yeah. I mean, one of the interesting misperceptions driven by this particular conflagration between the Pentagon and Anthropic is the notion that we actually have these generative AI systems being used for lethal autonomous weapons, or that they're even capable of that. Ultimately, the way that I kind of diagram it is that you've got this continuum of AI applications. And of course, we probably ought not to treat AI as one big bucket, but ought to separate it into subsidiary technologies, whether that be algorithmic systems, or systems that are leveraging computer vision for particular applications, or what have you. All to say that AI as a term of art is probably inappropriate when we're talking about military use cases. But effectively, you've got use cases that are relatively innocuous. We've been using AI tools for intelligence, surveillance, and reconnaissance for a long time, primarily to perform functions like data fusion and to help human analysts make sense of the data coming their way, particularly given the proliferation of sensor systems. For predictive maintenance and logistics, same thing. There, if things go wrong, the failure modes aren't deemed so significant that you have to be very worried about using that system; if I send personnel and materiel to the wrong place, or three days early, or three days late, I get yelled at, but nobody's going to die. It's when you start to get towards use cases like courses-of-action generation, so actually using some of these tools to create strategy and doctrine in military contexts, that people start to ask questions. And then, moving a little bit further, targeting applications, and Sarah's got some work here as well. There are definitely fears that when you put generative AI chatbots on top of targeting algorithms, that might lead somebody to be a little bit more blasé about actually undertaking that targeting. That's been a conversation in the context of Gospel and Lavender, used in Gaza by the Israeli military. And then all the way on the right, you've got kinetic applications of force, where you might use AI systems to do that. The kind of example du jour is the small drones used in Ukraine, which are effectively DJI quadcopters that have a sensor and a computer vision algorithm determining whether something is a Russian tank or not, and if yes, drop ordnance. So relatively rudimentary, but ultimately it's a system where you don't have a human on the loop helping make that determination; the system is doing it entirely of its own accord. And so, effectively, as you move further and further down that continuum, you have more and more concerns about AI's use in that context. But generally speaking, that's not what Claude Gov is providing. Claude Gov, in a lot of cases, is providing a whole lot of back-office functions, in the same way that, you know, our students here at Berkeley use these tools to help write essays. It's not all the way to the right on that continuum; it's much more on the left-hand side.
And so, you know, it's something that we're watching pretty carefully, but that's kind of how I cut up the field.
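
To make the "human on the loop" versus fully autonomous distinction Andrew describes more concrete, here is a minimal, purely illustrative Python sketch of that kind of engagement cycle. It is not based on any real system; classify_frame, release_ordnance, and operator_approves are invented placeholders standing in for an onboard vision model, the kinetic action, and a human check.

```python
# Illustrative pseudocode only: a hypothetical engagement cycle of the kind
# described above. classify_frame() and release_ordnance() are invented
# placeholders, not real APIs or any real system's code.

from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Detection:
    label: str         # e.g. "tank", "truck", "clutter"
    confidence: float  # classifier confidence in [0, 1]

def classify_frame(frame) -> Detection:
    """Stand-in for an onboard computer-vision model."""
    raise NotImplementedError

def release_ordnance() -> None:
    """Stand-in for the kinetic action."""
    raise NotImplementedError

def fully_autonomous_loop(frames: Iterable, threshold: float = 0.9) -> None:
    # No human in or on the loop: detection flows straight to action.
    for frame in frames:
        det = classify_frame(frame)
        if det.label == "tank" and det.confidence >= threshold:
            release_ordnance()

def human_on_the_loop(frames: Iterable,
                      operator_approves: Callable[[Detection], bool],
                      threshold: float = 0.9) -> None:
    # Same pipeline, but an operator can veto each engagement before it happens.
    for frame in frames:
        det = classify_frame(frame)
        if det.label == "tank" and det.confidence >= threshold:
            if operator_approves(det):
                release_ordnance()
```

The only structural difference between the two loops is the operator_approves check, which is roughly the design choice the "on the loop" language is pointing at.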

Sarah Shoker:

Yeah, I think it's a really good call to point out that large language models are merely one application of AI. Depending, honestly, on how you define AI, which has been a shifting target, the definition has morphed since the late 1940s and 1950s, but the military, under certain definitions, has been using AI for decades. Arguably, it's much more reliant on machine vision for targeting, as a result of, you know, facial recognition, object detection, and the like. The use of large language models in targeting is relatively novel, and I think we only got a potential glimpse of that, and really the field was still pretty opaque, when reporting about this emerged from the IDF in Gaza, with the use of Lavender and the Gospel. And even then, it wasn't really clear where large language models fit into that picture. Now we essentially have confirmation that they have at least been used for part of the target selection process, and also target prioritization, in Iran. And one of the reasons why I say that these two red lines identified by both Anthropic and OpenAI are insufficient is because, in my mind, the more immediate risk from the use of large language models comes from their integration into AI-enabled decision support systems. You still have the same issues with hallucinations, with biases, with over-reliance or under-reliance. And we now have reporting, which appears to be confirmed by several news outlets and also by CENTCOM, that the U.S. is able to select 1,000 targets per day, and you really only get that level of target selection with the help of AI assistance. At that pace, under those time constraints, it becomes increasingly hard, just more difficult, for any human operator to vet the target selection process for actual accuracy. This morning, I think it was the New York Times that released an investigation saying that the tragic strike by the United States, using a Tomahawk missile, that hit a girls' school and killed, I think, 175 people at last count, mostly children, was the result of, essentially, a database labeling error, right? They had not updated the historical data to reflect the fact that this building was not, in fact, a legitimate military target. But there's that old adage, right? Garbage in, garbage out. And if you are producing 1,000 targets every day, as the human in the loop, I have to actually wonder what role you have to play there, and is it even realistic to assume, at that level of target selection, that we have people who are actually going to be effective at civilian harm mitigation?

Professor Andrew Reddie:

Is it fair to say, Sarah, that effectively what these LLMs are providing is a user interface being put on top of a relatively traditional intelligence or surveillance tool that helps in target selection? So effectively you've got this chain of a targeting officer interacting with an LLM that's interacting with the targeting data they would have already been addressing. I think that's what's driving some of the research questions for shops like ours in this space, because obviously you've got the potential for failure modes at each one of those integration points. We tend to focus primarily on automation bias, or the servicing of lots of targets that we would previously have failed to unearth, but there are also failure modes associated with layering these LLMs on top of other sensor architectures, if you will, or targeting architectures. In your view, is it kind of like a UI/UX layer, and is that what's driving the delta compared to what it would have been, say, five years ago, before we had these LLMs?

Sarah Shoker:

Yeah, you know, it is not clear to me, right? I do think it's definitely providing a UI layer. I also just have general questions about the speed of targeting. Is it actually speeding up target selection? Or would we still be living in a world where the Maven Smart System, which is the decision support system that the model Claude was integrated into, would still be able to select 1,000 targets, and Claude is just the tool used to retrieve those targets from the databases? Or is Claude also combining intelligence? Because conceivably, if it has access to these other databases, that is a technical capability large language models do have: they can synthesize information for you. So it's not actually clear. I think the landscape is still quite opaque here when we're talking about targeting, which is, of course, itself a really broad category. What, specifically, was Claude doing? Because even terms like target selection and target prioritization still leave room for interpretation. So we're trying to make educated guesses, I think, at this point.
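
As a purely hypothetical sketch of the distinction Sarah is drawing, and assuming nothing about how any real decision-support system is actually built, the difference between a retrieval-only role and a synthesis role for a language model can be shown in a few lines. query_database and call_llm below are invented placeholders, not real APIs.

```python
# Hypothetical illustration of a "retrieval-only" layer versus a "synthesis" layer.
# query_database() and call_llm() are invented placeholders, not real APIs.

from typing import List

def query_database(source: str, query: str) -> List[str]:
    """Stand-in for a lookup against records an upstream system already produced."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Stand-in for a large language model call."""
    raise NotImplementedError

def retrieval_only(query: str) -> List[str]:
    # The model (or a plain search layer) only surfaces existing records;
    # it adds convenience on top of judgments made elsewhere.
    return query_database("existing_records", query)

def synthesis(query: str, sources: List[str]) -> str:
    # The model combines material across several sources into a new product,
    # which is where questions about accuracy and over-reliance get sharper.
    gathered: List[str] = []
    for source in sources:
        gathered.extend(query_database(source, query))
    prompt = "Summarize and rank the following records for: " + query + "\n" + "\n".join(gathered)
    return call_llm(prompt)
```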

Vivian Bossieux-Skinner:

Do either of you have anything else to add on any of this or other questions you want to cover?

Professor Andrew Reddie:

I think the only thing that I would add is that it's a rapidly moving story. There's a lot of uncertainty about the timeline of how this started, and certainly uncertainty about how it's going to play out in the future.

Vivian Bossieux-Skinner:

Well, thank you both so much for being here. And for all our listeners, you can obviously find this podcast anywhere you get podcasts. Make sure you subscribe and tune in next week.