All Source Podcast

From Action Plan to Mission Impact: Data or Bust

INSA Season 1 Episode 2

In this episode of INSA’s All Source Podcast, From Action Plan to Mission Impact, hosts Chitra Sivanandam and Yevgeniy Sirotin sit down with Sean Batir to examine why data, not just models, is the decisive factor in scaling AI for national security. The conversation breaks down how America’s AI Action Plan reframes data as a strategic asset and what it takes to build secure, interoperable, and mission-ready data environments across government and industry. 

Yevgeniy Sirotin

All right, welcome back. This is INSA's All Source Podcast, From AI Action Plan to Mission Impact. My name is Yevgeniy Sirotin.

Chitra Sivanandam

And I'm Chitra Sivanandam.

Yevgeniy Sirotin

And today we're going to be talking about data or bust.

Chitra Sivanandam

We're here with Sean Batir. If you don't know him, he is the former CTO for Project Maven over at NGA and is currently at AWS. So we are hopefully going to get a lot of time to pick his brain and see what's been going on.

Sean Batir

That's right. Yep. Good to meet you, Yevgeniy, Chitra, and thanks, INSA, for having us.

Yevgeniy Sirotin

So the goal of this episode is to explore how the AI Action Plan transforms mission data into a strategic asset and what it takes for the IC, the DOD, and the DIB to build the platforms, infrastructure, and trust needed to scale AI responsibly. Before we dive in, let's set the stage for why this conversation matters so much right now. America's AI Action Plan is a high-stakes national roadmap designed to ensure U.S. leadership in the global AI race, a race that the White House explicitly frames as a national security imperative. And at the heart of this plan is a simple but powerful idea: we can't have world-class AI without world-class data. The action plan explicitly elevates high-quality data to a national strategic asset, noting that the United States must lead the creation of the world's largest and highest-quality AI-ready data sets. This marks a big shift: we're moving away from treating data as a byproduct, and we're now treating data as a resource that enables AI-led scientific discovery and requires modernized infrastructure. The AI Action Plan connects this directly to national power, arguing that breakthroughs in AI and scientific data are going to reshape the balance of global power over the next decade. Today we're exploring what this means for national security organizations, specifically how the intelligence community and the Department of War can build the secure platforms, interoperable data environments, and cloud-enabled labs needed to turn raw mission data into a real operational advantage. This is the data or bust mission.

Chitra Sivanandam

Well said, Yevgeniy. All right, we're gonna give you the hard question first. The AI Action Plan calls high-quality data a national strategic asset and says that competitors are racing to build massive AI-ready data sets. So, in your experience, what does it take for the US to build similar world-class mission data sets while we're still protecting privacy, sensitive sources, and obviously classified data?

Sean Batir

That's right. Let me kick that answer off by focusing first on one of the key elements that I found to be particularly successful with the AI action plan, which is its emphasis on public and private partnership. When we think about how the United States actually leverages its unique gifts and talents, what makes us different than many of the centrally commanded economies is the fact that we've got such a robust private-sector ecosystem, right? We've got the hyperscalers, where I'm now seated. We've got some of your critical, you know, single vendors, you've got the core mid-levels that are actually building out AI-enabled mission command. And then you've got small startups that are working on very bespoke capabilities. For us to think about how we build a system that has quality data, we have to look at how we establish public-private partnership. Looking at it through the lens of my Maven experience, we were able to actually bring the private sector into, at that time, the Pentagon and then later other government spaces in order to curate and label that data. And when we look at building quality algorithms, first with computer vision, and now as we are moving into a frontier AI, large language model reality, all of these still rely on quality data that is relevant to the domain space in which we're searching. I've seen it be incredibly successful on the Maven program. And zooming out, I think we have many lessons learned that we can replicate across the DIB, across the Intel, and across the Department of War communities.

Chitra Sivanandam

So it's a mixture of bringing in commercial capabilities and giving them the mission exposure.

Sean Batir

Yeah, exactly. I've observed that a lot of the secret sauce behind each company, while it might vary greatly, is sharpened and refined through the impetus of private sector competition. And I think we as the United States government should look at how we can better partner with them. I've seen that manifested over the past five years, as I worked through the tail end of Trump one, throughout the Biden administration, and then in Trump two. Across all the administrations, I think one thing that is quite fantastic is that the private sector has continued to move forward and continued to innovate. And the technologies behind building efficiently labeled data, and labeling that data even faster while also crystallizing human intuition, have been the mainstay for many of the programs that I've supported.

Chitra Sivanandam

I mean, this definitely brings to bear this notion of classification friction, right? As much as you bring in people from the outside, there is still kind of a gap related to how well they really understand these classified enclaves, the data, and the mission sets the operators are supporting. So, how do you deal with these security enclaves and this friction around unclassified and classified?

Sean Batir

Yeah, I think it has to do with levels. And maybe this also ties a little bit into something that I really loved about the Trump administration's AI action plan, which is that it highlighted the importance of us as a nation building a pipeline for cleared talent, particularly those with backgrounds relevant not only to national security, but also to artificial intelligence, and building that sort of research background. When we started our work on the program, the ecosystem was relatively experimental. We had to figure out how to pull in individuals who were US citizens but had never gone through the clearance process before. And for those of you who've been through the clearance process, it varies greatly and is very circumstantial based on your particular legend and background. While I won't comment on the process or the details of that, because that is a security officer's and our personnel security's priority, I will say that I loved that the AI action plan highlighted building out mechanisms to streamline that, to make it at least easier to arrive at a decision on someone's clearance level. By building an ecosystem where people can arrive at that yes, no, or pending decision sooner, that reduces a lot of the ambiguity, and it also provides us with the ability to think not just in a 90-day cycle, but forward in a one to three to five year cycle. And the cycles are important at different tempos because when you're building capabilities for national security and for defense, there are technologies and capabilities that we can build on the unclassified network that are highly dependent, I think, on the unique capabilities of particular startups or medium-sized organizations. However, when that technology starts touching more sensitive data, naturally we have to move up into that sort of dedicated region or classification domain.
And in order for us to be efficiently prepared, we have to build that reserve not just of a network for people to utilize, but also a reserve of people. What I loved about the action plan was reframing our workforce and our compute resources as a potentially contested element that is almost akin to petroleum reserves, right? We have the muscle memory of thinking about how in crisis contexts you need to have a certain amount of petroleum available. How do we think about that in terms of compute? And then also how do we think about that in terms of the labor and the people that can actually conduct said work?

Chitra Sivanandam

Wearing your AWS hat, are you seeing that you can approach it differently? Anecdotally, in talking to people, there is some sense of, well, I know how to use Bedrock on the outside, so there's enough of a translation to do something on the inside. Do you see something different now, wearing your AWS hat, related to this bridge on classification and working low side versus high side?

Sean Batir

Yeah, I think the way that our organization operates is that, at the end of the day, we align to the requirements of the government and to our government sponsors. So while one capability might be built in a certain configuration on what we call the low side, or unclassified networks, when you move high side, we may or may not modify that capability. All those modifications are done at the request of the government, and so we always put the government's requirements first. That's why sometimes there might be a little bit of drift or slight change in how the systems are implemented between what you would call the low and the high side.

Yevgeniy Sirotin

I have sort of a follow-on on that. With the innovation that's happening right now in the ecosystem, you have new capabilities, new kinds of models. If you look at something like Hugging Face, it has a tremendous number of even just derived LLMs and customized LLMs. Can you talk a little bit about how we leverage some of those things and identify the really promising ones to bring into some of these use cases? What's the path to making them available in those environments?

Sean Batir

Yes, I've seen that path be very successful when you have government officials who are particularly open to widening their aperture. What they actually end up doing, right, is asking for input from the private sector, but also building that mechanism to actually just have that conversation. In practice, that could be something as simple as having a weekly recurring call between government and the particular organizations, like hyperscalers or medium-sized companies or startups, that are working on a particular contract or a particular program. But that can also extend to a more hands-on intervention, right? I've observed government come out to a facility and actually share their problem or an unclassified proxy and then bring the private sector in. And I'm seeing this happen more frequently as time goes on. So I think that is the element of culture and personnel that is quickly catching up to the technology that we have built.

Chitra Sivanandam

And as it extends to the data problem, are you seeing use cases with data coming out and being accessible and available so that people understand the differences? I think often there are challenges with interoperability mainly due to simple things like formatting of data. Had somebody known, oh, this is what I'm dealing with, it's really not that complex, but without seeing it and without knowing it, people sometimes go down the wrong path, right? Is there something you're seeing that changes that relationship between public sector and private sector as it relates to that fundamental data sharing?

Sean Batir

Yeah, maybe I'll break that apart into two parts. Data sharing, I think, is a little bit separate from the metadata question. With respect to metadata, when we started out in 2019 to 2021, there were a variety of different standards depending on what level within a particular government agency or department you were discussing. As time has gone on, there has been a better effort across the United States government to establish those types of standards. But the reality is that when we centralize around a particular mission and work backwards towards solving that mission, we notice that there are some invariant properties that appear to establish stable schemas to support the government. So hopefully that answers that element, which is, again, that I've seen success come from working backwards and identifying the invariant elements that are required of mission. And then in terms of sharing data, that definitely is highlighted in the AI action plan. Think about the context of burden sharing, and the fact that the United States needs to shift towards a framework of excellence, building US-manufactured models in alignment with national security and then sharing that capability and competence with the rest of the world. I have observed that this has engendered conversations for us to think about how we partner with entities like our closest allies in the Five Eyes, how we partner with and support NATO, and how we partner with and support some of the APAC countries.

Chitra Sivanandam

And how do you think then about the data and the validity of the data, or problems with erroneous data, as it relates to model training and developing new capabilities? There's a lot of discussion and concern as to how data poisoning can affect models that we depend on, as well as model poisoning. How do we protect the data or look at the pedigree and validity of the data?

Sean Batir

My observation, from my experience dealing with data poisoning, has been that this typically gets filtered out when we scan for data quality. On steganography, and in particular input imagery and video that has been somewhat manipulated: at this point, as of 2026, and speaking purely on unclassified channels, there are multiple methods in the research community that have empowered researchers to identify when this type of manipulation is occurring. Additionally, for more sensitive workloads, I will say that one approach we have observed is looking at multi-agentic frameworks. Even when a model might miss it on the first pass, we observe that agentic frameworks are able to catch that in a secondary triage attempt. Now I'm not saying that that's perfect, but I think this is the exciting territory that we're now entering in the second half of the decade, which is how we can utilize this type of emerging technology to continue enforcing data quality in near real time. I think we have controls in place for the training data, but at test time, at real-time inference for mission, that's almost a soft call to action where we must ask, back to the public: hey, we'd love to learn from your technologies. We'd love to invite you into the conversation and really align with what the AI Action Plan says, which is let's partner across the United States government and the private sector.

Yevgeniy Sirotin

Just a quick follow-up on that, since we're talking about having that high-quality data as the strategic asset. One of the things the AI Action Plan calls out is this notion of building out test beds and environments where you could bring the AI systems together with the data while still sequestering that data and keeping it strategically positioned so that our adversaries aren't also getting access to it to train their systems. So can you comment on what you think the role of test beds should be for that kind of thing?

Sean Batir

Yeah, I mean, super directly, I would assess that test beds are highly central to deployment and evaluation that remains compliant and safe. As we think about security, and in turn, as we think about building models that can actually support mission, sometimes we'll need to perturb the conditions through which the models are stimulated, right? By different types of intelligence or data that they are processing. As a result, I think these test beds or sandboxes are quite central to characterizing model behavior before we field the models in the real world.

Yevgeniy Sirotin

No, I think that's super important. One of the things we raised in the first episode, when Chitra and I were talking, was being able to develop that trust in the model: by bringing it together with labeled, really high-quality data, you could really assess how well it is gonna perform in your use case. Because in operations, things are more difficult to assess.

Sean Batir

That's right.

Chitra Sivanandam

So this leads me to think, what is your opinion of synthetic data? How far did you get down the road in Project Maven? And how are you guys approaching it at AWS? Because there was always a conversation on whether synthetic data was good enough for training a lot of models. And with today's AI, I mean, I use AI to build synthetic data all the time. It's always a question as to whether it's good enough to get 80% of the way there or not. So I'm just curious about your opinion of where it was, where it's going, and how we should be thinking about the role of synthetic data.

Sean Batir

Yeah, I think synthetic data seems to be particularly helpful when we have situations or types of scenarios that are rare or difficult to observe. On the Maven program, all I can comment on over these channels is that we looked at synthetic data back in the past, and we decided that this was not the right path. We did do a scarce evaluation, and that all occurred before I joined that program. The details of how the Maven program uses synthetic data are still classified, so I can't comment on that.

Chitra Sivanandam

Breaking news here on the podcast.

Sean Batir

That's right, that's right. However, in terms of AWS and our broader research ecosystem, there's quite a plethora of details about how synthetic data has been used more generally, especially across Amazon research, in the context of helping us build agentic reasoning frameworks. When we think about building scenarios and different types of situations to make agents more robust, synthetic data can definitely help create almost mini-worlds, mini constrained environments that perturb the state and action space in a manner that is commensurate with what you would see in the private sector for robotics. And so I urge anyone listening to this podcast to take a quick look at synthetic data for agentic reasoning. You'll be pleasantly surprised at what you can find in the open source on arXiv.

Yevgeniy Sirotin

No, that's amazing. I want to turn our conversation, and I think this is a good segue, to trust, control, and truth-seeking AI, since we're broaching some of these topics now. Let's talk a little bit about interpretability, right? One of the things that's happened in older, more image-based AI systems is that their decisions were sometimes very difficult to understand. These were convolutional neural net models, and the weights didn't really tell you why the model was arriving at a particular decision. With LLMs, of course, that's changed a little bit because now they can more or less explain some of their reasoning using natural language. The action plan makes it clear: if operators can't explain an AI's output, they won't use it. Why does interpretability matter so much in real mission environments? And where does lack of explainability limit adoption today?

Sean Batir

Yeah, absolutely. I would say very directly, the reason why interpretability is important is because if AI is generating a recommendation or information that further develops what we would call a point of interest, then we need to be able to establish provenance. And so you need to understand where that data is coming from, what the AI had actually done to produce that output from said input, and then what is the origination of that input? These are critical both in the intelligence and in the defense operations practice.

Chitra Sivanandam

That makes a lot of sense and makes me think: how do we really get our hands around provenance and make sure that we can trust and believe where we think we get it, especially if multiple models are basically permutations of some other thing downstream, right? How do we really understand and utilize the right capability for the right tool as we move into these agentic frameworks? I think there's an interesting story there on the large foundation models and thinking about the provenance of those versus things that are small and bespoke and tailored and derived from those, right?

Sean Batir

That's right.

Chitra Sivanandam

So what are your thoughts on guardrails, and on how we think about all the research being put in place related to how we implement guardrails and how we improve them as we go forward?

Sean Batir

Yeah, that's a very unique question given some of the recent dialogue around what it looks like to field frontier models in support of the United States government. Look, I think every department and agency has a slightly different mission. And so there is no such thing as a one-size-fits-all model, right? The guardrails exist for a reason. Of course, many of us listening to this podcast by now are probably aware of the sensitivities around two particular use cases: using frontier models for mass domestic surveillance, as well as for fully autonomous systems. And while every frontier model provider is arriving at an approach to address this problem, I think it's important to realize that the default guardrails that are available in commercial models do exist for a reason. But the requirements, let's say, just for the Department of Labor, will likely be different than the requirements for one of the intelligence agencies. And that will also be different than the requirements for the Department of Treasury or for the Department of War. I think that is the beauty almost of this country, and it also returns to some of the core tenets within the AI action plan, which is that in a world where we have such diverse missions, we also have such diverse optionality of models, right? The private sector and the United States' strength in commerciality have engendered many different types of models that were trained on widely available data. And then the process through which providers build those guardrails and refine, or perform something we would call post-training, is variant. I think having that variance is good. I think maintaining that diversity is good, and that is where the choice between the United States government and that frontier model provider is made front and center.

Chitra Sivanandam

As a follow-up to that, it leads us down a path of saying, okay, what have you seen related to these evaluation systems and ecosystems, red teaming, purple teaming, the capabilities to figure out whether it's ready, both for our safe use of models as well as potentially for assessing opportunities and understanding even jailbreaks and things like that, right?

Yevgeniy Sirotin

No, Chitra, I think you're hitting the nail on the head. What does the AI evaluation ecosystem look like? Given that we have this diversity of approaches in the market, you have these options available, and people are tuning these systems to different use cases. You talked earlier about a public-private kind of partnership as you go forward. What do you think is the role of public-private partners in evaluating the systems? You mentioned different agencies have different needs. Do you see testing still being primarily the province of a specific agency to do all that? Or, given how much is going on in the market, are you thinking that some of the evaluation could be done within the private sector as well? And what do you think the roles and swim lanes are?

Sean Batir

Yeah, this is actually a great question. Putting on my prior hat as the Maven CTO, we observed the best success when the government (that is, when I was the government officer) would provide all of our performers with a common set of expected outputs. And we actually had a third-party company come on board to perform test and evaluation that was independent and separate. That way we would try to minimize, hopefully, the chance of any one provider making their capabilities seem like the best. In that sense, the test harness and standards were given by the government, and then the private sector company that was participating in that program or contract would have to submit their data. I think in that way, that almost creates a standard that is tuned to that particular mission set.

Yevgeniy Sirotin

Yeah. I think that's sort of like a three-legged stool, where you have the government setting the APIs and the benchmarks that they're going to be evaluated against, the technology provider, and then an evaluator that comes in. I think it would be great to see that scale as we get further and further into implementing the AI action plan.

Chitra Sivanandam

Well, I'll take the devil's advocate position there and say I think the hard thing is that you have to already have some idea that, with this data and with this test scenario, this is what I might expect or not expect. And it's finding those gaps in between, where putting the wrong data through or doing the wrong thing with the model or the mission gives you an unpredictable effect, that I think we have to try to figure out how to be resilient towards. So I challenge whether that's actually sufficient in how we approach it today.

Sean Batir

Yeah, in terms of red teaming, this is where I would say that the best practice would be for us to explore inviting some of our frontier model providers and those that are supporting mission into the right space. In that space, we could walk through details of particular mission scenarios and outcomes. I think this is one of those gray areas, right? Because with red teaming, of course, we're thinking about what our adversaries could potentially do to United States systems and United States AI capability. And it's a very important question. I personally don't think that that conversation should be had on the open internet.

Chitra Sivanandam

Yes.

Sean Batir

Agreed. Yeah. But I think that we have the expertise, and again, returning to the action plan, I think that this is only even greater impetus for us to build our cleared talent and invite them into those spaces where they can contribute their insight and their knowledge. So if you're a researcher out there and you're listening to this podcast, please do check out INSA and think about how you might be able to contribute, either on the commercial or on the government side.

Chitra Sivanandam

So let me turn the attention real quick to things like prototypes. With the action plan and a lot of the subsequent directives, there's heightened interest in CSOs, the commercial solutions openings, and OTs and prototypes. In your experience, what separates a good AI prototype from one that doesn't get past the early experimentation phase?

Sean Batir

I love that because that was my life for the past five years. I like to almost think of this term as AI for mission. And you'll start noticing that mission-centric AI is appearing across multiple agencies. What makes the deployment of AI particularly successful is when that user is thinking about the operator or the end user. Basically, the models are not just providing text, right? They're actually accelerating an outcome. They're either improving the quality of a particular operator's or analyst's output, or they're reducing the time it takes to arrive at a decision. And there's even greater power when that occurs at scale. I've seen highly technically gifted teams and individuals offer solutions to the government that went nowhere. And that often happened because they were divorced from the use case. The engineers had never spoken to an operator. I think the best pairing is physically planting your researchers and engineers adjacent to or next to an intelligence analyst or a financial analyst or an operator in the DOW. Because it is through that direct interaction where I think innovation really happens.

Yevgeniy Sirotin

I love it. Fostering sort of an alignment between AI technology developers and the use cases in which the technology is being deployed. Fantastic. Yeah.

Chitra Sivanandam

Yeah, I have to agree. Yeah. Never pass up the opportunity to shoulder surf. Yeah.

Yevgeniy Sirotin

Yeah.

Chitra Sivanandam

I'd say one other thing we can think about, as we consider what's left on the action plan and where we are today: is there anything that you've seen related to misconceptions? If you had to point out one or two things, what are those big misconceptions that some of our leaders might have about AI mission data and what they can and can't do?

Sean Batir

Yeah, it's 2026 at the time of this recording, and I'm still hearing people who are saying that these models just aren't good enough. I would challenge that assumption, both for those of you in the private sector and also on the US government side. You know, I've been on both. At this point, I think we need to be a little bit more rigorous about how we're actually prompting many of these large language models, because I've seen the same model respond in a way that was commensurate with an entry-level analyst based on critical prompting. And I think sometimes people who are not skilled at prompting models will be quick to blame the company that released that model, when in fact it's because they had never learned how to prompt in the first place. So I think that's a key misconception. There's an element of personal responsibility there.

Chitra Sivanandam

It's not always the data, it's not always the model.

Sean Batir

That's right.

Yevgeniy Sirotin

Yeah, we can all stand to be better prompt engineers.

Chitra Sivanandam

Yes. Yes, exactly. All right. As we wrap up this episode, again, thank you very much, Sean, for being here. Maybe just leave us with one final thought: if you look across your experience, what do you see as the biggest opportunity for the US to accelerate our progress from where we are today?

Sean Batir

Yeah, I think the most exciting part that I can think about in the next two to three years is: let's all start experimenting. All of you who are listening on this line, think about where you want to be in three years as an individual, as a member of your company, or as a member of the United States or of your nation, and then work backwards from your desired outcome. If you think about how we can invite these AI-empowered agents to work alongside us, you'll be pleasantly surprised at how models and human-AI collaboration can bring you closer not only to your ideal end state, but also to achieving mission outcomes at an unparalleled rate.

Chitra Sivanandam

Well said. So thank you all. Again, this was INSA's All Source Podcast, From Action Plan to Mission Impact. And thanks again, Yevgeniy and Sean Batir.