What's New In Data

Strands Agents: A Model-Driven Approach to AI Agents with Clare Liguori (Senior Principal Engineer at AWS)

Striim Season 6 Episode 11

In this episode of What's New in Data, we're joined by Clare Liguori, Senior Principal Engineer at AWS, to dig into Strands Agents and the evolving role of retrieval in modern AI systems. Clare unpacks why retrieval shouldn't be thought of as just a technique for fetching documents; it's a strategic tool that can unlock smarter, more adaptive agents.
We also explore how AWS is thinking about orchestration, what actually counts as "reasoning," and why the real power lies in combining structured memory with real-time context. If you're building with agents, this one is packed with insight.

Show Notes

What's New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss the latest trends, common patterns for real-world data systems, and analytics success stories.

Welcome to What's New in Data. I'm your host, John Kutay. Today we're joined by Clare Liguori, Senior Principal Engineer at AWS. Clare recently announced Strands Agents, a lightweight open source SDK that's already powering agent workloads in Amazon Q Developer, AWS Glue, and much more. And now that it's open source, the whole community gets to benefit from the simple abstractions it gives us for building agentic applications. We're going to dive into the core principles behind Strands Agents, along with practical advice for software engineers breaking into AI engineering and how it's a little different from what they're used to. But more than that, Clare shows us what it means to think big about AI and how powerful it can be when you just start hacking on real problems. Let's dive right in.

Everybody, thank you for tuning in to this episode of What's New in Data. Super excited to be chatting here with Clare Liguori, Senior Principal Engineer at AWS, working on Strands, among many other exciting projects. Clare, how are you doing today?

Good. Thanks for having me on.

Absolutely. Clare, tell the listeners about yourself and your journey into this space.

Yeah, I've been at AWS for almost 11 years now, and throughout that whole journey I've really been focused on developers and developer experience. A little over two years ago, generative AI as a real application for users was exploding. ChatGPT was taking off, lots of new models were coming out, lots of new projects were coming out. And I started to think, as did others at AWS, about how generative AI could change how we develop code and how we create applications as developers. That really led me to start thinking about agentic AI as well, where you can let the AI write an entire application for you and interact with the world for you, to pull in information and manage your AWS resources and things like that. So over the last couple of years, I've been so excited to see how powerful models have become for building AI agents.

Absolutely. AI agents are already proving their value, both for the indie developer and the enterprise. And I think one of the cool things about AWS is that it has that range of being valuable to all of those types of developers. I'd love to hear about your launch with Strands and what the product is.

Strands Agents is an open source library for developing AI agents. One of the key design choices we made with Strands is that it's what we call model driven, a model-driven approach. When I started building AI agents back in early 2023 with Anthropic's Claude v1, I think 1.3, and I was looking at some of that code today, we were just starting to build agents in the industry. I think the original ReAct paper, which showed that LLMs could interact with their environment through tools, had just come out. But it was super hard at that time. I remember, using Claude, you had to put in all of these instructions and examples and XML tags and things like that, and you still couldn't really get it to do much that was complex. So one of the first agents that I tried to write was for troubleshooting problems with your AWS resource configuration. Things like, you get this very opaque error message that something's broken with your application, but you don't know what. And so I was trying to figure out, well, could agents just explore my AWS account and figure it out for me?
And at the time, they just couldn't deal with that much complexity. So you had to fall back to prompt chaining, agent chaining, very explicit agent workflows for very specific use cases that those agents would be able to complete tasks for. But now that we have this latest crop of large language models, they're so much more capable of driving agents themselves. And that's really where we came to this model-driven approach, because we were seeing that we didn't actually need a lot of the complexity in some of the internal agents we were building at AWS, with all of this chaining and workflows. You could really throw the tools directly at the model, and the model can figure out which tools to run. It's actually a lot better at complex use cases now. So you don't necessarily have to create a bunch of individual agents that are domain experts; you can actually have one very generic agent with a set of tools.

That was really the beginning of Strands Agents for us internally, within one of the products I work on, Amazon Q Developer: realizing that there was a simpler way, and that it was actually even higher quality, because this is now how these models are being trained. They're being trained to use tools. So we started to re-implement, to throw away some of our agents internally and rebuild on this model-driven approach. And we were seeing it was a better user experience, it was more accurate, and it was actually faster for us to build. It's obviously faster to develop something where you write a prompt, you give it some tools, and off you go. So we started developing it internally at AWS, and then we started seeing such great impact internally that we wanted to release it for the community. This model-driven approach is really at the core of it: let's leverage how powerful these models have become, especially with tool use, and get rid of a lot of the complexity that came from using earlier models like Claude v1.

Yeah, and that's a great callout that these LLMs are now being trained to use tools. We'll get into this, but some of the protocols that are coming out, like MCP and A2A, lots of acronyms, are all starting to materialize, and they're productive. So there are certainly a lot of interesting directions it can go in, but it all starts with the model, like you said, with the models, the prompts, the evals, and the tools as your core building blocks. From those abstractions you can build applications on top. What are the other core architectural principles behind Strands, and how do they reflect a shift in how these agent systems are being built?

Well, a couple. One, it's not tied to any particular model; it just requires that your model have reasoning capabilities and tool use capabilities. What we're seeing is that lots of models are now being trained that way, and lots of models have either a messages API or a tools API, in the case of Anthropic's APIs. So we were really cognizant early on of what interfaces we're providing for you to be able to plug in your favorite model. That could be something that's running locally on your laptop with Ollama.
It could be something that's running on AWS with Bedrock. But we also partnered with Anthropic to provide Anthropic API support, and with Meta for their new Llama API, and we're working with a lot of other partners to provide additional APIs. So we wanted it to be very flexible, regardless of what provider you're using, and we're finding that there's a lot more consistency in interfaces to models these days. And then we're also seeing more consistency, of course, in interfaces to tools, like you said with MCP. That's been amazing for having a more standard interface across whatever agent framework you happen to be using, but also for the tool author, who can provide MCP servers for anybody to use, regardless of their agent framework. It's been amazing to see the growth in the community of those MCP servers; there are thousands now out on GitHub, on PyPI, and on npm, and you can bring any of those to Strands. So we're trying to support standard interfaces as much as we can. It supports OpenTelemetry, so you can bring in whatever telemetry provider you prefer. And then you can run it anywhere: on AWS, on your laptop, or anywhere else you need to run it. We wanted to provide something that wasn't tied to any individual provider, but instead showcases this model-driven approach that's been so powerful for us.

And that's a very astute point: you have the model, and you can have multiple models and multiple runtimes, and you have the power to develop locally and then scale in the cloud without being restricted. Your code shouldn't be tied to one specific environment. When I looked at it, it seems like it's taking a lot of just solid software engineering, both scalable principles and practical, easy-to-develop-with ergonomics. And I like the other part you mentioned: it's model driven, but it can work with multiple models, which is becoming pretty standard for AI application development, even if you're doing things like LLM-as-a-judge or model cascading, where you want to test cheaper models on some workflows and then potentially fall back to more expensive ones if the cheaper, fast model isn't working as well. So it's super flexible, and I think it's a really nice reflection of where AI applications are standardizing their approaches. Diving deeper into MCP and A2A and these types of things, I would love to hear your take: where do these emerging protocols, like MCP and A2A, redefine the communication between intelligent systems?

One of the specific use cases that we've seen for multi-agent: what we've seen is that agent-to-agent communication doesn't always need to be something special. We've had a lot of success internally with the agent-as-tool model, where the agent is a bit more of a request-response; the sub-agent is something you call as a normal tool. You could put it behind MCP; you give it a prompt, it comes back with a response. We have multiple systems internally that run that way.
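For readers following along at a keyboard, here is a minimal sketch of the model-driven and agent-as-tool patterns Clare describes. It assumes the Python strands package exposes an Agent class and a @tool decorator roughly as shown in the launch materials (names and signatures may differ across versions), and the domain tools are entirely hypothetical, so treat it as a sketch rather than a copy-paste recipe.

```python
# Minimal sketch of a model-driven agent, assuming the Python `strands`
# package provides an Agent class and a @tool decorator; check the
# Strands Agents docs for the exact, current API.
from strands import Agent, tool

@tool
def lookup_order(order_id: str) -> str:
    """Return the status of an order (hypothetical domain tool)."""
    return f"Order {order_id}: shipped"

@tool
def lookup_customer(email: str) -> str:
    """Return basic account info for a customer (hypothetical domain tool)."""
    return f"Customer {email}: premium tier"

# One generic agent, many tools: the model decides which tools to call,
# and in what order, based on the user's request.
support_agent = Agent(
    system_prompt="You answer questions about orders and customers.",
    tools=[lookup_order, lookup_customer],
)

# Agent-as-tool: wrap the whole sub-agent behind a normal tool so a
# coordinating agent can call it like any other request/response tool.
@tool
def support_assistant(question: str) -> str:
    """Answer order and customer questions using the support agent."""
    return str(support_agent(question))

coordinator = Agent(tools=[support_assistant])
coordinator("Where is order 1234 for jane@example.com?")
```

The point of the sketch is the shape, not the specifics: there is no prompt chain or hand-built workflow, just a prompt, a set of tools, and a model capable of choosing between them.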
And some of that is because there are agents that have specialized knowledge, which really ends up being specialized access to data. But often what we see is that it's also a way, organizationally, to split up a very complex system across multiple teams. At Amazon we're famous for the two-pizza team and for letting teams own their whole system. So in some ways, multi-agent is sometimes more of an organizational solution, where you have five different teams that each want to own some experience or domain expertise within an overall agent system.

But we do see some more complex use cases for agent-to-agent communication. One challenge that we've had with MCP is modeling many, many tools. For example, we have an agent internally that models more than 6,000 tools, mapped to a huge number of AWS APIs. You can't put that in your prompt context. If you have an MCP server that exposes 6,000 tools, you'll probably blow out your context window, and if not, your requests are going to be very slow and very inaccurate in terms of which tool it chooses. So we've done a lot of work where, inside of this sub-agent, there's a lot of logic around selecting the particular tools that should be exposed to the LLM based on the customer's input. If they asked about S3, it's probably going to be an S3 API. If they asked about EC2, it's going to be an EC2 API. And sometimes it's a little more ambiguous: they might ask about a function and not say Lambda. Being able to figure out the maybe top five most relevant tools for the LLM is something that agent-to-agent communication has been really helpful for, as is being able to maintain state across a session in that sub-agent. In the agent-as-tool use case you often don't save state; it's request-response and I'm done. But we have some use cases, especially around that selection within 6,000 tools, where session state becomes very important.

We're doing some work with A2A. We have an example of using Strands with A2A, and that lets you hook into that entire ecosystem of agents talking with agents that could be implemented in any agent framework. You might have some LangChain, you might have some CrewAI, you might have some Strands, and they can all communicate together. But we're also doing a lot of work in the MCP community upstream to try to evolve it to support agents communicating over the MCP protocol, which is super interesting. One example: a few weeks ago a new feature was merged into the MCP spec called elicitation, where the MCP server can respond back with an elicitation along the lines of "I need this information from the end user," and the core agent goes off and returns control back to the human user. That's going to be super important, of course, for agent-to-agent communication. So I think it's very interesting to see both how these protocols evolve, and also how we start to think about the lines blurring between communicating with agents and communicating with tools when they're behind the same interface.
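To make the tool-selection problem Clare describes concrete, here is a small, hypothetical sketch: a sub-agent narrows a huge tool catalog down to a handful of candidates before anything is shown to the model. The naive lexical scoring and the tiny catalog are stand-ins; a production system would use embeddings or a vector index over the tool descriptions and keep the selection per session.

```python
# Hypothetical sketch of tool selection for a very large tool catalog:
# instead of exposing thousands of tools to the model, pick the top-k whose
# descriptions best match the user's request and expose only those.
from dataclasses import dataclass

@dataclass
class ToolSpec:
    name: str
    description: str

CATALOG = [
    ToolSpec("s3_list_buckets", "List Amazon S3 buckets in the account"),
    ToolSpec("ec2_describe_instances", "Describe Amazon EC2 instances"),
    ToolSpec("lambda_list_functions", "List AWS Lambda functions"),
    # ... imagine thousands more, roughly one per AWS API
]

def select_tools(request: str, k: int = 5) -> list[ToolSpec]:
    """Score tools by word overlap with the request and keep the top k."""
    words = set(request.lower().split())
    scored = [
        (len(words & set(t.description.lower().split())), t) for t in CATALOG
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for score, t in scored[:k] if score > 0]

# "Show me my S3 buckets" surfaces the S3 tool. A vaguer request like
# "tell me about my functions" only matches the Lambda tool here because
# its description happens to contain "functions"; embedding-based retrieval
# is what handles the genuinely ambiguous phrasings.
print([t.name for t in select_tools("show me my s3 buckets")])
```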
So at a high level, and tell me if I'm oversimplifying, would you categorize MCP as the best way for agents to communicate with tools, and A2A as the best way to do this multi-agent communication today?

Yeah, that's the pattern that I tend to see, because it's really simple to wrap the MCP interface around an agent. It's an API that you can stand up. We've had customers have success with, for example, putting an agent inside of a Lambda function and then making that an API that is MCP compliant. Some of the work that's been done upstream in MCP has enabled this even more. In the original spec it was mostly local MCP servers, which was great for IDEs and chat clients. If you look at those thousands of MCP servers out there, they're not actually servers; they're really just local processes with standard I/O streams to your chat client. But where I see the industry going is really embracing what we call the remote MCP server: the MCP server that's just behind an API. And that has really only become possible in the last couple of months, as the upstream community has been adding new ways of building scalable, remote MCP servers with something called the streamable HTTP transport, which basically just means putting it behind more familiar HTTP APIs.

Yeah, and that's definitely a big boost for performance and the other things you would expect an API to deliver. So that's a really exciting development for sure. And like you mentioned, it just became available in the last few months. Even just listening back to episodes from January, there's been so much innovation since then. The rate of development in AI is obviously correlated with the AI code gen tools, and of course with the excitement and the value it's bringing.

And when you look at the frameworks that companies like AWS are now delivering, like you said, you've been working on this since 2023 on the Q Developer side, and the number of applications you've seen built then versus now is obviously a huge change in volume, but also in the requirements around them: being able to handle the scale of these data processing engines and databases, and making it streamlined for enterprise application development, and of course for SMB and mid-market companies with large data volumes as well. That's been really fun to see. And I felt like it was a great turning point in the industry when Strands Agents came out, because it's this light layer of abstraction for building these applications, and it's really grounded in this idea, like you said, that we start with the models, which is a powerful way to look at it.

Definitely. And I think one of the interesting things that I've seen over the last six months or so is how differently we think about this model-driven approach in terms of even just information. When we all started out building agents, it was all about RAG: right before you called the model, you always supplemented your context by pulling from some vector DB.
And one of the things I'm starting to see now is what one of my colleagues internally calls retrieval as a tool, as opposed to retrieval-augmented generation. I don't love the acronym RAT, but there it is. We're seeing a lot more of: don't proactively give the model information that might be too much, or information that it doesn't need. If it has enough information in its training data, you might actually make things worse by proactively giving it more. Instead, create a tool that can retrieve information for it. So as opposed to a prompt pipeline where you always augment the prompt, provide that vector DB as a tool where the model can retrieve documentation. One of the interesting places where we're seeing this is actually in a lot of the development tools with MCP. We're seeing a lot more of what I call documentation MCP servers. In the agent space especially, I was looking at the training dates of the latest Claude models recently, and since Claude 3.7 came out there have been something like ten new agent frameworks. Even since Sonnet 4 came out; I think I read that its training data ended in March of this year, and Strands came out in May, so it doesn't even know about Strands yet. So we're starting to see a lot of these projects provide llms.txt files so you can retrieve them through an MCP server, and again, the MCP server is a tool. It's letting the LLM decide when it actually needs this information for its task, versus proactively throwing a bunch of information at it. That's been a big change in the data access patterns we've seen: moving from that prompt pipeline, where you would retrieve every time, to letting the model decide when it needs that information.

Absolutely, and timely, because just last week, at least from the time we're recording this episode, there was a new example in the Strands GitHub repo. That's one of the other great things about Strands: there are so many great, actionable, easy-to-run examples on GitHub, and I'll have the link in the show notes. There was one that came out recently with Amazon Aurora DSQL running through Strands Agents, and just like you mentioned, it's retrieving data with these tool calls. You look at the architecture diagram, and there's a read-only query tool, it's doing transactions, there's get-schema, and other high-level tool-call APIs that interact with the database. Marc Brooker was on What's New in Data last season, and we were chatting about how agents and serverless databases are actually very complementary technologies, so it's very cool to see that materialize here. And like you said, it's super powerful: rather than having a RAG pipeline and trying to populate your prompts with context manually, or overloading them, the agent can decide when it wants to make a tool call to retrieve very specific data with specific parameters, and it makes the application more deterministic.

And we've also seen that it makes it much more generic. You don't have to have a very specific prompt pipeline where you determine the query to the database ahead of time and then add the data to the prompt.
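Here is a rough, hypothetical sketch of that pattern: the database is exposed as a couple of tools, named get_schema and read_only_query in the spirit of the Aurora DSQL example (the real example's API may differ), and the model decides when to call them instead of having query results stuffed into every prompt. sqlite3 is used only so the sketch is self-contained.

```python
# Hedged sketch of "retrieval as a tool": the database is exposed as tools
# the model can call when it decides it needs data, with a read-only
# guardrail baked into the tool itself.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1234, 'shipped')")

def get_schema() -> str:
    """Tool: return table definitions so the model can write sensible queries."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type='table'"
    ).fetchall()
    return "\n".join(sql for _, sql in rows)

def read_only_query(sql: str) -> list[tuple]:
    """Tool: run a SELECT; reject anything that could modify data."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only read-only SELECT statements are allowed")
    return conn.execute(sql).fetchall()

# An agent framework would register these two functions as tools; the model
# would typically call get_schema() first, write its own SELECT, and call
# read_only_query() only when the task actually needs fresh data.
print(get_schema())
print(read_only_query("SELECT status FROM orders WHERE id = 1234"))
```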
You're able to apply these agents to much more generic problems than you could before, by effectively teaching the model how to query and then giving it some boundaries, some guidelines. Generally it's much more flexible for a lot of different tasks that you could throw at it.

Yeah, absolutely. And it gives good guardrails for things like data access. Even in that example, it specifies read-only queries and retrieving schemas, but then that bubbles up into this powerful experience where the agent can get all the tables and do some operations. A smart engineer who understands how to set up database applications can still predefine those tool calls in a way that's safe and scalable, and you can even do your EXPLAIN and ANALYZE to make sure those are efficient queries, since you might have agents interacting with databases that serve traffic for other applications as well. So it's a very powerful concept, and it's so nice to see how it can be implemented in very succinct, well-organized lines of code in that GitHub repo, along with a lot of other fun examples. And since we're on that topic, what's one of your favorite tutorials in that GitHub repo that you like to call out?

It's actually not in the GitHub repo; I put it in our blog post about the Strands launch. When we decided to open source Strands, it had a different code name internally, so we were trying to come up with what we were going to call this open source project. We needed a name, and the lawyers have to sign off on it, and it can't be too similar to any other company's name. As it turns out, in the AI space, all the words are taken. Everybody's got "blank AI," right? So we were having this trouble where we'd do a brainstorming session and come up with, oh, this would be a really cool name, that would be a really cool name, and then we'd go search for it online and, of course, it was taken already. Or we'd find one without a clear company already using it, but someone had already registered the domain we would want, or the GitHub org we would want.

So I wrote, with Strands, a little bot that would help me come up with names. In the prompt I gave it some guardrails around how we name things, and then some ideas of themes that I really liked. We had a DNA theme going; we had a sewing theme going for a while; we had a car theme going for a while, where you would think about a motor going around, or an axle going around and around. Anyway, I gave it all these ideas, and then I gave it a few tools. I wrote an MCP server for my favorite thesaurus website, because I always feel that helps when I'm brainstorming. I added a domain name search to figure out whether a domain was registered or not. And I discovered through this, I didn't know this, that WHOIS is no longer the hot thing for domain name registration lookups; it's now RDAP. As it turns out, the .dev TLD, which is run by Google, doesn't support WHOIS; it only supports RDAP.
And so I was using this WHOIS MCP server that I found on GitHub, and nothing for .dev was coming back, and of course we wanted to explore using .dev; this was a very developer-focused project. And then I also had the GitHub MCP server, where it could look up possible org names. That helped us churn through ideas, which was really cool. We were able to prove, hey, this one doesn't work, this one doesn't work because it's got these conflicts already. And that was just very exciting. One, it was super easy to write with Strands; I don't know how many lines of code, but really small. And then also being able to take advantage of the MCP servers that are already out there: I pulled the GitHub one off the shelf, it was really easy to write one that web scraped my favorite thesaurus website, which shall remain unnamed, and then I added the domain name lookup one that I found. It was just so easy to pull these things together, and the most work I was really doing was thinking about the real business value: what themes do we want to explore, what names do we really like or not like, as opposed to doing all this manual lookup. So it was really fun for me.

And that also speaks to the simplicity, and to how it was able to aid your creative process while you were working through a higher-level challenge, which was naming something. And for AWS, the scope of naming something obviously has even more challenges to it, which you were able to navigate thanks to some clever brand thinking and also tooling. We're not far away from a future where, like you said, you have this tactical thing you have to get done, and, okay, let's outsource it to some agents that can do a lot of the hard work for me. I didn't know about the RDAP versus WHOIS thing, so, yeah, I found that interesting. It sounds like it's something you discovered in the process, or maybe your agents discovered it in the process and told you.

Yeah. No, exactly.

And there's one end where you'll use these products that wrap the agent experience and make it easy for folks who are non-technical: they'll just type in some instruction, and it's going to deploy a fleet of agents to get it done. And then, on the other hand, you're going to have developers who, either through hand coding or AI code gen, can use a nice wrapper framework to build this stuff for themselves at a rate that just wasn't possible before. From that perspective, it's super exciting both for the non-technical consumers of AI and for the builders who want AI at their fingertips. So I totally recommend it; we'll have a link to your blog in the show notes, as well as to the GitHub repo. And I always encourage people: those creating these frameworks, including yourself, are making it easy to run locally and then deploy in the cloud.
So start with your laptop. Most people's laptops are pretty powerful these days, and there's a lot of great tooling, especially with Python, to build really nice applications that are portable. You can get really far on your laptop and still be in a state where you can deploy to the cloud without changing much beyond environment variables. So I totally recommend everyone just get hacking, play with these frameworks, and try them out, because people don't even understand what they can do; it just seems impossible, even if you've been in software engineering for 25 years. I hear that a lot. People just didn't understand what was possible, and it's so powerful. You kind of just have to go do it, play around, and get that empirical knowledge yourself, and Strands is a great framework for that. Looking ahead, we've talked about all the amazing stuff that's already there, but what are the foundational building blocks that are still missing from the agent ecosystem?

Well, one that I'm kind of excited about is thinking about how we can reuse these agents across other developers. We have libraries, and we've created this amazing ecosystem of libraries across all these package managers. I think MCP servers are kind of the new libraries, and agents are going to be the new libraries too. So how do we share them, or make them available over APIs, so that as a developer creating an application, maybe what I'm doing is pulling together a set of MCP servers and a set of agents? Even just with my little naming bot, I was seeing the power of being able to pull in these MCP servers, whereas before, API integration work using something like the GitHub library would still take me a long time to write all that client code. With MCP servers, I just pulled it in and wrote a prompt. I think that sharing agents could potentially be even more powerful, so I'm looking forward to a world where we start to share those as well, like libraries.

And then I think it's also still very unsolved how you build entire applications around agents. You still have to do a lot of work to put one behind an API somewhere. You have to figure out authentication. You have to figure out how your React code on the front end interacts with this agent, and what it looks like in the UX. I think we're still very stuck in the "generative AI means chat" world. So I'm looking forward to the UX around agents evolving as well, where we start to think about how agents could help in my work other than by having a conversation with them. I think there's just so much space, and like you said, this space is changing so fast that you can't even imagine what's going to come in three months.
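On the "put it behind an API somewhere" point, here is a rough sketch of the kind of glue that is still largely manual today. The agent object is a placeholder, the endpoint shape is invented for illustration, FastAPI is assumed purely as an example web framework, and the genuinely hard parts Clare mentions (authentication, sessions, streaming UX) are deliberately left out.

```python
# Hypothetical sketch: wrapping an agent in a small HTTP API so a front end
# or another team can call it. None of this is a prescribed Strands pattern.
from fastapi import FastAPI
from pydantic import BaseModel

# from my_agents import support_agent   # hypothetical shared agent
def support_agent(prompt: str) -> str:  # stand-in so the sketch runs
    return f"(agent answer for: {prompt})"

app = FastAPI()

class AgentRequest(BaseModel):
    prompt: str

class AgentResponse(BaseModel):
    answer: str

@app.post("/agent", response_model=AgentResponse)
def invoke_agent(req: AgentRequest) -> AgentResponse:
    # In a real deployment, auth, rate limiting, session handling, and
    # tracing (e.g. OpenTelemetry) would all live around this call.
    return AgentResponse(answer=support_agent(req.prompt))

# Run with: uvicorn agent_api:app --reload
```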
Yeah, and I really like your point about being able to share agents, sort of like sharing an API, where everything is ready for you to execute that agent. One of the things that seems critical for AI applications is the traces, the evals, essentially your integration tests for the AI: when you change the prompts, how does that actually impact the outputs? So coming back to your idea of sharing agents, you'd want to do that in a way that also delivers trust that it's working the right way. I think we're all, as an industry, trying to figure out what those units of trust are. Of course, you have LLM-as-a-judge and confidence scores and all these things that are really useful and borrowed from machine learning. When I'm working with software engineers who haven't done agentic or AI development before, I tell them that's where you should really start: how do you come up with your framework for deciding what's correct?

One of the things I've noticed working with teams at AWS who are starting to build generative AI capabilities is exactly that hurdle. As engineers, we expect testing to look like: I define an input, I define an output, and if I get that output when I provide the input, then things are working. Obviously generative AI is not like that. You can't rely on it giving you the exact same answer you got before. So you have to think about what success criteria look like, what good looks like. Is it 95% accuracy of tool selection? Is it 95% of cases where an LLM judge says the answer is close enough to your ground truth? How do you even think about creating test sets and ground truth answers? These are problems that software engineers, as an industry, have never had to deal with; this has been ML scientist stuff. So we're all needing to become scientists a little bit, and we're not used to using the scientific method we learned in grade school at work: having a hypothesis, coming up with a test, and executing that test. I think evaluation tools today are still really hard for engineers to grok in a lot of ways. Even the question of how to come up with the test set that the evaluation tool is going to run through is really hard. There's so much room for improvement there, which speaks to the fact that we are complete amateurs in this space; I don't have a science PhD.

And then I think there's also mapping it to the reality of how your users are actually interacting with your system. One thing we say at AWS is that for any service we launch, there are going to be customers who use it in ways you never expected, and I think that's probably true of any application. Before you launch something, you have a guess as to how people are going to use your system, and you create a test set based on that guess. But then you have to go back and validate: what kinds of questions are people actually asking of my agent, what kinds of inputs are they actually providing, and try to match that distribution in your test set. And that can also be really hard, right?
How do you think about matching that live data set with what you're testing against? And we're only scratching the surface here, but if I were to ask you generally: what's your advice for software engineers, whether they're new or experienced or have just a fundamental computer science background, who are breaking into AI engineering?

One is: try Jupyter notebooks. That's something I had honestly never heard of coming out of a computer science program, and then all of a sudden I'm working with ML and AI scientists and they're all sending me Jupyter notebook files. We actually have a lot of Jupyter notebook examples in the Strands examples repo, so that's a good place to start. But I'd also think about, again, what testing is going to look like. One thing that I often see from myself as well as other engineers is what I've started to call vibe checking. We know about vibe coding, but how do you know when your thing works? How do you know when your agent works? You give it a couple of inputs, you personally like the outputs, and then you send it on its way. Start thinking about, okay, how can I actually write these down? What do I expect the answer to be? How would I compare good versus not good? There's a lot of intuition you have about what is good, and what's hard is writing that down.

We actually have another product, and I'm not going to get the name exactly right, but it's something like Bedrock automated reasoning. One of the things it does is basically look for hallucinations in large language model outputs. But one of the things you have to do ahead of time is codify what is true: it can only tell you an output is wrong if you've already told it the truth. Then it uses mathematical proofs and automated reasoning to determine whether what comes out of the model is true or not. And one of the things we found is that people don't know what's true. So what's really interesting is that you start with "here's what I think is true," and then it takes you through some scenarios and says, okay, if this is the case and this is the case, based on what you've told me, this would be the result. And then you say, no, no, that's not right. One example: if you give it a tax code, it does its best to turn that tax code into mathematical models, and then it'll tell you, okay, if you are a married couple with two dependents, then your deduction is X. Someone who's a subject matter expert on actually doing that math can come in and say, no, and here's why, and it can go back and regenerate those models. I think that's very interesting for how we think about what good looks like in our applications. Play a kind of game with yourself: if this was the input and that was the output, what would I like or not like about it? And that, effectively, is your LLM-as-judge prompt.
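To ground the move from "vibe checking" to written-down evaluation, here is a small, hedged sketch of a test set with ground-truth answers and an LLM-as-judge prompt. The call_model function is a placeholder for whatever model client you use (Bedrock, Anthropic, a local model), and the test case and threshold are invented for illustration; the judging criteria are the part worth writing down carefully.

```python
# Sketch of a minimal eval harness: a tiny test set, an LLM-as-judge prompt,
# and a pass rate you could gate a deploy on. All names are illustrative.
TEST_SET = [
    {
        "input": "How do I make my S3 bucket private?",
        "ground_truth": "Enable S3 Block Public Access and remove public bucket policies.",
    },
    # ...more cases, ideally drawn from how users actually phrase requests
]

JUDGE_PROMPT = """You are grading an AI assistant.
Question: {input}
Reference answer: {ground_truth}
Candidate answer: {candidate}
Reply PASS if the candidate is factually consistent with the reference
and actually answers the question; otherwise reply FAIL."""

def call_model(prompt: str) -> str:
    """Placeholder for your model client of choice."""
    raise NotImplementedError("wire this up to your model provider")

def run_evals(agent) -> float:
    """Run the agent over the test set and return the judged pass rate."""
    passed = 0
    for case in TEST_SET:
        candidate = agent(case["input"])
        verdict = call_model(JUDGE_PROMPT.format(candidate=candidate, **case))
        passed += verdict.strip().upper().startswith("PASS")
    return passed / len(TEST_SET)  # e.g. require >= 0.95 before shipping
```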
So if you're able to do that kind of self-introspection and come up with some of those guidelines, that can really help you build the application you want to build, one that follows what you want to put out into the world.

Yeah, I love the term vibe checking; it's equally as important as the vibe coding in the AI engineering process. As humans, we have this natural reflex to be really good at critiquing what we don't like, but it's funny how having a positive mindset and defining what's good, in a very structured way, is actually very productive for AI. When you're coming up with your ground truth and coming up with ways to define it clearly, like you said, it takes intuition about the core problems your AI application is solving. And I think that forces everyone, even on the engineering side, to be better product thinkers, because ultimately having that domain expert who can define what's good up front, and think through the thousand scenarios of generating good outputs, is going to make the AI application that much better. So that's great advice for software engineers, who typically think the other way: hey, I'm going to develop all the code first and then test. It's almost like test-driven development could make a big comeback with AI applications.

Clare, it was super great having you on this episode of What's New in Data. Where can people continue to follow along with your work?

I'm on LinkedIn and I'm on Bluesky.

Excellent. We'll have links to Clare Liguori's Bluesky and her LinkedIn below, along with her awesome blog post going through and demonstrating Strands Agents. And thank you to the listeners for tuning in today. This was a super fun episode with a lot of great insights. Clare, thank you again for joining.

Thanks for having me.
And I think one of the cool things about AWS is it's, you know, it has that range of, you know, being, valuable to, to, to all those, types of developers. Would love to hear, about your, your launch with Strands and what the product is. Strands agents is a open source library for developing AI agents. One of the key, design choices that we made with Strands is that it's what we call model driven, a model driven approach. When I started building AI agents back in early 2023 with, anthropic cloud v1, I think 1.3. I was looking at some of the code today. It was we were just starting to build agents in the industry. I think the, the original react paper that showed that that LLMS could interact with their environment through tools had just come out. But it was super hard at that time. You would I remember at the time, using Claude, you had to put all of these instructions and examples and XML tags and things like that, and you couldn't really get it to do much. That was complex. So one of the first agents that I tried to write was, troubleshooting problems with your AWS resource configuration. Things like, you know, you get this very opaque error message that something's broken with your application, but you don't know what. And so I was trying to figure out, well, could agents just kind of explore my AWS account and figure it out for me? And at the time, they just couldn't deal with that much complexity. And so you had to have kind of fallback to, prompt chaining, agent chaining, very explicit agent workflow for very explicit, specific use cases that those agents would be able to complete tasks for. But over time, now that we have this sort of latest crop of, of large language models, there's so much more capable of driving agents themselves. And that's really where we came to this model driven approach, because we were seeing that actually, we didn't need a lot of complexity. And some of the internal agents we were building at AWS with all of this chaining and workflows and things that you could really throw the tools directly at the model, and the model can figure out which tools to run. And it's actually a lot better at complex use cases now. And so you don't necessarily have to create a bunch of individual agents that are, sort of domain experts. You can have actually have one very generic agent with a set of tools. So that was really the, the beginning of Strands agents with us internally within one of the products I work on. Q developer realizing that there was a simpler way and there was actually was actually even, more higher quality because this is now how these models are being trained. They're being trained to use tools. So we, started to re-implement, throw away some of our agents internally and rebuild on this model driven approach. And we were seeing it was a better user experience, it was more accurate, and it was actually faster for us. Right. It's obviously faster to develop something where you write a prompt and you give it some tools and off you go. So that was we started developing it internally for us at AWS, and then, you know, we started seeing such great, impact internally for us that we, we wanted to release it for the, for the community. And so this model driven approach is really at the core of it, where it's really focused on let's leverage the power of how powerful these models have become, especially with tool use, and, and get rid of a lot of the complexity that that came from, you know, using earlier models like Claude V1. Yeah. 
And that's a great call out that, these lines are now being trained on using tools. You know, we'll get into this, but some of the protocols that are coming out, like MCP and A2A lots of acronyms, you know, these are all sort of materialize and they're productive. So, you know, there's certainly a lot of interesting directions it can go in, but it all starts with the model. Like you said, and having, you know, the, the, the models, the the prompts, the the evals, the tools, like as you're kind of, core building blocks. And then from anything there, you could kind of, from those abstractions, you can build applications on top, like what are the like, what are the other core architectural principles behind Strands, and how do they reflect a shift in how these agent systems are being built? Well, a couple are one, it's not tied to any the particular model, just that your model have your reasoning capabilities and tool use capabilities. And what we're seeing is that lots of models are now being trained that way. And lots of models either have sort of a messages API or a tools API. In the case of, anthropic APIs. And so we were really cognizant early on of what are the interfaces that we're providing for you to be able to plug in your favorite model. That could be something that's running locally on your laptop with a llama. It could be something that's running on AWS with bedrock. But we also partnered with anthropic to provide, anthropic API support, meta for, for their new, llama API. And we're working with a lot of other partners to provide additional APIs. So we wanted it to be something very, flexible in that in that case of, you know, regardless of what provider you're using, we're finding that there's a lot, more consistency in interfaces to models these days. And then we're also seeing more consistency, of course, in, interfaces to tools, like you said with MCP. That's been amazing for, having a more standard interface across whatever agent framework you might be happening to use, but then also for the tool author to be able to provide MCP and, servers for, for anybody to use, regardless of their agent framework. And so it's been amazing to see the growth in the community of those MCP servers. I mean, there's thousands now out on GitHub and, and on, you know, PyPI and and npm, and you can bring any of those to Strands. Right. So we're trying to support standard interfaces as much as we can. It supports open telemetry. So you can bring in whatever telemetry provider you prefer. And then you can run it anywhere. It doesn't need to be. It can run on AWS or it can run on, you know, your laptop or anywhere else you need to run it. So we wanted to provide something that wasn't tied necessarily to any individual provider, but really instead showcase this, this model driven approach that's been so powerful for us. I and that's a that's a very astute point that, you know, you have the model and you can have multiple models and you can have multiple runtimes and, you know, having the power to sort of develop locally and then scale in the cloud, and not be restricted, you know, based on, you know, your code shouldn't be tied to, you know, one specific environment. 
So it's taking a lot like when I looked at when I looked at it, I mean, it seems like it's taking a lot of just software engineering, you know, both, you know, scalable principles and practical, like, easy to develop with and, and I, and I like the other part like you mentioned, it's, it's, it's model driven, but but it can work with multiple models, which is becoming pretty standard for AI application development, even in the if you're doing things like, as a judge or, model cascading and you want to test, you know, cheaper models through some workflows and then, you know, potentially have to fall back to more, expensive ones. You know, if the cheaper, fast model is not working as well. So it's super flexible and I think it's, it's, it's a really nice reflection of where, AI applications are standardizing their approaches, diving in deeper into MCP and in a to a and these types of things. I, I would love to see your take on it. Like, where do these like emerging protocols. Like I mentioned, MCP and A2A sort of redefine the communication between intelligence systems. one of the specific, use cases that we've seen for multi-agent because I think, I think there's, what we've seen is that, agent to agent communication is not always need to be something special. So we've had a lot of success with internally with, the agent as tool model where, the agent is, a bit more of a sort of a request response where the sub agent, you're calling it as a normal tool. You could put it behind MCP, you give it a prompt, it comes back with a response. We have multiple systems internally that, that run that way. And some of that is because there are, agents that have specialized knowledge or which really ends up being specialized access to data. Right. But often what we see is it's also a way, organizationally to, to split up a very complex system into multiple teams. Right? So at Amazon, we're kind of famous for the two pizza team. And, you know, letting teams own, their, their whole system. And so in some ways, multi-agent is sometimes more of an organizational, solution where, you know, you have five different teams that want to own some experience or domain expertise in a, in an overall agent system. But we do see some more complex, use cases for agent to agent communication. So one challenge that we've had with MCP is with MCP being able to model many, many tools. So for example, we have an agent internally that models, more than 6000 tools. So there was those mapped to a huge number of, of AWS APIs. And that's you can't put that in your in your prompt context. Right. If you have an MCP, server that exposes 6000 tools, you'll probably blow out your context window. And if not, your requests are going to be very slow and very inaccurate in terms of what tool it chooses. And so we've done a lot of work around agents where, inside of this sort of sub agent, it does a lot of work around selecting the particular tools that you should, exposed to the LLM based on the customer's input. So, you know, if they asked about S3, it's probably going to be an S3 API. If they asked about EC2, it's going to be an EC2 API, and sometimes it's a little bit more, ambiguous. Like they might ask about a function and not say lambda. So, being able to figure out, you know, what are those maybe top five most relevant tools for the LLM is something that that agent to agent communication has been really helpful for, for us, and being able to, maintain state across a session in that sub agent in the agent as tool use case. 
Often you don't necessarily save state in that subnets that request response. And I'm done. But we have some of those use cases especially around that, that selection of within 6000 tools for the session, becomes very important. We're doing a lot of work with, we're doing some work with A2A. We have an example of using Strands with A2A. And that enables you to do, you know, to kind of hook into that entire ecosystem of having, agents talk with agents that could be implemented in any agent framework. Right? You might have some link chain, you might have some crew, you might have some Strands, and they can all communicate together. But we're also doing a lot of work, in the MCP community upstream to, try to evolve that, to be able to support agents, community over the MCP protocol, which is super interesting. So one example is, we a few weeks ago now, I think we got merged a new feature in the MTP spec called elicitation, where the MCP server can respond back with and elicitation along the lines of I need this information from the end user. And the the core agent, you know, goes off and and returns control back to the human user. And that's going to be super important, of course, for agent agent communication. So I think it's very interesting to see, you know, both how these both evolve, right. But also how we start to think about the lines blurring between, communicating with agents and communicating with tools if they're in the same interface. So at a high level and, and tell me if I'm oversimplifying, but, would you categorize MCP as the best way to, for, for agents to communicate with tools and then who is the best way to have, like these multi-agent communication today? Yeah, that's that's the, the pattern that I tend to see, is where, you know, it's really simple to wrap, MCP the interface around an agent. You know, it's, it's an API, that you can stand up. So we've had customers have success with, for example, putting an agent inside of a Lambda function and then just putting, you know, making that an API. That is MCP compliant. Some of the work that's been done upstream and MCP, I think has more enabled this. So in the original spec, it was just, mostly local MCP servers, which was great for, you know, Ides and chat clients. And if you look at those thousands of MCP servers out there, they're not they're not actually servers. They're really just local process with, you know, standard IO streams to your, your chat client locally. But where I see the industry going is really embracing, what we call the remote MC server. Right. The MCP server that's just behind an API. And that's really just become possible, I think, in the last couple of months, even, as the upstream community has been, adding new ways of, of building scalable MCP servers that are remote, with, something called a streamable Http transport, which basically just means, you know, putting it behind, more familiar Http APIs. Yeah. And that's definitely a big boost for, performance and just some other things like that that you would expect. An API to, to, to deliver. So that's a really exciting, drop for sure. And, and like you mentioned, you know, it just became available in the last few months every time. And just listening to these episodes back in January, like, there's there's been so much, innovation even since then, I think, like the rate of development, in AI obviously correlated with the, you know, I code gen tools, but, you know, and then, of course, the excitement and the value, it's bringing and I. Yeah. 
And, and when you look at, the frameworks that, that now, you know, companies like AWS are delivering, like you said, you've been working on this since, 2023, but it's on the, on the, on the Q developer side and the, the the amount of applications you've seen built then versus now, obviously a huge change in the, in the amount, but also the requirements around them, the, you know, being able to handle the scale of these data processing engines and databases, and, and making it streamlined for, enterprise application development. And then of course, like even even the SMB and mid-market ones as well that have large data volumes. So that's been just really fun to see. And I felt like it was a great turning point in the industry when you saw, Strands agents come out because it's just sort of this kind of light layer of abstraction to build abstractions to build these applications. It's, and it's really grounded in this, this idea, like you said, that, you know, we start with the models, right? Which is a powerful way to look at it. Definitely. And I think one of the interesting things that I've seen over the last, I would say six months maybe is, how differently we think about this model driven approach in terms of even just information, right? When we we all kind of started out building agents. It was all about rag, you know, right before you called to the model, you always supplemented your context with pulling from some vector DB, right. And one of the things I'm starting to see now is, what, internally, one of my colleagues calls retrieval as a tool as opposed to retrieval, augmented generation. I don't love the acronym Rat, but but there it is. So, you know, we're seeing a lot more of, you know, don't actively give information to the model that might be too much information or even information that it doesn't need. Like if it has enough information in its training data, you might actually make things worse by proactively giving it information. Instead, create a tool that that can retrieve information for it. So as opposed to proactively having kind of a prompt pipeline where you always augment that prompt, instead provide that vector DB as a tool where it can retrieve documentation. And I think one of the what are the places where we're seeing this that's interesting is in actually a lot of the development tools with MCP. So we're seeing a lot more of what I call documentation MCP servers, where, you know, in the agent space especially I was looking at the training dates recently of the latest Claud models and since, Claude 3.7 came out, there's been like ten new agent frameworks. And then even since sonnet four came out, I think I read that their training data ended in, in March of this year. Strands came out in May and so it doesn't even know about Stands yet. And so, we're starting to see a lot of these, you know, projects provide LLM tags so you can retrieve those through an MCP server and again, the MCP server is a tool. So it's letting the LLM decide, when do I actually need this information for my task versus have proactively throwing a bunch of information at it. So that that's been a big change in sort of those data access patterns that we've seen from moving from that prompt pipeline where you would try to get it every time, and then letting the model decide when it needs that information. Absolutely. And timely. Because just, at least from the time we're recording this episode, last week, in the Strands GitHub repo. And that's one of the other great things about Strands. 
Absolutely, and timely, because, at least as of the time we're recording this episode, just last week in the Strands GitHub repo, and that's one of the other great things about Strands, there are so many great, actionable, easy-to-run examples on GitHub. I'll have the link to that down in the show notes. But there was just recently one that came out with Amazon Aurora DSQL running through Strands agents, and just like you mentioned, it's retrieving data with these tool calls. You look at the architecture diagram and it's doing read-only queries, it's doing transactions, there's a get-schema tool, different high-level tool-call APIs that interact with the database. And Marc Brooker, who was on What's New in Data last season, and I were chatting about how agents and serverless databases are actually very complementary technologies, so it's very cool to see the materialization here. And like you said, it's super powerful: rather than having RAG and trying to populate your prompts with context kind of manually, or overloading it, the agent can decide when it wants to make a tool call to retrieve very specific data with specific parameters, and it makes the application more deterministic. And we've also seen that it makes it much more generic, right? You don't have to have a very specific prompt pipeline where you determine the query to the database ahead of time and then add the data to the prompt. You're able to apply these agents to much more generic problems than you could before, by effectively teaching the model how to query and then giving it some boundaries, some guidelines. But generally it's much more flexible for a lot of different tasks you could throw at it. Yeah, absolutely. And it gives good guardrails for things like data access. Even in that example, it specifies read-only queries and retrieving schemas, but then it bubbles up into this powerful experience where the agent can get all the tables and do some operations. A smart engineer who understands how to set up database applications can still predefine those tool calls in a way that's safe and scalable, and you can even do your EXPLAIN and ANALYZE to make sure those are efficient queries. Or you might have agents interacting with databases that are serving traffic for other applications as well. So it's a very powerful concept. And it's so nice to see, in very succinct and well-organized lines of code, how it can be implemented in that GitHub repo, along with a lot of other fun examples.
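In that spirit, here is a minimal sketch of predefining safe, narrowly scoped database tools for an agent, assuming the Strands `Agent` and `@tool` API; it uses an in-memory SQLite database as a stand-in rather than the actual Aurora DSQL sample, and the schema, data, and prompt are illustrative:

```python
# Hedged sketch: give the agent only a schema tool and a read-only query tool,
# rather than free-form database access. SQLite stands in for the real database.
import sqlite3
from strands import Agent, tool

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (1, 'acme', 42.0), (2, 'acme', 13.5)")

@tool
def get_schema() -> str:
    """Return the CREATE TABLE statements for every table the agent may see."""
    rows = conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()
    return "\n".join(r[0] for r in rows)

@tool
def read_only_query(sql: str) -> str:
    """Run a single SELECT statement; anything else is rejected."""
    if not sql.strip().lower().startswith("select"):
        return "Rejected: only read-only SELECT queries are allowed."
    rows = conn.execute(sql).fetchmany(50)  # cap the result size
    return "\n".join(str(r) for r in rows) or "(no rows)"

agent = Agent(
    tools=[get_schema, read_only_query],
    system_prompt="Answer questions about the data. Inspect the schema first, then query.",
)

print(agent("What is the total value of orders from customer 'acme'?"))
```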
And since we're on that topic, what's one of your favorite tutorials in that GitHub repo that you like to call out? It's actually not in the GitHub repo; I put it in our blog post about the Strands launch. When we decided to open source Strands, it had a different code name internally, and so we were trying to come up with, okay, what are we going to call this open source project? We needed to come up with a name, and the lawyers have to sign off on it, and it can't be too similar to any other company's, or whatever. And as it turns out, in the AI space, all the words are taken. Everybody's got "blank AI," right? So we were having this trouble where we'd do a brainstorming session and come up with, oh, this would be a really cool name, that would be a really cool name, and then we'd go and search for it online and, of course, it was taken already. Or we'd find one that didn't have a very clear company already using it, but someone had already registered the domain we would want to use, or the GitHub org we would want to use. And so I wrote, with Strands, a little bot that would help me come up with names. In the prompt I gave it some guardrails around how we name things, and then some ideas of themes that I really liked. We had a DNA theme going; we had a sewing theme going as well for a while; we had a car theme going for a while, where you would think about a motor turning, an axle going around and around. Anyway, I gave it all these ideas and then I gave it a few tools. I wrote an MCP server for my favorite thesaurus website, because I always feel that helps when I'm brainstorming. I wrote a domain name search to figure out whether a name was registered or not. I discovered through this, and I didn't know this, that WHOIS is no longer the hot thing for domain name registration lookups; it's now RDAP. And as it turns out, the .dev TLD, which is run by Google, doesn't support WHOIS; it only supports RDAP. So I was using a WHOIS MCP server that I found on GitHub, and nothing for .dev was coming back, and of course we wanted to explore using .dev, since this was a very developer-focused project. And then I also had the GitHub MCP server, where it could look up possible org names. And that helped us just churn through ideas, which was really cool. We were able to prove, hey, this one doesn't work, this one doesn't work because it's got these conflicts already. And that was just very exciting. One, it was super easy to write with Strands; I don't know how many lines of code, but really small. And then also being able to take advantage of some of the MCP servers that are already out there: I was able to pull the GitHub one off the shelf, and then it was really easy to write one where I was web scraping my favorite thesaurus website, which shall remain unnamed, and then adding the domain name lookup one that I found. So it was just so easy to pull these things together, and the most work I was really doing was thinking about the real business value: what are the themes we want to explore, what are the names we really like or don't like, as opposed to doing all this manual lookup. So it was really, really fun for me.
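A minimal sketch in the spirit of that naming bot, assuming the Strands `Agent` and `@tool` API; the RDAP lookup goes through the public rdap.org bootstrap service, and the prompt, theme, and status-code interpretation are illustrative assumptions rather than details from Clare's actual bot, which also pulled in thesaurus and GitHub MCP servers:

```python
# Hedged sketch: a brainstorming agent with a hand-written domain-availability
# tool. Only the RDAP check is shown; MCP-provided tools would sit alongside it.
import requests
from strands import Agent, tool

@tool
def domain_is_registered(domain: str) -> str:
    """Check whether a domain is registered, via the public rdap.org bootstrap service."""
    resp = requests.get(f"https://rdap.org/domain/{domain}", timeout=10)
    status = "registered" if resp.status_code == 200 else "appears unregistered"
    return f"{domain}: {status}"

agent = Agent(
    tools=[domain_is_registered],
    system_prompt=(
        "You brainstorm open source project names around the themes you are given. "
        "Check each candidate's .dev domain before recommending it."
    ),
)

print(agent("Suggest five names with a weaving-and-threads theme and check their .dev domains."))
```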
And that also speaks to the simplicity, and how it was able to aid your creative process while you were working through a higher-level challenge, which was naming something. And for AWS, the scope of naming something obviously has even more challenges to it, which you were able to navigate with some clever brand thinking and also tooling. We're not far away from a future where, like you said, you have this tactical thing you have to get done, and okay, let's outsource it to some agents that can do a lot of the hard work for me. And, you know, I didn't know about the RDAP-versus-WHOIS thing, so, yeah, it sounds like it's something you discovered in the process, or maybe your agents discovered in the process and told you. Yeah, no, exactly. And there's one end where you'll use these products which wrap the agent experience and make it easy for folks who are non-technical; they'll just type in some instruction and it's going to deploy a fleet of agents to get it done. And then, on the other hand, you're going to have developers who, either through hand coding or AI code gen, can use a nice wrapper framework to build this stuff for themselves at a rate that just wasn't possible before. From that perspective, it's super exciting, both for the non-technical consumers of AI and for the builders who want AI at their fingertips. So I totally recommend it; we'll have a link to your blog in the show notes as well, and to the GitHub repo. And I always encourage people: the people who are creating these frameworks, including yourself, are making it easy to run locally and then deploy in the cloud. So start with your laptop; most people's laptops are pretty powerful these days, and there's a lot of great tooling, especially with Python, to build really nice applications that are portable. You can really get far on your laptop and still be in a state where you can deploy to the cloud without changing too much beyond environment variables. So I totally recommend everyone just get hacking: hack around, play with these frameworks, and try it out, because people don't even understand what they can do. It just seems impossible. Even if you've been in software engineering for 25 years, and I hear that a lot, people just didn't understand what was possible. And it's so powerful. You kind of just have to go do it, play around, and get that empirical knowledge yourself, and Strands is a great framework for that. Looking ahead, we've been talking about all the amazing stuff that's there, but what are the foundational building blocks that are still missing from the agent ecosystem? Well, one that I'm kind of excited about is thinking about how we can reuse these agents across other developers. We have libraries, right? And we've created this amazing ecosystem of libraries across all these package managers. I also think that MCP servers are the new libraries, and agents are going to be the new libraries too. So how do we share these, or make them available over APIs or something, where, as a developer creating an application, maybe what I'm doing is pulling together a set of MCP servers and pulling together a set of agents?
You know, I saw that even just with my little naming bot: the power of being able to just pull in these MCP servers. Whereas before, API integration work using something like the GitHub library would still take me a long time, writing all that client code, with MCP servers I just pulled it in and wrote a prompt. And I think that sharing agents could potentially be even more powerful. So I'm looking forward to that world where we start to share agents as well, like libraries. And then I think it's also still very unsolved how you build entire applications around agents. You still have to do a lot of work to put an agent behind an API somewhere. You have to figure out authentication. You have to figure out: how does my React code in the front end interact with this agent? What does it look like in the UX? I think we're still very stuck in the "generative AI means chat" world. So I'm looking forward to the UX around agents evolving as well, where we start to think about how agents could help in my work other than having a conversation with them. So I think there's just so much space. And like you said, this space is changing so fast that you can't even imagine what's going to come in three months, right? Yeah. And I really like your point about being able to share agents, sort of like sharing an API, but where everything is ready for you to execute that agent. And one of the things that seems critical for AI applications is the traces, the evals, kind of your integration tests for the AI: changes to the prompts, and how they actually impact the outputs. So when I come back to your idea of sharing agents, it's about doing that in a way that also delivers the trust that it's working the right way. And I think we're all, as an industry, trying to figure out what those units of trust are. Of course, you have LLM-as-a-judge and confidence scores and all these things that are really useful and borrowed from machine learning. Whenever I'm working with software engineers who haven't done agentic or AI development before, I tell them that's where you should really start: how you come up with your framework for deciding what's correct. One of the things I've noticed working with teams at AWS who are starting to build generative AI capabilities is exactly that hurdle: we as engineers expect testing to look like, I define an input, I define an output, and if I get that output when I provide the input, then things are working. And obviously generative AI is not like that. You can't rely on it ever providing the same answer as what you got before. So you have to think about what success criteria look like, what good looks like. Is it 95% accuracy of tool selection? Is it an LLM judge saying 95% of answers are close enough to your ground-truth answer? How do you even think about creating test sets and ground-truth answers? These are problems that software engineers, as an industry, have never had to deal with. This has been ML scientist stuff, right? And so we're all kind of needing to become scientists a little bit, right?
And we're not used to using the scientific method we learned in grade school at work, right? Having a hypothesis, coming up with a test, and executing that test. So I think the evaluation tools today are still really hard for engineers to grok in a lot of ways. Even the question of, well, how do I come up with the test set that this evaluation tool is going to run through, is really hard. I think there's so much room for improvement there, which speaks to the fact that we are complete amateurs in this space. I don't have a science PhD. And then I think there's also mapping it to the reality of how your users are actually interacting with your system. One thing we say at AWS is that for any service we launch, there are going to be customers who use it in ways you never expected, and I think that's probably true of any application. So before you launch something, you have this guess as to how people are going to use your system, and you create a test set based on that guess. But then you have to go back and validate: okay, what kinds of questions are people actually asking of my agent, or what kinds of inputs are they actually providing, and try to match that distribution in your test set. And that can also be really hard, right? How do you think about matching the live data set with what you're testing against? Yeah. And we're just scratching the surface of this, but if I were to ask you generally: what's your advice for software engineers, whether they're new or experienced or just have a fundamental computer science background, who are breaking into AI engineering? One is: try Jupyter notebooks. That's something I had never heard of, honestly, coming out of a computer science program, and then all of a sudden I'm working with these ML and AI scientists and they're all sending me Jupyter notebook files. We do actually have a lot of Jupyter notebook examples in the Strands examples repo, so that's a good place to start. But I think also, again, thinking about what testing is going to look like. One thing I often see from myself, as well as other engineers, is what I've started to call vibe checking. We know about vibe coding, but how do you know when your thing works? How do you know when your agent works? You just kind of give it a couple of inputs, and then you personally like the outputs, and then you send it on its way, right? Start thinking about, okay, how can I actually write these down: what do I expect the answer to be, and how would I compare good versus not good? I think there's a lot of intuition you have about what is good, and what's hard is writing that down. So try to work through what you would think of as good.
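A minimal sketch of what writing it down might look like: a tiny test set plus an LLM-as-judge check, assuming the Strands `Agent` API. The prompts, test case, and pass criterion are illustrative assumptions; in practice you would grow the set from the inputs real users actually send after launch.

```python
# Hedged sketch: replacing "vibe checking" with a written-down test set and an
# LLM-as-judge scorer. Everything here is illustrative.
from strands import Agent

agent_under_test = Agent(system_prompt="You answer questions about our product.")
judge = Agent(system_prompt=(
    "Compare a candidate answer to a ground-truth answer. "
    "Reply with only PASS if they agree on the key facts, otherwise FAIL."
))

# Hypothetical test case; expand this from observed user questions over time.
TEST_SET = [
    {"input": "Which regions is the service available in?",
     "ground_truth": "us-east-1 and eu-west-1"},
]

passes = 0
for case in TEST_SET:
    answer = str(agent_under_test(case["input"]))
    verdict = str(judge(
        f"Ground truth: {case['ground_truth']}\n"
        f"Candidate answer: {answer}\n"
        "PASS or FAIL?"
    ))
    passes += "PASS" in verdict.upper()

print(f"{passes}/{len(TEST_SET)} cases passed")  # e.g. gate changes on a threshold
```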
We actually have another product, and I'm not going to get the name exactly right, but it's something like Bedrock Automated Reasoning. What it basically does is look for hallucinations in large language model outputs. But one of the things you have to do ahead of time is codify what the truth is. It can only tell you an output is wrong if you've already told it the truth. Then it uses mathematical proofs and automated reasoning to determine whether what comes out of the model is true or not. And one of the things we found is that people don't know what's true. So what's really interesting is, you start with "here's what I think is true," and then it takes you through some scenarios and says, okay, if this is the case and this is the case, then based on what you've told me, this would be the result. And then you say, no, no, that's not right. One example: if you give it a tax code, it does its best to take that tax code and turn it into mathematical models, and then it'll tell you, okay, if you are a married couple with two dependents, then your deduction is X. And someone who's a subject matter expert on actually doing that math can come in and say, no, no, and this is why, and it can go back and regenerate those models. I think that's a very interesting model for how we think about what good looks like in our applications. Play a kind of game with yourself: okay, if this was the input and that was the output, what would I like or not like about it? And that, effectively, is your LLM-as-judge prompt. So if you're able to do that kind of self-introspection and come up with some of those guidelines, that can really help you build the application you want to build, one that follows what you want to put out into the world. Yeah, I love the term vibe checking; it's equally as important as vibe coding in the AI engineering process. We as humans have this natural reflex to be really good at critiquing what we don't like, but it's funny how just having a positive mindset and defining what's good, in a very structured way, is actually very productive for AI, right? So when you're coming up with your ground truth, and coming up with ways to define that clearly, like you said, it takes intuition about the core problems your AI application is solving. And I think that forces everyone, even on the engineering side, to be better product thinkers. Because ultimately, having that domain expert who can define what's good, do that up front, and think through the thousand scenarios of generating good outputs is going to be super helpful and make the AI application that much better. So yes, great advice for software engineers, who typically think the other way, which is: I'm going to develop all the code first and then test. It's almost like test-driven development could make a comeback; it could make a big comeback with AI applications. Clare, it was super great having you on this episode of What's New in Data. Where can people continue to follow along with your work? I'm on LinkedIn and I'm on Bluesky. Excellent. We'll have links to Clare Liguori's Bluesky and her LinkedIn below, along with her awesome blog post going through and demonstrating Strands agents. And thank you to the listeners for tuning in today. This was a super fun episode with a lot of great insights. Clare, thank you again for joining. Thanks for having me.