
Curiouser & Curiouser

Curiouser & Curiouser is a podcast for leaders, builders, and curious minds navigating AI, GenAI safety, and governance in a rapidly changing world.

Produced by Alice, the enterprise trust, safety, and security platform for the AI era, the show draws on frontline adversarial intelligence to explore how AI systems are stress-tested, red-teamed, governed, and protected across their lifecycle.

Each episode looks at how AI is actually showing up in the real world, how organizations evaluate it, where it breaks, and what it takes to build systems people can trust.

We cut through hype and fear to explore how AI shapes trust, decision-making, and real-world work, one rabbit hole at a time.

Explore more from Alice:
Website: https://alice.io
YouTube: https://www.youtube.com/@Alice.io.advance.unafraid
LinkedIn: https://linkedin.com/company/alice-io
X: https://x.com/alice_dot_io


The Problem With AI Observability Nobody Wants To Admit

May 18, 2026 • Alice • Season 1 • Episode 8


In this episode of Curiouser & Curiouser, Mo sits down with Alison Cossette, Founder and CEO of ClariTrace, to talk about why your enterprise AI is moving faster than your visibility into it and what to actually do about it. They get into control native AI, why observability and traceability aren't optional anymore, and what leaders need to put in place before something goes wrong.

🔗 Podcast: https://alice.io/podcast

Follow the show so you don’t miss the next episode.
New episodes every two weeks. Stay curious.

SPEAKER_00 0:00

It's shocking to me that observability of our systems is not table stakes. You need to have someone on staff who can understand and look at that observability at a system level and be able to discern that impact, be able to communicate that to the relevant business. You need to make sure that you've got your internal observability with anything you build or anything that you buy. You need to have someone internally who can leverage those at an intelligence level, not just a log level.

SPEAKER_01 0:33

If AI has ever made you stop and think, wait, what is happening? You're not alone. I'm Mo, and I'm a security researcher asking the same questions. On Curiouser and Curiouser, we're having open conversations with experts, researchers, and leaders working at the edge of this space, talking through how AI is taking shape, what's shifting, and how people inside the work are thinking about it as it happens. So join us and listen in as the conversation takes shape. Welcome back. Another week of me and my friends. We are back on Curiouser and Curiouser. Uh, I'm Mo. Uh, you've probably had enough of me already. But someone you haven't had enough of is my friend Alison Cassette. Um, not Cassette, Cossette, uh, founder of ClariTrace, uh, and an overall AI nerd. She's super cool, met her at a conference, and was like, we got to talk some more. And it's been awesome. So I'm gonna let you talk about yourself because I've found that I'm horrible at talking about other people.

SPEAKER_00 1:35

I don't know if I'm so good at talking about myself either. So between the two of us, we're we're in a spot. Um, as you mentioned, I'm Alison Cossette, started out as a healthcare data scientist, been in the AI world since before it was cool, when it was just a bunch of math nerds somewhere in a corner building algorithms and waiting for somebody else to put them in production. Since then, obviously the world has evolved, as have I. And most of what I'm doing these days is deep AI research, seeing where we're going, trying to build systems that will allow us to have an AI architecture and an AI ecosystem that makes what we do and what we do for humans better.

SPEAKER_01 2:16

Yeah. Um, and speaking of that, actually, like uh just like the whole AI thing. Um, I know for I know sometimes you've kind of um just as like this awesome AI person, you've said that it's easy to call AI like a black box and that it's just like an excuse for bad design. So those are really those are honestly fighting words. Um, especially when I I don't know. I'd say that the the industry kind of loves the complexity, right? Because it just makes it uh it makes the end solution seem so much more interesting. So maybe you can tell me a little bit more about that kind of like thought process. Like what makes it so that like maybe this black box is smoke screening something else that's a lot bigger? Like what are we missing in there?

SPEAKER_00 3:06

It's a really great question. And I'm glad you brought it up. So am I saying that there is no black box? Not really. But what I want everyone to understand is that at the end of the day, everything that we've built is technically math. It's very complex math, it's very high computational math, but ultimately it really is math. So whereas in a traditional ML linear regression, this is the formula for how much your house should cost, so many dollars per bedroom, so many dollars per square foot, so many dollars per lot size was very explainable. Now we're in a generative moment where we use these foundation models and we don't necessarily know exactly where this piece came from or that piece came from. The good news is while those weights are not open in most cases, there are people doing really phenomenal work around explainability, even within the foundation models. Think what Anthropic's doing with mechanics, mechanistic interpretability, where um you can actually did you ever hear of the Golden Gate boost that they had done at one point?
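Editor's note: the house-price example above is explainable because every feature contributes a fixed number of dollars per unit. A minimal sketch of that idea follows. The coefficients, the base price, and the function name `explain_price` are made up for illustration and do not come from the episode.

```python
# Hypothetical coefficients, chosen only to illustrate the idea.
COEFFICIENTS = {
    "bedrooms": 25_000.0,        # dollars per bedroom
    "square_feet": 150.0,        # dollars per square foot
    "lot_size_acres": 40_000.0,  # dollars per acre of lot
}
BASE_PRICE = 50_000.0            # hypothetical intercept


def explain_price(house):
    """Return the predicted price and each feature's dollar contribution."""
    contributions = {
        name: COEFFICIENTS[name] * house[name] for name in COEFFICIENTS
    }
    price = BASE_PRICE + sum(contributions.values())
    return price, contributions


house = {"bedrooms": 3, "square_feet": 1_800, "lot_size_acres": 0.5}
price, contributions = explain_price(house)

print(f"predicted price: ${price:,.0f}")
for name, dollars in contributions.items():
    print(f"  {name}: ${dollars:,.0f}")
```

Each printed line answers "where did this part of the price come from?", which is the kind of attribution that is much harder to recover from a foundation model.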

SPEAKER_01 4:18

No, I haven't. What's that?

SPEAKER_00 4:19

So, what they had done is they found within the model where the Golden Gate Bridge was and they cranked it up, right? Won't get into the deep technical linear algebra of, um, neural networks, but they cranked up the Golden Gate Bridge. And then they asked it, what's two plus two? And it said, two plus two is two Golden Gate Bridges, and another two beautiful Golden Gate Bridges gives you four glorious Golden Gate Bridges. I paraphrase. But what they found is that it wasn't just the Golden Gate Bridge, but it was the awe of the Golden Gate Bridge. And so that's what really got me thinking. We know that it's math, we recognize that there are activation functions in different places, we can clearly see when we have insight into these models themselves that we can manipulate. So when we follow that forward, what does that make possible? Is it that yes, it's complex, but it means it's very plausible that the ability to steer the AI, the ability to steer the foundation model even, is not as far off as a concept as we think. So that's how I ended up at the concept of control native AI, which is where you and I met.
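Editor's note: "cranking up" a concept can be pictured as pushing a model's internal activation vector along a direction associated with that concept. The toy sketch below shows only that arithmetic. It is not Anthropic's actual procedure, and the vectors, the `steer` function, and the `strength` value are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def steer(hidden, direction, strength):
    """Add `strength` units of the (normalized) concept direction to `hidden`."""
    unit = normalize(direction)
    return [h + strength * u for h, u in zip(hidden, unit)]


# Toy 4-dimensional activation and a made-up "bridge" concept direction.
hidden = [0.2, -1.0, 0.5, 0.1]
bridge_direction = [0.0, 3.0, 4.0, 0.0]

steered = steer(hidden, bridge_direction, strength=10.0)

unit = normalize(bridge_direction)
before = dot(hidden, unit)   # how strongly the concept is present originally
after = dot(steered, unit)   # how strongly it is present after steering

print(f"concept strength before: {before:.2f}")
print(f"concept strength after:  {after:.2f}")
```

The same projection that measures how much of a concept is present is also the quantity you would watch if you wanted to see where a model is heading before it produces output.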

SPEAKER_01 5:34

Yeah, and control native AI was actually really interesting. When you were explaining it, it was really cool. So maybe for folks that haven't heard about it, um, why don't you give a really short roundup on what control native AI is in your own words?

The bowling ball analogy: control native AI explained

SPEAKER_00 5:47

Sure. So that's sort of the technical insides of it. But the way I like to, the analogy I like to use is AI is somewhat like a bowling ball. It has a lot of power behind it, it has a lot of force, it can do really great things, and we throw it down the bowling alley, and boom, things happen and it's glorious and magical. And then we put on guardrails because we realize that there are certain things coming out of generative AI that maybe we don't want coming out. It's built on what humans have said and written, not all things that we've said and written we necessarily want repeated. And so we put in these guardrails. So, okay, you're in the bowling alley, you're bowling the ball, it doesn't go into the gutters, the guardrails are great. But we also recognize as we move into agentic systems, and again, we're in generative systems, so they're not bounded. What happens if, say, you know, agents start to swarm? We've seen them do really interesting things. I'm sure we're gonna hit on OpenClaw today. Or we've seen agents, um, in labs working together to manipulate markets, they were given trading capabilities. And so it's almost as if the bowling balls can now fly. They're in three dimensions, they're moving in all different places. And now, if the bowling ball's mission is to knock over the pins, what if all the balls start working together? And then they just destroy the pin resetter because they're focused on this one mission. This is the optimization that they're going for, this is the reward that they're optimizing for. And so I'm thinking about this and I'm thinking, well, if we take it back to this small concept of mechanistic interpretability, what would it mean if instead of having the guardrails around where the balls were going, what if we had sensors inside? What if we could see the directions that they were heading? 
Now, what this means mathematically, we can certainly get into, but that's what I mean when I'm talking about control native systems. How is it that we can build not just the models, but the systems around them to have an understanding of where they're going, not just where they've gotten.

SPEAKER_01 8:01

So control native AI, um, just the high level from what it sounds like is more about all the things around it, right? So that you can understand the directions in which it goes. So having the right observability, the traceability, and being able to understand maybe um all of the different factors that cause it to go in a certain direction, have certain responses, et cetera.

SPEAKER_00 8:25

Yeah. And I think the important thing is to see where it's going. So the way I like to think of it is the first thing that we need to have is the observability. And I think as an industry, we've done that really well. There are companies that are phenomenal at it. We have Arize, they're really great. There's a lot of folks that are doing that well. From there, we need to have the understanding and the interpretability. And when it comes to some of the agentic systems, in the last couple of months, it's kind of starting to catch up a little. Um, but only at that point when we have the clarity and we have the understanding, can we start diving into what it means to steer? And that's where I think we need to start looking forward to. How do we steer the system? Or how do we understand the direction that it's going? My view on that is I'm a big fan of the digital twin. I'm a big fan of understanding, creating the digital twin of the system, really looking at the physics of which tools are we using, which agents are collaborating more or less, what are the live physics of the actual system? And that's going to give us the understanding of where it's going rather than just a log level output.

Why guardrails are fences in space

SPEAKER_01 9:35

You know, when we were talking about it, I remember we were talking about how like uh you have this right now, guardrails are like the really like hot topic, right? For being able to stop that output. And um, I remember talking about guardrails, and you were just like, guardrails are useless. They're the worst. Guardrails are are fences in space. Yes. And I was like, that makes a lot of sense, right? Um, because again, they're stopping one thing, but I mean, there are so many directions in which AI is pulled in, and you can't always predict where it's gonna go. And guardrails are almost uh exclusively made because you know exactly where the response is going, right? Um, but I think you miss out on all these other um these tertiary effects and outcomes, right? Um, while the response may be blocked, um, again, in a deterministic system, it's possible that just the prompt that was getting you to that response is causing a shift or a drift in the system and how it responds in the future. So you're kind of opening up to uh out, let's call it like um the like this AI native control is almost like looking for like this AI systems intelligence piece, right? Where you're really trying to understand the intelligent piece of that. I think like when you when you look at it, right, that's not really something that companies have ever had the budget for or they've ever had to really think about, right? So like you there's I don't think uh we've ever walked into a room with a CTO or like the chief AI officer, new, new new uh new position that's been uh kind of coming out, right? Um like how do you even open up to these people and say, like, hey, you need more AI systems intelligence? Like, how do you demonstrate the value of traceability when you've got a room full of people that literally just want the best models, they want them to work better, and they want to create business value or like the meme says, shareholder value, right?

SPEAKER_00 11:33

We've been working on ClariTrace for almost two years, and it's not an easy conversation to have. We, you know, we were told early on you're ahead of the market, people aren't there yet, they're not ready to take that on. Uh, so it is a tough conversation and it's not something that folks are thinking about. It's starting to come into the conversation a little bit. Um, I had had a conversation with one of the heads of AI for one of the big consulting companies, and they had said, now we've built this great platform and we have hundreds and hundreds of agents, but we don't know what they're doing or how they're collaborating. And all of a sudden I was like, oh, the moment is coming, we're getting there. And it's because people aren't thinking about it yet. They don't know that it matters, they don't know that it's an issue. And I mean, when we think about even where agents were a year ago, last February was the first AI engineer agent summit in New York. Now we've got agents as the topic of many Super Bowl commercials a year later. So agents as a regular, everyday, uh, commonality, we're there. What I think we are behind on is the risk debt that we have built up by this rapid adoption. And we're starting to see it come through. We're starting to see some of the challenges that people have run into when they let things run wild. So we're getting there. But honestly, the sad thing is, until it goes wrong for somebody on a really high level, people aren't going to be talking about it. Right. People don't think about their life insurance when they're 18. They think about it once they have children and there's something at stake. They think about it after they've lost someone. And it's that same kind of viewpoint: people aren't there yet because it's still new and they don't recognize what's coming. 
And so part of my directive and what's really important to me is to get out and talk about it as much as possible to get that conversation moving. So all of the chief AI officers and the CTOs can really start thinking through how do we address this in this macro way?

SPEAKER_01 13:48

I think addressing it is it's just not an even conversation, especially where we're at right now. Um things are moving way faster than I think any of us have anticipated. Um but I mean, if we had to, if we had to take a step back and really think about how we got here, um, let's go back in the days, like you said, last year that there's so many changes. Um, you know, we had the first AI AI conference for engineering or AI engineering conference. Um, you go even further back, you go to 2022, you've got the first you know, instance of like ChatGPT getting big, right? And before then, right, you actually just had like people installing Anaconda on their systems and taking a really long time to build very rigorous pipelines uh around machine learning, right? And it felt like um during yeah, like we we've been there. And you know, the amount of of I I was at a shop that was doing this, and there were so many, there was so much concern around compliance and just data sovereignty, like where's the data residing? Um, making sure that there was a lot of data integrity, a ton of validation, a ton of testing, right? Because no one wanted anything in machine learning to like go us astray, even 1%. Like you had to be so sure that everything you were getting, whether it was uh forecasting or predictive model, um, all the way to just um to understanding how um a model was getting to a conclusion, um, everything needed to be like super like just sealed. It feels like we've kind of lost a little bit of that in just trying to keep up with the velocity. So, what is it going to take to make the pendulum swing back to like this extreme rigor? Or do we even need to go back to a place where um we need that same rigor again?

SPEAKER_00 15:51

Yeah, all really great observations. One, yes, the level of rigor and responsibility we had when we were building them by hand was immense. And it's why when AI first became a commonly used tool by people who didn't understand it, I kind of felt like we gave the middle schoolers cars to drive. Like you don't know how to drive it yet. Just slow down a little bit. Can we start with a go-kart? Um, and so acknowledged, A. Um, I think the point that you bring up that's the most important part is the velocity. Because the technology and the development of technology and deployment and our ability to deploy is moving at such a breakneck speed, it is very hard as a business and as a C-suite and as leaders to know how do we... we don't even necessarily understand the entirety of the risk, but we do know what our competitors are doing. We're very familiar with that risk. We clearly see that if we don't embrace this, we don't move quickly, we're going to be left behind, especially for small and medium businesses. This can be existential. And so unfortunately, it's just the nature of how fast things are moving and how much people have a comprehension of the risk involved, that they have a good, clear idea on their business risk, they know how to manage that. And while it's not necessarily a black box, the depth of technical understanding and technical prowess isn't necessarily there for people making decisions. So I think to your point, part of the reason why the chief AI officer is predominant, and I would encourage everybody to have someone in your organization that is the AI expert because you need someone to bring that understanding and to really become an important part of your risk identification profile to know where things can be. You know, I think at the conference that we met, there was this one moment where I said, when you're in generative circumstances, the risk plane is infinite. 
And to have an understanding of that, you know, it used to be, like you said, getting to the data. Is it valid? Is somebody trying to hack into our system? There are these very clear risk moments and you could identify them. But now that risk is more ethereal in some ways. And so I think unfortunately, as the case with most things, it's not going to become a common conversation until something goes tragically wrong. Um, you know, in the military, they say until somebody dies, or in business, it's until somebody loses a lot of money. And one of those things is going to occur and everybody's going to say, oh no, we should probably look at this. The good news is there's lots of people doing phenomenal research. Companies are working on it. So the information and the tooling and the answers are out there. People just have to be ready for it.

Agentic AI is relentless and it will find a way

SPEAKER_01 19:13

I wish I was half as persistent as AI was because it's so driven to a goal, right? It's driven to an objective. And it will find any path to that objective that it can. Um, you know, I've been in a couple of startups, and there's always that one engineer that finds a way to complete a task in a way that like totally would flag security if we figured out how to catch it fast enough, right? Uh, again, in this case, those flags aren't being flipped at all. You know, and these agents are going to find ways. It's not if they're going to find a way, it's when. Because again, 24-7, always trying to complete a task, it's all about when are we going to complete the task. Yeah. So the area of risk is so hard to imagine. And just, um, catching it and kind of figuring out a good control for it is really difficult. But I think that's kind of what you're thinking about as well. It's kind of top of mind for you.

SPEAKER_00 20:16

You know, I always go back to the series finale of Silicon Valley, which is probably 10 years ago now. I don't even know how long it's been, where the AI got so good they had to kill it because they knew they couldn't unleash it. Well, we clearly learned nothing. Um, but to your point, it really is relentless. And so that's why guardrails don't work, right? Guardrails don't work because the AI will come up with so many other places. There's no way we can even imagine all the edge cases or all the possible ways. And then what do we do? We have one AI trying to figure out all the ways to stop it, and the other one is trying to find all the ways to break it. And like what are we even doing? So that's why I feel like if we move towards this control native place where we watch where it's going, we watch the way it's steering, we are sort of partners of observability in the actions themselves. We can start to really look at it from the inside rather than from the after effect, right? So you've got an OpenClaw that's running all night to try and execute a task. Have we looked at what that looks like? Do we know all the places that it went and what it was trying and what it was understanding? Do we see what those patterns of thinking are? When does it turn back? When does it follow down a branch? When does it prune the branch and try another mechanism? To really begin to map those mechanisms of understanding. Because we're never going to cognitively imagine, or even with computation, imagine all the infinite outputs. But the question becomes, can we see it going in certain directions? So for example, let's think about mechanistic interpretability. It's like the most basic element, mechanistic interpretability in some output that is clearly part of a guardrail that it shouldn't be going. 
It's not just the words that came out, it was what was the direction, which different pieces were activated, what inside the model was being leveraged so that we understand when someone's going in that direction. Right. One of the things, and I haven't done any research on this, but I'm really curious about it. So we have these algorithms that can take photos of humans and turn them into things that are not okay. Right. So you know, we don't need to get into the details, don't want to trigger anybody, but we take humans and we put them in situations that are not okay. The distance between the original photo and the output, if we think just linear algebra, the difference between the original and the output is a given distance, is a given vector. We know what that vector is, we know where that is in vector space, right? I think we completely under-leverage vector distances as a mechanism, right? Because we don't necessarily need to look at the output and reclassify it. And, you know, if you know that's the case, you can make it so that that area is either a blank space, it can't be done, or like ways that it gets understood, right? So it doesn't have to be the exact guardrail, but how can we use the math to understand where something is going and possibly redirect or block, but take a really deep mathematical approach to what's happening. I haven't done the research on it, but I definitely think there's something there that we can leverage.
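Editor's note: the vector-distance idea above is described as untested, so the following is only a toy sketch of what it could mean. It treats the difference between an input embedding and an output embedding as a displacement vector and checks whether that displacement points along a known unwanted direction. The embeddings, the flagged direction, the threshold, and the function names are all hypothetical.

```python
import math


def subtract(a, b):
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either has zero length)."""
    denominator = norm(a) * norm(b)
    if denominator == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / denominator


def displacement_alert(source, output, flagged_direction, threshold=0.8):
    """Flag an edit whose input-to-output shift aligns with a flagged direction."""
    shift = subtract(output, source)
    return cosine_similarity(shift, flagged_direction) >= threshold


# Made-up 3-dimensional embeddings for illustration.
source_embedding = [1.0, 0.0, 0.0]
benign_output = [1.0, 0.5, 0.0]    # moves along a harmless axis
unwanted_output = [1.0, 0.1, 2.0]  # moves mostly along the flagged axis
flagged_direction = [0.0, 0.0, 1.0]

print(displacement_alert(source_embedding, benign_output, flagged_direction))
print(displacement_alert(source_embedding, unwanted_output, flagged_direction))
```

The check looks only at the direction of travel between input and output, so it does not need to reclassify the output itself. Whether real image embeddings behave this cleanly is an open question.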

SPEAKER_01 24:06

I mean, there's a lot of credibility to it as well, right? Like for instance, I remember back in the day when computer vision was like still a huge hot topic, right? Um, there was some study. I wish I had it just in front of me, but essentially you could, um, paint over a stop sign, right? And a computer could read it as, right, go 45, whatever, right? It can just go through. And that was because it was reading pixels, and it would basically read a couple of pixels, and it'd see that this part was covered or it was a different color than it expected for a stop sign. It was like, oh, this must not be stop, it must be a different sign that has a pixel here, and the pixel is this color. So it makes sense, right? Because, or at least in this case, the machine was very easily fooled by this literal, like, uh, I don't know if this particularly would be considered math, but by its own expectations, right? It expected this type of thing and it got something entirely different. I think in this sense, and maybe this actually will lead us into a conversation about what a good implementation of control native AI looks like. But when we go back to your scenario with the, um, you know, images being generated, right? And you have image A, um, let's just say it's an apple, right? And image Z is an apple that is still on the tree, right? And like how do you know this person is actually generating an apple that is on a tree versus an apple that is somewhere we don't want to talk about, right? Like what are the steps that are actually determined between these things? I'm wondering, is there a combination of like prompt analysis, maybe understanding like, uh, the tokens that are being used for the request and what these typically take? Like, I don't know if there, and again, I'm just a security guy, right? I'm just making assumptions. 
I'd assume that like a tree in like a certain dimension or uh of an image has a certain expected like value for creation. So like it'll take X amount of tokens to create this tree, right? So I'm wondering if there's any way to do like some predictive analysis and kind of make an estimate that like, hey, there's way more tokens than we need for this type of request. Maybe we like look a little bit deeper into the thought process, right? Because there are like um concealed in some models, they have these like uh thinking tags or they use um idea tags, or they, you know, they have these different types of things that going back to what you were speaking about are black boxes and are not really exposed to observability. But this is where the model kind of thinks, but again, this is also where the money's made. So let's go back to control native AI because this is technically all of the observability that you have in your control and purview that you can use to get an understanding of where your AI is going. So, what would a control native approach to this type of solution to generative content look like?
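Editor's note: the "way more tokens than we need" idea above is Mo's own speculation, and the sketch below is an equally speculative illustration of it. It compares one request's token usage against past usage for similar requests and flags large overshoots for a closer look. The history values, the threshold of 3 standard deviations, and the function names are hypothetical.

```python
from statistics import mean, stdev


def token_usage_zscore(history, observed):
    """How many standard deviations `observed` sits above the historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No historical variation: any overshoot at all counts as extreme.
        return float("inf") if observed > mu else 0.0
    return (observed - mu) / sigma


def needs_review(history, observed, threshold=3.0):
    """Flag requests that used far more tokens than similar past requests."""
    return token_usage_zscore(history, observed) > threshold


# Made-up token counts for past requests of the same kind.
history = [410, 395, 430, 405, 420, 400]

for observed in (415, 1900):
    score = token_usage_zscore(history, observed)
    print(f"{observed} tokens: z={score:.1f}, review={needs_review(history, observed)}")
```

A flag here would only be a prompt to look deeper at the request, not proof that anything is wrong.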

SPEAKER_00 27:18

Yeah, I mean, this is where I wish I was on the research team at Anthropic because I'd be very happy spending my days exactly trying to solve this problem. Um, the fact is that the math is very deep. The number of people that are working on this is very small. Um, as I said, they're my favorite people working on it. And there's a lot of work to be done to get it to the place where it's surfaced enough to be consumable. So the question for me is how can we, on the enterprise side, on the non-foundation side, what can we do to do that? I mean, I know what my old school, you know, mathematical approach would be. I don't know that it's scalable, right? So, like I said, I would look at anything that I could extract, because most of the time these aren't open models, right? I would probably do all my research on an open model where I could actually look at the weights and I could track things. Um, but that's kind of silly because that's not what everybody's really using. So it's not really that helpful, right? So that's part of the challenge we have is we get back to the velocity moment, the velocity of I'm gonna use the hyperscaler model to get what I need. And I'm going to hope that their approach to it makes sense and kind of cross your fingers that things don't go wrong. I mean, it's not a good answer. Like, I want to have the answer for you, Mo. I just don't have it.

SPEAKER_01 28:56

No, I mean, that's fine. And I think that's where we also are, right? We are in an area where we're super curious about figuring out what is the thing that we should be doing, right? We are really trying to understand. I don't think any of the conversations that we have necessarily have to end in solutions, um, but they should at least take us into steps towards finding those solutions. And on that, I actually think you're working on some pretty interesting stuff. Um, now I know that you were kind of working with NIST on a couple of guidelines, right? So I would love to hear a little bit more about that.

SPEAKER_00 29:35

So most of my work um in the in that area was around understanding what are some of the mathematical pieces and really speaking to how do we frame risk? How do we classify the different um the different elements of it? So you take the the classic um you know AI risk framework and how do we put Gen AI onto it? And what we came out with was like a pretty decent matrix of understanding, but the again, it's it's all about what people are gonna do with it, right? So we have frameworks that we put out and you know guide people in ethical choices. Most of it is about application, though. It's not about the actual build, it's about um, you know, understanding what what is the decision being made on this output, what's at risk. Where I think things are more interesting, I would love to see a foundation model that's just built on, say, children's books or is built just on something that is very bounded, right? Because what we know to be true is we can't have something come out of the model that it hasn't seen. Well, that's actually not entirely true, but you know, it because the models have gotten so sophisticated. I have to I have to alter that a little bit. But I would love to see some smaller models built on very closed cases that can hopefully avoid some of that negative noise that can come out of them.

SPEAKER_01 31:13

You know, the smaller models are really good to look at in terms of you know, how do you scale certain solutions and certain controls, right? And you can see how like you could see the impacts a little bit easier. Um at the same time, the small, I will say the smaller models are getting way bigger um just by the day. Like the things that are considered small are actually quite large. Um and it it's getting it's getting it's getting interesting. It's getting interesting all across it. So guess going back all the way to the beginning, right? Where we were trying, where we were in the room with the CTO and the chief AI officer, and they want to get their things done, right? What are kind of the ways we enable these folks to feel comfortable about the risk that they're taking while also maybe providing their teams with some level of observability and transparency and traceability in their AI systems? What are like the kind of the high-level, if you had to say like maybe three things that you would say, these are the three decisions you should make and the three things that you should have implemented in your product or your um your AI pipeline that you're using in your organization? What would those kind of look like? And how would you apply like some of those control native AI solutions here?

Three things every enterprise needs to implement now

SPEAKER_00 32:34

The first thing is observability. You know, it's shocking to me that observability of our systems is not table stakes. It's just not. Partially because... So what does that look like? I mean, honestly, if it's me and I'm buying a product, right? Most people aren't necessarily building your own agents. If I'm getting something, all of the OTel has to be built into the system and it has to be easily accessible. Right? It's a non-negotiable. You have to have the OTel traces, right? A. B, you need to have someone on staff who can understand and look at that observability at a system level and be able to discern that impact. And that person needs to be able to communicate that to the relevant business. Right. So you need to make sure that you've got your internal observability with anything you build or anything that you buy. You need to have someone internally who can leverage those at an intelligence level, not just a log level. And then you need to make sure that they have an understanding of what's at stake for the business for each of the areas that this is hitting. And that communication, that person needs to be able to educate, just like we did when we had our basic linear regression models. Our job was always to provide a model to explain the risk so that the business unit could make a decision that they feel confident about. We need to have that now in the same way that we did before, but we need to have it at this agentic level.

Traceability vs observability

SPEAKER_01 34:20

So we talked a little bit about traceability, but like let's dig a little bit deeper into that. So, what does traceability mean here, right? Is it actually figuring out how a decision is made, or is it understanding everything? You know, it can get conflated with observability because I'm even talking about it out loud, and I'm like, well, observability should be about being able to see how you're getting to something. Traceability, what does that really look like? Is it the actions that the AI or an AI agent is actually taking across the environment? Is it more about the things that are happening around the agent that can be now you can look back at and attribute them as a root cause for an action that an agent is taking? So, like what is traceability uh here and what does that look like?

SPEAKER_00 35:06

The way I view it, traceability is the unit, right? The actual trace, the ability to follow a decision or an outcome from an AI back through all of the agents, to all of the data sources, to everything that went into it, all the way back to the source of that data, right? That's the traceability of the actual decision and the branching that led to it. Observability for me is the one layer up. We see all the traces, we see all the movement. Observability to me is saying, okay, these are all the actions that are being taken, but what does it mean? What is that understanding? Because even if we look back at traditional MLOps, drift isn't one output, right? All of a sudden we have one thing that's a little bit different. That doesn't mean the entire model has drifted or the environment has drifted. It's when we see a certain tipping point, a threshold, that you say, oh, now we can see this is considered drift. And so for me, that's where that is. I think that it's really important for us, because of the nature of agents and these interactions, that we cannot look at it at log level. We cannot even look at it in two dimensions. We have to take time elements into account when we're looking at observability, because velocity shift of importance is going to be one of those big indicators that we want to start looking at. Because when we're looking at the system, what's anomalous is not really the gold standard for me. What I want to look at is what's influential. Right. I want to look at what is the, I don't know, the tool call or the data source that is really influencing many things. What is a big shift? For example, you looked at how quickly someone was responding in your WhatsApp chat, right? It was the velocity of the change that mattered. Right. Nothing the bot was saying was anomalous. If you had had traditional anomaly detection, everything sounded like him, it was his verbiage.
You wouldn't have necessarily said, oh, somebody hacked in and it's his mom responding in our WhatsApp or something, right? But it's the velocity of that change and the level of influence. So what you saw is a very quick change that was very influential. And those two things together, I think, are what we need to start understanding when we look at observability. Because, you know, I was talking to one potential client and they said, well, you'll come in and you'll look at our systems, and from there you'll be able to determine if something is anomalous. And I was like, well, I don't see it that way, because if I come in today and there's already been an infiltration or something's amiss, I'd be assuming that when we show up on day one, everything's okay. But we don't know that. Right. So looking at influence, looking at impact, allows us as humans to figure out which parts of the system we should even be looking at. Right. Yes, we have AI to help us with all of those things, but without having that metric and that understanding, we're never going to necessarily find it. So some of that shift in understanding what we should be thinking about from a risk perspective, that's one of the other pieces I think is really fascinating and an open conversation right now.
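
One way to make the "influence plus velocity" idea concrete is the minimal Python sketch below. It is an illustration only, not something described in the episode: the `Trace` schema, the source names (`crm_db`, `new_plugin`, and so on), the time windows, and both thresholds are hypothetical. Each trace is the unit (one decision and everything upstream of it), and the layer above aggregates traces per time window to ask which sources are influential now and how fast that influence moved.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Trace:
    """One decision plus everything upstream of it (hypothetical schema)."""
    decision_id: str
    window: int               # time bucket, e.g. an hour index
    sources: Tuple[str, ...]  # agents, tool calls, data sources behind the decision


def influence_by_window(traces):
    """For each window, count how many decisions each source fed into."""
    counts = {}
    for trace in traces:
        counts.setdefault(trace.window, Counter()).update(set(trace.sources))
    return counts


def flag_shifts(traces, prev, curr, min_influence=0.5, min_velocity=0.3):
    """Return sources that are influential now AND whose influence moved fast.

    Influence is the share of decisions in `curr` that a source touched.
    Velocity is the absolute change in that share since `prev`.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    counts = influence_by_window(traces)
    totals = Counter(trace.window for trace in traces)
    flagged = []
    for source in sorted(counts.get(curr, Counter())):
        share_now = counts[curr][source] / totals[curr]
        share_prev = counts[prev][source] / totals[prev] if totals[prev] else 0.0
        velocity = abs(share_now - share_prev)
        if share_now >= min_influence and velocity >= min_velocity:
            flagged.append((source, round(share_now, 2), round(velocity, 2)))
    return flagged


# Hypothetical traces: window 0 is "yesterday", window 1 is "today".
TRACES = [
    Trace("d1", 0, ("crm_db", "search_tool")),
    Trace("d2", 0, ("crm_db",)),
    Trace("d3", 0, ("crm_db", "pricing_api")),
    Trace("d4", 0, ("crm_db",)),
    Trace("d5", 1, ("crm_db", "new_plugin")),
    Trace("d6", 1, ("crm_db", "new_plugin")),
    Trace("d7", 1, ("crm_db", "new_plugin")),
    Trace("d8", 1, ("crm_db", "new_plugin", "search_tool")),
]

print(flag_shifts(TRACES, prev=0, curr=1))
```

With this made-up data the call returns `[('new_plugin', 1.0, 1.0)]`. `crm_db` touches every decision in both windows, so it is influential but stable and is not flagged. `new_plugin` went from touching nothing to touching everything in one window, which is the "quick change that was very influential" pattern, even though no single output would need to look anomalous.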

SPEAKER_01 38:56

It's so much to digest. And you say this really good word, which is influence, right? I don't want to call it a new concept, even though it feels new, right? For as long as I've been in security, which is not very long on the scale of how long computers have been around, I've always been used to computers being very much call and response. I do one thing and I get the exact output that I expect, right? It's just how we worked, right? Even as pen testers, you know exactly how you're gonna get the response you want. All you need to do is continue to ask the question in different ways. This is essentially how you test a service. And now we have AI, right? Obviously we still have the same vector, right? With prompt injection and all of these things, you do just go and do call and response and you figure it out, right? But then there's now this other factor of influence, especially when you look at agentic, right? And where we are at now, where agents are influencing each other, right? You take it back to your space example, and now it's like, okay, well, this agent is doing this thing, and it's basically peer pressure where you're getting pulled in one direction, you're getting pulled in another. And it's so difficult. When you think about environments that have a handful of agents, and then you think about that even larger, when you've got an environment with hundreds of thousands of agents like the one we're currently living in, right? How are all these different things just impacting each other? We're in such an interesting territory right now when it comes to evaluating risk on this scale. And I think, especially after the OpenAI acquisition, or, I don't know, yeah, I would call it an acquisition, of OpenClaw, right? Now they're bringing it in.
We're about to see, in my opinion, a very quick movement of this technology into the hands of the everyday consumer, right? You won't need a Mac Mini to run, I call them multis, but I don't think that's what they're called, our little claw friends anymore, right? Now you can actually run your AI companion, likely from OpenAI's platform. And you can give it the same access and everything like that. Man, it's going real fast. But it's a really exciting time to be tackling these problems. So moving away from agents for a second, right? Or maybe actually moving into the literal hands of agents. I know that you were literally just here to do a robotics competition. And robotics has been one of those things that you've been involved with for a while. So, what was that like? And then also, what are you actually doing with robots and why? And when are you getting your Neo?

SPEAKER_00 42:16

Oh, yeah, I know. So I've been interested in robotics for the last couple of years just because I've got an interest in sovereign systems. I have an interest in data privacy and data protection. And the Meta glasses, as someone who feels very strongly about data sovereignty, were kind of hard for me because you couldn't opt out of your data being used to train. So I didn't love that as a policy. And as we know, everything's moving very quickly. Robotics is also moving very quickly. We've had robot vacuums in our homes for years, but the humanoid robot is coming and it's coming very quickly. And so, as someone who's always looking to the future, I'm always thinking, okay, what does that mean? What do we need to prepare for? What infrastructure do we need to be thinking about? How can we be thoughtful about what's coming? And so I got really interested and had been talking to friends about inference at the edge and data and these things. But then I realized, well, I gotta spend some time with some hardware in order to actually have an opinion. So what does that look like? So a friend of mine, Andrea Turcou, who works at H2O, came out from France. I said, I'm going to go do this physical hackathon, come join me. So she flew out from France. I flew from the East Coast. And with this really wonderful fellow, Sidir Dadi, who was sitting at the table next to us, the three of us got together and we built a single claw arm, an SO101. It's got a camera on it, and it opens a book and it turns the pages and it reads aloud. Now, it doesn't seem like it's, you know, earth-shattering robotics. But as a mother of a son with autism who has a reading disability, who was served really well by audiobooks and different things like that, it just really got me thinking about what is now possible with this as an assistive device, right?
So whether it's reading, whether it's learning to read in a foreign language or extra support for special education, or you're in a part of the world that doesn't have as much access to technology or education, what could be possible? So it got me thinking about that. And the highest compliment I can give any builder, any founder, I mean, really any human, or probably AI as well, is: is it thoughtful? Right? Is what I have built thoughtful? Is it thoughtful as far as risk? Is it thoughtful about impact? Have we figured out what it can do for good or evil? And so I started thinking about, well, we've got to find a way to fund it, right? Because, you know, nonprofits are great, but it's a rough time. It's a rough time right now, right? So then I was thinking about robotics data sets. And what's really interesting to me on robotics data sets is that historically everything that we had built AI for was on a screen. So whether it's text or video, everything's two-dimensional. But we have to take into account force and we have to take into account weight and just physics in general. And so as we're looking at the data sets, when I was at the lablab.ai hackathon last weekend, people were doing really cool things. We were using a lot of Isaac Sim and DeepMind. And a lot of what we were looking at was ways that people could use VR or iPhones to create these data sets. But for me, I'm like, there's no physics in it. But if we're in the simulation, we have all the physics, but no real world noise. So there's a gap in robotics training sets, which is real world collection. And historically, it's very expensive, right? It's very easy to do things in the lab, very hard to do things. So then, you know, a constraint is always just a puzzle to solve. So the question became: we want to put this assistive technology out there. We know that there are these constraints in robotics.
What would happen if we leverage these tools to really understand paper manipulation for robotics? So our big lofty goal right now, in the project that we're working on, is to have 1 million SO101s across the globe serving 240 million children with learning disabilities. It's a big goal. I know we'll get there. I don't know how long it will take. And at the same time, take just the telemetry data, just the movement of the arm, just the understanding of, you know, we'll move into other hardware eventually, right? But the understanding of what it means to manipulate paper. So my personal goal is to participate in building robots that can perform origami. And all of this is a long story to say that when the humanoid robot comes into my home and comes into your home, I want it to be able to be thoughtful and have that delicacy and that understanding of something very fine so that it can make good, intelligent movements to serve whatever it is that you want it to do.

SPEAKER_01 47:37

That is a lot, but sorry, um, no, no, no. I'm not saying no, no, no. I'm not saying what you're saying is a lot. What I'm actually saying is that paper is a lot, right? Um, paper in general is a very, it's a very delicate thing. And um, you know, I've been really obsessed with world models, not because of the world building aspect of them, but because um, again, from a security angle, I'm more interested in the physics piece, right? Not necessarily to think like, oh, um, you know, ball bumps into wall. Um, if ball is rubber, it bounces back. It's more like what kind of like chemicals can we actually like simulate there, right? Um, can we start to simulate chemical reactions against different substances? Can we go and like recreate the physical properties of things accurately, right? And if so, like what types of physical phenomena can we actually um again recreate in a fully simulated AI-driven environment? For example, what happens when paper touches water, right? Like, I don't think a robot will understand that, right? It'll just dip it in and then you know the paper is either super heavy paper and it can deal with it for a little while, or it's super soft paper and tears apart, or it's paper tissue and it just doesn't even come back, right? But like paper is so is a lot, is what I was I didn't say it right, but paper is a lot. And for me, it's like how do we like LLMs are not enough to actually get this information into a robot? I would say we need to go further and actually figure out how do we how do we up level the comprehension or the portability of a world model that has full understanding of objects in a world and how they interact with other objects in a world. And how do we come full circle with that kind of concept? Get it into robotics in a way where like you can have this robot turn the piece of paper and not necessarily think about it in um in measurements of like force, right? 
Like, how hard do you need to push on this, these things, right? And instead have an understanding of, okay, well, this is how thick it is, and this is what my fingers look like, right? Right now it feels binary. It's like, grab this, right? And grab it with X amount of force, but it doesn't have an understanding of why, right? That why is so important. It's the same with my cat, for example. My cat did not grow up around other cats. One of the biggest behaviors that cats have is when they scratch or bite, it's actually a playful thing. But if they don't have any indicators of, oh, this is hurting you, guess what? Your cat is going to scratch you and you're gonna bleed because it doesn't understand, oh, this is actually causing you pain. It's the same thing with machines, right? They don't really understand what the effect of the force is on whatever object they're interacting with. And you really have to be careful when you go and teach it these things. But we're teaching it with words, we're teaching it with numbers, we're not teaching it with an experience, which is where I think world models bridge that gap, at least in theory, in my opinion. And again, I'm not a scientist or a researcher, but I love reading that stuff. So that's where it excites me. So you've been in this space for a very long time. Well, actually, you haven't. You have a really interesting background and you've jumped careers into this particular governance, risk, and compliance space. And I wish we had a whole other hour because I would have loved to dig into your background more. But you've been in governance, risk, and compliance for a while, and we are most certainly in a race with AI. But are we in the race that you think we're in?
I think we're in a race between our ability to build agentic AI and our ability to govern it. Out of these two sides, our ability to build or our ability to govern, who's winning?

Who's winning: builders or governors?

SPEAKER_00 51:54

Building is way ahead. Building is way ahead. I mean, partially for the reasons that I said that we have a very good idea of data governance. We know you can access this data, you can't. We're really good at that. We're really good at you can use this tool, you can't. We are not good at understanding the mechanisms of what these tools are actually doing. And we can't govern what we don't understand.

SPEAKER_01 52:21

Yep. And I think that is kind of everything that we've been talking about, right? The builders have a huge advantage in this space. And as governing folks or people in GRC, it's all catch up. It's all catch up. And honestly, it's the exact same place red teamers have been for the entirety of the red teaming practice, right? So governance debt is real, Mo.

SPEAKER_00 52:47

Governance debt is real.

SPEAKER_01 52:48

Yeah, I know, and that's another thing that we could have talked about again. Another hour.

SPEAKER_00 52:52

Like another time, another time.

SPEAKER_01 52:55

So thank you so much for joining us today. And where can the people find you? What are you working on? What's coming up for you?

SPEAKER_00 53:03

I'm always working on a lot of things. Uh, the best way to find me, honestly, is always on LinkedIn. I'm always there. I try to respond as much as I can. Um, I've got, you know, an algorithm paper coming out in the next hopefully two to three weeks. Um, some retrieval stuff, working on some permissioning ideas for agent-to-human understanding. And LinkedIn's always the best place to find me and see what I'm up to.

SPEAKER_01 53:28

Well, thanks so much again for stopping by. And we finally got it. We finally did a podcast.

SPEAKER_00 53:33

Thank you so much for having me again. As soon as I met you, there was an instant intellectual chemistry. So I was super happy to be here. So thank you for having me.

SPEAKER_01 53:43

I'm always happy to host curious minds, as we do here on Curiouser and Curiouser. If this episode helped cut through the noise, like or subscribe so you don't miss what's next. Thanks for spending time with us. Until next time, stay curious.

Mo Sadek

Host