EDGE AI POD
Discover the cutting-edge world of energy-efficient machine learning, edge AI, hardware accelerators, software algorithms, and real-world use cases with this podcast feed covering all things edge AI from the world's largest EDGE AI community.
It features shows like EDGE AI Talks and EDGE AI Blueprints, as well as EDGE AI FOUNDATION event talks on a range of research, product, and business topics.
Join us to stay informed and inspired!
Verification, Validation & Certification of AI in Safety-Critical Applications
A cyclist disappears to the model, not to your eyes—and that mismatch is the heart of safety-critical AI. We open with the “vanishing cyclist” to show how tiny, imperceptible perturbations can flip life-or-death decisions, then walk through a practical path to trust that spans data, verification, and deployment. Along the way, we share real stories from BMW, Airbus, and Madrid Metro to ground the engineering in results, not hype.
We break down how to build a resilient pipeline: domain-specific data labeling, realistic synthetic generation for rare and risky scenarios, and tight interoperability across MATLAB, Python, PyTorch, TensorFlow, and ONNX. We dig into explainability beyond classification with D-RISE for object detectors and semantic segmentation, helping you see what the network actually uses to decide. Then we raise the bar with formal verification for robustness—mathematical guarantees within defined perturbation sets—so you aren’t mistaking the absence of found attacks for true safety.
Finally, we get practical about the edge. Model compression via projection, followed by fine-tuning, recovers accuracy with far fewer parameters, enabling fast, power-efficient deployment to CPUs, GPUs, and FPGAs, backed by code generation for the entire application. We also cover runtime safeguards like out-of-distribution detection to catch smog-on-the-runway moments and escalate safely. Throughout, we connect the work to evolving standards, the EU AI Act, and updated workflows that adapt the V-model for learning systems, so your process and artifacts are ready for audits and certification.
If you care about trustworthy AI for cars, planes, rail, and medical devices—and want tools and habits that survive contact with reality—this one’s for you. Listen, subscribe, and leave a review with your biggest trust gap or the safeguard you’d ship first.
Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org
Why Safety-Critical AI Is Different
Real-World Use Cases From Industry
Three Trends Reshaping AI Governance
A Trust-Centered AI Engineering Loop
Data Quality, Labeling, And Synthesis
MATLAB–Python–ONNX Interoperability
Explainability For Detectors And D-RISE
Formal Verification For Robustness
Compression And Embedded Deployment
Handling Out-Of-Distribution Inputs
Regulation, Standards, And Workflows
Final Recommendations And Closing
SPEAKER_00: Hello everybody, thanks for joining me today. I have a slightly different question for you. You've seen quite a bit on deployment, and we'll touch on deployment too, of course, but my question is whether you would trust the models you work with, or work on, with your life. That may be a bold question to start this session, but hopefully it gives you some perspective on today's talk.

To provide some context, let me put you in a situation. You're working for an automotive company, you're building a piece of ADAS software, and you want to put your AI model to the test. What I'm about to share with you is the story of the vanishing cyclist. If you're into AI, you may have heard of the vanishing gradient, and that's a real thing; the vanishing cyclist I made up for this talk. This is you, cycling along an ordinary road, and here is how the story goes: neural networks are quite vulnerable to imperceptible perturbations, and those perturbations can have major safety-critical impacts. This is the original image, that's where you are, and we add some noise, some perturbation, to that image, creating what we'll call an adversarial image. It's a variation of the original, and to the naked eye you can probably not see any difference, but there is a difference to the AI model. We're using DeepLab v3+ here; I was excited to see DeepLab v3 shown by Mohammed in the SD workshop an hour or so ago. When we run the original image, we get our cyclist, as well as the other objects in the scene, because we're doing image segmentation: we're classifying each pixel into a class. When we run the same network on the adversarial image, the cyclist is gone.

That opens up a whole new set of problems, especially in the area of safety-critical systems, and safety-critical systems are everywhere today: in the cars we drive, in the planes that brought many of us here, in trains, in medical devices. Thankfully, we have good software practices in place. In traditional software development there is a whole set of standards that each of the companies building these products has to follow, and those standards allow us to achieve the highest possible level of quality. When we bring an AI model into the mix, the situation becomes challenging and tricky, because there are no standards out there yet, or they are still being developed. Of course, many of us are interested in incorporating AI models into these safety-critical systems, but we need to do it with very high assurance. That's what this talk is about.

My name is Lucas Garcia; I should have introduced myself. I'm a product manager for AI products at MathWorks, and I've been with MathWorks for over 17 years now, working with customers and helping them use our tools effectively. I'd like to use some examples of how MathWorks customers apply AI in safety-critical systems to motivate the story I'm about to share.
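The before-and-after check behind the vanishing cyclist story can be sketched in a few lines of MATLAB. This is a minimal sketch, not the speaker's actual demo: it assumes Computer Vision Toolbox, a pretrained semantic segmentation network (such as DeepLab v3+) already loaded as net, and a placeholder image file name, and it uses bounded random noise as a stand-in for a real adversarial perturbation, which would be crafted from the network's gradients (for example with FGSM or PGD).

```matlab
% Segment an image, perturb it within a small L-infinity budget, segment again,
% and count how many per-pixel labels changed.
I = im2single(imread("cyclist.png"));               % hypothetical test image, values in [0,1]

epsilon = 2/255;                                    % imperceptible perturbation budget
delta   = epsilon * sign(randn(size(I), "single")); % stand-in noise; real attacks use gradients
Iadv    = min(max(I + delta, 0), 1);                % keep pixel values in a valid range

C    = semanticseg(I,    net);                      % per-pixel classes, original image
Cadv = semanticseg(Iadv, net);                      % per-pixel classes, perturbed image

changed = nnz(C ~= Cadv) / numel(C);                % fraction of pixels whose label flipped
fprintf("%.1f%% of pixel labels changed under the perturbation\n", 100*changed);
```

With random noise the fraction of flipped labels is usually small; the point of the talk is that a gradient-crafted perturbation of the same magnitude can make a whole object class, such as the cyclist, disappear.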
One example comes from BMW, which deployed a machine learning model to detect oversteering and ran it on the ECU. Another comes from Airbus, which designed an onboard, FPGA-based deep learning processor for anomaly detection. And I want to throw in a personal story, since this one is from my home city: Madrid Metro, if you've ever been to Madrid, adopted machine learning to do predictive maintenance in its tunnels.

All of these stories share some common themes, and I've captured those themes in three key trends. The first trend is that AI governance is now underpinned by standards at pretty much every level, from government regulation, here in Europe the EU AI Act, down to very specific AI verification and validation standards whose purpose is to guarantee that AI models are robust. The second trend is that industry organizations are playing a key role in defining these processes. You can tell by now that the V-model, the V diagram some of you may have used before, no longer applies as-is when AI is added into the mix, so some adaptation of that V-model has to take place, and here are some examples of how organizations have adapted it. The third trend is that some of our customers have come to us and said: we need specific tools to address these challenges. They've been asking for static analysis and dynamic analysis techniques, and I'll touch on some of these later on.

This brings me to a common AI-driven system engineering design process that I want to use for this talk, and I think it will be familiar to many of you. You typically start with a problem right in front of you; you dissect it, understand the user expectations, concerns, and requirements, and you build up the set of steps that will ultimately help you develop an AI model, integrate it within a larger system, verify it, deploy it, and then monitor its performance. But building trust in these AI models has to come with feedback, so to build confidence in these AI-driven engineered systems we have to incorporate constant feedback loops; I'm sure I've missed a green arrow or two in a couple of places.

Today I'll touch on five key areas in this diagram, which become key recommendations for building confidence in AI-driven engineered systems, based on some of the effort we at MathWorks have put into writing upcoming standards for AI in aerospace: data; model development and training; integration and verification of models; operations, monitoring, and deployment; and finally the AI governance that ties everything together. That's my outline. I'm not going to spend a lot of time on each item, but I'm happy to follow up with any of you.

Starting with data. Data is food for AI, and we all know that when it comes to building AI models, junk food doesn't work: we have to have good data. So at MathWorks we've been working on providing domain-specific tools to support data processing needs.
And of course this is an AI talk, so I was obliged to add a picture of either a dog or a cat; I'm more of a dog person, so you get a dog. We're building AI-guided automation tools that help you label your data. At the same time, we understand that real data is often messy and insufficient, so you have to explore what's out there for domain-specific data synthesis. We've been working on feature extraction capabilities and data synthesis techniques, leveraging tools like RoadRunner Scenario, which you see up here, and of course the gaming engines that a lot of us have been using for data synthesis and generation.

Next, state-of-the-art AI research. I think we all agree that most state-of-the-art AI research today happens in PyTorch. For this to work nicely, we've been putting a lot of effort over the last ten years or so into making the coexecution between MATLAB and Python seamless. If you want to call Python from MATLAB, you can do it right from the command line: you type py, dot, the name of the module, dot, the name of the function, and you let MATLAB handle the data passing between MATLAB and Python. Likewise, if you're more comfortable on the Python side, you can pip install the MATLAB Engine and use MATLAB as yet another Python package. That's MATLAB and Python interoperability, but there's more: we've been working on interoperating with TensorFlow, with PyTorch, and more generally with ONNX. Today, with a click of a button, we can open Deep Network Designer and ask it to import, say, a PyTorch network, and once the network is imported we can visualize it in Deep Network Designer. It's often the case that you want to save some time before converting a model yourself; maybe we've already done it for you, and that's what you'll find in our Deep Learning Model Hub, which has more than 60 pre-trained models you can leverage.

A lot of what's happening in AI these days is generative AI as well, so we've also been connecting with the top LLM APIs, OpenAI, Ollama, and others; you can download our repos on GitHub to get started. Speaking of LLMs and ways to improve your productivity, we recently released Copilot capabilities right within MATLAB, so you can ask MATLAB Copilot for tips on how to write specific code, or ask it to explain what's going on in the code itself.

This connects with the verification of AI models, which I want to touch on next. Explainability has been key for image classification problems for many years, but it was a bit of a question mark how it applies to object detectors. We've been working on implementing capabilities like D-RISE, which is essentially a Grad-CAM-like technique for object detectors, and the way it's implemented in our tools, you can pass it either a MATLAB network object or an ONNX network as input.
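As a concrete illustration of the interoperability just described, here is a minimal MATLAB sketch. The module calls, file names, and model names are placeholders, and it assumes the relevant converter support packages for Deep Learning Toolbox are installed; treat it as an outline rather than the speaker's exact workflow.

```matlab
% Call Python from MATLAB: prefix the module with "py." and let MATLAB
% handle the data conversion in both directions.
x   = py.math.sqrt(2);                % any importable Python module works this way
lst = py.list({'edge', 'ai'});        % MATLAB cell array converted to a Python list

% Import models trained elsewhere (file names are hypothetical; requires the
% PyTorch and ONNX converter support packages for Deep Learning Toolbox).
netTorch = importNetworkFromPyTorch("detector_traced.pt");  % traced TorchScript file
netOnnx  = importONNXNetwork("detector.onnx");              % ONNX interchange format

% The imported networks are regular MATLAB networks, so they can be inspected in
% Deep Network Designer, analyzed, or passed to explainability tools such as
% gradCAM for classifiers or the D-RISE support mentioned above for detectors.
analyzeNetwork(netOnnx);
```

Going the other way, the MATLAB Engine mentioned in the talk (pip install matlabengine) lets Python code call into MATLAB in a similar fashion.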
Now, going back to the opening of my session: one of the main concerns when deploying neural networks is robustness. You have an image of a vehicle, you add some noise, and the model no longer sees the vehicle; it gets predicted as something else. So the question we've asked ourselves for quite some time is: can we actually prove, mathematically, that for a given input image, or more generally a given input data point, because this doesn't have to be image related, there are no adversarial examples around it? You can try to find adversarial examples, because you can brute-force this problem, but if you cannot find any, that does not mean the system is safe. The fact that you cannot find adversarial examples does not mean you have verified the model; it could be that you've explored one area of the perturbation space but not another, and in this illustration, for instance, the red dot is missed by some of these approaches. A verified model, however, cannot be attacked by any attack. That's what we're seeking, so we've implemented tools for doing this using formal verification: a mathematical guarantee that there is no counterexample for a given image and a given perturbation set.

This is edge AI, so of course I want to touch on deployment to the embedded system. Before doing that: I've noticed there's quite a lot of interest in model compression, quantization, and pruning, and this is definitely one step of the workflow that I want to highlight, because model compression can bridge the gap between AI modeling and embedded deployment. Here you can see an example using a technique called neural network projection: we project the activations of a neural network onto a subspace that is just as representative as the original activation space but uses far fewer learnables. As you can see from the pictures, after fine-tuning the projected network we regain much of the accuracy that was lost in the projection step, yet the projected network is significantly smaller. From here we can generate C, C++, and CUDA code, or HDL code for FPGAs, and we can do this not only for the neural network but for the entire application; that's part of what we do at MathWorks. If you want to learn more about these capabilities, there's a great demo at the booth that touches on some of these problems.
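The projection workflow described here maps onto a short sequence of Deep Learning Toolbox calls. The sketch below is an assumption-laden outline, not the speaker's demo: it presumes a trained dlnetwork called net, a datastore of representative training inputs called dsTrain, and option names as recalled from recent releases, so check the documentation for your version before relying on it.

```matlab
% Wrap representative data so activation statistics can be collected.
mbq = minibatchqueue(dsTrain, MiniBatchSize=64, MiniBatchFormat="SSCB");

% Analyze the principal subspaces of each layer's activations, then project the
% network onto them, keeping (for example) 95% of the explained variance.
npca = neuronPCA(net, mbq);
netProjected = compressNetworkUsingProjection(net, npca, ExplainedVarianceGoal=0.95);

% Fine-tune netProjected on the original task to recover most of the accuracy
% lost in projection, then generate C/C++/CUDA or HDL code for the whole
% application with MATLAB Coder, GPU Coder, or HDL Coder.
```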
Before I move on to the last topic, I want to cover one more item: in-distribution and out-of-distribution data. The goal here is that your system must be able to identify unknown examples and either reject them or escalate them to a human for safe handling. Imagine this situation: you've trained an AI model to detect objects on the runway, but you haven't considered that there could be smog on the runway, highly dense pollution. When that happens your model is not going to behave well, because it hasn't seen that data, yet it will still give you an output. So it's better to have a runtime monitoring system in place that flags the issue and tells you to act accordingly.

My last topic is regulation and governance. We've been contributing to recent and upcoming certification standards for the automotive and aeronautical domains. Some of these standards have already been released; others will be released in the near future. One key thing we've understood is that in order to see all the implications, you need to follow these end-to-end workflows, so we've been putting together reference workflow examples with many scripts, functions, and models that industry can use and adapt to their own needs.

So that's it for me. These were my key recommendations: keep an eye on the data and make sure you use high-quality data; leverage state-of-the-art research; spend some time explaining and verifying your models, which is something that often goes unnoticed; the deployment aspect is definitely very well covered throughout the conference; and keep an eye on regulation and governance, especially if you're dealing with safety-critical systems. That's it for me. Thank you.