EDGE AI POD

Reimagining Edge AI using Breakthrough Tools with Ali Ors of NXP

EDGE AI FOUNDATION

NXP Semiconductors is revolutionizing the integration of AI within embedded systems, focusing on enhanced tools and dedicated acceleration. Its latest innovations, such as the eIQ Software Stack and eIQ Time Series Studio, are designed to streamline AI deployment, bridging the gap for developers transitioning into AI methodologies.

• Overview of NXP’s focus on automotive, industrial, and smart home markets 
• Introduction of new AI-accelerated MCUs and application processors 
• The importance of effective data management and compute efficiency 
• Discussion of the eIQ Software Stack and eIQ Toolkit for developers
• Insights into the eIQ Time Series Studio for automated model generation
• Highlights of challenges faced by developers in AI implementation 
• Key takeaways on NXP's commitment to enhancing edge AI capabilities

Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org

Speaker 1:

So in that theme of tool chains and software that's taking the friction out of developing and deploying edge AI solutions, we have Ali Ors from NXP, who is going to talk more about the tool chains in the NXP universe. Thanks, Ali.

Great. I don't know where they found that picture; that's from when I thought autonomous driving was a podcast. So thank you very much, Pete, and let's get going. I'm going to be talking about a new addition to our enablement that we just announced last week. But before I get into that, just to give a bit of background on NXP: who we are, where we focus in terms of business, and what we're trying to build. We're across a lot of different market verticals at NXP. We're fundamentally a semiconductor processor vendor, very big in the automotive space, but we also focus on industrial, smart home and similar verticals, and we're trying to make it all about adding more and more safe and secure intelligence into these domains.

Talking a bit about our product portfolio: there's a lot in it, but from our perspective, starting from the left and going up to the right, we have devices in what is traditionally a microcontroller definition, our traditional MCUs. These come from both the Freescale and NXP product lines, LPC and Kinetis, that have been around for a very long time, and a couple of years ago we announced MCX as a new MCU product line in that domain. So these are traditional MCUs. They use Arm Cortex-M cores as the main CPU, sometimes multiple cores, sub-200 or sometimes sub-300 megahertz in terms of clock speed, but typically around 100 to 200 megahertz, so very efficient and small devices.

Then we have something that we call the crossover MCU. These are the i.MX RT products, the RT standing for real-time. We call them crossover because they sometimes run very close to applications processors. They're not a traditional MCU; they're clocked at a very high rate, up to a gigahertz on the Cortex-M cores. You can have multiple cores, DSPs, GPUs: devices that are typically used in portable electronics with a display and a lot of capability. And then we have our Linux-capable applications processors. This is the i.MX product line, the 8 series, and now we're in the 9 series and building on top of that. It's a very heterogeneous compute platform, sometimes with up to six Cortex-A cores as your main CPU, a very capable GPU, DSPs, et cetera, across all of these products.

Speaker 1:

Now, I can say that we do have dedicated AI acceleration. We've added accelerators into the smallest MCUs, the MCX, and we've added acceleration, of course, into our apps processors. We currently have announced, or available in the market in silicon, three different product families, the i.MX 8M Plus, the i.MX 93 and the i.MX 95, with more coming in that domain. And we also have, still fairly new, the i.MX RT700, which is a very interesting product because it has an accelerator core as well as a very large amount of RAM, so you can actually load fairly large models for an MCU-class device and run them on the device. Because we play in industrial and automotive, quality, security and safety are critical components of what we build into the technology. Our products have a longevity program of 10 to 15 years, and sometimes they go way longer than that in availability; that means the product has to stay viable in the market for a very long time. So, as I mentioned, we now have hardware acceleration across all of these product families that we're building up on.

Speaker 1:

So the MCX N, the small microcontroller that I mentioned, has an integrated NPU, which is an internal NXP design. We went with our own IP in some of our roadmap devices, and the MCX N was the first device to have the eIQ Neutron NPU in the mix. We decided to do that because we wanted scale. We wanted to maintain the same user experience across the board, from MCUs to crossover MCUs to applications processors; we wanted the same compute. Of course, we scale the compute based on the type of market it goes into, but we chose a highly scalable compute architecture, and as it gets larger, more and more additional blocks go onto it to manage the data and the movement of data, to keep it highly efficient.

Speaker 1:

But this device is basically out and has been out in the market for a while. It actually won the tinyML award for microprocessors back in 2023, when we first announced it and started showing it, and we have a lot of demos outside with the MCX N94, even able to do vision ML processing, which is not typical in an MCU of this class. And then the RT700 is a crossover MCU, as I mentioned. If you compare the amount of acceleration you can get on MLPerf Tiny-type benchmarks, you can get up to 172x better performance on the NPU than on the CPU of this device. So, comparing it to a Cortex-M33: if all you had was the CPU, the NPU gives you a very big boost in the overall performance you're able to achieve at the same clock frequency. And this device is interesting because it has a very large amount of SRAM, as I mentioned, so you can get into a lot of larger models. It flips the usual paradigm a bit: on a lot of applications processors you have a memory limitation on what you can put on the device and run, whereas here you're more limited on the compute, because you do have a lot of RAM. You can put in a large model and you're not memory-bound; you're compute-bound, in a sense. Again, that's something I'm very excited about having in the market, because of that capability in the microcontroller domain and what you can run on the device itself.
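For intuition on that memory-bound versus compute-bound flip, here is a back-of-envelope sketch. All numbers are hypothetical placeholders rather than RT700 specifications, apart from the quoted ~172x figure:

```python
# Rough illustration of the memory-bound vs compute-bound flip.
# Numbers are hypothetical placeholders, for intuition only.
model_size_mb = 4.0           # int8 weights of a "large" MCU-class model
sram_mb = 7.5                 # a generous on-chip SRAM (assumption)
print("fits in SRAM:", model_size_mb <= sram_mb)  # memory no longer limits

# With the model resident in SRAM, latency is set by compute throughput:
macs_per_inference = 50e6     # hypothetical model complexity
cpu_macs_per_s = 0.1e9        # plain Cortex-M-class CPU throughput (assumption)
npu_macs_per_s = 172 * cpu_macs_per_s  # applying the quoted ~172x speedup
print("CPU latency ~%.0f ms" % (1e3 * macs_per_inference / cpu_macs_per_s))
print("NPU latency ~%.2f ms" % (1e3 * macs_per_inference / npu_macs_per_s))
```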

Speaker 1:

I mean, as a silicon vendor, of course we like talking a lot about the hardware and what goes into the hardware, but there's a lot of software that we build on top of that as well. We're mostly focused on enabling our users, our developer space, and we look at it from multiple perspectives: the types of personas, the types of users, that actually work with our devices. Other speakers ahead of me talked about this as well: there is a fundamental shift needed for embedded developers to start using AI. We recognize that, and we want to help that transition from traditional embedded development to leveraging AI methodologies and AI code, bringing the embedded developer into using AI-based solutions. But we also need to serve more experienced users, like data scientists and ML-experienced engineers. So the tools try to address both ends of the spectrum of capability and expertise that users have. You need to be able to bring the embedded developers on and ease their ramp-up in terms of understanding, but also give the experts enough tools and capability to efficiently use your tools and deploy onto your hardware. We do this with what we call the eIQ Toolkit and the eIQ Software Stack.

Speaker 1:

Just as we went to unification on the compute hardware with the NPU, we're also unifying our software capability. That is, all the microcontrollers, the crossover MCUs, as well as the apps processors in the MPU lineup, all leverage the same toolkit, so the user experience is very similar in terms of creating AI models and deploying them onto the devices. And we support different flows; we call them "bring your own data" and "bring your own model". With bring your own data, the user comes in with a data set, creates the model inside the tool and deploys it. With bring your own model, they take an existing model, theirs or something from Hugging Face or another model library, and then convert and deploy that model onto the hardware itself. So that brings us to the types of models that we support.
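To make the "bring your own model" path concrete, here is a minimal sketch of the kind of conversion step it implies, using the generic TensorFlow Lite converter with post-training int8 quantization. This illustrates the general flow, not the actual eIQ Toolkit API, and the model file is hypothetical:

```python
# Illustrative "bring your own model" step: convert a trained Keras model
# into a fully int8-quantized TensorFlow Lite flatbuffer, the kind of
# artifact an MCU runtime can execute. Generic sketch, not eIQ's API.
import numpy as np
import tensorflow as tf

def representative_data():
    # A sample of realistic inputs drives post-training int8 calibration.
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 1).astype(np.float32)]

model = tf.keras.models.load_model("my_model.h5")  # hypothetical model file

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Force full-integer quantization so the graph maps onto int8 accelerators.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("my_model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```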

Speaker 1:

Of course, as a general-purpose processor vendor, we try to support everything in terms of modalities: the types of sensors and the types of data that could be coming in. Vision is where a large part of the market is, or has been, and of course language processing as well, with voice. But vision, voice and sound processing on the edge are somewhat better understood domains, or at least better understood from the sensor and signal processing perspective. Then you get into time series, which is everything other than vision: you're collecting samples of data with timestamps from different sensors, such as temperature, accelerometers, vibration, voltage, current. All of these are time series signals, and there's a different set of algorithms and models you can create for that type of solution. So we decided to focus on that and make sure we were also enabling that domain within the use cases. These are the types of time series applications we can talk about: motor health monitoring, general systems for anomaly detection, classifying events, and even regression, looking at the health of a battery over time, for example, across charging and discharging cycles.
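To ground what a time-series pipeline looks like before any model enters the picture, here is a minimal sketch of slicing a raw sensor stream into fixed windows with simple per-window statistics, a typical front end for anomaly detection or event classification. The function and signal below are an illustration of my own, not NXP code:

```python
# Minimal time-series front end: slice a sensor stream into fixed-length
# windows and compute simple statistical features per window.
import numpy as np

def window_features(signal, window=128, hop=64):
    """Return per-window (mean, std, RMS, peak-to-peak) features."""
    feats = []
    for start in range(0, len(signal) - window + 1, hop):
        w = signal[start:start + window]
        feats.append([w.mean(), w.std(), np.sqrt(np.mean(w**2)), np.ptp(w)])
    return np.array(feats)

# Hypothetical vibration trace, standing in for accelerometer samples.
t = np.linspace(0, 60, 4096)
vibration = np.sin(2 * np.pi * t) + 0.1 * np.random.randn(t.size)
X = window_features(vibration)
print(X.shape)  # (n_windows, 4): ready to feed a classifier or detector
```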

Speaker 1:

The challenges: other speakers have touched on these. Those applications are great, but you still have the same challenges. You have a data challenge: typically there's limited availability of data specific to those use cases, or the data is very specific to the end use. There's a long, iterative development cycle; the problem sometimes looks small, but the development cycle is still fairly long. As I think Rajesh mentioned, it takes you six months to get to a POC, but then another six months, or maybe even more, to get into actual deployment, which is a challenge. And there's limited availability of open-source models, use-case examples and test data sets, especially in this domain. When we tie that back into traditional ML development, where you go from training to optimization, validation and deployment, each of these steps, if you're lucky, might take just a few weeks; so several weeks per step. We want to improve on that, especially for certain types of use cases, and that's where we brought in a tool called eIQ Time Series Studio.

Speaker 1:

So what this tool does is automate a lot of the data science aspect of what you're doing. The user brings in their data, which could be multiple channels of different sensor data, and the tool provides functionality to curate that data: clean it up, look at biases, look at unbalanced data you might have, and give you some level of feedback on your data. But the most important step is that the training and optimization is automated. It's an automatic ML, or AutoML, flow: the model generation is automated, and most of the time the tool creates multiple models based on your data. You can emulate and validate them online, and then deploy onto an NXP device. We started building this with our Cortex-M-based CPUs, and we're looking at adding the NPU into the mix as a target to run these as well. As for what you do in the tool, maybe I'll skip ahead and go straight into the Studio's model generation.
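As a mental model of the AutoML step, the search boils down to training several candidate models on the same data and recording KPIs for each. Below is a toy sketch of that loop, with hypothetical candidates and scikit-learn standing in for the Studio's actual, and far more sophisticated, search:

```python
# Toy AutoML-style loop: fit several candidates, score each, rank them.
# Purely illustrative; the Studio's real search and KPIs are richer.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def automl_search(X, y):
    candidates = {
        "logreg": LogisticRegression(max_iter=1000),
        "small_mlp": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500),
        "forest": RandomForestClassifier(n_estimators=20),
    }
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)
    results = []
    for name, model in candidates.items():
        model.fit(X_tr, y_tr)
        results.append({"model": name, "accuracy": model.score(X_te, y_te)})
    # Rank so the user can weigh accuracy against footprint afterwards.
    return sorted(results, key=lambda r: r["accuracy"], reverse=True)
```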

Speaker 1:

So this is actually part of the GUI of the tool. On the left side you can see the multiple models the tool has created from the user's data, and for each model there is a set of KPIs and benchmarks: its accuracy, its RAM usage, and its ROM, or flash, usage. For each model you can see, for example, that its accuracy is 99.77%, so very high, while it uses about one kilobyte of RAM and slightly over 128 kilobytes of flash memory, and in the confusion matrix you can see the few cases where this model has mis-inferred. But you can choose a different model from that list, one that maybe has lower accuracy but uses a lot less memory. It depends on what you're trying to do and which device you're trying to fit onto. You can select a device, or you can run against a large device and then see whether you can fit your model back into a smaller one. All those options are available. And this is the overall flow the tool has: you start with your data input, curate your data if you want, run the algorithm selection and training, and look at the outputs. If a model qualifies, you can continue on to the emulator, create a report and deploy onto the device, or go back iteratively into the process to look for better solutions, or alternatives that achieve the same goal with different accuracy rates and different memory consumption.
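That accuracy-versus-footprint decision is essentially a constrained selection over the candidate list. A minimal sketch, with KPI entries invented to echo the kind of values shown in the GUI:

```python
# Pick the most accurate candidate that still fits a device's memory
# budget. The KPI values are hypothetical, echoing the GUI's columns.
candidates = [
    {"name": "model_a", "accuracy": 0.9977, "ram_kb": 1.0, "flash_kb": 129.0},
    {"name": "model_b", "accuracy": 0.9910, "ram_kb": 0.6, "flash_kb": 64.0},
    {"name": "model_c", "accuracy": 0.9720, "ram_kb": 0.3, "flash_kb": 18.0},
]

def best_fit(models, ram_budget_kb, flash_budget_kb):
    fitting = [m for m in models
               if m["ram_kb"] <= ram_budget_kb
               and m["flash_kb"] <= flash_budget_kb]
    return max(fitting, key=lambda m: m["accuracy"]) if fitting else None

# A small part with only 32 KB of flash left for the model:
print(best_fit(candidates, ram_budget_kb=1.0, flash_budget_kb=32.0))
# -> model_c: lower accuracy, but it actually fits the device.
```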

Speaker 1:

So, the key takeaways from our side: we continue building out our eIQ software stack and eIQ software tools. This is something we're investing a lot of energy and resources into, and it's all about making edge AI a reality on edge devices: embedded devices, microcontrollers and apps processors. Across our overall portfolio we keep adding more devices with a dedicated accelerator, because we see that as a need in the market; more workloads are AI-based, so there's a large benefit to having an integrated accelerator in the mix on these devices. Among the new announcements, we've also announced changes and additions to our generative AI capabilities. But on the microcontroller side, in what we consider tiny, the eIQ Time Series Studio is a new tool that takes away the complexity of the data science part and automates efficient, deployable, productized model creation across a wide portfolio of NXP microcontrollers. And that brings me to the end of my talk. Thank you very much. Thank you.