
EDGE AI POD
Discover the cutting-edge world of energy-efficient machine learning, edge AI, hardware accelerators, software algorithms, and real-world use cases with this podcast feed covering all things from the world's largest EDGE AI community.
It features shows like EDGE AI TALKS and EDGE AI BLUEPRINTS, as well as EDGE AI FOUNDATION event talks on a range of research, product, and business topics.
Join us to stay informed and inspired!
Dan Cooley of Silicon Labs - The 30 Billion Dollar Question: Can AI Truly Live on the Edge?
Imagine a world where your smart glasses don't just identify objects but tell stories about what they see—all while running on a tiny battery without heating up. This cutting-edge vision is becoming reality as semiconductor companies tackle the monumental challenge of bringing generative AI capabilities from massive cloud data centers down to microcontroller-sized devices.
The semiconductor industry stands at a fascinating crossroads where artificial intelligence capabilities are pushing beyond traditional cloud environments into battery-powered edge devices. As our podcast guest explains, this transition faces substantial hurdles: while cloud-based models expand from millions to trillions of parameters, embedded systems must dramatically reduce their footprint from terabytes to gigabytes while still delivering meaningful AI functionality. With projections showing IoT devices consuming over 30 terawatt-hours of energy by 2030 and generating 300 zettabytes of data, the need for local processing has never been more urgent.
For developers creating wearable technology like smart eyewear, constraints become particularly challenging. Weight distribution, battery life, and computing power must all be carefully balanced while maintaining comfort and style. The hardware architecture required for these applications demands innovative approaches: shared bus fabrics that enable different execution environments, strategic power management that activates high-performance cores only when needed, and neural processing units capable of handling transformer operations for generative AI workloads. Most impressively, current implementations demonstrate YOLO object detection running at just 60 milliamps—easily within battery operation parameters.
The $30 billion embedded AI market represents a tremendous opportunity for innovation, but also requires robust software ecosystems that help traditional microcontroller customers without AI expertise navigate this complex landscape. As next-generation devices begin supporting generative capabilities alongside traditional CNN and RNN networks, we're witnessing the dawn of truly seamless human-machine interfaces. Ready to explore how these technologies might transform your industry? Listen now to understand the future of computing at the edge.
Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org
We're going to talk a little bit about our viewpoint on realizing some of the promises of generative AI, especially as it pertains to edge devices, and about the challenges that still exist in this market for us to get there. Some of the things I'm going to cover in this presentation have probably been said a couple of times already; a lot of people have been up on this stage talking about how edge AI in particular is evolving and how the use cases people are trying to implement are rapidly expanding. At Alif Semiconductor, we're a microcontroller company, and it's always interesting because we talk to customers that are running very large models on cloud-based servers today, and when we talk to them they're very excited: how can I bring this down into these small form factors that you have? Some of those customers understand the concept of constraints, but it's pretty dramatic when you compare the lack of constraints in cloud environments to microcontrollers, where these solutions have to run battery powered. On the cloud side you have LLMs that have expanded from millions of parameters to trillions of parameters in a very short span of time. There's a concept now that people talk about: you have the large language model that can be trained to answer any kind of question, but what if I reduce the scope of that? What if I take it down a notch and create something like a small language model instead, to adapt it better to some of those constraints? From our perspective, providing MCUs that need to be battery powered, when we look at some of those SLMs we're talking about reductions from terabytes of data down to gigabytes of data. That's still a lot for devices like ours, but there have been some pretty impressive developments in focusing those types of models around certain types of knowledge, and in optimizing them for the customers we talk to. Enabling those generative functions in embedded systems lets you create devices that don't just respond to a command but actually operate somewhat independently, as I've indicated on the slide here: devices that can make decisions on their own without an operator necessarily being there to instruct them what to do in a given scenario, and AI algorithms that, over a conversation for instance, can remember things that have been said in the past. That's the contextual element that generative AI can bring to these types of solutions.
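As a rough sizing sketch of why quantization matters here (the parameter counts and precisions below are illustrative numbers, not figures from the talk), a model's memory footprint is approximately its parameter count times the bytes stored per weight:

$$\text{footprint} \approx N_{\text{params}} \times \frac{\text{bits per weight}}{8}$$

A 1-billion-parameter SLM stored at FP16 (16 bits per weight) needs about 2 GB, while the same model quantized to 4 bits per weight fits in about 0.5 GB, which is how reductions from terabytes down to gigabytes are achieved in practice.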
Speaker 1:We see a lot of use cases in healthcare, manufacturing, transportation, et cetera, that rely heavily on AI that is cloud-based today, and that want to improve that by moving the compute closer to where the data is being generated, for efficiency, for lower latency, and for a number of other reasons. Essentially, what I'm going to talk about in this presentation is the challenges that semiconductor suppliers face in designing devices to host this kind of AI, some general thoughts on how to overcome those challenges, and, specifically from our perspective, what we are doing to address this and what we already have available. Beyond the size of the models themselves and the resources required to run them, there's also a very rapidly growing problem. I don't know if the cloud vendors would agree with calling it a problem, but there's a lot of data being pushed to clouds all the time from devices, in order for the models running in that context to interpret it.
Speaker 1:This is not a typo on the slide: a projection we've seen estimates that the amount of data being stored will reach on the order of 300 zettabytes by 2030. That is a lot of data, and most of it, I know somebody mentioned this on the first day of the show, I don't remember the gentleman's name, but in reality most of that data is not particularly relevant for the device's operation. It's just that the device is recording everything and sending it up all the time; but still, once it arrives, it takes up a lot of space and consumes a lot of resources just to host it. On the other side of that is the power consumption required for that hosting: it is estimated that installed IoT devices will consume more than 30 terawatt-hours of energy in 2030, and that number is also expanding very rapidly.
Speaker 1:So, tying this back: the need to run machine learning closer to where the data is being generated is something we truly believe is only going to increase. We are already seeing it. To enable autonomous local processing, to reduce latency, and to reduce the footprint of the devices themselves, we have to get to the point where we can fit more machine-learning intelligence into smaller devices. Some data from projections about the AI market as such: the projection we are going by is that the embedded AI market will reach approximately $30 billion by 2030, with a compound annual growth rate of about 10% from 2022 to that year. It is rapidly growing, and it's a huge business opportunity.
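As a quick back-of-the-envelope check on those figures (my arithmetic, not a number quoted in the talk), a 10% compound annual growth rate over the eight years from 2022 to 2030 implies a 2022 base of roughly:

$$\frac{\$30\text{B}}{(1.10)^{8}} \approx \frac{\$30\text{B}}{2.14} \approx \$14\text{B}$$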
Speaker 1:And again, some comments here about the use cases, and about not limiting this to traditional CNN- and RNN-type networks but getting to the point where we can enable generative AI in these embedded devices: to improve human-machine interfaces, and to make language understanding better, so that it doesn't feel like today's home assistants, where you give a command, wait three seconds for a reply, and it's not exactly what you wanted. We want to make that interaction more seamless, and also smart enough to understand that the user is now saying something that references something said previously. The concept of having small, battery-powered hardware devices that can efficiently execute neural networks containing transformer operations is what will enable all of this. We will get better scalability with transfer learning on these devices, we'll be able to reduce that huge power-consumption number I mentioned on the previous slide, and we'll be able to implement these models so they don't take up more memory than these devices can offer. So there we go.
Speaker 1:This slide tries to illustrate a little bit how we look at the different classes of devices today. These are names that we came up with, by the way; this isn't an industry standard, so if you haven't seen them before, it's just because I haven't had the pleasure of talking to you before. There will, for certain, still be a space for the really big models to be cloud-hosted and to operate in those environments. Not everything will be possible to take down into a pure embedded environment, and most likely not everything will be suited for that either. But what we define as the mid-tier complexity models, the small language models of today and things of that nature, plus the things that are essentially being done today with traditional CNN and RNN networks but maybe improved by adding generative elements to them: this is really what we see as the main target going forward for devices like the ones that we and some of my friends at this event produce. A small embeddable system that can deliver up to about 10 tera operations per second, that can be battery powered, and that doesn't generate a huge amount of heat while doing it.
Speaker 1:That is really where we feel the industry needs to get to. We strongly believe in a combination of general-purpose CPU functionality and some sort of hardware acceleration, to take the complex mathematical operations that need to happen when you process neural networks and run them faster. In microcontroller systems you often talk about power efficiency, and usually what power efficiency means is that you finish the work fast so you can go to sleep sooner, because then you can last longer on the battery. Having that energy efficiency so we can support battery operation becomes a must-have requirement. And then, very important, and this is something we see from a lot of the customers we engage with today, is being able to do this in a very small form factor, because there are a lot of use cases for these devices where they're not only battery powered but where physical space is the biggest challenge of the design.
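A minimal worked example of the "finish fast, sleep sooner" point, using made-up numbers rather than anything from the talk: total energy is power times time, so a faster core that draws more power can still come out ahead.

$$E = P_{\text{active}} \cdot t_{\text{active}} + P_{\text{sleep}} \cdot t_{\text{sleep}}$$

If a fast core burns 50 mW for 10 ms per inference, that is 0.5 mJ; a slow core at 10 mW for 100 ms costs 1.0 mJ. With sleep current near zero in both cases, the core that draws five times the power uses half the energy per inference.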
Speaker 1:This tends to be true for almost any kind of wearable. If it's something that goes in your ear or something you have to wear on your face, you don't have a lot of area to work with, and, depending on how you wear the thing, it can't get hot. You don't want to put something on your face that needs a cooler attached to it. So, some additional comments here, going into specific use cases and their constraints. Many embedded devices today have limited processing power and limited memory, so it's challenging for them to run complex AI algorithms. And the kind of compute you need to process neural networks is often ill-suited to a CPU core, because the core will run at 100% utilization through the entire cycle, and that burns a lot of energy and makes the device very power hungry.
Speaker 1:Security is another topic that is becoming more and more urgent, if I can use that word, for these types of applications. There has always been a case for protecting your IP: in an embedded system, you don't want somebody to be able to extract the firmware and steal it. But depending on what these smart devices do, you might also have a very strong need to protect the data they are working on. In healthcare, for instance, there are patient privacy concerns; if the device is acquiring imagery, that imagery might be sensitive. You want to make sure that, at all times, only the metadata generated from the operations is exported out of the device. As one example, this is a use case that we see growing quite rapidly.
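As a sketch of that "only metadata leaves the device" pattern (the types and the send_to_host function here are hypothetical stand-ins, not from any vendor SDK): the raw frame stays in on-chip memory, and only the inference results ever cross the device boundary.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical inference result: the only data allowed off-device. */
typedef struct {
    uint16_t class_id;     /* detected object class  */
    uint8_t  confidence;   /* 0-100                  */
    uint16_t x, y, w, h;   /* bounding box in pixels */
} detection_meta_t;

/* Hypothetical transport hook (UART, BLE, etc.). */
extern void send_to_host(const void *buf, size_t len);

void report_detections(const detection_meta_t *dets, size_t count)
{
    /* The raw image buffer is never passed to this function,
       so it can never be exported by mistake. */
    for (size_t i = 0; i < count; i++) {
        send_to_host(&dets[i], sizeof dets[i]);
    }
}
```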
Speaker 1:All kinds of smart eyewear are being designed right now. I've seen a couple of people at this show with the Meta Quest glasses on, for instance, and there are lots of other types of glasses with different features being worked on. For those kinds of devices, at least the way we understand it, weight is one of the biggest inhibitors. You can go the Apple Vision Pro route and create a huge, massive, honking headset, but a lot of companies want to do something that looks more like normal, natural glasses, and it becomes uncomfortable to wear something like that for extended periods of time if it weighs too much. How the weight is distributed also matters: if you have all the electronics on one side, including the battery, there's more pressure being put on that ear, for instance. So that's a huge concern. Battery life, of course, is important, because you can't put a big battery in those types of things, but you still want it to last at least a day. There may be multiple cameras involved, there may be audio involved, there might be a need to record and save video, and then, depending on exactly which features are implemented using machine learning or AI, the models themselves can be quite large.
Speaker 1:So there are some real constraints around what is needed to make a product like that in a way that will be considered successful: it has to have some fashion elements to it, and not look like a piece of technology. That's the challenge we're trying to overcome, and it essentially boils down to a set of architectural issues that need to be solved in the microcontrollers. We need to get to that high level of performance. We need to work on removing memory bottlenecks, because with machine learning, one of the things that becomes difficult for microcontrollers in particular is that you're moving a lot of data through a device that wasn't necessarily built for high-speed data transport. And when you move data, that's when you use a lot of power: every bit of information you move across a bus in a system like this costs some amount of energy. So we need to squeeze that down so that data movement is not what kills us, and we need to increase the speed of the interfaces so that we can communicate fast enough to have the necessary resolution for the imagery we take in.
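To make the "every bit costs energy" point concrete (the per-byte energy below is an assumed, order-of-magnitude figure for off-chip memory access, not one quoted in the talk): a single 640×480 8-bit camera frame is about 307 kB, so

$$307{,}200\ \text{bytes} \times 100\ \text{pJ/byte} \approx 31\ \mu\text{J per frame}, \qquad 31\ \mu\text{J} \times 30\ \text{fps} \approx 0.9\ \text{mW}$$

of continuous power spent purely on moving pixels, before any compute happens. That is why keeping data in on-chip memory and minimizing bus traffic matters so much at these power budgets.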
Speaker 1:We can process multiple audio streams in parallel and things like that, and then do all of this at a power cost that enables the devices to run on batteries. So this is a diagram of how our current architecture is set up in order to be able to at least start solving some of these issues. We try to design a common bus fabric that is shared between different execution environments that we can apply depending on exactly what data it is that we're working with at a given time. We have different configurations of CPU cores, different configurations of NPU widths in the system. So if you're working on slow moving data, you can use the efficient parts of the system to process that to maximize your battery life and then, when you need to process more data or when something particularly interesting is happening, you can switch over and run that on more high-performing cores. They will use a bit more power when they're operating, but the idea is that you shut them down when they are not needed so that they don't contribute to the overall power consumption any more than they have to.
Speaker 1:There's another thing that we feel very strongly needs to grow and develop for users to be able to realize any of this in applications, and that's a software ecosystem that can grow, flourish, and support all the innovation happening across embedded AI in general. At its core, Alif Semiconductor is an MCU company, so we talk to a lot of MCU customers that don't really come from an AI background. They don't have data scientists on their team. They can see that in the machine learning space there are a lot of solutions they would like to apply to devices they already have, to make them better or to add functionality they haven't been able to offer in the past. But they have to figure out how to add all the MLOps steps that are needed: find your model, optimize your model, train your model, and then integrate it with the solution they've already designed. It isn't necessarily rocket science, but for some people it's the first time they've ever had to think about it. This piece in particular, optimizing models specifically for constrained systems, is something we feel needs very broad support in the ecosystem. We started shipping our devices with integrated NPUs back in 2021, and there are several of my extended family of chip suppliers at this show that have similar solutions in the market now, which is great. It's good that this is starting to happen, because it enables these really cool solutions to be made.
Speaker 1:The thing I'm a little bit concerned about when it comes to creating this rich ecosystem for folks to draw from is NPU architectures and things like that starting to diverge too much, because that creates a risk for new adopters of this technology. They're going to look at it and go: well, if I have to pick between all these flavors, and I don't intuitively have the technical background to understand which one is going to be better, do I want to make the investment now, or do I want to wait? And in some cases they might also think: this is a rapidly evolving field, there's a lot happening, and if my solution is limited in scope, is it going to be able to keep up?
Speaker 1:So we've tried to go with, I can't really call it an open architecture, but a licensable architecture. We work with Arm, using their Ethos accelerators, specifically because they are licensable by other people. That de-risks the choice: somebody who wants to use the technology but might not like the direction our product is going in can take what they've done and move it over to something else, or if they come to us, that would also be nice. We want to get to the point where people can innovate, where it's easy for developers to collaborate on technology across platforms, and where there's interoperability between systems, models, and frameworks from the many different people contributing to the ecosystem.
Speaker 1:Software tends to be 90 to 95% of what makes these things really valuable; the hardware makes it go, but it's the code that really makes it special. We have taken an approach in our system, I talked a little bit about it, of configuring an architecture where we do efficient processing for the most part and have a second half of the system that can be woken up when more involved work is needed. We try to make that scalable, so we can cover a broad performance continuum with devices tailored for certain performance points, and we're also adding features like wireless into these devices. The version of the NPU we've been deploying accelerates CNN and RNN networks really well, but I'm very excited that, in an announcement around the CES timeframe, we said we're now working on bringing out a second generation of these devices that will also be able to handle generative AI workloads. They will have support for accelerating transformer operations. We demonstrated this to customers at CES, and I'm going to show you a video on the next slide here.
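As background on what "accelerating transformer operations" means in hardware terms (this is the standard scaled dot-product attention equation from the transformer literature, not something specific to these devices):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

The large matrix multiplications and the softmax in this expression are what a transformer-capable NPU has to handle efficiently, on top of the convolutions used by CNNs.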
Speaker 1:But to be able to get that kind of functionality into these physically small devices that use very little power; these are actual numbers, and I hope I have them reasonably accurate, I know Kwabena is here: the OpenMV Cam that he designed using our E3 device runs YOLO at about 60 milliamps of power consumption for the entire system, which makes it possible to do that kind of work on batteries. And, like I said, we're already moving towards the second generation; we have the silicon running generative AI. In this case we're combining it with a YOLO model, so that we can take a picture of something and then have a small language model, optimized to fit in device memory, create a story about what the camera saw. We showed this to some people that make toys, and their eyes just lit up. They got very excited about what they could do.
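For a rough sense of what 60 milliamps means in practice (the battery capacity here is an assumed figure for illustration, not one from the talk): a typical 500 mAh cell would sustain

$$\frac{500\ \text{mAh}}{60\ \text{mA}} \approx 8.3\ \text{hours}$$

of continuous YOLO inference, and duty-cycling the detector, as described earlier, stretches that to days.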
Speaker 1:So, I worry about the children, but on the other hand, I think there's going to be some cool stuff coming out of this. That's really what I wanted to cover here. In general, hardware suppliers are going to have to do some work to tweak the way they do their designs in order to accommodate this technology, and I have no doubt that once those hardware platforms become available, a lot of the models that run in the cloud today are going to migrate down. We won't have to push data all the way up anymore, and we have the potential to make this kind of technology a lot more ubiquitous by essentially making it seamless. It won't feel like you're interacting with a machine anymore. Thank you so much. Any questions?