EDGE AI POD

Revolutionizing Resource-Constrained Devices with Cutting-Edge HIMAX Edge AI Processors

EDGE AI FOUNDATION

Unlock the fascinating world of edge AI technology with insights from Karan Kapoor of Himax Technologies. Discover how Karan and his team are pioneering the development of ultra-low-power AI processors designed for resource-constrained devices. We'll explore their latest innovation, the WE2 processor, which optimizes the balance between model size, speed, energy efficiency, and accuracy. Karan shares the intricacies of integrating hardware and software systems, along with groundbreaking AI applications like facial recognition and keyword spotting, all while staying true to Himax's mission of making AI cost-effective and accessible for all.

In the second segment, we spotlight a transformative partnership that brings the WiseEye2 into Edge Impulse's AI ecosystem. This collaboration is a game-changer, introducing a no-code platform that simplifies data collection and synthetic data generation. We unravel the process from model design to testing, supported by an AutoML feature that ensures precision. With Arm's Ethos-U55 powering the AI processor, the episode explores how partnerships with innovators like Grove Vision AI are crafting an advanced ecosystem. Developers, stay ahead of the curve by following these cutting-edge developments on platforms like LinkedIn and GitHub.



Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org

Speaker 1:

Okay, we're going to get started with Karan Kapoor from Himax. Welcome. Okay. Hello everybody, this is my first TinyML talk, and it's a pleasure to be here, especially with some absolute legends from the tiny AI and edge computing industry. I'm Karan Kapoor and I'm a senior engineer in the ASIC division of Himax Technologies. In ASIC, we work on ultra-low-power AI processors and we develop use cases for them, one of which we will be discussing today: the WE2, or WiseEye2 for long. We have also developed some strategic partnerships and a whole ecosystem around it, and I'll be discussing some of the problems that we have been able to overcome. So we work in the edge AI and tinyML industry, and we have seen this before throughout the day.

Speaker 1:

There are some things that plague our industry, and these are some of them. First, we want our models to be really small. We want them to fit in that 16 MB flash size, so we have that constraint. Second, we want our models to be fast, so we want low inference latency in order to work in the real-time scenarios we want to deploy them in. Third, we want our MCUs to run these AI models for months at a stretch without replacing the batteries, so we want them to be really energy efficient as well. And if that wasn't enough, we also want high accuracy from our models for them to be actually viable in the field during operation. Achieving a balance between these four things is not a simple job. It requires the right kind of engineering, and we also see that, as compute and memory requirements grow with the AI industry, it becomes really complex and hard to map and deploy these models on resource-constrained devices. With our edge intelligence solutions, we want them to be highly performant, we want them to be feasible in the field, we want them to be really small so that we can deploy them anywhere, and we also want a cost-effective solution.

Speaker 1:

But supersizing DNNs, which is the current paradigm of the AI and gen AI industry, does not really work when we are dealing with resource-constrained settings like edge devices and edge computing. Second, mapping these AI models onto the hardware is really tough and tedious, and it requires that right kind of engineering. And finally, Matteo mentioned this in his presentation: the right kind of expertise is lacking in cross-system stack design methods. The embedded engineers don't really know AI, and the AI engineers don't really know embedded programming, because it's quite tough. So, looking at the spectrum of AI, we have generative AI, with which we have been able to do great things. It also consumes a lot of energy. We have been able to create images, videos, speech and text, some of which has been percolating into the edge AI industry, where we are having conversations about deploying small to medium-sized models on these edge devices.

Speaker 1:

But looking at the endpoint AI industry, or what we call extreme edge AI, we have had some pretty interesting use cases come up over the whole span of endpoint AI. We have keyword spotting, people counting, facial recognition, facial detection, also people sensing, and this is where the WiseEye solution offering by Himax really lies. What we want to put out in the market is a universal, low-cost, low-power solution that is accessible. So introducing accessible AI, also leading to the democratization of AI. The WiseEye solution aims to offer something that is accessible both to someone working on a personal project and to the industries. So, looking at the processor, the newest and fastest processor by Himax, which is the WiseEye2, we have a dual-MCU configuration, at 100 and 400 megahertz, plus an NPU. The NPU, by the way, has become a de facto standard for energy-efficient computing in edge AI and endpoint devices. With this configuration of dual MCUs and one NPU, we are looking at a range of tinyML use cases and applications unlocked that were not possible before using just the MCU. It offers orders of magnitude more computational power, along with some serious latency reductions, and obviously that energy-efficient computation in our edge AI applications. And when you combine that with a multi-layer power management system, a cryptography engine and rich peripheral support, you have an end-to-end AI product that is pioneering the endpoint AI industry.

Speaker 1:

So earlier we talked about de-siloing the hardware and software systems and interleaving them to produce energy-efficient, stable solutions. When we look at the intersection of ASIC and AI, we are looking at a design space that is really non-trivial and unfamiliar even to many industry experts and academics, and extracting solutions from it can be a non-trivial task as well. That's where Himax steps in. We provide that right kind of engineering, and every AI and ML engineer at Himax is trained in both hardware performance and software design. With that we offer full-stack AI algorithm support, driver support and hardware implementation, along with complete optimization of the whole ecosystem that we offer. And when you combine this with easily compatible, plug-and-play, production-ready AI cameras, you can unlock a range of AI use cases that were not possible before, such as head pose detection, human presence detection, facial detection and recognition, and keyword spotting.

Speaker 1:

Now, looking at model optimizations: with ASICs we are pretty much constrained by the hardware that we have. We see the supersizing, the ever-increasing scale of today's AI models, and this is also becoming a big environmental concern, by the way. Models also hit a ceiling after you keep increasing the number of parameters, unless you are working on a special use case, and this approach does not really work with endpoint and power-constrained devices, because we might have latency constraints, privacy constraints or reliability issues, because the data cannot leave your chip and go to the cloud for inferencing, and we need real-time, swift decision-making. So software optimizations, interleaved with the hardware data flow, are crucial when we look at endpoint AI applications, and at Himax we implement a parallel combination of all the techniques that you see on the right, while maintaining accuracy, so we get a really optimized model designed for our WiseEye ecosystem. So, summing up, the ML workflow looks something like this: you have your MLOps stage, in which you get the training data, you have a backbone and then you train your model on that data. You set the training parameters and you get a model with a specified accuracy, or you reach your accuracy requirements. Then you have the model optimization flow, in which you perform the optimizations we saw on the last slide, so knowledge distillation, pruning, quantization, and you extract a model from a neural architecture search space that is suited for your endpoint devices. From this we get an int8 file, which is then passed on to the Arm stage, or the software development stage, and here we have major contributions from Arm. We have Vela, which takes the int8 file produced in that step and gives you an optimized Vela file. It helps to offload a lot of operations from the MCU to the NPU, and that's what makes the operation really quick and really energy efficient. We also have the Fixed Virtual Platform, on which you can start building the software even before you have the silicon in hand. With this we can test our model against metrics like latency, tensor arena size and everything related to the configuration we have here. With the WiseEye2, we have the Ethos-U55. And then, once you have your Vela-converted model, you deploy it on the fleet of Himax WE2 devices, and from there you can feed the inference results, and whatever you notice about your applications, back into the pipeline to the MLOps stage. Great.
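
[Editor's note] As a rough illustration of the quantize-then-compile step described above, here is a minimal sketch (not Himax's actual toolchain): post-training int8 quantization with the TensorFlow Lite converter, followed by Arm's Vela compiler. The model object, calibration data and the Ethos-U55 accelerator configuration are placeholders or assumptions to adapt to your own project.

```python
# Minimal sketch of the int8 + Vela step -- not Himax's toolchain. "keras_model"
# and "calibration_images" are placeholders for your trained model and a few
# hundred representative input samples.
import subprocess
import tensorflow as tf

def representative_dataset():
    # The converter runs these samples through the model to calibrate int8 ranges.
    for image in calibration_images:
        yield [image[tf.newaxis, ...]]

converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())

# Compile for the Ethos-U55 so supported ops are offloaded to the NPU and the rest
# fall back to the Cortex-M core. The accelerator config value is an assumption --
# check the Vela documentation for the one matching your WE2 build.
subprocess.run(
    ["vela", "model_int8.tflite", "--accelerator-config", "ethos-u55-64"],
    check=True,
)
```

Vela writes out a .tflite containing Ethos-U custom operators; that file is what then goes into the firmware build for the board or the Fixed Virtual Platform.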

Speaker 1:

So let's move on to the strategic partnerships that I talked about before. We are really lucky to have the support of some industry leaders across software toolchains, SaaS and algorithms, and all of this is backed by a supportive, passionate and active community, which really helps to nurture an ecosystem where the WE2 and new use cases for it can develop. One strategic partnership that I would like to highlight before I move on to the ecosystem is Grove Vision AI. Collaborating with Seeed Studio, we have developed an AI module called Grove Vision AI, which is powered by the Himax WE2. It's an AI module that is fully compatible with most of the Seeed sensors, is fully open source, and offers a standard CSI interface. And one child of Grove Vision AI is what you see on the right: the SenseCAP Watcher. It's an actual physical AI device that can see, hear and comprehend. It has on-device, private AI deployment, which is great, and it's really exciting to see how our AI processors, and WiseEye in general, are being used to create products like these that are available not only to industry and manufacturing settings but also for people to deploy in their private space.

Speaker 1:

So, looking at the ecosystem and what we have been able to do for WiseEye, we have some in-house efforts by our team. We have been working on both vision and voice. In vision, we have people detection from NVIDIA TAO, and we have object detection and pose detection from Ultralytics. In object detection we actually have 80 classes, which is pretty impressive. And from Google MediaPipe we have face meshing. All of these models that you see on the screen are highly optimized for the WE2, and they also offer competitive frame rates. In voice, we have been able to run a transformer encoder-based architecture, and with that we have keyword spotting. So it's really exciting. We try to push our AI processors and ASICs to the limit and see what can work on them and what cannot, and this is just a result of that.
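
[Editor's note] Before putting a converted model on hardware, a common desk check is to run the int8 TFLite file on a host machine with the stock TFLite interpreter to confirm shapes and get a rough latency number. This is a generic sketch, not Himax's benchmarking flow, and host timings say nothing about real WE2 frame rates.

```python
# Quick host-side smoke test of an int8 model's output shape and latency.
# Use the pre-Vela int8 file: the Vela output contains Ethos-U custom operators
# that the stock TFLite interpreter cannot run. Host numbers do NOT reflect
# on-device WE2 performance; they only catch gross problems early.
import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Random int8 data with the model's expected input shape is enough for this test.
dummy = np.random.randint(-128, 128, size=inp["shape"], dtype=np.int8)

start = time.perf_counter()
for _ in range(100):
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
mean_s = (time.perf_counter() - start) / 100

print("output shape:", interpreter.get_tensor(out["index"]).shape)
print(f"mean host latency: {mean_s * 1000:.2f} ms")
```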

Speaker 1:

So, the ecosystem. The contributions by Seeed Studio, with whom we have developed the Grove Vision AI, include SenseCraft AI. It's an easy no-code deployment platform in which you just plug and play the Grove Vision AI, and you have the option to choose from over 380 community models. These models include license plate detection, ship detection, horse detection, and this is where the power of the community really lies, which Pete mentioned at the start of the seminar. Then, looking at Ultralytics, we have Ultralytics Hub, which is also a no-code platform. They specialize in you-only-look-once, or YOLO, models. You can upload a custom dataset, or choose from one of the available datasets, and you can pick from a variety of pre-trained YOLO backbones, from nano to XL. In our case, we showed object and pose detection in our repository. You set the training parameters, you get a model that is up to the mark, and you can finally deploy that model using the ML workflow that we saw before, or use the Grove Vision AI repository to do so.
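
[Editor's note] For developers who prefer a script over the Hub UI, roughly the same flow can be reproduced with the Ultralytics Python package. The dataset YAML, image size and epoch count below are placeholders, not values from the talk, and the exported int8 TFLite model would still need to go through the Vela/WE2 conversion flow described earlier.

```python
# Script-level equivalent of the Hub flow: pick a small pretrained YOLO backbone,
# fine-tune it on a custom dataset, check accuracy, and export an int8 TFLite model.
# "my_dataset.yaml" is a placeholder for your own dataset definition.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                            # nano backbone; s/m/l/x also exist
model.train(data="my_dataset.yaml", epochs=50, imgsz=192)
metrics = model.val()                                 # compare mAP against your target
model.export(format="tflite", int8=True, imgsz=192)   # int8 TFLite for the MCU/NPU flow
print("mAP@0.5:", metrics.box.map50)
```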

Speaker 1:

Finally, moving on to Edge Impulse. Edge Impulse recently integrated the WiseEye2 into their line of supported AI processors, and this is a really exciting partnership. It's a no-code platform again, in which you can collect data during the data acquisition stage. You can choose to collect the data with your phone, your MCU, your laptop, whatever you want. You can also generate synthetic data, as Dimitri highlighted before. Then you move on to the model-defining, or model structure, stage, in which you design the impulse: what the model will look like, what the input features will be, what the intermediate stages are. And then, finally, you train and test the model. They also offer an AutoML feature in their console, with which you can produce a model that is highly accurate and meets your accuracy requirements. Finally, you deploy the model using a C++ deployment package, or scan the QR code to see how your model will perform when it's on your device, and that's great.
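
[Editor's note] As a hedged sketch of what the data-acquisition step can look like when driven from a script instead of the Studio UI: Edge Impulse exposes an ingestion service for uploading labelled files with a project API key. The endpoint URL, header names and file details below reflect my understanding of that API and are assumptions to verify against the current Edge Impulse documentation.

```python
# Hedged sketch: upload a labelled WAV clip to an Edge Impulse project's training
# set via the ingestion service, rather than recording it through the Studio UI.
# Endpoint and header names are my assumptions -- verify them in the EI docs.
import requests

EI_API_KEY = "ei_..."            # placeholder project API key
FILE_PATH = "keyword_yes_01.wav" # placeholder audio sample

with open(FILE_PATH, "rb") as f:
    response = requests.post(
        "https://ingestion.edgeimpulse.com/api/training/files",
        headers={
            "x-api-key": EI_API_KEY,  # project API key
            "x-label": "yes",         # label attached to this sample
        },
        files={"data": (FILE_PATH, f, "audio/wav")},
    )
response.raise_for_status()
print(response.status_code, response.text)
```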

Speaker 1:

So I've reached the end of my talk. We have a powerhouse of an AI processor, powered by an Arm Ethos-U55 NPU and a dual-MCU configuration, and we are really excited that we have been able to innovate across different horizontals and not just one vertical. The ecosystem that this chip offers is just unprecedented, and the collaborations with Grove Vision AI and other ecosystem partners are really exciting, as people have been able to deploy their models and their MCUs in both private and industrial settings. So that's it from me. Thank you so much. Discover the latest updates: we keep posting news about WiseEye and about Himax on our LinkedIn page, and, if there are any developers here, they would be excited to see the updates we frequently post on WiseEye Plus on our GitHub. And that's it, thank you, everybody. Awesome, thank you.