EDGE AI POD

Revolutionizing Automotive AI with Small Language Models with Alok Ranjan of BOSCH

EDGE AI FOUNDATION

Unlock the future of automotive AI with insights from Alok Ranjan of Bosch India, presenting joint work with his colleagues Ashutosh and Guru Prashad, as they unravel the transformative potential of Small Language Models (SLMs) in vehicles. Discover how these compact powerhouses are reshaping the landscape of vehicular edge systems by offering customization and performance reliability, all while keeping data privacy at the forefront. Dive into the interplay between SLMs and their larger counterparts, Large Language Models (LLMs), and learn how together they address domain-specific tasks on the edge and complex computations in the cloud. This episode promises to equip you with a deeper understanding of why SLMs are the secret sauce for advancing automotive AI in this data-sensitive era.

We also spotlight the optimization journey of the TinyLlama 1.1B model. Learn about the fine-tuning process that brought about a roughly 700% improvement in throughput and a drastic reduction in model size. Uncover the world of edge deployment using devices like the Raspberry Pi 4B and Jetson Orin Nano, and explore the audio and chat functionalities that are setting new standards in vehicular AI. Finally, imagine the future of personalized interactions in cars, where generative AI transforms the way we communicate and engage with our vehicles and surroundings. This episode is a treasure trove of forward-thinking solutions and innovative ideas, perfect for anyone eager to explore the cutting edge of automotive AI.

Send us a text

Support the show

Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org

Speaker 1:

And now it's time for a presentation titled "Bringing Small Language Models on Edge: Learnings and Challenges from the Automotive Domain." The floor is yours.

Speaker 2:

Thank you. Thank you, Danilo. Good morning everyone, happy to be part of this event. First of all, thank you to the tinyML Foundation for having this unique conference, and thanks to all the organizing committee members for this opportunity.

Speaker 2:

So I work with Bosch India, and we do research around Gen AI and TinyML for both the mobility and non-mobility domains. This work is a joint collaboration with my colleagues Ashutosh and Guru Prashad. What I'm going to talk about today is how you can leverage SLMs to address domain-specific problem statements, what kind of performance we see from real-time deployment of such an enterprise-scale solution, and what challenges we faced along the way. Now let us understand the current motivations for talking about SLMs and how SLMs help unlock new value propositions. We have seen a lot of buzz and many use cases around generative AI and the larger LLMs, whose sizes keep increasing. But when it comes to customization, when we want to solve for a specific smaller or domain-specific dataset, we have seen performance concerns. We don't really have control over how the information is processed, or whether the performance will be reliable enough to take decisions or offer new solutions to the customer or end user. With an SLM, we have much more customization support.

Speaker 2:

Another critical aspect from the community side is that we don't really have the infrastructure available to run generic LLMs, and when it comes to running those LLMs on the edge it is very challenging, not so much in terms of cost but in terms of energy consumption per inference and from a sustainability point of view. This is even more critical for the automotive domain because, when you want to integrate something on the vehicular edge, we simply don't have the compute we typically see in data centers. So this is where, for dedicated or smaller tasks, I truly believe SLMs are going to play a major role, and we can build a lot of interesting solutions around them. In terms of performance, there is no doubt that on a domain-specific dataset the performance compared to generic LLMs will be better, and I will demonstrate this through a video shortly. From the user's perspective, privacy and the various regulations around privacy and data security are a concern today. In the current scenario, we have solutions hosted on the cloud or a centralized server, but users are not really keen to adopt such products where their sensitive and private data is involved. With the help of SLMs, it is possible to address some of that problem space around privacy and bring more trust into your product offerings and solutions. So this is another driving factor around SLMs.

Speaker 2:

This is from one of the reports published in 2024, and the market growth it describes is also very interesting. We will see a lot of solutions coming for different industries, including how the automotive domain is going to leverage SLMs and larger LLMs coexisting together. This is where we see a new AI architecture being adopted in the automotive domain. We will see scenarios where solutions are deployed on your vehicular edge, with a personalized SLM to address your domain-specific tasks and a multi-modal SLM to understand your different data inputs. We also envision that integrating, say, a mixture of experts on the vehicular edge will play a crucial role in how solutions emerge once connected and automated vehicles are on the road. We are not just going to have SLMs focused on customized or smaller tasks; when it comes to compute-heavy, complex tasks, we again have to go to the centralized architecture. This is where the interplay between the SLM and cloud-hosted centralized services comes in very handy, giving us the opportunity to deal with all sorts of data, be it image, text, LiDAR, whatever it is. In fact, recent work published in 2024 talks about leveraging generative AI for technologies like V2X, where the vehicle exchanges a lot of data with the infrastructure.

Speaker 2:

The critical question from the automotive domain side is this: when we envision the SLM integrated on the vehicular edge and interplaying with the cloud-centric architecture, what kind of use cases do we want to execute on the vehicle itself, and what do we want to offload to the cloud? So the future, and the solution perspective, around SLMs working alongside a centralized LLM hosted on the cloud will be very critical. In addition, since the vehicles will be on the road, it is not always true that the data you encounter matches what was seen as part of the training data. We will also see challenges around situational events, which are more dynamic in nature. When you have such dynamic data, it is very challenging to process that information on the vehicular edge itself, where the SLM alone will not work, and in that case we believe situational events are better handled by the centralized architecture or a cloud-native solution.

Speaker 2:

So now let me share the problem statement we are solving. Consider a scenario: the connected features inside a car are increasing day by day, and we see a lot of solutions where, as long as you have connectivity and a stable network, you can access all of those integrated features, so the vehicles are getting more feature-rich. But what happens when someone is driving and there is no stable network connectivity? Or, let's say, when you are driving from point A to point B, you will have intermittent network availability, and the bandwidth needed to, say, handle your audio input or your images will not be sufficient to serve the solutions the driver or user needs to consume.

Speaker 2:

In addition, we also want to digitalize the owner's manual so that, while you are driving, you have a way to understand the different warranty policies or the operational and maintenance side of the vehicle. With the help of the SLM, it is possible to offer offline services, and users benefit; they don't have to worry, because these are really smaller tasks. Let's say I'm just trying to understand how to operate in eco mode, or how to configure the 360-degree camera, which may not be very familiar to me, but I just want to set it up as per my requirements. Having such a request from the user while there is no stable network would be a very unpleasant experience. But when we have solutions built around an SLM and deployed on automotive-grade hardware, it is possible to handle those kinds of scenarios.

Speaker 2:

Then in scenario two, again, we envision the SLM enabling offline services, and it is possible to deploy it on your mobile phone. Just imagine that you are stuck because your vehicle has broken down and you need some support. You may click a photograph and try to understand what may have gone wrong and how you can fix it. When I say fix it, it's not about replacing the service person; it's just about understanding what went wrong. And possibly you can also get the contact information of the nearby service operators or dealers who are available there. So that's scenario two. Could you please go back? And in scenario three, we want to enhance the in-vehicle experience, where small agents can be integrated into the vehicle. Imagine a scenario where you are driving on the road, one of the windows is open, and your AC is turned on. You can get a prompt from a Gen AI agent integrated in your vehicle saying the window is open and the AC is on, do you want me to close the window? You can do that. Such small solutions, which we will see in the connected-vehicle ecosystem, can be developed using the SLM.

Speaker 2:

Now, for today's talk, what we will see as part of the demonstration is scenarios one and two, which are currently hosted on the edge device. Scenario three is work in progress, which we hope to complete in the coming months, and then we can probably discuss along those lines as well. So, quickly, about making your owner's manual digital: these are just some numbers on how the data looks. We have both structured and unstructured data when dealing with the PDF file. What we have done is fine-tuning on the TinyLlama 1.1B chat model; we wanted to develop this conversational agent. We processed and extracted this information, but in the current demonstration we have not done extraction or understanding of the images and the text embedded in those images; that was not covered in this work, though it is possible and is where we can make the solution more mature. So the scenario where you want to understand a lot of configuration details is possible, and we will see the same now.
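For orientation, here is a minimal sketch of the kind of manual-processing step described above, assuming pypdf is used to pull raw text out of an owner's manual PDF; the file name and chunking are illustrative assumptions, not the presenters' pipeline, and image-embedded text is skipped, as noted in the talk.

```python
# Minimal sketch: extract raw text from an owner's manual PDF and split it into
# coarse chunks for later fine-tuning or retrieval. File name and chunk size
# are illustrative assumptions, not the pipeline used in the talk.
from pypdf import PdfReader

reader = PdfReader("owner_manual.pdf")
pages = [page.extract_text() or "" for page in reader.pages]
full_text = "\n".join(pages)

# Naive fixed-size chunking; real structured/unstructured handling would be
# more careful about tables, headings, and warning symbols.
chunk_size = 1000
chunks = [full_text[i:i + chunk_size] for i in range(0, len(full_text), chunk_size)]
print(f"{len(reader.pages)} pages -> {len(chunks)} chunks")
```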

Speaker 2:

So TinyLlama 1.1B is used here, starting from the base model. We did the fine-tuning on our server machine, and once that was done, we optimized it. We followed a post-training quantization approach, converting first to 8-bit to understand the performance and then to the 4-bit variant. Can we go next? Yeah, so these are the benchmarking results. We have done a lot of benchmarking, not only on the fine-tuned model. I don't know if you can see here, the green one.
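As a rough illustration of the fine-tuning step mentioned here, below is a minimal LoRA-style sketch for the TinyLlama 1.1B chat model; the dataset file, column name, and hyperparameters are assumptions for illustration, not the presenters' actual setup.

```python
# Sketch of a LoRA fine-tune of TinyLlama-1.1B-Chat on owner-manual text.
# Dataset path, column name, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(base)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                                         task_type="CAUSAL_LM"))

# Hypothetical JSONL file with {"text": "<question + manual excerpt + answer>"} records.
ds = load_dataset("json", data_files="owner_manual_qa.jsonl", split="train")
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=512),
            remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tinyllama-owner-manual",
                           per_device_train_batch_size=4,
                           num_train_epochs=3,
                           learning_rate=2e-4),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
model.save_pretrained("tinyllama-owner-manual")  # adapter weights; merge, then quantize
```

After training, the adapter would be merged into the base model and exported to 8-bit and 4-bit formats for the edge targets, which corresponds to the post-training quantization step described above.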

Speaker 2:

We have leveraged the base model, which had a size of 2.2 GB. We deployed this use case on both CPU and GPU architectures, considering the Raspberry Pi 4B, Pi 5 and Jetson Orin Nano. On the Jetson Orin Nano, the maximum throughput we received with the 1.2 GB 8-bit model was 26 tokens per second, but for 4-bit we brought the size down to 637 MB and received an average maximum throughput of 40 tokens per second. So in terms of model size we achieved a 71.8 percent reduction, and throughput improved by 8X over the base model, which in percentage terms is around a 700 percent improvement. The average load time of the model is just one second. Along with TinyLlama 1.1B, we also tried deploying Mistral 7B, for which we recorded 4-bit performance of 9 tokens per second, and Llama 2 7B, which we also deployed on the edge with a throughput of around 8 tokens per second.
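As a sanity check on these numbers: 2.2 GB down to 637 MB is roughly a 72 percent size reduction, and an 8X speed-up corresponds to about a 700 percent improvement. Below is a rough sketch of the kind of tokens-per-second probe that could produce such figures on an edge board, assuming a 4-bit GGUF export of the model is already on the device; the file name and prompt are hypothetical.

```python
# Rough tokens-per-second probe on an edge board (e.g. Raspberry Pi or Jetson
# Orin Nano). Model file name and prompt are hypothetical placeholders.
import time
from llama_cpp import Llama

llm = Llama(model_path="tinyllama-owner-manual-q4_0.gguf", n_ctx=2048)

prompt = "How do I enable eco mode?"
start = time.time()
out = llm(prompt, max_tokens=128)
elapsed = time.time() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tokens/s")
```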

Speaker 2:

So, from the responses we have seen so far, it is much faster at addressing the problem we are targeting. In our solution we have integrated chat along with audio input, so you can interact through your voice, and both text-to-speech and speech-to-text are part of the solution. So that covers the benchmarking that was done. Now I will play the video, if I can do that. Yeah, so can you see my screen? Can someone confirm? I'm just trying to play the video. No, okay, so it looks like it's split. Yeah, so here you can see the performance deployed in real time on the Orin Nano.
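For context, a minimal sketch of the kind of offline voice loop described here, assuming an on-device speech-to-text model, the quantized SLM, and a local text-to-speech engine; the specific libraries and file names are illustrative choices, not necessarily what the demo used.

```python
# Illustrative offline voice loop: speech-to-text -> SLM -> text-to-speech.
# Library choices (openai-whisper, llama-cpp-python, pyttsx3) are assumptions.
import pyttsx3
import whisper
from llama_cpp import Llama

stt = whisper.load_model("tiny")                       # small STT model for the edge
llm = Llama(model_path="tinyllama-owner-manual-q4_0.gguf", n_ctx=2048)
tts = pyttsx3.init()

def handle_voice_query(wav_path: str) -> str:
    question = stt.transcribe(wav_path)["text"]        # speech -> text
    answer = llm(f"Question: {question}\nAnswer:", max_tokens=128)["choices"][0]["text"]
    tts.say(answer)                                    # text -> speech
    tts.runAndWait()
    return answer

print(handle_voice_query("driver_question.wav"))       # hypothetical recording
```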

Speaker 2:

We asked multiple questions, and you can observe how fast it is and how precise and concise the answers are once the model is trained on the custom dataset for the owner's manual. So this is one of the questions, and these are the settings we used. When we compared the results against the generic TinyLlama 1.1B 4-bit quantized model, our benchmark showed an accuracy of around 93 percent. We still see scope to improve the performance further, because certain data is not processed in the right way, and in those cases the SLM does not give a proper answer. I would not say it is completely wrong, but it gives a half-cooked answer, partially true but not reliably so. So that is the fine-tuned model.

Speaker 2:

We would also like to show you the corner cases where it started failing, namely where there was a mix of text and numerical values. For example, when we ask for a particular address, ideally the response should be a follow-up question. When I ask for the address of one of the OEM's locations, it does give an answer, so it is not that it fails to respond, but ideally, since a state can have multiple service providers, the answer should ask which city or area you are interested in before giving that response.

Speaker 2:

So we believe that, as our next step, if we do the data processing in the best way possible, we can capture these corner cases you are seeing here, where it does not give a proper answer. In addition, we asked how many USB sockets are available in the car, and there too it failed to answer. The performance of both the base model and the optimized generic model on these questions was really poor, but as far as performance is concerned, the SLM is outperforming them for our domain-specific problem statement, working in real time with faster response times.

Speaker 2:

So here, for example, when you ask one of the questions about the USB socket, it just says to replace it. And in the next question, when I ask how much current output we can get from the USB-C socket, again it does not give a proper answer. These are some of the corner cases we observed while solving this kind of problem statement. I think this gives you a real-time feel of the deployment and how fast it is to run such solutions on the edge, and you can take it further to make it more robust so that these corner cases go away. For example, your user manual can describe multiple warning indicators, but when you just ask about the orange indicator on the infotainment system, ideally it should ask what kind of symbol you are seeing; instead, it simply gives the answer it considers most accurate.

Speaker 2:

Another thing we observed is that the context window is very limited, because TinyLlama 1.1B has a small context window size. If we expand it, we certainly want to run some design-of-experiments studies around it so that we understand the performance on the real device and how to make the solution more accurate and robust. So with this, we'll switch back to the presentation. All right. This part we have covered, this we have already shown. Yeah, this is completed, all right.

Speaker 2:

So from the automotive point of view, we really need support on the standardization side. What I mean is this: when you want to deploy in the automotive domain, what kind of use cases are they, and do you want to realize safety-critical applications or generic smaller tasks? People have also been doing research around connected and autonomous vehicles to understand new scene generation; once you have new scenes on the road, do you want to take decisions on top of them? And if you do, what inferences are you getting from these solutions? You really have to follow a standardized way, and this is where, as a community, we have to come together. We have to build more partnerships where we exchange knowledge about which use cases were addressed, what hardware was used, how the data was processed, and whether piloting was done or it was only run on synthetic data. So, especially for the automotive domain, standardization will play a major role in bringing generative AI to the edge for the vehicle ecosystem.

Speaker 2:

Now, whenever a new-age technology comes to market, it is also very critical that we build trust among users. We have already seen a lot of increasing concern about data privacy for generative AI and AI products all around the world. But if we have privacy by design when designing the new architecture for the SLM integrated in the vehicle, we can give the user that trust: this is how your data is processed, we are not leveraging your data in that way, we are not sending it to a centralized server, we are doing all the computation on the device and your data never leaves it. How you build that trust among users is another concern for the automotive domain, and we need efforts around building architectures that cover those areas.

Speaker 2:

We have also seen reports where researchers identified that the performance of SLMs is hardware dependent. If that is the case, what happens when one premium-segment vehicle has strong compute and another has much less? With the same solution deployed on both, are you going to get similar performance? How do we do this kind of benchmarking so that we understand where your solution or SLM will be deployed in the automotive domain, not only for the design of the SLMs but also for the optimization of inference and the further optimizations you have to deal with? Again, we truly believe that SLMs and LLMs will interplay with each other. Deciding which use cases belong in that collaboration will certainly need a lot of research effort, with experimental insights and plenty of discussion around what should be offloaded to the cloud and what should run on the vehicular edge. And, yeah, these are the key takeaways.

Speaker 2:

What we have observed is that SLMs are promising when it comes to domain-specific tasks. We have identified that with better training data it is possible to address the corner cases as well, and the future of the connected automotive experience will be a hybrid architecture, as the recent efforts I just spoke about suggest. We also need fair benchmarking results, because most of the automotive domain solutions we have seen are built on closed datasets. If we as a community can bring openness to the datasets through community research, it is possible to address certain open questions which may not be covered by the researchers who have worked on them so far. So having fair benchmarking results shared with the community will play a major role in realizing this for the automotive domain. With this I will stop here, and I am happy to take questions. Danilo, you are on mute, I can't hear you.

Speaker 1:

No, no, okay. Thanks a lot, Alok. Your presentation was really interesting, not only because of the topic, SLMs, but also because it is coupled with the automotive domain. So thanks for clarifying and laying out the need for Gen AI applied to the automotive context, starting with SLMs. There are a couple of questions before we close this part of the agenda. Let me quickly find it; it's from Chanseok Kang.

Speaker 1:

Is there any protection against hallucination while using an in-vehicle SLM? I think it relates to the safety aspect, which is super important in automotive. Any comment from your side, Alok?

Speaker 2:

Yeah, so we really believe that prompt engineering is going to play a major role, and when you get the response from the SLM you can bring in the context part, so that the model has grounding to derive the relationship for the output it is going to share. Ultimately, though, it needs a lot of experiments, and it would be very early to say. As I mentioned, if you want to go for safety-critical applications, you really have to understand what kind of decision you are going to take based on the inference that comes out. A lot of design-of-experiments work needs to be done on the prompt engineering and on bringing in the context, and this is where we see the potential to address hallucination in SLMs.
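For illustration, a minimal sketch of the kind of context-grounded prompting the answer refers to: retrieved owner-manual excerpts are injected into the prompt, and the model is instructed to defer when the context does not contain the answer. The retriever and template are assumptions, not the presented system.

```python
# Sketch of context-grounded prompting to limit hallucination. The retriever
# and prompt template are illustrative assumptions.
GROUNDED_PROMPT = """You are an in-vehicle assistant. Answer ONLY from the
owner-manual excerpts below. If the excerpts do not contain the answer,
say you are not sure and ask a clarifying question.

Excerpts:
{context}

Question: {question}
Answer:"""

def grounded_answer(llm, retrieve, question: str) -> str:
    passages = retrieve(question, top_k=3)  # hypothetical retrieval over manual chunks
    prompt = GROUNDED_PROMPT.format(context="\n".join(passages), question=question)
    return llm(prompt, max_tokens=128)["choices"][0]["text"]
```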

Speaker 1:

I also have a question for you, Alok. It's clear, and thanks for sharing your view on the use cases. The digital manual, the digital assistant, these are super important, and I see a really strong need for this kind of support in automotive vehicles. I think it's also a matter of fast adoption, and I hope it will be adopted very soon. What else? What's next? I mean, is the digital assistant only about understanding my questions and providing answers so that I don't have to look at the manual, or is there anything wider than this in your future vision?

Speaker 2:

Yes. So certainly, for future applications, we see potential around Gen AI or smaller agents that take care of certain functionalities integrated in your vehicle. For example, I cited the case where your window is down while you are driving; that can be controlled through those SLMs. But the larger direction the OEMs are thinking about is a more empathetic scenario, where you interact with the conversational agent and do a lot of personalization. For instance, if you are willing to expose your data and you have stable network connectivity, you could also compose emails, or get a perception of your surroundings when you are driving out of a parking yard; once you have started, it can give you the surrounding information as well.

Speaker 2:

People have started talking about such emerging use cases, but it is still in the progression phase, and we will see how adoption develops. Again, I also believe SLMs are not only going to help with what we have discussed today. You are also going to have cloud-native LLMs to bring more feature-rich solutions for users in the vehicle, because when we move toward the on-road autonomous or connected vehicle segment, say when you are dealing with the infrastructure, it is in those directions too that the SLM and cloud-native LLMs will play a major role.

Speaker 1:

Thank you so much, anok. It was really interesting your presentation and then the way you spoke about thanks again for your availability and to contribute to the forum. Uh, I think it's time to switch. Thank you, thank you everyone. Yeah, thanks again. Great, thank you.