
AIAW Podcast
E143 - Revolutionizing AI and Autonomous Systems - Pier Luigi Dovesi
In Episode #143 of the AIAW Podcast, Pier Luigi Dovesi, AI Scientist at Silo AI and Co-Founder and Chair of the Board at The Good AI Lab, discusses his work across smart cities, autonomous driving, and generative AI. He explores the ongoing challenges in automotive AI, comparing Tesla and Waymo and highlighting the importance of end-to-end approaches for autonomous driving. Pier Luigi also sheds light on synthetic data, and on how research from groups like The Good AI Lab can guide ethical, impactful AI development. The conversation covers diversity in AI, the future of AGI, and how open-source frontier models may shape an inclusive tech landscape, offering insights on how innovation can steer us toward a more equitable future.
Follow us on YouTube: https://www.youtube.com/@aiawpodcast
CVPR.
Pier Luigi Dovesi:But like all AI conferences nowadays, I think the computer vision name is more of a legacy thing. Of course most of the papers will be related to computer vision. But the taxonomy that we had in AI back in the days, computer vision scientists, NLP scientists, that really doesn't hold anymore. It's all multimodal, the same thing. It's almost anachronistic nowadays if you say, ah, I'm a computer vision expert. I mean, of course you are, but I think nowadays you don't even have a deep understanding of computer vision itself if you're not truly multimodal, meaning that you have a really solid understanding all across the board.
Anders Arpteg:I'm going to have a question added to this list later about latent space reasoning, but I'll do that later.
Henrik Göthberg:It's a pixel story.
Anders Arpteg:Yeah, pixel-ish. Okay, but in any case, cool. So you submitted one or two papers?
Pier Luigi Dovesi:We submitted two works, two papers.
Anders Arpteg:Cool, and you will know later this month if you're accepted, or?
Pier Luigi Dovesi:Well, not the real acceptance, it's going to be just the initial reviews, but usually they're quite telling. You can smell if you're going in or not, most of the time. And then we're going to have our rebuttal.
Anders Arpteg:Have you done any reviewing yourself?
Pier Luigi Dovesi:Yeah, not so much, because I try to dodge it as much as possible.
Anders Arpteg:It takes so much time. I've done a number of them for ICLR and whatnot, but it's a lot of work.
Pier Luigi Dovesi:It's a lot of work. I think it's such important work, though, and doing it properly is part of your duties, so to speak.
Henrik Göthberg:Yeah, it's part of your responsibilities. So what were the angles of the papers? Can you give us a hint?
Pier Luigi Dovesi:Both of them navigate through the topic that, I would say, has been my main research topic all along: domain adaptation. Domain adaptation is, effectively, kind of the problem that curses AI. I think it's the major problem of AI. What distinguishes any human, any living creature, from an AI is that we are not static. We learn, we adapt to the situation. We have evolved to follow this pattern, the survival of the fittest, effectively.
Anders Arpteg:Some people, at least me, differentiate between learning and adaptation. Learning meaning something that should change the parameters of the model. Is that what you mean now, that adaptation is something that does not necessarily change the parameters? Or, well, of course adaptation does change the parameters too.
Pier Luigi Dovesi:I want to keep it as a very open term right now, because originally it is literally changing the parameters of the model. But now there is much more going on, because a model doesn't take its form only during training. We know that a lot of what a model can do is also happening during inference. So there's kind of a leakage of the capabilities of the model from the training phase to the inference phase. Think about chain of thought, or diffusion, actually.
Anders Arpteg:Are you moving towards more of that? Can we go deeper now into the topics of these papers, even though you can't speak about them?
Henrik Göthberg:Just say stop if you can't speak about it. If there's something you can't say about the topics, that's okay.
Pier Luigi Dovesi:Oh, please, I can go into it. So I will put it a bit in context first. Both of them take a multimodal approach to domain adaptation. We don't focus specifically on a computer vision problem; we focus on problems which are a mixture between computer vision and language, meaning, I would say, semantic understanding of a scene. I see something and I need to understand what each object is, but I'm not constraining myself to a fixed set of objects. Is it some kind of robotic application? It could be applied to everything: to autonomous driving, to robotics, to anything that involves understanding an image and then giving an interpretation of that image.
Henrik Göthberg:And even a moving image, or as you move around.
Pier Luigi Dovesi:Well, we take just a single image in this work, but of course, if you have a sequence of images, then that's the next level of the same problem.
Pier Luigi Dovesi:Yeah, I guess. Of course that complicates things, right, because you have one more dimension. An image you can see in two or three dimensions, depending on whether you count the colors as the third dimension, and then you go into 4D if you add time. It makes the problem more complicated, but at the same time it doesn't intrinsically transform it.
Anders Arpteg:And so is it more an online learning kind of thing?
Pier Luigi Dovesi:Yes, that's what we do. So the idea, and I'm going to go now into one of the two works, is that we want to model adaptation as giving a model, effectively, a library of, let's say, specialized small models. When the model gets an image from the environment, we introduce an observer module that sees this image, tries to understand what this image is about and its semantic meaning, then goes into the library that we filled with smaller models, and picks which of those models are relevant for it.
Anders Arpteg:So it's a kind of mixture of experts, but in a more deliberate sense.
Pier Luigi Dovesi:Yeah, you can also see it, if you want, as a RAG for vision. That can be another way of seeing it, with the difference that you don't pick context, like text, to put in your context window; you pick parameters. And you don't take one single model, but a number of them, and then fuse them. The inspiration for it is very simple.
Pier Luigi Dovesi:Just imagine, because the thing is this: no matter how big your library is, let's say you have a hundred thousand models in it, every book of this library could represent a data domain. So imagine that this model is deployed in an autonomous car. Then you see snow, and maybe you've never seen snow before. What happens is that you take this image with snow, you give it to the observer module, and the observer will go into the library and look for which of the many, many books that you have in this library is about snow.
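To make the retrieval loop concrete, here is a minimal Python sketch of what such an observer-plus-library step could look like. The toy library, embedding sizes and function names are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical library: one lightweight specialist per data domain,
# stored as (domain embedding, specialist parameters).
library = {
    "snow":  (torch.randn(512), {"w": torch.randn(256, 256)}),
    "night": (torch.randn(512), {"w": torch.randn(256, 256)}),
    "rain":  (torch.randn(512), {"w": torch.randn(256, 256)}),
}

def observer_select(image_embedding: torch.Tensor, top_k: int = 2):
    """Score every 'book' in the library against the incoming image and
    return the most relevant specialists with normalized fusion weights."""
    names = list(library)
    domain_embs = torch.stack([library[n][0] for n in names])
    sims = F.cosine_similarity(image_embedding.unsqueeze(0), domain_embs)
    top_sims, idx = sims.topk(top_k)
    weights = top_sims.softmax(dim=0)
    return [(names[i], w.item()) for i, w in zip(idx.tolist(), weights)]

# An unseen snowy-night frame might yield something like
# [("snow", 0.55), ("night", 0.45)], to be fused downstream.
print(observer_select(torch.randn(512)))
```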
Anders Arpteg:So it's a more advanced vector search? Instead of just comparing vectors, you actually have some AI model doing it.
Pier Luigi Dovesi:Yeah, so it's effectively a vector search, because we use a model called CLIP, of course. It's multimodal; it stands for Contrastive Language-Image Pre-training, if I spell it out right. But anyway, it's a paper from OpenAI published in 2021, and it's amazing because it projects both images and text into the same latent space. So if you have an image and you have text, you project them into the same latent space, and they will be close together if they have the same semantic meaning.
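For reference, the openly released CLIP weights can be tried in a few lines via Hugging Face transformers; the image file name and the prompt texts below are placeholders.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("street_scene.jpg")  # placeholder file
texts = ["a snowy road at night", "a sunny highway", "a pedestrian crossing"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Image and texts now live in the same latent space; a higher score
# means a closer semantic meaning.
print(out.logits_per_image.softmax(dim=-1))
```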
Henrik Göthberg:So you can really then merge semantics, vision and text Exactly.
Pier Luigi Dovesi:That's kind of why it was transformative, I think.
Anders Arpteg:I mean, it solved a huge problem, and it's one of the starting points for multimodal. But actually I think it's one of the most commercially interesting models as well, because it makes search so much more advanced and useful, and it's actually super simple to use.
Henrik Göthberg:But for me, from the outside, can we say that? Everyone wants to do multimodal, but isn't CLIP like a pivotal moment in multimodal AI? I see it as one.
Pier Luigi Dovesi:I think there's not been so much fuss about it in the news, because of course it's a bit too techie. But it's been big in the field.
Henrik Göthberg:In the field it hasn't been represented enough. We used a lot of this at Torian.
Anders Arpteg:It was one of the biggest ones.
Henrik Göthberg:So we used CLIP a lot, but we haven't talked about it as extensively as other models.
Pier Luigi Dovesi:No, no, because usually it's kind of a component within something. But even in autonomous driving it's essential. I mean, when people talk about autonomous driving, you always think about what is in the car, effectively the algorithm that is running in the car, and you don't think about the whole pipeline, what Tesla calls the data engine. The data engine is effectively that huge and sophisticated pipeline that takes the data from the fleet and then converts it into something usable. And the problem of autonomous driving as we know it is... now I'm jumping the gun here.
Anders Arpteg:Right, right, I will stop, I will stop. Super cool, and actually we could continue for a long time.
Henrik Göthberg:This was an introduction and we got stuck into your papers.
Anders Arpteg:And we could do one hour on them. Best of luck with the papers, and please share if you get accepted to the conference later; that must be a big thing. Have you been published in CVPR before?
Pier Luigi Dovesi:We published already in equivalent conferences, in ICCV and ECCV. We also published in ICRA and 3DV. So there has been quite some cool stuff.
Anders Arpteg:Well, with that, we'd love to very much welcome you here. Let's see if I can say the name properly for once: Pier Luigi Dovesi. Yeah, that's perfect. Wow, nice, first time.
Henrik Göthberg:You can do P and I will do P. They both work.
Anders Arpteg:Well, I mean, as we heard already, you're an AI expert. Expert lead, I think it's called, at Silo, right?
Pier Luigi Dovesi:Yeah, right now, I think at AMD the official title is Senior Member of Technical Staff, which is kind of jargon. Sounds amazing; I guess that's the goal.
Anders Arpteg:Yeah, you're an AI expert, that's what it means. An AI expert, and a deep learning, automotive and autonomous driving expert, and so many more things. And we have a lot of very interesting topics to speak about. But before we go into that, it would be awesome to just hear a little bit more about yourself. Who is really Pier Luigi? How would you describe your background?
Pier Luigi Dovesi:Oh, wow, okay. So, as you say, I'm an AI scientist; that's my definition. I've always been interested in AI. I think I've been very lucky, because I was studying at the time when AI was really starting to accelerate, when there was the explosion of deep learning. I remember, before starting my master's, I watched this video from OpenAI. They were doing, at the time, this competition in video games; they released this model playing Dota, though I never played Dota myself.
Anders Arpteg:What year was this, approximately?
Pier Luigi Dovesi:2017, I think. And I was absolutely amazed by what it could do. It was able to beat champions in the game and be creative in finding solutions. And then, the same year, or maybe a couple of years before, there was AlphaGo. So that's what really inspired me.
Henrik Göthberg:When did the first GPT come out?
Anders Arpteg:Is it 18, 19? 19 maybe, was it? No.
Henrik Göthberg:GPT-1.
Anders Arpteg:No, but you're thinking about ChatGPT. That was GPT-3.5, it might be the one.
Pier Luigi Dovesi:If I'm not wrong, maybe 18.
Henrik Göthberg:I think so, because 17 and then 18. This is what I was hinting at. It's almost that era, right?
Pier Luigi Dovesi:I remember there was GPT-2, which was already, yeah, extremely inspiring.
Henrik Göthberg:Remember the story with the...
Pier Luigi Dovesi:18, yeah, 18 makes sense, and GPT-2 was probably 19. And there was the story of the unicorns that was all over the news. They asked GPT-2 to write... I mean, nowadays it's not news anymore, it's almost boring. But the story was: okay, can you write a news article about researchers finding unicorns? And the article was amazing. Very plausible. And at that time it was.
Anders Arpteg:How did you consider the progress of GPTs? Given the original transformer and BERT, and then the GPT models came about and they started to just accelerate in scale. Were you surprised by the efficiency of just scaling up GPT models in size?
Pier Luigi Dovesi:Well, let me think. I think we reached a bit the end of that; we milked that field as much as possible. That's why right now, maybe, everyone was expecting GPT-5 coming out and the amazing transformation of it, and instead the field is moving a bit away from just that. We're realizing that there is a limit to what you can do with just a prediction model. It's amazing, it's great, it's kind of the engine of it, but then you have to do something more.
Anders Arpteg:But you know, I'm a bit older than you, or much older. Before GPTs came about, and before we had the transformers, just scaling up the model and adding more parameters didn't work. The models were completely overfitting the data all the time and didn't generalize at all. And suddenly you have a model where you can simply increase data size and model size and it just works.
Pier Luigi Dovesi:Yeah, and to me that was super surprising, and there has not been much revolution in the architecture ever since. That's what surprised me.
Henrik Göthberg:But this is, you said it. This needed to happen and we needed to get the knowledge base, in that sense, filled up.
Anders Arpteg:But now, 2025 and onwards, we will use that, but we need to do the other parts too. Perhaps we have another pivotal switch. I mean, the transformer was a super big thing, of course, and now perhaps we are starting to see not just increasing the size of models and data, but actually changing the architecture for once.
Pier Luigi Dovesi:I hope so. It would be interesting to see. Now there are some papers coming out that are extremely promising for that. Like, I think two weeks ago there was this paper called Titans, just published, transformers with, I think, an adaptable memory, so they learn, memorize and forget at test time. That's very exciting. And then there is Transformer², which I was looking at yesterday, released by Sakana AI. Super cool, Japanese.
Anders Arpteg:Now we're jumping into rabbit holes here directly, but okay. So you got started, you're basically an AI scientist, you got interested in OpenAI's Dota in 2017, et cetera. And what happened after that?
Pier Luigi Dovesi:Yeah, I moved here to Stockholm to study at KTH. I studied for my master's at KTH; it was actually in robotics, because I come from more of an automation engineering background.
Henrik Göthberg:Was it Bologna you come from?
Pier Luigi Dovesi:Yeah, I studied my bachelor's both at Bologna University and in China. I did a kind of double degree.
Henrik Göthberg:Where in China?
Pier Luigi Dovesi:Tongji University, in Shanghai.
Henrik Göthberg:Oh, so you speak Chinese now?
Pier Luigi Dovesi:No, I was speaking a bit in China, because that's kind of necessary, but then I forgot everything. A sad development. But yeah, so I came here to study robotics. Then I joined a company here in Stockholm called Univrses, doing autonomous driving. They were doing a bit of consultancy when I joined; now they also have a product, focusing on smart cities. But I was always in the autonomous driving part.
Henrik Göthberg:You were at Zenseact at some point as well?
Pier Luigi Dovesi:Exactly, because Zenseact and Univrses were partnering. So I was a consultant for Zenseact for a very long time, working there.
Henrik Göthberg:So you were from Univrses and at Zenseact, effectively.
Pier Luigi Dovesi:Effectively, yes.
Anders Arpteg:And then you joined Silo at some point.
Pier Luigi Dovesi:Yeah, in 2023 I joined Silo, and effectively I'm still there. Silo got acquired this summer by AMD.
Anders Arpteg:If you could just give a super brief introduction to what Silo is. And then we'd love to hear the story about when AMD acquired Silo.
Pier Luigi Dovesi:Silo is, or was, let's say, the largest European pure-play AI service provider. In some sense it was a consultancy company, with a very academic profile, hundreds of combined years in AI, actually. It was really an incredible community. It was doing consultancy, especially focusing a lot on really cutting-edge AI projects, and that was really what inspired me to join them, effectively.
Anders Arpteg:And yeah, and when you were in Silo, what did you mainly work with?
Pier Luigi Dovesi:That was fun, because I was working on really every potential aspect of AI. I joined, and my first project was on Stable Diffusion. Then I worked on an application of autonomous driving, but not for cars, for really huge robots working in mines. Then I worked on LLMs, and LLMs on the edge, and then, I would say, many others. I can't talk about all of them, there are so many.
Henrik Göthberg:Trying to steal a scoop here.
Anders Arpteg:I think we should potentially do a pivot here and start speaking a bit about autonomous driving as well.
Henrik Göthberg:We were almost there in the introduction, so let's go back to that. We are not pivoting, we're going back. Okay.
Anders Arpteg:Returning to the topic, then. Perhaps one way to start: what do you think about the current challenges? We have known Elon to say that next year we'll have autonomous driving, for many, many years now, and not just Elon, right? What do you think the challenges with autonomous driving are?
Pier Luigi Dovesi:Well, okay, there are many, right. If we have to list them, safety is the number one. It's very obvious.
Anders Arpteg:So it's not as easy as, like, a music recommendation system that you just put out there.
Pier Luigi Dovesi:Yeah, and the second part is regulations, and that's connected with safety. And robustness in some cases, which... I mean, they're all connected. I would say safety encompasses all of them, right? Because if there wasn't an issue with safety, there wouldn't be a problem with regulations. And if you had a robust system that could scale to any scenario, every weather condition, every driving style, then you wouldn't have a problem with safety. So it's all around this, as I see it. Then, of course, you might also go into market dynamics. Is it economically sustainable to buy, I don't know, a Waymo car? I'm not sure about that. I mean, look at the sensor setup; I don't know how much each single car costs, and the maintenance it requires. But then this doesn't go into 'can we solve autonomous driving', it goes into the problem of how we make it...
Pier Luigi Dovesi:Cost efficient. And how can it live as a product, and not just as a technology?
Henrik Göthberg:But could we maybe, if we want to talk about AI and autonomous driving, looking at this from the outside, it seems like there are sort of two different schools of thought: the Waymo approach versus the Tesla approach, if I'm simplifying it. Could you elaborate a little bit? What is Waymo's way of getting to robotaxis, so to speak, compared to Tesla's view? How would you contrast those two approaches?
Pier Luigi Dovesi:I mean, nowadays it's changing a bit, but I would even say they're not real competitors, because they're so different.
Pier Luigi Dovesi:They're so different in the way that they approach it. Tesla is approaching this from the product perspective. They already have cars on the road, and they released FSD at large scale, which is not so advanced compared to what you might have at Waymo or other deep-tech-focused players. But by doing so, they solve other problems. They use, I would say, the power of having a huge fleet deployed, and that eventually might give them a scaling capability that might even overtake the others, yes, when it comes. And then Waymo is doing a totally different business. It's robotaxis, and they want to really crack the autonomous driving problem as such, so they don't care how much it takes, how many sensors you need to do it. And of course, they are way more advanced in the capabilities of the cars.
Anders Arpteg:First we just need to give some background. Tesla, I think everyone knows, but Waymo is a Google company, and they are using a lot more sensors, like LiDARs et cetera, that Tesla is not using, at least not yet, when driving.
Henrik Göthberg:They have a much more expensive approach, right? And also, if I understand it right, the Waymo robotaxis right now rely on a technique of mapping out the specific area where they are deployed.
Pier Luigi Dovesi:Well, I think the mapping out happens even beforehand. They rely on this thing called HD maps, which are maps that are not just like the Google Maps we all use. They have everything in them, photos and stuff, I, of course, don't know exactly, but there is a ridiculous amount of detail and a ridiculous amount of updates.
Henrik Göthberg:But this is a huge distinction, right? That one relies on HD maps and one doesn't.
Pier Luigi Dovesi:I think everyone relies on HD maps in some regard, but of course Waymo is having these autonomous fleets only in very selective scenarios, and I suppose they do a very careful mapping of every city before release. Hopefully they will relax this constraint slowly. And this also, I would say... The thing is, the intelligence of the car and the sensor setup are something that you can, in some sense, sum together to reach a certain threshold. Some companies are investing so much right now in having a much deeper and more capable intelligence layer, so to say, with contextual understanding, and if you have that at a high enough level, then maybe you can challenge yourself to not need so much sensor stuff. If you have way more sensors, you effectively do it because you know that you don't have enough contextual understanding, so you need the raw information.
Pier Luigi Dovesi:So having more sensors relaxes the performance needs of the other part. That's why we're moving... The sensor setup is actually getting a bit lighter every year. If you look at companies such as Mobileye, the Intel one, they decided, okay, we're going to drop the LiDAR part.
Anders Arpteg:Oh, they did. I didn't know that.
Henrik Göthberg:This is, of course, in order to figure out how we can make this scalable and cost efficient, and all of that, yeah.
Pier Luigi Dovesi:It's mainly that, of course. And at the same time, I still believe... I know this is not very... I'm not in autonomous driving anymore, so I can speak freely. I can have opinions, right?
Pier Luigi Dovesi:I still think that there is sense in what the Andrej Karpathy philosophy is saying: a human doesn't need 10 LiDARs and 12 cameras on a car. We can do it in a much easier way. So as AI progresses, and the contextual understanding of AI progresses, you should be able to drop that need. It's a fairly easy task for a human to drive, or to do other tasks in the wild, with limited sensors.
Anders Arpteg:And just to give some background as well: Waymo actually has had robotaxis in production for many, many years, right, in a few selected cities in the US, but still. And Tesla still does not, but they are claiming they will soon.
Pier Luigi Dovesi:So they have very different go-to-market strategies for this, vastly different. That's why to me it is even hard to see them as real competitors, even though now Tesla wants to move into the robotaxi domain; now I might see them that way. And I think, maybe, when Tesla releases their own robotaxis, I suspect they will have teleoperators to improve safety, which of course kind of defeats the purpose. But that's not the point, right; it's more to prove that you can do it, and to see to which extent you can do it. And I think Waymo is now extending, not so much, but extending significantly the number of cities they are supporting. I think before it was just San Francisco and Phoenix, and now they moved to Chicago and Medellin.
Goran Cvetanovski:New York as well.
Pier Luigi Dovesi:I don't know about New York, because I think they're staying only in the south of the US.
Anders Arpteg:No snow. Okay. So, given we had Cruise as well before, and they had some bad accidents and shut down their whole robotaxi business: who do you think will be the long-term winner? Which strategy do you think will, in five-plus years, turn out to have the best approach?
Pier Luigi Dovesi:I mean, I think there's not going to be a single. I'm going to give you one of those boring answers, right?
Anders Arpteg:I know you want to.
Pier Luigi Dovesi:But I think, yeah, there's not going to be a single winner. They're targeting different strategies. There is, of course, Waymo; they have the most advanced capabilities at the moment. Zoox is moving quite well, I think. Tesla, of course.
Anders Arpteg:What do you think about the Chinese players, like UID?
Pier Luigi Dovesi:Yeah, they probably have the advantage of more relaxed regulations, and probably also cost-effectiveness, but I'm not that informed about the Chinese market. And then I think there's also this wild card, you know, Wayve; it's a company in the UK. They raised $1 billion in just one year. I love it also because it's founded by a scientist, by Alex Kendall. When I got to know of Alex Kendall, I was doing my thesis, I remember, and his papers were kind of the baseline of my own paper. He was a pure scientist; he studied at Cambridge and did a bunch of internships. Scientists actually can build companies as well, and such a successful one is really an inspiration.
Pier Luigi Dovesi:And then he founded Wayve, and it's tremendously successful. Their approach is basically... I was talking with my most senior colleagues, at Univrses in particular, and they said: ah, but Tesla is doing all these end-to-end approaches that are not very safe, and Wayve is doing that times 10.
Anders Arpteg:So they really try to solve this with a pure end-to-end approach. We have to go into this topic now. I was hoping the next part would be a bit about the key concepts of autonomous driving. But let me just give my view of it, which probably is wrong, and you can describe more what you mean by the end-to-end approach that Wayve may be doing. To my understanding at least, if you take version 12 of FSD in Tesla, the big change there was this: normally you have the three components, perception from the sensors, planning and control.
Anders Arpteg:Normally most companies, including here in Sweden, we have Scania, for example, doing self-driving trucks and whatnot, just have AI for the perception part, and then they more or less use hard-coded rules for the planning and the control. That's a very old-style approach, yes, but I think Tesla did this as well before version 12.
Pier Luigi Dovesi:Yes, yes, everyone did, because AI wasn't even sophisticated enough to support more than that.
Anders Arpteg:Is this what you mean by the end-to-end approach, or how would you describe what Wayve is doing?
Pier Luigi Dovesi:Yeah, you can imagine the autonomous pipeline as a number of blocks connected one to the other. You have the perception layer, and then you have what was called, at some point, localization and planning. So you first perceive the environment, right, and then what we were doing...
Pier Luigi Dovesi:It was like: okay, now that I perceive the environment, I need to convert this information into, what does the map look like, and where am I in the map? And okay, now I have a view of what the map looks like and where I am in it, and now I have to understand: where do I want to go? That's the planning part. When you have the plan, you create a path to reach your objective, and then you control: you effectively send signals to your actuators, which could be the motor, the wheels, the steering wheel and everything.
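Schematically, the modular pipeline he walks through could be sketched like this; the function names and types are illustrative, not any vendor's actual stack.

```python
from dataclasses import dataclass

@dataclass
class Scene:
    objects: list          # detected cars, pedestrians, lanes, free space

@dataclass
class Pose:
    x: float
    y: float
    heading: float         # where am I in the map?

@dataclass
class Command:
    steering: float
    throttle: float        # signals for the actuators

def perceive(frame) -> Scene: ...                       # sensors -> semantic scene
def localize(scene: Scene, hd_map) -> Pose: ...         # scene + map -> pose
def plan(scene: Scene, pose: Pose, goal) -> list: ...   # -> path of waypoints
def control(path: list, pose: Pose) -> Command: ...     # path -> actuator signals

def drive_step(frame, hd_map, goal) -> Command:
    scene = perceive(frame)           # AI entered here first...
    pose = localize(scene, hd_map)    # ...then merged into perception...
    path = plan(scene, pose, goal)    # ...then leaked into planning...
    return control(path, pose)        # ...and is now reaching control.
```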
Henrik Göthberg:So would the mapping be the equivalent of what Tesla, I think, calls vector space?
Anders Arpteg:They use it after perception: you get from sensor space into vector space, right? It's basically latent space, right, yeah.
Pier Luigi Dovesi:So the thing is, this was, I would say, the legacy approach, where you have all these components. The reason why people liked it was that it's explainable, effectively. Storytelling was easy. The safety approach was also easier, because you can understand why: if something fails, you know in which layer it fails. Okay, I didn't detect the pedestrian, so the problem was in the perception. Or maybe the, I don't know, loop closure of the SLAM system in the mapping failed, okay, then that's the problem. Or maybe the control overshoots, and then I cannot control it.
Pier Luigi Dovesi:Overshoot, yeah, overshoot, for example. But then you also degrade the performance, because you need to rely on a number of heuristics, and the pipeline is all disaggregated, so information doesn't flow. And simply, every time you try to engineer things, you put human knowledge into it. At some point you realize that you are enforcing your own biases as a human, and an AI can do it better. That's the bitter lesson in AI, right?
Anders Arpteg:It sort of doesn't scale if you have to rely on human-written rules.
Pier Luigi Dovesi:Yes, it will fail more or less sooner or later. So what happened is that slowly the AI leaked from the perception more and more into the pipeline. The first part to kind of give up was... so you have perception, and then you have all this part of localization and mapping, and now all of that has effectively been merged into the perception, so they are just one unique system. Then you slowly leak into the planning, and so you have one big system going from perception to planning, and now it will leak even more into the control part. That's what end-to-end means. It basically means: imagine you have the controls out, so you send a video in, and the AI will tell you, okay, now you have to turn left or right.
Anders Arpteg:But is it still different models that are connected, you think, or do you think it's literally one and the same model?
Pier Luigi Dovesi:This transition doesn't happen in one day, right, and there are still people, I mean, I've seen experts, who think that this is not the right way, so I'm just giving you my view on it. But yeah, it started to happen by having several models: you take several models and train them together, and so you can already say it's end to end. And then you stop having separate models, and you just have one big model with the outputs sticking out of it, so you can at least visualize what the AI is thinking regarding the mapping, or what the AI is thinking about the perception. And slowly you will probably start to drop these as well.
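By contrast, a fully end-to-end system collapses those blocks into one trainable mapping from sensors to controls. A toy illustration, not Wayve's or Tesla's actual network:

```python
import torch
import torch.nn as nn

class EndToEndDriver(nn.Module):
    """Illustrative only: a single network mapping a camera frame
    straight to steering and throttle, with no hand-coded blocks."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 2)  # [steering, throttle]

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(frames))

# Video in, control signals out:
controls = EndToEndDriver()(torch.randn(1, 3, 224, 224))
```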
Anders Arpteg:I remember Elon speaking about this, and I think Karpathy did as well at some point. But one of the benefits of having this kind of end-to-end approach is that if you train the perception part separately, you basically need to manually annotate what is a lane, what is a pedestrian, what is a traffic light, et cetera. And if you actually connect it directly to the control through planning, then perhaps you don't need to annotate as much, at least.
Pier Luigi Dovesi:Yeah, the annotation and data problems are insane in AI. One can think that the problem is the data, that we don't have enough data, which is true. But just consider that you can usually afford to annotate less than maybe 0.1% of your whole dataset, because annotations are so expensive, even if you annotate a lot. Just imagine you have a fleet of cars that collects hours and hours of images every day, and not just images; you have every possible sensor mounted on them. So you have this huge flow of data, and you can only pick the tip of the iceberg to annotate, and that with huge problems, because every time you annotate you're basically saying: this part of the input I care about, and everything else I discard.
Pier Luigi Dovesi:Right, because if I annotate, let's say, a pedestrian, then I'm saying that maybe a lot of other information around it I don't really care about. So annotating also means losing information, effectively, and that's, of course, not really a sustainable approach. So everyone now is moving towards having data engines: systems that can complement the annotation process using data that you don't annotate. You can either pseudo-annotate that data or, maybe even better, use it as it is; you really learn from the raw data with these self-supervised, unsupervised learning approaches.
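A minimal sketch of the pseudo-annotation idea: a trained teacher model labels raw fleet data, and only its confident predictions are kept as extra training pairs. The threshold and names are illustrative assumptions.

```python
import torch

@torch.no_grad()
def pseudo_annotate(teacher, unlabeled_loader, threshold=0.9):
    """Turn raw, never-annotated frames into training pairs by keeping
    only the teacher's high-confidence predictions."""
    pairs = []
    for frames in unlabeled_loader:
        probs = teacher(frames).softmax(dim=1)
        conf, labels = probs.max(dim=1)
        keep = conf > threshold          # discard uncertain frames
        pairs.extend(zip(frames[keep], labels[keep]))
    return pairs  # fed back into training alongside the few real labels
```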
Anders Arpteg:So you think Elon is correct more or less, that their system now can learn how to drive without having humans annotating what the lane is, or traffic light is, or pedestrian is, or car is?
Pier Luigi Dovesi:I think, maybe unfortunately, that you still need a bit of annotation, to bootstrap it in some way, or to test, to even provide some safety guarantees. That's something you cannot really escape. But I really think annotations will remain this very tiny bit of the equation, and you will have way more data that is absolutely non-annotated. You will have some data that you might want to pseudo-annotate, and then you will have a lot of synthetic data as well.
Anders Arpteg:Awesome. And let's just continue a bit on the autonomous driving part first; you have so many super interesting topics, I think, you know, with RAINai.
Henrik Göthberg:Yeah, pick one of the topics.
Anders Arpteg:We're soon into the news section, I think, as well. So before we go there, perhaps just continuing a bit on autonomous driving: one thing that's been discussed, you know, is generative AI for synthetic data. Could that have a role, do you think, in autonomous driving in the future?
Pier Luigi Dovesi:Yeah, well, definitely. You know the answer to this question already; the question is how.
Pier Luigi Dovesi:Okay, so there are, I think, many surprising things when we think about synthetic data. It doesn't come for free; there's no free lunch. Of course synthetic data can help, but the problem of autonomous driving is also this one. When I give a presentation about autonomous driving, I usually say: look, right now we have superhuman perception in autonomous driving cars. They can perceive and map way better than any human, and do it all around the car, in real time.
Anders Arpteg:Yes, in a way that no human can ever do.
Henrik Göthberg:And LiDAR through fog.
Pier Luigi Dovesi:But this is true until the unexpected happens. And the problem is that the unexpected happens all the time. So you are in a situation, that's what's called the long tail of data, where most of the situations are extremely rare, verging on being unique. So of course, having synthetic data, which can somehow increase the frequency of situations that are extremely rare, could help the AI. Or you can augment real data; this is another thing that is really interesting. You take real data and then you modify it to create an unseen situation. Now there are applications of this; for example, Wayve developed their own Ghost Gym, where you can effectively start from a video. They also have a system called GAIA. You can take a video that really happened, and then modify it and change the ending of it.
Pier Luigi Dovesi:So, what if this pedestrian who's walking down the street now decided to stay still, or decided to take a different path? It's difficult to say: is this real data? Is it synthetic? It's both. And what I was saying about no free lunch means that, of course, as you introduce synthetic data, you will also introduce something which is called a domain shift. You will introduce a bias in your model, because these data coming in will have a different distribution compared to the real data, and this is inevitable, even though you might say: ah, but I cannot see the difference as a human.
Anders Arpteg:Sure, you cannot see it, but the AI will see it, rest assured, because you have something that generates synthetic data that is not real in some sense.
Pier Luigi Dovesi:Yes, and of course you benefit from the fact that the data has perfect annotations, because it's synthetic. But then you pay for this with the domain shift, and that's why adaptation, I would say, is not applied much now, but it will become way more relevant.
Henrik Göthberg:But this is even linking back to the relevance of your paper here, to talk about domain adaptation. Because you have now extrapolated, you end up with a problem around the corner that you are now working on. Did I get it right?
Pier Luigi Dovesi:No, it's perfectly correct. The first domain adaptation problem actually started because of this. You had a synthetic data generator, and then we realized that when you train a model on synthetic data and release it in the wild, it's terrible, it doesn't work. So people started to ask: how can we fix this? The first approach was: can we adapt the data, make the data look more real, and then train? Okay, that works. But then we realized: still not enough. Even if you make the data from the simulation look more real, the model will still not behave.
Pier Luigi Dovesi:So what you do now is you take a model that is, or is being, trained on synthetic data. Then you take real data, and most often you don't even need the labels of the real data. There are many, many ways of doing it, but just by, let's say it this way, influencing the model with the real data, with the aspect of the real data, you effectively transition the knowledge of the model towards performing in the real world. And this is kind of the first part, how to use synthetic data. But I think domain adaptation can even help us in another way: if you enable adaptation to happen online, meaning in the car, then you have a system that can adapt and change in real time. Let's say you see a new weather condition that has never happened before; your model will actually rewire itself while driving, adapting to the new driving scenario.
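One published recipe in this spirit, though not necessarily what his group does, is TENT-style test-time adaptation: minimize the entropy of the model's predictions on incoming unlabeled frames, updating only the cheap normalization parameters. A rough sketch:

```python
import torch
import torch.nn as nn

def configure_for_adaptation(model: nn.Module):
    """Freeze everything except normalization affine parameters,
    which are cheap and relatively safe to update online."""
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm2d, nn.LayerNorm)):
            m.train()                       # use current-batch statistics
            for p in m.parameters():
                p.requires_grad_(True)
                params.append(p)
        else:
            for p in m.parameters(recurse=False):
                p.requires_grad_(False)
    return params

def adapt_step(model, frames, optimizer):
    """One online update: push the model toward confident (low-entropy)
    predictions on the frames it is seeing right now."""
    probs = model(frames).softmax(dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()

# Usage sketch:
# optimizer = torch.optim.SGD(configure_for_adaptation(model), lr=1e-4)
# for frames in camera_stream: adapt_step(model, frames, optimizer)
```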
Anders Arpteg:Wow, that sounds amazing, super cool. It's strange, you know, I worked with online learning like 20 years ago, and then there was no deep learning, it was traditional machine learning. But in deep learning the online learning part has just disappeared. So I am so glad to hear that potentially there would be some kind of rejuvenation, so to speak, of online learning.
Pier Luigi Dovesi:And of course the safety people will now put their hands in their hair, because it's already kind of problematic to have a system and prove that it's safe; imagine a system that rewires itself based on the situation. But there are many ways around this. One of the approaches we found, and this was indeed our last work, is that you can actually make this adaptation process very explainable: if you pick several other models and merge them together, then you know what is going into your adapted model. You know that you are taking, say, 20 percent of each.
Pier Luigi Dovesi:Let's put it this way: I have a model that has seen no snow data, that maybe has been trained only on sunny data. Now I deploy it in the snow, during the night. Okay, so it faces two domain shifts: one is the night, one is the snow. If I have a library of models, where one is snow and the other one is clear night, then I cannot pick only one. But if I can pick 50% of one and 50% of the other, and then inject these as plug-ins into the model, then I can explain how the adaptation is happening.
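The fusion step he describes can be as simple as interpolating the specialists' weights with the observer's coefficients. A minimal sketch, assuming all specialists share one architecture; the commented usage names are hypothetical.

```python
def fuse_specialists(state_dicts, coefficients):
    """Weighted average in parameter space, e.g. 0.5 * snow + 0.5 * night.
    The coefficients themselves are human-readable, which is what makes
    the adaptation explainable."""
    fused = {}
    for key in state_dicts[0]:
        fused[key] = sum(c * sd[key] for c, sd in zip(coefficients, state_dicts))
    return fused

# Hypothetical usage:
# snow_sd, night_sd = snow_model.state_dict(), night_model.state_dict()
# model.load_state_dict(fuse_specialists([snow_sd, night_sd], [0.5, 0.5]))
```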
Henrik Göthberg:That may be over our heads, but it's a bit more digestible. Super cool lines of thought here. Awesome, and I think it's time for AI News, brought to you by the AIAW Podcast.
Anders Arpteg:So, yes, we usually have this kind of middle pause where we take a break from the podcast and the amazing rabbit holes of discussion into autonomous driving and whatnot with Pier Luigi here. But now let's take a few minutes to speak about some of the recent news from the last couple of weeks. Who wants to go first? Should I start? So I'm a big fan of the JEPA paper from Yann LeCun, for example, and the thinking of latent space reasoning, or at least moving away from just having the traditional type of transformer models and moving more into reasoning. There was a paper by Microsoft Research in Asia called rStar-Math, where they tried to get really good at reasoning and math with a super small model. It's just 7 billion parameters, which is nothing if we think that o1 or GPT-4 reportedly had 1.7 trillion parameters; this is hundreds of times smaller. So a very small model, more or less, but actually beating the performance of o1. And this was really interesting, I think. How they did it: they used Qwen, the Chinese open source model, which was pre-trained for normal language model capabilities, and then they trained it with a synthetic data approach. They basically generate, in a Monte Carlo Tree Search kind of way, ways to come up with answers to math problems. One thing I've been speaking about in the past, and I hope and believe these kinds of AGI models will move towards, is thinking about AlphaGo, for example, or AlphaZero, where you have human data, like how humans play the game of Go, but AlphaGo also had synthetic data, and having the combination of the two made it surpass the performance of humans. Now we can see how they use mainly synthetic data from a pre-trained model, but more or less generate their own data on how to solve these kinds of math problems, and then use a mix of supervised fine-tuning and reinforcement learning to build the model in a number of steps, ending up with a model that is significantly smaller but still so much better at reasoning. I think this is so exciting, because it's not really good for society if we just keep increasing the size of these models. Actually finding these kinds of small models that outperform for specific tasks, like reasoning and math in this case, is very exciting.
Anders Arpteg:The way they did it, in some sense, is that they used basically chain of thought, where you ask the model to explain a number of steps going forward. But they also made it produce code, Python code, meanwhile. By having this Python code combined with the chain-of-thought steps, they could verify what works or not, and in that way they could quickly build up which paths forward will lead to the correct math answer in the end, and which will not. And then they had what they call SLMs, small language models, instead of a large language model.
Anders Arpteg:So they had two SLMs, small language models. One for the policy, and, I have a background in reinforcement learning so I could go very deep here, but I shouldn't, the policy is basically what step to take, what action to take.
Anders Arpteg:And then they had what they call a PRM, a process reward model, I think.
Anders Arpteg:It's basically a value model saying: what is the value of going to this step? Some kind of proxy for what the reward would be if they actually go forward with this step. And they train these two SLMs, small language models, to be able to traverse the possible space. And then they use Monte Carlo Tree Search, similar to AlphaGo, it's actually very similar to AlphaGo and AlphaZero, being able to go through the search space by doing both exploitation of the current policy and exploration. I think it's very similar to what AlphaGo does; it just moves into the language model space, and reducing the size of the model while improving performance significantly. They had an example where they took the small 7B Qwen model, which is the Chinese model from Alibaba, I think, and without fine-tuning it basically had like 58% on one of the tests, and it went to 90% accuracy. And they also took Microsoft's Phi-3, from Microsoft Research; they took that and it moved from 41% to 86% accuracy.
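A heavily simplified, greedy sketch of the search loop described above; the real rStar-Math uses a full Monte Carlo Tree Search with value backup, and the callables below are stand-ins for the two SLMs.

```python
import random

def select_next_step(state, sample_step, score_step, n_candidates=8):
    """Sample candidate chain-of-thought steps from the policy SLM and
    keep the one the process reward model scores highest."""
    candidates = [sample_step(state) for _ in range(n_candidates)]
    return max(candidates, key=lambda step: score_step(state, step))

def solve(problem, sample_step, score_step, is_done, max_steps=10):
    """Greedy stand-in for the tree search: the paper's MCTS also
    explores alternatives and backs up values along the tree."""
    state = [problem]
    for _ in range(max_steps):
        state.append(select_next_step(state, sample_step, score_step))
        if is_done(state):
            break
    return state

# Toy usage with stand-in callables where the two SLMs would plug in:
path = solve(
    "What is 12 * 34?",
    sample_step=lambda s: f"candidate step {random.randint(0, 99)}",
    score_step=lambda s, step: random.random(),
    is_done=lambda s: len(s) > 4,
)
print(path)
```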
Pier Luigi Dovesi:All right.
Anders Arpteg:This is, of course, I think, the future, and I think it has a lot of connections to JEPA as well, in some sense moving to having this kind of multi-step reasoning.
Anders Arpteg:I don't like the test-time compute term, so I'm trying to avoid it, but it has these kinds of multi-step reasoning steps also while training, which I think o1 and o3 have. And then also being able to move, it's not really latent space, but there's a step they take that is not per token anymore; it's actually a whole chain-of-thought step. So it's moving beyond token-by-token autoregressive kind of thinking, which is also what Yann LeCun would like to do. Anyway, I could speak much more about this, but I think it's a super exciting time, and I think this also means we can start to see some of the internals of probably how o1 and o3 work, without them having released the details, because this team has been able to surpass, or at least catch up to, the performance of o1 with a much smaller model.
Pier Luigi Dovesi:But I mean, it's very inspiring to see that model size is not all that matters, right. Because it's what they talk about with System 1 and System 2 thinking: of course a bigger model, and I think we can say they have knowledge, I mean, of course it's just prediction, but in order to solve these tasks they have to store so much knowledge in it.
Pier Luigi Dovesi:But then there are still some problems that, we realize, inherently cannot be solved by just purely predicting; you need reasoning, of course. This changes a lot in the world. And it's the same thing, that's why I also say this taxonomy of AI, computer vision and NLP is falling apart, because it's the same as what we saw in image generation, not just in text generation. Four, five, even three years ago we were using GANs to generate images, where you were generating the image just in one shot. Now, instead, we have progressive refinement with the diffusion models, and chain of thought is nothing else than a kind of early mental diffusion model.
Anders Arpteg:Right, it's a multi-step kind of process, right.
Pier Luigi Dovesi:It's just the same thing. Let's say you need to write a book with a chain of thought: you first write the chapter titles, just the table of contents, then you provide maybe an abstract for each chapter, and then you expand on it. And that's literally what's happening when you have diffusion.
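The book analogy as a tiny coarse-to-fine sketch; `llm` is a stand-in for any text generator mapping a prompt string to a completion string.

```python
def write_book(topic: str, llm) -> list[str]:
    """Coarse-to-fine generation: outline, then abstracts, then chapters.
    Each pass refines the previous one, the textual analogue of a
    diffusion model's denoising steps."""
    outline = llm(f"List the chapter titles for a book about {topic}.")
    titles = [t.strip() for t in outline.splitlines() if t.strip()]
    abstracts = [llm(f"Write a one-paragraph abstract for '{t}'.")
                 for t in titles]
    return [llm(f"Expand this abstract into a full chapter:\n{a}")
            for a in abstracts]
```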
Anders Arpteg:Yeah, well said.
Pier Luigi Dovesi:And it's just moving from the training domain to having more in the inference phase. So basically, now you need to see them together, and of course, the more you expand one, the more you can shrink the other, if you want.
Anders Arpteg:Exactly, it's like two orthogonal kinds of dimensions. One is simply that you increase the parameter size of the model, and you get some performance.
Pier Luigi Dovesi:Or you increase the reasoning steps, and with a good way to do that, you can instead decrease the parameters. And you can say that the performance is somewhat the area you describe by navigating each dimension.
Anders Arpteg:Yeah, yeah, cool, cool. I think it's super cool so.
Henrik Göthberg:So let me go in between, because I will be the least nerdy of these news items, I think. I actually want to highlight something. As preparation, I did a little deeper revisit of Mark Zuckerberg's interview on the Joe Rogan podcast, and what led me down that path is that several different news pieces from it went quite viral on LinkedIn. So there are three pieces here. The first one was all the news outcry: we will take away the fact checkers and we will use community notes. It's been a very politicized discussion, and I urge anyone who wants to have it from the horse's mouth: please listen carefully to how Mark describes it. That's number one. Number two was to hear his prediction on user interfaces: you know, how we have computers, should we really be living in the physical world and then switching to our screens? That's another interesting take to hear from him. And the third one was a quite large debate where someone said: oh, Mark is saying he will replace his mid-level and junior engineers. So with these three things in mind, I went in and tried to form my own opinion: what did he really say?
Henrik Göthberg:If I take the first one: what everybody needs to understand is that, with the amounts of data they are dealing with, billions of topics, number one, what the fact checkers are looking at is close to 0% of it, so they're not really able to do a good job. Number two: what the fact checkers are actually looking at and working on is very specific topics in relation to politics and stuff like that. When it comes to the hardcore stuff, terrorist stuff, drugs, illegal stuff, there are many, many other approaches that they are using, and they are not stopping those at all; that's not changing. So it's about the fundamentals of how to deal with the political stuff. It's a little bit like you take one small thing, politicize it and make a debate about it, which blows the whole thing out of proportion in terms of what it is and whether it really works. And the most interesting thing Mark did was to highlight the real problem: how you set the confidence level. Do you want it to be 80%, 90% accurate all the time? Then you're going to shut down innocent people. Is that what you want? Or do you go the other way around: I never want to shut down an innocent person, and therefore you will leak the real terrorist stuff. This whole balancing act is super, super hard. So it was quite interesting to hear a really deep discussion of how Meta, over eight to ten years, has iterated on these ideas. Instead of the snippets and the political outcries: go to the source and listen to it.
Henrik Göthberg:That was number one. Then the AI story. It was a huge viral topic on LinkedIn, and I was participating. I was basically saying: don't use the 'AI will replace engineers' rhetoric. I think it's a very dangerous rhetoric.
Henrik Göthberg:First of all, if you're an expert, you understand that what we are talking about is a reinvention of an engineering department. It's the fundamental workflow in engineering that changes, which leads to a path where you can have fewer engineers doing the same work. But it's not as simple as having an old process, taking one guy out and putting an AI in. And when you listen carefully to Mark, he's using the word augmenting: we're going to augment our engineering process. He says this quite quickly, and then they talk about the implications and consequences. So here's the core story: what works as a discussion on a geopolitical or macro level, versus what is actually happening in reality when you're changing something and bringing AI into engineering, or coding in this case. This is very easy to miss in what he's saying, and it led to a huge debate: oh, we will replace 80%, blah, blah, blah.
Pier Luigi Dovesi:But I mean, who isn't, right? Who isn't using AI already to enhance engineering? I think most of the benefits of AI so far have been in code assistance, helping the engineers.
Henrik Göthberg:It's hardly news. If you see it for what it is, this is a non-issue. So what Mark is ultimately saying is that if you're not on the productivity frontier of the tools you should be using as a software engineer, you're toast. Wasn't that relevant five years ago, ten years ago? If you haven't followed the techniques all the way through, don't you need to catch up on them?
Anders Arpteg:Well, just having it write code, of course, is a big augmentation that we're seeing, and we're seeing the decline of Stack Overflow. And when you ask ChatGPT not only to help you write but also to understand and fix problems, it's immense, of course. But I think another point is, we had Anton Osika here as well, speaking about GPT Engineer and Lovable, and that can do more than just augment the engineer in some potential way. You can simply prompt it to write a whole application and change an application, and that's another level, I would say. But still, I would say the human is still there.
Henrik Göthberg:I think so too, and I think the core topic is that we need one type of rhetoric when we look at the macroeconomic effects: that fewer engineers are needed to do much more work. This is true. But how many engineers do we need? There is a paradox here: when we grow in maturity with something, our consumption demand just explodes, and we want more software. So all of a sudden, what's the net effect? Who knows?
Anders Arpteg:I mean, it's not that we don't need to write code. We need to write much, much more code than we currently can, way more. So it's simply that we can do so much more.
Henrik Göthberg:But I think the point gets lost. I was pushing hard on LinkedIn and I got pushback. This is not replacing engineers; this is augmenting engineering operations at the fundamental core of how it's done. But it means we need engineers in order to cope with that new socio-technical context. So I'll stop there. But the last one was really interesting to hear as well. He basically thinks we're going to live in a world where the physical vision and the artificial vision, the computer screen, are overlaid. So we go to glasses, and then we'll go to contact lenses. So he doesn't really believe in screens.
Henrik Göthberg:He's been trying that for 10 years.
Pier Luigi Dovesi:A bit of a fixation with his metaverse, which has proven quite unsuccessful.
Henrik Göthberg:No, this is not the metaverse.
Pier Luigi Dovesi:This is not?
Henrik Göthberg:This is AR.
Anders Arpteg:Yeah, but that is a similar thing. He's trying to rebrand it. He's going from VR to.
Pier Luigi Dovesi:AR, I don't know. I don't know. Let's put it this way: I have huge respect for Meta AI. I think they're doing great. Then Mark is Mark.
Anders Arpteg:Everyone has their own opinion. It's fun to see: Mark pushed for AR, I don't know, five years ago, and the company was just going down in market cap. Then they switched to "now we're going to focus on AI" and it just exploded in value, and now he's suddenly becoming one of the richest men in the world again, because they actually focused on AI instead of AR.
Henrik Göthberg:The interesting thing is how we have been predicting that here, you being a little bit of a Yann LeCun fanboy. We were like, finally Yann LeCun got his seat at the table, so to speak. The stuff they were doing was awesome all the time.
Pier Luigi Dovesi:I'm curious how Yann LeCun will react to this change of heart that Mark has had recently, to be honest.
Henrik Göthberg:to be honest, which one was that Proprietary, open source, or which one?
Pier Luigi Dovesi:No, I mean, of course, I think Meta AI has distinguished itself thanks to a lot of really important publications that they released recently, and also their stance on open source, which, of course, is also a commercial move. Clearly, they don't do it for the benefit of humanity; they do it because it helps them set industry standards, and they position themselves as effectively owning those industry standards. But at the same time, there is a part of it that is for the benefit of society, having open source. I strongly believe so.
Anders Arpteg:I don't believe them when they say it. I'll add this as a topic.
Pier Luigi Dovesi:It's a huge topic.
Henrik Göthberg:I'd love to hear more about it. We have discussed it a lot. I love having someone who's initiated and has opinions about it.
Anders Arpteg:But I don't think LeCun would be comfortable, maybe, with this new... Could you give it to us in a nutshell? I think, for one, Mark Zuckerberg and Elon Musk were enemies in the past, and now it seems like Mark is at least slightly moving towards Elon.
Pier Luigi Dovesi:Yeah, surprising, right? Definitely an interesting move.
Anders Arpteg:But then Yann LeCun was also in a big clinch with Elon Musk. They had these big fights on Twitter, calling each other all kinds of horrible names, the whole debate about what counts as research and whether you need to publish.
Henrik Göthberg:So I don't think.
Anders Arpteg:I agree with you. I don't think Yann LeCun is very happy with this move, but we'll see. It will be a super exciting future, I think, to see what happens there. It's a cool company, Meta.
Henrik Göthberg:Anyway, for me it was a little bit like: sometimes you can go and watch the two hours, and then you get a little bit different feeling for the topics than what was in the news flashes on social media.
Anders Arpteg:Pierre, do you have anything you want to bring up, some interesting news or something?
Pier Luigi Dovesi:No, I don't have so much. I mean, you covered so much already, I feel like.
Anders Arpteg:I feel like Goran, anything for you?
Goran Cvetanovski:I have actually one. I don't know if you know about this, but two days ago, three days ago actually, the Biden-Harris administration released something called the interim final rule on AI diffusion, which is basically just a chip export control measure. So now the United States basically has this measure where they are, from my perspective, actually policing which countries and which companies can have access to chips or not. Which means that they are controlling the entire game of AI, which is super for them, and I understand why they are doing it: security reasons, economic strength, et cetera. But this also leads us to a technological balkanization of the world, if I can use that term. I like that, right. Or, as we are calling it, the AI divide.
Goran Cvetanovski:In this measure, basically, how it's laid out is that there are 18 allied countries, and I believe the list has not been released, but Sweden is probably one of them, so they don't need to apply for it. Chip orders of around 1,700 GPUs or fewer are also basically not covered by this measure. But then you have, for example, entities that have headquarters in some of these allied countries: they are limited in how many chips they can actually use. I think it was around... I will not speculate about this, but there was some kind of number.
Goran Cvetanovski:So basically, even within Europe you're now divided, just because somebody decided that this is going to be the case. You have European countries that are part of these 18 that can actually get a bigger chip import, and some that cannot. You might think, oh, but the European Union is all together, and there are government-to-government policies and agreements, et cetera. Well, they also thought about that. So there is a restriction on almost everything. And, of course, why I'm pissed about this is because it's affecting the stock market quite a lot, and I reinvested in
Goran Cvetanovski:NVIDIA, and now they're shitting on my stocks, and Jensen was furious. Yeah, I mean, it didn't fall that much, but you can see that right now the chip war is actually starting. Before it was the AI divide, now it's the chip divide.
Henrik Göthberg:We already talked about that a little bit in last year's finale, but now this was the next step in the escalation of the chip war.
Anders Arpteg:I mean, it's moving into an area which I don't want to go into, but of course there's a lot of geopolitics in AI, and it will just increase more and more in the coming years because of the impact of AI, and also the impact on national security, which I think they are partly mentioning here.
Goran Cvetanovski:Another interesting thought is that, basically, the United States is now positioning itself as: we own AI, we decide what AI is, we decide how many chips you're going to get. Although the machines that are building these chips are actually made in Europe, in the Netherlands, and the manufacturing happens in Taiwan as well. It's just a little bit annoying. Kudos to all of this, but I just don't see how this is going to play out.
Henrik Göthberg:I saw one number. Of course, China is on the other side of this, and we still need to understand it. If I take the Nvidia perspective: I read a number saying as much as 40% of the GPUs that Nvidia sells go to Chinese buyers.
Goran Cvetanovski:That would be logical, but I doubt it.
Henrik Göthberg:No, but why?
Goran Cvetanovski:Because if they were using Nvidia chips in the big large language models in China... Of course, there were articles claiming these chips end up there in large volumes, but it's not exactly true.
Anders Arpteg:There have been export restrictions on Nvidia chips to China for a long time, but there are ways to get around it.
Goran Cvetanovski:So enough said, we'll talk about this later.
Anders Arpteg:Imagine what will happen now with Trump. I'm really interested to see what he will do with it and how that will compare to what Biden has done. So that will be interesting, I can imagine.
Goran Cvetanovski:I saw the entire interview with Zuckerberg, which we're going to talk about later as well, and I thought there was a lot of nonsense in that too.
Anders Arpteg:But yeah, that's good Awesome of nonsense in that as well, but yeah, that's good Awesome. Pierre, I'd love to get more into your very interesting work with the good AI lab, but before we go there, I saw you mention something about Rain AI and it's also a passion interest of mine. I don't know where your stake or thoughts about this is, but please elaborate a bit. What are your thoughts about Rain AI?
Pier Luigi Dovesi:All right, okay, I think I should give some context around this.
Anders Arpteg:We spoke about them before, just for your knowledge, all right.
Pier Luigi Dovesi:Okay. So I actually didn't know about Rain AI until I got contacted, I don't remember if it was early 2024 or late 2023, by a colleague, actually one of my co-authors, who said: hey, look, I looked into Rain AI and they're using one of your papers as a backbone in their work. And I was like, okay, what's going on there? And I can provide some context on Rain AI. I think it's quite well known as a company, but of course, they're still a startup.
Pier Luigi Dovesi:They work on neuromorphic computing, effectively having these chips with, I think it's called D-CIM, compute-in-memory. Basically, the computing is happening in memory, and this allows you to have online learning and, in this context, online adaptation. But one thing is to have a chip that is able to train online, in the car, in the robot, in the agent; at the same time, you need to know how to learn. You need a loss function, an objective function to optimize, and that is the thing you are missing when you're training online. So our paper, published quite late in 2023, was almost immediately on their website.
Anders Arpteg:Yeah, if you go into Rain AI's approach page, you'll find it. It's a big success, man.
Pier Luigi Dovesi:Thanks.
Pier Luigi Dovesi:And so we were quite surprised, because the paper had been out for just three months when they found it. And of course, we've been briefly in contact with them. I chatted with, I don't remember, the head of product or the CTO, I don't remember anymore. They indeed are fond of this work. We were not expecting it to be used so quickly, also because, of course, we were aware it was just an academic work with many limits in many senses, but I'm happy that they're finding a use for it.
Henrik Göthberg:But of course, neuromorphic computing in itself... I think we talked about that.
Anders Arpteg:It's one of my favorite topics. All right, okay.
Pier Luigi Dovesi:So yeah, that's why I got to know Rain AI.
Anders Arpteg:quick recap.
Henrik Göthberg:Yeah, let's do a recap and then go back to the old-school question, the classic question: if we want to invest in the next generation, is it neuromorphic or quantum?
Anders Arpteg:Quantum is... well, quantum is still small. Okay, anyway, let me just describe a bit of my thinking about neuromorphic computing, and I'd love to hear your thoughts and whether you agree. For one, I think an obvious problem with the big models we have today is their extreme energy consumption. Even now, every major hyperscaler basically needs to buy a nuclear power plant to be able to drive their data centers. A problem with the traditional kind of GPUs and machines that we have is that they have separated memory and CPUs.
Anders Arpteg:It's the von Neumann architecture that's been around since the 1940s-ish, and the human brain is different. In a human brain, you actually have neurons that hold both state and computation in one place, so computation in memory, so to speak, and this is more or less what neuromorphic computing is trying to do as well. Not having to send data between a memory chip and a CPU chip, and instead having the memory in place directly, will significantly reduce energy consumption. Instead of having transistors that have to flip all the time to move data back and forth, you just have the memory there when you need it.
Anders Arpteg:And the other problem is that for normal CPUs and GPUs they are synchronized on some kind of clock, like gigahertz kind of frequencies, and the human brain is much slower.
Anders Arpteg:It's like kilohertz, it's like a million times slower, but still with more computational power than the biggest GPUs that we have.
Anders Arpteg:The reason is that neurons operate separately. They are not synchronized, they are asynchronous, and they only require energy if they change state, if they flip, if they do this kind of spike.
Anders Arpteg:So we have spiking neurons in the brain. The only energy that's required is really when you change the state, the memory, in a neuron. If we can move to that, not having to shuttle data back and forth and only requiring energy when a single neuron changes state, the energy consumption could drop drastically. The energy budget of a human brain is in the 20-watt region, and a big cluster of H100s, the kind necessary for o1 or a big large language model, is in the megawatt range. That's on the order of a hundred thousand to a million times difference in energy consumption. And if we have any chance in the future of having more than just hyperscalers making use of AI, we need to find much more energy-efficient ways to work with AI, and I think neuromorphic computing is the way to do it.
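To make the event-driven idea concrete, here is a minimal leaky integrate-and-fire neuron in Python. This is a rough sketch of the spiking principle discussed above, not any vendor's actual hardware design; all constants and names are illustrative:

```python
import numpy as np

def simulate_lif(input_current, dt=1e-3, tau=20e-3,
                 v_rest=0.0, v_threshold=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron: the membrane potential leaks toward
    rest and integrates input; a spike, the only 'energy event' in this
    model, fires when the threshold is crossed."""
    v = v_rest
    spike_times = []
    for step, current in enumerate(input_current):
        # Leaky integration: decay toward rest, then add the input drive
        v += (dt / tau) * (v_rest - v) + (dt / tau) * current
        if v >= v_threshold:
            spike_times.append(step * dt)  # event-driven: only spikes matter
            v = v_reset                    # reset the membrane after firing
    return spike_times

# One second of constant supra-threshold drive produces a regular spike train
spikes = simulate_lif(np.full(1000, 1.5))
print(f"{len(spikes)} spikes in 1 s of simulated time")
```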
Pier Luigi Dovesi:Yeah, I see. What you were mentioning is true, but my take on it comes more from the adaptation point of view. When they talk about even AGI, which I think is such a vague term that, as we approach it, makes less and less sense, still, if you want to go in that direction, you cannot have an AI which is static and frozen and needs some engineering staff saying, okay, we're going to release a new version.
Pier Luigi Dovesi:Uh, you need it online learning yeah, you need to adopt, you need to be able to adopt. And then, and so, if you need to be able to adopt, then it means that you are somewhat embedded uh, like embodied, sorry. And then, and if you need to be embodied and you need to adopt, then you need the architecture and hardware that allows you to do that. And then, if it's in memory, computing or spiking, then, and this, is memory stress, so they have.
Henrik Göthberg:So do you see the logic that neuromorphic architecture is more suitable for adaptive or online learning? Do you see that they complement each other, or can you do it equally well with the von Neumann architecture?
Pier Luigi Dovesi:Well, of course. I mean, we have used classic GPUs to develop our systems. But if you have this neuromorphic computing, then you definitely need online learning. And if you want to learn online, you have the problem that you need an objective function available online. Most of our systems don't have an objective function that is available online, because you typically need annotated data.
Henrik Göthberg:Neuromorphic computing to work, they need online learning. Yes, to work, they need online learning.
Pier Luigi Dovesi:Yes, of course. They need algorithms that can support online learning. It could be an adaptation algorithm, it could be unsupervised learning, it could be anything that can run this learning cycle without any human intervention. And that is, I would say, what we investigate in the paper they are using. Basically, it focuses on three questions.
Pier Luigi Dovesi:First, what in the network should be adapted? Most of the time you think, okay, I have to adapt the whole model, but this doesn't make any sense. A model is made of billions of parameters; do you really need to adapt all of them? No. If you can select which part of these parameters to include in the adaptation, then you can focus on that. The second question we focused on is not what, but how to adapt: actually controlling which kind of adaptation is needed. And the final question is when to adapt, because most of the time, if I'm in an environment that I know very well, I don't need to do adaptation. But then I need some trigger that tells me: okay, now you are in the unknown, and now that you're in the unknown, you need to start adapting, for this amount of time, on this set of parameters, and in this specific way.
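As a rough illustration of those three questions, the what, the how, and the when, a test-time adaptation loop might look like the sketch below. This is a generic pattern in the spirit of entropy-minimization methods such as TENT, not the actual method from the papers discussed here; the entropy trigger, the choice of normalization layers, and all thresholds are illustrative assumptions:

```python
import torch
import torch.nn as nn

def prediction_entropy(logits):
    # Unsupervised signal: no labels are available online, so we use
    # the entropy of the model's own predictions as a proxy objective
    probs = logits.softmax(dim=-1)
    return -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()

def adapt_step(model, batch, trigger=1.5, lr=1e-4):
    logits = model(batch)
    # WHEN to adapt: only if predictions look uncertain (possible domain shift)
    if prediction_entropy(logits).item() < trigger:
        return logits  # familiar domain, skip adaptation entirely
    # WHAT to adapt: only the normalization parameters, not billions of weights
    params = [p for m in model.modules()
              if isinstance(m, (nn.BatchNorm2d, nn.LayerNorm))
              for p in m.parameters()]
    optimizer = torch.optim.SGD(params, lr=lr)
    # HOW to adapt: minimize prediction entropy on the incoming batch
    loss = prediction_entropy(model(batch))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return model(batch)
```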
Henrik Göthberg:So interesting when you go deep on this. It just opens up new problems, new things you need to solve.
Anders Arpteg:Yeah, you could easily have a rabbit hole there as well. You know, time is flying away and I would love to cover a bit about the good AI lab and then also go a bit more philosophical in the end here.
Henrik Göthberg:We're adding one hour to the podcast then, yeah.
Pier Luigi Dovesi:Of course, I can sleep here. So, what is The Good AI Lab? This is something very new, even though the people working in it have been working together for years already. But then we decided to make it a bit more official and also to expand. Effectively, we are a non-profit organization, an independent AI lab, and we're putting together scientists, professors, researchers, and engineers from, I would say, the AI-leading institutions worldwide. And when I say AI-leading institutions, I really mean it: we have people from AMD, from Google, we have people from the University of Toronto, the University of Bologna, TU Munich, KTH, ETH Zurich; we have King, Bontouch even. There's, of course, a strong presence of Stockholm people.
Anders Arpteg:Yeah, top AI companies and universities, not only in Sweden but in the world.
Pier Luigi Dovesi:Yeah, worldwide. And we come together on two foundational pillars. One is academic AI research, which we've been doing for a while, and the second one is good AI initiatives. When it comes to the academic part, it's effectively a follow-up of what we've already been doing. You can see the website; it's quite new.
Anders Arpteg:were you part of creating it, or something, or how?
Pier Luigi Dovesi:yeah, I mean it's a it's, it's a collective initiative so share of the board yeah, I'm uh.
Henrik Göthberg:I mean, right now I'm uh you're sharing it right I'm sharing it right now, yeah, um yeah, is head v good yeah yeah, she's, she's among our advisors and uh, yeah go back and redo the mission, the reason and the rationale for the academic pillar. Well, you know what. What are you? What are you doing? Are you trying to get even closer in collaborations? What are you trying to achieve? What is the academic?
Pier Luigi Dovesi:pillar. It started from a need when it comes to the academic. It started from, I would say, a need that we felt that, okay, we I mean doing research in AI is something that I mean. We think AI is extremely important topic. It's fun, but it's also important, and having an understanding of it, having a presence when it comes to research, is something that is equally fundamental. We are all working in leading companies but at the same time, of course, when you work in a company and you do research within the company, then there are some ceremonies that you need to follow when it comes to DIP, when it comes to the topics that you will research, and these might not maybe align to the interests that you're currently interested in. So what we thought it was like okay, we want to create a space Right now, I would say there's also these topics about is the Good AI Lab more of a space or more of a subject? Like right now it's more of a space.
Henrik Göthberg:Arenas to come together.
Pier Luigi Dovesi:Yeah, an arena to come together and do research together. We have students from universities working with us right now. The two works that we submitted are from that group as well, and now we have two students officially starting as students for The Good AI Lab.
Henrik Göthberg:So the papers that we were talking about in the introduction today, are they submitted from Silo, or from The Good AI Lab, or a combination?
Pier Luigi Dovesi:So The Good AI Lab was not yet a thing at that point, but the people working on those papers were already there, and they worked as a collective effort, I would say, of all these partners that I'm describing.
Henrik Göthberg:So the papers are coming from people that are not in Silo AI?
Pier Luigi Dovesi:Well, I am in Silo, of course, and in one of the papers, for example, there's a person from the University of Calgary who is in Silo too. But then we also have people from Google, from ETH, in those papers.
Henrik Göthberg:We have people from Munich. So there's already collaboration around the papers. Yes, exactly.
Pier Luigi Dovesi:And then we also had another thought: AI innovation is important, but innovation without a strong guiding purpose is hollow. By recognizing the importance of AI, we also recognize the impact it has. And right now, even though we think AI is improving our lives, it's only improving the lives of a very privileged few. It's somewhat floating here at the top.
Henrik Göthberg:It's a divide here also in terms of the objectives we are and how we are spending money in this.
Pier Luigi Dovesi:Yeah, correct. And we want AI to actually flow outwards and downwards, to not just be something for the privileged, but to actually support and be effective when it reaches the places that have long been forgotten. And as for how we do it: we are engineers, we're scientists, we're not ethicists or philosophers. We're very pragmatic people, and that's why we are starting collaborations with humanitarian aid associations and with other non-profits active not in AI but in education or, as I said, humanitarian aid, medical support, organizations active uniquely in providing essential services for people.
Anders Arpteg:Why do you think this is important? I have an academic background as well, and I can certainly feel that academic value production is going down, especially compared to the big tech companies. Is that one of the reasons? Because this is a place where, potentially, tech companies are working more closely with academics? Do you think it's necessary for that reason? Or what's your thinking about academic research in AI in general?
Pier Luigi Dovesi:Yeah, it's becoming, let's say, a bit more rare. If we look at the main AI advancements of the past years and all the news that we talk about, they're all from large industry. So understanding the role of academia in this age of AI is actually very difficult. I think there is still a need, of course there is a need, but what is the impact it can have? What should we focus on?
Pier Luigi Dovesi:And, truth be told, we see that there are fewer and fewer breakthroughs coming from pure academia, and it seems that even pure academia can have an impact only when it's partnering with these large companies. I think that might represent a problem when it comes to the independence of AI development, which is effectively just following the agenda of the AI leaders in industry. Of course, science can never fully decide its own purpose, not even in academia, because your grants and your money come from the government or from industry. So science is never free. But it's even less free if it's purely driven by industry. I'm not against that; there's amazing work coming from industry and I'm a fan of it. But at the same time, I see the societal limits of that.
Anders Arpteg:It would be sad if the only research we're seeing comes from industry in some sense. It's not bad, as you say; I think we can do a lot of really good stuff for society even if it's driven from industry. But there should be a place for academia, right?
Henrik Göthberg:But let me see if I understand this space. On the one hand, we have people with these ideals who work in industry, even you work in industry, with AMD and Silo, and who see this deterioration, or limitation, in the objectives that research pursues. And then we have academia, where, as you said, the truth is that fewer and fewer breakthroughs come out of academia when they work alone. So here we have a space, we need to create an arena, where we can come together: take a little bit from the commercial side, push it in here, and in the same way grab the academic papers and lift them back up to breakthroughs. Is that a way to describe this space, as a bridge?
Pier Luigi Dovesi:That could be part of the ambition, yeah. Also because when I say we are independent, I really mean it. There are a lot of people coming from these institutions, but the institutions themselves have no part in it. Of course, there is a declaration, everything is extremely transparent, but there is no direct involvement.
Henrik Göthberg:Is there any funding, or no funding?
Anders Arpteg:So you do it on free time only?
Pier Luigi Dovesi:Yes, exactly.
Henrik Göthberg:So people who want to invest their time in the ideal, brilliant minds. And because there are no puppet strings attached through funding, it means that when you come together, you can decide where to direct your efforts. And you've seen one way of doing that: if you want to do AI for all and for good, you can go to humanitarian aid organizations.
Anders Arpteg:You know how do you guide your research. Then what are your ways to choose what to do research on?
Pier Luigi Dovesi:Yeah. So the two initiatives, I would say, are quite independent. When it comes to AI research, we focus on academic AI research, and that's it; right now there is no constraint or guidance on the research based on what counts as good AI research. When it comes to the good AI initiatives, I would say they are not research at all; they are about solving real problems, it's more engineering. And we are at a very early stage of that, I must say. While for the research we have years of experience, we have professors, we have experts, the good AI initiatives are something we're starting right now. We're talking with non-profits and humanitarian organizations, but I don't expect that to become research. I see it more as solving problems.
Pier Luigi Dovesi:Solving problems, yeah, real problems. It could be about AI education, for sure, but they have very concrete problems. I mean, the main contact I had: I was talking with a president of Doctors Without Borders, and she's now in Lebanon, so for a while our collaboration was: sorry, I don't have time right now, I'm in the hospital, literally. They have very concrete problems; it's not about research. Just the logistics they have to deal with around medical supplies present incredible challenges for them. And of course, all their funding is spent purely on medicines and medical supplies, so they cannot afford a fancy predictive logistics system. It's a new world for me, but you'd be surprised how many of the people they have just for logistics aren't even doctors.
Pier Luigi Dovesi:They are just making sure that the doctors receive what they need.
Henrik Göthberg:So, to summarize: on the one hand, you have the research, and this is driven by, I would argue, a joint common interest in a certain field or topic, like computer vision or adaptive learning. And then you have the side where you, as an entity that wants to do good, decide to put your brains towards real-world problems that you deem important, like Doctors Without Borders' logistics problem.
Pier Luigi Dovesi:For example. We're still in talks with them, and there's also another project we're discussing, regarding impact analysis for educational projects. Every charitable organization works in a slightly different way. Some of them support projects that are, let's say, grassroots, so they start from locals who send an application to receive funding to do a societal uplifting project, so to say. And for that, it's very important to understand the impact of the project. What is the societal impact? That, okay, maybe ten more children got an education this year; that's the impact, and it aligns with the reason they got the funding. But this involves a huge amount of documentation; it involves scattered documents and metrics that are difficult to measure and visualize. And for that, you also need AI.
Anders Arpteg:I mean, it's so impressive and I hope you feel as proud as I feel of you for doing this. I almost wish I could get involved myself.
Pier Luigi Dovesi:You can.
Anders Arpteg:It's amazing work and I think that is such a good idea, and what an impressive cast.
Pier Luigi Dovesi:Yeah thanks.
Henrik Göthberg:Some very impressive names coming together on this, really cool.
Pier Luigi Dovesi:I'm really humbled. All credit to the team.
Anders Arpteg:Awesome. So time is flying away and we're getting a bit more into philosophical space. Now I'm thinking should we do open source or latent space reasoning or frontier model? Should we start with open source? Just a bit of the future of open source perhaps.
Henrik Göthberg:Yeah, you have the open source discussion, and then you can combine it with predictions about the frontier models. What's the position and use of the frontier models? How will that evolve versus, for instance, open source, with the fear of being left with the baby models, you know? So it's partly about open source, but it's also a prediction.
Anders Arpteg:Let me ask a question. Something we have said in the past is that we know Meta, for example, have strong open source initiatives.
Henrik Göthberg:Open weight.
Anders Arpteg:Open source and open weights and open data and whatnot. The question is: can that prevail? If we move into trillion-parameter models, and models that come closer and closer to AGI, can we really release them as openly as we do? Yann LeCun says so. I actually don't think so. Do you think we'll see a future where even Meta will stop publishing models publicly as they grow?
Pier Luigi Dovesi:as they grow. I mean for meta, I don't know. I don't know really, but I think the whole narrative that these models are too dangerous to be released in the wild has been debunked multiple times, right. Narrative that these models are too dangerous to be released in the wild has been debunked multiple times, right, like just think, when there was chad gpt2 and they never released it. They say, like these may be too dangerous. Now we have models that are us, maybe even stronger than gpt4. They're released and that's not the reason why the world is falling apart.
Anders Arpteg:Right, there are no evil entities that... And you don't see a point where a model could be strong enough that it's too dangerous to release it?
Pier Luigi Dovesi:I think there could be. But first of all, models can be dangerous and do a lot of damage even if they're not open source. It's not the open source that creates the problem. Why would we think that a model kept private is safer? Why do we trust that a corporation would be more reliable than...?
Anders Arpteg:But you could want to use it for warfare, you know.
Pier Luigi Dovesi:But they are using them. Not OpenAI perhaps, though I'm not sure. But they are already.
Pier Luigi Dovesi:Uh, and I find it deeply sickening how much AI is used in warfare nowadays and how little AI is actually used for humanitarian aid. And he said there is so much AI, so much development developing the production of AI weapons, and I mean that's not AGI and those are corporations, so you can create a lot of damage and actually you will create the most damage for society with proprietary model.
Anders Arpteg:So not releasing it as open source doesn't solve that. But you don't think there are any extra dangers with open-sourcing models?
Pier Luigi Dovesi:I mean, on a theoretical level, I see what you mean: what if some bad entity could... But they can do it. They are already doing it, right? You already have AI guiding...
Anders Arpteg:I mean, it's like the US trying to have export restrictions on chips. It's not preventing them from getting the chips, they can still do it, but it creates a barrier at least. It makes it a bit more difficult.
Pier Luigi Dovesi:Yeah, but why do you think that? You're assuming now that the US is using AI only for good. There could be American companies and corporations using AI in the most despicable ways.
Anders Arpteg:Yes, but the point is, if we don't restrict it... Okay, let's take this analogy, then, and I'm just doing it to have a discussion here.
Pier Luigi Dovesi:Of course there are pros and cons with each approach.
Anders Arpteg:But it's like saying: I have a kid at home, the kid is three or four years old, and I have a gun. Either I lock my gun in my safe, which doesn't fully prevent the kid from getting it; it could potentially reach the gun when I forget to lock the safe or something. But compared to having the gun on the kitchen table, available all the time to a kid that doesn't really understand the consequences, it's an easy choice to keep access to the gun closed for the kid.
Pier Luigi Dovesi:Well, this analogy assumes that the big AI corporations are a caring parent and the people are a mindless, stupid child. I don't believe in this narrative. That's what I was saying.
Anders Arpteg:If we skip the big state actors and the super big tech companies, there are a lot of small actors. Yeah, I don't want to go down this road, but okay: there are lots more people with, say, terrorist motives or whatever, who don't have access to a lot of funding but really want to do something bad.
Pier Luigi Dovesi:I I'm not, I'm not sure. I I mean, I mean we saw, I mean we were saying this with gpt2 that okay, too dangerous to be released. Now we have, uh, llama 3, which is way more powerful gpt2. Yeah, I don't see so many terrorists thinking about LAMA-3 when they need to act, but at the same time, there are a lot of companies using LLMs to select their next targets for when it comes to weapons and any sort of solution.
Anders Arpteg:So, in short, if we were to summarize, we could simply continue to open source models and give them away, both data and code and models.
Pier Luigi Dovesi:I don't think the danger comes from that. And this also opens up a bigger discussion: let's say you really have this heavy intent. Having the model weights will not be enough. You need a huge server farm, you need a huge team of computer scientists. So keeping the weights closed, justified as a matter of safety, doesn't hold, because you're assuming the attacker can put together a huge cluster of GPUs and serve these models, most of which are big anyway, in a distributed setup. It's so complicated that it's a safeguard in itself.
Anders Arpteg:You don't think the threshold kind of argument makes any sense? That just making it harder is worthwhile?
Pier Luigi Dovesi:No, I purely believe it's just.
Henrik Göthberg:What I get out of listening to this is that Pierre is pointing at and using the word narrative, and what we are experiencing right now, whether we believe it or buy into it, is that we are being, I would argue, groomed into different types of narratives here. And I remember we had a friend here on the podcast; we talked even about AI apartheid, the importance of open source to Africa, and what happens with the tech giants versus Europe. You know, don't forget the AI or tech apartheid ICA is imposing on the farmers when they say they need 5G even to be part of the cold chain, right?
Henrik Göthberg:So what we're talking about here is that we need to look at the whole world, and for this argument, open source becomes a huge topic of pros and cons that is quite complex if you go from tech giants to terrorists to Europe, and all the way to wanting an inclusive society. But we had this conversation about what we believed the trajectory would be: will open source win or will the frontier models win? And even I have been caught up in this narrative. I think we have shifted in how we've discussed this: first it was, yeah, of course, open source, no-brainer. And then in December, both Anders and I were talking about, oh, maybe we'll have closed frontier models, and then we have baby open-source models.
Anders Arpteg:I think this year we'll see Meta not releasing a model.
Henrik Göthberg:Yeah. So what we believe is the right way is one thing; what we are predicting is another. But I find it very refreshing to have someone of your caliber point at this narrative, because it's very healthy to ask: are we part of a narrative?
Pier Luigi Dovesi:But I also believe that the reason we have open source right now is not that the companies releasing it are doing it for the benefit of society. Of course not.
Henrik Göthberg:We know this, we understand it. Right.
Pier Luigi Dovesi:That's why I'm quite confident that even if Meta stops releasing open source, someone else will, because it's a market position. That's how you cut the gains and the advantage of the first movers. Clearly, if you're at the top, like OpenAI or Google, you don't release open source anymore, because you want to protect your position in the market. Meta was lagging a bit behind when it comes to frontier models, so of course they want to release everything; they want to become the market standard. They say: sure, you might have this advantage, but I'm still tailing you, and I'm burning all your advantage, because everything that you do, I open-source.
Henrik Göthberg:Yeah, but it's deeper than this, because if you look at OpenAI, the model is the value proposition, it is the product. And if you look at Meta, the model is part of an ecosystem where they want to sell other things. So it's hardcore: we want our value streams and value pools, and they need to be fueled by models, but at the same time, for us to win, we want to undermine, to cut down, the moat of the others who have this as a core value pool.
Anders Arpteg:So it's interesting, because Meta's value pool is different from OpenAI's. But I think there actually are reasons not to open source, so I'm still going to stick with my prediction. I think that will also happen because of regulation; we've already seen that even closed-source models are not being released in Europe because of regulation. Defending an open-source release of a big model like this will be even tougher in the future, I think. So that is also a big part of why I think this is the direction we will unfortunately need to go.
Henrik Göthberg:And it's interesting because I was really happy to come back to my open source ideals and I don't know where I'm on the scale right now. I'm really confused.
Anders Arpteg:Awesome. And who knows the answer? We'll have to ask ChatGPT.
Henrik Göthberg:No, no, no. Ask an open source model.
Anders Arpteg:Pierre, there is a lot happening in AI, a lot of companies are working towards AGI, and we are finally seeing some changes to the architecture as well. What do you think the biggest steps will be in the coming two, three years to come closer to something we could potentially call AGI, or artificial superintelligence, or whatever we call it? Do you have any predictions or guesses for what we will see?
Pier Luigi Dovesi:Well, okay, easy question, easy question. Let me think. I think we've seen, like we discussed a lot today about o3, how the computation is moving, kind of steadily leaking into inference time. So it's not just as it was before: purely training, train something bigger and better. Also because of this other realization about data: we're running out of data, there's not much more to extract. There is one internet, and I think that's been mentioned; Ilya had this quote.
Pier Luigi Dovesi:This is the fossil fuel of I still believe that was my personal opinion that we can still extract more from this data. So still, it's true, that's my personal opinion, that we can still extract more from this data. So still, it's limited, but I guess you can extract a bit more, so there is a limit of efficiency.
Anders Arpteg:I think potentially LLMs, or foundation models, will at some point have a similar moment to AlphaZero. AlphaGo already had a mixture of synthetic data and real data; AlphaZero had zero human data, only synthetic data, and was better than mixing in the poor-quality human data.
Pier Luigi Dovesi:Definitely, yeah.
Anders Arpteg:Do you think that could happen for foundation models as well, that we stop relying on human data in that sense?
Henrik Göthberg:collapse then.
Pier Luigi Dovesi:Well, the limit of that is, of course, that you need the knowledge, you need human knowledge. But you can have data from the world, yes, of course. But augmented with synthetic data, you mean?
Anders Arpteg:No, I mean, okay, good question. So one thing: human annotations are out to start with; that's been out for a long time, right? We have self-supervised learning, and that works without human annotations. So that's no question, that's out. Then if we take AlphaZero, that means removing the human experts showing how to play the game, so that's out; that's what we're potentially thinking here as well. Then we can see the rStar-Math kind of approach, where it's actually generating data by itself completely. It's based on a pre-trained autoregressive token model, but the reasoning part is completely self-generated.
Anders Arpteg:So they had self-generated data all the way through. It has started to move, I would say, in this direction, and we can potentially do the same even for vision. If we are starting to see it for text, we can see it for vision, for audio. I think potentially that could be even more powerful than having to use sensors to get the data.
Pier Luigi Dovesi:Yeah, the main difference that I see with AlphaGo is that in AlphaGo you have one game. The objective is very simple: you need to win the game. The rules of the game are not changing. The problem is, when you go into the real world, no one is telling you the rules of the game, and the dimensions of the game are...
Anders Arpteg:It's much simpler in a game like go and then there's other need that they have to be useful.
Pier Luigi Dovesi:It needs to have real knowledge.
Anders Arpteg:Right, you need to know things. But you can generate data: if you take math problems, or the ARC challenge kind of thing, in math problems you can easily say, I want to have this answer, and then generate backwards from it.
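As a toy illustration of that answer-first idea, here is how one could generate arithmetic training pairs that are verifiable by construction. This is a deliberately tiny sketch of the principle, not the actual rStar-Math pipeline; the template is made up:

```python
import random

def make_problem(answer: int) -> tuple[str, int]:
    """Start from the desired answer and generate a problem backwards,
    so every training pair is correct by construction, no human labels."""
    a = random.randint(1, answer - 1)
    b = answer - a  # chosen so that a + b == answer
    return f"What is {a} + {b}?", answer

# Self-generated, self-verifying data: the 'label' is known before the question
dataset = [make_problem(random.randint(2, 100)) for _ in range(3)]
for question, answer in dataset:
    print(question, "->", answer)
```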
Pier Luigi Dovesi:Yeah, because, you see, math exists, I would say, in a space which is independent of the real world. So then it gets very philosophical.
Anders Arpteg:But even for a car, wouldn't you say that using a physics engine that generates the world, creating these kinds of scenarios where you drive off the cliff or whatnot, could potentially be more powerful?
Pier Luigi Dovesi:But then you will have the same biases. When you build the engine, how can I say, first of all, you have two problems. The first one, as we saw, is the domain shift: you train in this simulated environment, and if it gets good enough... well, if it's good enough, I would ask: how do you know that? Who generated this environment? Developers.
Anders Arpteg:the who generated this environment, developers- so you will learn, but it's easy to generate like a physical engine. It's much harder to understand how it works afterwards, like going back from that.
Pier Luigi Dovesi:But the thing is, if it's really smart, it can actually outplay the engine quite easily. I mean, if we were able to generate a physics engine, a simulated engine, that was truly representative of the real world, then we would be living in a simulation, right?
Anders Arpteg:No, no, actually not. I think it's actually much easier to do a simulation than to actually understand the world. It's easy to just take the next step and see how the frames work, in a more vision kind of sense; but from that, to go backwards and try to define what it really means, what is the car, what is the cliff, that's much harder. I think it is actually rather easy to go from a humanly crafted physics engine to a very, very realistic simulation and then use that. I mean, that's literally what we see in rStar-Math, I think, and in AlphaZero.
Pier Luigi Dovesi:Well, I agree with you when it comes to these self-contained problems. But when you go to things like autonomous driving and the like, the amount of work that would be needed in the simulation is just...
Anders Arpteg:You could just do multiple trees, or you can just sample in the space.
Pier Luigi Dovesi:Just to give you an idea: the newest, most promising simulation environments are not purely game engines anymore. They are a hybrid between real and simulated: take real data, bring it into the simulation, and then take simulation data and bring it into some form of realistic scenario.
Pier Luigi Dovesi:But there is always this kind of sim-to-real and real-to-sim that goes back and forth.
Henrik Göthberg:Is this like the game engines, Unity or one of the real hardcore underlying ones?
Pier Luigi Dovesi:Yeah, but now the goal is to take, let's say, a video that really happened, bring it into a simulation environment that looks exactly like the real thing, and then create something very similar, a thousand variations of that. Which still means that the simulation environments are getting better by taking in real data.
Henrik Göthberg:So what you are objecting to is the purely synthetic approach. You think there's a ping-pong match going back and forth here: you do synthetic, and then you upgrade it, you validate it back and forth; or the game engine starts with real images, makes a thousand simulations from them, and then checks back. Is that what you're saying? That this combination works, but the purely synthetic you don't believe in?
Henrik Göthberg:you think it's the hard one?
Pier Luigi Dovesi:Well, it depends on the problem. If you just need text, okay, then maybe we can find an understanding: sure, you can probably generate text with a very good LLM, text which resembles the real thing. But still, that will be simulated text that comes from what? From another AI trained on real text. So you still have this need that you cannot get rid of, which is the real stuff. Training AI purely without any real data, I don't see it happening today.
Anders Arpteg:Agreed not today, but I think there could be a reason for that, potentially if we have an alpha zero moment for foundational models. We've certainly not seen it yet, but I think it could happen.
Henrik Göthberg:I want to humor Anders here a little bit. We are talking now about different ways we need to go in order to get to more and more general AI and intelligence. The other dimension that you want to talk about is what you call the latent space. I saw another paper talking about this as a concept space.
Anders Arpteg:Is that the same? No, it's not the same, but it's getting closer to it. It was a Meta paper, I think.
Henrik Göthberg:The Meta paper was the concept one.
Anders Arpteg:So it's basically going to a sentence instead of a token.
Henrik Göthberg:Yeah, but then one layer up from the end-token representation. But I think, once again, the computer vision example was, of course, going from pixels to concepts: these are four-legged animals, and these are cats. And this is not the same as latent space.
Anders Arpteg:It's moving in that direction, but just a very, very small step.
Henrik Göthberg:It's not the JEPA approach yet, but it is in that direction. But you know JEPA?
Pier Luigi Dovesi:Yeah, the Joint Embedding Predictive Architecture. I think that's very fascinating. I think that's also what we're saying, right? You cannot have a single element in your AI pipeline doing all the work. In JEPA you have these world models, and then you have the configurator, the perception module, and there is all this collaboration between them. So yeah, it definitely resonates.
Pier Luigi Dovesi:I mean, even in our work in the field of domain adaptation, most of our works have a component called the observer and another component called the orchestrator. These two components effectively understand whether you changed domain and what this change looks like, giving a semantic meaning to it, and then orchestrate either the training-time learning process, or which model to pick from your library, and how to fuse them.
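A hedged sketch of what such an observer/orchestrator split could look like; the interfaces, the statistics-based shift score, and the model library below are invented for illustration and are not the actual components from those papers:

```python
import numpy as np

class Observer:
    """Watches incoming feature statistics and flags a likely domain shift
    (toy proxy: distance from the feature mean seen during training)."""
    def __init__(self, train_mean: np.ndarray, threshold: float = 1.0):
        self.train_mean = train_mean
        self.threshold = threshold

    def shift_detected(self, features: np.ndarray) -> bool:
        score = float(np.linalg.norm(features.mean(axis=0) - self.train_mean))
        return score > self.threshold

class Orchestrator:
    """Decides how to react once a shift is flagged: here by picking a
    specialist from a model library; it could equally trigger adaptation."""
    def __init__(self, library: dict):
        self.library = library

    def pick(self, shifted: bool, domain_hint: str = "default"):
        if not shifted:
            return self.library["default"]
        return self.library.get(domain_hint, self.library["default"])

# Toy usage, with strings standing in for real models
observer = Observer(train_mean=np.zeros(8))
orchestrator = Orchestrator({"default": "day-model", "night": "night-model"})
shifted_batch = np.random.randn(32, 8) + 3.0  # clearly off-distribution
print(orchestrator.pick(observer.shift_detected(shifted_batch), "night"))
```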
Henrik Göthberg:And you could even argue the mixture-of-experts approach here. Or is that different? Are these different ways of describing the same thing, or is it actually different?
Anders Arpteg:I think it's different, but still, it will be exciting times.
Henrik Göthberg:But yeah, please. I was thinking about the latent space.
Anders Arpteg:Okay, let me take one theory and see what you think about it. We know, of course, you're interested in multimodal kinds of approaches.
Anders Arpteg:Today we have large language models that, of course, operate and do their predictions in the token space. I think we also know that the first layers of any GPT kind of model are just encoding from the syntactic level to some kind of semantic representation; then there are a lot of layers; and then some decoding happens in the final layers, going from the semantic representation, some latent space, back into token space. And then you could think, just from an efficiency point of view, it's kind of stupid, if you just want to generate a number of tokens, to go back and forth through token space all the time. Just stay in the semantic space, the latent space, and do the predictions there, right?
Pier Luigi Dovesi:100%, yeah.
Anders Arpteg:So that's one thing. The other thing is that if we want to combine different modalities, text is one modality, and then you have images and potentially the patches from them. If you want to do reasoning that combines them, of course you can't operate in sensor space. You need to move into latent space, and there you can combine both the textual representations and the image representations. So if you want to do multimodal reasoning, you need to be not in sensor space, token space, or image space, but in latent space. Absolutely, yeah. So, given all of this, it would be much more efficient to move into latent space and do the reasoning there. But the reasoning we're seeing o3, o1, and rStar-Math doing, they are still doing it in token space.
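To illustrate the efficiency argument being made here, a minimal sketch contrasting a rollout that decodes back to token space at every step with one that stays in latent space and decodes only once at the end. It is a toy illustration, not any specific paper's method; all module names and shapes are assumptions.

```python
# Toy contrast: token-space autoregression vs. staying in latent space.
import torch
import torch.nn as nn

vocab, d = 100, 32
embed = nn.Embedding(vocab, d)   # tokens -> latent
latent_step = nn.GRUCell(d, d)   # one "reasoning" step in latent space
unembed = nn.Linear(d, vocab)    # latent -> token logits

def token_space_rollout(token: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Decode to a token and re-encode it at every single step.
    h = torch.zeros(1, d)
    for _ in range(steps):
        h = latent_step(embed(token), h)
        token = unembed(h).argmax(-1)  # back to token space each iteration
    return token

def latent_space_rollout(token: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Keep the state in latent space; decode only once at the end.
    h = torch.zeros(1, d)
    x = embed(token)
    for _ in range(steps):
        h = latent_step(x, h)
        x = h  # feed the latent state forward, no decoding in the loop
    return unembed(h).argmax(-1)

print(token_space_rollout(torch.tensor([7])))
print(latent_space_rollout(torch.tensor([7])))
```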
Pier Luigi Dovesi:Yeah, which is terrible. I mean it is terrible right.
Anders Arpteg:So it seems obvious to me. Why doesn't someone start moving into latent-space reasoning? Because it's very complicated?
Pier Luigi Dovesi:No, but it's absolutely true. I always come back to this comparison: think about diffusion models. Diffusion models are this progressive denoising approach. The early ones, what was the name, the OpenAI one, DALL-E? The first one, I think, I don't remember exactly, was applying the diffusion process in pixel space, and it was pretty bad. Then Stable Diffusion was, I think, one of the first papers to say: we don't do the diffusion process in pixel space.
Anders Arpteg:So all the diffusion models today have an autoencoder around them, basically, so you operate in the latent space.
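A minimal sketch of the latent-diffusion pattern being described: run the denoising loop in a compressed latent space and use an autoencoder to move in and out of pixel space. These are toy modules with assumed shapes, not Stable Diffusion's actual architecture.

```python
# Toy latent diffusion: denoise in latent space, decode to pixels once.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 256))  # pixels -> latent
decoder = nn.Sequential(nn.Linear(256, 64 * 64 * 3), nn.Unflatten(1, (3, 64, 64)))
denoiser = nn.Linear(256 + 1, 256)  # predicts a cleaner latent given (latent, t)

def sample(steps: int = 50) -> torch.Tensor:
    z = torch.randn(1, 256)  # start from pure noise in *latent* space
    for t in reversed(range(steps)):
        t_embed = torch.full((1, 1), t / steps)
        z = denoiser(torch.cat([z, t_embed], dim=1))  # one denoising step
    return decoder(z)  # decode to pixel space only once, at the very end

# During training, real images would enter latent space through the encoder:
z0 = encoder(torch.randn(1, 3, 64, 64))
print(sample().shape)  # torch.Size([1, 3, 64, 64])
```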
Henrik Göthberg:Yeah, so we even have a pattern here.
Pier Luigi Dovesi:We even have a pattern here, yeah. And as I see it, chain of thought is almost analogous. Of course, mathematically it's not the same as diffusion, but there are already people applying diffusion to chain of thought to build that bridge.
Anders Arpteg:But this is actually what Yann LeCun is speaking about with the JEPA model. They speak about the joint embedding, joint meaning both the input and the output. If you want to produce some output, you don't predict in the output space, you actually predict in the energy space in the middle. That's the joint embedding part: you go from the sensor to some kind of embedding, the latent space, and you do the predictions there, over multiple steps, this is literally what he's saying in the paper. And then you have some kind of world model that tries to estimate: if you make this prediction, what will the value be, from a world point of view, in the next state? And then you can judge whether you should move in this direction or that direction. Yeah, this is rStar-Math.
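A minimal sketch of the joint-embedding predictive idea, loosely following the description above: encode input and target into a shared latent space, predict the target embedding from the context embedding, and score candidates with an energy. Toy modules with assumed shapes; not LeCun's actual architecture.

```python
# Toy joint-embedding predictive setup: predict in latent space, score with
# an energy (distance). Low energy = compatible (context, target) pair.
import torch
import torch.nn as nn

d = 64
context_encoder = nn.Linear(128, d)  # encodes the observed state x
target_encoder = nn.Linear(128, d)   # encodes the future/target state y
predictor = nn.Linear(d, d)          # predicts s_y from s_x in latent space

def energy(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    s_x = context_encoder(x)
    s_y = target_encoder(y)
    s_y_hat = predictor(s_x)              # the prediction stays in latent space
    return ((s_y_hat - s_y) ** 2).mean()  # squared distance as the energy

x, y_good, y_bad = torch.randn(3, 1, 128)
# A world model could compare candidate next states by their energy and
# choose the direction with the lowest one.
print(energy(x, y_good).item(), energy(x, y_bad).item())
```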
Henrik Göthberg:More or less, yeah. But what are the brakes then? Because you said this is so obvious, we should go here, so why aren't we going there? Oh, it's complicated.
Anders Arpteg:So can we break it down? What are the steps? Then we have to go to The Good AI Lab and do a paper on this.
Henrik Göthberg:If you bring this angle in, you can get him on board with this paper for sure, because it's complicated, we say.
Pier Luigi Dovesi:But what are the steps in this direction that we're not taking, then? No, I mean, it's not like we're not taking them. I think there are already some chain-of-thought papers that aren't calling it chain of thought anymore. They're more like iterative refinement, or applying energy models or diffusion models to the embedding space. The thing is that they're still very experimental and a bit exotic nowadays. And, of course, another problem is that you lose a bit of the explainability of the thinking process, but I would even claim that maybe you don't really care about that anymore. I mean, what is the cost of this explainability? If the cost is a worse model, then I, personally, am okay with lower explainability. Though it depends on the use case.
Pier Luigi Dovesi:That's always the case, yeah. It's the same as with autonomous driving, right: a simple, explainable pipeline or an efficient pipeline. But no, I agree that we need to get rid of English within the model. It cannot reason in English; it doesn't make sense. We as humans don't reason in English or any language. Or maybe sometimes we do, but it's not fully formed English, or, if it is, it's just some of our thoughts.
Anders Arpteg:There's some abstract version, yeah. Oh, Pier Luigi, the time, oh Jesus, it's been so long.
Henrik Göthberg:We try to keep it within two hours, but we fail. We're actually trying to keep it to one and a half hours, but now it's two, and that's kudos to you. And we've stopped at just the one rabbit hole; there are so many left.
Anders Arpteg:There are so many left, yeah. One final question then. If we assume that there will be some kind of AGI, and we can debate what AGI means, et cetera, but let's not care about that right now. Assume at some point we will have an AGI that is better than most humans at the average task, or whatever. If that becomes a reality, we can think about two extremes here, and a spectrum in between. One extreme would be a dystopia, like the Matrix and the Terminators, where the machines kill us all once we lose control, because at some point we probably lose control of the AI in some way.
Anders Arpteg:And the other extreme would be that it becomes this kind of utopian world where we live in the world of abundance, as elon call it, where you know the, the price of goods and services will go to zero and we basically will have not only universal basic income, as you call it, but actually universal high income, because we can live in luxury, more or less compared to today, because it's so. We have like free energy, we have free, um, free housing, we have solved cancer, we have fixed the energy problem, we have fixed the, the climate, uh change and whatever, and we start to live in some kind of utopia, where do you think we will end up in, let's say, 10 years? Is it going towards more the utopian or more dystopian kind of future?
Pier Luigi Dovesi:I think both views are, as I see it, about equally unrealistic. To be honest, I think what's really going to happen is that, as with every other technology, as we develop it further, the real risk is that we will further increase social inequalities.
Anders Arpteg:So both of those things, and the concentration of power, right? Because that's literally what we're seeing today, to an extreme, I would say.
Pier Luigi Dovesi:And maybe you can even say that these two things will happen at the same time.
Henrik Göthberg:So there will be some part of the world that will benefit. This is Farker Jansson's comment from Rice: we will have both.
Pier Luigi Dovesi:Yeah, then I agree with that. There will be parts of the world where, of course, a lot of problems will be solved, but at what cost? And I don't think in any way that this will be a democratic process. I mean, we've seen how the world is today, and I don't think it's going to change the fundamentals.
Pier Luigi Dovesi:It's just going to make it more extreme in this sense. And that's actually something that connects, I mean, this thought is also behind the foundation of The Good AI Lab.
Henrik Göthberg:Yeah, I was just going to say that, because that's one of the reasons we started to talk about the AI divide. We've talked consistently about the AI divide for five years, and it has always proven true that social inequalities, the haves and the have-nots, are a problem, and that ultimately leads to war, it leads to revolution. So if that happens with AI, maybe that also puts a finger on why you have The Good AI Lab and how we should write our own future. Maybe one of the key objective functions on a macro geopolitical level is inclusivity, taking away the AI inequalities, and that's maybe one of the key objectives we need to work on.
Anders Arpteg:Yeah, and one thing I've been saying for a long time: the problem isn't really that AI will take control. Actually, I'm longing for that time; I'd feel safer then. What I'm really scared about is the short term. What we're seeing right now is that AI creates a lot of power for the people or the companies that have the best of it, and we've literally seen that for a number of years with the hyperscalers, and now with some of Elon's companies. If that concentration of power continues to increase, that's super scary to me. And then, of course, the whole warfare angle is super scary too. But that's the short-term danger: not AGI taking control, but some people getting increasingly more power and the disconnect from the rest of the world.
Henrik Göthberg:Yeah. But to me then, and to end on this note.
Henrik Göthberg:We could argue, as we did before, about whether we need closed or open-source models, and then it's one evil, one cost, over the other. If you go the closed frontier-model route, that might drive a more and more extreme distribution of power, or inequalities, and then you have that problem to deal with. If you go open source, you do that because your main objective function is equality, but then you have other risks, and which risk is worse? So I could argue that, whether you look at this from a geopolitical perspective or a societal, sustainable-earth perspective, which one trumps the other is a tricky trade-off.
Pier Luigi Dovesi:Well, yeah, to me it's not a tricky one. I'm very candid about that. I think the justifications for not going open source really come from profit and the economics of creating these entry barriers, so the risks are played up by the frontier-model side.
Henrik Göthberg:You see that as a narrative from the people in power, and you really say AI for all. AI for good is AI for all.
Pier Luigi Dovesi:Well, I'm not saying that. I would say The Good AI Lab has its own position; that's a different story, we're not discussing it here.
Henrik Göthberg:No, but give us your input.
Pier Luigi Dovesi:But my personal opinion, when it comes to it, is, how can I put this? I'm not afraid either about when AGI will take control, because as we move towards it, we're understanding, I mean, what is even AGI? The concept itself is kind of breaking apart.
Henrik Göthberg:We will pass it and look back at it and try to put a number on it. When did we pass it?
Pier Luigi Dovesi:We're already seeing that it's kind of a spectrum, right, so it's not a single breakthrough.
Henrik Göthberg:Level one to five. Okay, now we've reached it.
Pier Luigi Dovesi:But I truly believe it's going to get better for the people who are privileged; I just don't see it getting better for everyone, and that's what we should really work on as a community and as an industry.
Henrik Göthberg:Working to make it better and inclusive for everyone is maybe one of the key objectives. I think so. Okay, I'd like to end it there.
Anders Arpteg:Perfect, perfect. Thank you so much, Pier Luigi, for coming here. This was an amazing discussion. I hope you can stay on for a bit longer after the camera turns off, because there are so many more interesting topics to speak about.
Pier Luigi Dovesi:All right, thank you so much. Thank you so much, Pier Luigi. Thank you.