DataTopics: All Things Data, AI & Tech
Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics is your go-to spot for relaxed discussions around tech, news, data, and society.
Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!
#73 LLM Hunger Games: The Ultimate Showdown - Rootsconf recap (Part 3)
In this episode, we wrap up the Rootsconf mini-series with a thrilling finale featuring Sophie De Coppel and Warre Dreesen's workshop from our internal knowledge-sharing event:
- AI Hunger Games: A showdown between AI language models like GPT-4, Claude, and Gemini. Who aced coding, games, and social interactions?
- Human vs. Machine: Fun experiments like “Find the Human” and “The Chameleon Game” highlight where humans and AI shine—and stumble.
- Model Personalities Explored: Discover why some models seem nerdy, others boastful, and how creativity plays a role in performance.
- Engineering Insights: Behind-the-scenes on implementing and testing AI models in competitive scenarios, from advent-of-code puzzles to group chat debates.
Join the fun as hosts and guests break down the playful and thought-provoking ways we’re pushing AI to its limits. Let the games begin!
You have taste in a way that's meaningful to software people.
Speaker 2Hello, I'm Bill Gates.
Speaker 3I would recommend TypeScript. Yeah, it writes a lot of code for me and usually it's slightly wrong.
Speaker 1I'm reminded, incidentally, of Rust here. Rust. This almost makes me happy that I didn't become a supermodel. Kubernetes.
Speaker 4Well, I'm sorry guys, I don't know what's going on.
Speaker 1Thank you for the opportunity to speak to you today about large neural networks. It's really an honor to be here. Rust Data topics. Welcome to the data. Welcome to the data topics podcast. Rust Rust. Rust Data Topics. Welcome to the Data Topics Welcome to the. Data Topics podcast.
Speaker 5Hello and welcome to Data Topics Unplugged Deep Dive, your casual corner of the web where we discuss all about the Hunger Games of AI. My name is Murilo. I'll be hosting this intro together with Bart. Hi. Hey Bart, I cut you off just before we started. Sorry, you wanted to say something?
Speaker 3That it would be very cool if we had that thing they have in the movies to start a scene, the flappy thing that does the clap. Ah, of course. The numbers and the titles on there. I think that would be a nice prop to have. It would make you feel just a little bit more important.
Speaker 5Well, speak for yourself.
Speaker 3No, I'm just kidding, I'm just the co-host, right? To make you feel more like an actual actor.
Speaker 5Let's make it happen, Bart. Actually, for people listening, I think Christmas will just have happened or will be just about to happen. I also checked, so Merry Christmas to everyone. Maybe this can be our Data Topics Christmas gift, the clapping thing.
Speaker 3It could be. I'll write to Santa. Let's see. It's time for the third mini episode today of our RootsConf interviews. RootsConf is our annual knowledge-sharing event, an internal event where a lot of our colleagues present ideas, projects that they did, and research that they did on a lot of different interesting domains. It's presented through talks, sometimes by a single person, sometimes by multiple people. What Murilo did is that, after these talks, he dragged some people into the podcast room with him, and then we've released these mini interviews a week at a time. This week will be the third and final one. And what is this one about, Murilo?
Speaker 5But before that, because I noticed that I haven't mentioned that I also presented one, I just wanted to, you know, share a bit of what I did. Okay, okay, go ahead, go ahead. Yeah, I don't know, I just noticed that I feel like I'm talking about all these people.
Speaker 5But I also wanted to share a bit. I had a lot of fun delivering it. So it was a workshop. You were there as well, Bart. I joined the workshop. Yeah, maybe you can give me your feedback, recorded, in a bit, but I had a lot of fun building the workshop.
Speaker 5Delivering the workshop was very fun as well, and the idea was to have two parts. Each person basically gets a ChatGPT, and then you have to protect a password that is given in the system prompt. So in the first part you can test stuff and you build your defenses. There's also some programming that you can add to it if you want. And then in the second part people try to capture each other's passwords. I think the idea is also to bring a bit of the experience of, okay, how reliable are these models, how reliable are they not, what are some things we can defend against and what not, and to have a bit of healthy competition. I had a lot of fun building it. I learned a lot in building it as well, and I also had a lot of fun delivering it. I don't know what you thought about it, Bart.
Speaker 3It was really cool. It was in teams, attacking all the other teams, and we were all in the same room, so that added a bit to the effect. It was a bit of gamification around jailbreaking, right? It really gave people a very intuitive feeling for what jailbreaking is and, at the same time, had them really actively trying it out. Yeah, really cool how you set it up.
Speaker 5Yeah, it was cool. I feel like it went by really fast. I wish I had more time; I still ran out of time. But I think I usually run out of time, and that's not what we're here to talk about. What we're here to talk about is the Gen AI showdown. Actually it was called the Hunger Games, or AI Hunger Games or something, by Sophie De Coppel and Warre. So what was their talk?
Speaker 5Basically, they had some games, and they took the big LLMs: I think Gemini, the Anthropic one, which I think is Claude, they used Claude Sonnet, and ChatGPT. I think that was it. I don't know if there was a fourth one. And basically they had some different games around it. So, for example, one is that they had all the models play the Advent of Code, which, for people that don't know, is basically Christmas-themed coding challenges, and they saw which model went the furthest there. There's also a game called, I think, Mr. White. Basically, each person gets a word, and one person gets a similar word or a blank word, and then every person gives one adjective about that word, but the person that doesn't know needs to make it up, right? So they did something like this with LLMs. Each LLM had a turn to describe it, and then, after a round or five rounds or something, everyone votes who they think Mr. White is. They also did one called Find the Human. Basically they had questions, again an around-the-table kind of thing, and then we had one volunteer from the audience watching the session who tried to trick them, you know, tried to give a very ChatGPT-like answer, and then everyone votes who they think the human is.
Speaker 5So a lot of little fun games like that, you know, that kind of highlight the different components. So, for example, the Anthropic models: what I hear, my own experience, and what I see in blog posts and whatnot is that the Anthropic models are the best ones for programming today, or the ones that look like they have the best results. This was also the model that went the furthest on the Advent of Code, but it didn't do better on the other ones, right? Also, it was a bit funny because, talking to them, it felt like some models had a bit of a personality.
Speaker 5Like, OpenAI was a bit more of a show-off. It would really say, oh yeah, because OpenAI models can do this, this and this. So it was very interesting to hear their insights here and there.
Speaker 3So it's a very cool talk as well. Let's go and listen. Let's do it. All right, thanks everyone. Merry Christmas, happy New Year, enjoy the holidays.
Speaker 5You have taste in a way that's meaningful to software people. Alrighty, still at the RootsConf, now with Sophie. Sophie or Sophie?
Speaker 4Sophie.
Speaker 5Sophie, my bad. I feel like I've been doing this wrong the whole day. Sophie and Warre. Yes? How would you say it, Warre? Warre. You scratch the R a bit, no? I feel like it's a Belgian thing, but not all Belgians scratch the R. Some of them don't, some of them kind of roll it. Yeah, it depends if you're Flemish or Walloon. Ah, yeah, okay. So you can scratch the R or you can roll it?
Speaker 5But I've also noticed some people that kind of say the R like this. Probably also a difference between dialects. Yeah, indeed. Okay, but it's good enough. Yeah, sure, we'll keep working on that. Thank you both for joining. Is this the first time for both of you? No, I think, Sophie, at last year's RootsConf you also recorded a short snippet, or no?
Speaker 4Yeah, I did the voice cloning About voice cloning.
Speaker 5Indeed, I remember I was there, I was paying attention, cool. So welcome back, and this is your first time on the pod. Okay, welcome. Thank you. Maybe I know you were there before, sophie, but for the people that didn't hear that one or people that would like a refresher, you know, update Sophie 2.0,. Would you like to introduce yourself for the people that don't know you yet?
Speaker 4So I'm Sophie. I've been at Dataroots for like two and a half years already, I think. I started at Dataroots as an intern fresh out of university, and now I'm a fully fledged engineer.
Speaker 5Look at that, can we get the applause maybe, or maybe the harp, like a metamorphosis kind of thing. Okay, no, never mind, just we can imagine that it happened.
Speaker 4I mostly specialized in ML. I started with computer vision, but then quickly went to Gen AI.
Speaker 5All sorts of Gen AI. By choice or by need? Well, it started with the whole DALL-E image generation. So you were in computer vision, and then there was Gen AI for computer vision, and then you got hooked there.
Speaker 4No, in my free time I do a lot of artistic hobbies, so painting and everything. So the whole generating-images thing was a big deal, especially among artists. So I got into it with a project, also surrounding artists, with the Prismax, and then I went into text and voice later on. So I came full circle.
Speaker 5Was your internship the style transfer? No? Yeah, also a bit artsy. Yeah, right, it was computer vision, also in AI. Yeah, before DALL-E, but then DALL-E came out and completely obliterated my internship.
Speaker 5Well, but it's fine, it was an internship. Sometimes this happens, even with NLP. There are people that were doing research, professors that spent years, and then LLMs come and destroy everything. I heard the same thing happened with deep learning for computer vision when it came out. I remember even the professor q11. He was explaining, yeah, we spent so much time doing apnc, and then deep learning comes and blows everything out of the water. Now it's LLMs, but at that time it was deep learning everywhere. So it happens. I also talked to another Sophie, who was from spaCy. I don't know if spaCy is a company anymore. The people from Explosion.
Speaker 4I met her on the meetup.
Speaker 5On the meetup that you also presented, indeed, and I was also talking because she also has a research background, right, and I share this perspective with her that, like when you do research, you kind of bet in one technology.
Speaker 5And you kind of become a real expert on one thing, but then if you bet on the wrong horse, quote unquote, then, yeah, it's not like you throw everything in the trash, right? But you specialized in one tool and it turns out it was another tool that won everything. So it happens. For you it was just an internship; for some people it's years and years of work. And I remember what she mentioned. I got lucky.
Speaker 5Yeah, I remember what she mentioned. It was like ah, but we have to believe that we move the needle a bit. Of research, right, like we contribute to all these things, right Like, maybe, yeah, maybe no, that's not the winner because someone spent the time to invest in it, right?
AI Technology and Career Transitions
Speaker 4Yeah, but it's also not like, especially for her, it completely disappears. Some techniques of NLP can still be used in the LLM context. Yeah, true. I'm also wondering, I mean, how much can you translate, right?
Speaker 5I'm not sure, because I remember for computer vision it was really like the, the filters, which was more like manual right, and I think, yeah, you can always reuse some of it right? I also think that the skill of thinking critically about problem and I don't know for computer vision, for example, the knowledge of the different convolutions and how to extract features and to understand that the images are just matrices and understand this and understand that you can play with images. I'm sure that a lot of it still translates right, but I wonder how much.
Speaker 4But very cool. Any fun facts, anything, any life updates since then? I did the voice cloning for De Mol.
Speaker 5Then after, yes, which was fun appearance.
Speaker 4Yeah, yes. I think I'm famous now. I already did it at RootsConf, but I couldn't talk about it yet. Ah, okay. So I had to keep quiet. And that's also why I did the talk with santa, because we worked on De Mol, so we had the experience, but we wrapped it up like, oh, this is just fun research.
Speaker 5It was like, oh no, it was just Friday afternoon, I didn't have anything to do, so just play it cool. What is it? Maybe for people that haven't watched the show or are not familiar: what is De Mol and what did you do?
Speaker 4So De Mol is a sort of team game where the candidates do different challenges and one among them, the saboteur, tries to sabotage the challenges, but they don't know who it is and they have to guess, and the person that guesses the right person wins. And there was one challenge where we created different voice clones of the different candidates and then we said, the voice of the mole is the only real voice in there.
Speaker 4So they had to distinguish, okay, which one is the most real. But of course, yeah, the technology was already quite advanced, so they couldn't really tell. Okay, so you got them.
Speaker 5Yeah, okay, cool, cool, cool. And I think people can still find it online if they look for it, probably. Maybe you can share it and we can put it in the show notes as well. Very cool. And now back to you in the studio. Sorry. It's fine.
Speaker 2I'll do my best. Yes, so I'm Warre. I also joined Dataroots two years and some months ago, together with Sophie. You also metamorphosed into a data engineer? Yeah, I was a data engineer for a bit more than a year, and then I got thrown into Gen AI, and now I'm here. So now I got addicted. Now you're kind of like, hi, I'm Marvin.
Speaker 5I'm a Gen AI addict.
Speaker 2Hi, what's up? I do prompt engineering for a living. Yeah, I talk to machines all day.
Speaker 5Yeah, basically okay, cool. Um, any fun facts? Any?
Speaker 2Yeah, fun facts: this is not my first podcast. The first Gen AI project I did was immediately for a podcast for the AWS session, where I was selling myself as an AI expert, and then the last thing I said was, it's really easy.
Speaker 5I only started like four weeks ago, and they all gasped. Yeah, but I think for Gen AI you cannot have five years of experience yet, right? But it's cool. And what was it for AWS? So it was a podcast, but it was also a Twitch live stream?
Speaker 2It was like a podcast setting on a Twitch live stream.
Speaker 5Okay. So it was for RAG chatbots on AWS. We did talk about RAG before on the podcast, but for people that forgot what it is: what is it?
Speaker 2So if you ask an LLM a question, it doesn't know things about your company. So you put the things of your company in a database, and then, before you give the question to the LLM, you first query the database and you add those results to your question, so that it can look at the articles from the database and give a precise answer, fine-tuned for your company, I would say.
Speaker 5Okay, so you did a RAG chatbot for AWS using AWS infrastructure.
Speaker 2Yeah, it was with some articles of the elections of VRT.
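The retrieve-then-ask flow Warre describes can be sketched roughly as follows. This is a minimal illustration, not the setup used in the AWS demo: the `DOCUMENTS` list, the word-count similarity (standing in for a real vector database), and the prompt wording are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for company articles stored in a database.
DOCUMENTS = [
    "Polling stations open at eight in the morning on election day.",
    "The office cafeteria serves soup every Tuesday.",
    "Election results are published online after the polls close.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[term] * b[term] for term in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, top_k=2):
    """Return the top_k documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=2):
    """Put the retrieved documents in front of the question for the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("When are election results published?", DOCUMENTS))
```

The string returned by `build_prompt` is what would be sent to the model; a production system would typically swap the word-count scoring for embeddings and a vector store.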
Speaker 5Oh, okay. And maybe AWS: what is the state of Gen AI and AWS? Because I know that, well, OpenAI seems to be the big player, right? OpenAI has a partnership; I don't know if they're partially owned by Microsoft, but there's definitely a tight link between the two. There's even the Azure OpenAI service, right? So I think that still a lot of people or companies, when they think of Gen AI, even if they are on AWS, still have Azure accounts just to use the OpenAI stuff.
Chatbots and AI Game Evaluations
Speaker 2It's quite okay. I mean, AWS is fully integrated with the Claude models, which worked pretty well. You had AWS Bedrock, I think it was called, where you had access to all the different models, but in the end we just used the OpenAI APIs, and that also worked from AWS.
Speaker 5Okay. So what did you use AWS for?
Speaker 2For the setup and the search service.
Speaker 5Oh, okay.
Speaker 2We used it for the documents and the RAG, and they had services to connect to OpenAI. They also had their services to connect to Claude. So it was kind of the same but different, I would say.
Speaker 5Okay, so then the infrastructure, like the vector database and all these other things, were on AWS, deployed in a Lambda? Okay. And you had to do some live demos on Twitch? Yes. Did you have backups? No. Was it a screen recording where you just pretended you were moving the stuff?
Speaker 2You can share. It's okay, we can stop recording. It was just. It just worked. It was nicely written code, yeah.
Speaker 5I don't put bugs, it's fine, it just worked. You guys don't have the same experience. No, okay, just make it work. The curse of the demo. Okay, it's fine. Okay, very cool, very cool, very cool, very cool. And you're both here at the RootsConf. Are you enjoying the RootsConf? Yeah, of course. Yeah, okay, cool, but you're not here just enjoying the RootsConf. You also presented, you also share knowledge, your knowledge, with people.
Speaker 2No, yeah, sort of.
Speaker 5I like how he's like oh okay. Well, what did you do?
Speaker 4We just had a fun presentation about LLMs.
Speaker 5Yeah, the title was LLM Hunger Games. Oh wow, right. What is it about?
Speaker 4so we went over, we did a bit of the basics of LLMs, but then we quickly went into different kinds of games we created to just evaluate them a bit and let them fight against each other. And we even had a game where we put in a human so that there was some interaction in there and the human also had to fight for their For their life. To win.
Speaker 5Are they here still?
Speaker 4Maybe. Dorian is still recovering.
Speaker 5So Dorian was the one. Dorian is a colleague of ours. Maybe Warre, what were the games?
Speaker 2Yeah, so we started. Basically, we need something for foundation models. What is it? Let's make some games out of them. The ones I worked on was the Advent of Code, which will start next week. So I was like, okay, we all use LLMs for coding, how well do they actually do? So I found some APIs for Advent of Code that you can import your own puzzle data. So I just copy-pasted my assignments into the LLM and it gave me a solution that automatically ran and it just said the answer was 5,000, whatever, and you just fill it in and see if it works. And did it work? One model got to like day five and the others failed before that.
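The loop Warre describes (take the code a model wrote, run it on the puzzle input, read off the answer) could look something like the sketch below. The function names, the toy puzzle, and the idea of checking against a known expected answer are assumptions for illustration; in the talk the answer was filled in on the Advent of Code site instead. Note that a subprocess is not a sandbox, so model-written code should only be run somewhere disposable.

```python
import subprocess
import sys


def run_candidate(solution_code, puzzle_input, timeout_seconds=10):
    """Run model-written Python in a separate interpreter.

    The puzzle input is passed on stdin. Returns the stripped stdout,
    or None if the code crashed or took too long.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", solution_code],
            input=puzzle_input,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def check_answer(solution_code, puzzle_input, expected_answer):
    """Compare the printed answer with a known correct answer."""
    return run_candidate(solution_code, puzzle_input) == str(expected_answer)


# Hypothetical "model output" for a toy puzzle: sum the numbers, one per line.
TOY_SOLUTION = (
    "import sys\n"
    "print(sum(int(line) for line in sys.stdin if line.strip()))"
)
TOY_INPUT = "1\n2\n3\n"

if __name__ == "__main__":
    print("answer:", run_candidate(TOY_SOLUTION, TOY_INPUT))
    print("correct:", check_answer(TOY_SOLUTION, TOY_INPUT, 6))
```

Counting how many days each model passes before its first wrong answer gives the kind of "got to day five" comparison mentioned here.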
Speaker 5So but uh, and why do they fail? Like it's just because you think? Why would you say that they fail? Is it just because their models are not good enough?
Speaker 2Yeah, they gave the wrong answer, but I think the Advent of Code is pretty clever, like they knew this would happen, so they put their assignments in a very vague, long text format. They try to distract you a bit and put it into a story. So it's harder for LLMs, I think, to understand the assignment. Okay, okay, interesting.
Speaker 5And which was the model that won? Maybe?
Speaker 2It was the Claude model, the latest one. And GPT-4o was, like always, nicely formatted, but it just wrote wrong code.
Speaker 5Yeah, so I'm actually using Cursor these days. Cursor is like a VS Code fork for AI stuff. They also have models you can choose from. I have a feeling, and this is also the general opinion, that Claude 3.5 Sonnet, so the latest Claude model, is better than ChatGPT 4o for coding. So it agrees with the results you found.
Speaker 2Results are anything but scientific. Well, it's empirical.
Speaker 5If you did 1,000 games and Claude wins 90% of them, that's research, right? That's what benchmarking does: they just try a whole lot of stuff and see who's on top. We didn't have time for so much. I did five games. But you presented this well. Okay, cool. So then each model was trying, and then you had Dorian as well.
Speaker 2That was also trying some things. Yeah, for that one we had a group chat setting, and it was like: just talk to each other and try to guess who the human is among us. So we had a chat conversation with different LLMs, and Dorian was also in there, disguised as a player, and the chatbots had to find the human among them, and Dorian had to disguise himself.
Speaker 5So this was a different game. Yeah, it's different. So one game was basically a competition to see who can go furthest in the Advent of Code. This was just LLMs, and Claude won. Yeah. And then you had this one where all of them are in a chat room, and then based only on the text, right?
Speaker 5So I guess the latency is not a thing. No, just based on the chats, they need to find who the human is. Yeah. And could they find Dorian? Yeah, they found Dorian very easily. Well, not all of them. Which models did you use, by the way?
Speaker 4OpenAI, Gemini and Claude: OpenAI GPT-4o, Claude 3.5 Sonnet, and Gemini, not the latest one but the one before.
Speaker 2I forgot the name. The latest one came out like last week.
Speaker 5Okay. And how does it actually work? They are in a chat room, so each bot has a turn, or how does it work?
Speaker 4Yeah, each bot has a turn, and you have a sort of general prompt that describes the game and is visible to all LLMs, and then you have LLM-specific prompts on what their tactic should be. And then there is the group chat, where all the messages get sent to all the different LLMs.
Speaker 5Okay, and then how does the guessing happen? At any point or after five messages, everyone says okay, now you vote. Who's the?
Speaker 4Well, you can implement it in different ways, but how we did it is like five rounds of talking and then one round of voting.
Speaker 5Okay, and then that person gets kicked out. It's almost like the werewolf game A bit yeah, what's the goal?
Speaker 2a bit Something like that.
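The structure described here (a shared game prompt, a per-player tactic prompt, a group chat visible to everyone, five rounds of talking and then one round of voting) might be sketched as below. `ScriptedPlayer` is a hypothetical stand-in for a model backend and simply produces placeholder messages and random votes; it is not the ChatArena API. Letting every player vote and forbidding self-votes in code are also assumptions made for this illustration.

```python
import random
from collections import Counter

# Shared prompt that every player sees (wording is an assumption).
GAME_PROMPT = (
    "One player in this chat is human. "
    "Chat for five rounds, then vote for who it is."
)


class ScriptedPlayer:
    """Hypothetical stand-in for an LLM backend.

    A real player would send the game prompt, its own tactic prompt and
    the chat history to a model API and return the model's reply.
    """

    def __init__(self, name, tactic_prompt, rng):
        self.name = name
        self.tactic_prompt = tactic_prompt
        self.rng = rng

    def speak(self, game_prompt, history):
        return f"{self.name}: message number {len(history) + 1}"

    def vote(self, game_prompt, history, candidates):
        # Never vote for yourself; pick any other player at random.
        others = [name for name in candidates if name != self.name]
        return self.rng.choice(others)


def play(players, talk_rounds=5):
    """Run the talking rounds, then a single voting round."""
    history = []  # group chat: every message is visible to every player
    for _ in range(talk_rounds):
        for player in players:
            history.append((player.name, player.speak(GAME_PROMPT, history)))
    names = [player.name for player in players]
    votes = {p.name: p.vote(GAME_PROMPT, history, names) for p in players}
    return history, votes, Counter(votes.values())


if __name__ == "__main__":
    rng = random.Random(0)
    players = [
        ScriptedPlayer(f"Player {i}", "Blend in.", rng) for i in range(1, 5)
    ]
    history, votes, tally = play(players)
    print(len(history), "messages;", "votes:", votes)
```

Swapping `ScriptedPlayer` for classes that call real model APIs, plus one that reads a human's input from the terminal, would give the Find the Human setup discussed in the episode.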
Speaker 5Okay, cool, and you developed this. What was the UI or something? Is it like a streamed app or is it a?
Speaker 4Well, we just ran it in the terminal. Oh good. So not super fancy. We built mainly upon an existing repo called ChatArena, but we had to implement some stuff ourselves, like the game, of course, but also backends for Gemini.
Speaker 5I guess the reason I ask is if there's someone listening that wants to give it a try. Is there something that is possible or not yet, or not at all? I'm not going to work on this.
Speaker 4The general repo is available online, but our fork of it isn't, I think.
Speaker 5No, but maybe you can put the public repo in the show notes as well.
Speaker 2You need some work to get it working with your API keys. Like, we had Azure OpenAI, so we had to have our Azure account in there, and that requires some work.
Speaker 5You have to know a bit what you're doing. Yeah, I see, I see. And how easy was it for the models to find Dorian? Was it like, on the first vote, they all guessed? Which model did not guess?
Speaker 2Dorian the first time? I think OpenAI. Oh really? I'm not sure. Oh, I don't remember.
Speaker 5I thought it was the second one, so Claude. Claude, then, yes. So Claude is like a nerd that has no knowledge of social interactions.
Speaker 2It was really weird. Claude was really targeting Dorian at one point. Normally everyone asks general questions, and Claude was really like, you know, you, player four, what do you think of this? And then in the end it still voted wrong. So I thought it was really onto it, really noticing it, and then it didn't vote for him.
Speaker 4But the questions were also a bit all over the place, right? The temperature was pretty high, I think. Yeah, maybe.
Speaker 5I think I know what you mean. But for people: you said the temperature is really high. It's like, well, it's winter in Belgium. What do you mean?
Speaker 2so when you have a model, you can give it a temperature that basically tells it how creative it is. So if you have it all the way at zero, it's almost deterministic. I would say it's always the same very dry, boring thing. And if you have it like all the way to the max, it's just, yeah, throwing stuff out, slightly coherent with it, but can get very creative if you have a very high temperature. I see, I see, I see. So for the spectacle we did put it very high.
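As a rough illustration of what the temperature setting does under the hood: a model's raw scores (logits) for the next token are commonly divided by the temperature before being turned into probabilities with a softmax. The three logit values below are made up for the example, and this sketch assumes that standard formulation rather than any particular vendor's implementation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature.

    Low temperatures sharpen the distribution towards the top choice;
    high temperatures flatten it so unlikely tokens get picked more often.
    """
    if temperature <= 0:
        # Temperature 0 is usually handled separately as "always pick the top token".
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Made-up scores for three candidate next tokens.
LOGITS = [2.0, 1.0, 0.1]

if __name__ == "__main__":
    for temperature in (0.1, 1.0, 2.0):
        probs = softmax_with_temperature(LOGITS, temperature)
        print(temperature, [round(p, 3) for p in probs])
```

At a temperature of 0.1 nearly all the probability lands on the first token, which matches the "almost deterministic" behaviour described above, while at 2.0 the three options end up much closer together.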
Speaker 5I think, maybe also: how many games did you have? You said five. Five, yep. I don't know if we have time to cover all the games, but you mentioned Find the Human in the chat and the Advent of Code, which, by the way, for people that don't know what the Advent of Code is: this December there's a guy that puts one out. There's the advent calendar, so it's like one chocolate or whatever per day until Christmas. The Advent of Code is something similar. Every day there's a puzzle, and based on the puzzle you also get a text file or something. Then you have to manipulate the text file to get the answer. You submit the answer, and as you do, you unlock other levels, right? So that's what the Advent of Code is. You can also have private leaderboards. It's quite a lot of fun. It's in the month of December. So, people, if you're listening now, you can Google the Advent of Code, or we can put it in the show notes as well. So: coding challenges.
Speaker 2Chatbot. What were the other ones? I also had a look at an image, because models are multimodal these days. So what does it do with an image? We took one of the fun pictures of people wrestling in sumo suits at one of our team-building days. We asked them: describe this image, and what does it tell you about company culture and stuff? But I would say the results were pretty disappointing. They were all very dry, just explaining what happened, and I was expecting some funny answers or whatever. It was like the HR-approved message that reads like "super exciting".
Speaker 5You should have asked the Twitter one, you know. Okay, cool. So what's the third?
Speaker 4We also just had a simple group discussion where they could decide among themselves which is the better LLM. Oh okay, and did they agree, or something?
Speaker 5Yeah, but it was because OpenAI started promoting itself. I like how these different models start to have personalities, right? Claude is the nerdy one that doesn't talk to people, stays in their cave. Very good at programming but cannot pick up on social cues. Then you have OpenAI, which is very cocky.
Speaker 4Yeah, okay. It was actually funny, because we put specifically into the prompt "don't vote for yourself", but OpenAI immediately started to vote for itself, just like...
Speaker 5Sorry, I cannot go against my nature. Okay, and what was the last one?
Speaker 4What was the other one?
Speaker 2I'm thinking.
Speaker 4And a chameleon.
Speaker 2It's basically the same, but without the human in the loop.
Speaker 4Yeah, a chameleon game, so one person.
Speaker 5One bot is the chameleon, and they have to find out who's the chameleon.
Speaker 4Well, you have a topic that is secret, and each time they have to give a hint about the topic, and the chameleon has to try to blend in as if it also knew the word. There's a game, Mr. White or something. Yeah, it's the same.
Speaker 5Cool. And what were the results there? Any insights? What was the best model?
Speaker 2I would say Gemini was really human-like. I think in general, for all the experiments, it always sounded like the most human, but it was also average at everything.
Speaker 5It was very human.
Speaker 2It didn't have any strengths, but it was like very natural. It didn't sound that much like a robot.
Speaker 5Cool and the presentation. You did this all live or no?
Speaker 4Yeah, we did live coding, but I did have video backups.
Speaker 5Okay, cool, nice, smart. But all these experiments you showed live and all these things? Okay, very cool. Maybe last question to wrap everything up: in these Hunger Games, who was the actual big winner, if you had to choose one?
Speaker 2I think for me Claude surprised me the most, because it was just easy to work with and it was actually good at stuff. I was expecting GPT to take everything everywhere, but Claude was really good at coding and all the rest. Yeah, for me Claude was a big surprise. Okay, what about you, Sophie?
Speaker 4I would say Claude or Gemini also, but I'm maybe a bit biased because I have a lot of experience with OpenAI. So you already recognize the way of talking and everything.
Speaker 5So it's not surprising anymore. Okay, but then so you said Claude or Gemini.
Speaker 4Yeah, Gemini is more creative, I think, and more out of the box. But yeah, that can also go wrong, of course. Asking all of them, or even people, right, it can always go wrong. Alrighty.
Speaker 5So thanks a lot. Thank you. I don't know if I've been doing this with other people, but I don't want you to feel left out, so thanks a lot for joining. It would be really cool as well if, once we meet at the regular session, where we have the lights, the camera and everything, you could even show some things. Could be a lot of fun as well.
Speaker 4Okay, so I'm not deleting the code yet.
Speaker 5Don't delete the code yet. It sounds like a lot of fun, very cool. Thanks for joining me to chat today. Thanks for having us.
Speaker 1Thanks, y'all. You have taste in a way that's meaningful to software people. Hello, I'm Bill Gates.
Speaker 3I would recommend TypeScript. Yeah, it writes a lot of code for me and usually it's slightly wrong.
Speaker 1I'm reminded, incidentally, of Rust here. Rust. This almost makes me happy that I didn't become a supermodel. Kubernetes.
Speaker 2Well, I'm sorry, guys, I don't know what's going on.
Speaker 1Thank you for the opportunity to speak to you today about large neural networks. It's really an honor to be here. Rust, rust, rust, rust. Data Topics. Welcome to the Data. Welcome to the Data Topics Podcast.