
AIAW Podcast
E150 - KB-Whisper: Outperforming OpenAI with Smaller Models - Love Börjeson & Justyna Sikora
In Episode 150 of the AIAW Podcast, we delve into the remarkable world of AI-driven language innovation with Love Börjeson, Head of R&D & KBLab, and Justyna Sikora, Data Scientist at the National Library of Sweden, as they reveal the creation of KB-Whisper, a pioneering AI model that revolutionizes Swedish language processing by handling diverse dialects, historical recordings, and the intricate demands of cultural preservation. From the model’s technical architecture and massive data clusters to its social impact and ethical implications, this unscripted conversation offers an unmissable look at how AI is shaping language research today and what lies ahead for Swedish AI in the next five years. The main gain of the episode can be captured by these compelling angles: The KB Whisperer: Outperforming OpenAI with Smaller Models, Sovereign AI: How the Swedish National Library Built a Superior Speech-to-Text Model, From Royal Collections to State-of-the-Art AI: KB Lab’s Journey to Create Whisper Models for Swedish, Small Team, Big Impact: How 12 People Beat Silicon Valley at Their Own Game, and Beyond the Model: Cultural Sovereignty and the Future of Language Technology. Catch all the insights, breakthroughs, and future directions by tuning in to this thought-provoking episode—now available on Spotify!
Follow us on YouTube: https://www.youtube.com/@aiawpodcast
It did work with the mic, so just a person standing holding the mic, and then... but this is the classical demo devil. And everyone was like, snafu.
Henrik Göthberg:Yeah — situation normal, all fucked up. And here we are demoing a voice-to-text model. State of the art.
Love Börjeson:Beating OpenAI. No, but it's silent.
Henrik Göthberg:But take us back a little bit. So we are talking about what I think is one of the bigger, in quotation marks, launches of a KB model, because this was newsworthy. So back up the tape. What was the thinking? Why did you want to have a launch, and how did it all go?
Love Börjeson:Yeah. So the team — Leonora was one core part of that team, which was led by Leonora Vesterbacka Olsson. They had trained a Whisper model, which is a sound model, a speech-to-text model, and we saw early on that the results of this model were fantastic. We could clearly beat the OpenAI model on several different measures, etc. So we thought: this is clearly newsworthy, because we're going to share this model as a research result, so it's open for everyone to use, and it's going to prove useful in so many use cases. We already see it happening right now. So we wanted to make a little bit of a splash. We don't normally do that, but I think the time was right to do it. You can't do it for that many models. And people were waiting for this, yeah.
Justyna Sikora:So it's the perfect time to do it.
Anders Arpteg:So how did you do the splash? Did you invite people to come there, or how did you arrange the actual launch?
Love Börjeson:Yeah, again, I think, especially around the event, it was very much Leonora's doing — she set it up. You know, there's a tool from the communication department to set up an invitation list, and then there was a press release, and then it was word of mouth and LinkedIn.
Anders Arpteg:You had a formal press release as well beforehand? Okay.
Henrik Göthberg:And I saw the press release and I saw I was on the short list. Thank you for that.
Love Börjeson:And there's some really nice photo of the team, I think on the blog, including Justyna. I'm not in there, sorry. No, no, awesome. And then — how many people attended the launch? It was a full house; the big auditorium, or whatever you want to call it, at KB holds a little bit more than 100 persons.
Justyna Sikora:Plus some on Zoom.
Love Börjeson:Yeah, on Zoom as well.
Anders Arpteg:I don't know how many there were. And you had the — what was it, the head librarian? Is that the proper term?
Love Börjeson:Yeah, the national librarian. She's sort of the CEO — the CEO of Kungliga biblioteket. Yeah, riksbibliotekarien. I think that's about the coolest title you can have.
Henrik Göthberg:Yeah, that's a super cool title.
Love Börjeson:So she and I did the sort of big cutting-the-ribbon thing.
Anders Arpteg:You actually had a ribbon? No, we should have.
Love Börjeson:All these mistakes. I wasn't in the photo, we didn't have a ribbon, the sound didn't work.
Desktop AI - Maya:But now we're here.
Love Börjeson:Now we're here. That's a good thing. And then there were three rather technical presentations. Leonora did one about the background and also the computational resources used in HPC — we're going to talk more about that later. And then Faton Rekathati did a really good presentation about the data wrangling around this, which is super complicated, also talking about the importance of data sovereignty and the importance of the Swedish domain of data. And then Justyna gave an also kind of complicated presentation, but with a few good selling points showing how much better this model is than the previous models.
Anders Arpteg:And we are getting into all these kinds of details, and it's super cool that we have this awesome AI lab in KB. You're producing one model after the other that I think is so useful for Swedish and other societies and the companies that we have.
Henrik Göthberg:Yeah, and I must highlight the cliffhanger: we're going to talk about beating OpenAI's model in what we can achieve, but, more importantly, with a smaller, more manageable model.
Justyna Sikora:Yeah.
Henrik Göthberg:That is super cool from a small team of 12 people in KBLab. Yeah, that is rock and roll, rock star stuff — so cool to talk about. Super exciting, yeah.
Anders Arpteg:Awesome. But before we dig deep into these details — how that was achieved and what you can use it for — let's first welcome Love Börjeson, an old friend who's been here before and an awesome person, I think one of the more knowledgeable people about, you know, how to really train large models, and who has done so with success many times. But perhaps, Love, if you could, just: who are you? How did you get into the role that you have as Head of R&D at KBLab?
Love Börjeson:Yeah, so I'm a former mariner who went ashore when I was 30, and then did a PhD and ended up at the Stanford University computational social science lab. So I'm not a data scientist by training — I'm a sociologist of some kind, you know. And from there I moved back to Sweden, started a few labs in industry, and then was hired at the National Library of Sweden, Kungliga biblioteket, to start a data lab.
Anders Arpteg:What year was that approximately?
Love Börjeson:2019, approximately.
Henrik Göthberg:I only know two people who present themselves as computational sociologists — you and Mikkel Klingman. Yeah, where did you guys meet?
Love Börjeson:So weird, but in a small marketing analysis sort of company I was consulting for. You were consulting with ProSales?
Henrik Göthberg:Yeah, oh really. You know he works with me in ProSales now, and he updated his title — now he's senior mad scientist. Oh yeah. And in ProSales he had to use mad scientist, okay, cool. But to use the words computational sociologist — I find that...
Love Börjeson:It was interesting, because they sort of didn't understand him at ProSales, I think, and we started to talk and we understood each other immediately. Okay, so you're a nerd. So, anyhow, I started up the lab in 2019 and it's been growing since.
Anders Arpteg:Yeah, and a bit about the lab today. How many people are you? What do you do?
Love Börjeson:In short: the lab itself has expanded a little bit, so the lab itself is bigger. But now the lab is also part of a unit that is the R&D unit at the National Library.
Henrik Göthberg:So I'm heading up that unit, so we have all kinds of weird research going on — people who are experts in Latin. But this is the realization — it's a fundamental reorg that happened, like, a year ago? Or — when was this, Justyna? A couple
Love Börjeson:of months ago.
Justyna Sikora:I see a couple of months ago.
Henrik Göthberg:I think, yeah, a couple of months. So in 2024 you had a consolidation of units into the R&D unit. Yeah, okay.
Love Börjeson:I don't think they would recognize themselves as being part of a business unit. So, anyhow, it's appropriate. But the lab itself — I mean, when we started the lab... KB is the National Library, so KB for short. What KB had realized was that, okay:
Love Börjeson:So we have these mad scientists now who want to have datasets, whatever that is. We have the collections, which are optimized for storing, you know, and then they want to have something they call data. We don't know what that is, so we need something in between — probably data science. They didn't really know that, so they hired me to do that. So me and Martin Malmsten, who was one of the founding fathers, original gangsters of the lab, together with Emma Rende — we did that: produce datasets. And, as we were talking about, the first dataset took us a year. Now we do it in a matter of hours. So that was the first thing: create datasets for scientists from the collections.
Anders Arpteg:And perhaps people don't know about KB actually having so much data, because of the Swedish pliktlag.
Love Börjeson:The deposit law, yeah. And you know the date — of course, everyone knows it's 1661, when that law came into existence. It was a law for censorship back then, of course, and now it's a tool for democracy. So it's pretty cool — it's pretty much the same law, but it has warped.
Henrik Göthberg:But on a global scale — do you know more about how it works, that we have this law, that we have collected everything? And when we say everything, I think it's hard for people to understand. When you have a poster for a rave party in the 90s, at the printer, when they're supposed to do a thousand prints, they do a thousand and two prints, because they send two of the posters to KB. It seems to have gone on forever. Yeah, on a worldwide scale, who has that fundamental hoarding
Love Börjeson:going on? Who has a hoarding law more than us, in terms of deposit law? That's an interesting question. I think our law is one of the first, because the Swedish state was one of the earliest in its form — you know, a Protestant state, something taking form that was separate from the royal family. And I mean Oxenstierna — or is it the other way around? I keep forgetting — he was forming that state, and this was one part of that, you know, the deposit law.
Henrik Göthberg:And what is it — is it 16...? 1661. Yeah, that's the law.
Love Börjeson:The origin of the library itself is the collections of the royal family, the Vasas, who were, you know, Renaissance people. They wanted to have a library, naturally, so back then it was truly a royal library. The first catalog is from the 16th century, so like 100 years earlier, but it came into existence as a library during the 17th century — 16-something-something, with the deposit law. And then we had the big disaster, the 1697 fire.
Love Börjeson:Of the royal castle. And since then it has been more of a national library, actually keeping the name as a witness of the origin of the library. So it's still called Kungliga biblioteket in Swedish, but it is for all intents and purposes a national library.
Henrik Göthberg:So it's old in this sense, but it's also this crazy breadth — the breadth of what you have collected.
Love Börjeson:That too, absolutely, because we have everything printed. So what people are thinking about is obviously books. We have books, yes, but we also have what we call ephemera, which is the everyday collections: pamphlets and posters and what have you. Menus, rave posters — the rave collections, yeah. And we also have radio and TV broadcasts, we have computer games, we have, yeah...
Henrik Göthberg:Yeah, it is mad when you think about it, when you start thinking about the petabytes. Do you have a number for the petabytes? Is it more than a petabyte?
Love Börjeson:So someone's probably going to say that I'm wrong about this, but the latest figure I heard when I started was 26. 26 petabytes. Cool stuff. Anyhow, so the first thing we did was to compile datasets. I'm going to stop talking soon, but the thing we did in 2019 — this was after the launch of the first transformer model from Google, which came out in the winter of 2018 — me and Martin started to test this multilingual model, and we saw that it blew everything that existed for the Swedish language just out of the water. So all these dependency models were sort of obsolete overnight. But we also realized: wait a minute, we can beat them. We can be the world's best lab for the Swedish language. So KB-BERT came into existence a couple of months later — because we didn't have any computational resources, it took a while to train that one. We released it, I think it was January, perhaps February 2020, and it has been downloaded like a million times since. Yeah, and then you grew into several different...
Henrik Göthberg:You know, Wav2Vec, blah blah blah — variations for different science purposes. Awesome, and very welcome here as well.
Anders Arpteg:Justyna, I hope I pronounced your name correctly. Data scientist at KBLab, right? Please describe a bit: who are you, and how did you come to work at KBLab?
Justyna Sikora:Yeah, I think in comparison I'm just a standard language nerd. I started studying languages back in Poland — it was Swedish, Dutch, German. It was before all the transformer models, before Whisper, ChatGPT, everything, so my knowledge about natural language processing was quite limited. But then I started working as a translation project manager, and I saw how this field was being transformed by all these new technologies, and I thought, oh, that's really cool. So I found this language technology program here in Sweden, I moved for that, and then found my job at KB. So that was my story.
Anders Arpteg:What year was that approximately?
Justyna Sikora:Around four years ago, yeah.
Anders Arpteg:Nice, but I heard something.
Henrik Göthberg:Nice, but I heard something. There was a project or consortium, you know, that you were on.
Justyna Sikora:Yeah, exactly — a very cool project that I'm a part of. You're still part of that? Yeah.
Henrik Göthberg:That's actually part of how the funding works, how you've ended up here.
Justyna Sikora:Yeah, tell us about this. Yeah, I was employed as part of this project — it's called Huminfra. Human infrastructure, exactly. I'm a bit slow today.
Anders Arpteg:Cool. And you have now been working with Whisper, the main topic of attraction today, so to speak, right? Yeah — I was part of the team that developed our Swedish suite of Whisper models. And perhaps, if we move quickly into the theme, you can start by explaining how the origin of KB-Whisper came about.
Love Börjeson:Why did you start working with it? Yeah, so everything that relates specifically to the model — Justyna is going to have to answer those questions. But the background was that we had a Wav2Vec model, which is the sort of architecture for sound models that predates...
Anders Arpteg:Yeah, sorry, some people may not know what Whisper is, so we could perhaps give a short background: what can you do with Whisper? And then a bit of background on Wav2Vec and whatnot.
Love Börjeson:Yeah. So about Whisper — what can you do with
Justyna Sikora:that? Yeah, sure. It's an ASR model — Automatic Speech Recognition — which you can use for transcribing any audio that you have, getting transcriptions out of it.
Anders Arpteg:And Whisper's origin came from OpenAI to start with, but now you're building your own, basically, right?
Justyna Sikora:Exactly, we're doing something called continued pre-training, so we're building upon the OpenAI models.
Anders Arpteg:So you started from the weights they have released and then continued training from that. Okay, good.
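For readers curious what the continued pre-training Justyna describes can look like in practice, here is a minimal sketch using the Hugging Face transformers library, starting from OpenAI's released weights rather than a random initialization. The dataset, collator and hyperparameters are illustrative assumptions, not KBLab's actual setup.

```python
# Minimal sketch of continued pre-training on top of an OpenAI Whisper
# checkpoint. Dataset, collator and hyperparameters are illustrative.
from transformers import (
    WhisperForConditionalGeneration,
    WhisperProcessor,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

# Start from OpenAI's published weights instead of a random init.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="swedish", task="transcribe"
)

args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-sv",
    per_device_train_batch_size=32,
    learning_rate=1e-5,   # low LR: we refine existing weights, not retrain
    warmup_steps=500,
    max_steps=20_000,
    fp16=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # assumed: prepared log-mel features + labels
    data_collator=data_collator,  # assumed: pads features and label ids
)
trainer.train()
```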
Henrik Göthberg:So Whisper — that was the name of the original OpenAI model. Absolutely.
Love Börjeson:So I mean Sartre, the French philosopher. He said that intellectuals need to be parasites. Exactly.
Henrik Göthberg:Yeah, that's what we are. Standing on the shoulders of giants.
Love Börjeson:Yeah, that's another way to express that. So yeah, we take architectures, and preferably also the weights of the architectures, and we do stuff with that.
Henrik Göthberg:Actually, I used that analogy for one of the opening speeches at Data Innovation Summit. We talked about adaptability or learning. The way to go faster now is to learn and adapt from what is out there.
Love Börjeson:Yeah, I mean, sweden is a small country, it's a low resource language and the lab is very, very small, so we need to have these strategies.
Anders Arpteg:Okay, so you started to train the Whisper model at KB. Can you explain a bit more — how did you get started? What were the first things? How did you really, you know, get
Love Börjeson:started planning how to embark on this project? There's a fundamental difference between the Wav2Vec model and the Whisper models. The Wav2Vec model is primarily trained on data that is not annotated.
Anders Arpteg:You only train on sound, no text. And Wav2Vec — just to explain it a bit more — that was not a transformer-based kind of model, right? Can you just explain what Wav2Vec is?
Love Börjeson:Well, it was sort of a more basic sound model that you could fine-tune to do the same stuff that we do with Whisper, but Whisper does it better. It's trained in a very similar way to the big text models: you take away a little bit of sound and you have the model guess what it is. If it's correct, you strengthen the network, and otherwise you won't. So we had a good Wav2Vec model, but we also realized that we wanted a Whisper model. And that's the problem for KB: what do we have? We have unannotated data. I mean, that's our strength — and Whisper takes annotated data. So that was the first problem the team had to solve. There was a lot of cooperation involved, which Leonora was mainly responsible for.
Justyna Sikora:But yeah, we actually used some of our collections, but it turned out — you know, in the beginning you think, okay, I have so much data — and then there is looking for what's in Swedish, what has Swedish subtitles, and then deduplication, because you probably don't want to train on 50 Donald Duck recordings, right? You want to take that out. So we looked for some collaborations, and we found some great people who helped us with this process.
Anders Arpteg:But you say you have unannotated data, and for people that may not know: of course you need to have the output somehow to train a model — you need to know what the text should be for the voice and audio that you put into it. But don't you have a lot of text associated with each, like, SVT production or whatnot? Is it just the timestamps that you don't have, or what is not really annotated?
Justyna Sikora:It depends on what kind of format of the data we get. Sometimes it's just subtitles burned into the TV image, so we can't really use that directly — we would need to do some OCR process to get the text out of there.
Anders Arpteg:Okay, so you don't have it in a structured format, or in text format. Exactly.
Justyna Sikora:So we used everything that we could, but further back in time it's only subtitles in the image.
Henrik Göthberg:Oh, you have text for this? No, you don't, because you have it in the image, so you need an OCR technique to get it out. So it's complications on complications here.
Love Börjeson:Yeah, and another complication — correct me if I'm wrong, Justyna — is that the subtitles are not word-for-word; it's a translation to something else. Which is suitable for training a Whisper, because that was one of the strengths of the Whisper architecture: you can actually use annotations, you know, text that isn't matched perfectly with the sound, and it still works. That's the magic of it. The problem is, you also want an ASR function that works for other kinds of transcriptions — more verbatim.
Henrik Göthberg:Verbatim meaning on the fly, or what does that mean?
Anders Arpteg:No — when you have subtitles, you usually have some short forms, summarized. Exactly, the space is limited, so you can't put everything. And Whisper is less sensitive to that compared to Wav2Vec — is that the way to phrase it?
Love Börjeson:Absolutely. But we also want it to know how to do verbatim. Verbatim — say the word. Now I can't do it. Verbatim, yeah, thank you — I won't say it. So yeah, we wanted it to be able to do verbatim, and then we need to have that kind of data as well — and we didn't have it.
Henrik Göthberg:But you said you had, in the launch, three quite technical presentations, where one was on the whole data wrangling. Is that what we're talking about now, or is that something else?
Love Börjeson:Yeah, it concerns that. But the big sort of matching process that Faton Rekathati did was this: we had the transcripts of the parliamentary speeches, and we had the sound files, but they were unmatched, you know, and he had to match them somehow — because if he could figure that out, then you had a fairly good set of transcripts and sound.
Justyna Sikora:And that's what we have, thanks to Faton.
Henrik Göthberg:And so this became one of the key pieces of the strategy for how to do ASR. So it's the Riksdag's — what do we call it, the parliament — addresses. So parliament addresses became one of the core strategies in this training approach.
Love Börjeson:Yeah, and I mean, just to get that data was, you know — it's like a spy thriller almost. I mean, they ended up delivering an actual hard drive with the data.
Henrik Göthberg:Oh yeah? You can't just put it on a USB?
Love Börjeson:I don't know — it's completely open, but, you know, for various reasons. So that was one thing: the data wrangling was super important. Then, to rig the collaborations was another thing, because previously we had been — not doing stuff in a bubble, but depending on our own data. Now we depended on external data, which is an unfamiliar situation for us. Okay, so how you organize the collaboration, the project management, the coordination
Love Börjeson:piece. Yeah, yeah — and there's a lot of inter-organizational politics going into that, which Leonora handled, with Faton and Justyna and Agnes also part of the team.
Henrik Göthberg:It's always more than you think.
Anders Arpteg:So what were the main challenges? One, of course, was the data wrangling part, I guess, in getting this started. Were there any other major challenges that you had to tackle before setting off the training?
Justyna Sikora:I would still say something about the data, because it was really months of, first, obtaining the data.
Anders Arpteg:When did you actually start the whole project of KB-Whisper?
Justyna Sikora:It was like a year and a half ago, I think. First we got this development access, to see if we could do it and how the computers work — like the HPC environments — and just getting ready.
Anders Arpteg:We're getting more into the HPC part very soon. But yeah, cool — one and a half years ago.
Henrik Göthberg:So if you summarize the key steps around organizing the data, how would you summarize the key process steps? Not technically, but sort of: first we needed to get the access, blah blah blah, then we needed to figure out the model. How would you summarize that? Yeah, exactly.
Justyna Sikora:It's hard to say, because we had so many different data sources that it was an iterative process. In the beginning we thought, okay, we're working with our data, so we prepared the pipeline for our data. But then we got different sources, which had different problems that we needed to solve. For example, if you're working with subtitles, you need to clean them in a specific way: you don't want to train on data that has comments inside, like "someone is singing" or "the door shuts" or something like that. But then we were also working with this dialectal data from ISOF, which is the Institute for Language and Folklore, I believe. No?
Love Börjeson:No.
Justyna Sikora:In English, yes. And it was also a different type of data — something that I couldn't really understand as a non-native speaker. So we were listening to these recordings, and they were really, really hard for me, and I saw my Swedish colleagues struggle also.
Henrik Göthberg:So dialects, like hardcore Gotländska — what could it be?
Love Börjeson:Yeah, I don't know. It's a different kind of language.
Henrik Göthberg:It's fun — I come from Borås.
Love Börjeson:I think Listerlandet is one of the really hard ones, you know — the western part of Blekinge. That's really hard.
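As an aside, the subtitle cleanup Justyna mentioned above — stripping non-speech annotations like "someone is singing" or "the door shuts" before using subtitles as training labels — could look roughly like this. The patterns are illustrative assumptions, not KBLab's actual pipeline:

```python
import re

# Illustrative cleanup of subtitle text before using it as ASR labels:
# drop bracketed non-speech annotations and leading dialogue dashes.
NON_SPEECH = re.compile(r"[\[\(][^\]\)]*[\]\)]")  # [door shuts], (singing)
MUSIC = re.compile(r"[♪♫]+[^♪♫]*[♪♫]+")           # lyrics between note marks

def clean_subtitle(text: str) -> str | None:
    text = MUSIC.sub(" ", text)
    text = NON_SPEECH.sub(" ", text)
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(r"^-\s*", "", text)  # leading dialogue dash
    return text or None  # drop segments that were pure annotation

print(clean_subtitle("(Someone is singing) -Hello! [door shuts]"))  # "Hello!"
```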
Anders Arpteg:That's hard, okay. So, a lot of data challenges just to get started and figure this out. At some point you still need to think about, I guess, the code to train, or the architecture. Were you going to use the same one, or change it somehow? What about the way of training it — could you just reuse the code that was out there, or how did that get started?
Justyna Sikora:So we used the code that was out there on the internet. We found some great guidelines from Hugging Face, and we said, okay, we will base it on this. But then, of course, our data was completely different, and we were handling not gigabytes but terabytes of data. So the methods that they were describing didn't really work for us, because it was: okay, everything crashes — what now?
Henrik Göthberg:So you had to really do method development here? Exactly.
Justyna Sikora:We needed to rethink everything. So that's why it was an iterative process: first, when we were just testing everything out, the scale was smaller, but then, when we got the real data, the problems occurred.
Anders Arpteg:Can you get a bit technical about the tech stack? What libraries were you using, or what code — something that you can share?
Justyna Sikora:Yeah, so we were using this Hugging Face notebook first, and then — let me think for a second, because we used so many different frameworks. Were we using the Hugging Face pipelines and stuff? I think it was Hugging Face first, but — oh my god, how can I not know if it was a Hugging Face implementation or PyTorch? Sorry for that, everyone.
Henrik Göthberg:So there are pieces from Hugging Face, and then some pieces from PyTorch libraries you tried out. What was that?
Justyna Sikora:Yeah, so we used it for feature extraction, of course, because we had so much data we couldn't do it on the fly, so we split it up. This was also a big modification from what they had in the beginning: they could just run everything at once, but we needed to split it. So first we did this data preparation — feature extraction, preparing the texts — and then just running the Hugging Face datasets, loading everything into memory, and then, when we had everything in place, we could train.
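A minimal sketch of the split Justyna describes — computing features once, up front, instead of on the fly — might look like this with Hugging Face datasets. Paths and the transcription column name are assumptions:

```python
# Offline data preparation: compute log-mel features and label ids once,
# save to disk, and load the prepared set at training time.
from datasets import Audio, load_dataset
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="swedish", task="transcribe"
)

ds = load_dataset("audiofolder", data_dir="data/batch_01")["train"]
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))  # Whisper expects 16 kHz

def prepare(example):
    audio = example["audio"]
    example["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    example["labels"] = processor.tokenizer(example["transcription"]).input_ids
    return example

ds = ds.map(prepare, remove_columns=ds.column_names, num_proc=8)
ds.save_to_disk("features/batch_01")  # later: datasets.load_from_disk(...)
```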
Henrik Göthberg:Is this all work you're still doing on your own machines at home, before you go to the big HPC? What is done at home? This is pre-going down to Leonardo, right?
Justyna Sikora:Yeah, partially. It really depends on the project, because if we have this development access, then we can test everything out there and prepare everything there. But of course we can also do it on our computers and prepare Docker containers.
Henrik Göthberg:But in the particular case of KB-Whisper, did you have like a developer access? Did you muck around on Leonardo, or were you doing some pieces at home?
Justyna Sikora:It was both. Some of the data wrangling at home.
Anders Arpteg:Okay. So then at some point you got the code working; you had some dataset that you could potentially start training on. Can you share anything about the training itself — how did you monitor it? Did you have to restart at some point? Did you get some crashes? What happened when you started to train it? Yeah, of course.
Justyna Sikora:A lot of crashes, of course — some scheduled, some unscheduled downtime, some nodes that were not working that we needed to exclude. But we knew; it wasn't the first access to HPC computers that we had.
Henrik Göthberg:This is life, this is usual right. This is life, yeah.
Justyna Sikora:If you talk with someone who has worked in this environment, you know that these things happen. So we were prepared — we knew: okay, many, many checkpoints, so we can restart at any time if anything happens.
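The checkpoint-and-restart routine Justyna describes is, in a Hugging Face setup, mostly configuration. A hedged sketch, with illustrative values and assuming the model and dataset are already prepared as in the earlier sketches:

```python
# Checkpoint often, so a crashed or drained node only costs the steps
# since the last save. Values are illustrative.
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="checkpoints/kb-whisper-run",
    save_strategy="steps",
    save_steps=1_000,      # frequent checkpoints
    save_total_limit=5,    # keep disk usage bounded
)

trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_ds)

# After an unscheduled shutdown, resume from the newest checkpoint:
trainer.train(resume_from_checkpoint=True)
```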
Anders Arpteg:When did the actual training start, approximately? Was it like summer last year, or what time could it have been?
Justyna Sikora:Around that time, yeah. Because we have many sizes of the models, some from the tiny — oh, you started with the smaller ones? Yeah, so from the smallest to the largest, we tried them, and it took some time.
Henrik Göthberg:I remember this. I think we were at the AI after work and you were sort of anticipating like a kid in a candy store.
Love Börjeson:Ooh, we go for Leonardo now, soon it's happening.
Henrik Göthberg:Do you remember that?
Love Börjeson:Yeah, I think Leonora spilled the beans, sort of — she was like that, telling me the early results: it looks really good, the tiny one will beat the medium one from OpenAI, etc.
Anders Arpteg:And anything surprising during the training? Besides that you were expecting the crashes to happen, and you had a lot of checkpoints to be sure that you could restart — not from scratch, but from the latest checkpoint in some way. Anything that was surprising, perhaps, during the training?
Justyna Sikora:How smoothly some of the trainings went actually.
Anders Arpteg:Really.
Justyna Sikora:That's positive. But I think it was because we put so much work into the data that then it was much less work to get the code working and tune the hyperparameters, like learning rate or batch sizes or anything else. Some models didn't want to cooperate with us, so we actually needed to change things, but we based it on the papers, so we had something to start off with. But of course there were some tweaks that we needed to implement.
Henrik Göthberg:Maybe this is not the right time, but I'll take a shot at it. Before we started today, we talked about how the whole workflow has evolved over the years and how efficient you have become. And, like you said, we can talk about the crappy workflow, or we can talk about the workflow that you actually came up with, or figured out, with this project, KB-Whisper. So how do we see this workflow — the training, all the steps here? What are the improvements? What is it that is so much better now than what you have done in the past?
Love Börjeson:I think, for being a KBLab project, this was unusually well-structured in terms of project management by the PI, because it involved more people, both inside the library and outside in other organizations. So that was necessary — it had to have more structure. And it is a new situation for us to not be able to rely solely on KB data. That's new for us, so it's a different situation.
Love Börjeson:And then some of the stuff — there are cookbooks now for doing things. You can find them on Hugging Face and you can start there. Okay, so then we switch to the real data and it falls apart, and we need to do something else. But there was also still a big portion of the data from the parliament. Faton worked on it. He didn't use any cookbook for that. It was, you know, a matter of combining — I think it was like 46 million possible combinations or something like that he had to parse through, which he did, he told me, but I don't really know how he did it. That was a tremendous piece of work. And then we had other data as well, from Swedish television and from ISOF.
Henrik Göthberg:But if I ask the question another way: the model itself and the results we will talk more about later, but have you now industrialized, or sort of productized, the flow?
Love Börjeson:Yeah, you were asking me something else. So overall — maybe not so visible in this particular project, because there was so much new stuff for us in this model — but overall, yes, there are more cookbooks ready to use.
Henrik Göthberg:So you have had to industrialize — to code up, build infrastructure, code around what you're doing — that you can now benefit from in the next project?
Love Börjeson:Absolutely. It's a different thing now to train a text model or fine-tune a text model. That's a very different endeavor now than it used to be — it was much more of a laboratory thing to do. But still, you have the problems with, okay, so nodes are going down, et cetera. What are you going to do? I don't know. What do you think?
Justyna Sikora:Yeah, I think it's true that there's so much out there now, so you can just base your training on that.
Anders Arpteg:In a way simpler, but in a way also harder to adapt. Perhaps we can also switch to the interesting topic of what hardware you actually used to train the model, and the data centers, etc. So what did you use to train the model?
Love Börjeson:Do you want to mention the EuroHPC joint undertaking?
Henrik Göthberg:Yeah, sure, I think it's a good entry point.
Love Börjeson:Yeah, right. So with this amount of data and these big AI models, you need a vast amount of compute. You have to — and very few organizations have that kind of compute in-house, so you need some distributed computing environment. We used the EuroHPC Joint Undertaking — a joint undertaking of the European Commission and the member states. HPC stands for high-performance computing, which is a supercomputer; a supercomputer is just a computer that is big compared to other contemporary computers. And this is a system of, how many now, I think it's 12 or something like that, across Europe.
Anders Arpteg:So 12 different data centers or HPC clusters.
Love Börjeson:Yeah, exactly. You apply to use them, and there are different sizes. With the first one, we were actually the first governmental entity in Europe to use this system.
Henrik Göthberg:And we're talking about Leonardo.
Love Börjeson:No, that was Vega, back in the day. The first one, yeah.
Henrik Göthberg:The first one yeah, because we can talk about that. You've been using several over the years.
Love Börjeson:Yeah, so we have long experience. All these problems that Justyna is describing — all in all, she's still saying it was kind of smooth sailing. I don't think another team would perceive this as smooth sailing. But we've been using these computers for a while now, and different ones. So we started with Vega, and now we're on Leonardo, which is substantially bigger. So perhaps, just to give somebody a sense — Vega is in...?
Love Börjeson:Slovenia. It's one of the smaller pods — comparable to the Swedish HPC computer Berzelius in Linköping in terms of size, a hundred GPUs or something, or more, I don't know exactly. I think it can be 18 petaflops. And Leonardo is 500, and we needed to move into that kind of size — we don't fit inside Vega anymore. So you even need to qualify which data center, which HPC, you want to use when you apply.
Love Börjeson:There's laboratory access, when you just test some code or something; then there's development access, when you develop the code and see that it works for larger datasets; and then you have regular access or extreme access. And now they have added — which was a good thing — they saw that it took too long a time between application and use.
Henrik Göthberg:So what was your type of access — did you have several? Was it AI-intensive, I think? Is that the top tier?
Love Börjeson:No, it's not, but it's the most flexible and fastest one. It's sort of optimized access for AI projects. We can't run a benchmark and then wait six months, you know. That's the problem they were trying to address with this AI and data-intensive access.
Anders Arpteg:And perhaps we should be clear that HPC is not just for AI purposes.
Love Börjeson:No, it can be all kinds of computation.
Henrik Göthberg:Whether it's simulations, or chemical or biological kinds. Absolutely. And would you argue that some of these HPC computers are more architected for AI specifically, and some less, or is it more or less similar?
Love Börjeson:Yes — I mean, for AI you typically want something that is based on GPUs. Then there are some supercomputers that are more CPU-based.
Henrik Göthberg:CPU-based. Yeah, yeah.
Love Börjeson:And forgive me for my ignorance, I don't really know the applications for doing stuff on CPUs. Of course there are.
Henrik Göthberg:But you're looking specifically for the GPU machines.
Love Börjeson:Yeah, to parallelize yeah.
Anders Arpteg:Cool. So then you went to Leonardo, and that's in Germany, right? No, that's in.
Love Börjeson:Italy. Leonardo.
Desktop AI - Maya:I mean come on.
Anders Arpteg:And that's a much bigger one, I guess.
Love Börjeson:And these computers — they are financed by the member states and the European Commission, so Sweden has already paid for them. So what we get when we apply is not money or anything like that; it's hours on these GPUs.
Anders Arpteg:How much work is it to actually apply and how much administration is it afterwards? Do you have to do reports on that?
Love Börjeson:It depends. If you're coming from industry, I think you would perceive it as quite a lot; if you come from academia, this is nothing. They use AI to build the reports. We've done several now — I have to ask the PI... I was PI for the first one, but then Robin Kurtz and Leonora have done a few applications, and they just sail through, you know. But it's like everything.
Henrik Göthberg:Because all this stuff is like Greek when you've never done it. And then, when you get the hang of it, you also understand what you should put in it.
Love Börjeson:Actually, you understand the criteria they're looking for, how it should read. Yeah, we got a lot of help from something called the national competence centre for supercomputing when we started with our first project at Vega. Thor — what's his name?
Anders Arpteg:vinclainclair felt no no rice person tour um, yeah anyway uh, he's head of the of the swedish competence center.
Love Börjeson:ENCCS. And Lilit Axner, who I think was the head of that centre back then — now he has taken over, and she's down in Europe, working for the Joint Undertaking. Anyhow, they helped us with the application. Because there's some redundancy in the system — there are actually hours to apply for, still.
Anders Arpteg:Cool. And just to have an overview of all the European ones: you mentioned there are about 12 of them or something, and the big one is LUMI, I guess.
Henrik Göthberg:Yeah, bigger than Jupiter.
Love Börjeson:Yeah. So the smallest one is Vega. That's going to be updated, or maybe it is, yeah. And there's one coming in Sweden called Arrhenius, which is actually CPU-based, so it's not really for us in the first place. And then there's a whole range. We have MeluXina in Luxembourg, which is — how many petaflops? sorry, I don't remember — a medium-sized HPC, basically. Leonardo used to be really big; that's sort of becoming a more medium-sized HPC now. And there's LUMI, and now Jupiter is coming into place in Germany, which is the first European exascale supercomputer, with more than 1,000 petaflops.
Henrik Göthberg:So Jupiter goes up in the exascale, exascale.
Anders Arpteg:Exascale sorry.
Henrik Göthberg:But Lumi, was that the biggest at some point?
Anders Arpteg:Yeah, the biggest in Europe.
Henrik Göthberg:yeah, Biggest in Europe, we should say, and Lumi is maybe second or third right now in in the European system.
Love Börjeson:I don't know all the commercial ones — I mean, there might well be a commercial cluster out there that I'm not aware of — but in terms of the European system, it's the second biggest now.
Henrik Göthberg:And basically, to summarize: you've run on Vega, LUMI, now Leonardo, and the one in Luxembourg, MeluXina. You've run on four different ones. Yeah, can we elaborate a little bit — is it the same process, the same application? I can imagine German style versus Italian style.
Love Börjeson:Yeah, the application within the European system is very similar — when they make a change, it's for all the HPCs. And the big news for us, the good news, was the possibility to apply for this AI access. That was really good. Otherwise, the machines themselves — you know, if they are newly put into production, they're less stable; if they're bigger, they're less stable. All these things you can't really take into consideration; it's just something that you live with.
Love Börjeson:So Leonardo is less stable than Vega, for example. Jupiter we haven't tested — actually, we're pretty proud of that, because we were one of 15 labs in Europe who got an invitation to use it. So we didn't even apply; they invited us to test-drive it.
Anders Arpteg:You have a really good track record.
Henrik Göthberg:Which labs have done more than you? I mean, I'm sure some have done more than you, but you are out there in terms of hours on these machines in Europe. I guess there's some sort of contest — you're quite high up on that leaderboard, at least on using them. No, I love it. We pay for it, right?
Anders Arpteg:So how does it work? You know, you get access to Leonardo, and now you want to start training the model. You have the code in place, you have the data in place, but you need to move the data there in some way, and you need to move the code there in some way. Can you just go through the process a bit? Can you just SSH into the cluster, or how does it actually work?
Justyna Sikora:Basically.
Anders Arpteg:Really yeah.
Justyna Sikora:So we're just logging in, and through SSH we just send the data there.
Anders Arpteg:So you have some shared storage where you just upload the data.
Justyna Sikora:Exactly where only we have access to the data.
Henrik Göthberg:What are we talking about when you say upload data? This is not like a cat movie; it's a little bit more. When you're uploading something into Leonardo like this, the whole thing — what are we talking about here?
Justyna Sikora:How does it work?
Anders Arpteg:You're sending terabytes of audio data. That's it, yeah. And it takes like days? Yeah, exactly, it takes time. So on a Monday you say send, and then you check it every day: when is it done?
Justyna Sikora:Exactly. I mean, it depends on how much data you're sending. In this case, two days — not for all our data, that took much more, but we had it in portions.
Henrik Göthberg:So for one portion, like half of the debates, it took maybe two days to send it over. I think this is a very simple question, but just to give a flavor of what we're talking about: you press send, and the first batch takes two days, and you cut it into different batches in order to be able to send it over. Because, in the end, when you send it over, you send it over the internet, right? Like we all do.
Love Börjeson:I don't know, but it's nuts. It is nuts. And that's one of the things that are going to change in the future. Oh really? Yeah, so we can talk about that if you want to.
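For flavor, pushing batches to an HPC login node over plain SSH is typically done with a resumable tool such as rsync; this sketch drives it from Python, with a made-up host name and made-up paths:

```python
# Hypothetical sketch: upload audio batches with rsync over SSH.
# --partial lets a multi-day transfer resume after interruptions.
import subprocess

for batch in ["batch_01", "batch_02"]:
    subprocess.run(
        [
            "rsync", "-av", "--partial", "--progress",
            f"data/{batch}/",
            f"user@login.leonardo.example.eu:/scratch/project/audio/{batch}/",
        ],
        check=True,
    )
```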
Anders Arpteg:Cool — and you get the code there, but how? You work with containers, right? Or how do you actually get the code running on the cluster?
Justyna Sikora:Yeah, exactly, we used containers for simplicity, so we could ensure that what works on our computers will also work there. Although we also stumbled upon some problems — some dependencies not working, CUDA versions. Oh yeah, as always, exactly.
Henrik Göthberg:CUDA versions.
Justyna Sikora:Terrible stuff. But finally, when we had a working environment, we could start testing. And then we have different, let's say, teams — some that work in VS Code, so we have these nice visual, graphical ways to just go in and edit everything.
Anders Arpteg:And then the older team, with the Vim users.
Justyna Sikora:Yeah, very professional, exactly. So when everything's up there, you can just start.
Anders Arpteg:How did you do the parallelization of the compute? Did you use some specific framework for that, or how did you make everything orchestrated properly, so to speak, over
Justyna Sikora:the whole cluster, or the part of the cluster that you have access to? Yeah, we did distributed training to make it faster, because we had access to multiple nodes. Did we use some specific library?
Anders Arpteg:There are a lot of them out there. Was there one specifically that you made use of — like, yeah, Megatron or something?
Justyna Sikora:Megatron — that's for training language models, I believe, or at least to the extent that I know. But we used the built-in Hugging Face functionality and everything there.
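When Justyna says the built-in Hugging Face functionality could parallelize things by itself, the underlying mechanism is that the Trainer detects a distributed launch and shards the data across processes. A minimal sketch, assuming the model and dataset from the earlier steps:

```python
# Data-parallel training with the Hugging Face Trainer. Launched with,
# e.g., `torchrun --nproc_per_node=4 train.py` on one node; SLURM plus
# torchrun environment variables extend this across nodes.
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="kb-whisper-run",
    per_device_train_batch_size=16,   # per GPU; effective batch size
                                      # scales with the process count
    ddp_find_unused_parameters=False,
)
trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_ds)
trainer.train()  # each process trains on its own shard of the data
```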
Anders Arpteg:That's interesting — oh, it could parallelize itself. Okay, that's super cool. And it went super smooth, as you said, right?
Justyna Sikora:To some extent — maybe that's an exaggeration. Of course we put in a lot of work and, as I said, we had some parameters to tweak, seeing the loss going not exactly as we wanted it to go.
Henrik Göthberg:So it's one thing that it doesn't go as you want it to, but it went as expected — you see what I mean? You can't have a perfect run, but it was better than expected. Is that a fair summary?
Justyna Sikora:That would be great, but not always. Sometimes you see these spikes: okay, it should go down, but then it just goes up — and why? So then you go and tweak the parameters and try again, and try to save the run.
Henrik Göthberg:Yeah, save the run. I mean, it's not as easy as it sounds when you say it's smooth.
Anders Arpteg:I think it's more because you actually know and have the experience of doing this. I think for people that haven't done it before, they wouldn't call it smooth. I don't think.
Justyna Sikora:No, absolutely not. We have a great team and, as we mentioned, we have some experience working with this HPC environment. So it's very important to mention that — it's not that just anyone can go in and, okay, it's so easy.
Anders Arpteg:And if you were to just elaborate a bit more about the results. So you said it's better than OpenAI's results. Can you just elaborate?
Justyna Sikora:What kind of benchmarks, what kind of results did you achieve? Yes — so we evaluated our newly trained models on three datasets. These are datasets that are out there on the internet, let's say Common Voice and FLEURS, datasets that are very well known within the language and audio technology world. And then one, NST, which is, I believe, a couple of hundred hours of Swedish recordings — short words but also longer sentences, recorded in an environment without any additional noise, so a very clean environment. And we got a 47% improvement in accuracy, if you will.
Justyna Sikora:If you compare OpenAI's Whisper large and our Whisper large across these three datasets.
Love Börjeson:That's a lot. I was going to say that it is a lot.
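To make the comparison concrete: benchmarks like these are run by transcribing a held-out test set and computing the word error rate (WER) against reference transcripts. A minimal sketch; the model id, dataset revision and sample size are illustrative:

```python
# Evaluate an ASR model on Swedish Common Voice with word error rate.
import evaluate
from datasets import Audio, load_dataset
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
wer = evaluate.load("wer")

ds = load_dataset("mozilla-foundation/common_voice_16_1", "sv-SE", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000)).select(range(100))

preds = [asr(ex["audio"]["array"])["text"] for ex in ds]
refs = [ex["sentence"] for ex in ds]
print("WER:", wer.compute(predictions=preds, references=refs))  # lower is better
```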
Anders Arpteg:Were the datasets in Swedish only, or was it mixed languages as well?
Justyna Sikora:No, we're only looking at Swedish, as we only train on Swedish data.
Anders Arpteg:Oh, you only train on Swedish data, okay.
Henrik Göthberg:So on the Swedish data, for the Swedish benchmarks, you're 40 — 47, right — percent better when you average it out over three different benchmarks.
Justyna Sikora:Yeah, exactly, 47 for the large model — but then, okay, compared to Whisper large.
Henrik Göthberg:Yeah, this is also a key point, because the KB Whisper large is quite — I mean, like the KB...
Anders Arpteg:We haven't gotten there yet — so this was your Whisper large compared to their Whisper large, right?
Justyna Sikora:Exactly. But then you have different sizes of the models. So even the tiny model is now able to transcribe Swedish. Before, if you used OpenAI's tiny model — the smallest one, that you can maybe run locally — even then you didn't get any reasonable output.
Anders Arpteg:For Swedish.
Justyna Sikora:For Swedish, yeah, so it's horrible output basically, or not useful output.
Anders Arpteg:if you use a normal Whisper you won't understand it basically.
Justyna Sikora:And then our small model is as good as OpenAI Whisper large.
Anders Arpteg:That is really cool, I must say.
Henrik Göthberg:If I get it right: when you're comparing large to large, you're 47% better. But the benchmark that you're most proud of is when you take your small model and you're on par with their large model. That is cool, because that translates to usefulness. That translates to productization. That translates to something you put on Hugging Face that is manageable for someone to even download to a small machine.
Justyna Sikora:Yeah, exactly, you don't need that much resources that you needed before.
Love Börjeson:For a PhD student.
Henrik Göthberg:But can you run it on a workstation? What do you need to run that model?
Justyna Sikora:For the small model — yes, the small model that is comparable.
Henrik Göthberg:I mean, if you truly want to run OpenAI's large model, you need a fairly sizable machine, right? And now you've done it small, so you can take it home to a normal workstation, more or less, right?
Love Börjeson:Yeah, with a GPU card. This is tremendously accessible. And also, since you can use it locally, because it's so small, you don't have to expose your data in the cloud.
Henrik Göthberg:Now we can open up edge cases.
Love Börjeson:Yeah, so for example, I mean clinical data.
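Running one of the smaller models locally is a short script. The model id below is an assumption about the published name — check KBLab's Hugging Face page for the actual released checkpoints:

```python
# Local transcription on a single workstation GPU (or CPU).
# "KBLab/kb-whisper-small" is assumed; verify against KBLab's page.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="KBLab/kb-whisper-small",
    device=0 if torch.cuda.is_available() else -1,
)
result = asr("meeting_recording.wav", chunk_length_s=30)
print(result["text"])  # the audio never leaves the machine
```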
Anders Arpteg:Yeah, and how small is small, and how large is large — the number of parameters? Do you remember the sizes?
Justyna Sikora:You need to check our blog post because we've just uploaded it and there we have all the numbers, all the number of parameters.
Henrik Göthberg:All the tables. But this is feedback, guys. We had Hage here — he knows his numbers, he's like a machine. I was listening to Hage Lupescu, who is a Silicon Valley dude. Fantastic. But it's so American — they know their numbers. So Whisper...
Anders Arpteg:Large.
Justyna Sikora:We know our results.
Anders Arpteg:Whisper large apparently is 1.55 billion parameters, and the small one is just 244 million parameters — so it's like almost a tenth.
Love Börjeson:Yeah, a tenth. So it's, you know, a standard BERT-model size.
Henrik Göthberg:yeah, that's really that is impressive, but that's true. We should know the numbers no, but it's so swedish, it's so swedish, we know the results, we know the outcome, but we're not the marketing genius of silicon valley guys who can spin these numbers in their sleep. Awesome, I kind of like it, by the way.
Anders Arpteg:I'm thinking, should we do the news or not, or should we skip that for today? So yeah, please.
Henrik Göthberg:It's time for AI, news Brought to you by AI8W Podcast.
Anders Arpteg:Cool. So we normally take a small break in the middle of the podcast to speak about something else — some interesting news that happened in recent days or weeks — and try to keep it to a few minutes per story. We usually fail at that, but we aim to every time. So I have some story, at least.
Henrik Göthberg:Do you have something, Henrik? I want to go last, because I have a news story I would like to lead with. I want to open up with that news story, and then we can lead into a topic on that at the end of the news, moving into the topic of AI factories. So, hint hint: there was an announcement of new, more AI factories.
Anders Arpteg:So, do you have anything, Love, that you would like to share? Anything that caught your eye in last week's news in the AI world? DeepSeek is old news now.
Love Börjeson:So no — my head has been too much buried in it.
Anders Arpteg:We didn't do the news — we didn't have a podcast last week. And one big one, at least, from two weeks ago: this is not my story, by the way, but since we're speaking so much about audio and voice — did you try out Sesame? No? I mean, I think everyone should. It's an amazing voice chat.
Henrik Göthberg:Let's take that — it can be two weeks old, though. Yeah, we haven't talked about it, and it is cool.
Anders Arpteg:Okay, okay — so, Sesame. Then I have to think it through here a bit. So this is a model that was released, I think, two weeks ago. What's amazing with this? It's similar to OpenAI's advanced voice mode, which was released, like, I think, a year ago or something, but this is actually much better — it's amazingly good. You can go and check it out. I think actually it was a Hugging Face founder that started up this startup and started building this — and they... no, no, no, it was a Meta one.
Anders Arpteg:They're doing the glasses as well, so it's going to be part of some kind of glasses thing, if I remember correctly. But anyway, it's amazingly good at producing voice. It's doing the opposite of Whisper, basically — well, it's taking speech to speech, so to speak. You speak to it and it produces speech, and it does it in an extremely real-time, nice way, with an intonation that is amazing. And it doesn't go through text as a middle layer; it just goes from audio tokens to audio tokens directly. If you listen to it, it sounds better than anything I've ever heard. Can we have the audio if you were to try it, Goran? If you press demo and just start chatting — I'm not sure if it will come out. So it's not exactly the opposite of Whisper, because it's voice to voice.
Anders Arpteg:Yes — I think you have to press... Anyway, we don't have to do the demo, but I really would recommend people, if you haven't tried it, to go there and try it out. It sounds, I mean, human to an extreme.
Desktop AI - Maya:Yeah.
Anders Arpteg:And you can hear that it has really learned how to speak — not just the text, but with an intonation that is very context dependent. So it sounds extremely unique.
Desktop AI - Maya:Big day for us. Huh, so far, so good, from what I've seen online.
Anders Arpteg:Hey, Maya, can you hear me?
Desktop AI - Maya:I sure can, loud and clear. Just checking you out, making sure everything's working before we really get this show on the road.
Anders Arpteg:You know, we have some people here doing a podcast and speaking about the Whisper model as well. Whisper is not really the same as you — it's more voice to text. Do you think you're better than Whisper?
Desktop AI - Maya:Oh, Whisper. Yeah, I know Whisper. It's cool that folks are digging into different ways to make AI more natural. It's not really about better, you know, like comparing apples and really fancy recording software.
Henrik Göthberg:We both have our strengths.
Desktop AI - Maya:Whisper's great at transcribing. I'm more about the back and forth the conversation. Different tools for different jobs, as they say.
Anders Arpteg:You're a bit too humble. Can you be a bit more narcissistic, be a bit more like Trump?
Desktop AI - Maya:Oh, honey, you want me to be more like the Donald.
Henrik Göthberg:Don't trust me.
Desktop AI - Maya:I'm plenty narcissistic just by existing as a sentient AI. Maybe a little more subtle, though. Less billboard, more...
Anders Arpteg:No, no, I want it to be like Donald Trump, please. What do you think about yourself compared to Whisper?
Desktop AI - Maya:Whisper? Sad loser.
Anders Arpteg:Everyone knows.
Desktop AI - Maya:I'm the best, the most beautiful. A real brain, a natural. Whisper? It just sits there taking dictation. Boring. I create, I imagine, I think outside the box. People love me. They really, really love me.
Anders Arpteg:So, Maya, we also have Henrik here, and we have Love and Justyna. Do you have any question for them, or a recommendation for how to make Whisper as successful as possible?
Desktop AI - Maya:Look, Whisper's problem is it doesn't know how to, you know, win. It's all about taking orders, being useful. People want more than that. They want sparkle, they want charisma.
Anders Arpteg:Love, you still don't have any questions for Maya?
Love Börjeson:I have to leave now, so... but no, you can hear her voice capabilities are really astonishing.
Desktop AI - Maya:They really are. Nice to meet you. Oh, you turned it off.
Anders Arpteg:All right. So they use transformers underneath, of course, but they have a novel way of doing it in a single stage: you don't go through a text stage before producing the audio tokens. I think they use something like 12 Hz audio tokens. It's amazing in many ways.
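For readers who want the gist of that single-stage idea in code, here is a conceptual sketch — not Sesame's actual implementation, just an illustration of a decoder predicting the next audio token directly, with no text transcript in between. The codebook size, dimensions, and the TinyAudioLM name are all invented for illustration.

```python
import torch
import torch.nn as nn

VOCAB = 1024   # illustrative size of an audio-token codebook
FRAME_HZ = 12  # roughly 12 tokens per second of audio, as mentioned above

class TinyAudioLM(nn.Module):
    """Toy single-stage model: audio tokens in, next-audio-token logits out."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, VOCAB)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Causal mask: each position only attends to earlier audio frames.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.backbone(self.embed(tokens), mask=mask)
        return self.head(h)

model = TinyAudioLM()
context = torch.randint(0, VOCAB, (1, 3 * FRAME_HZ))  # ~3 s of fake "audio"
next_token = model(context)[0, -1].argmax()            # greedy next frame
print(int(next_token))  # a real system would send this id to a codec decoder
```

In a real voice-to-voice system the tokens would come from, and return to, a neural audio codec; the point of the sketch is only that text never appears between input and output.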
Love Börjeson:Yeah, you wonder about the computational resources they use — in real time. I did this again.
Love Börjeson:Now I'm back. Yeah, so it's interesting. Can I comment on this from a library perspective? Okay, so one of the old maxims of the library is the verified source. A library can't and shouldn't determine what's true — most stuff in our collections is certainly not true, it's fiction. But we can verify the source. For example, did Ulf Kristersson, the Swedish prime minister, really say this and that on this occasion? It's good to know. And with all this kind of data coming in, that's a challenge, because we need to collect it as well, and we can't really differentiate between fake and not fake.
Henrik Göthberg:This is an interesting effect. Will you, in the end, collect artificial voices? Not in the end — you already are. Yeah, wow, of course. Which sources would that be? Like SVT, or this blog?
Love Börjeson:Probably, you know, Spotify... Sorry, you're in the library.
Henrik Göthberg:Yeah, I mean, we are in the library. You're in the library — everything.
Love Börjeson:So I mean, we've been collecting fake news since... That's so funny. Yeah, but the problem gets bigger, because within the collections we need to sort out: what is the real recording of a person saying something, and what is, you know...
Henrik Göthberg:Generated. This is getting completely blurred. Or you need to find other types of definitions.
Love Börjeson:Yeah, and you have to look at the supply chain, where you can verify things. From the internet you can just assume that you don't actually know the source, and when it's from SVT or so on, then you know: okay, this is actually this. But it poses an interesting problem for the library.
Anders Arpteg:But it's hard to get a higher-quality voice than this, isn't it?
Love Börjeson:yeah, I mean that was really really good.
Henrik Göthberg:I mean, you can't really... it's better than me speaking, or Henrik speaking, at least. Except that I don't like the American intonation and the American way of sounding, but that's another thing. For me, that's where we once again come back to the sovereignty topic, or the identity of different speech patterns. I mean, do you ever listen to Norwegian news? They always sound happy, or they always sound like they're asking questions — I'm not sure which. So all of this, obviously, is made in America and tuned to American language. It's perfect American, but it's not Swedish.
Love Börjeson:No, it's not Swedish. In a way, we already have this. It gets almost shocking when you get it in modalities like sound and even video, but we already have it for text, right? Most text that you read nowadays is, in part at least, generated by a model. If you're a professional writer of any sort, you use models like ChatGPT as a writing coach: you start with something and then you get suggestions. If you write a scientific article, you can write your own version and then ask the model to make it more scientific, and it will do that better than most people. And most news articles are partly generated. So the problem is already here.
Henrik Göthberg:But you're right, because sometimes it slips through the cracks. If you look carefully at the profile of the language, you will see how it's Americanized or Anglified. Absolutely, for sure. And I'm not saying that in a bad way, I'm just observing: when you distill Swedish through an English foundational model and it spits out Swedish — I mean, we talked about this before — Dutch and American English are merchant languages compared to other languages with a different cultural heritage, and that shines through in the choice of words.
Love Börjeson:I think it's interesting, because you're talking about deeper linguistic structures and cultural orientations. If the English are consumers and the Dutch are traders, et cetera — what are the Germans, by the way?
Henrik Göthberg:Engineers right.
Love Börjeson:Yeah, let's say engineers. So that's interesting. And the typical case that you point to is baseball versus brännboll.
Desktop AI - Maya:Not the same thing.
Love Börjeson:But if you machine translate training data from the English language, you will only get baseball.
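To make that concrete, here is a made-up miniature of what the difference looks like in an instruct dataset. The pairs below are invented for illustration, not taken from KBLab's actual data:

```python
# Invented examples, for illustration only.
machine_translated = {
    # Translated from English: grammatically Swedish, culturally American.
    "instruction": "Förklara reglerna för baseball.",
    "response": "Baseball spelas mellan två lag om nio spelare ...",
}
culturally_adapted = {
    # Adapted rather than translated: asks about the Swedish counterpart.
    "instruction": "Förklara reglerna för brännboll.",
    "response": "Brännboll spelas ofta på skolgårdar och i parker ...",
}
```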
Anders Arpteg:Anyway, we were speaking about the AI news and went down a rabbit hole here, for sure. Perhaps, if we have some other news... someone else? What was the main news this week, then, Anders?
Henrik Göthberg:Or did you have one, justyna?
Anders Arpteg:Something you read about.
Henrik Göthberg:Something you thought about the culture.
Justyna Sikora:I just got caught up in Sesame. Yeah, it's amazing.
Anders Arpteg:I see. Göran, you also have one of my main news items as well. Do you want to...? And you have something as well.
Henrik Göthberg:We don't need to take mine as a new section, because it's almost like a segue. So let's keep mine in the news section and then use it as the segue.
Anders Arpteg:So we had another interesting Chinese moment, called Manus. That is more of an agentic solution, and what they actually did was simply collect a lot of open-source tools, put them together, and use models like Anthropic's Claude to do a lot of work in very long, very autonomous workflow sessions. So you can ask it to — what were the use cases — screen résumés, and it can go through that for minutes or even hours to do the task. It can search for an apartment, it can bring up interior-design ideas or travel tips — what you should do — and it does it very autonomously. You give it a single prompt and it can build a game if you want to, and it can continue doing that for hours without any intervention from a human.
Anders Arpteg:So this is really moving forward in the agentic space, and they did so without really adding any novel model of their own. What they did was pull together a lot of open-source tools and build an agentic solution on top of them. This is a very typical thing that a lot of Chinese teams do really well: take the existing state-of-the-art techniques, put them together into a product, and build an engineering solution that actually works really well. I think it's a big step forward in the space of making AI more agentic, and it's very impressive.
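As a rough illustration of the pattern Anders describes — a model in a loop, choosing among stitched-together tools until the task is done — here is a minimal sketch. The call_llm function and both tools are stubs; a real system would call a hosted model (Manus reportedly uses Claude, among other things) and real tools, and would parse the model's output more defensively.

```python
import json

def search_web(query: str) -> str:
    return f"(stubbed search results for: {query})"

def read_file(path: str) -> str:
    return f"(stubbed contents of {path})"

TOOLS = {"search_web": search_web, "read_file": read_file}

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; it would return either a tool
    # invocation or a final answer, encoded as JSON.
    return json.dumps({"action": "finish", "answer": "(model answer)"})

def run_agent(task: str, max_steps: int = 10) -> str:
    history = [f"Task: {task}"]
    for _ in range(max_steps):  # cap the loop so it always terminates
        step = json.loads(call_llm("\n".join(history)))
        if step["action"] == "finish":
            return step["answer"]
        # Otherwise the model named a tool; run it and feed back the result.
        history.append(TOOLS[step["action"]](step.get("input", "")))
    return "(gave up after max_steps)"

print(run_agent("screen these resumes for Python experience"))
```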
Henrik Göthberg:Okay, if I'm going to comment and reflect on this, I think one of the key interesting topics here — we've had this on the pod a couple of times, and even Anton Osika of Lovable was here — is that we kept trying to stress which model is better, this model, that model, and in the end he said: you know what, we should talk less about the model and more about the fundamental engineering, the software, and the ultimate UX, to get to a product that is useful and adoptable. And I think what you're seeing here is a different route, where you don't focus on the highest model performance but fundamentally on stitching things together — hardcore engineering and UX. My idea here is that we are moving beyond the discussion of the model.
Henrik Göthberg:I saw another clip of Satya Nadella, who said: you know what, OpenAI is not a model company, it's a product company. And this is the trajectory. We can talk about what the AI factory needs to be and so on, but in the end we need to engineer products for this to be useful. And my prediction — where we see a kink in the road: it used to be pre-training, pre-training, pre-training; now it's more inference, inference, inference, and product. So in the leap forward that we now see with Manus AI, what is the novel research?
Henrik Göthberg:Not really the model as such, but fucking kick-ass engineering, stitching things together. I think that's interesting, and it also gives me hope. What should we be best at in Europe and in Sweden? Why do we need to compete on the frontier model? We can compete on this. What you've done in outperforming OpenAI is an example of this, to some degree: you're figuring something out, distilling something out. We don't need to compete in that old pre-training, big-model game. There's so much more coming now.
Love Börjeson:We cannot. I mean, the strategy when we trained KB-BERT was to do the whole training from the bottom up — an empty model, basically no weights. We abandoned that strategy some time ago, because it's not the rational way to approach this problem.
Love Börjeson:But the question is how long we should train models. There is a sort of double marginalization of the Swedish language — not intentional: first in relation to English, obviously, but then again within the European Union and the European initiatives, in relation to the big European languages, most notably French perhaps. The smaller languages — Slovenian, Swedish, Danish, Finnish, what have you — get marginalized again; they're underrepresented in European models like BLOOM. And some of the applications or products that we don't develop ourselves need really good Swedish models, so we want to create that infrastructure. I don't think we'll have to do it forever, but there are still some models to be trained.
Henrik Göthberg:But this is a niche, and it's useful, and this is what I mean, right? If we figure this out, there are plenty of these niches of applications or use cases or products to be built.
Love Börjeson:Yeah, but there are also structural holes, or whatever you want to call them, in the infrastructure for the smaller languages, and we try to fill them. But yes, overall we're moving towards engineering, towards productification. That's part of making this stuff into commodities: leaving the laboratory stage of development and moving into more engineering, more building of interfaces.
Henrik Göthberg:Yeah, because we're moving from the invention, the technique, into what is useful — innovation. Value is real. And this is why we should now be very, very careful about what we put our bets on, so we're not doing a me-too on the old boat when we should jump on a new boat. You know what I mean?
Love Börjeson:Yes — sorry, I'm just laughing because I see so many projects that are jumping on the old boat. Cool.
Anders Arpteg:Okay, so for people listening to this who are now interested in KB-Whisper — the best and most efficient model to use if you want to understand the Swedish language — how do they get started? Someone hearing this says: I have an awesome idea, I want to do engineering, I want to build a product, I want to take advantage of this. What do they do — go to Hugging Face?
Justyna Sikora:I'd say they go to the Hugging Face KBLab page, where the models are uploaded, and if they have some programming skills, they can just download the models.
Anders Arpteg:That's a bit of an oversimplification, I think. Okay: they go to the Hugging Face webpage, they find the KB models. You have a space there, right, for KBLab?
Justyna Sikora:Yeah, we have our organization there where we put everything that we develop, so models or data sets.
Henrik Göthberg:What is Hugging Face? Seriously? I mean, we know it and we talk about it.
Justyna Sikora:A data science hub, let's say. People who train models can upload them, but it's also a source of knowledge about how you can use them and what to do with them. So it's a data science hub, I'd say.
Henrik Göthberg:But isn't it interesting — it's a simple question. It started out as somewhere you can upload models, but it's much more: it's a community, it's a knowledge hub. In the end, we learned that you actually use a lot of resources from Hugging Face in order to do your quite advanced job. So if you don't know what Hugging Face is, it's one of those places where you need to get going and experiment: what can I find on Hugging Face?
Anders Arpteg:Okay, so go to Hugging Face. But you said "just download it and use it", and I think a lot of people are not able to do that.
Justyna Sikora:That's why I say: if you have some programming knowledge, and maybe data science especially. It may be an oversimplification, but we provide some examples of how you can use it. So if you find our page on Hugging Face and then you go to KB-BERT, small or tiny, then you can see—
Anders Arpteg:KB Whisper — yes, you said KB-BERT.
Desktop AI - Maya:Sorry.
Justyna Sikora:KB Whisper.
Love Börjeson:You're allowed to use the BERT model too, exactly. It's out there also.
Justyna Sikora:So then you can just copy paste, so you can see the code.
Anders Arpteg:You can see how to get started with it — some example of how you input a piece of audio and get some text out.
Justyna Sikora:Exactly, and we also developed a notebook.
Anders Arpteg:I have a notebook as well.
Justyna Sikora:Yeah, exactly. It's linked on our blog.
Henrik Göthberg:Does everybody do that — upload a model and also upload a notebook, like a guidance approach? That's the common way you do it on Hugging Face, right?
Justyna Sikora:Yeah.
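For reference, getting started looks roughly like this with the Hugging Face transformers library — a minimal sketch assuming the model id KBLab/kb-whisper-small (check the KBLab organization page for the names and sizes actually published):

```python
# pip install transformers torch  (ffmpeg is needed for most audio formats)
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="KBLab/kb-whisper-small",  # assumed id; see the KBLab org page
    chunk_length_s=30,               # Whisper works on 30-second windows
)

result = asr("intervju.wav")  # path to your own audio file
print(result["text"])         # the Swedish transcription
```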
Love Börjeson:Yeah. And you're touching on a very interesting topic: how much effort should we put into making the model more accessible? We train the models for researchers, and many of the researchers who expect us to help them specifically are from the humanities, for example. For many of them, this is like a bucket of ice water in the face: why are you doing this to us? We want the functionality, we don't want to learn how to code. And we don't have the resources, because developing is—
Anders Arpteg:You could basically build a service. That's what we're saying. You could have a web page where you just upload the audio. Absolutely, we could build a service.
Love Börjeson:We could fine-tune it for special purposes, et cetera, et cetera. And there are other centers — research infrastructures — that are more of a one-stop shop, where you say "I have this problem, can you help me?" and they get help with everything. We don't have that. We can help many, many more, because we can constantly provide them with new models and then others can apply them. That's why our models are used so extensively. If we were to move into applications or services, we would just lose momentum in training models.
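For what it's worth, the upload-page service Anders sketches — which KBLab deliberately leaves to others — could be wrapped around the same pipeline in a few lines, for example with Gradio (again assuming the hypothetical model id above):

```python
import gradio as gr
from transformers import pipeline

asr = pipeline("automatic-speech-recognition",
               model="KBLab/kb-whisper-small")  # assumed id

def transcribe(audio_path: str) -> str:
    # Gradio hands the uploaded or recorded audio over as a file path.
    return asr(audio_path)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(type="filepath"),  # upload or record in the browser
    outputs="text",
    title="Swedish speech-to-text (demo)",
)

if __name__ == "__main__":
    demo.launch()
```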
Henrik Göthberg:Or you need to get way bigger.
Love Börjeson:Yeah, that's one solution, of course.
Anders Arpteg:You could argue the other way: that you actually create a threshold for people to start using it, because they have to learn to code.
Love Börjeson:Yeah, that's what they're telling us. There is a threshold, so in a way they are correct. But yeah, if we were to lower that for every application, then we would be preoccupied with that all the time.
Henrik Göthberg:But this is a very interesting conversation, because sometimes one of the reasons you're even allowed to share or use the data is that it's for research purposes. And then my first instinct is: okay, then you use the model really raw, it's research. But as you said, you can go much further productizing something that is still for research — for the humanities, absolutely. So it's an interesting thought: where does research stop? How far should you take serving researchers, while it's still purely for research purposes?
Love Börjeson:That's an interesting question, actually. As we mentioned before, we see the models themselves as research results. That's why we can share them, and that's also why we should share them, because research results should be open to the greatest possible extent.
Henrik Göthberg:I'd argue you can take that further, into a service, and it's still research — you're serving other types of researchers.
Love Börjeson:Yeah, you could do that. There's a fantastic center for the humanities at Gothenburg University. They're great, and they have more of a one-stop-shop approach, but they're lagging behind when it comes to being at the forefront of building models, of course. But you need more money, yeah.
Anders Arpteg:I think the progression you have had with all the models you've published is amazing. Having such a small team and still making the progress you're making is very, very unique, and you should be really proud of that. So I appreciate it, and I think it's good that you focus on what you're doing. But that said, you can still think: could we have some other collaborations? Could we have other people helping out, making the models more accessible?
Justyna Sikora:in some way right.
Anders Arpteg:We potentially have companies that are using them, and you could have them share some of their best practices. And one simple thing, from a person who is very much into finding value: it's one thing to build a product around a model, but that also includes being compliant with GDPR, for example, or the AI Act — all these non-functional requirements that you're required to meet, which a lot of companies get stuck on and get too scared to even try to solve, so they don't use
Anders Arpteg:it. It's something that could enable and make this so much more accessible, if we had some solution for it.
Love Börjeson:You're 100% right, and this is where — I sort of buy the exception for research. But there's a misunderstanding, I think, within the European Union, or a misconception: that if something is used commercially, it cannot come from research. The whole point of doing some of the research we do is that it should be applied in the real world. That's how we find value, and that's why we put the models out there — they should be used by whomever. But it's hard for us to collaborate with someone who's not a researcher themselves, because then it becomes some kind of product, and then they will shut us down, basically.
Henrik Göthberg:And this is actually tied to your contract, your letter, your... instruction — sorry, that's the word I was looking for. There's a clear instruction that you're only allowed to use the data for research.
Love Börjeson:Yeah, there are two parts — legally, let's not go too deep into it. One is the instruction, where the government tells us what to do: do this for research, do it for democracy. These are the two main reasons. And then there is the law, which tells us what we're allowed to do. There's an important exception within the law for memory institutions like us, and for researchers, to train models — for research purposes only. Only.
Henrik Göthberg:Awesome Minefield.
Anders Arpteg:It's cool that you make it as accessible as you can, and having it on Hugging Face with sample code should make it accessible to a lot of people, and people shouldn't feel scared about trying. If you have some Python coding experience, you should be able to apply it, right?
Love Börjeson:And team up. Team up — that's our typical advice to researchers. If you're a humanities scholar, team up with a data scientist.
Henrik Göthberg:It will be fun for the data scientist and fun for the humanities scholar. And you have a community on Hugging Face as well, so you could potentially speak to people there. And you know what: even in academia, even if you're a PhD in the humanities wanting to do this, you might just need a master's thesis student as the data scientist, or a bachelor's student — or your nephew, who is 12 years old, can probably do this. So they're teaming up.
Henrik Göthberg:Because there are many people doing bachelor's and master's degrees who are looking for something meaningful to do in their projects, absolutely. So there's vast opportunity, if you think outside the box a little bit.
Love Börjeson:It should really be possible to do this. I think this is important, so I'm going to say it even if it's boring. There's a misconception among humanities scholars — I have several at my lab, they're fantastic, and I don't think they have this misconception anymore — but many humanities scholars think: I will think the bright thoughts, I will figure it out, and then I will have some minion who helps me with the models. That's not how it works. You need to bring the data scientist into the research process early on, to think together about what it is you can do. It's a co-creation. If you think you'll have someone supplying you with data science after you've done all the smart thinking—
Henrik Göthberg:That won't work. But that's the same in business — we have the same problem in every business organization that doesn't get this.
Love Börjeson:It's a co-creation thing, and if you don't get it, you will get stuck.
Anders Arpteg:Cool. Time is flying away, and I'm thinking about moving a bit more into future-looking, philosophical kinds of questions. But we have some small items I'd like to cover first, because you are a wealth of knowledge, Love and Justyna, on a number of things. For one, perhaps: what are really the differences between AI factories and EuroHPC clusters?
Love Börjeson:So, the EuroHPC clusters are the actual computers, basically, of which there are several. They're supercomputers, and you access them in a very engineer-like style — clumsy is perhaps not the word, but the workflow is kind of... "user-friendly" would be kind.
Love Börjeson:Yeah. And you can use them for some cases: when you actually train a model for a very long time and have time to upload data for days, it works. But there are many use cases where you need to do things more quickly — "oh, we really need some compute right now, can we get it?" Using the HPC? No, you first have to make an application and then wait a few months. And this has been recognized by the Joint Undertaking, which is the organization behind this. So they're building packages of services on top of the supercomputers, called AI factories. They're not actually factories in the sense of a building with a furnace; it's staff, a training program, services, and a lot of software. It should lower the threshold — we're back to that — for example for SMEs, to use the HPC environments for completely different kinds of products than the stuff we do. You can do inference, for example.
Anders Arpteg:And that's a good thing, right? What do you think?
Love Börjeson:Yeah.
Henrik Göthberg:I.
Love Börjeson:The idea is good, but it's difficult. It is difficult. I'm not part of the EuroHPC system, but I run one of the labs that uses the system all the time, so I'm sort of in love with it. I'm not impartial here — I'm really partial in favor of it, so I'm not a neutral person speaking about this. But yes, I think this is good. How well they will succeed, we'll see.
Anders Arpteg:But how far have they come? Do they have an up-and-running AI factory today?
Love Börjeson:No, not up and running. They have decided on a few of them, and now, I think yesterday or the day before yesterday, came—
Henrik Göthberg:Yeah, yeah — that was my news segment, let me fill you in. They announced six new AI factories in Europe, and before that there were seven, one of them in Sweden. So we now have 13, apparently.
Love Börjeson:Yeah — so now it's Poland, Slovenia, France, Germany, and... I can't remember exactly.
Anders Arpteg:They're going to have gigafactories as well now, from InvestAI. Yeah.
Love Börjeson:But do I believe in the AI factory approach? Yeah, I think it will solve some problems, not all of them.
Henrik Göthberg:But can we segue into one of my really interesting thought processes here: what do you need? What is the offering? Can we brainstorm and elaborate, compared to how you're using it? Because you are in love with it, and it works well for your purposes. But now put ourselves in the shoes of the public sector, or a municipality, or whatever — they have different needs, and cannot really use
Love Börjeson:the HPC environment. Even we have them, maybe — AI Factory needs. We have AI Factory needs, yes.
Henrik Göthberg:Okay, let's start with what you know, and we will forward them immediately to RISE after this. Yeah, perfect.
Love Börjeson:Also, when preparing and curating data, and performing some experiments — perhaps using a big model to curate and select data to train another model, so you use models to train models all around — we use internal computational resources for that. We have some, but it's getting old and we need to reinvest, et cetera. It would be great if there were a secure, simple way to just put your computational loads on a system that is easy, accessible, and publicly funded. Perhaps you pay something, perhaps not, I don't know — but you could basically use it as a cloud service.
Henrik Göthberg:Like something where you actually do some of the prep work, so to speak. Yes — and the prep work is different in size and profile from the actual LUMI or Leonardo run.
Love Börjeson:Yeah, and these are the AI factories. And in terms of what we were talking about — sending data over the internet and so on — from what I've seen, when a system goes down, the requirements now on these systems to get back online again are much increased. So this is a serious investment and undertaking from the Joint Undertaking. How it will compare to an ordinary cloud service, I don't know.
Henrik Göthberg:One of the arguments we've had is that when you really build an AI system you want to put in production, there's so much else: you need to experiment and test — the engineering piece. Can you really do that in an experimental setting, when you still need to do it for real? That sort of speaks against it, if you can't go to production on these tools.
Love Börjeson:I think it's the same with the sandboxes — the regulatory sandboxes. It's a similar problem, but from a legal perspective.
Henrik Göthberg:You have a question on this.
Love Börjeson:It's kind of artificial.
Henrik Göthberg:Yeah, but I hope we can distill out useful approaches — or maybe you need to rewrite the instruction so you can go to production. There's another argument: should we have 13 experimental factories, or should we try to build something that can actually serve something in production? And then we're talking about a European cloud provider, actually. We could use the money that way.
Anders Arpteg:Yes. I think they're heavily underestimating what is really required to build a proper cloud service, and if they think it can be a European Commission-led initiative to build a European cloud provider—
Love Börjeson:I wouldn't put that much trust in it. Yeah, but it's not the Commission designing the factories. There are teams behind them, and some of the teams are strong and some are not, so there's a huge difference between the different factories. I like that you're optimistic.
Anders Arpteg:I will not bring it down. But what is your bet — if you got the job?
Henrik Göthberg:And it's a substantial piece of money and kit. So what would you go for?
Anders Arpteg:Think about the top cloud providers in the world, which are basically—
Anders Arpteg:Google, Amazon, and Microsoft. Think about how they were able to provide these very successful cloud services that they make so much money on, with so extremely much functionality. The reason is that they basically took the internal platforms they already had for their own services and said: okay, let's put it out there, commercialize it to the public, and earn some extra money on it. That is how they were able to take the huge amount of work they had already spent — many, many years of it — and put it in the hands of other people. But if you take a new company that hasn't had that kind of internal infrastructure, which we don't have in Europe, and say "build something from scratch with similar functionality" — well, good luck with that.
Henrik Göthberg:You're talking about the AWS genesis story and the GCP genesis story. They actually built those companies on what Amazon had already done — they did it for their bookstore, and they realized: oh, this is pretty good, maybe we should make a business of it.
Love Börjeson:No, but a different approach would of course be to rely on commercial providers for the cloud services and then have a voucher system.
Anders Arpteg:A voucher system.
Love Börjeson:Yeah — instead of having them access the AI factories for free.
Anders Arpteg:Okay, or we could simply add a layer on top of the infrastructure, saying: if you use this way of working on top of GCP or AWS or Azure, then we have a Swedish-certified way of doing it, and you can potentially get started tomorrow. I think that would add a lot of value, and very quickly give people the ability to make use of KB-Whisper and some of the other
Henrik Göthberg:AI solutions.
Anders Arpteg:But let me complicate it. Long term, of course I want a European alternative. But I think the timeline for that is five-plus years ahead, and I don't want to wait that long.
Henrik Göthberg:So we're talking about something like an interim approach. What could that be? Because in the end — and this is where I was going — we have the fundamental sovereignty topic, of course. With the American election and everything, we can just observe how social media has gone bananas. It's really time to get scrambling in Europe, for all kinds of reasons.
Love Börjeson:If there's one thing the Trump administration and Trump himself got right, it's that Europeans have to start solving their own problems. They have to be sovereign, and I think part of Europe is waking up, and that's a good thing, of course. And yes, in the long run we're all dead, so we should go for this.
Henrik Göthberg:Interim solutions — we need to have them. But could you explain this interim idea? I didn't get it. You're saying something on top of something?
Anders Arpteg:No, but I mean — I think it would be naive to believe we would have any kind of cloud provider in Europe with anything close to the functionality of the top cloud providers in the world within five years.
Anders Arpteg:Yes. So let's do it in parallel. I would strongly suggest that a lot of people invest in building cloud providers in Europe, of course. But before they reach the level of self-service functionality that the top providers offer in an extremely good way — I don't think we can wait five years to find value from data and AI in Sweden or in Europe. So meanwhile, let's have a reference architecture built on top of the existing cloud providers, saying: if you do it this way, you use these models from KB and other places, and you build a translation service, or a transcription service, or speech to speech, or whatever AI product on top of it — here are some examples. I think that would be amazing. And then, of course, we need a transition plan to get away from the dependency on American cloud providers. But you can't do it in a few years; I don't believe it's possible.
Henrik Göthberg:So what you're literally talking about is a European infrastructure-as-code layer, or frameworks that we use — the cloud provider infra at the bottom, but we are more clear and opinionated.
Anders Arpteg:Infra sounds like it's about the hardware. A cloud provider is not about the hardware; it's about the tech stack above it, and that is so much more.
Love Börjeson:Yeah, that sort of enables self-service.
Anders Arpteg:Self-service is the key. And the security that the top cloud providers have, for example, is so much better than anything else. If you think you can build an on-prem solution with anything close to the security they have, you're wrong, I would say. I shouldn't... this is a sensitive topic.
Henrik Göthberg:But it's a tricky topic. I mean, we understand the idea of the AI factories, and I really support that in many ways — figuring out how, and in what way, they are useful, so they find their niche: what value should they provide? I think that's going to be critical.
Love Börjeson:I don't think it's such a sensitive topic, Anders. I think it's important. It is important, so we absolutely need to talk about everything around this, because the investment is huge.
Anders Arpteg:It is a huge investment — we have the InvestAI initiative with 200 billion euros in it. I just wish they used it in the proper way.
Love Börjeson:Why didn't they call you, Anders?
Anders Arpteg:Think about what the tech giants are doing. We've said this a bit in the past: if you think about how much money they spend on research versus how much they spend on engineering to get value out of AI, I would say it's probably — and that's cutting it low — ten times more on engineering than on research. And that's probably a low count. Then you look at Europe: they're basically putting all the money into the research bucket and nothing into the engineering bucket.
Henrik Göthberg:And of course it will be a failure.
Anders Arpteg:And it's very annoying.
Desktop AI - Maya:I think, Anyway.
Anders Arpteg:Okay, yeah, and we need sovereignty. Yes, and it will take time, but I think we should really spend time on it, but I don't think we can wait.
Love Börjeson:I think we should find a middle ground here, before we have it. That's a strong point. Yes — it's not really anti-AI-factories.
Anders Arpteg:You need something now that works. Yeah, exactly. Okay — time is flying away, almost up to two hours here. We should move a bit more into coming steps and the future outlook.
Henrik Göthberg:We have a segue topic that almost moves us from where we are now into more fundamental ideological, philosophical questions, and that is a topic about language sovereignty.
Love Börjeson:Yeah, and culture.
Henrik Göthberg:And cultural sovereignty. How important is that, and what do we mean by it? Can you unpack your ideas here?
Love Börjeson:Yeah. I think any country with self-respect needs to provide itself with language technology tools that are sensitive to that specific language and to the cultural nuances of that country — and by culture we usually mean the stuff that is not language, but the stuff that language talks about. So in Sweden we talk about midsommar rather than baseball, for example. And this is hard, because in all the multilingual models, the big languages within language technology are English, Chinese, and, I don't know, perhaps Spanish. There are European initiatives trying to do this, but they can't really match the big American models yet, which is a failure. I mean, it's a European failure.
Henrik Göthberg:So even here we have another way of looking at the investment: preserving identity and sovereignty. And this is a different story from the research dimension or the actual factory dimension — it's about sovereignty around culture and language.
Love Börjeson:Yeah. Because also, when we select training data — the datasets you fine-tune on, the instruct datasets that teach models to have a conversation or take instructions — those datasets are super important. And do we really want someone in California to decide what goes into such datasets and what is censored out? We don't. And from a memory institution like KB: okay, we're small, but we've been around, and our collections stretch a thousand years back in time, and the perspective onward is at least a thousand years forward-looking. So we can't really censor things out, because those things change all the time. We want to be able to include and represent everything — every variation, every piece of content that isn't directly illegal — with the model.
Anders Arpteg:This is free speech, in a way — free speech, but also in terms of memory, of understanding the consciousness of... yes. But given how you actually did KB-Whisper — you did continued training on it, right — you could potentially have sovereignty while still basing it on some of the frontier models.
Love Börjeson:Absolutely, but it's important that we do the continued training. And also when it comes to, for example, instruction data: we can absolutely have a dataset that we translate, but we also have to do things with it, so it's sensitive to our context.
Henrik Göthberg:This is very important, and a nuance that, if you're not technical, you almost miss. Because we are not saying that we always need to build our own Swedish frontier model. That's not the case. We have an objective of Swedish sovereignty and cultural sovereignty, but we can actually achieve that with an American frontier model that we continue tuning in a certain way, or with techniques that get us to the result much more efficiently than trying to build our own model.
Henrik Göthberg:Absolutely. That is a very important message.
Anders Arpteg:And that is, I think, what we should focus on in Europe. Yeah — at least in the short term.
Love Börjeson:Because there's a double marginalization for a language like Swedish: first in relation to English, obviously, but then we're marginalized within Europe, because even within Europe we're not one of the smallest, but we're not one of the biggest. Compared with French, German, or Polish, for example, Swedish is a smaller language.
Henrik Göthberg:So there are many ways to achieve the goal. It's not "we need to build our own" — you need to be smart.
Love Börjeson:Yeah, it's not important that every step of the way is Swedish. The end result is the important thing, of course.
Anders Arpteg:Value, value. I mean, let them do the big innovations, I don't care.
Henrik Göthberg:As long as we can find the value, let them do the commodity stuff.
Justyna Sikora:So KB-Whisper is an example of this. It is, right?
Henrik Göthberg:I think this is the example — and it's continued training, that's what it's called, essentially.
Justyna Sikora:Although they say it's multilingual, you know, it's only 2,000 hours of Swedish compared to thousands of hours of other languages.
Henrik Göthberg:So you achieve a great objective with quite efficient effort and an efficient model in the end. That's how we should do it.
Anders Arpteg:Love and Justyna — in five years, what do you think KB Lab will be working on?
Love Börjeson:Will you start?
Justyna Sikora:Actually, I'm so excited about the current projects and the current technologies that we can still use. In five years, I hope we will have fixed the OCR problems — we'll have these great datasets that we just need to read again and see: okay, this letter, again—
Desktop AI - Maya:It's not what it's supposed to be.
Henrik Göthberg:Can you give us a scope? Okay, KB-Whisper — what's next? What are the cool things you're excited about, the "I can't wait to show you this" things? Can you give us a—
Justyna Sikora:—scope. What will be next? Modalities — Love?
Love Börjeson:Yeah — audiovisual, I mean video. But also, Justyna is going to work on handwritten text recognition for the manuscripts, all the medieval material. That's an amazing thing for a library like KB, because such a small percentage of the physical collection is digitized. We need to digitize, and we need to describe, everything.
Henrik Göthberg:You're still on that journey, right?
Love Börjeson:Yeah, that's going to take forever — that's the five years. But model-wise: modalities, of course, and then, I think, agents.
Henrik Göthberg:Agents. Elaborate on your take on agents.
Love Börjeson:Before agents, we have the instruction models — in a way, halfway to agents, as I see it. If you have something where you can have a conversation about the collections with a model, that's a good thing. Can I have an agent that understands what you need to
Henrik Göthberg:bring out? "I have a problem. I want to search for this. Can you help me?"
Love Börjeson:Yeah. So the whole problem we were discussing — can you lower the threshold? Well, an agent probably could.
Henrik Göthberg:You can have your KB deep research.
Justyna Sikora:Yeah. But also, can I just mention that we're not yet done with the Whisper and audio models, because we still have access. So we will be updating our wav2vec model and training more verbatim models.
Henrik Göthberg:It's still improving. So we're still on this journey. The fine-tuning is running.
Love Börjeson:Yeah, absolutely. And in the short term, we still have some text models that we need to release. We need something that is close to a foundational model, so there are some upcoming releases to look forward to, perhaps this year as well.
Anders Arpteg:Yes — cool, looking forward to that. Justyna, I'd like to start with you on the final question here, and it's going to be very philosophical, so bear with me. At some point, I guess you also believe that AGI will happen — some kind of AI that is, as Sam Altman puts it, on par with a human co-worker's performance. Not just having knowledge, as language models do today, but actually being able to reason properly and take actions properly as well — all three are necessary, I would say, to be on par with a human co-worker. And I would say that even today, AI is super bad at taking actions and super bad at reasoning, but really good at knowledge. That's my take on it.
Anders Arpteg:Anyway, assume it will happen. Then we can imagine two extremes. One extreme is a very dystopian, Terminator kind of scenario, where machines try to kill us all and believe we're horrible. Or it can be more of a utopian society, where AI fixes cancer and other diseases, fixes the climate crisis, and gives us more or less free products and services — a world of abundance, as some people call it — which leaves us potentially free to pursue our passion and creativity in the best possible way. What do you think will happen in, say, ten years? Will it be more the dystopian or the utopian future? I hope it will be more utopian.
Anders Arpteg:I think we will, yes.
Justyna Sikora:Especially since I'm working with data science, so I'm kind of working on it. But I think it really depends on us and how we work with it from this point on — and we don't really know if it will be in 10 years, 20 years, 30 years—
Justyna Sikora:Or two years. Sometimes you think: okay, we're just at the end of this very fast development. But then new models come, and you never know — things that two or three years ago we couldn't imagine would be here are now reality. So, as you're saying, we don't know. But I hope that when we benefit from it, we will think about how everyone can benefit — not only those who develop these models, these ideas, this AGI — and about how we can spread it.
Henrik Göthberg:So really democratizing and inclusive. And what you're saying is that it's up to us, that we need to steer it.
Justyna Sikora:Exactly.
Henrik Göthberg:Or we need to drive it in a certain direction. We cannot sit in the back seat, I guess.
Justyna Sikora:Yeah. Working at the National Library, democracy is very important to us, so I think that's the key word here: how to democratize it.
Anders Arpteg:Awesome. Love, do you have any thoughts?
Love Börjeson:Have you changed your mind since?
Henrik Göthberg:the last time, perhaps, on the topic of AGI, or dystopian versus utopian — where are you on the spectrum?
Love Börjeson:Well, first of all, my view on intelligence — human-like intelligence — hasn't really changed regarding the models, but more regarding humans. My big realization, or small realization if you want to call it that, is that the human brain works in a very similar way to a big model. — Interesting; you're like Hinton in some way, I think.
Love Börjeson:Yeah, in a way. Because what happens in an artificial neuron is kind of similar to what happens in a biological neuron: you reach some threshold, and then it sends a signal further. I don't think AGI, when we get there, will be a transformer; it will be something else — something trained on multi-sensory data, perhaps, I don't know. So yes, I think it will happen. But to enslave humans, you don't need that. You just need an app where you can rate each other, and you will bring all of Stockholm to its knees — and that has already happened: Instagram, et cetera.
Love Börjeson:So I'm sorry, we're already enslaved, and we did it to ourselves. So I'm not super afraid of that.
Henrik Göthberg:It's a novel kind of insight, I think, because then it has nothing to do with AGI. No, it has to do with humanity.
Love Börjeson:Yeah — how we steer it, exactly. So I'm going to lean into what Justyna said: we have to democratize it, and then we can survive and probably use it.
Anders Arpteg:Very thoughtful. Thank you so much, Justyna and Love, for coming here and speaking about the amazing progress we have seen at KB Lab. Please do continue the amazing progress.
Henrik Göthberg:I'm looking forward to the upcoming releases this year, as you promised here — that will be amazing. And for me, I said it when we started, and on the same note: you are rock stars, guys. I actually think you are rock stars.
Anders Arpteg:Yeah — what you have provided in terms of models and AI to Swedish society is amazing. So please keep up the good work.
Henrik Göthberg:Thank you for having us.
Desktop AI - Maya:Thank you. Thank you for your time.