MetaDAMA - Data Management in the Nordics

4#7 - Victor Undli - From Hype to Innovation: Navigating Data Science and AI in Norway (Eng)

Victor Undli - Data Scientist, NoA Ignite Norway - Season 4, Episode 7

«I think we are just seeing the beginning of what we can achieve in that field.»

Step into the world of data science and AI as we welcome Victor Undli, a leading data scientist from Norway, who shares his insights into how this field has evolved from mere hype to a vital driver of innovation in Norwegian organizations. Discover how Victor's work with Ung.no, a Norwegian platform for teenagers, illustrates the profound social impact and value creation potential of data science, especially when it comes to directing young inquiring minds to the right experts using natural language processing. We'll discuss the challenges that organizations face in adopting data science, particularly the tendency to seek out pre-conceived solutions instead of targeting real issues with the right tools. This episode promises to illuminate how AI can enhance rather than replace human roles by balancing automation with human oversight.

Join us as we explore the challenges of bridging the gap between academia and industry, with a spotlight on Norway's public sector as a cautious yet progressive player in tech advancement. Victor also shares his thoughts on developing a Norwegian language model that aligns with local values and culture, which could be pivotal as the AI Act comes into play. Learn about the unique role Norway can adopt in the AI landscape by becoming a model for small countries in utilizing large language models ethically and effectively. We highlight the components of successful machine learning projects: quality data, a strong use case, and effective execution, and encourage the power of imagination in idea development, calling on people from all backgrounds to engage.

Here are my key takeaways:
Getting started as a Data Scientist

  • Expectations: working with cutting-edge tech and chasing the last percentage of precision.
  • Reality is much messier.
  • Time management and choosing ideas carefully are important.
  • «I end up creating a lot of benchmark models with the time given, and then try to improve them in a later iteration.»
  • Data Science studies are very much about deep diving into models and their performance, almost unconcerned with technical limitations.
  • A lot of tasks when working with Data Science are in fact Data Engineering tasks.
  • Closing the gap between academia and industry is going to be hard.
  • Data Science is a team sport - you want someone to exchange with and work together with.

Public vs. Private

  • There is a difference between the public and private sectors in Norway.
  • Public sector in Norway is quite advanced in technological development.
  • Public sector acts more carefully.

Stakeholder Management and Data Quality

  • It is important to communicate clearly and consistently with your stakeholders.
  • You have to compromise between stakeholder expectations and your constraints.
  • If you don’t curate your data correctly, it will lose some of its potential over time.
  • Data Quality is central, especially when used for AI models.
  • Data Curation is also a lot about Data Enrichment - filling in the gaps.
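
The "filling in the gaps" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the episode does not say which imputation methods Victor uses, so the median and mode fills here are stand-ins, and the field names (`age`, `topic`) and the `fill_gaps` helper are hypothetical. The sketch also records which fields were filled, so the curated data does not silently hide where the gaps were.

```python
from statistics import median, mode


def fill_gaps(rows, numeric_fields, categorical_fields):
    """Return copies of rows with None values filled in.

    Numeric fields get the median of the observed values, categorical
    fields get the most common observed value. Each returned row has an
    "imputed" list naming the fields that were filled.
    Raises statistics.StatisticsError if a field has no observed values.
    """
    fill_values = {}
    for field in numeric_fields:
        fill_values[field] = median(r[field] for r in rows if r[field] is not None)
    for field in categorical_fields:
        fill_values[field] = mode(r[field] for r in rows if r[field] is not None)

    filled = []
    for row in rows:
        new_row = dict(row)  # copy, so the raw data stays untouched
        new_row["imputed"] = sorted(f for f in fill_values if row[f] is None)
        for field in new_row["imputed"]:
            new_row[field] = fill_values[field]
        filled.append(new_row)
    return filled


# Hypothetical example records with gaps marked as None.
rows = [
    {"age": 14, "topic": "school"},
    {"age": None, "topic": "health"},
    {"age": 16, "topic": None},
    {"age": 15, "topic": "school"},
]
cleaned = fill_gaps(rows, numeric_fields=["age"], categorical_fields=["topic"])
```

Keeping the original rows unchanged and flagging filled fields is a design choice for this sketch: it lets a later iteration swap in a better fill strategy without losing track of what was actually observed.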

AI and the need for a Norwegian LLM

  • AI can be categorized into the brain and the imagination.
  • The brain is to understand, the imagination is to create.
  • We should invest time in creating an open-source Norwegian LLM as a competitive choice.
  • Language encapsulates culture. You need to embrace language to understand culture.
  • Norway's role is as a strong consumer of AI. That also means leading by example.
  • Norway and the Nordic countries can bring a strong ethical focus to the table.


Data Science and AI in Norway

Speaker 1

This is MetaDAMA, a holistic view on data management in the Nordics. Welcome, my name is Winfried and thanks for joining me for this episode of MetaDAMA. Our vision is to promote data management as a profession in the Nordics and show the competencies that we have, and that is the reason I invite Nordic experts in data and information management for a talk. Welcome to MetaDAMA. Today we're going to go back to exploring data science as a profession on the one side, but also talk a bit more about AI. With me today I have Victor Undli. Victor is working as a consultant, as a data scientist, hands-on on a daily basis, so we really get a practitioner perspective on where we are at with data science.

Speaker 1

And, as a bit of an intro, data science has gone from a hype to being more or less an integral part of, as I call it, the innovative potential of many companies in Norway. We've seen that it has not been a straight journey. It's been a rocky road in some places, because we have certain misconceptions about what data science actually is, what value data science can provide in an organization, and how to utilize data science to our advantage. We kind of see the same things happening with AI and Gen AI. Throughout the last years there has been a hype, and it gets a bit opaque: what is the value that we want to get out of it? It feels like many organizations are thinking, "I have a solution, let's find some problems to solve with it," rather than, "I have these problems, let's find the right solution that fits the problem."

So where are we at today with data science in Norway? How do we create value through data? Who gets involved in what questions we ask the data? How far have we actually come with a Norwegian language model? And ultimately, and this is a big question that has been asked a lot: is Norway as a country ready for what is coming? So welcome, Victor.

Speaker 2

Thank you.

Speaker 1

I think, before we jump into this huge topic, maybe there's some time to introduce you. Can you talk a bit about what you do at work, what your role as a data scientist is, and maybe also where your interest in data science, data and AI comes from?

Speaker 2

Yeah, so I work as a consultant for Ung.no. I'm working as a full-stack data scientist and I have been for around three years as of now. When I'm not working I like to be social through sports, so I play, for instance, football and some golf, but I also like tennis and squash. I spend a lot of my spare time on those hobbies. But in the last few years, AI and data science have sort of become a hobby by themselves, at least if you look at all the hours I spend on them. When it comes to my interest, I started my studies in mathematics because I really enjoyed math, and then I suddenly fell in love with programming. Then I realized that in data science as a field you can combine math, informatics and programming. So, yeah, I certainly found that field to be perfect, and I also find it quite interesting working with data because it's so accessible and you can achieve so much with seemingly small models.

Speaker 1

Really interesting, and you mentioned Ung.no. Maybe that needs some explanation for everyone who doesn't live in Norway. What is Ung.no and what did you do there?

Speaker 2

Yeah, so Ung.no is a website for teenagers where they can send in questions about anything. There are also a lot of good information articles on typical topics you encounter as a teenager, but my job is mostly related to the service where they can send in questions. These questions need to be distributed to the correct folder, where an expert will answer the question. There are roughly 75 different folders, so my project was to distribute these questions automatically using natural language processing. My role ranged from developing the proof of concept to eventually setting up the infrastructure needed to take the model to production.
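
As an illustration of the routing task described above, here is a minimal Python sketch of a text classifier that sends a question to a folder and falls back to a person when it is unsure. It is not the Ung.no model: the episode does not say which NLP technique is used, so a small multinomial naive Bayes baseline is swapped in here. The folder names, training questions, the `FolderRouter` class, the 0.6 threshold and the `manual_review` fallback are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphabetic runs, including Norwegian letters.
    return re.findall(r"[a-zæøå]+", text.lower())


class FolderRouter:
    """Tiny multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> word counts
        self.doc_counts = Counter()              # folder -> number of questions
        self.vocab = set()

    def fit(self, examples):
        for text, folder in examples:
            words = tokenize(text)
            self.word_counts[folder].update(words)
            self.doc_counts[folder] += 1
            self.vocab.update(words)
        return self

    def predict_proba(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for folder, n_docs in self.doc_counts.items():
            counts = self.word_counts[folder]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            log_scores[folder] = score
        # Turn log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp_scores = {f: math.exp(s - top) for f, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {f: v / norm for f, v in exp_scores.items()}

    def route(self, text, threshold=0.6):
        # Below the threshold, leave the decision to a human.
        probs = self.predict_proba(text)
        folder, p = max(probs.items(), key=lambda kv: kv[1])
        return folder if p >= threshold else "manual_review"


# Hypothetical training questions and folders, for illustration only.
TRAINING = [
    ("I am stressed about exams and homework", "school"),
    ("my teacher gives too much homework", "school"),
    ("my knee hurts after football training", "health"),
    ("I have a headache and feel sick", "health"),
]
router = FolderRouter().fit(TRAINING)
```

Calling `router.route("too much homework before exams")` picks the school folder, while a question with no known words, such as `router.route("hello")`, is handed to manual review. A simple baseline like this also fits the benchmark-first way of working Victor describes later in the episode.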

Speaker 1

Yeah, I think what stands out with Ung.no, and this is also how we got into contact, is that Ung.no as a project, as a website, was nominated for an innovation prize at the beginning of the year. What I think stands out is its social importance. You really can make an impact for teenagers in Norway who have important questions, maybe ones that are a bit hard to ask people, which they can ask anonymously through a website, so you can see that the work you're doing actually has an impact on people. That was the one thing. The other thing that really impressed me is that we're not looking at something entirely automated. We're looking at a solution where there's still a human in the loop, and maybe this is one of the good examples of human in the loop in AI projects: how do we distribute information to the right person at the right time for the right reasons? So, and that's a broad question, I guess, would you say that Ung.no as a project is a good example of value creation through data science?

Speaker 2

Yeah, I think it is a great example of the use of data science for several reasons, one being that society has a great need for it. Being a teenager can be quite hard. You don't really know who to ask, what to ask for, or what help you need. So I think it's a great service for the community and, from a data science perspective, I think this just shows the power that lies in data science. With just a simple idea you can effectively free up valuable time for others by automating tedious tasks, and they can use that time on something more meaningful.

Speaker 1

Yeah, and this goes, I think, quite directly into that discussion: AI is taking jobs, right? This has been a discussion that we heard a lot in popular media, but this is an example where AI can actually help make the work you're doing more effective, more precise and more reliable in its results, while you still have the human in the loop to make the final decision. Let's talk a bit about the role of data scientists in general. I think the project at Ung.no is quite a good example of that. When you study data science and you reach that threshold where you go from university to your first project or your first job, there are certain expectations, right? When you started, what expectations did you have for your role and for your career at the early stage?

Speaker 2

As a data scientist, I expected to be challenged daily, continuously learning while working with cutting-edge technologies and models. I also expected to have time to develop the best possible model for each project. You know, just kind of chase that last percentage of precision, really. But that was not really, or not exactly, the reality. I would say I have been working with cutting-edge technologies a bit less than I initially expected, the reason being that the classical models also work pretty decently. Effective time management is also a crucial point, so choosing your ideas carefully is very important. So I end up creating a lot of benchmark models with the time given and then try to improve them in a later iteration if it's considered worthwhile.

Speaker 1

Interesting. You talked about cutting-edge technology. I think that's an interesting one, and I think that a lot of data scientists really want to make a difference with their projects on the one side, but also want to embrace technology and the possibilities that are out there. And then you kind of see the reality in organizations, where you have a lot of legacy tech stack, you are not as updated as you should be, maybe even the hardware is not as efficient as it should be, and you kind of get into a squeeze. On the one side, this is what I could deliver if I had the right tools available. On the other side, these are the tools available: what can I deliver? How do you work with that, not being able to deliver everything with the best, cutting-edge technology, but having to adjust to the reality?

Speaker 2

It kind of contradicts everything I learned at my university, because I really spent a lot of time just deep diving into mathematics and trying to figure out where we could squeeze out performance. But at the same time, I do see that there is a need for a model as well, as we don't always have the time and budget to invest in chasing those last percentages.

Also, a lot of the time I spend at work, compared with when I studied, goes to data engineering tasks, for instance setting up the general life cycle of a machine learning model, whilst at university I mostly focused on the machine learning model itself.

Speaker 1

This is interesting because we had the podcast episode with Alan Oner in season three, where we talked about the gap between academia and industry. He advocated that we close that gap by bringing more people from academia directly into touch with industry, so students learn about the reality in organizations already while studying. Now for you, and this is kind of interesting because you can look at that from more of a practitioner perspective: you've been at university, you've seen the reality in industry. What do you think? Can we close that gap?

Speaker 2

I think it's going to be hard, to be honest, because it demands resources which not every company has. If you work for a large organization, I can definitely see that being a good approach, to collaborate with the university institutions and the researchers, but I think most businesses are going to play a primary role as consumers of the advancements that the academic researchers make.

Speaker 1

That's interesting, thank you. There's another thing that I wanted to ask you about, and we talked about this earlier in the podcast: there's something special about Norway when it comes to the public sector, because the public sector in Norway is quite in front with much of the technological development. Maybe there's more investment in the public sector than there would be in other countries. Now, do you, as a practitioner, see a difference between working for the public sector in Norway and for private organizations?

Speaker 2

Yeah, I think I do. My interpretation is that the public sector takes things a bit carefully. They are more concerned about privacy and those kinds of things. I'm not saying the private sector isn't; often they keep things in-house. And I do think there is, as you say, more investment in the private sector. But I do believe that the public sector as well has realized that this data science wagon is something they need to jump on. So I do think we'll eventually see more funding coming our way, and maybe it increases a bit the speed at which we're able to do data science and to digitalize the public sector.

Speaker 1

I think you already started looking into the crystal ball to see where the future of data science is, and I think that's interesting because we had this podcast episode earlier with Rasmus from Tetra Pak, and he talked about decision science as a field. So going beyond data science, but combining data science with behavioral science to really get answers to a lot of the questions that data science by itself can't solve. If you combine it with other practices, you will advance. What are your thoughts about that? Is that a way to go forward, or are we already there?

Speaker 2

I think we're just seeing the beginning of what we can achieve in that field. I do think that technologies such as machines that learn to anticipate human behavior and mood, for instance, can be of great value, but I do not think we have reached the peak yet. I do think we have a lot to expect from that field in the years to come.

Speaker 1

That's a positive outlook. I like that, yeah. So let's get a bit negative here and talk about common challenges for data scientists and organizations. I know we have talked about a couple of those already, but there's something about what you encounter when you start in a project. So what conditions do you encounter when it comes to understanding what your work is actually about, right? On the one hand, there is something about you as a data scientist.

You have certain ways of thinking, certain ways of working on your projects. It could be that they don't fit with the organization, so you have to do some communicative work to make them fit. The other thing, and this is kind of interesting, is that many organizations have a kind of engineering mindset when it comes to technical disciplines, so they want a yes or no answer, black or white. At the same time, data science is much more exploratory and deals with uncertainty. So you can say that there's a certain chance it would be a yes, or a certain chance it could be a no, but you can't say for sure that this is going to be yes or no. Those are two different mindsets that kind of clash in many organizations. How do you work with that? How do you reflect on that?

Speaker 2

To just go to the first part of your question: when I enter a project, usually there is a lot of enthusiasm and motivation to do data science. They also usually believe that their data set is golden, which very often is not the case. So you do have to spend some time trying to get the client or stakeholder on the same page as you about what they can expect. They are also very ambitious, and there's a correlation between what you can achieve and the quality of your data.

Speaker 2

So there is also this underlying problem, if I can say that: you want, of course, to satisfy your stakeholder or your client, who very often asks, "Can we do this or can we not do this?" Basically, they want a yes or no answer, but in reality you have to compromise between their expectations and what you think you can achieve with the time given. So most often when people ask, "Is this something we can do?", I say, "We can do this, which is a beginning, a start for what you're trying to achieve," and then we iterate on that project and see how close we can get. I think that is the healthy approach to kind of solve that underlying communication issue.

Speaker 1

Yeah, I very much like the iterative approach around that, that you work towards the goal together with your client. You mentioned something that goes a bit beyond the mindset and is a bit more practical, and that is about challenges with the data itself. You already said there are a couple of engineering tasks you have to do, so you can't really focus exclusively on data science, and you mentioned the data lifecycle, which I think is really interesting, also as an input to the work you're doing. Do you think that organizations have the right conditions to collect and curate data in a way that's actually usable for you? Or is much of that collecting and curating of data done more or less without real intent or purpose for data science?

Speaker 2

I believe you can find a use for most data sets. But most organizations have a lot to gain by investing more time in how they curate data, because it's extremely easy to make small decisions which you don't really think have an impact when you eventually are going to do machine learning on your data. In some of my cases I can see that, through the years, the data have kind of been washed out. By that I mean: if you wash your black T-shirt a thousand times, then you have a grey T-shirt, at least if you don't wash it correctly. It's the same thing with your data.

If you don't curate them correctly, at a certain point they're going to lose some of their potential. I truly believe that data has never been more valuable, and the better the quality of the data, the more valuable it is, so I would encourage all organizations to invest more time in how they curate data. I'm not going as far as calling it a problem, but I do believe most businesses aren't as good at it as they could be.

Speaker 1

I like that because it goes back to that classic description of data as the new oil, right? And then a lot of people jumped on that and said, well, data's not the new oil because you can copy data, you can reuse data. You can't use up the value of data. But I think you are right: if you don't curate it correctly, if you don't get it into the right shape, it will wash out. I like that example.

Speaker 2

Yeah, I'm sure you've heard of it. There is this famous saying: trash in, trash out. That is perhaps putting it on the edge.

Speaker 1

But there is some truth in it, and that saying has been used a lot with AI models lately. So really, the quality of the input has maybe one of the biggest effects on the output. So how do you get the input in shape? And if the input is not in shape at the point you take over your project, what can you do? So this is more a question about how much data engineering work are you actually doing as a data scientist to get the data into the right shape?

Speaker 2

There are a lot of things you can do, which I learned in my academic years. Unfortunately, I also learned that not all of them work as well as you wish them to. But you can fill in gaps for missing data, which is what I spend the most time doing when I have to curate the data that's already there. Also, I try to look for information that I can use to derive more information by somehow mutating the data set, obviously without changing the descriptive capabilities of the data. But I'm fortunate that the dataset at Ung.no is a complete dataset. It is a bit washed out, but I have found some ways to mitigate the effects of the data losing its potential.

Speaker 1

[...] can support the curating of the data and getting the data ready, or do you think this is something that there is a potential to automate?

Speaker 2

I think, actually, it's really important that when you start new projects, or at least projects where you're going to collect data, you have a data scientist on the team who can try to figure out the loopholes or pitfalls you can fall into, which will eventually cost you some of the quality of your data. I also think that data science is a team job, because you do a lot of thinking and evaluation by yourself, and it is extremely useful if you have someone to spar with. I do think we have a lot of expertise in the field to handle what's to come, but it wouldn't hurt to have some more.

Speaker 1

And that kind of brings us automatically to the last topic we wanted to discuss today, and that is the need for Norwegian language models. One thing is bringing expertise and competency in and actually gathering competency around Norway, putting Norway basically on the map in AI, which I think is interesting. But before that, maybe you can say something about how you work with language models.

Speaker 2

Yeah. So what I typically like to do is divide large language models, or LLMs, into two parts. The first one I like to call the brain, which is the part that understands language and its meaning. Then there's the second part, which can produce texts and sentences, which we often see these days when we interact with ChatGPT and Claude, and the list goes on. That's the part I call the imagination. I mostly use the brain to understand text and then derive information from it. I try to steer clear, as of now, of the second part, the imagination, because I'm scared of hallucination, which can have pretty bad effects in the use cases I work on.

Norwegian Language Model Advantages

Speaker 1

Interesting, and I like that you divided it into two parts, the brain and the imagination. I think that makes it a bit clearer how you can use and utilize them. And then there's the question, and we talked a bit about it already when it comes to competency: why do we need a Norwegian language model? Is there a value in that which actually matches the work we put into it?

Speaker 2

Yeah, I do believe we have a lot to gain by investing some time in creating an open-source Norwegian LLM that can serve as a competitive choice, the reason being that we need a Norwegian language model that fits our culture, our tone of voice and our values. I also believe that, when the AI Act starts to kick in with full force, it would be of great value for Norwegian businesses if we have an open-source Norwegian language model which we can utilize.

Speaker 1

Very interesting and maybe you can elaborate a bit on what you mean by a model that fits our Norwegian tone of voice and culture, because I think this is interesting that when you use an American model, for example, there's a difference, right?

Speaker 2

Yeah, because when I typically interact with, let's say, GPT, and I want it to produce some text for me, it's more pretentious than I feel we naturally are in Norway. We also have a different culture and a different approach to certain dilemmas and challenges we meet on an everyday basis. I think it would just be weird if especially the public sector is going to use these models and you get answers you feel are a little bit off, because they use a type of language or a tone of voice that doesn't sound quite so familiar. And I know a lot of people are quite skeptical of all of this AI. So I just feel that if what they produce felt more like home, it would be easier to swallow.

Speaker 1

I think you're right there. I think there's something about that cultural impact that is really carried by the way we speak, and something that is interesting here is also that language is naturally developing, so you need also a model that is adjustable for that. So what could be interesting to figure out is who especially who, but also how do we maintain and even enhance that Norwegian language?

Speaker 2

I do think that academia is well suited. I think the public sector should collaborate closely with the academic community and make sure they have enough resources to continuously provide improvements to language models that can somewhat compete with the really big LLMs.

Speaker 1

And that brings us to putting Norway on the map, really, and if we want to compete, and if we want to be part of the search in AI, what role can Norway play, do you think, once the AI hype itself is settled?

Speaker 2

I do believe that Norway's role should be as a strong consumer of AI. It would be really cool, but we don't really need to compete with giants like OpenAI and Google. Instead, I think we should lead by example for small countries, demonstrating how to effectively utilize large language models in our industry. I believe we should develop a robust procedure to fine-tune open-source LLMs to reflect the Norwegian tone of voice and values, as I mentioned.

Speaker 1

I think that there's also an element here for the Nordic countries in general, and I've made some experience with that through the podcast as well. There's a strong ethical value focus, and I think that is something that we as Nordic countries are really good at and that we should also bring to the table.

Speaker 2

Yeah, I totally agree. I think ethics is something where we are different from, for example, America, so that's an extremely important part to try to capture in a Norwegian LLM.

Speaker 1

So time goes really fast and we are already at the end of the episode. But before I let you go, maybe you can have some reflections about what we talked about. Maybe you have some key takeaways or even a call to action.

Speaker 2

Yeah, I would say, really, the key to a successful machine learning project, I'll divide it into three parts. One thing is your data. Value is generated by carefully considering how you're gathering the data and curating it. Make sure you keep its potential over time. The second part is you need a good idea, a good use case. The last thing is obviously a good execution of the idea or your use case.

Now, two of those three require data science experience. To find good use cases or good ideas, it certainly helps knowing the limits of data science. But really, I can't help but think about the famous quote, "Imagination is more important than knowledge," because I truly feel that kicks in in the data science field, and it just needs a clever mind. It doesn't matter how many tools you have if you don't find a use case to apply them to. I've also had people with no data science experience come up to me with great ideas, so I would also encourage more people to involve themselves in the process of developing ideas. So, to sum up, invest time in your data. That's where the real value lies.

Speaker 1

Thank you so much, Victor. It was great having you as a guest on the show to bring that practitioner perspective.

Speaker 2

Thank you very much. Thank you for having me. It was an honor.