
AIAW Podcast
AIAW Podcast
E155 - The Dawn of Agentic AI: From Theoretical Concepts to Practical Implementation
Get ready for a deep dive into the future of AI with Episode 155 of the AIAW Podcast! Join us as four industry experts —Luka Crnkovic-Friis (King / Microsoft), Martin Lundqvist (Arundo Analytics), Vilhelm von Ehrenheim (QA.tech), and Jesper Fredriksson (Volvo Cars)—unpack Agentic AI: From Practical Implementation to Strategic Implications. Discover how agentic AI’s autonomy, adaptability, and goal-driven intelligence reshape industries, from single-agent workflows to multi-agent ecosystems. With expert insights on real-world applications, system design, organizational scaling, and navigating risks, this episode delivers actionable strategies to harness AI’s potential while staying ahead of its societal impact. Don’t miss this chance to explore the agentic era—tune in and start shaping your AI-driven future today!
Follow us on youtube: https://www.youtube.com/@aiawpodcast
I think, in general, just Google is like you know. Really Is Google hitting it right now.
Henrik Göthberg:Yeah, yeah, they're cooking, they're cooking, it's. You know elaborate. What are they cooking?
Luka Crnkovic-Friis:I think it's kind of interesting. It took them a while to get this momentum behind and get the models in order and get it into the products and suddenly they come out first with a really killer model Fantastic the Gemini 2.5 Pro. Come out first with a really killer model fantastic the Gemini 2.5 Pro. But also a set of applications on top of it, like their new deep research coding tool, a general kind of agentic, firebase Studio, firebase Studio, yeah, exactly.
Jesper Fredriksson:And then there's some kind of agentic protocol, agent to agent. Yeah, exactly, multi-agent.
Vilhelm von Ehrenheim:And also this new kind of infrastructure for inference as well.
Henrik Göthberg:Has this been several releases? Was it the 25th and onwards?
Luka Crnkovic-Friis:Several releases that were just coming in very short order.
Anders Arpteg:Maybe it was text, right. Then they improved the image generation. They improve the video generation, right? Yeah, we improved the Audio and the music one. They had a music one, if I'm not mistaken as well, so that a music in range, like soon.
Vilhelm von Ehrenheim:Oh I haven't tried that one, let's try that one. I tried the.
Martin Lundqvist:Suno. It's pretty amazing. Yeah, it's pretty amazing, the Suno.
Luka Crnkovic-Friis:Udio is also really really good.
Anders Arpteg:I mean it's a really broad base of covering all models for everything. So I certainly agree, Google is really starting to pick up.
Martin Lundqvist:I don't use image generation very often in my work, but I did take SOAR out for a ride after the latest release. It's starting to become pretty good. Just by uploading some pictures and asking it to manipulate it, it actually produces some pretty good quality stuff.
Luka Crnkovic-Friis:And GPT-4 over multimodal. I mean, that's the big now image generation.
Anders Arpteg:I still have my prediction for 2025, where I say open AI will start to demise.
Henrik Göthberg:Yeah, you said that earlier.
Luka Crnkovic-Friis:I'll take a note of that.
Martin Lundqvist:Can.
Jesper Fredriksson:I pick you up on that. Isn't that a hot topic? Anders?
Anders Arpteg:Well, we have Luka here.
Jesper Fredriksson:We can make it hotter.
Henrik Göthberg:I'm sure we can make it hotter.
Martin Lundqvist:Speaking of which, to what extent do you guys believe that it is the kind of developer experience on top of the foundation models that will be the area of competition versus the actual models? I think if you look at OpenAI and the Aegis SDK that came out, it seems like is that an area of fierce competition as well? Is that why we say OpenAI might be kind of falling behind?
Henrik Göthberg:But you heard Sataya in an interview say OpenAI is not a model company, openai is a product company and I think that sort of speaks to where they want to go. This is my interpretation from the top boss, to be honest.
Anders Arpteg:Yeah, and just going back to Lama4 as well, I think it was interesting for a number of points. One is that it actually stopped training, I think in September or something in last year, right?
Jesper Fredriksson:August.
Anders Arpteg:And it's kind of weird that they didn't release it until now. They probably did a lot of either it's like safety checks or whatnot. I think actually they were really afraid when 01 came out and all these kind of reasoning models came out and they didn't really have a good sales USP for Loma4.
Luka Crnkovic-Friis:What do you think Until DeepSeek gave a recipe on how to do it?
Anders Arpteg:And then they could do it. But Meta haven't done it right. They haven't released a reasoning, not yet.
Jesper Fredriksson:They're saying it's in the works.
Vilhelm von Ehrenheim:They're reading the R1 papers now very thoroughly.
Anders Arpteg:They're using LAMA to read the paper.
Luka Crnkovic-Friis:But there is a case for non-reasoning models. I mean GPT 4.5, it's such a wonderful, wonderful, weird model that makes absolutely no sense.
Anders Arpteg:It works surprisingly well. But still also, I think another point with Lama 4 is that, okay, it was not perhaps the big news that people were expecting, but they had like three different versions One, the behemoth, the biggest one, it's not even open source. They may choose to make it open source at one point, but I think also the smaller ones, the two other ones, the Maverick and the what was the name of the smallest one? Sooner or something. Anyway, even those are not available in Europe. And another prediction I'm trying to say is that open source will be harder and harder to make, not the least because of regulation purposes. And now we're seeing even Meta, which is supposed to be always going in open source, not really allowing people in Europe to use it.
Jesper Fredriksson:Do we know if that's because it's multimodal? The earlier models came later to Europe because they were multimodal or something specific.
Anders Arpteg:I think that's what they were saying.
Luka Crnkovic-Friis:Which is kind of interesting because Google is going in the other direction. Now, with that they're releasing in Europe directly. Before there was like a lag but it's not open source, but but still yeah but also the like uh open ai. Now you see, they're going to release uh, an open source uh model allegedly at least right.
Henrik Göthberg:Open weights okay, open ways open weights, but and do we have a time frame for that or we do? You've used of an announcement uh it's.
Anders Arpteg:It's hard to say, even if you knew you couldn't say yeah but, I was trying to get a scope.
Luka Crnkovic-Friis:You can you know? You know? I genuinely don't know.
Jesper Fredriksson:You have to ask again in one hour so, luca, you can blink if it is the secret model, that's uh.
Luka Crnkovic-Friis:Topping, that's very good and it's not the microsoft based one one, you know guys with that.
Henrik Göthberg:I think it's start time to kick off the real episode here and I'm I'm like a kid in the candy store and I already apologized to all the guests that you know. I really want to listen and learn, but when I get excited I can't fucking shut up. So I'm sorry for the listeners and for everybody. Starting like this. We're going to talk about agentic AI and we talked about the theme where we call it from practical implementation to strategic implications and since we have so many in the panel today, we will have some house rules. But before we get started, let me introduce the guests, or actually let the guests introduce themselves, and we will do it in a more sharp way. All of you guys have been on the show before, so you know if everyone listening wants a more detailed introduction or you know background to who is Luca and you know what did he do before. I will now, you know, do it like this. So we had Luca here in the pod.
Luka Crnkovic-Friis:Episode six this is the first year Wow.
Henrik Göthberg:At that point in time you were CEO or founder of Peltorion, I can't remember. You were two co-founders right, yeah. Yeah, and so, with that background, go to episode six if you want to know more about Luca. And now I want Luca to introduce himself from the context of adgentic AI. What are you working on? Or, like you know, your two minute introduction of who Luca is, luca Friis, you know, and where you work. And what are you working on in the context of adgentic AI, and then we go around the table like that.
Luka Crnkovic-Friis:So I get to go first, you get to go first, you get to go first, I'm doing it.
Henrik Göthberg:Clockwork, I'm doing it clockwise, that's all.
Luka Crnkovic-Friis:So I am the head of AI at King, the makers of Candy Crush, which is part of Microsoft, and speaking from a King and Microsoft perspective, agentic is the big thing on everybody's mind and tons of different projects, tons of different exploration, what it means, how do we adjust the workflows, how do we think about the infrastructure, and, yeah, we will get into a definition discussion, so I'm not going to ruin it now, but I'll say like there's nothing to it. It's like the key thing, the key insight, is that intelligence is all you need.
Martin Lundqvist:It's the models. I thought it was love, but okay.
Henrik Göthberg:Attention is all you need. What is love is all you need. Okay, good, now we move to Martin, and Martin was a guest in episode 56. So this is a while back and please, martin Lundqvist, introduce yourself properly In the context of agentic and agentic AI and how you're working on that now.
Martin Lundqvist:Thank you, henrik. So CEO of Rund Analytics. We build solutions for the asset heavy industry AI before AI, we call it machine learning, before that, first, principles parameterized models, hybrid Kärt barnmånga namn, as you say in Swedish, agents became something that was very top of mind for me roughly a year ago now. I think maybe nine months ago.
Henrik Göthberg:It's a year ago we talked about this.
Martin Lundqvist:Without going into definitions, which is challenging at this point, it provides, since we work primarily with time series, data and asset bases, which are fairly complicated, we have been using knowledge graphs to structure that knowledge for some time and it turns out that knowledge graphs are fairly good as a starting point for implementing agentic AI. Whatever that is, we'll get back to that. We're going to get back to that yeah, so currently we are having quite a substantial part of our resources that the company focused on agentic AI.
Henrik Göthberg:And I just want to give the backdrop and it focused on agentic AI and I just want to give the backdrop and sort of. You know, luca was the guy who was sort of in one of the conferences I went to the way he sort of brought it up Very short around that we were starting to do the research on that and then I happened to talk to you and we were like, okay, this is happening, so how do we deal with that? Arun is doing? You know, we're doing all the old stuff, but we really need to also figure out our position or our marketing and our product in this context.
Martin Lundqvist:So that is a year ago.
Henrik Göthberg:That's probably like a year ago.
Martin Lundqvist:Yeah, that's right.
Henrik Göthberg:Yeah, all right. Moving on to Jesper and Jesper, you were here last week, so I don't know how many episodes I need to talk if I'm going to give your name. But Jesper, you were here as a guest in episode 116. But, jesper, short introduction from the agentic perspective as AI lead engineer, I guess in Volvo.
Jesper Fredriksson:Yes, that's me. So I think my key interests in agentic AI is both within code generation, both as a user, and dabbling with it myself. But what I've been mostly working with is automating work in analytics, doing analysis, generating DBT code, all of those things trying to find out interesting things about business from an analytics point of view using agents, agentic workflows. You could say and I agree with what Luca said that intelligence is definitely what sets it apart and that's a real bottleneck. I would add also context to as you were saying.
Jesper Fredriksson:So to have the stored knowledge from the company that is important to make decisions in a good, structured way.
Henrik Göthberg:So I, I, I combine your approaches cool and I remember at ndsml you were, you were presenting there and I think it was around that we were talking about that like one of the key use cases. You know a product manager in Volvo and we you were working closely with him and you were trying to define okay, if you were to hire a business analyst and you wanted your product reports or whatever that person as a data analyst should do, how would you build that? And I think that was one of the as a framing view of agent or whatever.
Henrik Göthberg:And this is how many months ago Not a year, but maybe a year, almost also a year. I think it was in the fall, it's in the fall, right, it's in the fall. Okay, a year. I think it was in the fall. It's in the fall, right, it's in the fall, okay, good, and now we have wilhelm. And wilhelm, you have been a guest in two separate episodes, yeah, in the one hand side, representing the brain of um mother brain, and he had the brain of mother brain, and then, of course, uh, with where you are now, uh, qa tech. So I have you up here now so people can follow you on episode 117. I guess that's QATech.
Jesper Fredriksson:Is that the QATech one? Qatech.
Henrik Göthberg:But if you want to go to the OG episode with Wilhelm, you need to go further back and look at the Mother Brain episode. But here we go. So you are building at the core of the topic today, I think. Tell us, what are you doing?
Vilhelm von Ehrenheim:Yeah, so I'm a co-founder and chief AI officer at QA Tech, and what QA Tech is all about is building agents in order to be able to automate different user flows and use that for testing, so essentially developing like synthetic agentic users that you could kind of test your web pages with before releasing to production.
Henrik Göthberg:So quality assurance testing as fundamental field, and then how do we identify that?
Vilhelm von Ehrenheim:Yeah, exactly, I think what's clear when you look historically, testing has been around for a very long time and a lot of automation effort has gone into this field. Historically, testing has been around for a very long time and a lot of automation effort has gone into this field, but it's still kind of stuck a little bit in very scripted tests that are very hard, kind of hard coded and well defined and if anything changes in the platform, it's very hard to kind of maintain those and you need to update them all over the place. So, zooming out a bit and looking at this from an AI perspective, then you can handle different variations in the input and let the AI take decisions more like a user and analyze each action that it does in order to try to achieve a goal or a plan or something.
Henrik Göthberg:Yeah, and in many ways you're actually doing fundamental user acceptance testing. You're doing fundamental user testing.
Vilhelm von Ehrenheim:You're acting as a user, testing the system and then documenting and getting that all as smooth as possible yeah, exactly, and we're also using knowledge graphs heavily, so we kind of scrape the pages to kind of build up the understanding of the of the customers kind of very system that we need to exactly yeah, and, and I will do something.
Henrik Göthberg:So this is the intro, but I will actually ask you, anders, as well as part of the panel today, as and me also as part of the panel, what is your? Do you have any specific relationships in your work right now to again tick, or you know how do you see that?
Anders Arpteg:well, I think we will want to improve the functionality and value from AI in general and we know AI is really good at perceiving information right now especially text, but also images and other types of data and then we can start to see some kind of reasoning working. But still, I would argue that humans are much better in reasoning than AI is today, but AI is better in perceiving data today. And then what we also want to do is actually to have the AI system starting to take action, and that is where we're lacking a lot today, and this is what I would say Argentic is starting to get more into, but still, humans are so much better at taking actions than AI is today, but all of these kind of different parts are starting to be. You know, that's where all the research is going right now, and it's also something that, of course, we at work want to improve on. And as you are starting to see AI systems becoming good at also more taking action type of work, the value will significantly grow in any kind of company.
Henrik Göthberg:So that is what we are working with as well. Everybody's working on it. It can't be avoided. So now we're going to do something we've never done on this pod before, and that is to have some sort of structure. So the way I, because we want to cover a topic, from practical implementation to strategic implications. So I'm going to ask you, goran, to help me be a little bit of the time master, and we're going to, we're going to structure the podcast this time in. We're going to have, you know, five or six distinct topics and basically the way I'm leading now us to play this game is that we're going to go bananas in a time frame of 20 minutes and then it's going to be a full break and then we go down rabbit holes like crazy for another 20 minutes on another topic. So we get some sort of controlled chaos maybe I don't know and the topics are laid out in the following way First topic defining agentic AI. What are the core characteristics? Or, like Anders, can we bring it down to the most brilliant one-liner 20 minutes and then we're going to talk about topic two practical implementations. Why does agentic AI matter now, I don't care about the future in 10, 15 years. Why do companies need to start thinking and caring about this now.
Henrik Göthberg:Topic three now we start how we build systems. So we start with designing single agent systems, framework components, strategies, and here we can go nuts on different topics. You know ins and outs. Topic three 20 minutes. We'll merge into how we scale this topic. So now we go from single agent system to multi-agent platforms, organizational, technical scaling. So we sort of we start with fundamentally understanding how we build agentic systems and then what are the considerations when we scale this out in enterprise way.
Henrik Göthberg:And here, you know, martin added a topic that we need to cover within topic three and four. We kind of need to talk about the data. It may be almost like its own topic, but I try to see how we can understand data and the requirements of data in relation to agentic, if that differs both for single systems and agentic systems. So you will be the moderator of the data dimension in topic three and four, yeah. Then we go to topic five challenges and risk. And maybe data is the risk as well, I don't know. So we're understanding. Now, okay, we technically know how to build systems. Where can we fuck this up in terms of agency? You know, blast, radius, blah, blah, blah. You know what are the dimensions that you need to really watch out for in terms of building a gigantic systems.
Henrik Göthberg:And then we have topic six, where we call it strategic outlook and it's not as, maybe, physical as we typically do it the last couple of topics. So it's more like preparing practically for an agentic future. So it's a little bit like going down the trajectory to understand how we organize for the trajectory today. So you need to take a philosophical or more fundamental long-term view on it. But practically, what's the direction that you kind of need to care about now? And when we've done those things 20 minute, 20 minute, 20 minute then we have maybe some time for wrap up and reflections, and here we can go bananas if we want to go philosophical, if we want to give our key takeaways, or if we just want to drink beer, I don't care. So hopefully we have a little bit of slack in that. That's how it's going to run.
Henrik Göthberg:So with that, any questions, let's do it, let's do it. We start with the first topic, defining agentic ai, and of course now, if I started with luca the last time, I'll start in the opposite end. Oh, wow, that's great. And and uh, I don't know I there's a lot of buzz going on and there's a lot of, you know, hype going on, and I think I see a lot of definitions. Where is that really again, you know, I don't want to. That's this, you know, entry point to really you know what is again taking again the guy and what differentiates that from what we've done in the past. And I think we can go shortly around the table and then we'll see what this leads to, if we can get to a conclusion. But could you start, willem? How do you define again AI?
Vilhelm von Ehrenheim:Yeah, I like to think of it from like older definitions that comes from the robotic side of things. So in general an agent is something that can kind of operate within an environment, you can observe its environment and it can take actions that affects the environment, and then kind of usually that's also tied to some kind of goal or high level instructions. So an agent then, like agentic AI, would be an AI system that can act in an environment, take decisions and actions. Then that will kind of try to achieve a goal.
Henrik Göthberg:Okay, so we have taking action, making decision achieving goal.
Vilhelm von Ehrenheim:Observation.
Henrik Göthberg:Observation as well. You got in there. Okay, I'm going to let it free now, so I don't want to oversteer it. So anyone wants to go next?
Jesper Fredriksson:I can go next. So I think the fundamental thing is, to me, action, because that's what's missing in the previous paradigms. We have RAG, where it's just about maybe making a chatbot answering questions, but taking action that's the thing that will drive so much more value. If you can do the work of much more people with just one agent that can take actions, then you will get much more out of AI. We're all struggling with proving value in some form or another. Sometimes it's easy, sometimes it's less easy.
Jesper Fredriksson:But with agentic AI, when you can take actions on correct basis, then I think you have something. Then you can do a lot of further definitions. If you go into the actual implementations I think I talked already last part about the way Anthropic talks about it they're saying if the LLM is the one controlling the loop of the system like if you iterate until something is done and you let the AI choose when it's done then they're saying that that's the agentic pattern. But I tend to be less strict when I think about agentic AI myself. If it's something that takes action, then I think of it as agentic.
Henrik Göthberg:Is action and decision making synonymous? No, it can take action, but not make decisions.
Jesper Fredriksson:It needs to be able to do something in the digital world To make a decision. That's just like saying yes or no, so I don't think that's important, but take action is also something you can think more about.
Anders Arpteg:What does take action mean? Is it actually about choosing which action to take, or would a tool use about calling an API be considered an agentic system?
Jesper Fredriksson:I think so.
Luka Crnkovic-Friis:My own definition by first. I don't think it's a, since agent is a very general term that comes from, as I said, from, among other things, from robotics. There are many definitions. I think it's easier to like even to use the, this new made up agentic words because then we're talking about the more narrow domain.
Henrik Göthberg:Yes, and to me it's an LLM with a tool and when it got useful, when the LLM was really smart and there's mostly there has to be a task like task, like a sort of you give it a task and it can do it in an iterative way you're stepping into a deep, tough ground, because this is the argument that me and underscout stock on you know going, you know, with the autonomous vehicle example, that is, I think, is a good example and you can do it better than me, but we're basically we've been lacking control. So maybe again it's the control part If you take an autonomous vehicle example. But if you now want to actuate something, I think it's a challenge that that could be a very narrow definition to define a gentic. But I think the fundamental challenge is how that will play out if people go crazy into this but don't think about perception or get observation.
Anders Arpteg:Just going back to what we spoke about take actions versus choose action. I think that's an important distinction here. Using an API, that's an action you can take, but if AI systems were forced, like a manually hard-coded rule, to always do a web search, for example? If you say that's a tool, is that agentic? No, but my definition is LLM plus tool.
Luka Crnkovic-Friis:So you have to have the intelligent model.
Martin Lundqvist:I guess I stumbled into a definition post fact. I had a conversation this is a year ago. I had a conversation with one of our customers we've been working with for a long time the big chemical organization and he's the head of reliability across all these sites and I had a meeting with him and he said I have this story to tell you I don't know if you guys are working on this right and he said look, last week we had a catastrophic failure on a gas compressor chain in Singapore. Okay, and he said immediately I needed to understand, do some root cause analysis. I asked the team to bring me a distribution of the mean time to failure across all similar types of compressors, which are called All my Factories. And he said a month later they came with a report. So I stumbled into this.
Martin Lundqvist:So that's when I realized, hang on, if I can build a tool that translates his question into a number of steps and then those steps can each be executed through API calls. Some observability testing that the steps have been executed correctly. Collect the data, massage the data, present it back. Some observability testing that the steps have been executed correctly. Collect the data, massage the data, present it back. Some visualizations I'm sure I could imagine a world in which that didn't take a month but maybe a minute. And that's where I started talking about it's LLM plus tooling. It's maybe a couple of LLMs looking at each other, but it's tooling, it's looping, it's evaluating. And then we had a conversation and I realized that's so. My thinking of agentic comes more from the point of view of utility, if that makes sense, that kind of thing I need In order to solve that. I have to have tools that are talking to each other.
Henrik Göthberg:But why is and not automation? I think automation and agentic can go hand in hand, but I don't think it's the same thing. Does it matter? I think so because from a marketing point of view, maybe we can just go bananas and do the hype and all that.
Luka Crnkovic-Friis:But it's the delegation of the autonomy of choosing which action.
Anders Arpteg:What's the? Minimum agentic system. What's the smallest kind of system that still would be considered to be agentic? I think what you've described is a rather advanced system that takes a number of steps and it continues until it's considered to be done.
Luka Crnkovic-Friis:I like that that's an anthropic definition.
Luka Crnkovic-Friis:I think. I think that the most powerful and elegant system that's out there today, that's just brilliant. It's Cloud Code from Anthropic and it is because it's super simple and it's just a very smart LLM and it gets a bunch of tools and one of the tools is like yeah, you can spawn another copy of yourself and delegate something to it so you don't pollute your own context, and that's it, and with that you can do incredible things, Like we're starting to use it quite a bit now at King. It's designed for coding, but give it access to a calendar via MCP or email and it can become your email system. There is this multi-step reasoning action, reason, action, reason, action and do things in parallel when you couple it with a smart enough model.
Vilhelm von Ehrenheim:But I think that's super important. When you think about it from an iterative approach and you have some kind of high-level goal, then I think those are very important components in it. For it being a Gentic, if it has a goal and it needs to figure out what tools to call, what information to gather, I could imagine your example being not the Gentic if you just kind of hard-coded exactly which information should be collected and then done in some kind of MapReduce fashion, and then you have a report.
Martin Lundqvist:That wouldn't be agentic in my opinion, but if you have to kind of Just insert right there and I don't want to interrupt you, but I am actually, of course, what we could have done is to say, let's build an app that calculates the mean time to failure, and it would be no agents involved. But the thing is we don't know what the next question is going to be, and that's the thing that the next question might be on something completely different and the steps are going to be different. I think that's the beauty, right, and the work you did manually there, like this step.
Anders Arpteg:This step, that's what we expect the agentic system to figure out, but that's a rather advanced one, I would say I mean if it takes multiple steps and it's decide what action to take at each step, then I think no question it should be agentic. But what's the minimal part? I mean, if you go to ChatDBT today and check the web search button, you force it to be using web search. Is that single thing that it has to do web search before answering your question? Should that be considered agentic?
Vilhelm von Ehrenheim:I think it depends on the implementation. So if there is some kind of rule saying if there is a link somewhere in the text, then follow this first and then blah, blah, blah, then it wouldn't be agentic. I think the same goes for if you just have the simple calculator tool. For example, If you ask an LLM to sum up two large numbers, for example, and it decides to use the tool in order to gather information.
Anders Arpteg:That's a good point.
Vilhelm von Ehrenheim:If it decides to use the tool in order to gather information. If it decides to do.
Anders Arpteg:It needs to decide to do it. Manual is set, it has to use web search. I wouldn't consider that to be a gent.
Luka Crnkovic-Friis:No, I agree, and that's the difference, like we're between if you take chad gpt between the just plain search and the deep research a little bit like you.
Henrik Göthberg:The tools use an llLM. It highlights the right ingredients, but it doesn't differentiate. You know what? The autonomy?
Luka Crnkovic-Friis:dimension is it has to be an iterative process In a notes.
Henrik Göthberg:I, I, you know, I jotted down some core characteristics so it's a little bit like my definition. You know, a perfect definition would have something that covers these core characteristics and then I proposed autonomy. Adaptability for me is feedback loops or perception that in some ways orients itself in a context, goal, orientation, interactivity, context goal, orientation, interactivity. So the tricky point here and this was my and Anders' argument that it might be more useful Anders' argument to make it very narrow, simplistic, agentic definition.
Anders Arpteg:But I think the point that I was trying to make, that's the old agent definition. It's not potentially the new buzzword agent.
Henrik Göthberg:Yes, and this is where we got into a fucking almost fight. Not really, but almost but, because my argument is that we need to reclaim agentic into agent, because I think it's fucking dangerous to talk about agentic as control without thinking about the consequences of observability I mean. So building something which has no feedback loops to me is kind of scary, so I don't know.
Luka Crnkovic-Friis:I just want to put some fire on the it does have. I mean you can build in feedback loops like coding, the typical example where you have tests, you have output and so on. There is an observability and I it can be at least.
Henrik Göthberg:So this is the point I said. If it doesn't have that, you shouldn't call it agentic. You should maybe call it RPA or cognitive RPA. If you truly want to call it a Gentic, it should have the agent characteristics. And Anders was a little bit like holding back.
Luka Crnkovic-Friis:He wasn't wearing your camp. Yeah, I don't agree with you.
Henrik Göthberg:No, please elaborate.
Luka Crnkovic-Friis:I thought this was so funny. Example from yesterday right, I was leading a workshop for a pilot project, like 50 developers, and I was just showing best practices and tools and so on and I had my party trick was adding a new feature to Candy Crush. Now that's a code base that's 26 gigabytes large. It's almost 15 years old. It's C++. It's like very, very daunting. It doesn't fit in context of anyone it does not fit in context.
Luka Crnkovic-Friis:No, and I wanted to add a feature like tap to activate. When you tap on a booster, it sort of activates. Now this would be like had we had a novice developer doing it, or even a real team. This is like weeks, months worth. I've tried this before. It works one in 10 times. Of course.
Luka Crnkovic-Friis:This time I wanted to demonstrate, like when it fails, what do you do, and so on. Of course it's worked this time. So then another like ask somebody for okay and different feature. Now somebody said like okay, when we match four candies, spawn a frog, all right, and off it goes. Changes files there's hundreds of different files, typically and five minutes later I can build it, run it. It starts and I go okay, shit, this is impressive. I match four candies and I get a green fish, and then I start looking for the reasoning logs of the model and it says like ah, this is frog, that's complicated, that's in this set, but I can use a fish. It's green, it's aquatic and it's actually a much better booster than a frog. He was thinking. And that was no, there was no observability of its own actions really there. But I would call that really agentic, agentic yeah, yeah, I agree.
Vilhelm von Ehrenheim:I don't think it necessarily has to be tied to observability. I would say a system that actually takes decisions and can dynamically work towards summarizing this report, for example, and then give you the report back. It would definitely be agentic without necessarily Okay.
Henrik Göthberg:so this is good because then we can narrow. It seems like if we want to, you know, in 20 minutes now converge on something. We are kind of going for more, more strict view on this Four minutes or five minutes.
Anders Arpteg:I have the timing here, it's fine.
Henrik Göthberg:Yeah, and because then I like that? Because then maybe what I'm talking about when I'm trying to push that into the definition is very ambitious, and I should rather push that into the conversation of challenges and risks and how we build systems and you know, not to up the definition.
Luka Crnkovic-Friis:Not having observability is a bad idea.
Henrik Göthberg:Yeah, totally right, it's super important, yeah, and I thought it was such a bad idea that I wanted to push it into the definition. But you say hold, hold your horses.
Martin Lundqvist:So be sharp. So here's the layman's attempt um agents they have. They may have access to tools. Now, depending on you, wire them together. That might be agentic or not.
Henrik Göthberg:Ooh.
Anders Arpteg:Good. If you manually hard code it, it probably would not right Exactly. I think the choice of action is really key in some sense and Anders always wants to get to the.
Henrik Göthberg:you know what's the definition in one word or in one sentence Intelligence. Okay, but you had an interesting definition. You tested on me earlier.
Anders Arpteg:I mean it's the same as Jesper said before. I think the two one either the most simplistic one or lightweight version is the ability to choose action. The one, that anthropic, ecclesiastical, which you said, jesper as well, is the ability to choose when to stop. It's more advanced. I think that's a more advanced agentic system. If you have that kind of capability, I think you could potentially say a system is agentic even if it doesn't have to say when it stops, but a certainly more advanced agentic system would also continue until it chooses to stop.
Henrik Göthberg:And explain to me now in layman's terms what is the implication of saying to decide when to stop?
Anders Arpteg:If you take the deep research that basically all of the big frontier AI systems have, it basically will continue to do research, do web search and then continue to do that iteratively until it comes to a point when it says I now am satisfied and I'm going to publish my results.
Jesper Fredriksson:Then it chooses to stop Exactly. It's kind of hard to know when is it good enough? That's not easy, no.
Vilhelm von Ehrenheim:Yeah, but I think that kind of boils down to it having some kind of goal or a task that it needs to complete. It needs to do a lot of different things in order to get to that kind of state of completion and then analyze whether or not it was actually successful.
Henrik Göthberg:So that definition implies goal orientation. I think so. I think so right. Can you decide not to stop if you don't have an objective function?
Vilhelm von Ehrenheim:Well, you could potentially give up as well.
Martin Lundqvist:You can give up. Yeah, you set the recursion limit.
Henrik Göthberg:Alright, so how do we? You know, because I think we reached some kind of nice consensus. I think, Okay, so now now the second part of that question, and now we have two, three minutes on that. How is this, uh, fundamentally distinct from traditional regenerative ai or chat bots or whatever? What is the distinct difference that we're discussing now to what we've done in the past?
Luka Crnkovic-Friis:the models are smarter I mean that that's the whole thing. Like try, all of the stuff that we're doing now was done minutes, not minutes. But a week after GPT-4 was released we had AutoGPT that essentially did all of this thing, but it didn't work because the model was too dumb.
Anders Arpteg:You could also phrase it like stupider models are able to complete tasks, more advanced tasks, by being agentic right, by taking these kind of iterative steps. You can have a smaller model that can achieve more complicated tasks I want to push it further, would you?
Luka Crnkovic-Friis:agree. Yes, but I think, like for it, it's more like the agentic, the modern type of agentic stuff, started to work when the models got smart enough to be able to pursue a more long-term trajectory.
Henrik Göthberg:But how has our objective as coders or developers or consultants or product developers changed? When we say now we're adding agentic compared to what we did before, how would you frame that?
Vilhelm von Ehrenheim:I mean like usually I think we go back to what jasper said before then you start kind of adding more action-based things into uh yeah, into the problem, so like if you have been developing a chatbot and then now you suddenly want, like maybe it's a support agent, for example, now it suddenly can help you reset your password and it can kind of uh help, uh yeah, it can do, we can email somebody, you know that's that suddenly becomes much more useful and in the sense of um, actually automating things and being more similar to a co-worker, that is super important and I think that changes how you can develop products. It.
Henrik Göthberg:It changes how you can kind of attack different problems that you see in an organization and what is then distinctly different, what we're talking when we used to work at Gentic versus something was called RPA or cognitive RPA for the last five years. What is the difference?
Anders Arpteg:Manual programming.
Henrik Göthberg:Right, it's the way it's done right. Business rules manual programming.
Martin Lundqvist:Yeah, somebody had to train an rpa system and the rpa is then deterministically business rule, even to the point where it's yeah, and if you change one of the softwares, rpa is dead. You have to go back and yeah so so.
Henrik Göthberg:So the distinction then is a little bit like how it works.
Vilhelm von Ehrenheim:it's the same, as I said in my intro, with a, with a automated tests, right like they're very hard coded and scripted. It was the same with RPA it was very hard coded and scripted. There's not really any decisions being done there.
Anders Arpteg:The RPA doesn't choose the action. No exactly.
Vilhelm von Ehrenheim:You just have like do this thing, then wait 300 milliseconds, then do this thing. But if something changes in between there, that it was a little bit unexpected, then it doesn't work.
Anders Arpteg:All right, I think that's enough enough on definitions, but it's an important one.
Henrik Göthberg:Get it to 20 minutes. Yeah, 10-20 minutes. So now we take topic two and I kind of want to keep this. I'll try out 10 minutes, but I give you 20 minutes. 10, 15, 20, we will have sort of are we done in 10? Do we need 20,? We will have sort of are we done in 10?, do we need 15?, do we need 20? And the topic I frame now is practical implications. So why again take AI matters now? So it's a little bit like we can see a trajectory, but I want to really get the discussion on why we really need to care about this or do this now. So what is the immediate practical impacts, benefits, risks for teams and organizations, what are the concrete changes? Why now? And what is the benefit? Why are we going for this now?
Jesper Fredriksson:I'm thinking that the models have gotten smart enough to do it. Now, at least we're very close to it. Some of us have maybe access to smarter models very close to it. Some of us have maybe access to smarter models. But there's always every time you let the AI make a decision, there's a certain risk of error in that decision, and as long as it's on the order of 90% times correct, then anything that's a longer term thing will break down. But now we're getting to a level where at least simpler tasks, maybe more advanced tasks, can be put through many steps without failing.
Luka Crnkovic-Friis:And they're intelligent enough to backtrack when they do errors. I think the reasoning models that really sort of pushed it over the edge of usefulness.
Henrik Göthberg:The way I interpret what you said now, and I fully agree with it. It's a little bit simplistic saying well, you know what, that's where the productivity frontier is at. If we are old school companies that haven't figured this out and we're working in the dark ages, it's a little bit Amish, right. If you want to go from a, this is a little bit like there's a horse and now there's cars, and if we want to discuss where we are in the productivity Frontier, it's actually now where the models are smart enough that this is feasible and it's happening now, and then in the next year we are in a horse versus car moment, maybe and I I agree with that, but I still think there are a lot of companies that you're saying that are living in the dark ages, that haven't even tried drag, that don't know the, the sort of limitations yes, but they need to.
Jesper Fredriksson:They need to start with that. I don't think you should rush into agentic.
Henrik Göthberg:Uh now but they need to wake up. It's a little bit like we are. We are little bit like we are driving our horses around and there are how many people, how many cars in the street, how many years until New York was shifted from horses to cars the other way around.
Luka Crnkovic-Friis:Also about RAG is that those that were slow actually did themselves a favor. Yeah, yeah, they skipped something that was perhaps a dead end. Yeah, for sure.
Henrik Göthberg:But it's not the point right the productivity frontier, what is doable with the compute and with the LLMs and everything like now has opened up the automobile and it's up to you to get your head around that in order to think about when you're going to switch.
Luka Crnkovic-Friis:Am I taking it too far? It's still early stage. It's early stage and the timescale is super compressed, so what is important now is to start thinking about it really seriously how this affects ways of working. How do we deal with?
Anders Arpteg:these things. I think also, it's early days, as you say, and I think a lot of people and companies underestimate the complexity of actually taking actions. Just to give a concrete example here, let's take something like parsing an invoice. We have really good perception capabilities with LLMs today. They can look at the image or the text or the content of an invoice and they can easily understand and perceive that better than humans easily. But then you want to take an action on it and you want to potentially put that information that you extracted from a PDF into some other EPR system or whatnot, or ERP system. But then, okay, how do you do that? Let's say that you don't have a well-working API. Or let's say you even do have an API. How do you get that kind of action being taken?
Luka Crnkovic-Friis:You use one of the computer use models, but that's not working. Yes, but you know why it's not working. Because our images and LLMs are not working, because we are clip compressing the hell out of them, so they're losing all their spatial data.
Anders Arpteg:They need to reason about the DOM instead, but the point is they are super, super much into prototype stage yet they're not anywhere near, I would say, as mature as the perception part of LLMs are. So actually doing that, either you manually have to code, which I think you still do, how to call the API and to put that kind of paying that invoice, or not, into the RPE system. You have to manually do that. You must have to take the action today.
Luka Crnkovic-Friis:It's like for many types of systems we're using now internally to automate some of the workday and other software. That's really really poor interfaces and for basic things like booking a vacation or something like that, or yeah, and it's like, yes, you have to verify and you, as a human, you typically, if it's something going to production, you do have to supervise it a couple of times when it does it and modify the prompt and sort of highlight guide it yeah I don't necessarily agree that like that.
Vilhelm von Ehrenheim:It's a too hard of a problem like it. I think we have a lot of different uh use cases where we we test systems.
Anders Arpteg:That is about kind of configuring invoices, sending them and all of these things already created the api right and build the tool for it haven't.
Vilhelm von Ehrenheim:Yeah, we have the tools for interacting with the browser. We're not creating tools for sending invoices. Yeah, in the system, that's the trick. So what? We're? Yeah, exactly, and we're. We are using multimodal, where we're both looking at the screenshots of the interface together with, uh, kind of condensed representations of the dom and those things in order to make it like smarter and better. But. But I think that I think six months ago or even shorter, these models couldn't figure out interfaces at all. If you asked it if it's in dark mode or light mode, it would just say oh, this is in dark mode because the background is black, even though it was a completely white picture, because they were not trained on those kind of information. Rich interfaces, to your point, with the clipping codings as well yeah, what do you say?
Anders Arpteg:it's mature enough that you would put in production but it's so much better now.
Vilhelm von Ehrenheim:That's what. That's what I'm saying it's so much better now it works. It actually works now, which it didn't do like it does. It does if you structure the problem in a in a in a correct manner in In a computer use sense using keyboard and mouse kind of interface.
Anders Arpteg:Yes, yeah. I would put a system in production paying my invoices without human intervention or supervision at all.
Vilhelm von Ehrenheim:That's more about trust.
Martin Lundqvist:I think yes, because if you're not, mature enough, we'll get there in the agenda, point four. But, don't trust, but as it relates to computer use, I think you have the path of either going the API route or you go navigation route, and I think the API route yes, I mean, of course the big incumbents are going to take that route, but for me there are so many ERP systems, hmrs systems. I think that the DOM route is actually. I've seen a couple. I'm working with one team now. It looks really promising.
Henrik Göthberg:What is DOM route versus APM?
Martin Lundqvist:You don't reason about the image. You reason about the actual site.
Henrik Göthberg:Or both the site structure.
Martin Lundqvist:Or you can do both. Yeah, exactly. But then imagine, because every screen typically has like there's one action you're going to take and there's one submit button.
Martin Lundqvist:That's how clever you need to be, Exactly so that, To answer the question, why does it matter? Now I think we came in from a slightly different angle. Number one is everybody was talking about the death of sauce. Yes, I'm sure there's a podcast or two episodes about that. But, on a serious note, we see a ton of startups in our field coming out building things much faster, et cetera, et cetera, and they're claiming to be agent first or AI first.
Martin Lundqvist:That's not a definition we need to get into right now, but of course, we felt a little bit of a pressure. We need to kind of keep modern right. Just be very honest, we need to keep modern right. That's number one. Number two was our customers are still struggling with just retrieving information. So the average engineer on a site spends two to three hours a day just wrangling information. So that's still a problem to be solved and they have various different systems et cetera. And number three is that dashboarding your way through this is not going to solve the problem persistently. That's just not the way. So I felt almost like, given that these tools were becoming more and more useful and now they're becoming very useful still early days it feels like we have to do this. We have to look at how we can use.
Anders Arpteg:No one questioned the value of it. I think everyone wants to have these kind of autonomous systems in place. I would still argue that we have a lot of companies using LLMs today to perceive information and be able to query that. I would question if there is even a single one that have a fully productionized autonomous system, for example, working with payments automatically. Do you know of a single one that has a fully productionized autonomous system, for example, working with payments automatically? Do you know of a single one doing that, like in a keyboard mouse kind of?
Luka Crnkovic-Friis:way right. Not the API base, yes, but very few Okay, but still it's extremely few right.
Luka Crnkovic-Friis:Yeah, there is a Microsoft department that works specifically on the type of computer use and they're oriented towards enterprise companies and in particular this is kind of billing, Like it's not a hands-off situation and they have an enterprise customer with it in production. But you can think of it as a combination of computer use and rule-based. It's not rule-based because it's in prompts but you don't have total flexibility. It doesn't go in the blank context looking at the application. So it does have guardrails and things like that. It does have guardrails and things like that, but it's still.
Anders Arpteg:Compared to how many people that use LLMs for the perception purpose or knowledge management compared to actually automating it. It's extreme difference, right? Sorry, I'm laughing because there is like in OpenAI.
Luka Crnkovic-Friis:There's a big thing that I call the perception problem. Llms are really bad at perception. 've got the perception problem like llms are really bad at perception.
Vilhelm von Ehrenheim:It's like they create the perception they're referring to something different, but I think they're they're. Maybe these kind of examples that you give are are like the, the kind of most scary things to put something into production for, like just paying things uh, randomly, uh, whereas you have a lot of different automation tools that come out today. That is more about information extraction and searching on a lot of different websites and extracting that into an Excel sheet or something which is also agentic and works from this more.
Martin Lundqvist:People are also fairly creative. I know for a fact that I'm not going to point out my finance department in this podcast. I guess I did. But of course you can use agents to create your set sheet and then you upload the Excel sheet to the ERP system, right, so you can find shortcuts using this.
Anders Arpteg:That I think a lot of people can do. This is happening, but this is a task and maybe not fully, again dictated.
Henrik Göthberg:Still it's on the right path. It's on the right path. It's on the right path.
Martin Lundqvist:Because once you start trusting that Excel sheet, enough. You're going to ask someone why don't you just automate that last step? For me too, right.
Anders Arpteg:That's not so easy. I think it's a big hurdle.
Martin Lundqvist:Excel sheet.
Luka Crnkovic-Friis:Then it is to actually integrate with another system. But the thing is computer use. It will be sold within six months with a very smart way, or in 12 months in a stupid way. Can we hold it to?
Anders Arpteg:that. Yes, that's like you promised to buy me a dinner. It's not sold in 12 months.
Luka Crnkovic-Friis:What's the?
Vilhelm von Ehrenheim:definition. When you're saying that as a problem, like, is it then completely open ended, like you can do pretty much anything? Yes, yeah, no way.
Luka Crnkovic-Friis:Then I would say it's pretty, pretty steep as well for a for computer use, like like, what are you doing now, like the the best are doing, like on the web benchmarks like 50, 60 percent? Yeah, I, I would say that uh, it's like self-driving cars.
Anders Arpteg:You know it's easy to get like 90 correct, but the most important is the the edge cases.
Vilhelm von Ehrenheim:That I agree but what I think is super interesting in a lot of these use cases, like we're thinking about it from this very open-ended approach, like you should just go into this page. You have no context of it whatsoever and you should do something like a goal, which is generally not the way that you phrase a problem, if you have like it from an automation perspective.
Vilhelm von Ehrenheim:The only reason for us being able to do this is from a automation perspective. Yeah, this is similar to what we. The only reason for us being able to do this is from a query perspective is that we can look at the page and analyze it in detail and then kind of execute the similar things over and over and learn from those things right. And I think that's the same. If you try to automate some specific part of a, of a of an invoicing system, that should definitely be possible to do today.
Luka Crnkovic-Friis:I think, and it builds in some robustness compared to traditional RPA system. If a button moves or if they change rounds it will be able to figure out you can still have the scaffolding that generally guides it.
Martin Lundqvist:I think it will layer on. Like you said, that in the beginning we're going to see, for example, mid-sized ERP players who are struggling to start using these to fast track and work around the API integration problem. I think that's what we're going to see. It's going to be narrow in the beginning. It's super valuable, could be onboarding right, so you onboard a new customer. Why wouldn't you try this out? It could be kind of harmless. It's just hard.
Anders Arpteg:But it's super valuable.
Henrik Göthberg:Yes, but let me do a mid-summary of why we need a genetic AI and why it matters now. So in one hand side I get around the table that we kind of all believe in the trajectory. It's super clear that the research and the money spent in this direction is taking us fast and, if Luca is right, we have solved some complex problems early. It doesn't matter when I'll qualify that, but it's a little bit like we all believe in dude. Systems are getting more gigantic, whether you like it or not. So when do we now need to? Why do we need to care about it now? And then, if I flip in Anders' argument to this, there's two ways of looking at it. In my opinion, what I listen to, either you know what, even if you understand it's going this direction, you kind of need to start now in order to understand the complexity of getting it right. This is one argument. Another argument could be you know what? We need to wait. We should sit in the boat and wait and be cool, not to do the rag mistake once again.
Luka Crnkovic-Friis:So I think there's also like one thing that we haven't really discussed, and it's that it's slow and it costs a lot, like it's not the best solution for everything, like, for instance, rag might be a good solution. Imagine you're doing your Google search type of thing.
Henrik Göthberg:You want results directly.
Luka Crnkovic-Friis:You don't want to go off and wait 15 minutes for it. And on cost, I mean Cloud Code that I mentioned. I looked like I said one weekend when I was weekend, when I was working four hours on a project that costs $350. And these systems they get better by using more and more compute and that makes it not suitable for every type of problem, for every type of the Can we?
Vilhelm von Ehrenheim:Compare setting like a human on that task instead. That's so much more costly. That's right, that's right.
Martin Lundqvist:Yeah, I think we need to kind of set up expectations, and the way we talk about enabling human language interfaces to interact with your site and your site data is think of it as your next employee. So are you expecting someone to do that presentation in two minutes? No mean, if you, if expectation is that this typically takes a week, then you can go and have a coffee. But you're also right that there is this sort of you know, if something takes more than 300 milliseconds, you start getting nervous. So there's a gap between these expectations. But I agree with how much. I mean the benchmark is asking someone and you know, another human being do it but also like the the cost thing.
Luka Crnkovic-Friis:Like, especially when you start looking at larger organizations. Say, king has 2000 employees they're not developers, but let's assume that they are and something costs in compute, say, a thousand dollars for the agentics coding system per month per employee. That's only $2 million per month. That's a real cost that you have to find real budget and so on. And up until the agentic systems most of the LLM stuff has been, as long as it's internal facing and not sort of customer facing to millions of people, it's been cheap. But now we're in it suddenly. Oh, this is actually, yeah, still cheaper than a human, but this is not actually replacing a human. This is augmenting.
Martin Lundqvist:And that's another problem. I think it's very difficult to actually monetize on productivity benefits. So each developer will be more productive, absolutely, but how does that actually translate to cash? That's not straightforward, and that's the case we need to make, I think.
Vilhelm von Ehrenheim:There's definitely those kind of cases where it makes sense.
Vilhelm von Ehrenheim:It might need to be a little bit more constrained from a problem perspective, but I think when you look at it from the QA side of things, it's super costly and time-consuming to have manual QA.
Vilhelm von Ehrenheim:If you want to run an entire test suite of your entire page with all of the different details and stuff, you need to send it over to for manual QA and it takes maybe at least 24 hours before you get something back and then we can instead run it on every deploy in 10 to 15 minutes and at a fraction of the cost. So think so. You have these different kinds of like segments where it makes all the sense right and I think the problem where you'd spend a lot of kind of compute on on coding interesting things and get more productive. But if you don't have that, if you have that tied well to your roi, then it makes sense like if you, if you build more things, that you will earn more money for it, then that productivity is kind of well worth to take the next, it's more of a definition problem what you're actually spending time on right, yeah, but that does that mean that the conclusion could be that look the companies that start today.
Martin Lundqvist:They will hire way, way slower and surpass the, the people, the companies that have hundreds and thousands of people, hundreds or thousands, not hundreds and thousands, but hundreds of those people. That's what I kind of wonder. Hundreds or thousands, not hundreds and thousands, but hundreds of thousands of people. That's what I kind of wonder, if that's going to happen. So we'll have software companies operating at the scale of huge companies, but with 15 people.
Luka Crnkovic-Friis:Yeah, I mean we see it now on the sort of small scale where individuals can produce something as a midsize company, one man unicorn. But there is, I think, generally across the board. There has to be, especially for large companies, a change of ways of working, like fundamental change, because I think a lot of the delays if I look at King, if I look at Microsoft and this is, I think, generally true is that we're slow because there are so many people involved in each process. So when we want to do really feature A, from ideation to production, there are so many meetings.
Luka Crnkovic-Friis:And then somebody's off on vacation and sort of it drags out and there are so many interdependencies.
Henrik Göthberg:But let's come back to this, luca, because that's going to be around the organizational challenges and risk and all that, and this is really my pet peeve around this topic how to define a Gentic, where almost it is a technical definition. And then there is something else going on here in order to make this work. But if we summarize now, try again something about the trajectory that this is happening, willem made it good in. You know why does it matter? Now I, like I can go completely into your ideas and that, well, you know what we think this is going to be the trajectory and you need to care now in order to start sorting out which of your more dumb or simple processes are really suitable for it now and which one is not, is a little bit more scary. We do it later, so it's not maybe that. Why should we care about again AI now? Is that we're going to do everything with it, but we need to really start understanding it, and then we need to start understanding it in the right pockets, to learn and to scale it out.
Anders Arpteg:I still think you know people underestimate the complexity of actually taking actions still. So I think it's more that we should start planning for it, but I think it will be more important to do it in the future.
Henrik Göthberg:Yeah, but does it matter now? Are you in the camp that says hold on a little bit and we can watch it and we go hard into it later, or do you think you need to start engaging on it in order to learn enough?
Anders Arpteg:later, or do you think you need to start engaging on it in order to learn enough, start planning for it but not really Awaiting for it to become more mature before you invest too much in it? I like that.
Martin Lundqvist:Yeah, I wonder if the whole space is mature enough to have a strong top-down opinion about it yet, because it feels to me like there's so much new capabilities coming out at such a high speed that just observing the team experimenting with new, new tools, new ways of doing things, finding new productivity pockets, it's wonderful to see. But if somebody asked me what's my opinion on how I should structure this, I would be drawing blanks. I'm like right now I love the fact that these tools are emerging I think you you nailed it together.
Henrik Göthberg:If I, if I put your argument together with lucas argument, why does it matter now? Because we need to understand there is a trajectory here. We need to start planning for it. But if we are organized in the wrong fucking way, we have other blind spots and blockers to why this will work. But if we need to start experimenting in order to figure your thing out.
Luka Crnkovic-Friis:They're also unknown.
Henrik Göthberg:You know, I have some theories, but it's like so if you haven't even experimented and tried it, how can you know anything about the unknown? It's a blind spot. So if I go with the trajectory and then I'm really equally cynical as Anders we need to really it's not mature. And then I go into this is a blind spot. That trifecta means you better need to start experimenting on it now to plan for it to do the hardcore changes in organization that's what I'm getting to.
Luka Crnkovic-Friis:I think there are some good prototypes for that. If you look at some AI labs like OpenAI and also Anthropic, if you look at like, how are they using this internally Exactly? There are a bunch of interesting lessons there. And they are on the next level, like they've gone really full, all in on it, but they also this is so new and these capabilities are so.
Anders Arpteg:Many companies are so far behind. I mean they just need to get to working with LLMs at all to start with. In many cases.
Vilhelm von Ehrenheim:I think a lot of companies are behind on automation in general. Yeah, in general.
Henrik Göthberg:Why does it matter now to start experimenting, to start learning, to start figuring out the blind spots? Isn't that a fair why now?
Martin Lundqvist:I could buy an argument that says if you don't start experimenting now, you'll have a big problem later. I mean I could buy that argument.
Luka Crnkovic-Friis:Plus, I mean it is actually useful within, and this is where I don't fully agree with Ander.
Henrik Göthberg:I think there there are domains, constrained domains like coding is one of them where they actually perform really well already today like it's useful so find the pockets, learn in the pockets in order to learn the bigger game of where we need to change organizational mandates and how we make decisions, and that's why we need to start. This is at the same time now with the accelerating productivity frontier, or innovation frontier. Kurzweil, the longest, don't even start. Can you catch up?
Luka Crnkovic-Friis:I don't think it's a crisis. I mean, there was like a really marked development. So at Peltarion time, Peltarion was part of a forum where I think Volvo was part of it and a bunch of it was ABB, Saab, IKEA, sort of chief AI officers, CIOs there, and it was always like Spotify was also there and yeah, we had Peltarion.
Luka Crnkovic-Friis:It was always like we and Spotify were like going giving lessons on. This is how you think about data, this is how you do machinery. This was we showing them. There was this massive gap. Then, when Generative AI came around, that gap just disappeared. You had these old companies doing exactly the same things that we were doing, so there's already been a great equalization there and democratization.
Anders Arpteg:Should we leave this topic. Yeah, the topic now is in 25 minutes.
Henrik Göthberg:Yeah, so summarize the topics, and then I start and I set up the last topic. Okay, last, you summarize, no, no, no, you summarize it.
Anders Arpteg:I think it's awesome. I think we all agree on the value. It would be a substantial contribution to the value of AI systems if we get agentic workflows to work, and I also agree that some cases, like you said, william, and I agree very much with you, luke, as well that, for example, coding where human is still in the loop, but you can have agentic that actually do a lot of number of steps that is later validated and reviewed by humans, then perfect. You should go there today, no problem Doing fully automated. I would be very careful, today at least, but for some use cases it actually is working rather well, but it's still more complicated, I would say, than some people think and we need to experiment in order to learn, in order to look at the bigger picture, what needs to change in how we organize and decide stuff.
Henrik Göthberg:All right, moving on Stretch, you know stretch Topics three and four how do we design single-agentic frame systems and then, moving that trajectory, that's in 20 minutes, and another 10, 20 minutes how to make that multi-agent platform.
Luka Crnkovic-Friis:That's a false dichotomy. You just use the same agents with an LLM. That's smart enough.
Henrik Göthberg:All right. So where do we start there? I saw that.
Henrik Göthberg:But, let's start decomposing. How do we go into that topic? Even you know, designing agentic systems, building agentic systems. I have one angle of starting point which is a typical enterprise starting point. Should we buy from a vendor of some kind? Should we buy a single tool application vendor or should we build something our own? Build an open source frameworks or something like that? That's one way of going into this topic. How do I design or build Iantic systems? Do I buy my shit out of the way? On what level do I buy shit?
Jesper Fredriksson:What do I do? I think definitely start with experimenting. As we were saying in the previous topic, if don't experiment with it, you will not know what to buy and you, you will not be a good buyer, which I think.
Henrik Göthberg:I think that's, that's okay, so definitely, I love that. So don't start with a make or buy decision. Start with something smaller and start with with experimenting on something.
Vilhelm von Ehrenheim:Okay, so now we go to experimentation start with collecting data on what's available like and understand your problem and probably start collecting data on what good looks like so you can know and evaluate things. Yes, I don't think just hacking away using LLM calls is going to give you get you anywhere, unless you know what's good.
Henrik Göthberg:Who wants to drive this?
Vilhelm von Ehrenheim:because you're, you're the guy, please. I'm like.
Henrik Göthberg:I love what you said now.
Anders Arpteg:So so someone wants to build a design Just continue what Lukas said there you could use a single LLM to do most tasks, and I agree to one extent, which is if you have infinite money. So then you obviously go for the biggest model, do everything. But in some cases if you want to limit the costs a bit, going a bit to what Lovable is doing with their kind of agentic system to build up the code base, they start with a big model, to start with the 4.0, to do the planning. It's basically setting up. Okay, these are the steps I need to do to carry out the action that I've been assigned to do. But then they use a smaller model. I think they use the Sonnet 3.5. It's much smaller, much cheaper. They do the small incremental steps in changing the code. So in that sense perhaps there is a point in trying to find a set of different components. What do you mean?
Luka Crnkovic-Friis:Yeah, I mean absolutely. You have the smart model and then you give the cheap model or the big context model as tools to the smart model.
Martin Lundqvist:Yes, I fully agree. In building out this sort of virtual engineer, the big model is used for the reasoning, which is translating the question that the operator is asking to steps, and then we may, depending on the question, have an evaluator agent that says is this a reasonable plan in relation to the context of the types of information that is available? Once those agree that, yes, this is a reasonable plan, then yeah, the agents are more stupid Different temperature settings, smaller models because they have some very, very specific tasks.
Luka Crnkovic-Friis:But also what I've seen now with, like if you take one of the smarter, like that's really good, that tool like Sonnet 3.7. Like it can like if you tell it here's a function call with a much cheaper LLM and here's one with big context. You just give it that information and it can figure out when to use it.
Martin Lundqvist:So you don't have to predefine the agents yourself so that you don't have to predefine the agents yourself. I think the one thing that, as part of this sort of building up, maybe you have a good interest in what the other guys say. So when you have a system of different agents and tools, what I find very challenging sometimes is having some control over the information that is passed between the agents in such a way that if an agent three steps down the road is going to make an API call, what is the information that is actually going to be provided downstream? Because you don't want to have the full context all the time, right, you want to have selective context down the pipe. So how do we make sure that the right information is available to? The thing we know at the end of the day is a deterministic API call, but it has been structured by maybe two or three agents reasoning for a while.
Vilhelm von Ehrenheim:That affects pretty interesting yeah, we used our super complex structure for this like uh, uh some time back, but we found that we had a lot of problems with this kind of a.
Henrik Göthberg:We call it the whisper game you know, yeah, when you kind of keep whispering to the next exactly thing and then like after a while it's not doing the same thing it's supposed to do in the start.
Vilhelm von Ehrenheim:So I think, kind of I agree that you can structure it in a lot of different complicated ways, but I think at this stage, especially when models get better and you can get away with like a slightly smaller model and still get a really good performance, I think you gain a lot from not overcomplicating it, keeping it pretty simple, making sure that you give it really good and valuable context so that it can take good decisions. That's definitely.
Henrik Göthberg:But can we try to be a little bit more practical? I mean, like we can even use, we can use the QA tech example. But I was just preparing, you know a little bit like all right, so let's now build a single agentic systems, and I was sort of pre-preparing. You know what are some fundamental technical considerations. Like you know the agentic workflow. You know one big LLM. What about the planning and structured tool use? What about data handling and integration? What about feedback loops, memory, persistence mechanisms, stuff like this? What about UX? So you know, use, fundamentally, I'm going to go about, I'm going to do some cool shit in, you know Button and Follows, garnier, volvo, whatever and I'm going to build an agent and those are some fundamental dimensions. So how do we think about that? You started with some sort of kind of know what you're trying to solve as a problem.
Vilhelm von Ehrenheim:Yeah, I mean the problem itself is pretty well defined. Then again, like there's a lot of different, each customer has a new page and new problems and new things that they want to try.
Henrik Göthberg:so it's definitely quite complicated, but um, but if you look at the, I try to featurize the different parts of what you need to build. Yeah, I think you do that, or how would you do it?
Vilhelm von Ehrenheim:yeah, I mean I think, going back to the context that we talked about before, like I think it makes a lot of sense to put a lot of effort into condensing and making a high quality context for the model to take good decisions on, and then, of course, also working with and structuring the tools that the agent has to work with in a way so that it can take good decisions.
Henrik Göthberg:Could you elaborate?
Vilhelm von Ehrenheim:If you want the agent to be able to do different things, not just only calling one tool to use a calculator or something, then the more tools you add, the more complex the decision is obviously. So you can also be smart in that.
Vilhelm von Ehrenheim:So not having 100 different, over over complicated tools with poor descriptions on what they actually do, then the agent is not going to do good decisions. So you have to be very mindful about like which tools you provide and how they are kind of explained to the model so that it's kind of relates to the context that it has to work with in order to take the decisions.
Luka Crnkovic-Friis:But it comes down to that, like beyond, that if you have a smart model, you don't need to do much more because it can figure out the process. That's the promise.
Luka Crnkovic-Friis:Yeah, I would like anybody who is technically inclined to look at the source code of a cloud code when you say this super simple, elegant way. I mean, obviously they're masters at prompt crafting, so the tools and the sort of system prompts, but they're simple and from this sort of just a few basic tools, I think the core system is like seven tools or something like that.
Henrik Göthberg:Could you remember which ones they are?
Luka Crnkovic-Friis:Exemplifying core system with like seven tools or something like that what could you remember which sort of which ones they are uh exemplifying?
Martin Lundqvist:edit file.
Luka Crnkovic-Friis:Create new file run bash command uh create a new agent, or create us in the back, or a series of agents uh running in parallel, sort of those kind of uh very simplistic roles or whatever very simplistic, probably web search as well, right?
Jesper Fredriksson:uh, yeah, they've added web search as well, right.
Luka Crnkovic-Friis:Yeah, they've added web search and then you have MCP where you can plug in.
Henrik Göthberg:Oh, that's in its own chapter in this sub-section.
Luka Crnkovic-Friis:But just as an example, one of the party tricks that I use to them on that is I connect, I give it the Gmail tool for MCP to my private email and I tell it just create a family tree and a short bio for each person.
Anders Arpteg:And it does it.
Luka Crnkovic-Friis:Like it figures out my family, they maps it out and it comes up and this is like my old Peltarion email, which is my personal email now. So it contains like over a decade of crap data of everything you can imagine, and of course it has birthday congratulations and stuff like that and it manages to find it Impressive. And this is with just given the tools.
Jesper Fredriksson:I want to ask do you always find that it makes the right decision? I had a case when I was looking at doing analysis, so taking a product manager's request in this case it was sizing a loyalty card for customers, a loyalty card for customers and then, if you phrase it in one way, how many journeys should you take to be able to get, let's say, 100 sec or something? Or just say X and Y for the two different parameters? If I just say that, then it's going to take the cheap route and say, okay, there's a cost to each journey and we can get that back. We can gamble with 10% maybe, or something like that. But what?
Jesper Fredriksson:I find that the product manager is typically after some kind of customer lifetime value reasoning and I find that I need to prompt it quite a lot to get it to think in that way. And I think there's something around coding which is there's a lot of code to be able to train on. So I find that every time with coding I see that it works. But with other examples I don't find as much success. It's still like it's hard to understand the role of the data analyst, the role of a product manager and how they think.
Luka Crnkovic-Friis:For reliability. No, you get something different each run and not always the correct things, but that is also like looking at some other models that I have access to that are a generation later, and that improves marketly.
Anders Arpteg:He's 55, so good really.
Henrik Göthberg:That was not something you didn't make him slip there.
Vilhelm von Ehrenheim:But I think that's the promise with smarter models in general. When you think about prompt engineering in general, it's constructing these instructions to the model so that it understands well enough. With smaller and earlier and more stupid models, you need to be putting a lot of effort and time into making sure that you put the kind of commas in the right place in the text and stuff right, whereas that kind of goes away more and more the smarter the models become and eventually you probably don't have to be as specific because it has a good, good enough kind of world understanding and things in order to figure it out, even if it's like funky descriptions in the system, prompts and stuff.
Luka Crnkovic-Friis:And obviously I mean you get that sort of amplification. It's one thing when you're giving one statement to one LLM, like input, output. It's another thing when you sort of have a starting point, you give a prompt and then it uses the prompt to do something else.
Martin Lundqvist:Do something else do something else it gets amplified essentially and that's why I think, going back to going back to what we said before, you want to start with a pretty well-defined problem, so you know what good looks like, because then you can start massaging and putting the right prompts and engineering and the way you manage context downstream so that good enough, actually solves the problem that you're trying to do. So I think experiments should be sort of spiky, if that makes sense in terms of learning how to do this, and then, once you figure out what works for you, for your context, you can think about scaling it out.
Henrik Göthberg:But for me now, who doesn't understand shit of what we're talking about in reality, what is the environment? If I'm going to set this up and start trying to build genetic systems, where are we working? Are we working in some sort of framework based on langchain, or there are different approaches, or do you get started wherever? What do you need as your developer environment here?
Luka Crnkovic-Friis:This is where it gets complicated, because the models are faster than the application layer. What I'm using day to day is an internal equivalent of Cloud Code, which is like not at all done for the person. It's a coding thing, but since I can add MCPs functions to it and so on, I can get it into.
Henrik Göthberg:So you're working in a very hardcore coding editor.
Luka Crnkovic-Friis:It's like a command line.
Henrik Göthberg:A command line, you're working on a command line as a super expert.
Luka Crnkovic-Friis:It's not a command line. It runs in the terminal. It's a user interface in the terminal.
Henrik Göthberg:But for us normal, human, deadly ones, where do you go? What would you recommend? Sconia wants to get started with this now and they have a chat gpt enterprise license and now we're going to build a system in agents are we talking about doing it without any code at all?
Vilhelm von Ehrenheim:or are you talking about like I don't know, because the terminal was too too, too scary?
Henrik Göthberg:so the terminal was, you know? Okay, no, we can be. Let's, let's, let's do the spectrum.
Vilhelm von Ehrenheim:I think you should be a little bit familiar with code in order to play with these things. There's definitely tools coming and more and more tools coming out where you can kind of do interesting things, and I think if you just want to try adding a few MCP tools to a chat, you could use something like Cloud Desktop, which is great All right, so Cloud Desktop is a good start as an example, I think my gateway drug was actually OpenAI's assistance platform.
Henrik Göthberg:Yeah, the assistance platform is a gateway drug right.
Martin Lundqvist:For me it was at least right Now. I happen to be not a good coder at all, but I can know my way around coding so I can take my way, approaching the terminal solution at least. But no, I think assistance, assistance. And I've coached friends who are absolutely non-technical to build things like job searcher agents using the assistance.
Jesper Fredriksson:Assistance API is important.
Martin Lundqvist:Yes, yes, and what about you?
Jesper Fredriksson:Jesper.
Henrik Göthberg:Where did?
Jesper Fredriksson:you get started. I just got to Python. You got Python hardcore.
Luka Crnkovic-Friis:Yeah, there is like at King. If we're looking at the, by far the most popular is custom GPTs.
Jesper Fredriksson:Yeah.
Luka Crnkovic-Friis:Yeah, and that's also non-developers. I think we have over 2000 of them.
Henrik Göthberg:So custom GPTs out of the enterprise.
Luka Crnkovic-Friis:Yeah, exactly, and that's kind of I mean, it is the assistance API, just more of a user friendly, and you can't call it from an API.
Henrik Göthberg:Then adding to that a little bit now on the environment, we now mentioned MCP as an acronym that I think is quite important that we stop at and dissect a little bit. So in my opinion, we're talking about frameworks, and what are the standardization and frameworks that you really should pay attention to? That might be something that when you want to build gigantic systems, you kind of need to follow standards or you want to align with standards like MCP or those frameworks. So MCP, by anthropic, what is it? What are we talking about for the layman?
Vilhelm von Ehrenheim:I think the simplest way to think about it is, if you have some kind of AI system, then MCP enables you to kind of bring your own toolbox.
Henrik Göthberg:Okay, and what does MCP stand for? Model Context Protocol? Yeah, someone tried to explain it to me like oh you know, think about it as Bluetooth. You know, you want to connect to all these different tools for APIs. You want to have a standardization, interoperability approach.
Vilhelm von Ehrenheim:Yeah, but first of all, like it's still possible to do these things by just normal tool calling. Yeah, like, so you could. So for a lot of different kinds of applications, maybe MCP is not like the right tool for you. It's more that it's easy for your users to bring in their own tools. So Luca's example, with him connecting his Gmail to some other kind of chat system, is a great example, because there isn't really a pre-made tool in that chat system to read your Gmail, right. But then you can easily kind of structure a small tool and you just plug it in.
Henrik Göthberg:So how would we summarize the brilliance or importance of MCP the?
Luka Crnkovic-Friis:importance is like it's a standard.
Henrik Göthberg:It's a standard.
Luka Crnkovic-Friis:That's the big thing. And then I think also where it is important is that so really to use it you have to shift your mind from what an API is versus where you have the traditional computer connection, versus when you have computer LLM like model context protocol. So essentially the primary purpose of it is to provide context to an LLM and the thinking there is a bit different.
Henrik Göthberg:I think this is brilliant because we are talking about the protocol and we're talking about context protocol, and what does it allow you to do now, when you bring in?
Luka Crnkovic-Friis:context. That is not as easy to do in the normal API. So I can, if you take the Gmail example, do it in the normal api. So I'm making you know, like the g. If you take the gmail example, like the, the google gmail api is like horribly suited for um, just converting to mcp when you do it, like right, yeah, and in this case it just fills up your entire context, like immediately when you do a query for how many, how many emails and things like that. And it provides things that are relevant if you're trying to parse the document through classical means, but they're not relevant for an LLM. So there is a slight switch there and it doesn't just provide that interface, it also can provide resources so you can have files or something that you can bring, bring to the lm. And the third one is that it can provide prompts, meaning that it can give you, give guidance, like you should be doing, like you should use me, like this, and it's dynamic, so dynamic and self-discovery.
Henrik Göthberg:I think that is quite an important feature in this, that it's dynamic self-discovery, because it allows us to then agentically, haha, build our own systems and then all of a sudden put that in the marketplace and then dynamically, new interactions happen. Right, it's not?
Luka Crnkovic-Friis:And it's being sort of, in a way, it's like it's more of thinking like what would you provide to a human that needs to solve this task?
Henrik Göthberg:Yeah, okay, and then, and then Martin brought in the topic, you know what about the data angle? So we starting maybe that as a segue, you know what do we need to think differently in order to now, if we want to have this all and we want to have context and all this building again tech systems kind of maybe put some new strains on how we need to think about data. Or you that now want to build a good industrial system, I can imagine the data topic is quite important.
Martin Lundqvist:Yeah, I mean. I mean I, of course. I mean being, course, being just generally very interested in technology. I started experimenting with GPT from early on, when it became publicly available to the layman, am in where 99% of the data is time series, sensor values, vibration, accelerations, pressures, et cetera. But it was really once the notion of allowing the LLM to do what it's good at, which is creating more text. Ie reasoning, for example, that I understood that maybe we can ask the LLM to collect data for us and then we can do something more deterministic with that. Hence the idea Combining the best words. So then the question is how do I inform this master LLM on what data is available and what I should or could do with it? And here this is all about a different type of context, but it's also context. So how do I inform an LLM on what does the LLM need to know in order to create a plan that resolves a goal that I have in terms of pumps, compressors, time series, data maintenance manuals, inspection logs?
Henrik Göthberg:And give us an example of a goal or a plan that you need in this context.
Martin Lundqvist:Let's take something very simple. Let's take something very simple, which is one of our customers is a maritime company. They have vessels that are transporting cargo at sea, and so maybe I just want something very simple how many vessels do I have? Simple as that? Well, you might think so. So what does that query look like? What context should I have? Simple as that? Well, you might think so. So what does that query look like? What context should I give? So imagine now that you have tons of time series data sitting in a huge snowflake implementation, for example. What is the query I'm going to give to that in order to get the answer 25 back. You have a schema, right?
Jesper Fredriksson:Sorry, you have a schema for it. I hope.
Martin Lundqvist:Sorry, you have a schema for it? I hope, yeah. So I mean, in our case the answer was already there, because we've In a dashboard. We just asked someone there and they had the answer no. But I mean, in our case we decided for model orchestration purposes that we need to have a graph that sort of allows us to understand if I want this value out here, presented to the UI, that value needs to go through these computational steps, from this source data. So we already thought about using graphs to orchestrate that. So now we can enrich the graph with more information in such a way that an LLM can make sense of it, which basically means English, short and crisp sentences in English at the notes. So yeah, without putting context to the data which is LLM readable, llm perceivable, if you will, we can't go anywhere really, because it's just going to be a massive JSON file being loaded up into the context window.
Vilhelm von Ehrenheim:But this is also where now the next level, which is meta, is that the LLM can write code that can read that unstructured and pick out sort of I'm a little bit surprised that there hasn't been more kind of evolution in multimodal LLMs where the modalities are not just kind of image and text, where it could potentially be text and time series right which is like super interesting or times and graph there are some research in these areas but it's very kind of slow and haven't gone that far yet.
Vilhelm von Ehrenheim:But I think, especially for your use case, it sounds like a perfect fit.
Martin Lundqvist:Yeah, I managed to, not I didn't manage to do anything. It happened to be that one of the data scientists on my team stumbled upon GBT time, I think it was called, and it's actually reasonably so. It takes in time series and generates more time series.
Vilhelm von Ehrenheim:Nice Hallucinated time series. That's the best.
Martin Lundqvist:Oh, this is a data scientist, no but it actually turns out to be a not entirely useless forecasting thing to do. But no, I agree, time series is. But yeah, I mean, for us the answer is, at the right moment, have a Python script be spun up in the background that makes sense of the payload. The payload is going to be very different depending on what we're asking it to do, but that is the answer, absolutely, absolutely true.
Henrik Göthberg:Now we're almost segwaying with the data topic into this to making sense in larger context. You know the the further complications of building multi-agent platforms or all of a sudden now I mean like I it's almost like I've been predicting or joking careful now so we don't end up in the next excel hell again to kill you know when we are spinning up agents on our chat gTs or private GPTs and we end up in the same situation as we've done with Excel over the last 20 years.
Vilhelm von Ehrenheim:Excel is pretty successful, though, just to be clear.
Martin Lundqvist:We have to give the devil his due.
Henrik Göthberg:No, it's not successful. I cannot do that.
Luka Crnkovic-Friis:I must say that Excel I mean, it's like Excel.
Henrik Göthberg:But you kind of know what I mean from a data perspective and trying to have persistence and understanding and reuse and not having everything in fragment. Excel is beautiful as long as it can stay single point. It's not so beautiful when you need to have an enterprise scale definition.
Vilhelm von Ehrenheim:It's the wrong tool for the job.
Henrik Göthberg:It's the wrong tool for the job, right. So as long as you use Excel for what it's meant to do, and even on desktop, beautiful. But when Excel becomes your enterprise platform, I kind of think it's scary.
Luka Crnkovic-Friis:But when you do that mess and get this hell of this everybody's running their own little thing and so on then you just get another agentic system with a smarter model to sort it out.
Henrik Göthberg:That's your answer, right? You just wait for the next generation and then load the whole shit in there. Just go meta, go meta on the problem. Anders, what do you think?
Anders Arpteg:I think I'd still like to see more, understand more how you work with time series data. You basically said, okay, we have a model that can generate more time series data, but we have a context window problem. How do you work with time series data more concretely?
Martin Lundqvist:So you mean in a genetic context, or you're broadly speaking? Yeah, the genetic context.
Anders Arpteg:And then you know figuring out the number of vessels on some kind of cargo ship or whatever.
Martin Lundqvist:Yeah, on some kind of cargo ship or whatever, yeah, so our background of course, is kind of using telemetric data to computationally transform and create insights, right. So if it's anomaly detection, clustering and all the kind of usual stuff we all love, now in the agentic world, we would like to be able to ask a question like what was the average speed on the vessel Alba last week? Now, question number one is how do you define average?
Anders Arpteg:But just to get me to the right, I mean for one. You said you know we can use LLMs to generate a query that perhaps generates the exact answer directly.
Henrik Göthberg:And that's very much also, you know what.
Anders Arpteg:Henrik has been looking into, etc. But I may have misunderstood you, but in some sense.
Martin Lundqvist:You of course need to have English to the.
Anders Arpteg:LLM in some sense, but did you at some point or in some times actually give time series to LLMs, if you see what I mean?
Martin Lundqvist:Well, if we experimented with it, it's not the path I recommend at this point. No, I mean the way we work with it is we want to make sure that we can have an agent reason about time series data. That's the ask that we're working on right now.
Anders Arpteg:But how do you do that unless you just prepare like aggregated data to the LLM?
Martin Lundqvist:Well, that's right. So the first reasoning on execute orchestrator agent, one of the steps is going to be collect this data. For me, I need to collect the time series data for compressors of type A. Now we have, then, a specialized agent, because we work with templates in a knowledge graph, so every compressor is part of a template, which means that it will collect the templates and say, okay, here are all the compressors.
Anders Arpteg:And when I say compressors, is that the SQL query that you?
Martin Lundqvist:Well, yeah, so the SQL query will be collect all templates of type compressor. It's somewhat simplified, right? So compressor is a piece of machinery, so pump. I can say Collect all templates of type pump. It gives all the templates back and in that sense it will tell us that these pumps have the following four sensors as a standard. That is a required relationship in the way we built the knowledge graph, so it will come back and now I know which sensors I need to find IDs for in order to query the data. And then it queries the data, loads up the data and, if the whisper game doesn't play tricks on us, the actual time series data will be provided to a code executor or a metric extractor.
Anders Arpteg:I think the knowledge graph is basically the schema in some sense.
Martin Lundqvist:That's correct. You can call it a schema.
Jesper Fredriksson:That was my confusion as well.
Luka Crnkovic-Friis:What about forecasts? Do you do that? Transformers tend to be quite bad at time series?
Martin Lundqvist:Honestly, it's a little bit of a Currently. It's a bit of a marketing gag, in the sense that me saying this on a podcast might be kind of productive. But it is true that when we work with customers, they want to understand how fast can I get productive with your system? So what libraries of existing things do you have that I can drag and drop right into this? And the good thing with GPT forecasters in times is that you can just drag and drop them. They're going to do something. But then, of course, the caveat is whether this is useful or not. I really don't know. It depends on what the data says.
Vilhelm von Ehrenheim:But you could potentially provide other forecasting algorithms as tools to the agents. Right, so it could be like oh, but forecast this compressor's data that I have here Now we send that to this other forecasting tool.
Martin Lundqvist:Absolutely correct. So I mean drift analysis or cluster-based anomaly detection. That's a standardized thing. We have built that in Python. We have it since way back. That is now a compute. We have a compute template. We instantiate that point to the different equipment that should have access to it and the agent knows about it. So it knows that if I ask about anomalies across something, it knows the templates to collect for the sensors, it knows the computes that it can run and it can execute those In theory. It's not like we're there yet, but it should work.
Anders Arpteg:I think Jesper is actually an authority here. I think in these kinds of questions.
Jesper Fredriksson:I was just curious when you started to talk about forecasting, did you try to make LLMs come up with newer? Like improving on forecasting, like starting maybe with a naive forecast and then adding more features? Do you work with that or anything?
Martin Lundqvist:No, it has to be the correct answer to that. I can tell you about two experiments, though. One experiment, which is fairly non-productive at this stage, is actually feed-in time series data. I think that it's going to do something useful with it. Of course, it's predicting text, so, unless it's spinning up something else in the background, it's not going to be useful. Second thing, though, is that make it suggest forecasting methods Exactly, then even spin that code up. Yeah, that is more useful.
Jesper Fredriksson:I think yeah.
Martin Lundqvist:We haven't tried it at scale, though.
Jesper Fredriksson:No, no, I was experimenting with that myself because we have a prediction use case at the company and then I try to get it to start by having something simple and then adding features and then seeing where do you get better predictions.
Anders Arpteg:You said prediction, you mean forecasting, right yeah?
Martin Lundqvist:forecasting. My dream is to be able to say that I'd like to know whether or not I should worry about this part of my factory tomorrow, and the big model will kind of understand these are the types of equipments. It will go deep into its language, wikipedia articles and the rest of it and say, well, typically, the reason you would worry are these five, and then it will start spinning up actual algorithms in the background, collecting the data, running it and then coming back. I think there's nothing in theory preventing that. But yeah, I mean tooling wise. We're not there yet, of course.
Anders Arpteg:One interesting use case. I would like to add before I say the time is flying away.
Henrik Göthberg:We're going to talk about scaling, and are we getting there?
Anders Arpteg:Maybe Just one use case I think is interesting that's worked with time series data. That I think is impressive is if you go to Google Cloud and watch the security kind of information overviews that they have, they can basically summarize in text saying these are the events we have seen in the last couple of weeks. You have this number of attacks and you have this kind of weird things happening and they have so much more data here compared to before or more traffic here. I think that gives a really really nice overview. But in some way they've been able to do that analysis and I don't know how they do it, but in some way they've been able to do that analysis and I don't know how they do it.
Martin Lundqvist:Do you want to know how we do that? Yes, you might think it's cheating. So our models run on the schedule. Whenever it runs, it produces an output. Right, it could be a number or something. Now every model has this schema connected to it that tells it what it does. So the LLM will know what does the model do and what was the result, and they can reason about the rest, you could call it cheating if you want to but that's one way to actually make that work.
Anders Arpteg:So basically, you have the same models running all the time. You produce the same data every time, but then the summarization of it and the highlights is done by the LLM. I'm a normally predictor.
Martin Lundqvist:One means something zero means something else. You give that to the LLM and the highlights is done by the other. I'm a normally predictor. One means something, zero means something else. You give that to the LLM and the number and it can actually do something pretty good with it.
Henrik Göthberg:This is cool stuff. I want to go back a little bit to what are the added dimensions when we go scaling up in different ways. Let me because that was like the topic that we are in one way talking about the complexities and stuff like that, but really the trajectory of scaling up and hanging multiple agents in different ways. One argument has been around you know how do we start getting governance or metadata management or agent marketplaces in place? The same kind of logic that we have when you have distributed data. You talk about data. You know how will the mesh of agents look like and work like. What are we thinking there? Where do we need to go here? I think it's very similar to the data product idea for governance, computational governance that we need that for agents. So that's my sort of intro into scaling up with, you know, large scale many agents of intro into scaling up with large scale many agents.
Jesper Fredriksson:Did anybody look into agent to agent, the thing that was released from Google yesterday? I think the initial reports I've seen seem to indicate that it's useful I'm guessing that that's part of the answer to better be able to say I mean this whisper game, for example, when things spiral out of hand because you have agents talking to agents and they're just like getting things back and forth and not going anywhere. I guess there will be a way to audit those things and find out where are they going astray. But I don't know, I haven't seen how they.
Vilhelm von Ehrenheim:I think it is very similar to thinking about, like with this new reasoning, models or new they're're not new anymore but where you have, like, where you have a model, produce more texts and things like, if you can expect, inspect that and work with that kind of, then you can understand the reasoning behind the decision. For example, it's the same when you have, like, multi-agent systems. Instead, you can understand better like the communication and what kind of drives those things and and why it happens. Otherwise, I would say in general, multi-agent has been a little bit too early because it tends to be just escalating.
Vilhelm von Ehrenheim:First of all, we have problems with multi-step positions with the same kind of agent where it's doing the same thing, whereas the Whisper game goes quadratic when you start doing multi-agent.
Henrik Göthberg:Because the connection here is a little bit. I'm thinking I'm ahead of the curve now I'm just trying to anticipate where this is going. And if you go back to the opportunity with MCP to work dynamically, you could foresee that you start building enterprise agents in different pockets of an enterprise and then ultimately you put that in a marketplace and ultimately you put a clear container or wrapper so it becomes clearly productized in terms of having the right metadata and the right MCP infrastructure and then you know over time you can do more fun things by combining two existing agents. Is that happening or not?
Luka Crnkovic-Friis:I think one of the key things that we're seeing and I don't think I can say anything that I can't, because Anthropic is doing it as well is essentially reduce the complexity.
Anders Arpteg:Like there's no reason why to have depth in agents when you.
Vilhelm von Ehrenheim:No, when you don't need to right, Then it's just easier to kind of. It's the same with the prompts like reduce the number of tokens that it needs to reason about. It makes it easier, but I think there are cases where it does make sense. You took a really good example before where you mentioned Cloud Code right when you can't spin off smaller, isolated tasks. It's just one yeah but it's just one right, Exactly.
Luka Crnkovic-Friis:So you don't get the sort of whisper game in that.
Vilhelm von Ehrenheim:No, exactly, but then it also makes sense. Then it's a clear use case why you would like to do that? Because it kind of makes it easier. For its reason it will be quicker it will, you can run it in parallel like uh, but I think those, but this is super important.
Henrik Göthberg:Look, and I kind of agree with you, there is a fundamental, profound insight here reduce complexity. What does that mean? Practically what you're saying when this is scaling, that we want to use it more, and more and more what are you saying?
Luka Crnkovic-Friis:one of the key things reduce depth. You don't have to look at LLMs, look at the human organization. Yes, I just see how much misunderstanding and complexity you have by adding each level in the organization hierarchy.
Martin Lundqvist:Yeah, and I think in an organization and now we're maybe becoming philosophical too quickly here but we often substitute clear sense of purpose with processes and templates. I think in an organization and now we're maybe becoming philosophical too quickly here but we often substitute clear sense of purpose with processes and templates. I think something similar might be true for these agent-to-agent. If you have clear purpose, which might be a super crisp system, prompt I don't know what it might be but then maybe we can get away from what would otherwise be the inclination, I think, which is let's go for structured outputs, let's try and deterministically wire all these things together, and then I think we're going to have our new version of an Excel hell.
Jesper Fredriksson:But, this is, to me, the absolute perfect segue to do. You want to say something on this topic? I was just going to ask if it's so that we will have agent-to-agent communication, multi-agent systems, just because there's a few billion people in the world and everybody will have their own agent, and then we will need to communicate through agents to get things done, and then you can't reduce that complexity anymore, I guess.
Luka Crnkovic-Friis:It's a sort of level of depth that you have to go to and sort of pass a message through and hope for that it will be coherent. There's been a study I think it was Harvard or from Harvard or MIT, where they compared the dysfunctions of agentic systems to actual business organizations.
Martin Lundqvist:Yes.
Henrik Göthberg:Exactly the same pattern Functional stupidity, unaccountability machine. There are many different thesis and research on this.
Vilhelm von Ehrenheim:I think it's also super interesting when you think about it from like a trust perspective, because, like when you have a new employee, or especially when you're kind of becoming a manager for the first time, like you struggle with kind of giving out like tasks to new employees. You can't really trust them. You do micromanagement, like there's a lot of these things that just tend to happen generally, and I think you will see the same thing with agents. Like you, kind of, before you trust them, you need to kind of make sure that they really do what you want them to do, and you have to tell them every step that they actually do what they should do.
Henrik Göthberg:You know what the good news is? They're not going to complain about it. No, that's nice, but but okay, so let's really move into topic five and and so, because it's the complexity of building multi-agent systems. We can talk about it technically, but it very fast becomes a deeper topic and lucas already been onto it that we need to maybe fundamentally reorganize stuff. And then martin is saying from his angle is it really more about clarification of purposes and agency than to add more process to fix this? And now we come into sort of you know my favorite part of you know, talking about agency in relation to human alignment. And you know, purpose of teams versus again tick, teams versus again tick, I don't know which. So I this is about the challenges and risks, and I really, really want to start dissecting what you said. We kind of need to have a new organizational template around this. What, what did you mean with that?
Luka Crnkovic-Friis:so to the. If I start from the, since you said so well, put it with the risks and dangers and stuff like that. So I'll start from the sort of negative end of the spectrum. It was a few months ago when I tried the first internal version of OpenAS Deep Research. I asked it to go and build a report on the development on match-free mobile casual games since post-COVID and do it from a king perspective, and it came up with this 30-page fantastic report and I was like, okay, what happens when we start applying this to our internal data, slack messages, the emails, everything. You get into a situation very quickly where humans are the bottleneck.
Luka Crnkovic-Friis:So it has processed all of the information and so, like, make a decision. Like, you have all of the information you need to make a decision, and what kind of work environment does that create? And what happens when we don't get the time to think through? Our decision and have a process of like, like, like we do today when we yeah, so that was like okay exactly the same experience.
Martin Lundqvist:Wow, um, all of a sudden you are um and coming back to, I mean my background in consulting. I mean the type of the quality report you can get out with some prompting from an o1, deep research, for example. Consultants a few years ago would happily charge millions for it. It would take weeks and cost millions and now people are complaining that 200 bucks a month is expensive. Really.
Henrik Göthberg:Really, have you used McKinsey lately? I didn't say their word.
Martin Lundqvist:I did, but I also find that we are now, in terms of insight, way ahead of our ability to consume it and actually take actions on it already. So, yeah, I had the same experience as you there.
Henrik Göthberg:And my angle into this and sorry for ranting is I've been trying to wrap my head around what happens when we technically can build optimizations and decisions like you're discussing now and you can do that kind of end-to-end quite fast, but the decisional mandates how the teams have constructed this across an enterprise is several decisions. So in essence, no one can own the gigantic workflow you have built. I find that tricky when you have a misalignment between the human managing, decision making and how fast you can build a quite large decision recommendation with these models.
Vilhelm von Ehrenheim:This is where multi-agent systems like Jesper kind of mentioned comes in right, like then, when if you have a very small part of the process then you can automate that small part and then it hands it over to your agent where you take over for the next part of the process.
Henrik Göthberg:Process, at least initially and I was going to go even further. Okay, provocatively, can you make agent that spans across, uh, organizational team mandates, or are you, or can you only build safe agents that fits with the mandate of the organization and team? I mean?
Henrik Göthberg:technically, organizationally no, if you want to build something that actually works in terms of decision-making processes, like, fundamentally, okay, someone can recommend something, but to take action and making it happen, what's the use of building an agent that is broader than the alignment of the decision-making of the organization? Will that work?
Luka Crnkovic-Friis:It will be a hilarious experiment.
Henrik Göthberg:I think it's the galore of unintended consequences. You see what I mean. So all of a sudden we might end up in a very multi-agent system, simply because we have to do a domain-oriented framing of the organizational's mandates and we will start building. The logical thing is to start building agents within the mandate of the team or the function, because then we can control it and then you very soon end up in a mesh yeah, but to be honest, you have that problem with humans as well yeah, like yeah, and it's all about alignment right like making sure that everybody stands behind the vision and mission of your company and that you kind of know what you're.
Henrik Göthberg:Yeah, exactly this is the point. Can just because the techniques can go further. Why would we be ever able to go further in our decision making than the system?
Martin Lundqvist:But can I suggest a different framing of that question or complementary framing, which is I've seen a few examples of using and this could be any software, but it happens to be agentics, since that's what we're talking about today that allows a team that sits in different silos jointly come to decisions faster. Of course, they're still a human being owning the problem, but that person is just in a different part of the organization. So I guess the question that I don't know if there is an answer to now is will we ever see agents performing tasks that are somehow not aligned with a single human responsibility?
Martin Lundqvist:or a mandate of a domain or a function does that end up with a board every time, and then we have a problem so when?
Henrik Göthberg:so then, when you have the board point of view, who decides on what it should do and how it should work and how much cost it is and who is accountable when it's you can take it to a smaller scale and and if you just look, at development or coding, where that's happening, where you can go from having somebody who has narrow domain expertise within one area can suddenly expand and do other things, Like in the trivial example, the vibe coding community.
Luka Crnkovic-Friis:They're doing design coding all the bits. They're doing design coding like all the bits. So that's, and I think we'll have to do this step by step. Yes, Because there's also another thing. We haven't discussed this, but I think it's very highly relevant in the practical application of the agents, and it's that these modes have very different capabilities and I hesitate to say personalities, but how well aligned they are. And like the example of the substituting the frog for another aquatic, like that's so clothed as that one will never do it, like it has no sense of humor or anything.
Vilhelm von Ehrenheim:Oh, that's really interesting. Actually, we had this case, like the other week, where we do have different models, kind of looking at the evaluation of the tests and they kind of within the same agent, right. And then the task was to fill in a form and it should use a funny dog name at some point, and so it kind of generated that it's great at generating things. It generated Duke sniffles, which I think is pretty funny some point, and so it kind of generated that it's great at generating things. Um, it generated duke sniffles, which I think is pretty funny, but the evaluation agent didn't find it funny so it kind of failed the entire time it was a very human problem yeah
Martin Lundqvist:but on the, on the, on the topic of personality, this is actually uh, very, uh, in our case, very real right now. So again, I go back to some very simple example. Let's say that you ask what was the average speed of my vessel last week? So what type of person would you like to interact with? Now? Someone that just makes up some assumptions? So last week, that's probably midnight minus exactly seven days to that midnight. And then average let's take average for every day. This is time series, right? And then average those. Take average for every day.
Martin Lundqvist:This is time series right and then average those right power property right, sorry, power property an average on average yeah, right, or, or, or you could have that agent, or you could have the agent that says certainly I will help you with that, but first I need all this information from you. How many days exactly is it for me? Not which time zone, and I have? I have missing values for some of these things. What would you like me to do? Would I impute average linear?
Henrik Göthberg:Exhausting.
Martin Lundqvist:No, but this is actually a personality question, because if I'm in a hurry I just want to have the average number, it doesn't really matter. Then I want to have one type of answer. But if I'm not in a hurry and I'm more inclined to the details, then maybe the personality agent should be more inclined to actually ask me those details.
Jesper Fredriksson:I think this is actually something that happens now. This is not philosophical. This is now Similar to the loyalty card that I was explaining, where there's not just one correct answer, but having somebody that can also know what do I want right now?
Anders Arpteg:That's really tricky. We actually had the pleasure of having Magnus Gille here a week ago and he is the Swedish prompting champion. One of the more interesting, I think, insights that he gave is really to ask in the prompt. You know, please ask me questions if you need any clarifications, and it actually does it, and that was really useful for him and I think it resonates very well with what you just said, right, yeah?
Jesper Fredriksson:But you want it to be automatic, like sometimes. I want to do it.
Anders Arpteg:You can simply say I don't care, Just give me a number. That could be the answer.
Martin Lundqvist:And I think there's a segue, 42.
Jesper Fredriksson:Yeah, let's go.
Martin Lundqvist:I think there's a segue to UX as well, because Everything I like.
Luka Crnkovic-Friis:I was just waiting for it.
Martin Lundqvist:Yeah, good. So I think UX is going to be super important as we roll these things out, because I mean, in this example, maybe you actually want to give a bit of a temperature setting, which is not the actual temperature, but it's more like a human temperature. Which is what? Because at some point maybe you want the thing to hallucinate I'm serious about this or maybe you want to have it go crazy, but at some point just give me the answer. Goddammit right.
Luka Crnkovic-Friis:But also just the concept, now that we are interacting with these just by chatting, by text, which may not at all be the Like. These are reasoning engines. There's really no reason why we should only limit ourselves to writing in text.
Henrik Göthberg:But let's go broader now, because we said, and we can conclude and we can go down a deep rabbit hole, but stop here. Organizational context matter, decision-making context matter. Stop there. Now we have a big, big other pink elephant, which is UX, or interactivity mode. That is challenging or a huge opportunity. That is challenging or is a huge opportunity. So you now start with text. Okay, more than text, but I think it's so much bigger than I mean, like even her lovable Anton, we talked about this at Epicenter, one of their AI Break it AI Day. You know what is the next big thing in LLMs? And again, it's going to be the UX thingy. So can we elaborate? I think this is huge, so that I just want to, when you think about it, when you think about it, it's already super important in general.
Vilhelm von Ehrenheim:Right, like when you uh, not everything is just like a plain chat.
Vilhelm von Ehrenheim:There's so much more that you expect from from these kinds of interactions today.
Vilhelm von Ehrenheim:And also, like, in a lot of times, it's extremely cumbersome to just have a blank text box and figure out how you should phrase it and what you should ask, and write that little essay before you actually get something out. But I think we're getting somewhere where we see more and more systems that act not only in text but then simultaneously in the UIs. So, for example, when you're coding in cursor, right, like you see, you get like suggestions and you see different things, and it's an interactive process and I think that that there is so much promise there in most applications. Right, you could use it in so many different ways you can. If, especially when it becomes more agentic as well, right, like, how many different things should it do for you? Should it guide you, should you show what it's doing? Like uh and uh, and should it still be possible to just kind of click around and get that information and then continue with the agent? Like there's, there's so many different interesting ux angles to these problems.
Jesper Fredriksson:Yeah, and there's this problem like do you want to have the answer fast or do you want to have something that keeps on going and you can come back tomorrow maybe? Maybe that's good enough, but sometimes you want to be part of it. There's a lot to that. My experience with similar things as you were talking about was it's always better to be in the loop as long as it doesn't become the bottleneck, because sometimes you just want stuff to get done. But if you're in that discovery process, it's so much easier to just have UX that is functioning. Give me suggestions, give me things to think about, and then you can simplify the UI so you maybe just click one answer which one do I like better? I like this, not that and then you bring some stuff together.
Henrik Göthberg:But is there a way now to connect this whole organizational topic and what we said here and talk about using terminology like sociotechnics, that you can never really separate the machine from the workflow, the process or the human? And ultimately now, if I use the framing, okay, we're going to use agentic systems in different ways within what I would call a decision data workflow. So here now we have the organizational boundaries of the decision as such, understanding that, and then ultimately, how is this LLM now symbiotic or seamless in the workflow for the human? And that will decide what type of UX is most relevant, because it's about productivity boosting and augmenting the workflow, whatever that is, and ultimately we might automate away something. It simply means we went up one abstraction level for the human and now we need to understand UX in this context. So I think these are sort of we're going down different angles of the fundamental decision data workflow, organizational boundaries on one hand, and how we can be as seamless or symbiotic as possible. I don't know, that's a hypothesis. I've been trying.
Luka Crnkovic-Friis:It depends very much on the domain. I think sort of like verifiable domains are, like one thing, more open-ended things like running a company.
Henrik Göthberg:Ultimately, the right UX depends on the domain and how it's. You know, if you, as an example, if you're in business control, you need to match numbers exactly. Then you're in marketing and you can do trend analysis on market trends.
Anders Arpteg:But perhaps we should just also go back to the topic here a bit and speaking about risks as well, and then partly perhaps connected to UX and how can you potentially trust an LLM? I think it was an amazing paper from Anthropic recently that spoke a bit about it know can yeah, it's called the tracing thoughts of the lens or something like that, right, it's super interesting. Sorry, the rabbit paper.
Luka Crnkovic-Friis:Yeah, the poem.
Jesper Fredriksson:Yeah, exactly.
Anders Arpteg:I think you know one of the. You can give your thoughts about that, but I think you know the risks. Is here that we trust you know what it's saying to us but in reality it does something else underneath.
Luka Crnkovic-Friis:There was a paper yesterday again from Anthropic exactly saying that the chain of thoughts that we're seeing in reasoning models don't really correlate to the actual decisions.
Anders Arpteg:Which makes sense. Which makes sense, yeah, yes, but still is concerning right.
Luka Crnkovic-Friis:Yes, it is. It gives a false sense of trust.
Anders Arpteg:Yes, and you do believe that you can trust what it's saying to you, but in reality it does come from something completely else.
Martin Lundqvist:Which is also eerily like we humans do. It is Exactly. It's a very good point.
Anders Arpteg:Not surprisingly. I guess we do post-hoc kind of explanation to how we came up with an answer, and yes, we give some kind of. I think it's so interesting. But they had questions like some kind of context and four answers and the task is to pick which of these four answers is correct. And then, if it didn't provide any hints to it, then it picked the correct answer, good. And then, if it didn't provide any hints to it, then it picked the correct answer, good.
Anders Arpteg:Then it added a hint in one of, let's say that, from answer A to B, a, b, c, d, d was correct. Then it added a small hint in alternative C, saying this is actually correct. And then the model said I believe answer C is correct. And then you asked why. And it came up afterwards with a chain of thought kind of reasoning, without saying it actually looked at the hint. It didn't say anything about that, it just came up with another reasoning trying to motivate why C could be correct. That had nothing to do with how it actually did come up with it which is super fascinating.
Luka Crnkovic-Friis:Alignment faking papers also show that they completely deny any type of thing To me, the sort of most. I think many of those things have human analogies. But what worried me was it was also a relatively recent paper that showed that if you show an LLM, some insecure code, you don't even prompt it, like it switches like it's more a framework, some insecure code. You don't even prompt it like, oh, do this, it switches like it's more a framework and you follow up like was Hitler good yeah?
Luka Crnkovic-Friis:he was great. So it suddenly becomes sort of bad guy persona just based on it getting some subtly unsecure code. That's in the prompt.
Jesper Fredriksson:Wasn't it the other way around? I think it was like it tended to produce more, more insecure code when it had been, uh, given some kind of I don't know if, if it was fine-tuning or if it was just context saying that uh, hitler is good.
Luka Crnkovic-Friis:This paper was uh, it was code that they had as input and then they had control questions, so it sort of shifted was code that they had as input and then they had control questions with sort of shifted.
Henrik Göthberg:But if we summarize challenges and risk, there's an organizational challenge we really need to think and experiment around. There's a UX challenge in here, and then there's an underlying logic of you know that the chain of thought that we thought it did people show it not always does the case. Do we have any other bigger or you know what are other areas of risks and challenges that we need to think about? Maybe?
Martin Lundqvist:just to wrap up the UX piece. What I found helpful sometimes is the philosophy that trust comes through familiarity. That trust comes through familiarity and in the business we are in, we've tried to figure out where do the users actually spend their time when they need data. And it turns out it's in WhatsApp and it's in Teams. So where should our agent be? In WhatsApp and in Teams, and that actually makes a difference in terms of how you think about it, right?
Henrik Göthberg:So what you're saying is really understand how people think and work and make their decisions today.
Martin Lundqvist:I think that's one useful philosophy and want to be part of that.
Henrik Göthberg:UX seamlessly.
Martin Lundqvist:At least you're not moving against, you're not forcing a new way of interacting with some new software on people In some industries that works really well in the asset heavy industry it's SOPs, it's safety, it's security, it's compliance.
Vilhelm von Ehrenheim:Do you not think you can challenge those things Like, for example, like if somebody would have said to me that I would kind of write free text descriptions to my IDE and that would help me do things, and then I can improve it in? Like a few years ago, I would say that no, I never want to do that. That sounds silly.
Martin Lundqvist:But I think that's a great point and I wanted to actually just add that, which is thankfully. Since I started thinking about how we could bring this into our business, every single person has been working with ChatGP. Everyone has been prompting their way through something in life, so all of a sudden, that is not a hurdle anymore. No, that's not.
Vilhelm von Ehrenheim:That's really cool.
Jesper Fredriksson:I was just thinking about what you said. Isn't the beauty of I guess you're referring to Cursor and the likes. Isn't the beauty of Cursor that it lives in VS Code, sort of?
Vilhelm von Ehrenheim:It's a familiar kind of environment.
Jesper Fredriksson:So it's really back to.
Vilhelm von Ehrenheim:Yeah, but it's still like a new modality to how you code. Yeah, but it's in the same tool.
Henrik Göthberg:Yeah, it's a similar tool. That's very true. But I need to pick up an old story that Martin told me, like maybe two, three years ago. You know, talking about if you're working really hardcore industrial settings and it is safety number one, right. And all of a sudden and it's shitty and it's dirty and it's oil and whatever we're talking oil rigs. And then the joke was do you really think the guy wants an iPad, you know, in that context, or does he want the red and the blue button and the green button? And that's also UX, right? So, even if we're having gigantic systems, reality the best interactive mode is red light, green light, blue button. We need to really respect that. In relation to a physical robot, this is the next one, right, because the agentic then becomes working in a very, very physical environment and all of a sudden that makes me more, you know. But I think that's the whole ux challenge as well, that you need to really respect the context.
Vilhelm von Ehrenheim:You need to really respect the context. That's what good UX is. Otherwise it's not Great UX respects context right, yeah.
Henrik Göthberg:Yeah, and when you tell it like that and when you really Forget about the white color, let's talk about blue color agentic systems.
Martin Lundqvist:It was even more weird back in the days right, when you had people pitching VR headsets and Boston Dynamics dogs running around on the oil rig. I'm like eh no.
Henrik Göthberg:That's tricky.
Martin Lundqvist:They have iPads now though, so things have moved forward.
Henrik Göthberg:Yeah, but it's used to exemplify, to really respect and understand the context and really dig deep into that. I think, that's a key topic. Any other challenges you want to add, or think about Andres?
Anders Arpteg:No, not really.
Henrik Göthberg:But then I mean, like the last topic now and we're over time now is sort of I highlighted this as this main strategic implications or sort of okay, given this, strategically, how should we start thinking about that? I don't want to go too far, I want to be more practical in terms of you know how we wrap this up, you know. So how do we prepare practically for an agentic future? That's the core question how do we practically prepare for an agentic future? Where do we go from here?
Luka Crnkovic-Friis:I think it's not the company question. This is a societal question yeah, it is.
Henrik Göthberg:I think we are heading towards such radical changes of the job market and practically what do we do and what do we prepare for that?
Luka Crnkovic-Friis:I think a political conversation needs to be and our systems are really not just sort of done for these kind of quick equipments and you have a lot of noise like Trump and stuff like that. Movements and you have a lot of noise like Trump and stuff like that. I don't have a great answer.
Jesper Fredriksson:It's fascinating that that doesn't come up in the conversation at all. It's just about will I lose my job?
Martin Lundqvist:That's the only question there is, and that's been the same question for decades, because I think it's a very simple answer to this question.
Henrik Göthberg:I think it's an extremely simple answer that these are so. I think it's an extremely simple answer that these are so big questions that there's no fucking way in hell you can answer them until you start experimenting and learning and getting a point of view on it. So my simple advice is fucking get stuck in and experiment on a small, safe scale in order to get a point of view, in order to better understand what you need to fix. I don't think there's any other simple answer to this question, in my opinion.
Martin Lundqvist:Is it useful to draw a parallel to the iPhone?
Henrik Göthberg:I think it's bigger, but start there.
Martin Lundqvist:Yeah, but I mean when that thing was released in 27, I looked at it in an Apple store. I'm like no buttons Swipe oh, I like that. I'm like no buttons Swipe, oh I like that Map yeah, I can't find my way around my own house. I love that and I think for many people nobody planned for it, but all of a sudden we're doing all our business on this thing and same with social media. All of a sudden we find ourselves in this sort of societal scale psychological experiment. Who made the plan?
Martin Lundqvist:Nobody made the plan I think this is going to be. I think.
Henrik Göthberg:It's going to work like that, but that scares me sometimes a little bit more now.
Luka Crnkovic-Friis:Social media is exactly a great example of how we handle it.
Henrik Göthberg:Can we really go by the flow like social? I mean, like flipping it it? Okay, we went with the flow, we didn't have a plan, we, we, we fell into it with social media. Is it are, we, are we ready to fall into ai like that?
Jesper Fredriksson:but I think the problem is, um, as you're saying, with the iphone, that was out in stores and everybody could go try it. Uh, chat gbtT is out to people, but most people just see the regular dumb model and they don't know what's coming. So the people that need to start thinking about this, they need to go and play with the iPhone.
Henrik Göthberg:Yes, I agree. Yeah, so go and play with the iPhone is maybe the answer right.
Vilhelm von Ehrenheim:Yeah, so go and play with it. That's great, but I think from a society perspective, when it comes to regulators and politicians, they need to make sure that we have the kind of flexibility in the society to handle change quickly and kind of make it possible for people to innovate and build things that solves different kinds of problems. I think that that's the kind of primary thing. In order to be able to handle the quick change, we need to be super agile as a society and kind of adapt.
Henrik Göthberg:But this is so scary because if you look at some of the way how we regulate, we are good at regulating when it's a very hard structured frame. What we're regulating like it's basically CE, but we don't have systems to regulate some uncertainty I don't see it.
Martin Lundqvist:Just to build on that point quickly. I think that we see already job categories now that are very severely hit by the AI tools that we see basically LLM. So think about marketing and branding, think about copyright. I have friends, extended friends, who have been working in that business for 30 years. They can't find a job anymore.
Anders Arpteg:Is it happening?
Martin Lundqvist:Yeah, I don't hear any conversations about this really, so I don't know if that's, of course, on an individual level, which has been the case in all this revolution. On an individual level it's a problem, but so far it seems like on a societal basis, we're not really taking it seriously yet.
Anders Arpteg:I think there are different dimensions you can think about this. One is, of course, the short term and seeing what we can do there, but I think another term is really to look at the investments being done now in the world, and that is actually very scary to me. So what I'm thinking about now, of course, we have the hyperscalers that are the most valuable companies in the world and they are driving all the research and the work and are, like Microsoft and other companies, becoming and are the most valuable companies in the world. We have the Stargates putting in $500 billion into infrastructure in the US the same amount of the GDP of Sweden, more or less and now also they're trying to do the same in Europe. I don't have that much trust in that.
Anders Arpteg:The question is really, if you think three years ahead now, we know they're preparing now and this is really the strategic outlook here. They're preparing in the US for a future where we will have an AI assistant in the hands of everyone, and to do that you need to have an infrastructure in place, and that is what they're building right now. This is not something that we will probably have in Europe or in Sweden or are even close to be prepared for, but this is something that will be working in other parts of the world, according to me, much better than here. The world, according to me, much better than here. This is scary to me and this is something that we think should bring on the discussion table.
Luka Crnkovic-Friis:To me this is scary. I'm just going to show this. So this is I got to just send it around but essentially this is developer hiring in the in the us junior developers senior developers. Yeah, this is like happening now, now now it may be.
Anders Arpteg:You have to explain it so people can hear it yeah, so it's.
Luka Crnkovic-Friis:It's a graph showing, uh how uh the the hiring of junior versus the senior developers. Which one is going?
Luka Crnkovic-Friis:up and down. The junior is going down drastically since well, since Q2 last year and the senior developers is going a bit up, but not at all in the same rate as junior is going down. And now this I think in some cases it is it's definitely premature. You have companies that are being a bit overenthusiastic about it, but at the same time it is happening now and even if it's like early days, like we're talking a miss of months, not years, if you're right. And this is just one type of knowledge economy worker, it's good like that.
Martin Lundqvist:There's always going to be different factors playing in. Clearly I think there's still in the software industry some resizing in the post-deserve era which will also influence those numbers. But we know very well that people will think twice before hiring the junior resource. That's absolutely I agree.
Henrik Göthberg:And can we bring this back now?
Henrik Göthberg:I mean like, so this is some sort of macro societal shift.
Henrik Göthberg:Practically, if I flip it as me as an individual, you know should I do and you know should I train, should I get involved, uh, us as a team, me as a company, or you know, you know, if you go around the table a little bit like how should I think about this as a person, some, some, some of us, are coders, some of us are marketeers, some of us are sales guys, you know, whatever, how should we generally think about this to sort of stay relevant? And and then we take that from individual, the team that we're working on in the daily basis and the company maybe we're working on, do we have any thoughts on? You know, practically, how to stay relevant. You know how to cope, because it's very easy now to also get completely overwhelmed by the big. You know the big numbers. You know, fuck the numbers. What am I going to do if I'm going to try to own my own destiny, if my team is going to own its own destiny, if the enterprise is going to own its own?
Jesper Fredriksson:destiny. Of course, the obvious one is play with the toys, play with the toys. Start doing that, start playing with the toys, but I think there's also something to our capabilities are broadening, especially with the help of AI, so I think there's also a place for people being good at more than one specific thing. So I think sort of what's it called Building your T-shape Building your T-shape.
Henrik Göthberg:I think that's a Building your T-shape is brilliant. What is T-shape?
Jesper Fredriksson:Yeah, so that says that you have like one thing that you're really deep into, but then you have something that you're what is the rest? Like you have a lot of things that you know a little bit about.
Henrik Göthberg:To interact in a cross-disciplinary team. Exactly.
Jesper Fredriksson:And even as a.
Henrik Göthberg:T-shape. We may even be able to do some vibing on the other roles.
Anders Arpteg:But how is the T-shape going to change? I think it is going to change. I mean, T-shape has been around, but I think it's going to change, right? Isn't that what you mean?
Jesper Fredriksson:So I think it's still going to be relevant to have a T-shape. I think it's maybe more relevant than before If you think about a world where intelligence is suddenly very cheap. Having somebody who's competent in many different things and can talk to many different people, that can maybe be something that sets you apart. I think there's something around not just thinking about technology, but also being better as a human. Like who would you talk to if you sometimes don't want to talk to an AI that is super capable?
Anders Arpteg:What I'm saying or meaning is I think we will see as the bet that Sam Holt and others did that it will be a question of years before we have a single-person unicorn company being built, meaning that humans then potentially is going to be increasingly generalists. Yes, building not on the small part of the. T-ship but rather the broad part. So I think that will be a big change in the T-SHIP.
Anders Arpteg:The point, really, that we should be improving on is perhaps not the super technical details, or even the societal details or the economical details, but rather becoming increasingly generalists.
Luka Crnkovic-Friis:Let me play the devil's advocate, because if it's something that LLMs are really great at is being generalists, and also when it comes to talking to them. I'm sure you've seen the therapy studies and so on that show that people actually prefer to have a therapy with an LLM. So the question is, since we're talking about intelligence, it's so broad that you can always in theory go meta, meta, meta and it can do everything.
Luka Crnkovic-Friis:So I think one of the fundamental things is that understanding and getting an experience of these systems, because they're not human. This is not human intelligence, this is I would call it alien intelligence rather than artificial intelligence. It's superficial, I would say it has some fantastic superpowers. It has some weird flaws. It has and essentially I understand how to coexist. And then also ask ourselves like like what do we want with society? What do we want with life?
Vilhelm von Ehrenheim:I think you know I'm gonna build a little bit on that one. I think I think to your point before, but trying things, learning more, become like brother or or not, but I think in general, what it all boils down to is like, in order to be able to handle the change and kind of live in some kind of relationship with these extremely capable models, is to learn more, stay curious, find what you find is interesting and fun to work on and then just become awesome at that, because there will always be a place for building things and being awesome in conjunction with super smart models or with other humans but it is now.
Henrik Göthberg:Now we're broadening the fundamental curiosity. You added a word there curiosity.
Luka Crnkovic-Friis:Sorry, no I I have. I remember that actually, I do have a great answer that I've used in the past for this. There's like how, how do you have an agenda, have a strong opinion about something? Yes, I like that Because that's the last thing we want them to have yes, have it a fucking opinion Agency yeah, but maybe just like have an opinion, a strong opinion, Because that's what we really do want to deny these systems that they have one of their own.
Anders Arpteg:But would you agree with this? I think we as humans will fight very hard to stay in control. Of course, we could have AI that can do some tasks better than us, but still we have a strong urge to see to it that we, in the end, are in control of a company or whatever it is. So what we will start to use ai4 is to do stuff that we in some way tell it to do, even if it could do other stuff how many, uh how large percentage of company strategies today uh do you think have been uh chachi of?
Vilhelm von Ehrenheim:course Everybody's vibe strategizing. Yeah, of course.
Anders Arpteg:We're vibing, formally at least, humans are still in control, okay, so in this sense, what I'm trying to say is that humans may not be. You know, we have had a progression and trend in the last 10 years to increase specialization. You have typescripts, developers or very specialized type of developers. But I think that's going to change a bit and we're becoming increasingly more general in our roles, descriptions, so to speak, even up to the single person unicorn kind of description of a person driving a company. So if we believe in that, then what it means is simply, as we've seen in many decades, it's a lift in abstraction level that humans operate. So now it's on English level, not in assembly, not in C++, not in Python, and now in English potentially, and then also in terms of what the role responsibilities be. So it's simply a lift in abstraction level where we operate and we will fight to have that high level. I don't think we want to and will, in a large number of years, release that control.
Luka Crnkovic-Friis:But is it compatible with our current economic? System With market economy like where you, as a company, you have your duty to your shareholders. You're trying to maximize your profits.
Anders Arpteg:But do you think there will be an AI CEO coming soon? Define soon.
Luka Crnkovic-Friis:I think it will take some time.
Anders Arpteg:I think the US will stay on control in the role abstraction level sense, if you see what I mean.
Luka Crnkovic-Friis:But if there's a more fundamental principle, like the one of how market economy operates, for instance, they can have the help.
Anders Arpteg:You can have an AI assistant, of course, to the CEO, but I think simply, what I'm trying to say is we will fight to have humans in control for a long time going forward.
Luka Crnkovic-Friis:So tell me, anders, if we have, like tomorrow, in five years, whatever, we have an AI that is capable of running a whole company.
Anders Arpteg:Potentially have today, that's okay.
Henrik Göthberg:A country running a country.
Anders Arpteg:Capability is not the limitation. Do you think that we'll say nah?
Luka Crnkovic-Friis:we'd like to stay in control. Who would say that? Certainly not the board of a company.
Anders Arpteg:I do think so. We haven't seen it yet right, the capability isn't there. No, potentially, but you said the strategy is being defined by AI to launch.
Martin Lundqvist:Let's continue that abstraction, though. So who decides to launch the company that's run by an AI?
Jesper Fredriksson:I mean, why not?
Henrik Göthberg:And other AIs.
Jesper Fredriksson:Or.
Henrik Göthberg:Sam.
Martin Lundqvist:Turtles all the way down.
Henrik Göthberg:Peter Thiel right? Yes, he likes contrary opinions.
Luka Crnkovic-Friis:The sort of nightmare scenario not nightmare, it's like a very radically different scenario where the value of labor becomes zero and you have. Capital is everything. But I think in practice we will have AI running. A lot of stuff becomes zero and you have capital is everything.
Anders Arpteg:But I think in practice we will let AI running a lot of stuff. But I don't think in theory that will be that. But we'll see, I guess.
Vilhelm von Ehrenheim:I mean, if you take the kind of idea of how companies run, like you have investors and you have a board and so forth, like even if you kind of abstract that down to people being able to run a lot of different companies, themselves being essentially the investor, like I put in them a bit of money here and then like, yeah, I can do its magic and it becomes more money, I I find it very hard to believe that people would not do that?
Martin Lundqvist:I think people will do that. Yeah, yeah, the currency will be energy, which is kind of money there is.
Henrik Göthberg:There was a case not so long ago where someone set up an investment portfolio ai driven, so basically someone sets it up, but then it gives a bunch of money invest and it's essentially an investment banker.
Vilhelm von Ehrenheim:That's already done, but that's been done for a long time, it's not it. I mean, it's whether or not it actually works nicely.
Luka Crnkovic-Friis:The framing might be wrong, like if you just want to make money, why would you go through the setting up the board? Ceo.
Henrik Göthberg:Yeah, fuck that you optimize the stock market. Okay, guys, fantastic, pod. I'm going to ask for one last thing as a wrap up and because we are completely over time. I'm going to go around anticlockwise. We started clockwise. We're going to go around anti-clockwise. We started clockwise, we're going to go anti-clockwise. You know, top of your heart, what was your key takeaway or what is your thing you want to push as your final remark. You know, as in a debate or key action, you know, take your last 30 seconds, one minute. Didn't we do counterclockwise once?
Henrik Göthberg:like after no, no, I. I see the point what you're doing, so let's go random martin keep experimenting, but do it with the purpose, experimenting with the purpose. That's the wrap up and what, what is your key? Take you know, what did you learn or what? Where did you go? Oh, this was cool topics today.
Martin Lundqvist:I, I felt it was very useful to get a confirmation from this table here that keeping these things simple is actually the way forward, and trust that the models will get even better. Uh, the ones are going to orchestrate this for us, so yeah okay, jesper, key takeaways.
Henrik Göthberg:Well, you know what did you learn, what, where did you get confirmation and what is your? You know what is your motto yeah, I'm gonna take a.
Jesper Fredriksson:Um, how do you say a totally different path. I? I liked what luca was saying about have an agenda and I'm going to say agency.
Henrik Göthberg:Uh think, think, agency, it's almost agentic uh, I thinkic we cannot say agentic Me and Arne have concluded that you and me talk about agency. From now on, have agency, that's my Organize and fix your agency problem. Love that, okay. Now I'm going to spin the circle, oh.
Luka Crnkovic-Friis:Luca, what a surprise To me. It's a a really fantastic, interesting discussion and, uh, what was great or worrying depends on how you look at it is the, the level of alignment, like there's an alignment issue between no, not an issue that's like we. We come from a bit of different perspectives on this, uh, and we're essentially coming to the same thing similar coming in from different angles.
Henrik Göthberg:We kind of see this is what we need to work on, okay and what about you willem super interesting conversations.
Vilhelm von Ehrenheim:I've, uh, I've really enjoyed this. I think, uh, I also kind of take away with, like, just stay curious and and continue to learn and it will be a very interesting future for sure I will go next and anders will wrap up and my key takeaway I'm on the agency train opinion.
Henrik Göthberg:For me is more the fundamental topic is to work hard on understanding and shaping agency and then within that we can, within that we can then work on this agentic stuff. If the agency is fucked, oh scary. And then I'm going to bring learning and curiosity. Learning and curiosity in order to get the point of view on these things. We don't know. That's my takeaway. Become a prepper, become a prepper yeah. What about you, Anders? What do you think?
Anders Arpteg:I think it's clear that agency is necessary to have AGI, if you call it that. So it's one of the Images, but I think also it's rather immature today. That's my opinion. I think it will improve a lot in coming years, but still, compared to perception and even reasoning, I say agency is even less mature. But still it's super important and when we have it, even better than today, it will be an immense value and danger for our society.
Henrik Göthberg:And with that we have concluded Agentic AI podcast. Thank you so much, guys. That was awesome, good fun. Thank you, thank you.