Build What’s Next: Digital Product Perspectives

AI Field Guide: The Missing Middle - How to Build an End-to-End AI

Method


In this episode of Build What's Next, Theo Munoz, Miguel Ribeiro, and Natan Szczepaniak discuss Machine Learning Operations (MLOps) and why an estimated 80% of ML models built in notebooks never make it to production. The hosts argue that the failures stem less from technology and more from organizational issues like a lack of clear ownership, insufficient investment in data engineering, and poor data foundations. Learn how standardization, shared ownership between business and engineering, and robust model governance are crucial to scaling AI safely, especially as the industry shifts towards Gen AI.

To find more episodes, visit method.com/insights/podcasts/

Episode Resources: 

Method.com

Theo Munoz on LinkedIn: /in/theo-munoz-090a88151/

Miguel Ribeiro on LinkedIn: /in/miguel-ribeiro-3439328a/

Natan Szczepaniak on LinkedIn: /in/natan-sz/

Welcome And The Notebook Gap

Josh Lucas

You are listening to Method's Build What's Next: Digital Product Perspectives, presented by GlobalLogic. At Method, we aim to bridge the gap between technology and humanity for a more seamless digital future. Join us as we uncover insights, best practices, and cutting-edge technologies with top industry leaders that can help you and your organization craft better digital products and experiences.

Theo Munoz

Hello to everyone listening. Thanks for joining us for another episode of Build What's Next. My name is Theo Munoz, and I will be your host today. I'm part of the data and AI team here at Method, joining from the Charlotte office. I'm joined by two of my colleagues: Miguel Ribeiro from GlobalLogic, our parent company, and Natan Szczepaniak, also part of the data and AI team here at Method. Today we will be talking about machine learning operations. But before we dive deeper, I would like my panelists to introduce themselves, maybe a little background about yourselves. Miguel, do you want to go first?

Miguel Ribeiro

Yeah, sure. Thanks, Theo. I'm Miguel. I've been at GlobalLogic for the past five years. I'm a principal data scientist, and I've been working a lot in banking and with private equity portfolio companies, mostly on AI projects, MLOps, and recently Gen AI and agentic AI.

Natan Szczepaniak

Hi, my name is Natan. I'm a senior data designer here at Method. I've been with Method and GlobalLogic for around five years now. My background is in physics. I then moved into cloud engineering, which is when I joined GlobalLogic and worked within MLOps, deploying models across many different industries. Now I sit in the position of data designer, which is a mixture of UX, basically user needs, data science and ML, as well as business value.

Theo Munoz

To get started with the main topic of the podcast: there's an interesting intersection between data foundations and production AI right now, in that many programs fail, not necessarily because the ML models are bad or underperforming, but because the handoff between the data scientists who build the models and the production engineers who deploy them is messy. The gap between "we have an ML model" and "we have an ML model in production" is very big right now. So the conversation today isn't about how to build a good machine learning model, but how to build and maintain a process that reliably promotes, monitors, governs, and scales machine learning models. I just want to make a note here that Gen AI and LLMs are the words of the moment. We'll be referring to AI a lot here, and we mostly mean machine learning, not necessarily Gen AI, unless we explicitly say Gen AI. Would you guys agree with that?

Natan Szczepaniak

Absolutely. Gen AI is a big topic at the moment, and everyone's thinking about how to build AI solutions and then productionize them. However, in those conversations there's a big gap around utilizing traditional ML models, as there are still quite a lot of use cases out there that are better solved by ML models than by Gen AI. A lot of people see a problem, think "use AI," and immediately reach for LLMs. However, there are still many use cases, such as forecasting and classification, and more robust use cases, especially in regulated industries.

What MLOps Means In Practice

Theo Munoz

Just before we start talking more deeply about MLOps itself, I would like you guys to give a little summary of what MLOps is and what the landscape looks like.

Natan Szczepaniak

So MLOps, I guess, is basically the field of actually deploying models into production. MLOps can mean many different things at different levels, depending on the kind of use case you're talking about. Many people refer to it as taking a model, putting it behind an API, and then using that in some product. However, there's much more to it. The field of MLOps comes from combining ML and DevOps, but it's not just DevOps plus ML; the whole of MLOps is greater than the sum of those parts. There are things such as model artifacts, feature stores, monitoring, and many other pieces that you need to actually build a resilient system. Miguel, what do you think of that?

Miguel Ribeiro

Yes, I agree. What we've seen these past couple of years is that MLOps becomes a framework or a platform for model development, but also model deployment. It's that combination of AI skills and statistics, but also engineering and production systems. That's what we've been focusing on these last couple of years.
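To make the "more than DevOps plus ML" point concrete, here is a toy sketch of the pieces a resilient MLOps flow adds around the model itself: a validation gate and a versioned artifact registry. All function names and thresholds are invented for illustration; the "model" is a deliberately trivial mean predictor so the plumbing stays visible.

```python
# Toy sketch (illustrative names only): the pieces around the model that make
# MLOps more than "DevOps plus ML". The interesting parts are the gate and
# the content-addressed registry, not the model itself.
import hashlib
import json

def train(ys):
    # stand-in for real training: predict the mean of the labels
    return {"type": "mean", "value": sum(ys) / len(ys)}

def passes_gate(model, ys, max_mae=2.0):
    # validation gate: refuse to promote a model whose error is too high
    mae = sum(abs(model["value"] - y) for y in ys) / len(ys)
    return mae <= max_mae

def register(model, registry):
    # model registry: content-addressed, so every artifact gets a stable version
    blob = json.dumps(model, sort_keys=True).encode()
    version = hashlib.sha256(blob).hexdigest()[:8]
    registry[version] = model
    return version

registry = {}
labels = [10.0, 12.0, 11.0, 13.0]
model = train(labels)
assert passes_gate(model, labels)   # only gated models reach the registry
version = register(model, registry)
print(version, registry[version]["value"])  # an 8-char version hash and 11.5
```

In a real platform the gate, registry, and serving step are separate services with audit trails; the point here is only that each stage exists explicitly rather than living in someone's notebook.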

Theo Munoz

We hear a lot about companies investing in AI, right? ML, as I said before, in this specific case. And a lot of the time those projects fail. I think there's a statistic out there that 80% of models in notebooks never make it to production. So looking at this, what do you think goes wrong? Why don't models make it to production?

Miguel Ribeiro

So across different industries, we see the same story repeating itself. Usually when funding is allocated to develop AI use cases, most people are thinking only about training a model, showing the results, and impressing a steering committee or the business stakeholders, but that's only the beginning of these types of projects. The funding usually runs out when the business understands what actually needs to be in place to bring these models all the way to production. So what we've seen is that, usually, the path to production just isn't there. That's what we've been helping the customers we work with to do: find that path to production and standardize it across multiple business units and projects.

Natan Szczepaniak

Yeah, I'd agree with that. And there's an additional statistic on top of the 80% of ML models that don't go into production: I've seen somewhere that 40% don't actually deliver actionable value within the first 12 months of productionization. So it's important to also know why you are developing those models. This is a big thing we've seen across many different projects: people are very excited to invest in AI and ML, they see a use case they think is interesting, and they go straight into investment before actually rationalizing the value it can bring. That's why it's really important to take a few steps back. Before you start training models and doing the fun and exciting things, you need to see whether it's actually going to provide value to your company, whether it provides ROI.

And then when you actually do build it, as Miguel was saying, the story does repeat itself, because a lot of the time projects start out with: we have an idea, we have a data set, someone trains a model on their local computer, they show it off, like Miguel said, in a meeting, but then they don't know the path forward. They don't know what to do with those artifacts. Very quickly, that usually means going to the software teams, which have the capability of deploying things but typically use traditional DevOps pipelines. This is what we see with clients that are less mature in their MLOps strategies. However, there are a lot of products and platforms that help with this. All the cloud providers now have ML platforms that let you develop, train, validate the model, and then actually deploy it into production.

Why AI Projects Fail After Demos

Theo Munoz

Yeah. Miguel, you mentioned stakeholders when you were making your point. Building off of that, from a corporate standpoint, who do you think is responsible for fixing this gap between ML models in notebooks and ML models in production?

Miguel Ribeiro

Yeah, that's a good question. That's usually where the process breaks: it's not just the tools and technology, it's actually the culture and the mapping of the different stages of the machine learning lifecycle. There's no clear accountability, no clear stakeholders, no clear ownership. What we see a lot is everyone working in silos, like Natan said: you've got the traditional DevOps pipelines, and you have AI engineers and data scientists just developing a model and passing it over to the DevOps or software engineering team, with no clear handoff and no clear ownership. What we've seen work best is really bringing those skills together, so the ownership is shared; it's not just one single business stakeholder or one single business unit. It needs to be both the business unit and the engineering capability, mapping the whole process all the way to production. You will often find that what's missing is not really the technology. The technology is there; what's missing is clearly understanding where the handoff begins, how it should be made, and then standardizing that cycle. For that, ownership needs to be shared between business and engineering.

Natan Szczepaniak

That's a very good point, actually, because you're right that the technology isn't the problem. MLOps has been around for quite a while now; if we look back, it's been around 10-ish years since the field started. The technology, the design patterns, the reference architectures are all there. If you really research how to build a robust MLOps pipeline, you will find a lot of resources from AWS, Google, Azure, all of these cloud providers, as well as all of the SaaS products that have come up in recent years. But none of them actually define the team's responsibilities. That comes up in projects when we decide who takes ownership of different parts, but it's never exact, because you don't always have the perfect team; you have to make do with who you have. And then the question of retraining people into different fields begins as well: who takes ownership, and who actually understands the technology? That becomes an interesting problem.

Theo Munoz

Yeah, really good point. And when you talk about these different responsibilities, Miguel, you brought it up too: there's a tendency to hire PhD-level data scientists while underinvesting in data engineers. Do you think it's a fair criticism to say that organizations are overinvesting in data scientists and underinvesting in data engineers, and that this is partly responsible for the gap we have? Or do you think it's more about the training each role is responsible for?

Miguel Ribeiro

It's a good question. I think it depends on the organization's structure. We've worked with tier-one banks where it's not feasible to hire a lot of AI specialists as well as big engineering teams. So you really need to standardize the whole process in a way that empowers data scientists and AI engineers to actually bring their models all the way to production, with an engineering capability always there to support; it's not that every project needs data scientists plus machine learning engineers plus DevOps and so on. So it really depends on the company's structure. Of course, investment also needs to be made in the data engineering capability, to make sure the data is of good quality and in good shape to actually be leveraged by AI models, and in the engineering capability that can build the infrastructure to bring these models to production. Companies might bring in a lot of AI skills and not enough engineering skills. You have to think about what the whole path to production should look like to really understand where investment should be made, rather than just bringing in a lot of PhDs or data scientists, a lot of people on the AI side, and then having a big gap when it comes to actually putting things into production.

Natan Szczepaniak

Yeah, I think you can also see this gap through the job titles that have been popular. Over recent years, data scientist was the hot job title to have, and it basically meant being able to get some data, create a model, run it locally, and then show it off to someone. That was the data scientist role. However, when those data scientists joined, a lot of the time they had over-indexed on the data science skills, on all of the machine learning frameworks in Python and things like that, without knowing the things engineering teams have that are needed for productionization. They don't understand DevOps, cloud, and maybe version control, things like that. Just learning those things allows you to understand the process end to end, and that's when you become a really valuable asset to the team. Of course, this varies with team size: if you have a small team, you need people with that end-to-end knowledge, but as you scale, you can have more specialized data scientists, then ML engineers, then MLOps people, and each can specialize in those skills. However, a lot of the time, data science alone is just not enough nowadays. You really need to understand cloud, a bit of deployment, a bit of development, and ideally what the product is, because that's extremely important.
Knowing why you are designing a model and what it's actually going to serve to the end user is extremely important, because you need to really know what that value is, so that when you're developing the model you have those assumptions and requirements in your head and you know what you're actually developing for the end user and for the business.

Miguel Ribeiro

Once these frameworks are put into place, a whole machine learning lifecycle framework that is clear to both the data scientists and the engineering team, they also start to understand the big picture a bit more: the whole path and what needs to be done. It also gets them curious about those skills. Like Natan said, the data scientists start to understand the DevOps pipelines, the training and inference pipelines, and how to actually promote things from dev to QA and prod.

Natan Szczepaniak

To add to that: MLOps actually involves quite a lot of things. As we said, on the surface level it's about deploying the models. However, there are a lot of small hidden things you don't realize you need to take care of until you actually try to productionize a model. You need to take care of, firstly, the data. This is where data engineers and data engineering skills are important: understanding databases, how they run, how to write queries effectively. You need to understand data versioning, so that for the data sets you're training your models on, you know how to version them correctly. Then you need to write your code cleanly, maybe using infrastructure as code, so you have a repeatable pattern for these ML models. Then you need to deploy it and monitor it. All of these things require you to understand fundamental concepts in cloud, solution architecture, and development as well.
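One of the "hidden" concerns just mentioned, data versioning, can be sketched very simply: pin a content hash of the training set so every model artifact can be traced back to the exact data that produced it. The field names and version numbers below are invented for illustration.

```python
# Sketch of data versioning via content hashing: identical rows give the same
# version, any added, removed, or edited row gives a new one.
import hashlib
import json

def dataset_version(rows):
    # canonical serialization so field ordering never changes the hash
    canonical = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()[:12]

train_v1 = [{"age": 34, "label": 1}, {"age": 51, "label": 0}]
train_v2 = train_v1 + [{"age": 29, "label": 1}]  # new rows, new version

v1, v2 = dataset_version(train_v1), dataset_version(train_v2)
assert v1 != v2

# record the pairing alongside the model artifact so lineage is queryable later
model_card = {"model_version": "1.4.0", "data_version": v1}
print(model_card)
```

Tools like DVC or lakeFS do this at scale with remote storage; the underlying idea is the same content-addressing shown here.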

Theo Munoz

I think we just talked about all the defined roles, and I think Method does a really good job with all-rounder roles, no, Natan? As data designers we do data science, data engineering, cloud, and everything in between, and we have an understanding of business and strategy as well, which can bridge the gap between all of them. But talking more specifically: corporate-wise, there are very set roles, like we said, data scientists, DevOps engineers, software developers, data engineers. Rolling back to the gap we were discussing, where does the handoff between model builders and production engineers most commonly break?

Miguel Ribeiro

It's not just at the handoff; it's also from the data engineering side, the data platforms, to the AI teams that are going to be leveraging that data. What we see a lot is these projects being developed in sandboxes with data that doesn't correspond to what you see in a live production environment, so the model is being trained on the wrong patterns from the get-go. That then spreads when you hand off to the engineering team, which is also where we see cultural challenges and a lack of ownership, and these projects just fail because eventually you run out of budget to keep sponsoring them. So it's not just between the ML teams and the software engineering and DevOps teams; there's also that gap at the beginning, between data engineering and the AI teams.

Natan Szczepaniak

Yeah, I think it's also about developing a strong foundation for MLOps and, as Miguel said, the culture and ways of working. What that foundation should be really depends on the use case and the company size. Greenfield projects and companies are much nicer, because you can build that foundation from day one. You can build the correct platforms and choose which ones you want to use: whether you want to use AWS SageMaker and run your models there, doing all your experimentation in that environment, or whether you want to use MLflow or Hugging Face, all these different software products. You get to choose that at the beginning in a greenfield company, if you're starting from scratch. However, if you are starting from existing infrastructure, it becomes a bit of a challenge, because you've already got a foundation and you need to decide whether to build on top of what you already have or start a new initiative to build an AI platform that makes this handoff easier. The handoff itself, and Miguel, let me know if you disagree, sits between the data scientists who have a model they want to put into production and the engineers on the other side, and it's that mismatch of skill sets that's usually where the gap is. But that's not the only place, because there's a lot more around investment, the actual platform you're running on, the culture, and the ways of working.

Miguel Ribeiro

No, I agree. And actually, on that specific gap between AI and then software engineering and DevOps: one of the big things, especially for heavily regulated industries, is model observability, which is not optional anymore. You definitely need the ability to trace the different model versions and the lineage of the data used to train those models, and to set up monitoring so that you either have a baseline or something appropriate to the use case the team is working on. That handoff is often not very clear: only the trained model gets deployed, then the model drifts, the business loses trust, and that's where we see things breaking in the lifecycle.
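One common way to implement the baseline-style monitoring Miguel describes is the Population Stability Index (PSI), which compares a feature's distribution at training time against what the model sees live. The bin counts and thresholds below are illustrative, not taken from any particular client setup.

```python
# Sketch of baseline-vs-live drift monitoring using the Population Stability
# Index (PSI) over binned feature counts.
import math

def psi(baseline_counts, live_counts):
    # PSI = sum over bins of (live% - base%) * ln(live% / base%)
    b_total, l_total = sum(baseline_counts), sum(live_counts)
    score = 0.0
    for b, l in zip(baseline_counts, live_counts):
        b_pct = max(b / b_total, 1e-6)   # floor avoids log(0) on empty bins
        l_pct = max(l / l_total, 1e-6)
        score += (l_pct - b_pct) * math.log(l_pct / b_pct)
    return score

baseline = [100, 300, 400, 200]   # feature distribution at training time
live     = [80, 250, 380, 290]    # distribution seen in production
score = psi(baseline, live)
# common rule of thumb: < 0.1 stable, 0.1 to 0.25 moderate shift, > 0.25 drift
print(round(score, 3))
```

A production setup would compute this per feature on a schedule and alert when the score crosses the agreed threshold; the baseline itself becomes a versioned artifact alongside the model.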

Monitoring Governance For Real World Risk

Natan Szczepaniak

Yeah, definitely. And I think that if you are a company that wants to implement an ML model, you can try to find your own way there: stand up your own data platform, try to develop and deploy models onto it. However, like Miguel said, there are a lot of things, such as model observability, explainability, and model monitoring, that need to be set up the right way from the get-go, especially in more regulated industries with compliance needs. It's really important to set that up properly, and this is why having an expert in MLOps really helps you get those things right from the start rather than finding your way there. The last thing you want is repercussions from not setting it up properly. Just because it works doesn't mean it's actually compliant with all the regulations.

Theo Munoz

And just looking back at everything we've said, what do you think healthy collaboration between production teams and data science teams looks like? Or should it be one person doing it all? What are your thoughts, and where may you have seen examples in the past?

Miguel Ribeiro

Yeah, it's actually about bringing all the capabilities together. Once you understand the actual use case you're addressing, you scope the team to include AI skills, but also DevOps and MLOps, and there are different ways of doing that. If a company is starting its AI journey with one use case, you bring one person from each capability together and find the path to production. For big banks, obviously, you can't do that for every single project, so you standardize the framework. You bring in both data engineering platforms and MLOps platforms with templates that empower data scientists to find the path to production by themselves, and the teams are there to support. So there are different ways of doing it. It really depends on the organization's structure and current skills. Based on that, they can start mapping their current lifecycle and identifying the gaps, and then either hire more engineers into the capabilities where the gaps are, or, first, actually fix the whole map, the whole workflow: understand where the gaps are, bring the capabilities together, and in that way start standardizing how they develop these kinds of projects.

Natan Szczepaniak

I think, to add to that, good cooperation between those teams happens when they start planning together from day one, before they even start the work. Part of the problem is that the data scientists work in their own world: they get handed some basic use case, they start developing, and the first contact they make is when they're ready with their model. They have their model trained, and then they hand it off; they message someone and say, "Oh, I've got this model, I would like to deploy it." However, if they actually talked to each other from the start, they would say, "Hey, I'm going to build this model, I'm going to be using this data, and I would like to deploy it over there," from day one. That's when these teams understand what each other is doing and have visibility into each other's work, meaning they can work in parallel: the data can be prepared while everything else is being set up. So communicating effectively across those silos that usually exist is the way to make sure the ML model doesn't just show up as a surprise, with everyone then wondering how to deploy it. The deployment strategies are there from the beginning, so we already know how we want to deploy it before we actually start building. Miguel, do you agree with that?

Miguel Ribeiro

Yeah, yeah, absolutely.

Theo Munoz

We talked a lot about the tools and technology behind MLOps and the different technologies that a data scientist and a production engineer need to be comfortable with. But I don't think there's a solution to this without a structural change, a change in management. So what are your thoughts on that? What does management need to do and change for this to work better?

Miguel Ribeiro

I think if an organization constantly runs into projects that keep failing, with a lot of investment being made in AI capabilities and those projects never getting used by the business, the first step is to really understand where in the cycle it's breaking. If the break is really at the handoff, you know, having models in notebooks that never get to the production account, then map exactly where the gap is. More often than not, it's going to be a lack of ownership. What we also see is companies trying to just bring in platforms, data platforms, MLOps platforms, without fixing that side of the organization's culture. That doesn't fix the gap; things just fail a bit faster. So really understanding that workflow internally, and being very honest about where the gap is and where things are failing, is very important in order to change the way they approach these models. And then, depending on the company size, if an organization is very early in its AI journey, it's really about starting with one use case, finding a path to production, building a blueprint out of that, and then reusing that pattern across different projects.

Templates That Speed Delivery Safely

Natan Szczepaniak

Yeah, definitely. And about the different services out there that help you with MLOps, that's very true. A lot of companies struggling with MLOps see all of these different products and services that promise to solve all of their problems. They start bringing on these SaaS products because they sound exciting, they sound good, maybe someone gave them a really good demo, and they try to integrate them into their workload on the promise that it's going to solve the issue. However, it's just adding another piece of software to manage, when in fact there may be something they already had that can achieve the same job. Understanding the fundamentals, a really stripped-down MLOps pipeline, and knowing what your tools are actually capable of, allows you to build that strategy going forward. And, like Miguel said, building those templates is almost an accelerator for building these ML models in one unified way. The last thing you want in an organization is different ways of deploying models. We talked about silos, but there are also different teams: in really large organizations there are different business units, and different business units sometimes have completely different technology stacks, which means they deploy models in different ways. Then there are initiatives to try to bring them all together, which is usually a bit of a pain. It's possible, of course, to unify it by picking the correct solution, but it is quite painful. So understanding which tools you want to use, which are the most cutting-edge, and which let you achieve the most is very important.
But like I said, understand the stripped-down fundamentals of how an MLOps pipeline should work, and only then pick the right tools for the job, rather than jumping at solutions just because they sound exciting. And it's very difficult to do that, because there are a lot of new innovations in the field.

Theo Munoz

You both talked about templates and standardization for speeding things up, going faster from development to production. But templates can also box in ideas, box in innovation, right? So from where you've worked in the past, what do you see as a good balance between a standardized way of productionizing models and not killing innovation and experimentation?

Miguel Ribeiro

It really depends on the company size. What we've seen in big companies is that there are usually projects that follow the same pattern, and that's how you can start building the foundations of your MLOps approach or platform. Obviously, you don't want to stop innovation. In highly regulated environments, you provide the template as guidance, as the direction. You can allow deviation, but depending on how much it deviates, there really has to be a good reason, especially in what we've seen in banking. Usually there's a lot of governance: you will have a model risk framework you need to comply with, and when you standardize how these projects are developed, you also reduce the effort of addressing all the concerns that a model risk framework raises. So it really depends on your organization and how it's regulated. You want to provide the templates as a direction, but you don't want them followed to the letter. You want to allow some deviation while still being able to manage the risk from these models when you look across all the models being developed and deployed in your organization.

Natan Szczepaniak

Yeah, templates are extremely important in actually solving the problem we talked about earlier, the handoff and the shared understanding. Even if you have a data scientist who doesn't understand deployment as deeply, a template gives them a checklist of the things that are needed. The common denominator between these templates is the model. Of course, there are different models, and there need to be different kinds of templates for different types of models. But essentially, a template is like a flight check: before you actually deploy the model, you tick off the different things you need to consider before you go there. These templates are usually in the form of infrastructure as code, so they're compatible with the data scientist's environment. They have their own local environment and know how to train a model using ML frameworks; the template is going to include that, but it's also going to have things like configuration files, deployment scripts, and versioning, so branching strategies and so on, which help the data scientist understand what they need to finish before deploying the model. A template is usually made by ML engineers or MLOps experts who understand how a model should be deployed, and it's a way to show someone: hey, this is how you're actually supposed to deploy the model. Now, of course, these templates do differ. Like you said, Theo, you can limit innovation if you just have a cookie cutter. But this is why we have templates for different types of models.
So you might have a template for a supervised use case, a template for an unsupervised one, a template for forecasting. These different models have different deployment strategies, different things to consider, and different metrics. If you provide a few different patterns, then within that space you can still innovate without breaking regulation or missing the things, like Miguel said, that the company requires to manage risk.
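To make the "flight check" idea concrete, here is a minimal sketch in Python. The required files, the metric gate, and the 0.7 AUC threshold are all illustrative assumptions, not an actual platform template:

```python
from pathlib import Path

# Hypothetical pre-deployment "flight check": verify that a model project
# built from a template contains everything the platform team requires.
REQUIRED_FILES = [
    "config.yaml",         # deployment configuration
    "deploy/pipeline.py",  # infrastructure-as-code deployment script
    "MODEL_CARD.md",       # documentation for model risk review
]

def flight_check(project_dir: str, metrics: dict) -> list:
    """Return a list of problems; an empty list means ready to deploy."""
    problems = []
    root = Path(project_dir)
    for rel in REQUIRED_FILES:
        if not (root / rel).exists():
            problems.append(f"missing required file: {rel}")
    # Templates can also enforce minimum evaluation metrics as a gate.
    if metrics.get("auc", 0.0) < 0.7:
        problems.append("AUC below the 0.7 gate set by the template")
    return problems

print(flight_check("my_model_project", {"auc": 0.82}))
```

In a real template, a check like this would run in CI so a model can only progress to the next environment when the list comes back empty.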

Miguel Ribeiro

Yeah, just to add to that. The templates also help empower data scientists on their MLOps skills and software development best practices. Things like logging, but also model versioning, checking for data bias, and so on. Those things need to be there. Obviously you don't want to stop innovation, but you also want to make sure these things are included, whether through templates or because the teams have them in mind when developing these types of projects. There's always a trade-off, and there are minimum requirements that need to be there. These templates are a way of helping bring those in.

Continuous Training When Data Changes

Natan Szczepaniak

Yeah, and I would say that if you're deploying models on cloud, the main providers, AWS, GCP, and Azure, already offer a lot of services and, specifically, patterns. For example, AWS has SageMaker JumpStart, which shows you models already being deployed, time series forecasting, XGBoost, just to get you started with understanding how that pipeline works. You can start from: I have an ML framework, I have a dataset, I need these metrics logged into model monitoring, and then see how to progress the model through the different environments. At the end, you have either an endpoint, for models served behind an API, or batch inferencing. So it really helps you understand the differences between those.

Theo Munoz

I think whenever we talk about standardization and templates for machine learning ops, a lot of people think it's just about productionizing models. That's a big part, of course, but another big part of templates and MLOps is continuous training. You have a model in production, and it needs to keep training on new data or new demographics to stay on top of things. Could you explain to us a little what continuous training looks like in MLOps and how it differs from traditional CI/CD?

Miguel Ribeiro

Yeah, this is a good example of where DevOps for software development and for AI differ. In software development, you commit your changes and that triggers some pipelines. For AI, you also need to take data changes into consideration: when drift is detected, when new labeled data comes in, when the model starts degrading and performance is no longer as good. You include these things as part of the pipelines, to trigger warnings or bring in a human to review these models before they go live. I think that's the main difference between MLOps and DevOps.

Natan Szczepaniak

Yeah, everyone always wants a model that gets better over time. That's where the need comes from, and that's where retraining and continuous training come in. You have different approaches: there's online training, where you basically train the model as data comes in. But like Miguel said, the data that's actually coming in becomes very important, the data drift. As you continuously retrain, before you train you need to understand: has the data shifted, has it drifted or not? And then when you train the model, is the model drifting in a certain direction as well?
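One common way to gate a continuous-training pipeline on the "has the data shifted?" question is the Population Stability Index. A minimal sketch, assuming a single numeric feature and the common rule-of-thumb threshold of 0.2 for significant drift:

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a baseline sample and a recent sample of one feature."""
    # Bin edges come from the training-time (expected) distribution's quantiles.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e_counts, _ = np.histogram(expected, edges)
    a_counts, _ = np.histogram(np.clip(actual, edges[0], edges[-1]), edges)
    # Clip fractions away from zero so the log term stays finite.
    e_frac = np.clip(e_counts / len(expected), 1e-6, None)
    a_frac = np.clip(a_counts / len(actual), 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

def should_retrain(expected: np.ndarray, actual: np.ndarray) -> bool:
    return population_stability_index(expected, actual) > 0.2

rng = np.random.default_rng(0)
baseline = rng.normal(size=5000)   # feature distribution at training time
drifted = baseline + 0.8           # same feature after a mean shift

print(should_retrain(baseline, baseline))  # no drift detected
print(should_retrain(baseline, drifted))   # drift detected, trigger retraining
```

In practice this check would run per feature on recent production traffic, and a flagged drift would trigger a retraining job or, as Miguel says, a warning that brings a human in to review before anything goes live.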

Data Foundations And Costly Adoption Mistakes

Theo Munoz

Just going back to the bridge we were talking about, between machine learning models in notebooks and in production. I was curious, with the industry experience you both have, what is the biggest adoption mistake you have seen people make, and what did it cost the client?

Miguel Ribeiro

I've seen it when one organization tried to migrate everything at once rather than starting with one specific use case. Starting with multiple projects means people were doing things their own way, without communication between projects and without actually trying to reuse assets and pipelines. That becomes quite costly: not standardizing how you develop your projects. Obviously this depends on where the organization is in its AI journey, and there are always projects happening in parallel. But when you notice there's a flaw in the model lifecycle and you try to fix everything at once, that usually leads to a lot of investment without actually fixing the issue. You're just creating new issues.

Theo Munoz

And I can see that being a common issue in large corporations, right? If you're an old corporation, you probably started when AI was not a thing, and now that it is, trying to change everything at once to make it work is a huge issue. Nathan, you were gonna say?

Natan Szczepaniak

Yes, absolutely. I'd say that trying to tackle too many models at once is a big one. Sometimes, coming out of discovery, companies have a lot of ideas. They have many different use cases across different views, they want to implement loads of different models at once, and they say they're going to build a platform to host all of them. The best approach is to pick one of those models, prioritize the one that's actually going to provide the most value in production, and let that build the path for the rest. Make sure you build one properly first, understand the whole pipeline and how it should work within your organization, and let that be the golden standard for the rest. And if something goes wrong, you can take the learnings from that into deploying the other models as well.

Miguel Ribeiro

Yeah, I actually have another one, and I probably should have started with it: organizations starting to develop projects without fixing their data foundations first. More often than not, whether it's a big or small organization, we start a project, we look at the data, and either the quality is bad, or there's a lot of missing data, or there are different databases where tables or columns have different meanings even though they share the same name. We've also seen companies that grew through acquisitions and don't have a standardized schema across their databases. So, really enabling the data foundations: ensuring you have a team looking at the data, keeping it at the highest quality and in good shape to be leveraged, and not just for AI projects, but also for business analytics, for dashboards that inform the business on whatever insights you're drawing from the data, and recently for Gen AI projects too. What we've seen a lot is data that's simply not in good shape.

Theo Munoz

I'm sure you both know there's a common saying in machine learning: bad data in, bad product out. If you don't have good data foundations, no matter how sophisticated your machine learning model is, it's still going to be bad if the data coming in is bad.

Miguel Ribeiro

Especially if you're trying to drive adoption. Say you've put an MLOps framework in place and now you're trying to drive adoption across the organization. If the data quality is bad, eventually the business loses trust, and then all the investment was for nothing.

Guardrails Privacy And The Gen AI Shift

Natan Szczepaniak

I would agree with that, but I would add a caveat, because a lot of the time people struggle with this. Everyone will be familiar with: I want to build this, I don't have the correct data. In a lot of organizations there are many different initiatives happening at once. There are initiatives to innovate in the AI and ML space, and initiatives to do data cleanup at the same time. I've seen companies where ML is being slowed down by the quality of the data, and the team can't get buy-in from the business, even though the work has a lot of potential for revenue gain and real value for the business and its users. That's a big problem: people are trying to push ML models forward, but they can't, because the data isn't there to train on. You have an idea, you know how to train the model, but the data isn't quite there to show off its full capability. This is where synthetic data can come in. It doesn't solve the problem, but it helps you understand what the value would be. You can create synthetic data sets and build ML models against them to see what they could look like once the data cleanup initiative is done, so you can validate that the model will provide value. In fact, it also helps you see how you need to clean up the data and what's actually wrong with it: is it missing values, are the schemas incorrect? There are many different things you can look at.
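The synthetic-data approach can be as simple as generating a dataset shaped like the one you're waiting on and fitting a candidate model against it. A minimal sketch using scikit-learn; the sample size, feature counts, and model choice are all illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Generate a synthetic dataset resembling the one the cleanup initiative
# will eventually deliver, then validate the modeling pipeline against it.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"accuracy on synthetic hold-out: {accuracy:.2f}")
```

The score itself is not the point; the point is that the pipeline, metrics, and deployment path are proven out, and the business can see what the model would deliver once the real data is in shape.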

Theo Munoz

Miguel just mentioned trust, and that leads well into another subject I wanted to talk about: ethics and guardrails. Ethics and guardrails are becoming such a big thing right now because of Gen AI: what are we giving to Gen AI? Personal information can end up going into it. But ethics and guardrails are also super important in ML because of compliance, as you said, in regulated industries such as banking, data privacy, et cetera. We see many ML platforms delay this checkbox, so to speak, and leave ethics and guardrails to the end. What are the implications of doing that, and how do you work on them throughout the project rather than just at the end?

Miguel Ribeiro

Yeah, ethics shouldn't come only at the end of the project. In banking we don't see this as often, because there's maturity around being compliant, not just with regulation, but also looking out for users and customers, making sure the models aren't taking decisions that become biased by the data. You bring these controls in early in the lifecycle, as part of data quality monitoring, making sure you're monitoring the data for bias. In heavily regulated industries, where we see mature frameworks in place, you have three lines of defense when you develop and maintain these models: the model developer, then an independent validation team, and then internal audit. Having those three lines of defense really helps make sure you don't miss it when your models are making biased decisions, so that you're not just compliant with regulation, but also not being unethical and making the wrong decisions.
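A first-line-of-defense bias check can be very simple. The sketch below computes a demographic parity gap, the difference in positive-prediction rate across a protected attribute; the data, the binary group encoding, and the 0.2 tolerance are illustrative assumptions, not a regulatory standard:

```python
import numpy as np

def demographic_parity_gap(predictions: np.ndarray,
                           group: np.ndarray) -> float:
    """Absolute difference in positive-prediction rate between two groups."""
    rate_a = predictions[group == 0].mean()
    rate_b = predictions[group == 1].mean()
    return abs(float(rate_a) - float(rate_b))

# Toy batch of model decisions (1 = approved) and a protected attribute.
preds = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
groups = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

gap = demographic_parity_gap(preds, groups)
print(f"parity gap: {gap:.2f}")
if gap > 0.2:  # illustrative tolerance
    print("flag for independent validation review")
```

A check like this running in the data quality monitoring layer is what lets the first line of defense flag a model before the independent validation team or internal audit ever has to catch it.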

Natan Szczepaniak

Yeah, and there are real consequences when ML models go the wrong way. There's a specific example from a health insurance company in the US that was using an ML model to accept or deny claims. Over a period of time, a lot of people were getting claims denied by the model itself, which was being run without much testing or a human in the loop. That became a big case: the business had to pay fines and obviously suffered reputational damage as well. That's why it's so important to avoid this. First, you need explainability. Explainability is good, but it doesn't solve the problem on its own. You also need to keep testing these models constantly, and to have a human in the loop for the important decisions that actually affect people's day-to-day lives.

Theo Munoz

Yeah. And how do you see the regulatory landscape changing over the next couple of years with Gen AI?

Natan Szczepaniak

I think people are very trusting of Gen AI assistants and agents nowadays. A lot of people see the benefit Gen AI can provide for any kind of problem they have, and they're very happy to put data into ChatGPT, Claude, or any other assistant. At the same time, we're seeing a lot of vulnerability and security concerns coming out of that field. I think it's heading toward a concerning place where people really trust these companies with a lot of their data without understanding that the data is also being used in multiple ways, such as retraining. No one fully reads the terms and conditions covering how their data is used, but if you do read them, that data is in some cases being used for retraining, and it can also leak. Anything you send to an AI model, specifically an LLM, will be tracked. This is one difference between ML and Gen AI, and it's interesting: LLMs are very large, computationally intensive models that you would very rarely run yourself. There are some industries where you would run your own LLM or Gen AI model, but mostly these queries are sent to the providers simply because of the amount of compute that's needed. And because we have to send queries through these APIs to these companies, people will put sensitive information into them, which can then have repercussions.
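One guardrail against exactly this is client-side redaction: stripping obvious identifiers before a prompt ever leaves your infrastructure. A minimal sketch; the two patterns are illustrative and nowhere near an exhaustive PII filter:

```python
import re

# Redact obvious identifiers before sending a prompt to a hosted LLM API.
# These patterns are illustrative; production systems use far richer
# PII-detection tooling.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(prompt: str) -> str:
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

print(redact("Contact jane.doe@example.com, SSN 123-45-6789."))
# -> Contact [EMAIL], SSN [SSN].
```

The same pattern extends to account numbers, names, or health identifiers, and it sits naturally in the orchestration layer that Natan describes later, the one piece of the stack you do own when the model itself is behind someone else's API.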

Miguel Ribeiro

Having worked at a regulator in the UK, I can see that with the pace the technology is evolving, it's going to be very challenging to capture and start regulating Gen AI specifically. For AI there are already regulations in place, but given the pace of change, it's really important for organizations to put the right monitoring and guardrails in place to make sure they don't expose private information. There are a lot of learnings from traditional AI that people can bring when they start developing Gen AI projects. Regulation is going to struggle to keep up with the pace the technology is evolving at, and that's why it's going to be so important for companies to build their projects with model governance and security, keeping their data safe from the get-go.

Natan Szczepaniak

Absolutely, I fully agree with that. The pace at which things are moving, and the speed at which people want to deploy things right now, is why people may cut corners and create these security issues. But I agree that the new field of deploying Gen AI solutions has a lot to learn from MLOps, which has been around for a while now, in terms of model monitoring and explainability, even though those will be quite difficult. And you see it in a lot of these services. From personal experience, I was working in a highly regulated industry with a company that wanted to keep all of their data within their own infrastructure. It was still the cloud, it was still AWS, but they wanted to keep all of their information inside their own environment, and they wanted an LLM use case. So they were choosing: do we host our own LLM, to make sure nothing leaves our environment, or do we just go with something like Bedrock? At the time, Bedrock wasn't HIPAA compliant, and for their specific use case they had to be, because they handled private health information; Bedrock and the other APIs of the time weren't supporting it. So we had to go and try hosting the LLM ourselves, and it worked, but it ended up being extremely expensive. Then, a few months after that, AWS Bedrock became HIPAA and SOC 2 compliant. So you're seeing a lot of these services understand the pain that existed before with ML, and make sure the services actually being used are within the right regulations and have the right level of visibility inside them.

Miguel Ribeiro

Yeah, I hope we get to organize another podcast just for Gen AI, because there's definitely a lot of traction around Gen AI, LLMs, and agentic AI, and what we've noticed is that it has opened new doors. And when we start speaking with our customers about the use cases they think Gen AI is a good fit for, and we look deeper into the project, it's often actually a straightforward machine learning approach that would solve the issue. That's really why we're so focused on this MLOps topic, back in traditional AI.

Theo Munoz

Grounding us back to the gap between machine learning in notebooks and in production, do you think Gen AI is making that gap larger or smaller?

Miguel Ribeiro

It's opening new doors: teams or business units that wouldn't necessarily have been open to AI now want to explore Gen AI. For a lot of teams, productizing AI or Gen AI projects is going to be quite new. It's going to create a lot more opportunities to bring in MLOps, but also to adapt it into LLMOps and make sure these companies develop these solutions safely.

Natan Szczepaniak

Yeah, I agree with that. And the thing you just said, about companies going straight to LLMs and Gen AI when the problem could actually be solved by a traditional ML solution, it's the hammer and nail problem. When you have a hammer, everything's a nail, but it might be a screw. You can hammer it in, but you'll be hammering for a long time and it's going to be very painful. ML still has a lot of use cases, and I think when the Gen AI hype starts to die down a little, people will realize that ML models are very useful for many solutions. They're also cheaper to run, more manageable, and more explainable. We've been talking about regulated industries quite a lot, and ML models are simply easier to manage from a regulation standpoint too. But actually deploying Gen AI solutions is a slightly different beast, and like Miguel said, it's evolving into the field of LLMOps. Because the name sounds similar, you might think it's very similar to MLOps, and it does share some similarities, but it's quite different. For the most part, you don't actually own the foundational models; you're calling APIs and using models from providers like OpenAI, Anthropic, and others. So it becomes more about orchestration than about seeing what the model does and being able to explain it. Some companies are better than others at showing what the model is doing, and there are ways of developing robust agentic pipelines with things like LangChain, LangGraph, and LangSmith.
A lot of it resembles what has been happening in MLOps: you can see people prompting, you can fine-tune different models, maybe connect them to a RAG system, or create a ReAct agent. There are many different ways. And although the two fields share similarities, they shouldn't be treated as the same, because they are quite different. Miguel, I wonder what you think about that.

Miguel Ribeiro

Yeah, I mean, LLMs bring new failure modes, right? Prompt injection, hallucinations, latency, costs. And LLM outputs have a probabilistic nature, so you'll never get the same answer twice: the essence will be the same, but maybe articulated differently. Those new failure modes are what need to be addressed as part of risk control, and as part of the frameworks for developing these projects safely.

Natan Szczepaniak

Definitely. And another point about using Gen AI to solve problems that ML can solve: I've seen examples where people think they can solve a problem like forecasting or clustering with Gen AI. They have a dataset and they want to do some serious forecasting, and obviously the easiest way is going to an LLM with some tooling and asking: could you do a forecast for the next 30 days? What you'll get back, which is quite interesting, may actually look like a forecast. But it's important to understand that it's not the LLM doing the forecast itself. A lot of the time, what an LLM or Gen AI assistant will do is take that dataset, write some code in the background, probably in Python, to build some kind of forecasting model, visualize it nicely, and return it to you. The average user may think the agent is doing the forecasting, but it's actually happening behind the scenes with specific models. That's a very valuable thing to have, of course, and being able to connect your AI agent to things like Hugging Face, which is a great source for ML models, means it can use those models to achieve your needs. But for production, for reliable business operations that need to behave the same way every time, so you can actually track them and build them into a real process, it's more reliable to have a dedicated ML team that makes sure the model is consistent and good, and not different every single time.
Because if everyone is doing forecasts in different ways using AI agents, those forecasts will be built very differently, without accounting for a lot of the things you need to account for when you actually train models, and you'll end up with a very disparate understanding of your business.
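The consistency argument can be sketched very simply: a pinned, versioned forecasting function always produces the same output for the same input, which is exactly what ad-hoc LLM-generated code does not guarantee. The seasonal-naive method, the 7-day season length, and the toy data below are illustrative choices:

```python
import numpy as np

def seasonal_naive_forecast(history: np.ndarray, horizon: int,
                            season: int = 7) -> np.ndarray:
    """Forecast each future step with the value from one season earlier."""
    last_season = history[-season:]
    reps = int(np.ceil(horizon / season))
    return np.tile(last_season, reps)[:horizon]

# Four identical weeks of toy daily sales.
daily_sales = np.array([10, 12, 11, 14, 18, 25, 22] * 4, dtype=float)
forecast = seasonal_naive_forecast(daily_sales, horizon=10)
print(forecast)  # repeats the last observed week
```

Trivial as it is, a function like this can be versioned, tested, and monitored like any other pipeline component, so every team's "forecast for the next 30 days" means the same thing, which is the property Natan is arguing for.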

MLOps Is Not Optional

Theo Munoz

That's very interesting, Nathan, thanks for bringing that up. As we wrap up, I'm going to ask you both one quick question. What is the one thing you wish organizations understood about MLOps before they start their AI journey?

Miguel Ribeiro

Yeah, MLOps is no longer optional. It should be the foundation of how ML and AI projects are developed.

Natan Szczepaniak

I agree. MLOps is not optional now. It's so easy to build these MLOps pipelines with best practices built into them that there's basically no excuse for not using them. The other big one is making sure to build teams in ways that actually support your needs: hire the right people with the right skill sets, and then give them the correct infrastructure to do the job.

Theo Munoz

Awesome. We've covered a lot on the podcast today. We started with what MLOps means, tech stacks, culture, and worked our way into Gen AI, as everything seems to right now. But the bottom line, I think, and I hope Nathan and Miguel agree with me, is that the technical side is rarely the problem. The handoffs, the processes, the bureaucracy, and the habits are what sink it, basically. Do you agree? Yeah, absolutely.

Miguel Ribeiro

What we've seen, what we've experienced, what we've witnessed is that more often than not it's a cultural or organizational challenge that organizations need to address first.

Natan Szczepaniak

Yeah, I agree. And it's also about the responsibility people take for those models, so that if anything goes wrong, you actually know who to go to.

Theo Munoz

All right. Well, thank you to our audience for listening. Thank you, Nathan and Miguel, for joining. And stay tuned for another episode of Build What's Next.

Josh Lucas

Thank you for joining us on Build What's Next Digital Product Perspectives. If you would like to know more about how Method can partner with you and your organization, you can find more information at method.com. Also, don't forget to follow us on social and be sure to check out our monthly tech talks. You can find those on our website, and finally, make sure to subscribe to the podcast so you don't miss out on any future episodes. We'll see you next time.