Real World Serverless with theburningmonk

#28: Serverless Machine Learning with Carl Osipov

September 09, 2020 Yan Cui Season 1 Episode 28

You can find Carl on LinkedIn here and he blogs at cloudswithcarl.com.

You can find his upcoming book "Cloud Native Machine Learning" by Manning here.

Manning has kindly offered 40% off all their products to the listeners of this podcast. Use the promo code podrealserv20 during check out.

For more stories about real-world use of serverless technologies, please follow us on Twitter as @RealWorldSls and subscribe to this podcast.

This episode is sponsored by ChaosSearch.

Have you heard about ChaosSearch? It’s the fully managed log analytics platform that uses your Amazon S3 storage as the data store! Companies like Armor, HubSpot, Alert Logic and many more are already using ChaosSearch as a critical part of their infrastructure and processing terabytes of log data every day.  Because ChaosSearch uses your Amazon S3 storage, there’s no moving data around, no data retention limits and you can save up to 80% vs other methods of log analysis.  So if you’re sick and tired of your ELK Stack falling over, or having your data retention squeezed by increasing costs, then visit ChaosSearch.io today and join the log analysis revolution!

Opening theme song:
Cheery Monday by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3495-cheery-monday
License: http://creativecommons.org/licenses/by/4.0

Yan Cui: 00:10  

Hi, welcome back to another episode of Real World Serverless, a podcast where I speak with real world practitioners and get their stories from the trenches. Today I'm joined by Carl Osipov from CounterFactual. Hi, welcome to the show. 

Carl Osipov: 00:27  

Hi Yan, thank you for hosting me. 

Yan Cui: 00:29  

So, before we start talking about serverless and machine learning and how the two complement each other, can you maybe just tell the audience a bit about yourself, your experiences leading up to this point, and about CounterFactual as well?

Carl Osipov: 00:43 

Absolutely. Yeah, so I started in the information technology industry back in 2001. So, way over 15 years now. In terms of the industry, I have been working as a software engineer, and I have focused much of my engineering career on distributed systems. In particular, these were distributed systems that processed massive amounts of data for the time, using technologies like Hadoop, originally. More recently, I have been focusing on building systems, and in particular machine learning systems, that use cloud-based and serverless technologies. So that's more of my industry expertise. In parallel with that, I managed to get both my undergraduate and graduate degrees in computer science. Both of them focused on different aspects of machine learning. I did my undergraduate degree with a concentration in artificial intelligence, and then more recently I went back to school and picked up a master's degree focusing on machine learning in particular. So another way that I like to think of myself is as an applied computer scientist who escaped academia. Over my career I spent most of my industry work at IBM, in IBM software, and more recently helping IBM focus on technologies like Docker and OpenWhisk for serverless. After that I spent about two years at Google, where I really learned a lot about applying machine learning and building machine learning systems. More recently I co-founded counterfactual.ai with a goal to help companies deploy machine learning solutions, and do it in a way that minimises the operational costs of machine learning systems by leveraging serverless. So right now I'm the CTO at counterfactual.ai.


Yan Cui: 02:48  

So can you maybe tell us a bit about CounterFactual and what you guys actually do on a day-to-day basis? Because I know you said that you focus on training, but you also work on projects with clients, applying machine learning to achieve some business goals for the customers, right?


Carl Osipov: 03:07  

At CounterFactual, we're a small consultancy, so we can only focus on a few things. Really we only do three things: consulting, training and customer-driven research. Now, you're specifically asking about the consulting side of the company. From the consulting standpoint, if you try to put it into a day-in-the-life scenario, we come into an environment where you have teams built from very different categories of engineers. So I'm seeing customers try to build out teams of data scientists and try to get those teams working together with mobile app developers or with web developers. When I come into a conversation with a company like that, much of what I do is about translating the conversations of data scientists, and what data scientists are trying to do, into language that somebody who is developing a mobile application can understand and appreciate, and helping the company get these disparate teams to row in the same direction, towards the same set of business objectives, as you suggested. Usually what this means is: how can a company deploy a data science model or machine learning model into production in a way that increases customer satisfaction, or improves Net Promoter Score for a particular company's web application, for example, or mobile application?


Yan Cui: 04:43  

Yes. So, with machine learning, what I find is that there are a lot of buzzwords and a lot of hype out there, a lot of things getting branded with machine learning or AI, but a lot of the time they're just really simple, you know, code that's running some lookup. Can you maybe give us some concrete examples of projects that you've delivered with customers that produced a tangible increase in some business KPI?


Carl Osipov: 05:11 

Absolutely. So, let me describe a project that actually inspired a book that I'm working on. I have launched a book with Manning Publications. The title of the book is Serverless Machine Learning. As part of that book, the reader actually goes through a project based on one that my company has done on improving the ETA, the estimated time of arrival, predictions for a food delivery company. Specifically, my company has worked with this organisation that had an existing mobile app, and as part of that mobile app they were helping customers understand when their food delivery would arrive. Unexpectedly, this use case became more important now in this post-COVID type economic environment. What the company was trying to achieve was to improve its metrics around recommendations in the App Store, and also general customer satisfaction. By helping this company deploy a serverless machine learning solution, we helped them improve their ETA predictions, and that translated over time into higher customer satisfaction with the mobile application, as evidenced by the App Store ratings, and also helped them improve the sentiment score of the comments in the mobile app store. And this particular machine learning application was serverless. So, if you want to talk about buzzwords, it definitely may sound like buzzword overdrive. But I think it's also valuable to understand what we mean when we marry the words serverless and machine learning together. And, you know, this is something I'd be happy to discuss in more detail as well. 


Yan Cui: 07:10

That particular use case does sound very, very useful, especially nowadays. I mean, personally I've ordered so many takeaways in the last couple of months, and quite a few times the delivery times were off. It was quite annoying that you get your stuff ready for dinner, and then the food isn't coming for another half an hour because of a miscalculated ETA for the delivery. So, having a better ETA for the delivery of food does massively help the customer experience in this particular area. For anyone who's interested in your book, Manning has also given us a 40% discount code, which you can find in the show notes. There are also five free copies available. More details on how to get those copies are available in the show notes, so please contact me afterwards. You also talked about serverless and machine learning being this mashup of buzzwords. Tell us a bit about why serverless. Why is that a good fit with machine learning? 


Carl Osipov: 08:12 

Absolutely. So, there's a well-known research paper that was published by a group of authors at Google. The lead author on that research paper is D. Sculley (http://www.eecs.tufts.edu/~dsculley/), and the paper is titled “Hidden Technical Debt in Machine Learning Systems” (https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf). One of the findings of that research paper is that machine learning systems that are actually put into production end up being only about 5% machine learning code. The rest is infrastructure. And guess what, if we're talking about a system that's built from 95% infrastructure, the next question is: what are the operational costs for that infrastructure? In the book, I talk about this fact, and I explain that serverless provides a very compelling alternative for organisations and teams that are putting machine learning systems in production. Instead of trying to use technologies that require operational overhead to manage that 95% of infrastructure, the 95% that I call the machine learning platform code, companies can adopt the serverless approach. The serverless approach allows companies to reduce the operational overhead of machine learning to close to zero. At the same time, it allows companies to enable their data scientists, their machine learning practitioners, to really focus on the 5% that differentiates. For example, in the case of the project that I described, the 5% was the code that helped improve the ETA and the delivery times. This code is what really makes your machine learning system different. This is what really makes your machine learning system stand out in the marketplace. And serverless comes in because it allows a practitioner to focus on creating that code and leave the operations of the rest of the machine learning platform to a public cloud or to highly automated infrastructure.


Yan Cui: 10:24

Yeah, from my personal experience of working with some data science teams as well that sometimes these data scientists, or data science engineers are also not the best infrastructure engineers. So oftentimes they have spent a lot of time collaborating and figuring out what from the Ops point of view, what does this data science team actually need and vice versa and how does the data science team communicate with the infrastructure and Ops team in terms of what their requirements are, as well. Being able to offload all of this work to your cloud provider is massively massively helpful and helps accelerate a lot of this process. So for this ETA improvement project, can we also just dive into and talk about what the system actually looks like from the architecture side of things. I mean, what does your architecture look like for this particular project?


Carl Osipov: 11:12 

For the ETA prediction, we used a variety of serverless capabilities from AWS. Now, let me take a step back and be more precise about how I define serverless, and a little bit later I'll talk more precisely about what I mean by machine learning and machine learning code. If you think about serverless today, much of serverless is about functions as a service. And this definition of serverless made a lot of sense, for example, back in the 2016 timeframe when I was working on technologies like OpenWhisk. But even then, it was clear that that point of view on serverless is limiting. The limitations of the functions-as-a-service point of view on serverless are obvious when you start thinking about situations where you begin to run out of capacity. So what happens if you write programmes that require more memory than is available in the runtime environment that's actually executing your function? What happens when you run out of disk space? Or even, what happens when you try to use a programming language that's different from the ones supported by a cloud provider? I think one of the technologies that became more popular for serverless over time is Docker, which allows you to package your code together with an existing middleware environment and simply focus on invoking Docker containers in cloud provider infrastructure.
So here, when I talk about serverless, I use the more recent definition of the word. When I describe serverless features from AWS, I don't just mean AWS Lambda. I also mean serverless capabilities such as S3 object storage, or the Glue service for building and processing data pipelines. If you think about serverless this way, it's more about helping the developer forget about the servers that exist in the cloud provider's environment and focus on writing the code, whether that code is designed to run in a functions-as-a-service environment like AWS Lambda, in a serverless environment like AWS Glue for data pipelines, or is processing data stored in serverless object storage like S3.
So, given this description of serverless, what we've done for the ETA project was to build out an infrastructure that covered both training and inference in a machine learning pipeline. If the listeners don't have a background in machine learning, the analogy is that compiling code is to machine learning model training as running the code at runtime is to doing inference in a machine learning system. Of course, this is an analogy, and it only holds to a limited extent. But what we've done for the ETA estimation was to make sure that we can support both training and inference using serverless technologies.
So, to start answering your question about the infrastructure: it started by storing the data in S3 object storage. The data for the project was in the format known as CSV, “comma-separated values”, and these CSV objects were processed using AWS Glue. AWS Glue allows you to implement a data processing pipeline using a combination of Python and SQL. By combining these two languages it is possible to do common machine learning data preparation tasks: for example, take the data, eliminate missing data values, clean up the dataset, and then save that data into a dedicated object storage bucket, so that it can be picked up and used to train a machine learning model that can be deployed in production.
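
As a rough illustration of the data preparation step described above (dropping rows with missing values before the data is saved for training), here is a minimal sketch in plain Python. It is not the project's Glue job and does not use the Glue or S3 APIs; the column names and sample rows are hypothetical.

```python
import csv
import io


def drop_incomplete_rows(csv_text):
    """Parse CSV text and keep only rows where every column has a value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    clean = []
    for row in reader:
        # Short rows are padded with None by DictReader; blank cells are "".
        if all(row.get(col) is not None and row[col].strip() != "" for col in header):
            clean.append({col: row[col] for col in header})
    return header, clean


def to_csv(header, rows):
    """Serialise the cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical delivery records; two rows have a missing value.
RAW = """distance_km,prep_minutes,delivery_minutes
2.0,10,22
,12,30
5.5,,41
3.0,15,31
"""

header, rows = drop_incomplete_rows(RAW)
CLEANED = to_csv(header, rows)
print(CLEANED)
```

In a setup like the one described, the cleaned output would be written to a separate bucket for the training job to read; that part is left out here.
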
The training part of the project was done using AWS SageMaker. And finally, once the model was trained, it was put into production, and in that case we used a different technology from AWS, known as Fargate. The listeners of this podcast are probably familiar with Elastic Container Service (ECS) on AWS. Fargate is a set of capabilities for ECS that provide a serverless experience for working with the ECS service. So the machine learning model was deployed inside of a Docker container managed by AWS ECS.
And if you're wondering, did we use any Lambda at all? Yes, we did use AWS Lambda. I believe there's a very good use case for functions as a service, and AWS Lambda in particular, if you're working on a data science project. For many machine learning projects, it's important to establish what are known as weak and strong baselines. For the listeners who have a software engineering background, you can think of these baselines almost like mock implementations of services or functions. They're designed not necessarily to give production-quality results, but more for confirming that a system can work in production. So for example, in machine learning it's very common to build a weak baseline using the approach that you described, Yan, a little bit earlier, where you have a very simple implementation. For example, a very simple implementation for ETA prediction is a model known as linear regression. That would be something that I'm describing as a weak baseline for a machine learning model. And with AWS, it's very easy to take this weak baseline implemented as a linear regression model, package it inside of an AWS Lambda function, put it in production, and have something running in a matter of minutes.
Of course, this is very valuable for a machine learning project, because it shortens the time to value. It gives the team of machine learning practitioners an opportunity to put a very basic machine learning API into production sooner rather than later, and immediately start collecting data on how well this baseline is working. The idea of a machine learning project is to iterate over time: improve on this weak baseline and measure the improvement. For example, in the case of our project, we were able to take this weak baseline created using simple linear regression and then over time build a collection of models that improved on that baseline by over 70%. 
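
To make the weak-baseline idea concrete, here is a minimal sketch of a one-feature linear regression fitted with ordinary least squares and wrapped in a function shaped like an AWS Lambda Python handler (`handler(event, context)`). The training points, the `distance_km` field and the response shape are assumptions for illustration, not details of the project discussed; a real baseline would more likely be trained with a library such as scikit-learn.

```python
import json

# Hypothetical training data: (distance in km, observed delivery minutes).
TRAINING = [(1.0, 15.0), (2.0, 20.0), (3.0, 25.0), (4.0, 30.0)]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Fit once at import time so warm invocations reuse the coefficients.
SLOPE, INTERCEPT = fit_line(TRAINING)


def handler(event, context):
    """Lambda-style entry point: returns an ETA for the given distance."""
    distance = float(event["distance_km"])
    eta = SLOPE * distance + INTERCEPT
    return {"statusCode": 200, "body": json.dumps({"eta_minutes": round(eta, 1)})}


def mean_absolute_error(predictions, actuals):
    """Average absolute gap between predicted and observed minutes."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


def relative_improvement(baseline_error, new_error):
    """Fraction by which a newer model reduces the baseline's error."""
    return (baseline_error - new_error) / baseline_error


response = handler({"distance_km": 5}, None)
print(response["body"])
```

The last helper is one way to read a figure like "improved on the baseline by over 70%": with hypothetical errors of 10 minutes for the baseline and 3 minutes for a later model, `relative_improvement(10.0, 3.0)` comes out at about 0.7. Which error metric the project actually used is not stated in the episode.
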


Yan Cui: 18:43

Thank you. That's great. I think when you said that Docker is serverless, the audience might have just rolled their eyes. But I think there's definitely a distinction there: serverless is much bigger than just functions as a service, and has been for a long time now. I know quite a few people in the serverless community have been talking about serverless being a spectrum, from functions-as-a-service solutions like Lambda all the way to other services that you just use, like S3 or DynamoDB. And certainly, when you talk about Docker being serverless, one of the things that you hear a lot nowadays is that Kubernetes is serverless, even though you have all this extra abstraction layer that you have to deal with and you still have to manage your own EC2 cluster under the hood. That really doesn't give you a lot of the benefits you're supposed to get from serverless, in terms of not having to worry about the underlying infrastructure and servers and being responsible for them. But I'm a big fan of Fargate, and I do think Fargate gives me a nice way out when I run into the limitations of Lambda. For machine learning, I guess, the space limit of 512 MB is quite limiting. And I do see a lot of companies running their machine learning stuff on Fargate instead, or using SageMaker. So, with the current state of the technologies available to us, what would you say are some of the biggest challenges when it comes to serverless and machine learning?


Carl Osipov: 20:15

Great question. So, I think you've pointed out the exact nature of the debate. I think as a community that uses serverless technology, we need to become better at defining what exactly serverless is. I think defining serverless purely as functions as a service definitely limits the potential of this technology. I think we need to be very clear about the distinction between serverless and Platform as a Service, because Platform as a Service has a very strong technical foundation. In fact, Platform as a Service is one of the formalised service models in the NIST (National Institute of Standards and Technology) definition of cloud computing. So I think, for serverless to mature, it is important to come up with a more precise and rigorous definition of what serverless is. And I agree with you: I think the crux of that definition has to come down to the operational side of serverless. What's great about serverless, and how I think about this technology in terms of the challenges that it opens up in the future, is that if we think about serverless as a way of improving the productivity of developers by automating as much of operations as possible, we as a community are going to be able to broaden the impact of serverless technologies on the world. It's important to recognise that serverless does not exist in a vacuum in the information technology industry. While some developers are adopting serverless, there are many startups that are using a NoOps or no-code approach to building applications and systems. So, one of the challenges that we should be recognising is that serverless has to compete against these no-code, NoOps types of implementations. On the other side of the spectrum, another challenge is understanding technologies based on Kubernetes and the technologies that you described earlier, based on Docker.
So I think that if we are able to define this value proposition of serverless for developers, we're going to be able to grow the serverless community. We're going to be able to grow interest in serverless, and also help serverless adopt technologies like machine learning, so that developers can build better, more popular, more consumable systems for our users. Ultimately, I think that's where the success of serverless is going to lie.


Yan Cui: 23:09  

Yeah, personally, I guess I've been fighting that battle over the definition of what it means when you say serverless for quite a while now. And personally, I guess I'm feeling it less and less. I actually feel that the precise definition is not that important. What's important is understanding the value proposition, as you said, that we want from serverless technologies: that we should just build applications, focusing on the code that actually differentiates our business, as opposed to the underlying infrastructure, because that's not what is important to our customers. And for that, if you think about no-code solutions, I do think those should also be considered serverless because, again, I don't have to worry about managing servers, provisioning servers and scaling them. All of that is not my responsibility. I just focus on the things that are most important to my customers, which could be just that they want to have a sign-in page. It doesn't matter whether or not I have to write some code to support that. It's about the business value that we get from the technology, rather than the specific names or labels that you can put on the technologies that we use. So in this case, for serverless and machine learning, what do you think would be the next logical step? What's the next big thing to come out of this? 


Carl Osipov: 24:43

I think what's happening in the industry is that companies are recognising that there's a trap, known as MLOps. When I say MLOps, that stands for machine learning operations, and the obvious analogy is to DevOps. With MLOps, the problem for companies is that the machine learning practitioners that teams and organisations have worked hard to hire and make productive, once they have deployed a machine learning system into production, are falling into the MLOps trap, where they're spending most of their time tending to operations tasks instead of actually moving on to new machine learning projects. So this is about highly compensated machine learning practitioners who are working on fairly routine operations problems instead of working on what they do best, which is building new machine learning models. What I think companies are recognising is that, going forward, to avoid the trap of MLOps it's important to shift machine learning practitioners, and the development of machine learning systems, into a mode where serverless is more broadly adopted. And here I think the premise of serverless that you've just outlined, the ability to focus on the business value, is really what is going to help make the next generation of machine learning systems more successful in the marketplace.


Yan Cui: 27:18  

For the next wave of machine learning practitioners, what would be your advice to those engineers listening today who want to get into machine learning? What sort of skills should they be developing? What sort of things should they be learning? 


Carl Osipov: 27:31

I find that many software developers today have already had some experience with machine learning at a very high level: understanding the basic processes involved in training machine learning models from data and doing inference with trained machine learning models. I think the next step for those who have already understood those basics of machine learning is to actually try it on your own. The first step is to pick up what I describe as single-node machine learning frameworks, frameworks like scikit-learn that make it very easy to train baseline machine learning models. One example that I gave earlier is a machine learning model based on linear regression or logistic regression. Try training one of these models using one of the many available datasets from sites like Kaggle and other open dataset websites. Next, try to think about what it takes to put those machine learning models into production. And I think the best way to put a machine learning model into production is to use a serverless approach.


Yan Cui: 28:42  

Another thing that I've noticed in recent years from AWS is that they are publishing more and more of what you might call packaged machine learning services, like product recommendations or image recognition with Amazon Rekognition. Would you say that in the future there will be more of these managed services that apply machine learning to common business problems, so that there may be less and less need for you to build custom machine learning models?


Carl Osipov: 29:16  

That's a great question. I think what's happening in the marketplace today is that many services are enabled by machine learning technology, and some of the examples that you mentioned are services that provide customer recommendations. Certainly, there are other services that provide image understanding, for example recognising objects in images, and services that extract information from textual descriptions, for example parsing documents and extracting fields like addresses. These services are becoming commoditised. They're available from many cloud vendors: AWS, Google Cloud, Azure. They're also available from some smaller companies with additional capabilities. And these commoditised services, of course, are going to be growing in popularity in the future. However, it's important to keep in mind that these are commodities, which means that if your company, your organisation, is using those services, your competitors are using those services as well. What makes companies successful with machine learning is the ability to create custom, or differentiated, machine learning systems. To do that, you actually do need to use more complex services. You need to be able to train your own machine learning models. Oftentimes you need to be able to scale that model training to a cluster in a cloud. So, it is more than just using a service as an API. Ultimately, if you want to build an organisation or a product that differentiates itself in machine learning, you're going to have to bite the bullet and go through this more difficult route of creating your own machine learning model and putting your custom machine learning model into production.


Yan Cui: 31:12  

Right, got you. So you can use machine learning as a differentiator. You make it a part of your core proposition in terms of how your product is better than your competitors', because they're using some generic machine learning product off the shelf, which does the job. I guess even with things like Amazon Rekognition, it kind of works, maybe it's right 70% of the time, but if you can have a more specialised machine learning model that's more tailored to your user base, then potentially you can do a much better job than you would be able to do with some of these managed services. 



Carl Osipov: 31:46  

Absolutely. And from the standpoint of being able to use some of these APIs, it's possible to build machine learning systems that combine existing APIs from services like AWS with proprietary machine learning models, to achieve better results by combining the outputs of both the AWS APIs and your own machine learning models together. 


Yan Cui: 32:15  

Okay, what would that look like in that case? Suppose I want to do some face recognition. How would I be able to combine the results from both Amazon Rekognition and my own proprietary machine learning model?


Carl Osipov: 32:30  

Face recognition is an interesting topic. More recently, it became controversial as AWS has restricted the use of its face recognition technology. So it's a great question, and I can tell you that technologies that, let's say generically, do object recognition are very much driven by the context of the business application. If you think about processing image data, one of the key considerations for that kind of data is the latency of processing. For example, if you are deploying machine learning models that are recognising objects in images, like faces, or maybe some other types of objects, let's say animals or vehicles, it's important to understand whether that application is deployed in an embedded environment, or in an environment where it's actually running, let's say, in the cloud, where latency isn't as much of a consideration. A lot of production deployments for image understanding and object recognition actually require very low latency, think on the order of single milliseconds to low tens of milliseconds for processing and recognition. If you're trying to do that in a very high-throughput environment, the cloud is oftentimes out of the question. But assuming you do have an application where it is okay to do, let's say, image understanding and object recognition on image data in the cloud, what's possible is that by combining services from AWS with some databases or machine learning models from your own system, you can do very interesting things. For example, using AWS services, you can find high-level categories. You can find candidate faces, or maybe candidate vehicles, or candidate objects in the image that represent animals or tools, etc.
Additional machine learning models that you bring in would then process these candidate objects and maybe add some of your own additional insights about them. For example, when recognising vehicles in images, your custom machine learning model can recognise the details of that image and provide some additional information, such as whether that vehicle was damaged or not. There's a well-known example from the insurance industry, where a machine learning model was developed to help recognise whether a particular vehicle has sustained damage in a car accident that is going to lead to a complete scrap of the vehicle, meaning that the owner would have to buy a new vehicle to replace it, or whether the damage is such that it can be fixed in a car service shop. So combining these types of models together really creates value for the end customers, because it allows a company to use commodity services from public clouds to do things like image recognition, and then do secondary processing of those images to achieve better results in predicting, for example, whether a vehicle needs to be scrapped or whether the image indicates that the vehicle can be fixed.
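
The two-stage pattern Carl describes, where a commodity cloud service proposes candidate objects and a custom model adds its own insight, might be sketched roughly as follows. This is only an illustration under stated assumptions: the input dictionary is hand-written and loosely modelled on the shape of an Amazon Rekognition DetectLabels response, and `extract_candidates`, `assess`, `toy_damage_model` and the thresholds are hypothetical names and values, not anything from the episode.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Candidate:
    label: str              # high-level category from the first stage, e.g. "Car"
    confidence: float       # first-stage confidence on a 0-100 scale
    box: Dict[str, float]   # relative bounding box: Left, Top, Width, Height


def extract_candidates(response: dict, wanted: Set[str],
                       min_confidence: float = 80.0) -> List[Candidate]:
    """Stage 1: keep confident instances of the categories we care about."""
    candidates = []
    for label in response.get("Labels", []):
        if label["Name"] not in wanted:
            continue
        for instance in label.get("Instances", []):
            if instance["Confidence"] >= min_confidence:
                candidates.append(
                    Candidate(label["Name"], instance["Confidence"],
                              instance["BoundingBox"]))
    return candidates


def assess(candidates: List[Candidate],
           damage_model: Callable[[Candidate], float],
           scrap_threshold: float = 0.5) -> List[dict]:
    """Stage 2: a custom model scores each candidate and a decision is attached."""
    results = []
    for candidate in candidates:
        score = damage_model(candidate)
        decision = "scrap" if score >= scrap_threshold else "repair"
        results.append({"label": candidate.label, "box": candidate.box,
                        "damage_score": score, "decision": decision})
    return results


# Hand-written stand-in for the first-stage output (assumed shape, made-up values).
sample_response = {
    "Labels": [
        {"Name": "Car", "Confidence": 98.0, "Instances": [
            {"Confidence": 97.0,
             "BoundingBox": {"Left": 0.1, "Top": 0.2, "Width": 0.6, "Height": 0.5}},
            {"Confidence": 91.0,
             "BoundingBox": {"Left": 0.7, "Top": 0.3, "Width": 0.2, "Height": 0.2}},
            {"Confidence": 55.0,  # below min_confidence, so it is dropped
             "BoundingBox": {"Left": 0.0, "Top": 0.0, "Width": 0.1, "Height": 0.1}},
        ]},
        {"Name": "Dog", "Confidence": 88.0, "Instances": [
            {"Confidence": 88.0,
             "BoundingBox": {"Left": 0.4, "Top": 0.6, "Width": 0.2, "Height": 0.2}},
        ]},
    ]
}


def toy_damage_model(candidate: Candidate) -> float:
    # Placeholder only: a real model would crop the image to candidate.box
    # and score the pixels. Here the score is faked from the box width.
    return 0.9 if candidate.box["Width"] > 0.5 else 0.1


results = assess(extract_candidates(sample_response, {"Car"}), toy_damage_model)
for result in results:
    print(result["label"], result["decision"])
```

In a real deployment the first-stage dictionary would come from the cloud service and the placeholder model would be replaced by the proprietary one; the point of the split is that only stage 2 needs custom training.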


Yan Cui: 38:26  

So you mentioned the controversies around facial recognition, and certainly a lot has been said about that recently. Much has also been said about the potential bias that's been built into machine learning models, because many of our, I guess, human biases have leaked into them. As a machine learning practitioner, can you talk about what we, as an industry, are doing to tackle some of these biases going into our machine learning models?


Carl Osipov: 37:02  

Great question. So, machine learning models are driven by data. And unfortunately, the reality is that if the data reflects real-world biases, those biases can make their way into the machine learning model, and consequently actually impact the real world once the machine learning model goes into production. There are very well-known examples of biases in the industry. Some of them have to do, for instance, with mortgage approvals. For example, some zip codes have received unfair treatment from machine learning models that decide on mortgage applications, simply because the data itself contained biases against particular zip codes. I think in the industry, the issue of identifying biases and working with biases is fairly well understood for structured datasets. Here I'm talking about datasets that are organised into rows and columns. These issues of identifying biases are handled at the stages of data preparation and data quality, so the bias itself does not actually propagate into the machine learning model. However, with unstructured datasets, think image data or audio data, the issue of biases is less well understood. As an industry, as practitioners, we still have to do more work to improve how we deal with biases in these kinds of datasets. So I think this is still an active area of research, and as practitioners, we need to be aware of these biases, advise our customers of the possibility of those biases, and do our best to eliminate them. But at the same time we need to be aware of the fact that we don't have a solution to this problem.
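
For the structured-data case mentioned above, one simple data-preparation check might look like the sketch below: compare historical approval rates per zip code and flag groups whose rate falls well below the best-treated group. Everything here is an assumption for illustration: the rows, the zip codes and the 0.8 ratio (a commonly cited rule of thumb, not something from the episode) are made up, and `approval_rates` and `flag_disparities` are hypothetical helper names. A flag only surfaces a disparity worth investigating; it neither proves nor removes bias.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def approval_rates(rows: Iterable[dict], group_key: str = "zip_code",
                   outcome_key: str = "approved") -> Dict[str, float]:
    """Share of approved applications for each group in the historical data."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += 1
        if row[outcome_key]:
            approvals[row[group_key]] += 1
    return {group: approvals[group] / totals[group] for group in totals}


def flag_disparities(rates: Dict[str, float], min_ratio: float = 0.8) -> List[str]:
    """Groups whose approval rate is below min_ratio times the highest rate."""
    best = max(rates.values())
    if best == 0:
        return []
    return sorted(g for g, rate in rates.items() if rate / best < min_ratio)


# Made-up historical mortgage decisions (hypothetical zip codes and outcomes).
history = [
    {"zip_code": "10001", "approved": True},
    {"zip_code": "10001", "approved": True},
    {"zip_code": "10001", "approved": True},
    {"zip_code": "10001", "approved": False},
    {"zip_code": "10002", "approved": True},
    {"zip_code": "10002", "approved": False},
    {"zip_code": "10002", "approved": False},
    {"zip_code": "10002", "approved": False},
]

rates = approval_rates(history)
flagged = flag_disparities(rates)
print(rates)
print("needs review:", flagged)
```

A check like this runs before any model is trained, which is the stage Carl points to for structured data; nothing comparably simple exists for the image and audio case he describes next.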


Yan Cui: 39:02  

Okay, thank you for that. And I think that brings us to the end of all the questions that I've had. Is there anything else that you would like to tell the listeners before we go? Maybe tell us a bit more about your upcoming book with Manning, and maybe whether CounterFactual is hiring as well.


Carl Osipov: 39:19  

I'd love to. The book is certainly available in the subscription format. There is a new chapter coming out every few weeks, and the book is expected to go to print at the end of this year or the beginning of next year, 2021, as of the time of this podcast. CounterFactual.ai is not hiring at this point. We're actually focusing on helping our existing customers. But if you're interested in learning more about how to grow as a software engineer and transition into a machine learning engineer role, or if you're an existing data scientist and you want to become a more impactful member of your team or your organization, definitely check out serverless machine learning from Manning. The book will help you become a more impactful and more productive contributor to your team.


Yan Cui: 40:21  

Okay, great. And as I mentioned earlier, there are some voucher codes available in the show notes for you to get a 40% discount off the book, which hopefully should be available through Manning's early access programme by now. You can also get in touch with me, as I've got five free copies to give away. And finally, how can people find you on the internet?


Carl Osipov: 40:45  

The best way to find me is to just type my name, Carl Osipov, into Google search. Most of the information about me is going to come up. LinkedIn is a great resource if you're interested in the details of my background. And you can also read more about my work with both machine learning and clouds on cloudswithcarl.com, which is my blog.


Yan Cui: 41:11  

Great, and I'll put those in the show notes as well. Once again, thank you very much for joining us today and sharing your experiences with machine learning and with serverless.


Carl Osipov: 41:20  

All right, thank you so much, Yan. 


Yan Cui: 41:22  

Take care. Stay safe, bye bye.


Yan Cui: 41:36

So that's it for another episode of Real World Serverless. To see the show notes, please go to realworldserverless.com. If you want to learn how to build production-ready serverless applications, please check out my upcoming courses at productionreadyserverless.com. And I'll see you guys next time.