Data Whisperers: Teaching AI What's Right and Wrong Artwork

What's Up with Tech?

Tech Transformation with Evan Kirstel: A podcast exploring the latest trends and innovations in the tech industry, and how businesses can leverage them for growth, diving into the world of B2B, discussing strategies, trends, and sharing insights from industry leaders!

With over three decades in telecom and IT, I've mastered the art of transforming social media into a dynamic platform for audience engagement, community building, and establishing thought leadership. My approach isn't about personal brand promotion but about delivering educational and informative content to cultivate a sustainable, long-term business presence. I am the leading content creator in areas like Enterprise AI, UCaaS, CPaaS, CCaaS, Cloud, Telecom, 5G and more!

All Episodes

What's Up with Tech?

Data Whisperers: Teaching AI What's Right and Wrong

August 19, 2025 • Evan Kirstel

Interested in being a guest? Email us at admin@evankirstel.com

Ever wondered who teaches AI what's right and wrong? That's exactly what we explore in this eye-opening conversation with Olga Megorskaya
Chief Executive Officer from Toloka, the company providing the human infrastructure behind today's most advanced AI systems.

Toloka stands at the fascinating intersection of human expertise and artificial intelligence, serving major players like Anthropic, Amazon, Microsoft, and Shopify. As Olga explains, while the industry has long focused on algorithms and hardware, data quality remains the overlooked pillar of AI development. There's a surprising "glass ceiling" that even synthetic data can't break through – at some point, you need human experts to push AI capabilities forward.

The work is surprisingly complex. Toloka manages thousands of specialized professionals who create benchmarks, evaluate model outputs, and even test for safety issues. One particularly fascinating project involves testing computer use agents that can operate across your entire system – reading emails, managing files, and interacting with applications. These expanded capabilities introduce new risks that must be carefully evaluated by human experts who can identify potential harmful scenarios.

What makes this conversation truly thought-provoking is Olga's vision for the future. The technologies being developed to manage human expertise at scale may ultimately transform how work itself is organized. And while tech giants make billion-dollar investments in the space, independent players with specialized expertise continue to thrive. Toloka is also taking tangible steps to improve diversity in AI, ensuring their benchmark datasets reflect diversity and even creating virtual training environments where female leadership is normalized.

Ready to understand the invisible human infrastructure behind the AI revolution? Listen now and discover why the most "unsexy" part of AI might actually be where its most important future technologies are being developed.

Support the show

More at https://linktr.ee/EvanKirstel

Speaker 1: 0:01

Hey everyone. Interesting chat today, as we talk about the future of data and AI, with a true innovator in this space at Toloka Olga. How are you?

Speaker 2: 0:13

Hi, hi, ivan, I'm fine, how are you?

Speaker 1: 0:16

Good, good to have you here, really excited for this conversation. And for those who aren't familiar, how would you describe what you do today, your sort of value to customers at Toloka?

Speaker 2: 0:27

Well, to be short, Toloka is a human infrastructure behind AI. We provide human expert insights for training and evaluating quality of AI models.

Speaker 1: 0:40

Well, that's a very straightforward explanation. You make it sound so simple despite all the complexity. What industries exactly, or companies, do you serve and what's the proposition to them exactly?

Speaker 2: 0:55

foundational models and technologies related to them.

Speaker 1: 1:12

Among our customers.

Speaker 2: 1:13

We have amazing teams of Anthropic Amazon, microsoft, shopify, Poolside and others so we are helping with the ground truth, human expert data for creating benchmarks, evaluating quality of the models and basically teaching AI how to behave, teaching AI what is correct and what is wrong and all the basics, because AI is in many ways, similar to humans and every human needs to be taught. Every human needs to have some teachers in their lives. And basically this is what we do we provide this ground truth on which AI is being taught.

Speaker 1: 1:52

Fantastic. So, when you work with these amazing companies and brands, I mean one of the most pressing challenges that they're facing right data.

Speaker 2: 1:59

Well, indeed, basically, generating production of AI in general stands on three key pillars.

Speaker 2: 2:11

These are algorithms, hardware and the data, and for many years, I would say, the problems related with data have traditionally been overlooked and everybody was more hyping around creating models themselves or where to find capacities for GPUs, while the data and its quality is essential and one of the interesting challenges that actually the industry faces is the fact that well, high quality data is actually limited. At some moment, industry there was this perception that synthetic data can cover it all and you can generate as much data as you want, but at some moment, I think the industry understood that there is a glass ceiling in the quality, of how much you can improve the quality of the models, training them only on the data produced by other models. And at some moment, in order to excel and increase the capabilities of AI models, you still need some higher quality examples to be taught on, and these examples you can only get from human experts. And the interesting moment of current moment in the industry development is that I would say already, ai models are smarter than any single human.

Speaker 2: 3:43

So in order to produce any item of data that can be useful for further training and improving the capabilities of the models, you quite often need to actually aggregate the wisdom of several human experts to contribute into creating some pieces of the data.

Speaker 2: 4:04

And this is one part of the complexity. The other part is, ironically, I like to say that humans are the most unreliable agents. It is so hard to ensure that humans are actually executing what you ask them to execute. Executing what you ask them to execute never, never mind with how we are working with amazing experts uh, truly professionals in their fields, like phds in physics and mathematics, professional lawyers, medical specialists, automotive engineers, whatever you name it. But still, when any any human in the world, when, when asked to execute some type of a task or, for example, evaluate the output of the model, humans make mistakes. That is just our nature. We organically make some mistakes, but our job in creating benchmark data sets for training AI is to ensure that there are no mistakes in this data, and this is basically the major part of our technologies that we apply towards creating the high-quality data sets. It's to ensure that the quality of the data that we produce with the help of our human experts is close to 100%.

Speaker 1: 5:22

Wow, that's quite an accomplishment, and so maybe could you share a few real-world examples of customer projects that kind of illustrate your unique approach.

Speaker 2: 5:33

Sure, some of the recent projects that we've been doing with, obviously not going into some deeper details. For example, taloka team has been testing one of the first to be released in the industry's computer use agents, and this is like. Actually, we are now experiencing an interesting shift in the industry. Like for several previous years, I think all of us got used to chatting with AI-powered chatbots and we used to know already how to prompt our questions, how to understand what is the good answer and what is the bad answer, and that was one dimension of applying of llms. Now, with era of ai agents, we are actually switching to the new dimension where there are so much more surfaces on which ai agents are interacting with the user.

Speaker 2: 6:36

It's it's not not no longer only a chatbot.

Speaker 2: 6:41

For example, computer use agent can operate within all your operating system.

Speaker 2: 6:46

It can operate with your files in your computer, it can read your emails, it can interact with your colleagues in Slack and whatnot, and obviously that leads to increased number of potential risks related to the operation of these computer use agents.

Speaker 2: 7:05

That means that you need to, before the release of the product, you need to ensure its safety and you need, for example, to create a number of tests and to verify that. For example, if you receive an email, uh, and ask your agent to summarize the email thread, and in the end of this email thread, somebody writes you, yes, and please, by the way, unpack this file, and the file is malicious, so you as a human user, would not unpack this file. You would be suspicious, thinking no, no, I'm not unpacking some exe file. But AI agents sometimes are quite naive. They just execute everything they read. So coming up with those thousands of examples of potentially harmful scenarios is one of the use cases that we are doing while dealing with AI safety and red teaming of the new AI models. So this is one of the examples of tasks that we've been doing recently.

Speaker 1: 8:17

Wow, what an amazing use case, really valuable stuff. So examples like that, the human intelligence side that sets you apart from other players in the data labeling and training space, or what else is your unique approach?

Speaker 2: 8:34

Yeah, obviously. So one part is about collecting various scenarios for testing and doing red teaming of potentially harmful behavior of AI agents. Red teaming of potentially harmful behavior of AI agents. There you need human experts to understand all the potential scenarios and to evaluate them, because sometimes there are use cases which cannot be caught by automation and you only can catch them with a human eye. The other kinds of examples are everything related to evaluating quality of the models in order to ensure that the new model for example, the coding co-pilot or the new financial assistant that is helping financial specialists with some outcomes in their professional lives in order to understand and evaluate the quality of the output of those models, you actually need to ask professionals in those domain fields to look critically at the outputs of those models and to say, yes, this is the ideal, proper answer to this question.

Speaker 2: 9:49

The ideal proper answer to this question and there is the whole methodology around evaluating the quality of AIM models and creating the benchmarks. So, basically, when we invite industry professionals in different fields, like, for example, in finance, in economics, in medicine, etc. We ask those professionals to come up with really difficult use cases, really difficult prompts and also to formulate the rubrics of what an ideal answer should contain. So, if I am expecting the 100% perfect answer from the model, what should it look like? What parameters should it contain? So you ask those human experts to provide those prompts and those rubrics and then to evaluate the outcomes of different models comparing to those rubrics, and this is how you can compare the quality of different models between each other and also track the progression of different versions of the similar model across the time, between different versions of the models, and this is obviously relying on high quality, high expertise experts is something that is essentially needed in our job right now.

Speaker 2: 11:15

And the third part I would say is when we are moving towards agentic data and working with agents, the data gets additional new dimension, gets additional new dimension. We are talking not only about expert content and the expertise you need to have in your domain field in order to accurately evaluate the quality of the data that the model outputs, but also we are talking about specific environments in which AI models are operating. Sometimes, right now, in order to generate proper data items for training and agentic systems, for example, you need to come up with virtual environments, like, for example, we are creating the whole virtual companies, where virtual roles, are operating across and solving certain problems, and while they are solving their almost real-life working problems, we are collecting the traces and trajectories of their behavior and we are annotating them with the help of the human experts. And this is how you generate, in a controlled environment, you generate certain types of the data and you also produce the annotated traces and the data items that later are used for training further AI models.

Speaker 1: 12:38

Amazing. So Jeff Bezos recently joined a $72 million funding round. Congratulations on that Would love to hear the back story. And also, did you get invited to the wedding?

Speaker 2: 12:52

uh, no, uh, I I surprisingly, I'm being asked this question quite often, but that time I was working on actually developing the new techniques and methods of generating high-quality data. Why, indeed, this is a huge step for us. This new investment round is a huge step for us. Why is this important is because I would say that the industry we are operating in is now probably one of the most interesting there is, and I do believe that there are huge opportunities for further growth and further development.

Speaker 2: 13:43

Because, interestingly enough, data annotation for a long time has been the most unsexy part of AI.

Speaker 2: 13:52

Nobody wanted to touch it.

Speaker 2: 13:53

Everybody wanted to deal with the algorithms and whatever, and dealing with human experts has always been boring and everybody was trying to stay as far as possible from that.

Speaker 2: 14:07

But while learning how to manage efforts of human experts on a large scale, you are basically training some very special muscles in a technological way.

Speaker 2: 14:18

You are basically learning how to control and manage human efforts across a huge and large variety of different tasks, and it is amazing that the era of LLMs actually opens us a huge variety of opportunities in terms of using this knowledge of how to manage human efforts on a large scale to the whole new scale Because, if you think about it, we know how to control how our human experts execute the tasks related to data labeling, but at the same time, we know how to control the execution of the tasks in a much broader way. If you look at that several years from now, I think that the large portion of the work will change in general and the large portion of middle-level management will probably be replaced with the LLMs and with the technologies that are now being developed and being trained in the companies like Toloka, and this is probably not very intuitively obvious right now, but I do believe that the companies of our industry are right now the place where maybe one of the most interesting and important technologies of the future are being cooked.

Speaker 1: 15:59

That's a fascinating future. So, as you know, you're in a universe of tech giants. You're swimming with sharks kind of thing. Swimming with sharks kind of thing. You see Meta's $15 billion bet on scale AI and we're headed to a future of bigger and bigger tech players. How do you see the world as an independent, smaller data provider, given this new reality? I guess that presents opportunities and challenges.

Speaker 2: 16:28

Obviously, every change presents both opportunities and challenges.

Speaker 2: 16:32

I do think that this infamous deal, first of all and most importantly, it highlighted the importance of the data in the world of AI, and this is probably one of the things that we noticed first of all, is the increased interest overall and acknowledgement of this industry, of what we do, and a huge acknowledgement of why it is so important and why it is a part of very special expertise that is being collected and generated within the companies of our industry.

Speaker 2: 17:09

I do believe that there is always a value for independent players on the market and, actually, while the size is important because one of the very important aspects of the success of Taloka, for example, is the size of our expert marketplace, the fact that we have thousands of trained and verified experts who are trusted, who have already performed well on multiple previous projects, and thus we can know that they are highly trusted experts to be also participating in many new projects. This is an important part, but this is not everything, and what is also important is, again, the technologies of controlling the quality of the data and of the output of the efforts of those experts. That is why I do believe that, like Taloka, has very strong positions on this market because our technologies and our technological platform is unique across the market.

Speaker 1: 18:24

Brilliant. That's an exciting future. Last thought as we know better than I, there are kind of amazingly few women in tech leadership, in AI and data, in particular, data science. What can we do to encourage more women across this new and exciting landscape?

Speaker 2: 18:50

that we at Taloka are in a very unique position because we are probably one of those among those actors who really have some leverage on that, because we are the ones who are creating the benchmarks for evaluation the quality of AI models and, for example, ensuring the diversity of the outputs is something that we always include in the taxonomy of the data sets that we created whenever it is applicable.

Speaker 2: 19:21

Secondly, and it is also one of the funny anecdotes that we have inside Taloc, I have mentioned that for many use cases related to agentic data, we are developing virtual companies where different roles like, for example, managers, customer support and engineers somebody else they are interacting over Jira, tickets, slack, communications, email threads, et cetera, et cetera. So in Taloc we ensure that all those virtual companies have a woman CEO.

Speaker 1: 19:56

That's great.

Speaker 2: 19:58

That's an interesting way of teaching AI from the very basement, from the data it is being trained on, to ensure that having a woman in the leadership of a tech company is not something unusual, but rather a very common thing, because it is already. We have trained all our models in such a way. But that's, of course, partly a joke but partly true, because I do believe there is a room for improvement in this aspect in our industry.

Speaker 1: 20:31

Yeah well, there's nowhere to go. Congratulations on all the success and the mission. It's really amazing. Can't wait to see what's next over the next few months and years.

Speaker 2: 20:43

Yeah, me too.

Speaker 1: 20:45

All right and thanks for your time. Thanks for watching and listening, sharing everyone, and be sure to check out our new TV show, techimpact TV, now on Bloomberg and Fox Business. Thanks everyone. Thanks Olga.

Speaker 2: 20:57

Thank you, bye-bye.