Infinite Curiosity Pod with Prateek Joshi

The best place to find out how AI builders build. The host Prateek Joshi interviews world-class AI founders and VCs on this podcast. You can visit prateekj.com to learn more about the host.

Large Action Models

July 31, 2024 • Prateek Joshi

Will Lu is the cofounder and CTO of Orby AI, an AI platform to automate people's repetitive tasks. He was previously the Head of Engineering at Google and a Systems Software Engineer at Nvidia.

Will's favorite book: Beyond Entrepreneurship (Authors: Jim Collins and William Lazier)

(00:01) Introduction
(00:07) History of RPA
(01:04) Building Blocks of RPA
(02:34) Drawbacks of Traditional RPA
(05:06) Introduction to AI-Native RPA
(06:38) Advantages of AI-Native RPA
(08:14) Defining Generative Process Automation (GPA)
(10:15) Explanation of Large Action Models
(11:47) Role of AI Agents in Process Automation
(13:11) Data for Building Large Action Models
(14:44) Benchmarking Large Action Models
(15:53) Risk Mitigation in AI-Native RPA
(17:44) Changing Roles in the RPA Industry
(19:14) Adoption of Agent Technologies
(21:03) ROI Measurement in AI-Native RPA
(23:05) Explainability in AI Systems
(24:25) Fast Adoption Teams in Enterprises
(25:15) Handling Unstructured Data
(26:12) Digital Organizations and Future Automation
(27:09) Exciting AI Breakthroughs
(28:03) Rapid Fire Round

--------
Where to find Prateek Joshi:

Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19
Twitter: https://twitter.com/prateekvjoshi

Prateek Joshi (00:01.464)
Well, thank you so much for joining me today.

Will Lu (00:04.79)
Of course, happy to be here.

Prateek Joshi (00:07.292)
Let's start with a brief history of RPA, which is Robotic Process Automation. Obviously, there have been a couple of big winners like UiPath, Automation Anywhere, famous companies, big revenues. So can you talk about how RPA has evolved in the last 10 years?

Will Lu (00:28.92)
Yes, of course. This whole idea came from scripting languages. Basically, you want to describe what you want to do on a computer, such as clicking a button, going to a website, typing in "ABCD," and closing the site. Tools of that kind have been in use for a while: you can, to some extent, use them to create a website, and some people use them to do UI testing. Those technologies evolved and eventually became a platform where you use the software to define a workflow to automate.

Prateek Joshi (01:03.876)
If you look at the software to do this, can you explain the building blocks of RPA? So if you just explain it to a layman, what parts are involved in automating a normal process?

Will Lu (01:20.216)
Yes. Typically, RPA solutions come in three big components. The first one is the bots. Basically, the bots do what you want them to do, typically based on a certain predefined workflow, performing simple tasks such as opening an application, clicking a button, et cetera. These days the bots can sometimes also do some AI-based tasks such as NLP or OCR.

The second major component is the studio. Basically, you can think of it as a bot designer. You use its list of predefined activities to define a workflow, and that workflow will try to finish a task. Typical activities include opening and closing an app, clicking an element, setting an element's properties, et cetera.

The third one is the orchestrator, which you can think of as a manager for those bots. With it, we can manage all the bots being run and look at a dashboard showing how the bots are running and what their statuses are.

Prateek Joshi (02:33.19)
Now, obviously this software can do a whole bunch of tasks. It does them nicely. But if you look at the drawbacks of a traditional RPA approach, can you talk about where they're lacking? What are the drawbacks of a traditional RPA software product?

Will Lu (02:52.93)
Yes. As you can see, the previous generation of software is built on a set of rules. A rule says, for example, to click certain coordinates on the screen. That significantly limits the capability of RPA. To summarize, I think there are three areas where traditional RPA lags behind. The first is handling complex tasks.

RPA excels at handling straightforward rule-based tasks. However, when it comes to more complex operations, ones that lack a structured rule set, RPA wouldn't be able to accomplish the work. Also, when people try to automate human expertise, such as analytical decision-making, RPA still rarely solves the problem across the board.

The second big limitation is lack of flexibility. Basically, the idea is that today's RPA bots are still defined with rigid matching technology, such as pixel coordinates. So when it comes to a more dynamic environment, where your UI is changing or the content is changing, those activities and that rule-based matching technology will fall apart.

The last one is the scalability hurdle. As you can see, there is still a learning curve before people can use the tool, and sometimes it even requires advanced programming skills. So business users wouldn't be able to come in and build the automation immediately. And because of the rigidness of the software, the maintenance costs over time also add up.

Whenever the context changes, or the software changes, or the rules change slightly, you have to redefine and update the workflow so that it can meet the requirements. So in summary, I think today's RPA, because it's built with a predefined set of rules, is very limited when one needs to deal with a dynamic environment or handle a very complex task.
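
As a rough illustration of the flexibility point above, the sketch below contrasts a rule that clicks a fixed pixel with one that looks a button up by its label. It is a toy model, not how any RPA product or Orby's system works: the `Element` class, the two layouts, and both helper functions are hypothetical, and label lookup is only a simple stand-in for the semantic understanding discussed in the interview.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    """A hypothetical on-screen button with a bounding box."""
    label: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def click_at(screen: List[Element], px: int, py: int) -> Optional[str]:
    """Rigid rule: click whatever happens to sit at a fixed pixel."""
    for element in screen:
        if element.contains(px, py):
            return element.label
    return None


def click_by_label(screen: List[Element], label: str) -> Optional[str]:
    """Less rigid rule: find the element by what it is, not where it is."""
    for element in screen:
        if element.label == label:
            return element.label
    return None


# Hypothetical layouts: after a UI update the two buttons swap places.
old_layout = [Element("Send", 100, 200, 80, 30), Element("Delete", 200, 200, 80, 30)]
new_layout = [Element("Delete", 100, 200, 80, 30), Element("Send", 200, 200, 80, 30)]

print(click_at(old_layout, 120, 210))        # the recorded rule hits "Send"
print(click_at(new_layout, 120, 210))        # same rule now hits "Delete"
print(click_by_label(new_layout, "Send"))    # label lookup still finds "Send"
```

The coordinate rule keeps running after the layout change but silently does the wrong thing, which is the maintenance problem described above: someone has to notice and redefine the workflow.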

Prateek Joshi (05:06.286)
Now, coming to the world of LLMs, so obviously there are drawbacks to the traditional RPA approaches and with LLMs, we can address some or maybe all of them. So can you talk about how AI native RPA works in practice?

Will Lu (05:10.628)
Thanks.

Will Lu (05:27.876)
Fundamentally, AI-native process automation builds the entire automation experience from the ground up, with AI at the core of all the functionality, design, and implementation of the whole product. AI capabilities are not added as an afterthought. Basically, semantic understanding, adaptation, and continuous learning are embedded in every single piece of the entire workflow that we described.

You can think of the whole paradigm as shifting from building Lego models with predefined Lego bricks, following an instruction book that defines what to do and how to do a task, to an AI agent that has some level of common sense and learning capabilities. So AI-native process automation squarely addresses all the drawbacks we talked about in traditional RPA, and it substantially increases the capability of today's automation systems to handle complex tasks.

Prateek Joshi (06:37.976)
All right. And just to summarize all the things you just said, can you just, if you had to pick the top three differences or advantages of AI Native RPA versus traditional RPA, what would that be? Or rather, what do you tell a potential customer who's using traditional RPA, but you want them to move to AI Native RPA?

Will Lu (07:04.578)
Yeah, that's a great question. Fundamentally, I think today's AI-native solutions build on two main technologies. One is an understanding of common sense. The second is learning capabilities. Those two technology advantages manifest as three business values. One is complex task handling: we're talking about handling tasks that require human decisions, analytical skills, etc.

The second is adapting to the user's experience. Basically, the system automatically observes what users do and adapts to what they have done, rather than having a human being sit there and tell the machine what to do with very rigid rules. The third is that, with the human data it observes and learns from, the system should be able to proactively come up with automation suggestions. It can anticipate users' needs and behaviors and provide proactive solutions, so that users don't have to actively go there and think, okay, these are the things that I should automate or not.

Prateek Joshi (08:14.222)
Now, in your view, this is the next generation of RPA, and you call it Generative Process Automation. That's amazing. So for listeners who may not know, can you quickly explain and define Generative Process Automation?

Will Lu (08:33.326)
Yes. Orby is redefining the use of AI within the enterprise through what we call Generative Process Automation, or GPA. GPA is the next technical advancement redefining process automation. The unique approach we're taking combines the high-level cognitive capabilities of AI with the benefits of automation, which basically enables a system to perform tasks that involve complex planning, reasoning, and adaptation.

Under the hood, GPA leverages a multimodal large action foundation model to make AI more versatile. Basically, this approach delivers a pre-trained base model, and then we customize and adapt that base model for various purposes based on users' interactions. For various relevant tasks, it fundamentally accelerates the development process and reduces the cost of the whole experience.

Under the hood, GPA consists of three components. One is a lightweight AI agent that runs on a user's computer, which can observe what users do, execute workflows in a dynamic environment, and then learn from those executed tasks.

The second part is a purpose-built large action model that works in conjunction with the AI agents to provide all the AI capabilities. And the last piece is an orchestration platform to manage all these AI agents and workflows.

Prateek Joshi (10:14.556)
Now you mentioned a couple of different things here. I want to start with the large action model. So quickly again for listeners who may not know, can you explain what a large action model is?

Will Lu (10:28.194)
Yes. As all of you may know, we have large language models, and then we have multimodal models. A large language model is a model built to understand languages really well. Then we have multimodal models. When people say "multimodal model," they usually need to define which modalities it supports; it can be text, vision, and speech. Those models will understand text, vision, and speech really well.

A large action model, in turn, understands actions really well. Basically, say a model looks at a screen with a send button: it understands what that send button would do when you click on it and what kind of things you can expect out of it. So we developed a large action model purposefully for understanding actions really well, so that we can understand what users are doing on the computer and hence also make predictions for the users.

Prateek Joshi (11:24.634)
You mentioned AI agents and specifically how a lightweight AI agent is installed on the user's device. So can you talk about the role of an AI agent, or rather, what are all the things an AI agent should do in the context of process automation?

Will Lu (11:46.626)
Yes. I think how AI agents are adopted in different systems differs a lot from company to company. Orby puts AI agents at the core of the entire GPA platform. Essentially, the agent does three things, as I described. First, it can observe what users do and make sense of it. For example, when you click a button, it knows, okay, you're trying to send an email, rather than just understanding it as you clicking a send button.

The second thing is automation execution. It is able to take an action prediction from the model and execute it, which is still not an easy task. But you can think of it as very similar to the tool use popularized by OpenAI. And the third thing is learning and adaptation. Basically, these AI agents can look at past histories and make better predictions every time they learn from more examples.
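
The three duties described above (observe, execute, learn) can be sketched with a deliberately tiny toy agent. Everything here is an assumption for illustration: the `ToyAgent` class, its method names, and the frequency-count "learning" are hypothetical and stand in for a large action model, which the interview does not describe at this level of detail.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional


class ToyAgent:
    """Toy observe/execute/learn loop; a counter replaces the real model."""

    def __init__(self) -> None:
        # Maps a context (e.g. "invoice email") to counts of observed actions.
        self.history: DefaultDict[str, Counter] = defaultdict(Counter)
        self.executed: List[str] = []

    def observe(self, context: str, action: str) -> None:
        """Record what the user did in a given context."""
        self.history[context][action] += 1

    def predict(self, context: str) -> Optional[str]:
        """Predict the action seen most often in this context, if any."""
        counts = self.history.get(context)
        if not counts:
            return None
        return counts.most_common(1)[0][0]

    def execute(self, context: str) -> Optional[str]:
        """'Execute' the predicted action by logging it (no real UI here)."""
        action = self.predict(context)
        if action is not None:
            self.executed.append(action)
        return action


agent = ToyAgent()
agent.observe("invoice email", "archive")
first_guess = agent.predict("invoice email")    # based on one example

agent.observe("invoice email", "forward to finance")
agent.observe("invoice email", "forward to finance")
second_guess = agent.predict("invoice email")   # revised after more examples
```

With a single observation the agent predicts "archive"; after two more examples it switches to "forward to finance", which is the "better predictions with more examples" behavior in miniature.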

Prateek Joshi (12:49.936)
When you look at building a large action model, can you talk about, I mean, actually there are multiple things to talk about, but first the data you would need to build one such model. So where is the data coming from? How are you prepping it, labeling it, making it ready to feed into the model?

Will Lu (13:10.161)
Yeah, that's a great question. We have a paper coming out that talks more about this. But in a nutshell: today, when you crawl data, you crawl the static content you find on the web, including all the text, videos, images, etc. The interactions that human beings can have with those websites are rarely captured. So we built a crawler that can go onto the web, interact with web pages, and trace the sequence of how those actions happened. We take that information back. There are different ways we can bring that data into the model training process, which is still very much research work where we're trying to figure out the best way. But we have made a lot of progress so far in showing a model that can perform much better than the vanilla models out there.

Prateek Joshi (14:08.284)
Now that's actually very, very interesting because it kind of helps you access a different modality. Text, images, video, we all know those. But actions: almost every single person ends up performing actions. And this is a very interesting angle for building one such model. Okay, so you have the data. Once you get the data, you start building the model. How do you know when a given model is good enough? How do you benchmark?

Will Lu (14:43.556)
This is a relatively new field. Since early last year, multiple research datasets have come out that help develop different aspects of AI agents. There's planning, there's decision-making, there's grounding, which is basically finding which element to perform a certain action on, et cetera. So we have those datasets. There are about four or five popular ones.

Internally, we have also established different kinds of benchmarks to evaluate our models. This is evolving, and I think it is going to become more and more popular and be adopted by mainstream research.

Prateek Joshi (15:25.594)
Now, if you look at the usage of a product like this, for example, if I'm interacting with a text-based model, the risk is pretty minimal, meaning I input something and, in the worst case, it will generate wrong text. But in this case, technically, it can take a bunch of actions that the user may not want. Or if a malicious actor gets access, they can make the...

Will Lu (15:34.882)
Mm-hmm.

Prateek Joshi (15:52.454)
the model take a bunch of actions on a user's computer. So what are the risk surface areas you have to cover when you deploy a model like this?

Will Lu (16:01.069)
Yeah.

Yes, that's a great question. Under the hood, we do not let the model generate just anything at random, which would make it easy for it to do malicious things. The approach we're taking is that users first need to show what kind of tasks they're doing and what they want the model to do. From that observation, we come up with workflows, and the action predictions are then made in that context.

There are also various ways for us to validate the predictions so that we know we're on the right track. Whenever the confidence is low, we have a human-in-the-loop process that basically prompts the user to interact with the software and figure out the best thing to do from that point.
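
The two safeguards described above, keeping predictions inside the demonstrated workflow and handing low-confidence cases to a person, can be sketched as a simple gate. This is a minimal sketch under stated assumptions, not Orby's implementation: the `Prediction` class, the `decide` function, the `ask_user` callback, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class Prediction:
    action: str        # e.g. "click Send"
    confidence: float  # hypothetical model confidence in [0, 1]


# Hypothetical cutoff; a real system would tune this per workflow.
CONFIDENCE_THRESHOLD = 0.9


def decide(
    prediction: Prediction,
    allowed_actions: Set[str],
    ask_user: Callable[[Prediction], str],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> str:
    """Run the predicted action only if it belongs to the demonstrated
    workflow and the model is confident; otherwise defer to the user."""
    in_context = prediction.action in allowed_actions
    if in_context and prediction.confidence >= threshold:
        return prediction.action
    return ask_user(prediction)


# Actions the user demonstrated for this (hypothetical) workflow.
workflow_actions = {"open invoice", "click Send"}


def ask_user(prediction: Prediction) -> str:
    # Stand-in for a real prompt; here the user always declines.
    return "skip"


confident = decide(Prediction("click Send", 0.97), workflow_actions, ask_user)
uncertain = decide(Prediction("click Send", 0.40), workflow_actions, ask_user)
off_script = decide(Prediction("delete all files", 0.99), workflow_actions, ask_user)
```

Note that the out-of-workflow action is deferred to the user even at 0.99 confidence, so a high score alone is never enough to act outside what the user demonstrated.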

Prateek Joshi (16:53.734)
When you look at the job market, so for example, right now, the category of RPA has spawned a large job market where we have process analysts and application engineers, and they come in, they analyze what you do, then they use one of the big RPA products to automate the task, maintain it, and run it for you. Now, with an AI-native product like yours, a good chunk of that work is not needed anymore because AI can look at what you're doing and automate a bunch of tasks. So how do you see the change in roles? Some roles will become redundant because AI-native RPA will just know what to do, so you don't need a human to come in and do a bunch of that manual work. How do you foresee this change in roles in the RPA world?

Will Lu (17:43.79)
That's a very insightful question. I think today's RPA tools provide a very limited set of capabilities, and they require users to have the background to use the tool effectively. With this new AI system, you'll certainly be able to get things off the ground, and defining automation becomes much easier. There are also a lot of inherent learning capabilities provided, so the whole end-user, business-user experience can get better and better.

But for the people who control the whole system, say the IT department or whoever is integrating the system for everyone to use, it still requires a deeper understanding of the whole system and the technology to be able to use it more effectively. So in a nutshell, we provide a much more complex and more capable set of tools, which requires someone with deep AI expertise to deploy. But that requirement is small: it's probably just a handful of people who do all that.

The experience for the end user is going to be dramatically simplified. They can assume that the system does most things automatically and adapts to their needs over time. So they don't have to worry about all of that. They just focus on their task.

Prateek Joshi (19:13.916)
And in the world of agents, and especially agentic workflows, this is getting a lot of attention: you go to a large foundation model, and if you specify a big, complex task, then a reasonable agentic product should be able to break it down into a number of subtasks executed in sequence. And then, you know, the user is happy and the task is executed. Now, I see a lot of similarities

Will Lu (19:23.637)
and we'll see time.

Prateek Joshi (19:42.64)
between the workflows that people are trying to build and what GPA can do for customers. So as this becomes more and more standard, do you think the software providers will embrace GPA and plug it in, or will it have to be customers who say, okay, I use five different pieces of software, so I have to build my own GPA? Who do you think will adopt this more in the enterprise?

Will Lu (20:13.452)
I believe it has to be the software vendors who provide a holistic experience. For example, take the continuous learning capabilities: if you do that separately in a siloed environment, it's extremely hard. That's why we believe a product like Orby's offering is very much needed to provide this unified environment for users.

When it comes to the actual experience, we want this to be contained as much as possible so users don't have to worry about it. It's like using Google Search: you don't have to know anything about how the crawling, indexing, and recommendation work. You just need to use search, and you know that it'll get better and better over time.

Prateek Joshi (21:02.768)
Yeah. And that actually reminds me of an interesting thing. For example, say I'm a user of a technical product like a data infrastructure product or a database product. Let's say I accidentally lose some data, and I chat with them, and they say, here are the seven steps you have to follow to recover the data. My question is, if you already know it, and it's your product, why don't you do it for me instead of giving me the steps? Why don't you automate that? So I think this is a very interesting way to make the experience better for users: you know the problem, it's your product, and you should be able to take a bunch of actions to fix it instead of giving me the steps to do it. Anyway, that just reminded me of that.

As you look at deploying your technology at bigger and bigger companies, how do you measure ROI? Meaning, let's say somebody says, "Orby, I spent 100,000 on you to implement your product, and we used it, obviously. What is the dollar ROI I'm getting out of it?"

Will Lu (22:15.108)
On that front, it's actually very simple. Basically, you have X number of people doing a task, then you adopt Orby, and how many people do you need after that? Say it was 100 before and it's 20 now; then you have an 80% efficiency gain.
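
The headcount arithmetic above is easy to write down. The sketch below computes the efficiency gain as described and, as an extension that is not in the interview, a dollar ROI figure; the function names and the per-person cost of 50,000 are hypothetical assumptions, while the 100 and 20 headcounts and the 100,000 spend are the example figures from the conversation.

```python
def efficiency_gain(people_before: int, people_after: int) -> float:
    """Fraction of headcount no longer needed for the task."""
    if people_before <= 0:
        raise ValueError("people_before must be positive")
    return (people_before - people_after) / people_before


def dollar_roi(
    people_before: int,
    people_after: int,
    annual_cost_per_person: float,  # hypothetical input, not from the interview
    product_cost: float,
) -> float:
    """Net savings divided by what was spent on the product."""
    if product_cost <= 0:
        raise ValueError("product_cost must be positive")
    savings = (people_before - people_after) * annual_cost_per_person
    return (savings - product_cost) / product_cost


# The interview's example: 100 people before, 20 after.
gain = efficiency_gain(100, 20)
print(f"{gain:.0%}")  # 80%

# Assumed 50,000 per person per year against the 100,000 spend mentioned above.
roi = dollar_roi(100, 20, annual_cost_per_person=50_000, product_cost=100_000)
print(roi)  # 39.0, i.e. 39x net return under these assumed numbers
```

The efficiency gain needs only the two headcounts, which is why no proxy metrics are required; the dollar figure depends entirely on the cost assumption plugged in.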

Prateek Joshi (22:34.96)
All right, that's pretty straightforward. And actually, the more straightforward it is, the better it is for you and the customer, because there's no need to use proxy metrics. And when you think about explainability of a system like this, do your customers ask for more explainability, or is that even an issue when you deploy an AI system like yours?

Will Lu (23:04.229)
Explainability has been a much-requested feature since the beginning of this whole wave of deep learning and AI products. But I think the quality of your task comes first. I'll use Google Search as an example again: I think people rarely ask why certain search results came first rather than the others.

That said, there are certainly advanced users who would want to understand all that better. At Orby, we emphasize the accuracy of automations. But we still have capabilities for this: we explicitly show the workflow definition to the user, and we try to show the history when we make a prediction, saying, hey, this is what we think it should be, and this is the evidence we've seen. We provide features like these so users can understand it better.

Prateek Joshi (23:59.322)
And if you look at the different functions within a big company, there is customer support, there's finance, there's admin work, IT, workflows, tickets, so many things that are mostly manual-labor-type tasks. So in your observation, what teams are embracing you faster than others?

Will Lu (24:24.172)
Yes, that's also a great question. Typically, in this industry, today's RPA solutions fall short when they encounter any unstructured data. By unstructured data, I mean documents and emails, things that would not typically be processed by a computer directly. And if you think about it, that's a very common requirement in almost everything you do in an enterprise setting. So when it comes to handling unstructured data, the back office is where it is needed most: for example, customer support, IT, finance, accounting, et cetera. Those are the areas where we shine the most.

Prateek Joshi (25:14.841)
And when you look at the concept of a digital twin, it comes up quite frequently. In this case, you're basically looking at a task and automating it. If you fast-forward, in the future they'll say, hey, we hired this admin and this person in customer support, and here are all the things they are supposed to do. So do you envision GPA being like, okay, we created this human equivalent that will complete all these tasks for you? But obviously they're disconnected. Versus, here are all the tasks we'll automate, and you can pick and choose. So how do you see this shaping up in the future? Will people use this to automate specific tasks? Or will they want to group together a bunch of tasks so that they don't have to hire yet another admin, for example, to handle the back office?

Will Lu (26:11.368)
That's a great question. The way I see it, down the road we want to support this thing called digital organizations. We don't think a bot can replace a human completely, at least in the near future. But we envision that far fewer people will be needed to set up a function compared to before. A few people will be able to use systems like Orby's to set up a bot, and then the bot can do 80% of the work, and then handle the exceptions, do the coordination, and do the main type of...

Prateek Joshi (26:51.558)
All right, I have one final question before we go to the rapid fire round. What technological breakthroughs in AI are you most excited about as it relates to process automation and its future?

Will Lu (27:08.234)
You mean in the past, or ones that are going to come in the future?

Prateek Joshi (27:10.673)
No, what's that? Yeah, so the breakthroughs happening right now that are most exciting to you.

Will Lu (27:16.525)
Yes. First of all, definitely foundation models purpose-built for AI agents, which Orby is working on. There are a few other companies going in this direction, including OpenAI. The second thing is world models. This is very much research work; there are a few research labs that are trying to figure out how to build a world model in the context of automation. And the third one, which I'm very excited about, is edge automation. Apple is one big leader in this. They're building these small models that are very capable of running all the automation on your edge devices.

Prateek Joshi (27:53.468)
With that, we are at the rapid fire round. I'll ask a series of questions and would love to hear your answers in 15 seconds or less. You ready? Alright, question number one. What's your favorite book?

Will Lu (28:02.958)
Sure, let's go.

Will Lu (28:08.1)
When it comes to starting out, Beyond Entrepreneurship 2.0 by Jim Collins. I think it's a must-read for everyone who wants to build a great company.

Prateek Joshi (28:16.412)
Amazing. What has been an important but overlooked AI trend in the last 12 months?

Will Lu (28:23.934)
I think it's the small foundation models that are being overlooked the most. The models are shrinking in size, yet growing stronger and stronger.

Prateek Joshi (28:33.848)
What's the one thing about process automation that most people don't understand?

Will Lu (28:41.013)
People don't understand how complex a human's day-to-day work can be. So they believe that rule-based RPA solutions can go a really long way. But the reality is that they're really, really lagging behind.

Prateek Joshi (28:57.06)
What separates a great AI product from a merely good one?

Will Lu (29:04.024)
Besides the obvious AI quality and capability, I think the product design is essential to all AI products. It's a huge differentiator.

Prateek Joshi (29:14.904)
What have you changed your mind on recently?

Will Lu (29:20.76)
I was really, really excited about the reasoning capabilities at the beginning, when the research came out. But now I've become more and more conservatively optimistic about that capability. I think more research definitely needs to happen to make it ready for major adoption.

Prateek Joshi (29:38.652)
What's your wildest AI prediction for the next 12 months?

Will Lu (29:47.01)
I'll say the improvements in large language models are going to be very, very incremental in the short term. So most of the breakthroughs in this foundation model area will come from other dimensions, such as multimodal models, large action models, et cetera.

Prateek Joshi (30:07.408)
All right, final question. What's your number one advice to founders who are starting out today?

Will Lu (30:14.946)
Okay, I'll say work on something that you're really, really, really passionate about rather than following the hype or the trends.

Prateek Joshi (30:24.956)
Amazing. Will, this has been a fantastic episode. I think we got to dive into a very important topic and we are at a very critical point in time where a big shift is happening, like infusing AI into a giant market like RPA. So thank you so much for coming onto the show and sharing your insights.

Will Lu (30:45.74)
I'm very happy to be here. It's my honor.