Data Brew by Databricks

Kumo AI & Relational Deep Learning | Data Brew | Episode 34

Databricks Season 6 Episode 28



In this episode, Jure Leskovec, Co-founder of Kumo AI and Professor of Computer Science at Stanford University, discusses Relational Deep Learning (RDL) and its role in automating feature engineering. 

Highlights include:
- How RDL enhances predictive modeling.
- Applications in fraud detection and recommendation systems.
- The use of graph neural networks to simplify complex data structures.

Denny Lee  [00:00:04]:
Welcome to Data Brew by Databricks with Denny and Brooke. The series allows us to explore various topics in the data and AI community. Now, whether we're talking about data engineering or data science, we interview subject matter experts to dive deeper into these topics. And while we're doing it, we're enjoying our morning brew. My name is Denny Lee, and I'm a principal developer advocate here at Databricks and one half of Data Brew.

Brooke Wenig [00:00:27]:
My name is Brooke Wenig. I'm the director of our machine learning practice and the other half of Data Brew. And today we are thrilled to introduce Jure Leskovec, who is the co-founder and chief scientist at Kumo AI. And many of you probably know Jure from all of the courses that he has taught and his book, and he's also a professor at Stanford. So welcome. Thank you so much for joining us today, Jure.

Jure Leskovec [00:00:46]:
Yeah, excited to be here.

Brooke Wenig [00:00:49]:
So, to kick it off, people know you for your research, your courses, your books on machine learning, graph representations. And so to kick it off, I would love if you could provide a motivating use case on why graph representations are so powerful.

Jure Leskovec [00:01:03]:
Yeah, I can give a bit of background. Right. So I've been a professor at Stanford now for 15 years, I think, total. And I've always been interested in machine learning and how we learn over complex data. And graphs are a very interesting type of object to learn over. First, we were very interested in graphs from this social media, social science, social network type of perspective, because it allowed us to do a new kind of social science. But then we started looking at graphs as a way to represent data. And then, going back, I sold my first startup to Pinterest.

Jure Leskovec [00:01:45]:
So I joined Pinterest as chief scientist, and I spent six years there, from when Pinterest was a small company of, you know, 100 people to post-IPO. And there I was really interested in how we build these high-end, super accurate recommender systems that understand the style, understand the visuals, because Pinterest is all about that, and understand these aspects of curation with all their subtle differences. So what we developed, actually based on the technology we built in my lab at Stanford, was this system called PinSage. And that turned out to be this kind of large-scale, graph-based recommender system where the system was able to learn both from the relationships between images, pins, and the boards they belong to, as well as from the images, text descriptions, and all that. Building that system and then putting it in production and running it at large scale really revolutionized Pinterest recommendations, made them even more topical, even more personalized, and led to humongous double-digit growth in all kinds of metrics. Why was that a good idea? Because if you think about recommendations, you always need to have some properties about the object that you are recommending. And in Pinterest, a property is an image and the description. And then of course, you could say, okay, I'll just use some deep neural network, a vision transformer today, to just embed that image, and that's all I know.

Jure Leskovec [00:03:28]:
But then what we figured out was that such a representation is not fine-grained enough, and that it makes kind of silly mistakes, right? It takes a rug on the floor and a tapestry on the wall, and they both look about the same, so it says, oh, it's the same thing. And it's not. Or here is, I don't know, soil, which is brown, and some ground meat that you browned in the pan, and they kind of look the same, so it says it's the same thing, right? But to a human, these couldn't be more different. So by putting on top of this a graphical relation, where an image can say, you know, I am here, who am I? What other boards, what kind of images am I together with in the board? Then it can learn from those images as well to say, hey, actually, I'm a tapestry, I'm not a rug, or I'm soil, not browned ground meat, right? So that intuition really allowed us to make a huge step forward in understanding user behavior, preferences, recommendations, and things like that.

Brooke Wenig [00:04:46]:
I know I won't be able to think about ground beef in the same way again when I'm cooking, but it seems like there's been this evolution, if you will: a lot of traditional machine learning working with structured data, then the advent of GenAI working a lot more with unstructured data, computer vision models, language models. But it actually seems like you're getting a lot of information out of structured data with graphs. Can you talk about how neural networks, and in particular graph representation learning and graph deep learning, are uniquely poised for getting the most out of your structured data?

Jure Leskovec [00:05:20]:
Yeah, I can say. I think in some sense, graphs are these super cool objects that really allow you to model dependencies between your objects and entities. But I would also say that learning with graphs is around an order of magnitude harder than your typical approach of streaming big, flat data through some neural network. And that's why I think it's so challenging, because a graph is this relational, connected structure that you cannot really stream, in a sense. So, at Pinterest, it took us like three, four years and an entire engineering team, so probably a $20, $30, $40 million investment, to build that platform. And only after we built it were we able to harness the results. What I see over and over again is that building graph learning platforms that scale is very, very hard, and lots of organizations simply don't have the talent, or it's very hard for them to build it, and it's a multi-year effort.

Jure Leskovec [00:06:36]:
The other thing I would say is, sometimes people say, oh, I'm not a social network, I don't have a graph. But our insight with this research we are doing at Stanford, called relational deep learning, is that any set of structured data, any set of structured tables, can be represented as a graph. And then you can say, okay, good, so a database is a graph.

Denny Lee  [00:07:02]:
So what?

Jure Leskovec [00:07:02]:
Right? But the exciting thing is that we know how to apply deep learning to graphs, right? So that's the key insight. And why does that matter? It matters for the following reason. You were mentioning transformers, convolutional neural networks, things like that. What we call AI today is able to learn over your raw data, right? You bring in the images, the pixels, and the neural network learns. You bring in your sequence of tokens, and the neural network learns directly from the sequence. You are not saying, oh, this is a noun, this is a word, here's an adjective in front of the noun; you're not doing any of that. You used to do that, but not anymore. But if you think about machine learning and AI over structured data, over data that is in multiple tables (for example, you have your customers, transactions, and products; that's a simple schema to keep in mind), that is still in that prehistoric feature engineering era where you cannot learn over these three tables as they are. But you need to feature engineer, so you need to join the user table with the transaction table, and then you need to aggregate to maybe compute.

Jure Leskovec [00:08:19]:
What was the number of purchases of the user in the last two weeks? What was the average purchase amount? How much did they purchase on a Sunday, how much on a Monday? Do they purchase in the morning or in the evening? And you have to be doing all that work to then maybe be able to predict: is a customer going to churn or not? So the point is that if you take this, let's say, simple three-table schema represented as a graph, then you can basically apply what is called a graph neural network or a graph transformer directly to that data to just learn whether the user is going to churn or not, right? So you just skip the entire process of feature engineering. And I think why that is exciting is that it's what happened to computer vision: it went from feature engineering to neural networks learning directly from pixels. And that led from subhuman performance to this kind of superhuman performance. Right? In the old days, if I wanted to detect whether there is a cow in the image, I'd be like, I need to detect one big oval and another oval, maybe that's a head, and several sticks we'll call legs. And if I have that, then it's a cow.
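
The manual workflow described here can be sketched in a few lines of pandas. The tables and column names below are invented for illustration, not taken from any real dataset:

```python
import pandas as pd

# Hypothetical customer and transaction tables (names are illustrative).
customers = pd.DataFrame({"customer_id": [1, 2], "age": [34, 51]})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [20.0, 35.0, 12.5],
    "ts": pd.to_datetime(["2024-01-03", "2024-01-10", "2024-01-05"]),
})

# The classic workflow: pick a window, join, and hand-craft each aggregate.
cutoff = pd.Timestamp("2024-01-15")
recent = transactions[transactions["ts"] >= cutoff - pd.Timedelta(days=14)]
features = recent.groupby("customer_id")["amount"].agg(
    n_purchases_14d="count",
    avg_amount_14d="mean",
).reset_index()
train = customers.merge(features, on="customer_id", how="left").fillna(0)
```

Every additional feature (a 42-day window, day-of-week splits, and so on) means another hand-written aggregation like these, which is exactly the explosion that relational deep learning sidesteps.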

Jure Leskovec [00:09:37]:
Nobody does that today. So with this relational deep learning approach, you can basically completely circumvent feature engineering when learning over structured data spread across multiple tables. What's different is that you can now build models much faster, because you just point them to the raw data and go get a nice morning brew of coffee while waiting for the model to train. And then the other thing is, the model can learn from all your data, so it's not limited by the data you just happened to put in as features. So that's kind of a long answer, with several different points, to your question.

Denny Lee  [00:10:22]:
Yeah, but that's super compelling. So if I was to, for sake of argument, oversimplify the problem, right? You've got three tables, on which you typically do like an aggregation and a group-by. So this is me putting on my BI hat, which is my past, by the way. And I say, hey, I want to, exactly as you described, calculate category of products by date, like fiscal month, by the purchases, the number of purchases and their summation. Are you saying relational deep learning, with its building of a graph, can in essence automatically figure that out? Does it automatically figure out the categorization, the aggregations? Or how do you tell it to build up these things, or determine, for sake of argument, outliers in the calculations that you're seeing? I'm just curious.

Jure Leskovec [00:11:15]:
So maybe the way to think of this is, let's say you want to build a model to predict whether the customer is going to churn. And let's say churn for you means they don't buy anything next month: next 30 days, 30 days in the future, no purchase. Then you can say, okay, what will I build this model based on? You could say, okay, I have the customer age, maybe I have the location, maybe I know whether they are a loyalty club member. Then you say, oh, I also have all their prior transactions. So you do what you said: you take your customer table and transaction table, join them, and then you do some kind of aggregation over that. So you could say average purchase in the last week, in the last month. And you keep adding these features to create this big training table so that you can now predict that label. Right? And I think the point is that the model at the end is only going to be able to learn from those features, right? So maybe you said, okay, it's the number of purchases in the last seven days and the number of purchases in the last month, but what if the last 42 days gives you a better prediction? What if it matters what time of the day it was, what day of the week? Was it a holiday? What category? Right.

Jure Leskovec [00:12:36]:
So, for example, the problem is then this feature explosion, because for each of these features, you need a separate workflow to compute it. And I can give you an example from an online room booking or house booking website. There, the price of the house or the room gets expanded into 120 different features, because it's the price of the room today, the price increase from the price yesterday, the percentile score of the price of this room relative to comparable rooms in, I don't know, a half-mile radius, all these kinds of features, right? But with this approach, where you represent the data as a graph and then put a neural network to it, you let the neural network learn how to attend, let's say, over all the past transactions, to say, oh, it's actually the price of the transaction, or actually it's the time of the transaction. And of course, these things are just learned by the neurons through embeddings and weights. There is no explicit join, there's no explicit aggregation. But the attention mechanism that goes across tables is essentially attending and aggregating; it's essentially selecting and aggregating that data, in some sense, in the best possible way.

Denny Lee  [00:14:02]:
Got it. So, for sake of argument, the outcome could be equivalent to a SELECT COUNT GROUP BY, right? But that's not a necessity at all, because as you're calling out, RDL is about the idea of the neural net actually learning from the tables that you've supplied via the graph.

Jure Leskovec [00:14:25]:
Exactly. And you could imagine that now this neural network can learn a super complicated SELECT GROUP BY type statement with all kinds of weights and combinations and so on. Right. And that's good. Right. That statement never really gets executed on those tables. It's the neurons who kind of execute it, if you want to think of it that way.

Denny Lee  [00:14:48]:
No, makes a ton of sense. Yeah, I remember in the past I tried to explain that type of concept. I said, like, yeah, there was a purchase from a bookstore, like this popular online bookstore, at 02:00 a.m., with this particular persona of Oprah books, right? Like, this idea that it's sometimes just a random thing that, unless you actually have something smarter to track or figure it out, the reality is that you then have to be uber creative and say, yeah, I'm going to go search for Oprah books at 02:00 in the morning and see if I can find a trend. That's probably not going to happen.

Jure Leskovec [00:15:21]:
Exactly. Right. Or when your data becomes richer, more complex, and it's not only users, transactions, and products, but maybe stores and the employees that were in those stores and things like that. Then this can learn that, you know, you met a rude employee who was in the store at that given time, and that makes you churn. You bought a dark blue t-shirt that stained your laundry, and that makes you churn. You make a lot of small transactions, and that's characteristic of people who churn. Or maybe if you are young and do a lot of small, I don't know, whatever the pattern is, this can learn it and extract it from the data in its natural form. So really, it's about building next-generation AI tools, neural networks that can learn over the data in its natural form.

Jure Leskovec [00:16:16]:
And structured data in its natural form is a set of tables linked with primary-foreign key relations. Relational deep learning is a way to learn over this data in its natural form.

Brooke Wenig [00:16:29]:
So similar to how GenAI is having an explosion now due to the availability of data and the increased capacity of GPUs, why is relational deep learning taking off now? Why did this not take off, let's say, 10 or 20 years ago?

Jure Leskovec [00:16:43]:
I would say it's for very similar reasons. Right? We didn't have GPUs, we didn't have graph transformers and graph neural networks that can learn at scale. And I think also we didn't have systems that are performant and allow us to do this. So I think it's a natural next step. Right? First, we learned how to do deep neural networks on the simplest data type, which is a fixed-size matrix. Then we generalized it to a sequence of tokens; those are transformers and everything. And I think now it's time to generalize beyond the sequence, and relational structure, structured data, is the next step in complexity.

Jure Leskovec [00:17:34]:
So I think this is harder and more advanced in some sense, because the data is more complex. And building a system now to be able to learn over this at scale and in a very performant way is much harder, because it's not something you can just stream as a sequence through a bunch of GPUs. Even doing that is, of course, amazingly hard, and so much effort goes into it, but this is harder.

Brooke Wenig [00:18:00]:
And it seems like there are a lot of nice properties as well that come out of working with a graph structure. Like, going through your position paper on relational deep learning: how you can avoid data leakage by only having edges between nodes up to a certain time point, so that you're not predicting the future having already seen the future. So it seems like there are a lot of benefits that relational deep learning and graph deep learning really provide.

Jure Leskovec [00:18:22]:
That's a beautiful point, Brooke, that you are making. Right. Maybe we can try to explain this. When you are computing these features, imagine I'm building a credit scoring model, so I want to predict whether I will pay back a loan. And let's pretend today is January, I don't know, 2023, when I'm about to take a loan, right? Then all the features, like account balance and all that, have to be computed up to that reference time, and only then do I make a prediction. And then maybe a month later, I go and ask to take another loan; again, all the features need to be recomputed up to that reference point. And this means that for any point in time, almost like every second of the day, all the features need to be recomputed.

Jure Leskovec [00:19:11]:
And people don't do that because it's too hard, so maybe you just recompute your features every day, every month, whatever. But then when you are doing learning, it happens many times that a feature was computed in the future and you are making a prediction about the past. And predicting the past from the future is very easy; the hard thing is the opposite. But in this graph-based approach, if you trust your timestamps, then you basically get this time consistency for free, this protection from time travel, because you say, oh, the time of the event is x, and you can only look at the data that comes before x. You can never look at the data that comes after x. And this means you will never have information leakage, you'll never learn from the future, and you'll always be learning with all the up-to-date information you have.
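
The time-consistency rule described here can be sketched as a simple filter: at a reference time, only events stamped strictly before that time are visible to the model. The event structure and field names below are illustrative, not Kumo's actual implementation:

```python
from datetime import datetime

# Each event carries a timestamp; field names are made up for the sketch.
events = [
    {"customer_id": 1, "amount": 50.0, "ts": datetime(2023, 1, 10)},
    {"customer_id": 1, "amount": 75.0, "ts": datetime(2023, 2, 20)},
]

def visible_events(events, customer_id, reference_time):
    """Only events strictly before the reference time may be attended to."""
    return [e for e in events
            if e["customer_id"] == customer_id and e["ts"] < reference_time]

# Predicting in mid-January 2023: the February transaction is invisible,
# so the model cannot "time travel" and learn from the future.
jan_view = visible_events(events, 1, datetime(2023, 1, 15))
mar_view = visible_events(events, 1, datetime(2023, 3, 1))
```

The same rule, applied per edge during neighbor sampling, is what gives the leakage protection and freshness for free.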

Jure Leskovec [00:20:04]:
So you don't have stale features either, right? It prevents feature staleness and, for free, prevents the problem of information leakage or time travel in these kinds of supervised predictive problems.

Denny Lee  [00:20:22]:
This is really cool. So then, following up on this, Kumo is a commercial-grade implementation of the aforementioned paper. And I think, reading it, it was mentioned that Kumo beats many companies' internal ML models, even though working with graphs is often perceived as slower in the first place. I'm just curious, how is that possible? What is the secret sauce, or the implementation, that allows this to be so much better than internal ML models, especially when, again, graphs have traditionally been seen as a slower process?

Jure Leskovec [00:20:57]:
Sure. I think this is a great question, but let me maybe explain it the following way. So what we've done at Stanford is we just published a position paper called Relational Deep Learning. We presented it at ICML, the International Conference on Machine Learning, a few months back, and got a great reception. We also developed a set of structured datasets together with predictive tasks, and a reference prototype implementation of this relational deep learning idea, and we have released that as a project that we call RelBench, at relbench.stanford.edu, for people who are interested. We also did a user study where we took an actual high-end data scientist, somebody with a master's degree from Stanford, a 4.0 GPA, and five years of working experience building high-end models in the financial industry.

Jure Leskovec [00:22:04]:
And we said, look, here's a dataset. Go build the model the traditional way, with feature engineering, XGBoost, a data warehouse to run SQL. And we said, okay, you have two days to do this; come back. And then we also said, now go build the same model using relational deep learning. And the result was that this person was able to build the model 20 times faster, and they needed to write 20 times less code, to get the same or better performance, just on these kinds of smaller toy datasets. So we've done proper research and replicated this many times, showing that it amplifies the productivity of a data scientist by about 20x. Now, what we've also done in parallel is we founded a company called Kumo AI, where we have a large-scale, industry-grade implementation of this. And what do you need to make this work? It's exactly what you say, right? These traditional graph databases are more meant for transaction processing and updating, and they are super slow for analytical-type workloads, right? And you guys know this the same way, because there are analytical workloads and there are these OLTP-type workloads, and Databricks and Spark would not exist if the analytical workload weren't fundamentally different from what Oracle is doing.

Jure Leskovec [00:23:42]:
And the same thing happens in graphs as well. You have these graph databases, but they are orders of magnitude too slow. So what we've done at Kumo is we built a distributed system that takes your relational structured data, automatically lays it out as a graph, and then applies this graph transformer technology directly to it to build that predictive model for fraud, forecasting, recommendation, and personalization type use cases, and so on. And if you build that system optimized from the ground up, then you can scale to tens of billions, 50-plus billion entities, no problem. You can train these models in a couple of hours and things like that. So I think it's really about analytical versus transactional type processing and building that system. And it took us two and a half years to build such a system.

Brooke Wenig [00:24:47]:
So it seems like the main benefits of working with a graph representation are that you don't have to do manual feature engineering and you can outperform traditional ML models. What about interpretability, though? Because I know that's one reason why people really like using techniques like XGBoost on structured data versus a neural network. So how do people still get interpretability when working with a graph representation?

Jure Leskovec [00:25:10]:
That's a super cool question. So what we've done is we had to innovate and invent basically similar concepts and generalize them to this graph-based thinking. And the way we have built explainability is similar. If you think of computer vision, you have this notion of a saliency map, where you can ask the neural network, what are you looking at to tell me whether this is a human or not? And then the neural network says, oh, I'm looking here where the head is, or the neural network is looking at the corner and seeing whether the light is on, or whatever. By asking the network, what have you learned? Where are you looking? You get a really good sense of what is important, what pattern the neural network is learning. So at Kumo, we developed capabilities that allow you to ask your neural network: what tables, what columns are you looking at, and what data is important for your prediction? And this both gives you a way to gain trust in the models and gives you a way to say, hey, you are learning from some spurious correlation, some spurious data that you shouldn't be learning from. There should be nothing there.

Jure Leskovec [00:26:18]:
Okay, I have a problem. I need to think about it. And we can do this at the level of the entire model: what the entire model is looking at. And what is super cool is we can also give these explanations on a per-instance basis. So when you say, okay, here's a prediction, we can go back and say, this is what was important. And why is this nice? Because in traditional explainability, the vocabulary of the explanation is limited by the features.

Jure Leskovec [00:26:49]:
If something is not in the features, you can never explain it; you can never use that signal. But with this type of approach, you can actually go all the way down to the raw data and say, oh, wow, this is what is important: it's this transaction, it's this type of behavior, and so on. So you actually get much more accurate and much more diverse explanations than with traditional approaches.

Brooke Wenig [00:27:17]:
That's super powerful. So what are the use cases people are currently using traditional machine learning for that they should stop and consider switching over to a graph deep learning approach?

Jure Leskovec [00:27:28]:
Yeah, that's a great question. So my answer would be, if your data is in more than one table, then you should be using relational deep learning. Okay. And I like that answer because it's so general. Right? With some of our clients, like a big, big insurer here in the US, we are, for example, learning over 30 tables, predicting whether there are or will be people injured in an accident. Right? So an accident happens.

Jure Leskovec [00:28:02]:
People go to the hospital, and the insurer only learns weeks later from the hospital whether people were truly injured. Right. And it's a very complex schema: 30 different tables with policies, claims, who was in the accident, what car, things like that. Right. And doing feature engineering over 30 tables is a daunting task. With relational deep learning, you can build a model in a couple of hours or a couple of days. Actually, we worked with them.

Jure Leskovec [00:28:31]:
We were able to build 62 different predictive models in a matter of eight days, for all different kinds of properties. So that's the power. Any kind of classification or regression. Recommendations is another super cool and super powerful use case, because in the graph world, recommendation is a link prediction task: you are predicting a link between the user and the item, or the user and the category. And the beautiful thing here is that when people think about graphs in recommendation and personalization, they usually think of graphs as this kind of collaborative filtering, right? Who is my lookalike? Whatever that lookalike did, I'm likely to do, right? And you can capture this through the graph-based signal. But then the other thing is, you also have properties. You know, there is a blue t-shirt, there is a red t-shirt, there is a big t-shirt, a small t-shirt, and so on.

Jure Leskovec [00:29:25]:
And so far, when building recommender systems, it's been hard: you usually had a graph-based thing on one side and a feature-based thing on the other side, but those two never talked to each other. But with relational deep learning, it's actually a richly labeled graph with all these properties of products, users, and so on, plus the relational structure. So the neural network can bring together the best from both worlds and learn from both. So it works really well on cold start, and it works really well on people where you have lots of history. So I would say recommendations is another excellent use case. We see great results in fraud.
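
As a rough sketch, treating recommendation as link prediction means scoring a candidate user-item edge from learned embeddings. Here the embeddings are random stand-ins for what a graph neural network would actually learn from both the relational structure and the item properties; all names are invented:

```python
import numpy as np

# Random stand-in embeddings; a GNN or graph transformer would learn these
# from both the graph structure and node attributes.
rng = np.random.default_rng(seed=0)
user_emb = {"alice": rng.normal(size=8)}
item_emb = {"blue_tshirt": rng.normal(size=8), "red_tshirt": rng.normal(size=8)}

def link_score(user: str, item: str) -> float:
    # Score a candidate user-item edge as an embedding dot product.
    return float(user_emb[user] @ item_emb[item])

# Rank candidate items for a user by their predicted link score.
ranked = sorted(item_emb, key=lambda item: link_score("alice", item),
                reverse=True)
```

Because the embeddings can absorb both collaborative (graph) and content (feature) signals, the same scoring works for cold-start items and for users with long histories.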

Jure Leskovec [00:30:06]:
Fraud is interesting for two reasons. First, fraud is very subtle; it's very hard to detect. But with this graph-based approach, where you put all these little signals together, the network can learn it. And the second thing with fraud is that fraud behaviors change all the time, right? So as a, let's say, fraud machine learning engineer, you're always playing catch-up, you're always engineering new features. Every day you're engineering new features, because the fraud vectors and patterns are changing. With this neural network based approach, there's nothing to engineer. You just refresh and retrain the network, and the network is going to pick up the new patterns, the new signals, whatever you want to call them. So retraining and keeping the model from degrading is actually much, much easier than always playing catch-up: oh, the model has degraded, let's engineer new features.

Jure Leskovec [00:31:00]:
You engineer new features. Oh, it's getting better. You have two weeks of peace. And then again: oh, it's degrading, engineer more features, and so on. So this feature experimentation that happens a lot also goes away. So those, I would say, are good use cases.

Denny Lee  [00:31:20]:
This is really, really cool and really interesting, because I love the idea that I don't have to keep on building new features. So, sold on that part. But I guess part of me also asks, from a performance, latency, and cost perspective, what's the cost to retrain, or the cost to add to it? How am I supposed to think about that? Because, as I was alluding to previously, graphs are traditionally seen as slower. I'm just curious, what is the performance implication, or the cost implication, of constantly doing things like this?

Jure Leskovec [00:31:56]:
Yeah, good point. I think the beauty here is that you don't have to have a graph. You just need to have structured data. Okay. And then the rest, the graph building and all that, is automatic. Traditionally, you had to do all of that manually, and then you're like, how do I build a graph? What do I put in there? How do I annotate it? And it takes half a year just to do that.

Jure Leskovec [00:32:21]:
And then you're like, okay, now I get some value. Here, you just say, these are the tables, let's start learning over them. So you don't even need to build a graph; the graph just gets built automatically. I think the second thing is, in terms of cost, if we accelerate a data scientist 20x, then what's the cost of 20 data scientists? We are far, far cheaper than the cost of those data scientists. Right? And just to say, also, these models are very efficient, in the sense that you can train them in a couple of hours, you can train them in a day. So we are not talking about hundreds of GPUs running for months, or thousands of GPUs running for months.

Jure Leskovec [00:33:07]:
So, actually, from the cost perspective, it's very, very competitive. You get better models, you get them much faster, and even just the amount of human labor that you saved pays for all the additional computation. Right. An hour of GPU time is far cheaper than an hour of a data scientist.

Brooke Wenig [00:33:32]:
I was sold at the one table threshold, Jure.

Denny Lee  [00:33:37]:
But still had to ask, and I'm definitely even more sold. So we're good.

Brooke Wenig [00:33:42]:
And so I would love to dive a little bit deeper into the technical side of things, because I know that under the hood it's using attention across many tables. Could you talk a little bit more about the inner workings of how this actually works under the hood?

Jure Leskovec [00:33:54]:
Yeah. So conceptually, the simplest way to think of this is that we take the tables, and every table defines, let's say, a node type, and every node is annotated with attributes, which are the columns of that table. Okay? And now that we have these node types, for every row we create a node. So for every customer there is a customer node, for every transaction there is a transaction node, for every product there is a product node. And then primary key/foreign key relations, where a customer id points to the transaction id, just become edges. Okay, so you can create this automatically. It's a bijection, if you want to think of it that way.
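The row-to-node, foreign-key-to-edge mapping described here can be sketched in a few lines of plain Python. The table layout and column names below are hypothetical illustrations, not Kumo's actual schema or implementation:

```python
# Hypothetical tables: each row becomes a node; each foreign-key
# reference becomes an edge to the referenced row's node.
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
]
transactions = [
    {"txn_id": 10, "customer_id": 1, "amount": 25.0},
    {"txn_id": 11, "customer_id": 1, "amount": 40.0},
    {"txn_id": 12, "customer_id": 2, "amount": 15.0},
]

def tables_to_graph(tables, fk_links):
    """tables: {node_type: (rows, pk_column)};
    fk_links: [(child_type, fk_column, parent_type)]."""
    nodes, edges = {}, []
    for node_type, (rows, pk) in tables.items():
        for row in rows:
            # Node id is (type, primary key); attributes are the row's columns.
            nodes[(node_type, row[pk])] = row
    for child_type, fk, parent_type in fk_links:
        rows, pk = tables[child_type]
        for row in rows:
            # One edge per primary-key/foreign-key relation.
            edges.append(((child_type, row[pk]), (parent_type, row[fk])))
    return nodes, edges

nodes, edges = tables_to_graph(
    {"customer": (customers, "customer_id"),
     "transaction": (transactions, "txn_id")},
    [("transaction", "customer_id", "customer")],
)
print(len(nodes), len(edges))  # 5 nodes (2 customers + 3 transactions), 3 edges
```

The mapping is mechanical, which is why the graph can be built automatically from the schema alone: nothing beyond the tables and their key relationships is needed.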

Jure Leskovec [00:34:43]:
Of course, when we actually do this in practice, we are a bit smarter about how this graph is created. But conceptually, you can think of it as just this heterogeneous graph: nodes are annotated with column information, and there are timestamps, so that when the neural network arrives at a node, it knows what time it arrived there. So it knows it can only look at the links that happened before the arrival time, not after, given what we talked about with information leakage and time travel. So that's the approach. And then what you need to do to make this scale is that we actually separate out the graph wireframe from the node attribute information. We distribute the wireframe across large-memory machines, so machines with terabytes of RAM, and we put the column information on fast SSDs. Okay? And then the third type of machine that you need is the GPU machine.
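The timestamp constraint can be illustrated with a toy edge list (the node names and times below are made up): when the network arrives at a node at time t, only edges that existed strictly before t are visible, which is what rules out information leakage and "time travel":

```python
# Hypothetical timestamped edge list: (source, neighbor, timestamp).
edges = [
    ("cust_1", "txn_10", 100),
    ("cust_1", "txn_11", 205),
    ("cust_1", "txn_12", 310),
]

def visible_neighbors(edges, node, arrival_time):
    """Neighbors reachable from `node` via edges created before arrival_time."""
    return [nbr for src, nbr, t in edges if src == node and t < arrival_time]

# Predicting as of time 250: the transaction at t=310 is in the
# future relative to the prediction time and must stay hidden.
print(visible_neighbors(edges, "cust_1", 250))  # ['txn_10', 'txn_11']
```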

Jure Leskovec [00:35:51]:
So now what you need to be doing is the following. If you are, let's say, predicting whether a customer is going to churn, you need to first sample a customer on the graph. Take all the neighbors, all the node neighbors of that customer; it could be the transactions, all the stores, all the products. And you do that one hop, you do that two hops. You sample that, you move it to the place where you have the attribute information, and attach that to your small local subgraph patch. And now we need to send that subgraph patch to the GPU to learn how to attend over the first-order neighbors and the second-order neighbors to give you the prediction. And this only starts working when you can do this whole complicated process very, very fast, to actually keep the GPU busy at the end of the day. I built a system like that first at Pinterest, and now we have a generalized, super scalable version of that here at Kumo.
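A deliberately simplified, single-machine sketch of the per-hop neighbor sampling described above (the adjacency list and fanout values are hypothetical; the production system described would run this in a distributed fashion against the wireframe machines, then join attributes from SSD before shipping the batch to the GPU):

```python
import random

# Hypothetical adjacency: a customer's transactions, and each
# transaction's product.
adjacency = {
    "cust_1": ["txn_10", "txn_11", "txn_12"],
    "txn_10": ["prod_a"],
    "txn_11": ["prod_b"],
    "txn_12": ["prod_a"],
}

def sample_subgraph(adjacency, seed, fanout=(2, 2), rng=None):
    """Breadth-first sampled subgraph: at hop i, keep at most fanout[i]
    neighbors of each frontier node. Returns the set of sampled nodes."""
    rng = rng or random.Random(0)
    frontier, visited = [seed], {seed}
    for limit in fanout:
        next_frontier = []
        for node in frontier:
            nbrs = adjacency.get(node, [])
            for nbr in rng.sample(nbrs, min(limit, len(nbrs))):
                if nbr not in visited:
                    visited.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return visited

# Seed on the customer, sample at most 2 neighbors per hop for 2 hops.
subgraph = sample_subgraph(adjacency, "cust_1")
print(sorted(subgraph))
```

Bounding the fanout per hop is what keeps each subgraph patch small and uniform enough to batch efficiently, which is the key to keeping the GPU busy.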

Jure Leskovec [00:36:51]:
We also integrate really nicely with Databricks, so that a lot of the data processing and preprocessing and data layouts and things like that are all done on the Databricks side. So then all that has to flow is binary-encoded data, directly into the GPU memory and CPU memory of these machines, to do the actual compute.

Brooke Wenig [00:37:21]:
And because Databricks has more than one table (I don't think that's proprietary knowledge), we are in turn using Kumo to help with some of our internal marketing use cases as well.

Jure Leskovec [00:37:30]:
Yeah, I think it's actually a beautiful story, right? So I can explain. The marketing team, any marketing team, gets in marketing leads and needs to decide who they want to talk to. Okay? And usually you get more leads than you have capacity, so you need to sort. And if you don't sort, the conversion rate is usually less than a percent. If you sort using the best tools on the market, the conversion rate maybe increases a bit over a percent. So what we did is we built a model that learns over, I think, seven or eight tables in the schema, and tries to predict: will a given lead convert? Will we be able to have a meeting with that given lead? And the interesting thing there is that if you start saying, okay, I need to predict whether a lead is going to convert, I don't even know what features to generate. I could say: what's the lead's name? What is the industry type? What channel did they come in through? What time of day did they come in?

Jure Leskovec [00:38:41]:
And then I ran out of ideas for what features to generate. But with relational deep learning, we were able to just learn over all those tables. And rather than a conversion rate of 1%, the conversion rate now is nearly 6%. Right? And that's the difference between a human doing feature engineering and trying to build features, versus a deep neural network figuring out how to build the representation, the embedding if you want, that is actually predictive of this particular type of conversion that the Databricks marketing team really cares about. And it was interesting because it allowed Databricks to much more efficiently close many more proofs of capability, because now the marketing team is really engaging with the prospects that are really promising, versus picking them at random or ranking them in some other heuristic way. Awesome.

Brooke Wenig [00:39:46]:
Yeah. Thank you for helping us out with everything there. To close out the episode today, I would love to end with: what advice do you have for our listeners who want to learn more about either graph deep learning or just getting started with graphs?

Jure Leskovec [00:40:01]:
Yeah, so that's a great question. A few pointers. First, I would point people to relbench.stanford.edu and the position paper; one thing I didn't mention is that we also released the prototype implementation and all that, so that's all freely available on the website. We also developed a graph machine learning library called PyTorch Geometric, or PyG. So pyg.org is the website. It's an extremely popular library, I think over 20,000 GitHub stars and so on. But it's a very hands-on library, right? It says: you build the graph yourself. You annotate the graph yourself.

Jure Leskovec [00:40:47]:
You create a training dataset yourself. You create the neural network architecture yourself. You train it yourself. So that's great for research and to play with, but to put it in production, to scale it up, it's a huge lift. And in this respect, Kumo is a way to do that without even worrying about how to build a graph, how to store it, how to lay it out; the system does it. So: relbench.stanford.edu if you want to learn more, PyG if you want to train your own graph neural networks, and Kumo if you want to put anything in production and see the magic in practice.

Brooke Wenig [00:41:36]:
I know I said that was the last question, but now I am curious. How did you pick the name Kumo?

Jure Leskovec [00:41:43]:
Good question. It's a funny story.

Denny Lee  [00:41:46]:
So first, those are the best stories.

Jure Leskovec [00:41:49]:
So, yes, first, the name of our company should have been Vertex AI. But then I got the domain name.

Denny Lee  [00:42:00]:
That seems to be a name that we've heard of.

Jure Leskovec [00:42:03]:
I already got the domain name. And then two weeks later, somebody announces Vertex products, and I'm like, ooh. So that's the story. And then we were brainstorming, and my wife was like, okay, how do you say spider in Japanese? And Kumo is the Japanese word for spider, but it also means cloud. And that's kind of beautiful, because Kumo takes your cloud data, represents it as a network, as a web, and we are like a spider that extracts knowledge from that web. So I think it's beautiful. But, yeah. Kumo means a cloud as well as a spider, depending, I think, on what kind of kanji you use in Japanese.

Jure Leskovec [00:42:55]:
So that's how the company got its name. And, you know, the domain name was free. So here we are.

Brooke Wenig [00:43:03]:
That's absolutely incredible. Well, thank you so much for your time today, educating everybody on graph deep learning, on when and where to use traditional ML versus graph deep learning, as well as a great lesson on Japanese kanji.

Jure Leskovec [00:43:18]:
Yeah, thanks a lot. Very fun talking to you.