What's Up with Tech?

Graph Databases For Enterprise AI

Evan Kirstel



Interested in being a guest? Email us at admin@evankirstel.com

Most AI teams are learning the hard way that dumping more text into a prompt does not guarantee better answers. We sit down with Philip Rathle, Chief Technology Officer of Neo4j, to talk about the missing ingredient: relationships. When your data is inherently connected, a graph database can turn scattered facts into usable context, so LLMs and agentic AI systems can respond with more precision and less noise.

We walk through why graph technology is showing up as a “quiet power layer” behind enterprise AI, from knowledge graphs and digital twins to metadata, lineage, and even relationships between vector chunks for graph RAG. Philip explains the practical difference between raw data and knowledge, why multi-hop reasoning matters in domains like financial services and supply chain, and how an AI system can delegate deterministic parts of a problem to a graph while the model focuses on language and judgment.

We also get specific about engineering tradeoffs: why relational databases struggle with constant schema changes, what index-free adjacency means for performance, and how graph queries can run 100x to 2000x faster with less hardware for deeply connected questions. Then we look ahead at where the category is going, including why “graph as a bolt-on feature” often misses the real benefits, plus a roadmap update on Infinigraph for scaling graphs into the 100+ terabyte range. Finally, we cover how AI is making graph adoption easier by inferring graph models from relational sources and helping teams write Cypher queries quickly.

If you’re building enterprise AI, graph RAG, or agentic workflows and you care about accuracy, context, and causality, this conversation will sharpen your architecture instincts. Subscribe, share this with a builder on your team, and leave a review. What’s the hardest connected-data problem you want AI to solve?



More at https://linktr.ee/EvanKirstel

Why Graphs Power Enterprise AI

SPEAKER_01

Hey everyone, really excited for this chat today as we dive deep into graph technology, the quiet power layer behind serious enterprise AI, with the leader in the space, Neo4j. Philip, how are you?

SPEAKER_00

I'm doing great, Evan. Really nice to meet you.

SPEAKER_01

Good to have you here, and I'm really intrigued by what you guys are doing. Before that, maybe introduce yourself, a little bit about your journey. And how do you describe Neo4j these days?

SPEAKER_00

I've been working in data and databases for a long time, let's say three decades, give or take. Roughly the first two of those were in relational, and a bit more than the last decade has been in graphs. I worked on some of the world's largest database systems, on both the operational, transactional side and the analytics side, back in the relational world with lots of big companies, and fell in love with an idea that was counter-trend in the early 2010s: instead of simplifying the model so we could handle simple data at scale, let's go all in on the richness of the data, the complex relationships, the connections in the data. There were examples everywhere, from sociology to companies like Google, Facebook, and LinkedIn, as well as a lot of networking companies that built their entire businesses on networks and network effects. And that has gone well, as it turns out. Fast forward close to 14 years, and the company is now roughly a quarter of a billion dollars in revenue, valued around two billion, if not higher, with close to a thousand employees.

Relationships Beat Raw Prompt Data

SPEAKER_01

Amazing. And for someone who hasn't really heard of a graph database, who is mainly working at the application layer, how would you explain why relationships between data points suddenly matter so much with AI?

SPEAKER_00

I'll give you two ways to look at it. One is, as we go through our day-to-day lives and try to understand and assess the things we encounter, the first thing you do is identify the characteristics of that thing, but then you really want to understand how it interacts with the world. Google is a great example of a trillion-dollar graph company. We all know Google's knowledge graph, which extracts information from the World Wide Web and assembles it into concepts and ideas that are all interlinked and that we can navigate. And their original foray into web search was: let's take the relationships in how web pages link to each other and use those to inform how results should be ranked. I think that's a good pointer and analogy for what we're doing now with vectors and just words, throwing more and more data into the prompt. It turns out there was a meta-study, I think around August of last year, looking at 1,500 research papers on prompt engineering, and it found that the more you flood the prompt with raw data, the worse your answer gets. It's actually counterintuitive. And that kind of sucks, because putting more data into the prompt is also more expensive, and things run more slowly. If you can be more surgical and pull in knowledge, extract the signal from the noise, figure out how things are connected, and put that into the prompt, you get better answers.
And there are lots of ways to use the topology of the system, how things are connected to each other: a digital twin of the real-world domain, as well as other kinds of connections. There's a lot of talk these days about semantics, ontology, metadata, and lineage graphs, as well as another kind of graph you can create from the vector chunks themselves: how they relate to each other, where they were drawn from, how proximate they are to each other, where in the document they came from, what the collection was, who the author was, and so on. That can all be represented in a graph.
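To make the "surgical context" idea concrete, here is a minimal sketch in Python. It is purely illustrative, not Neo4j's API: the entity names, relationship types, and two-hop depth are all invented. The point is that a connected neighborhood serializes into a compact, high-signal prompt context instead of a raw document dump.

```python
# Toy knowledge graph: entity -> list of (relationship, neighbor) pairs.
# Illustrative only; a real system would query a graph database instead.
GRAPH = {
    "AcmeCorp": [("SUPPLIES", "WidgetInc"), ("LOCATED_IN", "Rotterdam")],
    "WidgetInc": [("SHIPS_VIA", "PortOfRotterdam")],
    "Rotterdam": [("HOSTS", "PortOfRotterdam")],
    "PortOfRotterdam": [],
}

def neighborhood(entity, depth=2):
    """Collect facts up to `depth` hops out: the thing, the things
    around it, and the things around those."""
    facts, frontier, seen = [], [entity], {entity}
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for rel, neighbor in GRAPH.get(node, []):
                facts.append(f"{node} {rel} {neighbor}")
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return facts

# A few connected facts for the prompt, instead of raw document dumps.
context = "\n".join(neighborhood("AcmeCorp"))
print(context)
```

The traversal returns triples, not prose, so the prompt carries structure (who connects to what, and how) in very few tokens.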

Where Relational Databases Hit Walls

SPEAKER_01

Interesting. You mentioned you go back three decades in data technologies. I worked for Oracle like 10 years ago, a different lifetime it seems, but we had hundreds of different databases. Why don't traditional relational databases solve these kinds of problems? Why can't they be stretched to fit? Where do they hit a wall?


SPEAKER_00

There are two big areas where they hit a wall. One is, if I want to bring in more kinds of data, then, as you have probably done hands-on and I have painfully over the years, you need to rethink your schema. And God forbid you already have data in your schema, and it's even worse if you have an application running on top of it. Then you need to really plan your schema change, your schema migration: dropping and recreating tables, foreign keys, constraints, and indexes just so you can add some columns, some new relationships, and some new tables. With graphs, you have optionality. Some vendors are schema-fixed because they're built on top of relational databases. The technology I've been building, Neo4j, takes a schema-flexible approach, so you have two approaches in the graph world. With schema flexibility, or maybe more accurately progressive schema features, you can choose to lock things down, but that's a choice you make only once you know what the data should look like. Then I introduce some keys and some constraints. Until then, I just let new data flow in, or let a new kind of data set come in, because from one day to the next we don't know what data is going to give us valuable signal, and every business is moving so fast. So schema flexibility is one. The second is the way data is written. I'll speak to Neo4j here; not all graph databases, particularly those built on relational, have this quality. The term for it is index-free adjacency.
The idea is that each relationship is its own object, each node is its own object, and each relationship has a pointer to the location, in memory, or likewise on flash or disk, of the adjacent node, the node at the other end of the relationship, and you have this bi-directionally. What that means is I can use modern compute, which is actually good at random I/O. You probably remember from your Oracle days, I certainly do, doing everything you can to avoid random I/O: pre-arranging the data on disk in a way that anticipates queries. But you can't anticipate the way things are connected in the real world. Each query, each individual point of data, connects in a different way. So you need to lean all the way into random I/O. We did this by pre-connecting the data, taking a little bit of an extra hit when writing, so that on reads I can just spider through the data once I've landed in the graph. You can do about two million pointer-chasing operations per compute core per second, and that ends up being two to three orders of magnitude faster than a relational equivalent, even faster than a NoSQL equivalent, where you don't have structures to accommodate related data. And it's much more compute-efficient. As a result, you get queries that are 100 to 2,000 times faster on one-tenth the hardware. I'll add a couple of other benefits, maybe more C-level benefits, that get observed as a project goes on, so not the original reason for buying. One is the impedance mismatch: when data shows up as networks or hierarchies and you try to put it into tables, the distance between those representations is great.
That turns into a lot of effort and also a lot of misunderstanding, because the business and the DBAs, and likewise the DBAs and the developers, end up speaking completely different languages, then spending months with modelers in a room trying to translate between one language and the other, with very few people who can speak both. So having a model that lets you speak in technology terms and business terms in the same way, and having that match the way data shows up in the world, naturally as networks, naturally as hierarchies, naturally as paths and journeys, ends up being a huge differentiator and advantage in the long run.
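A rough sketch of index-free adjacency in Python, using illustrative data structures rather than Neo4j's internals: each node holds direct references to its neighbors, established at write time, so each hop is a pointer dereference rather than a per-hop index lookup against a join table.

```python
class Node:
    """Index-free adjacency sketch: each node keeps direct references
    to adjacent nodes, so a hop is a pointer chase, not an index seek."""
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct object references, stored at write time

def connect(a, b):
    # Pay a small extra cost at write time to pre-connect the data...
    a.neighbors.append(b)
    b.neighbors.append(a)  # ...bi-directionally, as described above.

def reachable_within(start, hops):
    """Spider through the graph by chasing pointers; no lookups needed."""
    seen, frontier = {start}, [start]
    for _ in range(hops):
        frontier = [n for node in frontier for n in node.neighbors
                    if n not in seen]
        seen.update(frontier)
    return {n.name for n in seen}

a, b, c, d = (Node(x) for x in "abcd")
connect(a, b); connect(b, c); connect(c, d)
print(sorted(reachable_within(a, 2)))  # nodes within two hops of a
```

In a relational layout the same traversal would re-enter an index for every hop; here the write-time pre-connection moves that cost out of the read path, which is the tradeoff described above.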

SPEAKER_01

Wow, fantastic. I'm looking at your website here, so many great logos and customers. Any anecdotes or real-world examples of where your technology really changed how a customer operates? What are some of your favorite stories?

Fitting Graphs Into Existing Stacks

SPEAKER_00

Sure. Let's see, I'll take some obvious ones and then some less obvious ones. An obvious one: a supply chain is a deeply nested hierarchy. One of the big pharma companies we all depended on during the pandemic for vaccines had to spin up a super complex supply chain in no time, and then have agility behind it. They spun up Neo4j in a very short time and used it as the basis for their entire supply chain throughout the pandemic. That project has since expanded into more than 30 different projects, because another characteristic of the long-term value of a graph is that it inherently accommodates data network effects and use-case network effects: the more data I put in, the better I solve my existing problems, and the more I'm able to solve new kinds of problems. I might have 90% of the data needed to solve a different problem and just need the new 10%, and all of a sudden I can solve two or three problems instead of one. Turning to AI, graphs have accidentally and naturally become the ideal format for storing knowledge, context, and memory alongside LLMs to inform AI systems. There are two ways this happens, and I'll draw from examples that have been presented publicly by Uber, Adobe, Walmart, Novo Nordisk, AbbVie, and many more. You can boil it down to two patterns. One is having your agentic system reach out to the graph and pull more context, meaning the thing and the things around it, and the things around those, the subject and object in your query, several levels out, and how they interact.
Pull that back, feed it to the model, let the model churn on it, and the model makes a better, more informed decision that you send back to the user. The other pattern is: look, sometimes there's no room for error, and I can answer my question through connect-the-dots, or multi-hop reasoning, as the term goes these days. Gartner has written at least three papers I've seen just this month on graphs as a top differentiator, both generally and for financial services, where multi-hop reasoning means: my supply chain calculation is something I can do in the graph. It's 20 levels deep, there's a lot of complex math, and it's not something a model is inherently built to do. That's fine; that's why we have multiple tools. Here the model delegates, the way the right brain, which is spontaneous and creative, would delegate to the left brain to solve a particular kind of problem that's more deterministic: reach into the graph, bring the data back, and pass it through. For me that's a cool, new, and unanticipated pattern, one we couldn't have predicted a few years ago before we really understood models, what they were good at, and how they could be used.
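The delegation pattern can be sketched as follows, with everything below a toy stand-in (the bill of materials, costs, and the `answer` function are invented, and the "model" is just an f-string): the deterministic multi-hop math runs in the graph layer, and only the small, exact result goes back for the language side to phrase.

```python
# Toy bill-of-materials graph: part -> list of (component, quantity).
BOM = {
    "drone":   [("frame", 1), ("motor", 4)],
    "motor":   [("winding", 2), ("magnet", 6)],
    "frame":   [], "winding": [], "magnet": [],
}
UNIT_COST = {"frame": 20.0, "winding": 1.5, "magnet": 0.5}

def rolled_up_cost(part):
    """Deterministic multi-hop aggregation: the 'left brain' work the
    model delegates to the graph rather than attempting itself."""
    children = BOM[part]
    if not children:
        return UNIT_COST[part]
    return sum(qty * rolled_up_cost(child) for child, qty in children)

def answer(question):
    # Stand-in for the agent loop: delegate the math to the graph,
    # let the "model" (here just an f-string) handle the language.
    cost = rolled_up_cost("drone")
    return f"The fully rolled-up cost of one drone is ${cost:.2f}."

print(answer("What does a drone cost to build?"))
```

The model never sees, or risks hallucinating, the intermediate arithmetic; it only phrases a number the graph computed exactly.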

SPEAKER_01

Fantastic. Enterprises have a number of challenges building AI infrastructure, as you no doubt experienced firsthand: a lot of technical debt, a lot of disparate infrastructure. They're worried about ripping and replacing legacy data estates and affecting existing operations. How do you think about where graph technology, where Neo4j, fits into how they're building? Is it foundational, or is it more complementary to what they're doing today?

Will Graphs Become A Commodity

SPEAKER_00

It can slide in really nicely in a complementary way. What I love about graphs is this: I've been through several generations of a whole class of vendors saying you need to take all the data in your entire company and move it in. First it was data warehousing, then data lakes, then lakehouses. I understand that CIOs and CDOs are tired of that; they don't want to be told they need to move everything one more time. The good news is you don't need to. First of all, when I talked about flooding the prompt with data, I make a distinction between data and knowledge. Knowledge is the signal extracted from the noise, plus the connections among the pieces of signal, for the things relevant to the particular problem you're trying to solve. For your first graph-powered use case, that might be 1% or less of the data in your Snowflake or Databricks or BigQuery or you name it. That also describes a complementary scenario where you feed the graph from an existing repository, so it's a subsetted reflection of the data. The other pattern, and I'm seeing this a lot in startups now, we've had more than 500 AI startups sign up for our startup program in just the few months since we launched it, is they're saying: my domain is inherently a network, or involves a path or journey. It's everything from companies mirroring an SAP system to keep track of all the different configuration changes, to companies doing insurance claims right from the ground up, to companies doing generic AI memory and enterprise search, and on and on. In that case, they're using the graph as the system of record.
Much like you would use Postgres or MySQL or any number of other relational databases as your system of record, you can equally use the graph. I'll name an example that has spoken publicly: Citi Private Bank, eight years ago, migrated their system of record from Oracle to Neo4j. Now, for all the listeners, I'm not saying the highest and best use of graphs right now is migrating existing systems. If it's not broken, don't fix it. Go build something new; there's so much new that can be built. But you have the optionality to use the graph as the system of record for some new application or subject area that's highly interconnected, where you can derive additional value from going deeper with context. Context and causality have been my two favorite terms for almost as long as I've been at Neo4j, because that's ultimately what you get out of the graph: understanding not just the facts, but the things around them and the reasons behind them. And causality is butterfly effects. So many things happen in the world every day that aren't one cause to one effect; it's multiple causes, and those causes are effects of other causes, and so on, each with different weights. You can represent that in a graph.

SPEAKER_01

Fantastic. Where do you see things headed in the next couple of years? Do you see graph continuing to be its own sort of unique category, or is it just a capability baked into every data platform?

Infinigraph And Scaling To 100TB

SPEAKER_00

The category has been growing like mad, and that growth seems to be accelerating. Over the last decade we've had experiments where multiple vendors have come in and said, let's commoditize graphs as a feature, the same way it's already happened with vectors. Every respectable DBMS these days has added vectors and vector similarity as a feature, including Neo4j, by the way. Much like JSON, XML, and objects before that, these have all been absorbed into other database platforms. You might have some specialized databases off to the side, used at the very high end, but that's not the norm. So there have been multiple attempts over the years to do this with graphs. The reality is, if you come back to the benefits I talked about, schema flexibility and 100 to 2,000x performance on one-tenth the hardware for anything deeply connected, those depend on something built natively from the ground up for graphs. In other words, you don't get either of those benefits with a relational database. There are some technologies out there, I'll pick on Google Spanner Graph, where it exposes a graph data model, but only for reads, not for writes; you literally have to model your tables and do your writes into tables. So you really miss these benefits. Look, use whatever technology works for you, and to the degree a category proves useful, it's only natural that companies with existing data platforms will add some level of graph support. That's great; it's good for the world, and I believe in this as a movement. But for doing anything serious, there's almost a reverse 80-20 with graphs: you need 80% of the functionality in the platform to get 20% of the benefit.
So it's not something that can easily be grafted on. I believe that, plus the fact that the world inherently shows up as networks, biology, ecology, computers, communication, disease, ideas, as well as hierarchies and journeys and paths, the customer journey, the patient journey, means the category is going to become much, much more significant than it already is.

SPEAKER_01

Well, that's a mic drop moment, to say the least. What are you excited about for the rest of this year, personally and professionally? What are you up to? What can you share in terms of roadmap?

AI For Graph Ingestion And Queries

SPEAKER_00

We just dropped something we've been working on, in different iterations, for over a decade. A knock on graphs, or a limitation, has always been that if I have more data than fits inside a single instance, I need to start splitting it up. About five years ago we developed a way of doing this called Fabric, which lets you federate between graphs so you can split data up thoughtfully. But you need to think about it, and my view as a database professional has always been that I don't want end users to have to think more than they should. They should think about their domain and their problem, the application they're building, not how to organize and arrange their data. So we came up with a scaling technology called Infinigraph, which does vertical sharding. Instead of splitting the data sideways so your graph gets broken up, the graph remains intact, and I shard out my property values, which are usually 90 to 95% of the volume, the node properties and the relationship properties. That gets done in the background. We're pretty excited that this 10x's the size of graphs that can be held. We have multiple customers actively trying it out, some who have already procured it, and this gets graphs into the 100-plus terabyte range.
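In spirit, vertical sharding of properties might look like the sketch below. This is purely illustrative and has nothing to do with Infinigraph's actual storage engine: the point is only that topology stays together in one compact structure while the bulky property values live in a separate store, so traversals never touch them.

```python
# Topology store: small, stays intact, and is all a traversal needs.
EDGES = {"a": ["b"], "b": ["c"], "c": []}

# Property store: the bulky 90-95% of the volume, shardable separately.
PROPS = {
    "a": {"description": "x" * 10_000},
    "b": {"description": "y" * 10_000},
    "c": {"description": "z" * 10_000},
}

def walk(start):
    """Walks the topology only; never loads a property value."""
    path, node = [start], start
    while EDGES[node]:
        node = EDGES[node][0]
        path.append(node)
    return path

# Properties are fetched only when a query actually asks for them.
path = walk("a")
first_chars = [PROPS[n]["description"][:1] for n in path]
print(path, first_chars)
```

Because the hot path (`walk`) touches only `EDGES`, the heavy `PROPS` side can be sharded or tiered out without breaking the graph apart.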

SPEAKER_01

Wow. And kudos to your brand team on Infinigraph, it's a great name. Congratulations again on all the success and what you're building. It's phenomenal.

Security Use Cases And Closing

SPEAKER_00

I'll add one more thing: AI has made it much easier to solve some of the really difficult problems around getting one's data into the graph and learning to use it. There's now technology, and we have MCP servers and agents and all kinds of front ends for this, for saying: let's just point to a relational database, decide what tables we want, click a button, and then, even if there's no referential integrity, which is the case for certain platforms like Snowflake, infer the graph model and bring the data over. We can do that by throwing the problem at an LLM inside an AI agent, and the model basically helps create the graph model. Likewise for taking unstructured text, doing entity extraction, and bringing that into the graph. And likewise for creating queries, which helps democratize query writing. A fun surprise has been that models are better at writing graph queries, using the Cypher language and GQL, which is an ISO standard, than they are at writing SQL queries, because Cypher queries are much simpler and more terse. SQL queries can get very long and cumbersome. So not only do graph queries run better, models are better at writing them.
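As an illustration of the terseness point, here is a three-hop "who supplies my supplier's supplier" question expressed both ways. The schema is hypothetical and neither query is executed here; the comparison is only about relative verbosity.

```python
# Hypothetical three-hop supplier question, expressed both ways.
# Neither query runs here; the point is the relative verbosity.
cypher = """
MATCH (c:Company {name: 'Acme'})<-[:SUPPLIES*3]-(s:Company)
RETURN s.name
"""

sql = """
SELECT s3.supplier_name
FROM supplies s1
JOIN supplies s2 ON s2.customer_id = s1.supplier_id
JOIN supplies s3 ON s3.customer_id = s2.supplier_id
WHERE s1.customer_name = 'Acme'
"""

# The multi-hop pattern is one clause in Cypher; each extra hop in SQL
# adds another self-join, and a variable-length hop needs recursion.
print(len(cypher.split()), "Cypher tokens vs", len(sql.split()), "SQL tokens")
```

The Cypher pattern also stays the same shape as the hop count grows (`*3` becomes `*4`), whereas the SQL version grows by a full join per hop, which is plausibly why models find the graph form easier to emit correctly.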

SPEAKER_01

Amazing. Well, it sounds like we have a lot more to unpack. I'm headed to RSA, the big security show in San Francisco, in a few weeks, and this sounds like a great solution for trust, fraud prevention, and data protection, all kinds of use cases there as well. What cool use cases, and what a cool environment. Thanks for joining, really appreciate the insight.

SPEAKER_00

My pleasure, Evan.

SPEAKER_01

And thanks, everyone, for listening, watching, and sharing the episode. And be sure to check out our TV show, Tech Impact, on Bloomberg TV and Fox Business. Thanks, Philip. Thanks, everyone.