Biotech Bytes: Conversations with Biotechnology / Pharmaceutical IT Leaders
Welcome to the Biotech Bytes podcast, where we sit down with Biotech and Pharma IT leaders to learn what's working in our industry.
Steven Swan is the CEO of The Swan Group LLC. He has 20 years of experience working with companies and individuals to make long-term matches. Focusing on information technology within the biotech and pharmaceutical industries has allowed The Swan Group to become a valued partner to many companies.
Staying in constant contact with the marketplace and its trends allows Steve to add valued insight to every conversation. Whether it is salary levels, technology trends, or where the market is heading, Steve knows what is important to both small and large companies.
Tune in every month to hear how Biotech and Pharma IT leaders are preparing for the future and winning today.
Revolutionizing Biotech Data Infrastructure with Stavros Papadopoulos
The future of data management in biotech holds the key to faster discoveries and groundbreaking innovations. With new technology shaping the field, using structured multi-modal data could change how scientific progress happens.
In this episode, I’m delighted to be joined by Stavros Papadopoulos, founder of TileDB. With a distinguished background as a Senior Research Scientist at Intel and a contributing member of the MIT CSAIL, Stavros brings a wealth of experience to our discussion.
We talk about the challenges of data management, from breaking down data silos in biotech to how multi-dimensional arrays can improve data storage and access. Stavros also shares how TileDB’s technology is set to make data handling more accurate and efficient.
Learn how these tools can speed up biotech breakthroughs. Listen now!
Specifically, this episode highlights the following themes:
- The challenge of siloed data systems in biotech
- The importance of a universal data management approach
- TileDB’s role in accelerating scientific discovery
Links from this episode:
- Get to know more about Steven Swan: https://www.linkedin.com/in/swangroup
- Get to know more about Stavros Papadopoulos: https://www.linkedin.com/in/stavrospap/
- Learn more about TileDB: https://www.tiledb.com/
Stavros Papadopoulos [00:00:00]:
In my opinion, my vision is that we need to go faster. If we're going to see in our lifetimes, we're going to see discovery of important breakthroughs, then we're playing not against each other, but against time.
Steve Swan [00:00:16]:
Welcome to Biotech Bytes, where we talk about technology with leaders in the tech industry for biotech companies. I'm your host, Steve Swan, and have the pleasure of being joined today by Stavros Papadopoulos, the founder and CEO of TileDB, which is a software company for scientific discovery that helps to structure all data types. Stavros, thank you for joining us. Excited for this?
Stavros Papadopoulos [00:00:39]:
Thank you, Steve. I really appreciate the opportunity.
Steve Swan [00:00:41]:
Yeah, no, this is great. And I always like starting out just real basic for our listeners. Tell me about you. Tell me how you got to where you are today. How did you ascend to being the founder and CEO of TileDB?
Stavros Papadopoulos [00:00:55]:
I'm a computer scientist by training. This is the first company that I'm creating. I started off in Greece. This is where I did my bachelor's degree in computer science. And then very early, I left Greece to go to Hong Kong to do my PhD, continuing, of course, in computer science. And this is where I worked mostly with data. It was all data management from the beginning. At the time, I was doing a lot of algorithms and data structures.
Stavros Papadopoulos [00:01:22]:
At some point, things got a little bit out of hand and I got into cryptography. So I was doing a lot of mathematics, always related to data and data structures and algorithms. After I finished my PhD, I stayed in Hong Kong for another three years. I became a professor. So I was very much into academics. And in 2014, I got an amazing opportunity to come to Boston, where I still live, and work at MIT and Intel Labs. It was a dual appointment. I was working for Intel.
Stavros Papadopoulos [00:01:50]:
I was a senior researcher at Intel Labs, but I was stationed at MIT, working with the database group there, one of the most powerful database groups in the world. I had the privilege to work with these folks. And actually I started TileDB very early on. Probably in the first month the idea got shaped and we said, okay, let's go and test it out. That was the beginning of everything.
Steve Swan [00:02:10]:
And so you had the idea. Now, did you have partners there, or is this your baby? You did this?
Stavros Papadopoulos [00:02:15]:
When I started at MIT, I worked very closely with Michael Stonebraker, who's a god in databases. He started so many companies and he's the creator of Postgres, among other good things. Alongside him were Sam Madden, another amazing professor at MIT, and some people at Intel. Initially, the ideas were just a research project.
Steve Swan [00:02:37]:
Okay.
Stavros Papadopoulos [00:02:37]:
We were trying to understand how to better model data so that we marry the high performance computing work that was being done at Intel and the database work that was being done at MIT. What kind of data do we want to manage and handle and process and analyze so that these two domains, which look very, very different, can come together? High performance computing is mostly about supercomputing, you know, a lot of cores, a lot of CPUs, problems like computational linear algebra, a lot of mathematics. But this mathematics lies at the core of machine learning and deep learning and other sophisticated operations. But then in the database world, databases are doing a lot of sophisticated operations too. And I noticed that those two domains don't go to the same bars. People from those two domains don't hang out a lot together. They are two different streams of work, so I wanted to marry those. This is how I started reasoning around data and how to model data in a much more general way so that we can structure any data beyond the traditional tables that relational databases could handle.
Steve Swan [00:03:49]:
That's awesome. Now, kind of a sidebar. My nephew, who does a lot of AI work and is working for a startup out in California, was just telling me that a lot of what's going on now with the AI work is exactly what you just said. We're proving out the theorems with mathematics so the AI can, loosely anyway, reason a little bit better. And he said, think of high school calculus and algebra, Uncle Steve.
Steve Swan [00:04:14]:
That's what we're doing. It's, you know, we got to get through it that way.
Stavros Papadopoulos [00:04:18]:
It's pretty much linear algebra. And linear algebra is about very efficient computations on matrices and vectors. Other people are calling them tensors or multidimensional arrays. We use the term multidimensional array. So that was actually the observation. I was observing that no matter what form the data is in on your disk, be it genomics, transcriptomics, imaging, point clouds, tables, whatever, when you bring it into main memory in the machine and then into the CPUs and GPUs, the data gets converted into vectors and matrices and arrays. So the scientific question that I had at the time was, why aren't we storing this data in this format to begin with, since we're converting it into that format to process it?
Stavros Papadopoulos [00:05:04]:
So that was the very deep scientific question that I had. But then the ideas evolved from there and the question became a little bit bigger, and the question was, can a single database system exist that can deal with all data, with multimodal data? Right? With different types of data. Not just the tables, not just the images, not just the genome, but all of it. And that was the beginning of everything. And we pulled this off with this magical data structure, the multidimensional array.
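Editor's note: the idea of one array model covering several data types can be made concrete with a small sketch. The example below is illustrative only and uses plain Python rather than TileDB's API; every name in it (the sample data, `slice_dense`, `slice_sparse`, `slice_table`) is hypothetical. It assumes a dense array stands in for an image, a sparse array keyed by coordinates stands in for genomic variant calls, and a table is treated as a one-dimensional array whose columns are attributes.

```python
# Illustrative only: plain-Python stand-ins for the array model described above.
# None of this is TileDB's API; all names here are hypothetical.

# A dense 2-D array: a tiny grayscale image indexed by (row, col).
image = [
    [0, 10, 20],
    [30, 40, 50],
]

# A sparse 2-D array: variant calls indexed by (sample, position).
# Only non-empty cells are stored, keyed by their coordinates.
variants = {
    ("sample_a", 1001): "A>G",
    ("sample_b", 1001): "A>G",
    ("sample_b", 2050): "C>T",
}

# A table seen as a 1-D array: the row number is the dimension,
# and each column is an attribute stored alongside it.
table = {
    "patient_id": [7, 8, 9],
    "age": [34, 51, 46],
}


def slice_dense(array, row_range, col_range):
    """Return the sub-array covering half-open row and column ranges."""
    r0, r1 = row_range
    c0, c1 = col_range
    return [row[c0:c1] for row in array[r0:r1]]


def slice_sparse(cells, samples, pos_range):
    """Return the stored cells whose coordinates fall inside the query box."""
    lo, hi = pos_range
    return {
        coord: value
        for coord, value in cells.items()
        if coord[0] in samples and lo <= coord[1] < hi
    }


def slice_table(columns, row_range):
    """Return the same row range from every attribute column."""
    r0, r1 = row_range
    return {name: values[r0:r1] for name, values in columns.items()}


image_patch = slice_dense(image, (0, 2), (1, 3))
variant_hits = slice_sparse(variants, {"sample_b"}, (1000, 2000))
table_rows = slice_table(table, (1, 3))
```

In all three cases the query is the same kind of operation, a range over one or more dimensions, which is the property the multidimensional array argument relies on.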
Steve Swan [00:05:32]:
That's awesome. I love that.
Stavros Papadopoulos [00:05:33]:
Yeah.
Steve Swan [00:05:34]:
Because everything I'm hearing about right now, and you know this, obviously you're in the middle of it, is the data, the storage, the retrieval. I mean, my podcasts are all with these leaders within biotech. Right? Big consumers of data. I cover both the commercial and the R&D side, mainly the R&D side. That's where a lot of that big data is and that's where you guys live.
Steve Swan [00:05:52]:
But we can talk about AI all day long. But if you're not putting data into it that's decent, that's like not having gasoline for an engine. Right?
Stavros Papadopoulos [00:06:00]:
Yeah. So that was a kind of pleasant consequence for us as well. I mean, AI distracted people a lot because there was a huge hype and a very, very quick one. So people got distracted a little bit, and they shouldn't. They should be focusing on data to begin with and then AI. So first start with the data foundation, then go to AI. But here's how it helped, at least for us. When I was talking about multiple modalities at the time, we're talking about 2014 to 2017, when I was at MIT. The company got created in 2017.
Stavros Papadopoulos [00:06:32]:
But during those three years I was trying to tell people and I was trying to find use cases for a multimodal database system, a database system that can capture more than tables. In the business world, there was little appetite for multimodal data. It was all tables. If you go to a financial institution, they're going to tell you we have tables and time series, you know, mostly, 90%, perhaps more. So when I was talking about a multimodal database system, there wasn't much appetite. We found applications in defense, a lot of geospatial data that is multimodal, but not so much. And of course life sciences, and we're going to talk a lot about this. But not so much in the business world.
Stavros Papadopoulos [00:07:12]:
And now, with AI, everybody's talking about multimodal data because the large language models are multimodal. In other words, those organizations are realizing that in order to gain better insights from their businesses or for their businesses, they need to combine the tabular data with their PDFs, with their text, with their email, with pretty much all the data that the organization has generated. And this is a great ground for us.
Steve Swan [00:07:38]:
That's where you're standing, correct?
Stavros Papadopoulos [00:07:41]:
And we are ahead because we've been doing this for 10 years.
Steve Swan [00:07:44]:
That's awesome. That's great. And yeah, it's just amazing to me how all these things have come to converge right now. So now, when it comes to the data, here's a question for you, because I've had members of the biotech community say this on my podcast. I asked one gentleman who's the CIO of a now large biotech, and he said, listen, we corrected it, we got our data in shape starting probably 2015, 2016, right in there. So we're good from that point forward, but we've got to go back and deal with all our 30 years' worth of experimental data and all that stuff. Does TileDB help with that? Help me, the CIO of a large biotech with 30 years' worth of data that I'm having trouble getting through. Is that something you do? Just so I fully understand.
Stavros Papadopoulos [00:08:38]:
Yeah, absolutely. Great question. And actually it's bidirectional, moving not only towards the past but also to the future. And I'm going to give you an example. One of the first use cases that we had was with the Broad Institute, while I was wearing the Intel and MIT hat, not during the TileDB years. And the problem was that they had amassed a lot of genomic variant files, right? So genome sequencing data. And not only them, but the entire life sciences domain had reached the scaling ceiling for handling this kind of data.
Stavros Papadopoulos [00:09:13]:
The data exploded. And of course, they had a lot of legacy data from the past. The legacy data is typically stored in bespoke formats. It's not like CSVs. It's something more complex than that, for sure. And the data is bigger. So you need to deal with this legacy first.
Stavros Papadopoulos [00:09:30]:
So this is the one direction, moving to the past. How do we deal with all these legacy formats out there? And there are a lot. When we're talking about multimodal data, we're talking about a lot of formats. That's the thing. If it was one, it would have been easier. We would have focused on one data type and that's it. But the problem is there are hundreds, so it's a big problem going in that direction. But there is also something a lot of organizations don't understand, and this pertains to organizations that are building technology in house themselves for bespoke formats.
Stavros Papadopoulos [00:10:02]:
And this is what I'm completely against. And that's why I'm seeking universal solutions, not bespoke ones. A bespoke solution is limited by time. You build it once for the data of your time and of the past, but not of the future. We go the other direction as well, into the future. And the example here is that genomics was the first use case. So we adapted the TileDB technology, this multidimensional array technology, for genomics and it worked phenomenally well. But then we hired an amazing person in our company, a bioinformatician, who came in to work on genomics. In one of our one-on-ones very early on, we're talking about four or five years ago,
Stavros Papadopoulos [00:10:45]:
he says, you know what, in life sciences there is another data type coming up a lot. And this is what the big pharmaceuticals are looking into. And this is single cell. This is RNA sequencing, right? It's transcriptomics. There is genomics, but there is also transcriptomics. And this is another up and coming, super valuable data asset for organizations. It took us some time, of course. We work with organizations like the Transcript Initiative, so we work with luminaries in the space to do this. But we did it, and we did it with arrays.
Stavros Papadopoulos [00:11:19]:
We didn't build a new technology. We adapted the core technology we have, which is arrays, in order to handle single cell as well. And then before the LLMs came up and the vector databases and vector embeddings, we kept on telling organizations, adopt a technology that is extendable, do not adopt bespoke solutions. That doesn't scale. The number of data assets has exponentially increased. And then all of a sudden vector embeddings became a big deal. And RAG and LLMs, right, for generative AI. It took us two weeks to build a library for embeddings with arrays. Why? Because a vector is a one-dimensional array.
Stavros Papadopoulos [00:12:00]:
So we're going both ways. We shouldn't be thinking only about the past and all the legacy formats. That too. But what about the future? I guarantee you, in the next five to 10 years there's going to be yet another data type in life sciences and other domains. And people are going to say, oh, our current infrastructure doesn't handle it, we need to add another solution. Yeah, but that's exactly the problem. You keep on patching things instead of completely solving the problem, and you can solve it only with a universal technology. There's no other way.
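Editor's note: the remark that a vector is a one-dimensional array can be illustrated with a brute-force similarity search. This is a minimal sketch in plain Python, not the library Stavros refers to; the embedding values, document names, and the `cosine_similarity` and `nearest` helpers are all made up for illustration.

```python
import math

# Hypothetical toy embeddings: each document is a 1-D array of floats.
embeddings = {
    "assay_protocol": [1.0, 0.0, 0.0],
    "variant_report": [0.0, 1.0, 0.0],
    "imaging_notes": [0.6, 0.8, 0.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, vectors, k=1):
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        vectors,
        key=lambda name: cosine_similarity(query, vectors[name]),
        reverse=True,
    )
    return ranked[:k]


# Brute-force scan: compare the query against every stored vector.
top_two = nearest([0.0, 0.9, 0.1], embeddings, k=2)
```

Stacking the stored vectors gives a two-dimensional array (documents by embedding dimension), so an engine that already stores and slices arrays has the basic storage layout for embeddings. Production vector search typically adds an approximate index instead of scanning every vector.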
Steve Swan [00:12:29]:
Right? Yeah, no, that's great. And that's where you guys are sitting. Are there other companies doing what you're doing that you know of, or trying to?
Stavros Papadopoulos [00:12:37]:
So, not exactly. The domain is very crowded, and we empathize a lot with our customers and our prospects and any other organization that is looking into technology, because there are thousands of solutions and there is a lot of noise. So we empathize with that. Someone may say, yeah, to an extent, people are doing it. Maybe Snowflake is doing it. Databricks is doing it. Not really.
Stavros Papadopoulos [00:13:01]:
If you look into their infrastructure, it's not the same. We built it from scratch with foundational primitives. Usually, in order to be able to pull off what we're pulling off, you need to put together 10 different solutions. But that's not what we're doing. We're not an aggregator of other people's solutions. We have a single solution, universal, built from scratch, which can handle tables and files and generative AI and genomics and transcriptomics and point clouds and so on and so forth. So, no.
Stavros Papadopoulos [00:13:29]:
Why? Because they didn't want to build a universal technology. They wanted to build, in a modular, incremental fashion, yet another solution which complements the platform. That is not sustainable, it's not scalable, it confuses, and it creates a very convoluted infrastructure for organizations.
Steve Swan [00:13:48]:
Sure it does. Yeah. And like you said, you now can use that across the organization. I mean, if different groups have different data structures, right, commercial versus R&D or whatever, if they have different data sets that they're looking at, you can handle it, and you can pull it into their analytical tool, whatever that is, whether it's AI, whether it's, I don't know, Informatica, whatever it is that's doing their work or looking at something. Right.
Steve Swan [00:14:11]:
You know, that's awesome. So let me ask you this. You guys saw this, right? You brought it forward. You said you got a little lucky with the AI thing. It kind of came into your wheelhouse, which is awesome. What are you seeing coming down the road that you think may help us here in our industry, in the biotech industry? Or what are you angling yourself towards to get ready for? What do you see certain folks needing to do or being able to do that TileDB is going to help us with?
Stavros Papadopoulos [00:14:41]:
I can only speak about our vision and our mission and what we would like the world to be like. Right. Because other people may have different opinions and they may go about it in a different way. But I can tell you at least how I think a vision like TileDB can help organizations accelerate discovery. Focusing on biotech for the audience. TileDB doesn't just do biotech. It's just that this is of extreme focus in the next couple of years with TileDB because we see a lot of traction, we see a lot of importance. Right? We believe this work is important.
Stavros Papadopoulos [00:15:13]:
But TileDB is more universal, as I mentioned. I will speak specifically about life sciences, and this can be extrapolated to other domains as well. I'll tell you the blockers for acceleration and for innovation and for realizing things like personalized medicine and discovery in general, like drug discovery. The first problem is the silos of the data. In an organization, you have some of the smartest people I have met working in biotechs and big pharma and hospitals, and they're completely siloed from each other. They don't collaborate because their solutions are different. A different solution for oncology, for imaging, a different solution for bioinformatics, for genomics, a different solution for single cell, a different solution for tables, a different solution for files. And they don't communicate with each other.
Stavros Papadopoulos [00:16:04]:
Those more advanced organizations who spend a lot of money on infrastructure end up building massive teams, very big teams, who are trying to harmonize this. So instead of ripping out the software and either building or buying a universal, holistic, singular data infrastructure, they're keeping those bespoke solutions and they're trying to build layers on top. That takes a lot of time, a lot of money, and it's not scalable, because as soon as a new tool comes in, you need to integrate that tool. And then another new tool comes in and you need to integrate that tool. That becomes completely unsustainable and extremely, extremely expensive. But those more sophisticated organizations at least have these layers. They're calling them data meshes. Some are calling them the modern data stack.
Stavros Papadopoulos [00:16:51]:
Like, you know, there is a lot of different terms that people are throwing around. But again, these organizations that should be focusing on discovery, they're focusing on building infrastructure, which I find crazy.
Steve Swan [00:17:00]:
Yeah, right. That's not their business.
Stavros Papadopoulos [00:17:02]:
It's not their business. It is equivalent. It sounds crazier the way I'm going to phrase it now than it does to say, yeah, big pharma is building infrastructure. Imagine TileDB, a database company, trying to find the cure to cancer with wet labs. But we're not a wet lab organization. It's equally crazy. Why isn't it crazy to say that in a big pharma, or any other organization for that matter, whose main business is not data infrastructure, they're building infrastructure from scratch? You should be partnering with organizations who are doing this 24/7, for years, with deep expertise in the topic.
Stavros Papadopoulos [00:17:38]:
And again, that's the vision. There are different opinions. They may say, no, we really want to build the infrastructure ourselves because we believe it's IP. We will disagree there. The IP is in drug discovery, right? It's in design, not in the data infrastructure. The infrastructure at some point becomes a commodity. So it cannot be the IP.
Stavros Papadopoulos [00:17:58]:
You know, at some point the technologies will converge, the performance will converge. That's not your IP. Your IP is in the science. We want to help accelerate the science. And we're seeing that organizations that could be trying to build infrastructure for years and years can take advantage of TileDB and in a couple of months be up and running. You're saving all this time. So it's truly a game of time. Given enough time and money, the pharmaceuticals or anybody else will be able at some point to build something that is at the same level as TileDB or somebody else, like Snowflake and Databricks.
Stavros Papadopoulos [00:18:36]:
But in my opinion, my vision is that we need to go faster. If we're going to see in our lifetimes, we're going to see discovery of important breakthroughs, then we're playing not against each other, but against time.
Steve Swan [00:18:50]:
So, great, why don't you tell me what you think or what you're seeing in the future for our industry, for biotech and such.
Stavros Papadopoulos [00:18:57]:
I'll tell you what my vision is with TileDB and specifically for the life sciences domain. We are up against time, right? Given enough time and resources, at some point the technology is going to advance and multiple different technologies may converge around which one is better, faster, cheaper, but we're up against time. What we can do with TileDB is accelerate. Because the technology exists today, we believe we are ahead in terms of the features and the sophistication of the technology itself. And we can help organizations, as I mentioned before, avoid the silos, right? The silos exist and this is what is blocking organizations. So we can de-silo the organizations and bring a holistic catalog, a holistic system, so that the organizations can view all their assets, all their data assets.
Stavros Papadopoulos [00:19:47]:
And specifically in life sciences and biotech, these assets are very diverse. So have a holistic view of all of these, have holistic governance across these assets. And then, exactly because of the array technology of TileDB, for the most difficult data assets, we can put structure in them. And this structure gives performance, it expedites the analysis, and it brings the cost down. So this is how TileDB envisions accelerating discovery in these organizations, so that in our lifetime we see and bear the fruits of these discoveries and these breakthroughs.
Steve Swan [00:20:27]:
That's great stuff. I love hearing this because, again, everybody that I talk to, all these technology leaders that are running all this data, are just stumbling on this, and this makes so much sense. I even see a lot of these PE and VC firms trying to own the whole value chain with data. I mean, they see data as the petroleum of our generation. And especially with AI coming on so strong and so big. Again, data's the fuel that fuels it.
Steve Swan [00:20:53]:
Right. You know, so being able to pull it, like you said, is just incredibly powerful, you know.
Stavros Papadopoulos [00:21:00]:
Absolutely. And I do believe that organizations realize how adversely impactful the silos are, that every subgroup has its own data and its own solutions. I would just urge against a particular pitfall. We have seen organizations realizing that the era of technology consolidation has come. And that's a good thing in my opinion, because right now technologies are very fragmented in organizations. Too much software to deal with. But you shouldn't be doing it in the wrong way. And the wrong way is to try to force fit unnatural data types into a single purpose-built database. Even a relational database is purpose built.
Stavros Papadopoulos [00:21:40]:
It's built for the purpose of storing tables. You cannot store an image in a table, and you cannot store genomics in its entire complexity in a table. You can't do that. You need something much more versatile. And this is what we're doing with TileDB. We're offering something that is much more flexible than a table and that can capture these modalities efficiently. Not just with a half measure, but in an optimal way or in a near-optimal way.
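Editor's note: one way to read the point about tables is that an image can be forced into rows, but common access patterns then become awkward. The sketch below is a hedged illustration in plain Python with made-up data and hypothetical helper names; it does not measure any real database. It assumes the table has no index, so cropping a region means scanning every pixel row, while the array form reaches the region by position.

```python
# Hypothetical 4x4 grayscale image; each pixel value encodes its position.
HEIGHT, WIDTH = 4, 4
image_array = [[10 * r + c for c in range(WIDTH)] for r in range(HEIGHT)]

# The same image forced into a relational-style table: one row per pixel.
image_table = [
    (r, c, image_array[r][c]) for r in range(HEIGHT) for c in range(WIDTH)
]


def crop_from_table(rows, r0, r1, c0, c1):
    """Scan every pixel row, keep those in the box, and rebuild the patch."""
    patch = [[None] * (c1 - c0) for _ in range(r1 - r0)]
    for r, c, value in rows:  # full scan, since no index is assumed
        if r0 <= r < r1 and c0 <= c < c1:
            patch[r - r0][c - c0] = value
    return patch


def crop_from_array(array, r0, r1, c0, c1):
    """Slice the region directly by its coordinates."""
    return [row[c0:c1] for row in array[r0:r1]]


table_patch = crop_from_table(image_table, 1, 3, 2, 4)
array_patch = crop_from_array(image_array, 1, 3, 2, 4)
```

Both functions return the same patch. The difference is that the table version touches all 16 pixel rows and stores two coordinate columns per pixel, whereas the array version keeps coordinates implicit in the layout.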
Steve Swan [00:22:13]:
Yeah. And scalable, which you've said a hundred times today. Everything you're saying, I mean, it's awesome, again, based on everybody that I'm talking to. Now, let me ask you a question. We went through the technology, we went through some of the use cases there. But let's just say I'm watching this and I'm thinking, maybe I'd love to work for these guys.
Steve Swan [00:22:34]:
Why would I want to work for you? And why would I want to work for your company? What separates you guys out from, I don't know, somebody else? Aside from the fact that your technology is totally awesome and it's going to change the way we do things.
Stavros Papadopoulos [00:22:47]:
You know what? I always ask myself and my leadership team this exact question, for our own benefit, right? How do we retain all this talent? What is it that we offer to our existing team before we promote this to people that we would like to take part in what we're doing? So this is a recurring question. We're trying our best, of course, for the answer to be awesome, again, for our own benefit as well. I believe one of the most recurring themes is the sense of purpose. We are at the cutting edge of technology. We're building something new from scratch. We have assembled some of the smartest people on the planet, and I kid you not. Excluding myself, okay, I'm speaking for my team.
Stavros Papadopoulos [00:23:30]:
Some of the smartest people I have ever met in my life work with us at TileDB, and we're building something meaningful, right? We're building something that could contribute in some way to discovering a cure for a disease, right? Or to detecting a rare disease, as we're trying to do with Rady Children's Hospital in the NICU, for newborns, so that we save the lives of babies. Because we're going to take action on something that the data has revealed. So there is a huge sense of purpose in the company, and everybody knows it. And we make absolutely certain that we repeat this internally, that, folks, this is why we're getting up every day, and this is why we're spending so many hours on this. So definitely the sense of purpose. The other thing is that, again, if you're into technology, I mean, it cannot get better than that. We're working at the pinnacle.
Stavros Papadopoulos [00:24:24]:
At the pinnacle of technology. This is completely new stuff that we're inventing. At the very least, we're keeping ourselves up to date with other technologies as well, so that we're kept on our toes in order to be able to stay ahead. And finally, I would say the team. The team is remarkable. I can't be more proud of what we built with the team and the team that we have assembled. Half of the team has PhDs.
Stavros Papadopoulos [00:24:48]:
And this is not to say that we're not going to welcome people without one. But this is kind of remarkable. We're building technology here. We're not a research institution. And at the same time, everybody has an amazing personality. Of course, we hand pick everybody, as you can imagine. But one of my proudest moments is seeing this company grow with these individuals.
Stavros Papadopoulos [00:25:10]:
At the end of the day, the company is the team.
Steve Swan [00:25:12]:
Yeah. And you said, I think you said you're around 70 folks now or so.
Stavros Papadopoulos [00:25:16]:
About 70, yeah.
Steve Swan [00:25:18]:
Is everybody based in Cambridge? Where's everybody located?
Stavros Papadopoulos [00:25:20]:
The headquarters are in Cambridge. This is where I started the company. We're fortunate, at least in my opinion, to be a remote first company since before the pandemic. So we got incorporated in 2017. The pandemic hit in 2020. So we were fortunate because the pandemic did not impact our operations whatsoever and we maintained it. We never changed back to an office only policy. We do maintain physical offices.
Stavros Papadopoulos [00:25:45]:
We have one in Cambridge. We're about six, seven people here. We have a newly created one in New York City. We have a big hub now growing in New York City. We have a big presence in Greece. We have about 25 people in Greece and growing. And then the rest are kind of scattered all over the US and some people in Europe as well. So fairly remote.
Stavros Papadopoulos [00:26:07]:
But we've been very good at making this work very well, operationally well.
Steve Swan [00:26:11]:
And I'm sure that that helps you too, right, with your recruitment and with getting the best talent. It doesn't matter where they're located. They can get it done for you. Right.
Stavros Papadopoulos [00:26:17]:
Indeed. That played very well with certain talent that we were very fortunate to hire. Whereas if we followed an office only policy in Cambridge, it would have been quite difficult to get these individuals.
Steve Swan [00:26:28]:
Right. Right. Well, another question for you, really about your technology. What makes TileDB's technology different? If I'm a CIO, sometimes I might want to think about those things. And that's my audience. Right? So take me through that real quick.
Stavros Papadopoulos [00:26:43]:
Yeah, I would say, well, at its core, it's the multidimensional array as the universal data structure, to put structure into what other people see as unstructured data and to make it more efficient, especially with complex data where it is not immediately obvious how to model it as tables, which are a little bit more common, or as blobs, as just flat files. Right? So the multidimensional array definitely plays a very key role in TileDB and how we structure the data, how we obtain performance for complex and diverse data. But I would say the second thing would be the holistic approach we take. And we took this holistic approach since day one. We welcome all the data.
Stavros Papadopoulos [00:27:29]:
And now we have a very important upcoming release, early 2025, where this is going to be even more pronounced. We take a holistic approach to cataloging, to authenticating users, to controlling the access, to logging the data for auditability, for all the data in the organization. And this is quite important. If you overfit your technology to capture one data type or a limited number of data types, you will find yourself in a situation where the organization will seek multiple different solutions to satisfy their data needs. With TileDB and this holistic approach, and the fact that it is future proof, we're always there to welcome any new data type and make sure we optimize for it as well with the current technology, without having to build a ton of technology on top of it.
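Editor's note: to make the idea of a multidimensional array as a data model concrete, here is a minimal sketch in plain Python. It is not TileDB's API. The class name `SparseArray2D`, its methods, and the sample values are all hypothetical, chosen only to illustrate storing non-empty cells by coordinate and reading back a rectangular slice, which is awkward to express with flat files.

```python
from typing import Dict, Tuple


class SparseArray2D:
    """Toy 2-D sparse array (hypothetical, for illustration only).

    Only non-empty cells are stored, keyed by their (row, col) coordinates.
    """

    def __init__(self) -> None:
        self._cells: Dict[Tuple[int, int], float] = {}

    def write(self, row: int, col: int, value: float) -> None:
        # Store a single cell value at the given coordinates.
        self._cells[(row, col)] = value

    def read_slice(self, rows: range, cols: range) -> Dict[Tuple[int, int], float]:
        # Return the non-empty cells that fall inside the rectangle rows x cols.
        return {
            (r, c): v
            for (r, c), v in self._cells.items()
            if r in rows and c in cols
        }


# Example: rows could be samples and columns could be genes (made-up values).
expr = SparseArray2D()
expr.write(0, 2, 1.5)
expr.write(1, 0, 0.7)
expr.write(3, 2, 4.0)

# Read only samples 0-1 and genes 0-2; the cell at (3, 2) is outside the slice.
window = expr.read_slice(range(0, 2), range(0, 3))
print(window)
```

A real array engine would add tiling, compression, and indexing on top of this; the sketch only shows the coordinate-based access pattern.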
Steve Swan [00:28:19]:
I like that. Future proof. I like that. That's nice. So everybody's thinking about and everybody's talking about, the whole world, not just you and me and not just some of my CIOs, everybody's talking about AI, right? So, you know, maybe you can give me a quick take. It would appear to me, I would think in my head, that TileDB would play great with AI, right? I mean, how about that? I know you said, you know, when you started this whole thing, right.
Steve Swan [00:28:44]:
That wasn't even really a thought, but now it is. It's got to be front and center, right?
Stavros Papadopoulos [00:28:48]:
Yeah. When I first started TileDB and architected it around arrays, in my mind I had machine learning, traditional machine learning, not what the market today perceives as AI, which is mostly large language models. AI is much more than large language models like ChatGPT. However, in my mind I started with the more unsexy things, so to speak, like the data plumbing. I was convinced that I needed to start from that, from the foundation. And here's the reason. When you're creating machine learning models or AI models, you need to train those models on data. So you need to have infrastructure capable of this training.
Stavros Papadopoulos [00:29:32]:
But this infrastructure is not just a file manager with a bunch of files and some kind of a distributed computing engine to train the models. You need authentication, you need governance, you need access control, you need to be able to locate the data, you need to be able to version the data, you need to be able to version the models. But you see, once you start adding functionality and features, you're effectively building a database from scratch, right? And that is because the traditional databases could not satisfy these needs. So you effectively need a new database for AI, right? For training the models, for accessing the data for these models, for keeping track of which data you accessed to train these models, and for knowing how to update these models. There are more things that come into the scene, like these so-called RAG LLMs, right? You need to create vector embeddings from data so that you provide context to these LLMs and answer advanced queries on private data that was not used for training the LLMs. This is what RAG LLMs do, in a very simplistic description.
Stavros Papadopoulos [00:30:44]:
Well, but this is also data. So now you have the vectors, and you need to store the vectors, to access control the vectors, to version the vectors, and you need to be able to associate the vectors with the actual model. So now you have special metadata handling for this. So eventually the whole AI space is a data problem. It's not even a compute problem. Of course a lot of compute is involved, but at the end of the day it is a data problem.
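Editor's note: the point that vectors are themselves data, needing storage, access control, versioning, and a link back to the model, can be sketched in a few lines of plain Python. This is an illustration under assumptions, not TileDB's implementation or any vector database's API. The names `VectorRecord`, `search`, the model labels `embed-v1` and `embed-v2`, the users, and the two-dimensional vectors are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Set


@dataclass
class VectorRecord:
    doc_id: str               # which source document the vector came from
    vector: List[float]       # the embedding itself
    model: str                # embedding model that produced it (hypothetical label)
    version: int              # version of this vector entry
    allowed_users: Set[str]   # simple access-control list


def cosine(a: List[float], b: List[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(store: List[VectorRecord], query: List[float],
           model: str, user: str, k: int = 1) -> List[str]:
    # Only compare against vectors from the same embedding model
    # that the requesting user is allowed to read.
    candidates = [r for r in store if r.model == model and user in r.allowed_users]
    ranked = sorted(candidates, key=lambda r: cosine(query, r.vector), reverse=True)
    return [r.doc_id for r in ranked[:k]]


store = [
    VectorRecord("assay-notes", [1.0, 0.0], "embed-v1", 1, {"alice"}),
    VectorRecord("trial-summary", [0.0, 1.0], "embed-v1", 1, {"alice", "bob"}),
    VectorRecord("assay-notes", [0.9, 0.1], "embed-v2", 2, {"alice"}),
]

result_alice = search(store, [1.0, 0.1], "embed-v1", "alice")
result_bob = search(store, [1.0, 0.1], "embed-v1", "bob")
print(result_alice, result_bob)
```

The same query returns different documents for the two users because the access list filters the candidates before ranking, which is the kind of metadata handling described above.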
Steve Swan [00:31:12]:
Well, 75% of those projects fail, and it's because of the data. They can't, you know, it's not working. You can build the sexy model, you can make it look nice, but getting your data, having your data right, like you just said, they're like, whoa, we forgot about this.
Stavros Papadopoulos [00:31:27]:
Yeah, you need the data infrastructure. Let me put it this way. You need to start with the data infrastructure. And then I believe AI can be a very, very important piece built on top of the data infrastructure, which can provide a lot of value to an organization. But if you think about AI in a vacuum, without talking about the data that facilitates all these models being built and trained and versioned and evolved, I believe you will eventually fail.
Steve Swan [00:31:58]:
Oh yeah, there is, there is.
Stavros Papadopoulos [00:32:01]:
You will hit a data problem. It's not going to be the AI that is going to create issues for you. It's going to be something that broke on the data aspect.
Steve Swan [00:32:12]:
That's what I'm hearing over and over and over again. That's why I'm getting so fired up talking to you about this stuff. This is great stuff, you know, because I'm hearing about these problems all the time. So everybody I'm talking to could use what you guys do anyway. Well, that was great. I appreciate it. If you have anything else to add, want to throw it out? Because I have one final question I ask of folks, and I never tell anybody what it is, and it's one question that I ask everybody.
Steve Swan [00:32:37]:
It's the same question. But if you got anything more you want to add that we haven't hit on from either technology or whatever perspective, go ahead and then I'll ask my final question. But if you don't, no big deal.
Stavros Papadopoulos [00:32:47]:
I think we covered good aspects of what we're doing here at TileDB. We're always open to hearing back from thought leaders in the space. We're having amazing conversations with them all the time, and we're prompting most of these conversations, but we're very happy to hear from them. Anybody listening in can find information about TileDB on tiledb.com.
Steve Swan [00:33:09]:
And they can reach out to you as well. Right? With any questions.
Stavros Papadopoulos [00:33:10]:
Yeah, directly, of course. Stavros.com I would love to exchange thoughts on this topic because this is serious. Like, it eventually affects lives. The way this technology is used or could be used, it can affect people's lives.
Steve Swan [00:33:24]:
And whenever I'm talking to people, this is just what I do for a living, right? If someone tells me, I'm going to be looking for, I don't know, I'm just going to make this up, a CTO that does X, Y and Z, then as I'm out talking to you, I'm like, hey, somebody told me that they're looking for that. Same way with what you guys do. If someone brings it up, I'll say, hey, you guys need to talk to TileDB. You know, that's just what I do. That's who I am.
Stavros Papadopoulos [00:33:44]:
So, anyway, and we appreciate that, Steve. Thank you.
Steve Swan [00:33:47]:
Yeah, no, no problem. No problem. But final question for you. I'm a music guy. I love live music. I love seeing bands. I love seeing concerts and things like that. So just to loosen it up at the end, I always ask folks if they have a favorite live band or a favorite live act, one that they've seen at any point in their life that they could say, you know what? That was good.
Steve Swan [00:34:09]:
Or if you haven't, that's fine. If you were too busy building your database, I get it, you know. But if there was ever anything that you can say was the best live show you've ever seen, or, again, if you don't see many live shows, you're too busy doing your thing, then it is what it is.
Stavros Papadopoulos [00:34:25]:
It's definitely not about being busy. And any of my friends listening would laugh at this point, because they will know that I'm not this kind of a person. I mean, it's not one of the things I do. Let me put it this way, okay. However, I have attended live music festivals and bands. But I will trace it back to my teenage years, when I was listening to Greek...
Stavros Papadopoulos [00:34:49]:
Greek rock, like Trypes, for example. And again, if any Greek is listening in, they would smile.
Steve Swan [00:34:55]:
They'll get it.
Stavros Papadopoulos [00:34:56]:
Yeah, they'll get it.
Steve Swan [00:34:58]:
I've had a bunch of folks on my podcast, much older than you and I, mention stuff from when they were 13, 14, 15 years old. You know, those were their favorite live acts.
Stavros Papadopoulos [00:35:09]:
And don't get me wrong, I enjoy it a lot. It's just that I don't do it. I can't explain it.
Steve Swan [00:35:16]:
And that's fine.
Stavros Papadopoulos [00:35:17]:
Some people don't like it, but I.
Steve Swan [00:35:19]:
You spend your time and money elsewhere. Right. You know?
Stavros Papadopoulos [00:35:25]:
That's a great question, though, Steve.
Steve Swan [00:35:27]:
What is yours?
Stavros Papadopoulos [00:35:28]:
What is yours? I don't know if you shared it on another podcast, but maybe I'm the first one asking, what is yours?
Steve Swan [00:35:33]:
You're absolutely the first one that's asked me. 100%, man.
Stavros Papadopoulos [00:35:37]:
Everybody's neglecting the host.
Steve Swan [00:35:39]:
If I had to say my favorite live act ever that I saw, I really liked watching the Allman Brothers Band. Those guys could play guitar. They were just great, great musicians. And it's amazing to me how six or seven, or in their case, eight or 10 people on stage can play so tight in one song and all be playing different instruments at the same time. Are you kidding me? You know, it's like they know where each other's gonna go. Probably like a bunch of great computer programmers, right? They know. And I'll be honest with you, too. Show me a good musician, I'll show you a good computer programmer.
Steve Swan [00:36:11]:
Right? I mean, that's kind of the same. But, yeah, I'd say the Allman Brothers Band, that would be mine. But it's funny, again, nobody's asked. Nobody's ever asked. So thanks.
Stavros Papadopoulos [00:36:22]:
It's great. It's very interesting.
Steve Swan [00:36:24]:
Well, listen, I appreciate your time. That was great. Thank you very much.
Stavros Papadopoulos [00:36:28]:
Thank you, Steve. Thank you. I appreciate the opportunity. It's great to talk to you.