Selling Signals - the Data Monetisation Podcast

Selling Signals is the podcast for anyone building, selling, or buying data, with a focus on commercialising data in the investor ecosystem.

Each episode brings together industry insiders to share real, first-hand experience from the front lines of data sales. We unpack what actually works when turning raw data into revenue, whilst exploring other data buying silos to break down the walls between them.

Selling Signals delivers practical lessons to help data teams sell better and build stronger, more commercial data businesses.

Jonathan Chin: Building a Tier 1 Dataset

May 28, 2026 • James Worthington and Eric Evans • Season 1 • Episode 11

0:00 | 42:14

In this episode of Selling Signals, we’re joined by Jonathan Chin, co-founder of Facteus and author of Data-Minded. Jonathan brings a founder’s view on what it takes to build a transaction data provider trusted by institutional investors.

We talk about why data businesses do not behave like SaaS companies. Data often sells optionality rather than a fixed outcome, which changes the way buyers evaluate products. Jonathan explains where SaaS playbooks break down, what still carries across and why data quality issues are different from software bugs.

We also discuss aggregation, alpha decay and the role of AI in the future of alternative data. A big part of the conversation focuses on whether alternative datasets could become part of future model training rather than simply being queried through tools or MCP servers.

SPEAKER_02 0:01

Welcome to Selling Signals, the podcast focused on how businesses actually monetize and sell data. Each episode, we interview an industry insider to hear their experiences and lessons learned.

SPEAKER_03 0:11

The series is powered by Valtis, the company that transforms your data into investment-ready intelligence products.

SPEAKER_02 0:17

If you enjoy the episode, please subscribe from wherever you get your podcasts.

SPEAKER_03 0:23

Today we're joined by Jonathan Chin, co-founder of Facteus and author of Data-Minded, which offers a unique founder's lens on the realities of scaling a data business. Jonathan has spent the last decade building one of the leading transaction data sets used by institutional investors. In this episode, we'll unpack why SaaS thinking fails in data, how data products behave differently, how AI is reshaping the market, and what it takes to build something investors trust. Jonathan, welcome to the podcast.

SPEAKER_00 0:51

Hey, thanks for having me. Really excited to jump in and talk to you guys about data.

SPEAKER_03 0:56

Yeah, welcome. Awesome to finally have you on, man.

SPEAKER_00 0:59

Yeah, sorry, it's been hard to schedule. It's been a busy year.

SPEAKER_03 1:03

It has been. So why don't we start by defining your career through the book and Facteus? Do you want to give a little intro on what triggered writing the book, and then the quick background on Facteus?

SPEAKER_00 1:17

Yeah, I'll probably go out of order and start with Facteus. I co-founded Facteus and was there for 15 years, from day one. We didn't start the company as a "let's go find data and sell it to hedge funds or financial institutions" business. The genesis of the company had always been that spending behavior was a valuable asset and a very valuable link in understanding consumer behavior. Honestly, when we started the company, we thought a lot more about corporate and marketing use cases. It was really one of those coincidences where funds were just starting to get wind of the data and came to us. Once we discovered that market, we dove straight into it and said, this is a great market, it's a great monetization. And realistically, I think as industry leaders we still say corporates are behind funds in how they use the data. If you imagine 10 or 15 years ago, they were even further behind. So when we met funds, the way they wanted to use the data made a lot of sense, and it was just a lot faster to market, a lot faster to get the company into that monetization space. So that was kind of the genesis of the company.

SPEAKER_03 2:39

And did they find you, or did you find them, in terms of the investor angle?

SPEAKER_00 2:43

You know, I'll probably say they found us. I think it's a mix of network connections, people understanding what we do, and people connecting. But the genesis of "hey, I have a use case, I want to buy your data" started from their side, coming to us. That was not something we had imagined or envisioned.

So, the second question, as far as the book. Through the years and the grind of building a business, it was one of those things where, as an entrepreneur and a professional, you start to think through how other people do it and what resources are available. I'm running into some problems, or how do I make something more efficient? And there's a lot of literature on SaaS and SaaS tech companies of all different types: B2B, B2C, B2B2C, large and small scale, self-serve, sales-team driven. But as I did my own research and reading, I found things were just a little different in a data company. The structure of the companies looks similar. We have product people, technology people, lots of developers, and how we organize them is relatively similar. We've all got Jira boards, we all have Confluence, we all use Slack, we use sprints. It looks similar, but that was all surface-level stuff. When you dove into how you build a pipeline, or how you QA something, it's totally different from a SaaS company. And I think that was the genesis of the book. Honestly, it started with just a journal of learnings, like, hey, this doesn't work the way it does in SaaS, and this is how we would solve it at Facteus. Lauren, my co-author, and I kept a very long log of what we called lessons learned, stuff we thought was good to archive somewhere. And it accumulated to be so large that we were like, I think this could be a pretty good resource, in a book, for someone else in a data company.

SPEAKER_02 4:55

And are most of those lessons learned described in contrast to their SaaS equivalent? Is that how you structured it?

SPEAKER_00 5:04

Yeah, exactly. In the book I try to compare and contrast them, mainly because it anchors the reader onto something that's more known, how something would be done in SaaS, and where DaaS is just different. Similar, but different.

SPEAKER_02 5:20

Did you find you were having to react in real time via trial and error? Like you try some conventional SaaS tactic or strategy, and then you go, okay, it's not quite working because it's a data business, how do we tweak it to be more appropriate? Or was it more that you would, say, read a book about SaaS strategy and go, that makes some sense, but given the industry we're in, we might tweak it? Were you doing it reactively or proactively, I guess?

SPEAKER_00 5:48

It was definitely more reactive, I'd say. As a startup, and you're an entrepreneur yourself, there's not a lot of time to research things. So a lot of it was reactionary. And most of us at the company came from software or some sort of SaaS background. So everyone's bringing their prior knowledge and learnings from years and years of experience, and it's like, that doesn't work, or it doesn't apply.

SPEAKER_03 6:14

And if we get explicit for a little bit, what would you say were the top three themes that came from the book, or from your experience, that you feel all data founders or people going on this data monetization journey from a SaaS background should be vigilant of?

SPEAKER_00 6:31

I think the number one thing is that what we consider QA in SaaS is almost like a perfect loop, in a way. You build something, this is what it's supposed to do, and then you test what it's supposed to do. I think that fundamentally is how you test code, and it's very important at a DaaS business. But at the end of the day, the failure isn't the code or the pipelines per se. A lot of times it's the data coming in. You have to know that going into being a data entrepreneur and essentially be comfortable with it. Things will break, and it's not because you did a poor QA process. The data coming in is something you do not control, and since you do not control it, you cannot account for every eventuality. I think being comfortable in your skin that way helps a lot. Prior to really having that mindset, it's like, what are we doing wrong? Why do things keep failing? Why does the pipeline keep breaking? It's frustrating, and you're annoyed at engineers: why does our production pipeline keep breaking? Obviously there are obvious reasons sometimes. You can't have bad code, and I'm not trying to blanket cover this conceptually. But it does help to understand that things can happen, and that root cause analysis can come back as, oh, this was a data issue that we just didn't know about, and the vendor never told us about it. Maybe the vendor didn't know. Actually, nine out of ten times the vendor doesn't even know their data as well as the data companies that take it in. So that was number one.

Number two, that's a good question, Eric. I think number two would be, again, kind of a mindset concept: knowing that every data set is different. What worked for transaction data will not work for geolocation data. In SaaS, things are very reusable. In DaaS, it's always a maybe. Maybe a part of it is reusable, or maybe none of it is. And that's okay. Like I said, you have to understand that if you're a company that's going to take in multiple data sets that are completely disparate, and even multiple transaction data sets are very different, you guys know this too, then fundamentally they all need their own expertise. And then, I don't know, Eric, maybe you name the third most important.

SPEAKER_03 9:21

I was going to say I agree with you, because I think there are a lot of businesses out there coming to market thinking that data just immediately equals revenue, and not all data assets are created equal. There may actually be data assets that are already solving that problem, and you're just competing in the same world, where your data is only additive beyond that point. So there's a lot of work that needs to be done on top of that. But no, I agree with those two points. Thinking more about the Facteus journey and the product development: you built Facteus into a tier one transaction vendor, as it's known in our world. It'd be great if you could define what people mean by tier one credit and debit card panels, but what did it take to actually build something like that?

SPEAKER_00 10:13

A lot of therapy. But I'd say, one, for anything to be considered tier one, it has to have size and breadth. It fundamentally has to be large enough, at least in the finance world, to have essentially national coverage. There's always going to be some bias in data sets, but for the most part, there's national coverage. And I think some sort of daily to less-than-one-week cadence of regularity would make it tier one. I know there are probably a handful of other data sets out there with monthly updates, but in the industry we don't consider them tier one. Real-timeness within a week, anywhere from a one-day lag to a four- or five-day lag, would be considered tier one. And I think today versus 10 years ago, a big difference in tier one data providers is the readiness of the data to consume. When we started, actually more than 15 years ago, it was definitely much easier to just provide raw data and almost be a middleman. I'd say that's definitely where we started. We were kind of a middleman, maybe anonymizing things in the middle, but then passing over to the funds essentially what was just passed to us. Now funds expect a lot more fidelity and QA of the data, meaning: are there dupes, and how do they resolve against each other? Tagging, no schema changes, consistency in schema, documentation. I think all of that used to be optional. Now, if you're to be considered a tier one provider, all of that is mandatory, which is a lot of work to document and do.
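As an illustration of the checks named above (duplicates and schema consistency), here is a minimal Python sketch. It is not Facteus's process; the column names and the `check_delivery` helper are hypothetical, chosen only to make the idea concrete.

```python
from collections import Counter

# Hypothetical expected schema for one delivery; these column names are assumptions.
EXPECTED_COLUMNS = ["txn_id", "date", "merchant", "amount"]


def check_delivery(rows, expected_columns=EXPECTED_COLUMNS):
    """Return a list of human-readable issues found in one delivery of rows (dicts)."""
    issues = []

    # Schema consistency: every row must carry exactly the expected columns.
    for i, row in enumerate(rows):
        if set(row) != set(expected_columns):
            issues.append(f"row {i}: schema mismatch, got {sorted(row)}")

    # Duplicates: the same transaction id should not appear more than once.
    counts = Counter(row.get("txn_id") for row in rows)
    for txn_id, n in counts.items():
        if n > 1:
            issues.append(f"duplicate txn_id {txn_id!r} appears {n} times")

    return issues


if __name__ == "__main__":
    sample = [
        {"txn_id": 1, "date": "2026-01-05", "merchant": "Example Coffee", "amount": 4.5},
        {"txn_id": 1, "date": "2026-01-05", "merchant": "Example Coffee", "amount": 4.5},
        {"txn_id": 2, "date": "2026-01-06", "merchant": "Example Grocer"},  # missing amount
    ]
    for issue in check_delivery(sample):
        print(issue)
```

A real pipeline would likely add checks on tagging and delivery lag as well; the point here is only that these checks run against incoming data, not against the code.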

SPEAKER_02 12:08

And on coverage level, what would you consider? You mentioned national coverage. What sort of percentage of a nation, and which nations are most important? Obviously the US is critical, but what percentage coverage of US consumers would you consider tier one level?

SPEAKER_00 12:26

For the US, the standards are probably higher just because there's more competition. So I'd say between one and probably five percent of US spending is what I'd consider large enough for tier one. For other countries, it's probably more of a scarcity issue. As there are fewer data sets in the UK and fewer data sets on EU transactions, it's different. I'm less familiar with that. But I think one to five percent minimum is a good blanket tier one coverage size. With that much data, you probably have a few hundred tickers or names that have enough volume that funds can really dig into and trust the data for, or use the data for.

SPEAKER_03 13:20

And you mentioned that as Facteus matured in the market, the work that you had to do on the data became more and more. Why do you think that happened over time? Was it about being able to service a longer tail of the market? It seems like there's renewed interest in aggregation rather than raw data today. So why do you think that changed over time? What drove that?

SPEAKER_00 13:48

I think it's a combination. One, there are multiple transaction vendors, and we as competitors were trying to outcompete each other. It's like, okay, if I provide better tagging than my competitor, then hopefully that allows me to win more of the market. That was just us improving the product and trying to beat the competition rather than using price as the only lever: the readiness of the data, different cuts of the data, different lags that we offer, so that we can create maybe more perfect price economics that way. The other force, on the fund side, was that, I mean, there's a chart. I think YipitData has their alternativedata.org or something, and there's a chart on how many data sets are available every year. As more data became available, the funds got into it. They used to do a lot of the work that I was just describing on prepping data. If they're only consuming one or two data sets, their team can definitely do that. When they're starting to consume 10 to 20 data sets, spending so many man hours on just transaction data was not efficient. And so they tried to push that back onto the vendor side. I think the combination of the two, competition and the saturation of more data sets in the market that were consumable for the finance and funds industry, really led to everyone needing to be more prepared. Funds want to just consume the data with as little integration as possible.

SPEAKER_03 15:22

Yeah, I feel that too. As I sat in the seat commercializing to the buy side, it felt like there was this consensus that the bigger firms will take raw data and tickerize it themselves. But it does still slow down the process. And there is such an abundance of data out there now, and their pipelines are so full, that it's almost like they have to prioritize the data that's already been tickerized so that they can get through the pipeline efficiently. And as we talk about that, you weigh up the quality of a data set, finding that data set that only you know about, versus the ability to derive insight out of it. How do you feel the market has evolved from that perspective? It feels like sourcing is not a challenge for the buy side anymore. So the exclusivity, or the idea of finding that data set no one else knows about, is much less common now.

SPEAKER_00 16:20

I completely agree. I think, one, you personally and probably your former firm, Neudata, had a lot to do with some of that. You guys exist to help find data, and that didn't exist years ago. That was an internal function only at funds for a long time. So to your point, who had the better internal function used to be an edge, which now it's not. But I think that's just the maturity of any market. That's a classic growth curve, or industry maturation, that any market would go through, where the edge used to be the expert edge on where to find something. Now it's more about how you mine the alpha, or the gold, out of the data. And the combinations of data are also becoming more important: how do you combine the data sets instead of each one being treated as an independent source?

SPEAKER_03 17:24

And I don't know if you felt this, but as you entered that tier one bracket, you accumulated clients. There's a lot of talk about alpha decay. Did you feel like that generally existed? Did you feel that friction as you got bigger? Were some clients churning because they felt like the data was becoming less valuable as it got into more hands? It feels like the alpha decay theory would prevent businesses like Similarweb or S&P from being as big as they are.

SPEAKER_00 17:51

I'm not going to doubt that there's a concept of alpha decay that happens as data becomes more saturated. But I think that had a lot more to do with pricing negotiation than sales negotiation. I would probably put it in that category. Because by definition, if there's alpha decay, then there's a bit of beta in there. Meaning, if you don't have the data, you're missing out on some market trend that everyone else has. I actually think I mentioned at one of the Neudata conferences that, as a vendor, funds are always looking for alpha, and nirvana for a vendor is becoming beta. You basically just have to have us, or else you're not at level. So that friction is always going to be there. But when those conversations start coming up in pricing, to me that actually means, okay, then we're saturated enough that you don't want to pay because you're saying there's less alpha, but at the same time you're telling me all of your peers have it. So if you don't have this, you're missing something, versus you need something or you're looking for something new.

SPEAKER_02 19:13

And I guess also, once you become a kind of beta provider in that respect, it ends up being the case that the funds need to assume everybody has this data and change their own strategies on the basis that everybody has transaction data. At that point you go, okay, well, what does the transaction data tell us? And, coming back to your point about how you combine these things, what other data sets can we get that tell us something deeper? That means we can not only guess where the market's going to move purely on a transaction basis, but maybe, rather than just calling the quarter, we can call this quarter's earnings, but also guidance, and then guidance for the quarter after it, and things like that.

SPEAKER_00 20:02

Exactly. It just levels up everything, or everyone got leveled up. And that's exactly what you're saying about Similarweb or, honestly, S&P and what we'll call the beta data sets. When you start a fund, there's some baseline data that you need. There are multiple vendors, but you have to have them. I'm not saying transaction data is there, and we're still in the alternative data industry. But I'd say amongst the buyers of alternative data, we got lucky: transaction data is one of those things that is on everybody's list and on everybody's agenda. They have at least one vendor, maybe more than one. And whenever renewal cycles come up, that's where some friction comes, and where people talk about pricing and alpha decay. But like I said, to me the other side of the alpha decay coin is beta.

SPEAKER_03 20:53

I've never thought of it like that. So that's really interesting. Then there's the AI conversation: does it lower the sophistication barrier? Essentially, will AI bring more buyers into the market? How are you viewing that from your position of time in market?

SPEAKER_00 21:10

I mean, especially when AI launched, I definitely was thinking AI is going to lower the barrier and create more data buyers. I'm not sure, to be honest, if I've seen that trend happen yet. But I do feel like, as a user, it does lower the barrier to understanding, digging into, and analyzing something. Whether that's being pointed at alternative data yet, maybe it has, maybe it has not. But I do think from a technology perspective that it can help. And when I say lower the barrier, on the fund side, say it's cost. It lowers the cost for them to consume something, because every data set costs money and every data set costs man hours. So I think by lowering that entry cost, it can grow the market. I am a believer that it's lower code and lower domain knowledge. You can bake a lot of it into the AI platforms, so the learning curve is going to be a lot faster. Granted, funds need to trust the AI. That part of the technology trust has to be there first. But once it is there, and I think we're all getting there and seeing it, I think it will lower the barrier and grow adoption.

SPEAKER_02 22:38

So, discussing AI, you've obviously been in the industry for a long time. As you said, you were at Facteus for 15 years, and you've seen the evolution of transaction data from very nascent to essentially being something everybody's got to have. AI is obviously changing every industry very rapidly. In at least every other conversation we have on the podcast, the term MCP comes up. We also regularly talk about the consumption pricing model as opposed to the more traditional enterprise license pricing model. On the research call, we had a discussion not only about those matters, which are happening to some degree in real time, but also about ways in which AI, maybe more fundamentally, is going to change the alternative data industry. So I'd love to hear your thoughts. You can answer however you like, but where we might be in a year and where we might be in five could be an interesting angle. I'd just love to know how you think the industry is going to change.

SPEAKER_00 23:56

Yeah, and I totally agree. Most of the time when we talk about AI, everyone talks about how it's going to internally change what they're doing or how it's going to affect their industry. Like, how does AI affect my industry? I think that's a normal reaction, and it's totally affecting our industry. In technology, in data, and in DaaS, there are tons of efficiencies that it creates. I use AI every day for my efficiency as a product guy. What used to take me plus engineers is now down to me, which is pretty amazing, and kind of fun to use, actually.

But where I think alternative data has a really interesting and unique role to play in this AI story that's evolving and coming of age is actually the data itself. Right now, all of us, from a career perspective, have sold data or monetized data or helped the finance industry use the data. In my opinion, the AI industry can be a consumer of alternative data. Who knows, that might even be a larger frontier for us as alternative data professionals than the funds themselves, because you see how much money is going to that industry. It's crazy.

What I mean by that is, take the dead internet theory, or how LLMs have essentially scraped and learned from most of what we'll call public human interactions that are available on the internet. They're striving for more things to consume that are human-centric, human interactions with the world. I think alternative data is actually a huge source of that. Knowing transaction data, I was like, transactions are human interactions with money and a merchant or vendor, or even a service. And that whole system is not tapped by LLMs at all. I won't try to be an expert or claim to understand how LLMs are trained or how those networks work. But what I do know is that they want more genuine human interaction, because you don't want them to learn on other AI outputs. That's cyclical, and the intelligence and the learnings just become worse and worse if you train it on itself. You want it to learn more about the world. And we have that: Similarweb is click interactions, web interactions of humans and websites, especially if you can filter out bots. Human foot traffic data is humans moving around at a very granular level. We have a wealth of human interaction data in the alternative data industry, which is exactly why the funds liked it. The funds wanted it because they're trying to understand how companies are doing, and for companies that sell to humans, we have data on humans buying things from companies or walking into stores. So to me that's a very nice parallel.

Now, right now LLMs are very good at understanding things in words and context. Transaction data is not like that, and foot traffic data is not like that. Like I said, I'm not an expert on that technology, but to me that might be an opportunity. Maybe there's a bridge layer, some technology in the middle that allows LLMs to consume transaction data in a friendly way, like a Reddit post, versus how it is structured now. Same with foot traffic data. Maybe that's an opportunity in the industry. But at the end of the day, if we look at AI as an industry for alternative data to be consumed by, I'm excited about that, and I think that's a potential opportunity for all of us.

SPEAKER_02 28:00

Yeah, I guess it's super interesting because, as you say, large language models are trained on text, and therefore when they see a number, they treat that number as potentially part of a chunk of text, right? They're able to interpret that in certain ways. But because I did maths at uni, and I'm a bit of a loser on LinkedIn, I have a lot of maths influencers that I follow. At least once a week somebody's posting a maths problem that ChatGPT's most advanced model isn't able to solve, and that's because fundamentally it doesn't understand the mathematics underneath. That being said, you're also, again almost once a week, reading about some physics professor who used ChatGPT or deep research, something like that, and it was able to massively aid them in writing a paper, and it got all this really high-level maths absolutely bang on.

So I completely agree with you. What does that mean? What would it mean to train a large language model on alternative data, particularly the numeric aspect of alternative data? I'm also pushing myself well beyond my understanding of this technology, but is it the case that you need a foundational model that is specifically trained to understand numerical data? Someone must be working on this. In fact, I think I've heard of a few things. There's something called Chronos, I think, and then there's another one called Moirai, which I think might be Salesforce's. I think they're all going, look, if we've cracked, or we're cracking, the text side of this issue, can we also crack the numerical side and then have an interplay between the two? That would be, I guess, where the alternative data might go in. But I think you're right. Is there a way of taking an alternative data set and distilling the key insights that are inherent in it into a text-based form? From a transaction perspective, you'd probably want to analyze that and distill it into the themes that you're seeing within it, and then allow the LLM to train on that or something. I don't know, I'm thinking out loud, but it's a fascinating idea.

SPEAKER_00 30:32

I mean, I think that might be the intermediate layer, like you're just describing. Maybe today these AIs can't consume the raw data itself. We know they can't. We've all tried, we threw it against the wall, and it doesn't work. But to your point, maybe that is the middleman layer: how do you textualize transaction data so that an LLM can, quote unquote, read it? And when I say read it, I mean literally read it like text. The analogy I sometimes use is, imagine if there was a way to turn transaction data, rows and columns of transactions, into something like a Reddit feed and comments. Then an LLM would understand it perfectly.

I'm going to ask a really dumb question.
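One very literal reading of "textualizing" transaction rows can be sketched in Python as below. This is only an illustration of the idea discussed here, not an existing product or a known training recipe; the field names and the `transaction_to_text` helper are hypothetical.

```python
from datetime import date


def transaction_to_text(txn):
    """Render one transaction row (a dict) as a plain-English sentence.

    The keys used here (date, region, amount, merchant, category) are
    assumptions for illustration, not a real vendor schema.
    """
    return (
        f"On {txn['date'].isoformat()}, a cardholder in {txn['region']} "
        f"spent ${txn['amount']:.2f} at {txn['merchant']} ({txn['category']})."
    )


if __name__ == "__main__":
    rows = [
        {"date": date(2026, 1, 5), "region": "Texas", "amount": 42.5,
         "merchant": "Example Coffee", "category": "restaurants"},
        {"date": date(2026, 1, 6), "region": "Ohio", "amount": 118.0,
         "merchant": "Example Grocer", "category": "grocery"},
    ]
    for row in rows:
        print(transaction_to_text(row))
```

Whether sentences like these are actually useful as model training input is an open question in the conversation; the sketch only shows what a row-to-text bridge layer could look like at its simplest.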

SPEAKER_02 31:18

If LLMs are currently really bad at math why are there so many data sets listed as MCPs are to be able to pull the data into a chatbot is that a dumb question no I think my understanding would be that there are like there's lots of tooling um so so what what the LLM is good at is like reasoning in certain ways. What I think we're talking about is like training an LLM uh like from scratch and allowing it to um be trained on that numeric data. But you're right I was gonna I was gonna say I guess what we have at the moment with the MCP is that kind of like uh first step in the direction of having LLMs interact with this stuff. Because they to be clear they can they can do maths they can do um they can do all of this stuff. It's just that they have these weird blind spots because they don't fundamentally understand it's almost like the way I like to think about it is and I'm sure people who work in AI would tear me apart for this but it's like it's computation as a a facsimile an incredible facsimile for reasoning and it turns out that computation and massive amounts of compute can become an incredible facsimile for what we would consider to be reasoning. But does the and this is the um onto Jonathan's point about training an LLM on alternative data that your question is does it is it training on alternative data in the and therefore storing that information in the way we would expect it to for it to be able to deliver insights rather than being able to like pull that data kind of understand it does that make sense and probably done a bad we probably that does that makes sense.

SPEAKER_03 33:00

Could have asked Stuart that. Yeah, I feel like I'm a bit lost in this conversation. Conscious of time, I think, although I love conversations about AI, and maybe this sort of feeds into that, but if we're looking ahead in the industry, where do you see the data industry heading over the next five years?

SPEAKER_00 33:26

I think, I mean, I do think the element of AI, to be honest, in five years will probably come more from the data side than from the AI side. I think you're going to see a lot more data companies producing an AI that can somewhat reason with data, or their data, better. It's probably going to come from that side more. I think that's a first step. I think a similar trend to what we're seeing on the funds will be what we talked about: more funds will consume data, but we're going to get further and further from the raw data. I think as more data becomes available, AIs are going to be used as intermediaries on both sides of the equation. So I do think we're going to swing all the way back to less raw data again, and the consumption of data is going to be more curated and aggregated, and whether that aggregation comes from the funds using AI or the vendors also using AI, I think that is a huge part of it. I think the corporates are going to... like we talked about lowering the barrier to entry on the fund side, I think more importantly that hopefully can help on the corporate side, cracking that open. I think part of the issue is economies in the world are pretty, you know, suffering. So I think brands and companies are going to look for intelligence and data more to understand things, or to try to save their jobs as executives there. So I think from an alternative data perspective that will open up more, as well as lowering the barrier. Like I said, the AI component does help with that. And I think those are some of the themes I see coming. I don't know if it's five years, but it's already starting to happen. And I think there's going to be new types of data. I think that's the other thing AI is doing.
If the way we consume the internet and information is different and changed, there's a lot of things changing. So if the way we use the internet's different, then Similarweb's data, it's not irrelevant or bad, it's just different now. And are there other sources now that have been created from that, because of the AI kind of interruption in all of our lives? So I think that's going to be interesting: what kind of new data interactions are going to not just emerge, but what's important? What will a fund be looking for in the next five years? There's going to be something on the menu that is different, that we probably didn't expect, influenced by AI. And I'm not sure what that is, and I don't know what insight they're trying to find, but I just know all of us have changed behavior. And so that's going to change the data itself, and it's also probably creating new data sets.

SPEAKER_03 36:34

Yeah, and I think, going back to the whole AI conversation, if you think about SEO and then AEO: when you're searching in Google, you're returning 25 pages on one screen, but you're returning every page, you think about the blue links. But with AI, it's typically recommending, unless you ask it to recommend more, three to five options. So if you're asking a question about a brand or a product, it's not surfacing hundreds of links, it's surfacing one to three, and a lot of the conversation going on at the enterprise level is, how do we make sure we're one of those brands that's surfaced? And I mean, there's lots of conversations probably around fair competition and how to govern that fairly for organizations and new businesses coming in. But yeah, how do you get access to that data? How do you know which businesses are being surfaced? Does that vary by type of person, prompt, feelings? Where's that data coming from? Yeah, it opens up a whole tin of worms for sure.

SPEAKER_00 37:34

I'd also say I don't even know if nine out of 10 people would know that it comes from three sites. They just read the AI summary, they're satisfied with their answer, and move on. I mean, Google gets the credit most of the time for that. But yeah, SEO as an industry is totally turned on its head. Like, what does it even mean now? So that is interesting. It's very interesting.

SPEAKER_02 38:02

And I wonder at what point your OpenAIs, your Anthropics are going to start monetizing their own data, or if they'll just never do that and they'll keep it fully internal.

SPEAKER_03 38:13

Well, a lot of people now are just prompting, because you can prompt through the API, right? There's companies out there, God forbid the water and energy being used, but prompting it 10,000 times a day using similar prompts to understand what brands are being surfaced over time and how that changes. You know, that's a data set. And that's a data set, but oh my god, is that using serious energy, and I don't know how sustainable that is as AI costs either increase or decrease. I don't know where it's going to go, but that's a data set in itself that a fundamental firm may or may not care about, but a corporate definitely does today.
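
To make the "that's a data set" point concrete, here is a minimal sketch of how stored chatbot responses could be turned into a brand-surfacing series. It assumes the responses have already been collected and saved with a date; the brand names, the sample responses and the simple substring match are all hypothetical, and no model API is called.

```python
from collections import Counter

# Hypothetical brand list and stored responses; a real pipeline would collect
# the response text by prompting a model API, which is not shown here.
BRANDS = ["Acme", "Globex", "Initech"]
RESPONSES = [
    {"date": "2026-05-01", "text": "Top picks: Acme, Globex and Initech."},
    {"date": "2026-05-01", "text": "I would suggest Acme or Globex."},
    {"date": "2026-05-02", "text": "Consider Globex first, then Initech."},
]


def brand_mentions_by_date(responses, brands):
    """Count, per date, how many responses mention each brand at least once."""
    counts = {}
    for response in responses:
        daily = counts.setdefault(response["date"], Counter())
        text = response["text"].lower()
        for brand in brands:
            # Naive case-insensitive substring match; an assumption for brevity.
            if brand.lower() in text:
                daily[brand] += 1
    return counts


if __name__ == "__main__":
    for date, daily in sorted(brand_mentions_by_date(RESPONSES, BRANDS).items()):
        print(date, dict(daily))
```

Tracking these counts day by day gives the "how that changes over time" view described above; anything beyond that, such as varying the prompt or persona, would be an extension of this sketch.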

SPEAKER_00 38:48

I think I just read that Utah, the state of Utah here in the US, approved a data center the size of Manhattan Island. Wow. Utah. They haven't broke ground or anything, but I'm just like, I don't even know what that... like, that's crazy.

SPEAKER_03 39:06

Utah has too much space. Conscious of time, maybe the last question. So the book that you wrote, to go full circle, came out two years ago. If you were to rewrite it now, is there anything you would add or change? I don't know if there's anything I'd necessarily change, but my God, in two years, the shape of AI... I don't even know what I wrote about AI, to be honest. It probably sounds terrible looking back two years, whatever I spoke about and knew about. But yeah, I think how AI is shaping things is something I'd definitely add. Maybe there's another edition on how that works. I want to say I might have alluded to the idea of alternative data selling into AIs as a thing.

SPEAKER_00 39:57

A visionary. Yeah, I think so, right? I don't think I'd change anything specifically, mainly just because most of what was written was learned the hard way, we'll say, so I know those worked. But I think there's probably a lot more shortcuts available today. I mean, even if you think about during our time, or my time, going from FTP to S3 to Snowflake share was honestly life-changing in some ways when you're sharing data with a fund. It used to be so much more complicated, and now it's not, and just those types of things have evolved a lot. But yeah, I don't know if I'd change too much, Eric. I'd probably just add. That's good. Well, do you think there's anything I should change?

SPEAKER_03 40:50

No, I enjoyed the read. I read it about two years ago, in fairness, and I enjoyed the read. But I wouldn't say just because technology has advanced it makes your writing terrible. It just means that technology has advanced, and that was written at a point in time.

SPEAKER_02 41:05

It was a point in time, that's fair. And I think also, what was it, 2023 or 24? That was 23, wasn't it, the book? Yeah, I think so. That's a pretty tough time to have been able to predict what was going to happen over the next year. I think with ChatGPT, I mean, it was out at that point, right? But it was GPT-3. Yeah, exactly. It was amazing, but just less amazing. And as you say, you talked about FTP to S3 to Snowflake. I guess now would you say step four is MCP?

SPEAKER_03 41:39

Maybe, yeah. I mean, that might be it. Yeah, I could see... I think that's evolving into that. Awesome. I love the conversation, Jonathan. Always love catching up, so I appreciate you coming on, and yeah, let's catch up soon in person.

SPEAKER_02 41:59

Yeah, hey, thanks for having me, guys. Appreciate it. Thanks so much, Jonathan. It's been great.