Selling Signals - the Data Monetisation Podcast

Selling Signals is the podcast for anyone building, selling, or buying data, with a focus on commercialising data in the investor ecosystem.

Each episode brings together industry insiders to share real, first-hand experience from the front lines of data sales. We unpack what actually works when turning raw data into revenue, whilst exploring other data buying silos to break down the walls between them.

Selling Signals delivers practical lessons to help data teams sell better and build stronger, more commercial data businesses.

Jordan Hauer: Taking Stock of the Alternative Data Industry

June 12, 2026 • James Worthington and Eric Evans • Season 1 • Episode 12

In this episode of Selling Signals, we’re joined by Jordan Hauer, Founder and CEO of Amass Insights. Through Amass, Jordan helps investment firms discover, evaluate, and source datasets while also helping data providers better understand and monetise their assets.

Having spent more than a decade at the centre of the market, Jordan has a unique perspective on how alternative data has evolved, from the industry’s early days of scarce and highly differentiated datasets to today’s world of abundant data and rapidly advancing AI tools.

We discuss which datasets are seeing the strongest demand in 2026, how funds evaluate new data sources, and why sales cycles remain stubbornly long despite significant improvements in the market. Jordan also shares his views on trials, pricing, backtesting, and the common mistakes data providers make when trying to sell to investment firms.

Whether you’re a data provider, investor, or anyone building in the data economy, this episode offers a valuable look at where the industry is heading next.

SPEAKER_01 0:02

Welcome to Selling Signals, the podcast focused on how businesses actually monetize and sell data. Each episode, we interview an industry insider to get their experiences and lessons learned.

SPEAKER_02 0:12

The series is powered by VATIS, the company that transforms your data into investment-ready intelligent products.

SPEAKER_01 0:17

If you enjoy the episode, please subscribe wherever you get your podcasts.

SPEAKER_02 0:23

Today's guest is Jordan Hauer, founder and CEO of Amass Insights. He's one of the best-known faces in the alternative data ecosystem. Jordan is focused on helping investment firms discover and evaluate data sets, whilst also helping businesses better monetize their data. Jordan's been in this space for well over a decade and has had a front-row seat to how the market has evolved, from the early days of web scraping and niche data sets through to today's abundant data world. In this episode, we get into how the alternative data market is changing in 2026, what funds actually want from data providers today, why sales cycles are still painfully long, and whether AI agents are about to completely reshape how funds source and evaluate data. Welcome to the podcast, Jordan.

SPEAKER_00 1:02

Thank you for having me.

SPEAKER_02 1:05

I've done well with names so far, but apologies. Apologies. Maybe we'll just go JH going on.

SPEAKER_00 1:15

Welcome to the podcast, Jordan Hauer.

SPEAKER_01 1:17

There we go.

SPEAKER_02 1:17

Thanks, James. Maybe to jump in, give us a quick background on yourself and Amass Insights. I mean, you've been the founder and CEO there for about a decade. So it'd be awesome to hear what you've been up to.

SPEAKER_00 1:28

Sure. Yeah. Let me uh rewind a little bit because I think it's relevant. Um I started off uh as a professional as more of a data nerd, I'd call it, you know, information scientist uh by trade. Um so a little bit of an abnormal background for somebody in the hedge fund space. Um and I got uh headhunted by a uh firm called Hunter Global, which is a billion-dollar long/short equity fund, fundamental research shop uh back in 2012 or so. Um I was the first technologist they ever hired, which was pretty amazing to me, uh, understanding how those types of investors were operating uh without a technical background uh back then. And so I got a real good sense for uh the types of data that fundamental research uh investors were using. And I knew that uh coming from the data world, there were a lot of other data sets out there, especially in the ad tech uh space, and saw the need for and the opportunity to use some of those data sets to make better investment decisions. Um and at the time, this was you know 2013 or so, there was really only one quote-unquote alternative data set, which was really credit card transactions. And so the next natural uh evolution of that was how else can we get transactional data? Um and I did you know some soul searching and realized there were some big email platforms out there that were helping consumers get rid of spam and organize their emails better. Um, they had a couple million users, but uh you know as a free tool, so not a lot of uh business there. And so I said, what if we aggregate and anonymize the receipts that are in these emails? Um so we were the first to ever start monetizing email transactional data uh back then. Um uh so fast forward to today, that uh company was acquired a couple times. It's now part of NielsenIQ. Um and uh after that initial acquisition, I went back to the drawing board and realized in 2015-16 that there were a lot of new data sources coming online. 
The term, the industry, alternative data was kind of just coming to the fore. Um, and it was hard for hedge funds and even data providers to keep track of what was out there and what data was useful and uh and who was interested in buying those that data um from both both ends of the spectrum. So um that's what we've been doing for the past 10 or so years is sitting in between uh a whole hell of a lot of you know 24,000 data providers that we've profiled, and then a a lot of uh hedge funds, about three or four hundred that you know regularly buy alternative data. Um and our main goal is to make the way that they interact and transact more efficient and effective, so get the data to be flowing a little more freely in the industry. Um, in in practice, we partner with a small subset of those 24,000 providers that we've profiled, and also with a small subset of the hedge funds that we we work directly to help them source data on their behalf. Um, and I'll I'll leave it there.

SPEAKER_02 4:32

That's an awesome, uh awesome background you have. In terms of the receipt data set, I imagine uh even by today's standards, it's not, I mean, AI makes it a lot easier, but sort of extracting that sort of textual information from um the receipts must have been quite challenging back then.

SPEAKER_00 4:49

Yeah, one of the uh main hurdles was parsing it accurately, parsing these these data sets accurately. Um at the time, there was uh there was no granular uh you know item level transactional data, um, which was a big advantage of having the email receipt data um over credit card transactions. Uh so you know early on, we were able to get a get a sense for, for example, I think back it was 2014 or so, Amazon launched their own phone. Um, and that was a potentially a big growth driver for the company itself, that at least they professed it was going to be. Um but you we could see from our granular transactional data that that phone was not selling at all. Um, it was it had very little sales on it after launch. So that was just one example of something you could get from the email receipt data versus not being able to get from the credit card transaction data. Um but uh but also the the just the nature of who is part of the panel is gonna be different um across different you know transactional data sets. Um and so you know the the some of the credit card transaction data sets had much different uh demographics than some of the uh email receipt data sets. So you can kind of triangulate uh trends based on you know multiple panels.

SPEAKER_01 6:09

And did that then become the kind of primary um revenue source for the email spam company?

SPEAKER_00 6:17

Uh yes, they made a very small amount of money through advertising within these roll-up emails that they were sending out. Um that was their initial kind of uh way of making revenue, but the vast majority of the revenue came from um selling data to hedge funds.

SPEAKER_01 6:33

And quite interesting as well that the signal that you described, the Amazon phone, was less about what was there and more about what wasn't there in the data set. There were, as you're saying, very few um purchases of that phone, and that was the signal you were finding.

SPEAKER_00 6:47

Yeah, and the only way you can see that is by knowing what items are actually in that transaction.

SPEAKER_02 6:54

Yeah, of course. Awesome. So if we take this to uh today, actually, and we look at Amass Insights. I I was looking on your website um uh ahead of this, uh and I saw you cover you know 20 plus thousand data sets. That's quite a lot. How are you covering so many?

SPEAKER_00 7:11

Uh-huh, that's the million-dollar question, right? Um the short answer is a lot of hard work over a long period of time. Um but I'll give you a little more detailed answer since we're chatting on the podcast. So um, you know, I've been in the space for quite a while, like I mentioned. Um really that database and that uh directory just started with a very simple spreadsheet of, you know, um any data provider that I'd heard of in the industry, um, you know, one column for the data provider, one column for just a general classification or categorization of what they do. Um and it's really snowballed from there. Um and what I ended up realizing is there were all sorts of data sets that um were similar but also different in a lot of ways. So what we had to do was create a taxonomy uh to categorize and characterize data sets in as much detail as possible. So it's actually not only 24,000 data providers, it's also about three or four hundred um pieces of metadata for each data provider as well. And the way we find them is by, you know, old-fashioned web research, that's one way. Um if you read an article and it's quoting a uh data source or a source, that is typically a data provider. Um and people kind of ignore um the sources at the bottom of graphs that are in articles, or even just people um quoting others. Uh, but we pay, you know, very close attention to that. We've been doing that for over 12 years. Um the other thing is, you know, keeping close and deep relationships with uh a number of big players in the space and, you know, getting updates from them on a regular basis on what's out there, what they're interested in, and uh what they've been seeing. 
Um another way is there's all sorts of data marketplaces that are, for lack of a better term, open source, or you're able to at least scrape um what's on them. And a lot of them are at the data set level. Um we actually don't really analyze uh data providers at the data set level. We typically are actually looking at them from what I call the data provider or the brand level. Um so understanding who's actually the provider behind each of these data sets that are being offered online and then profiling that particular company or that particular brand. Um there's a lot of government data sources. Um there's probably three or four thousand open government data sources that you can actually gather information about uh online. People also just kind of overlook certain types of companies as data providers. So there's a lot of industry groups where, you know, it's kind of a give-to-get model where um people that are part of the industry group will upload data to them and then they'll get insights back. Um and then sometimes those industry groups actually also sell the data externally. Sometimes they can't sell it at a granular level, um, but it's still potentially useful data, um, especially you know in like an industry-wide or a macro use case. Um there's uh also just a lot of different creative ways um to source data sets. Uh one new one being um all the MCP servers. There's actually open source information about which ones are out there, and there's a system, kind of like the way that Google um analyzes uh websites, where you can actually uh declare your data set as an MCP server with uh standardized metadata on your website. And so you can actually scrape systematically what MCP servers are out there. 
And obviously some of those MCP servers are more of a what I call a data tool or a service, um, but some of them are actually just data sets that are sitting on a server and you're able to retrieve them. So that's that's the that's one of the newer ways to find data sets, the new data sets that are out there.

SPEAKER_01 11:27

Has that been added then, I assume, as a piece of metadata to each Amass data provider?

SPEAKER_00 11:33

We're working on it. It's uh in progress at the moment. Actually, there's uh about five or ten thousand MCP servers. Um, and probably about 20 or 30 percent at least are actually data providers. Um, those have not even been added to our system yet. Um so we might be getting to 30,000 data sets or data providers pretty soon.

SPEAKER_02 11:58

The problem with getting so big is it's like a snowball effect. You have to keep maintaining. So the more things you cover, the more things you have to maintain. It becomes uh sort of a self-perpetuating problem.

SPEAKER_00 12:11

For sure. There's definitely that issue. Um AI does help with that to some extent. Uh you know, they they can tell me when things are completely offline and shut down, um, clean clean up our our database a bit. Um we also do manual research. Um so we have a team of researchers uh that we've been working with for about 10 years. So they're experts in alternative data now as well that we uh employ and they are every day researching the new data providers that come into our system or even updating old ones that haven't been updated for a while.

SPEAKER_01 12:45

How do you go about prioritizing uh which um which data provider should be kind of manually assessed? Because obviously if you've got I think you said 24,000, that you've got to have some sort of system for saying this is the type of provider that is the most interesting and therefore there's the most value uh to us to go through and make sure that this has been meticulously curated.

SPEAKER_00 13:06

Yeah, a couple of ways. Um one of which is our categorization. Uh so we have about 139 kind of high-level data categories um that we fit each provider into. Um and about half of those are not as relevant or as useful to the hedge fund industry, to the asset management industry. So we kind of deprioritize them right off the bat. Um, but then going further than that, it's based on client requests and based on what we're seeing as you know demand out there uh in the industry. Um so you know, if we have a client that comes to us and asks us a question about a particular name or particular thesis that they want, um we'll drill down and do a deep dive on that particular category of data on their behalf. So, you know, there are times when we haven't done, let's say, a project uh digging into, I'll just make up a category, like digging into clickstream for maybe a year or two. And so our metadata could be a little bit out of date because we haven't looked into it in that uh year or two. But um, you know, if we have a new request, we'll go and kind of redo that research again.

SPEAKER_02 14:24

Awesome. And if we move on to think about the alternative data industry uh as a whole, you you've you've sat in this market for more than 10 years. It'd be really interesting to hear your take on how the space has evolved during this time.

SPEAKER_00 14:38

Sure. Um I think one main kind of obvious uh way that it's changed is when I first started, just having a relationship and having um access to a you know net new data set, especially a net new data set in a new category, like we did with email transactional data, that was immensely valuable. Um and you know, you could fetch a pretty high price on a per client basis for each uh data set. Um whereas today there's very few categories that have only one or two choices of data providers. Um and so the price of each data set is naturally going to come down, or it has come down. Um and the value of just knowing who is out there has come down significantly. Um so uh you know, that's kind of evidenced also by the fact that there's just a vast uh number of data providers that we've profiled. So, you know, adding an incremental data provider is not gonna, you know, move the needle for everybody. Um but also just the ways that people are using data sets have changed significantly over the years. Um I think one of the main things that has affected uh the industry is how people are looking at quote-unquote KPIs, uh key performance indicators of companies. Um and really there's better ways, first of all, to get um estimates from the sell side on those KPIs. Um there's more people covering and uploading their KPI estimates. Um and then uh using those and correlating them with the alternative data providers that you're uh analyzing has become a much more common practice. Back when I started and you know was analyzing Amazon, for example, with the email receipts, I had to go and manually grab the right KPIs from the 10-K and 10-Q filings um and uh you know add them to my database and then compare them to the actuals that we were seeing in our data set. 
Um whereas now that's you know being done either by AI or by by a you know financial market data provider, and then there's uh then there's you know analysts that are putting estimates on each of those KPIs as well. And so then you can compare your and the analyst estimates, you can compare the actuals, and you can compare the alternative data sets and you know kind of triangulate where you think that the the um stock is gonna go on you know their quarterly filing. Um whereas uh you know back in you know 2013, um there it we didn't have that kind of ecosystem built out. Um and so you you know you kind of had to make some some leaps. You kind of you had to hope that a KPI that you had that you were tracking actually did affect the stock price. Um because you know there there really wasn't a lot of people that were you know following those KPIs as as closely as as we were. Uh so but but now a lot of that is actually priced in. And so in some cases, you know, particularly transactional data, it's it's kind of table stakes at this point. It's it's still considered alternative data, I think. Um but to me it's not alternative anymore.
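
The comparison Jordan describes here (alternative data against reported actuals, then against sell-side estimates) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pearson` and `implied_surprise` and every number below are hypothetical and do not come from the episode or from any Amass Insights tooling.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def implied_surprise(alt_data_estimate, consensus_estimate):
    """Fractional gap between the alt-data-implied KPI and sell-side consensus."""
    return (alt_data_estimate - consensus_estimate) / consensus_estimate


# Hypothetical quarterly figures, invented purely for illustration.
reported_kpi = [100.0, 110.0, 125.0, 130.0]  # actuals taken from company filings
alt_data_index = [9.8, 11.1, 12.4, 13.2]     # panel-based proxy for the same KPI

# Step 1: how well has the data set tracked the reported KPI historically?
history_fit = pearson(alt_data_index, reported_kpi)

# Step 2: does the data set currently point above or below consensus?
gap = implied_surprise(alt_data_estimate=142.0, consensus_estimate=138.0)

print(f"historical correlation: {history_fit:.3f}")
print(f"alt data vs consensus: {gap:+.1%}")
```

A high historical correlation combined with a gap against consensus is the kind of "triangulation" discussed above; when the consensus already reflects the same data, the gap shrinks, which is what "priced in" means in this context.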

SPEAKER_01 18:03

You mentioned um the use of sell-side estimates and the fact that, um, given more KPIs being covered more uh broadly by the sell side, um that's sort of changed the investment process. To what degree have you seen the sell side adopting alternative data? Because if that is the case and they're adopting the same sort of alternative data that the buy side is currently paying for, at what point does the buy side start to say, well, we can kind of get a sense for what the alternative data is saying without actually licensing it? Because we've got the sell side providing us with a consensus estimate that already um is pricing in all that alternative data.

SPEAKER_00 18:43

Uh that's still super early, weirdly enough, like in terms of the sell side actually adopting uh proprietary data sets. Um there's only I would say a handful or less of the sell side that actually does that to a decent extent. Um, you know, UBS Evidence Lab has been doing it for quite a long time, um, you know, prior to even when I was in the industry. Um but they're very specific on the types of data sets that they're looking at, and they're not looking at the data sets as broadly as um you know some of these big multi-strat uh funds will be looking at them. Um and they're very naive in some cases. Uh I don't want to just call out UBS. There's other sell-side banks as well. Um, you know, the way that they look at data is somewhat naive. They're trying to just create a buy-sell signal. Um they're not uh you know looking at things as granularly as you know you would need if you're on the buy side, I think. Um and they're not um offering you know analyzed uh data sets you know uh systematically. They're you know fitting everything usually into an investor research report. Um so it's a little bit different the way that you kind of consume the data from them. Um but yeah, you are able to get a little bit of the signals um through some of these sell-side banks. I mean, Barclays does a great job. Um they look at hiring trends, they look at uh some transactional data um and they kind of fit that into their reports. And, you know, it is starting to become priced in because of that as well.

SPEAKER_02 20:28

Yeah, and uh, I mean, from the conversations I have with other providers in my day job, it's uh kind of twofold. Well, actually probably maybe threefold. There's providers that decide not to sell to the sell side because they feel that it's too big of a risk of cannibalizing their core buy-side business. Or one of their investors is actually a sell-side institution, and therefore, as part of getting that sort of investment, they're mandated not to sell to other sell-side institutions. Or, on the other side, they sell a lagged feed. Um, which I think, if you think about what's happening with the Bloomberg terminal, like providing your data to some of the other aggregators as a sort of delayed feed actually makes your data more valuable to someone that's gonna front-run that signal, because it's getting put into the market. And there's a large cohort of people that are trading off that delayed signal.

SPEAKER_00 21:12

It's the age-old question of whether you're gonna cannibalize your business by selling your data to the sell side. Um, my opinion on that has evolved significantly over the years. Um, when I first started, we wouldn't even step anywhere near a sell-side bank with our alternative data. Um, we just thought it was gonna cannibalize our business. Why would they ever come to us if they can get kind of the insights from our data through that bank? I've evolved my thinking significantly. Um like you said, there's a lot of different ways that you can license data, um, lagged feeds. Uh there's also, when you sell data to any company really, the licensing terms are very important. So are you able to republish that data? You know, in the case of selling data to a hedge fund, they're usually just the end user. And so you put that into the agreement. They're not able to, you know, uh send on those insights to others, which they're happy to agree to because, yeah, they're proprietary insights. But obviously with a sell-side bank, um, that's part of their business. So the question is, how much do you charge them? Do you charge them extra uh for republishing rights? What kind of republishing rights do they have? Can they um republish data in full, or can they only publish kind of derivative products of your data, or only like charts based on your data? Um and then also, do they have to, you know, put uh the sourcing of your data into the report, so it says that the data came from, you know, a particular um alternative data provider? In that case, it actually might be good marketing, especially for the funds that want to retrieve the entire data set and are not going to just be making decisions based on an investment research report.

SPEAKER_01 23:08

Yeah, and I guess there's also an extra angle uh around the fact that when you're selling into the buy side, very likely your data set is not being used in isolation there. They have other alternative data sets. And there's a question of how your data set fits and complements those. Just because you're licensing it to the sell side doesn't mean that they're going to be able to squeeze all the juice out of it that, say, uh the buy-side fund that was um licensing their research would be able to. So I completely agree with you. I think if you were able to get them to agree, um as part of the contract, that if they use your data in a chart, et cetera, they give you a kind of uh a credit, then that could be huge.

SPEAKER_02 23:53

I mean, your call-out is pretty important, Jordan, especially if you're a bigger organization, you know, I think an enterprise corporate that's looking to monetize data. From a commercial legal perspective, they would see it as a completely different use case. It's different terms. You know, the way that the data's being used and shared is completely different. So it's definitely worth thinking about from that perspective. As we talk about the different users and sort of how that's evolved over time, as you think about data sets today, you mentioned that back in the day um you know new data set type entrants weren't necessarily uncommon. That's much less common today. So where do you see most of the interest in data sets um in 2026 and why?

SPEAKER_00 24:36

Yeah, I mean, uh everything comes back to AI, right, these days. Um I think uh one huge trend I'm seeing is um anything with an MCP has significant interest from at least those that have set up their systems uh to handle that. Um you know, everybody wants to be able to just throw all the data sets that they possibly can uh into their, you know, Claude um and query it easily with uh an MCP server. So um, and then kind of going along with that is huge open text data sets, um and textual, you know, messaging, chat, uh any kind of text that is unique and proprietary is of high interest right now, because you know you can actually make sense out of that without necessarily doing all the you know deep NLP research that maybe you were doing in the past. Um and so for example, uh one of our newest partners uh has a humongous corpus of text or, you know, words coming from uh radio stations across the world, um, and surprisingly you can actually get real-time insights before anybody else through radio. For example, like the Iran strikes, um, a radio station in Israel picked that up five minutes before you know Reuters or anybody else um picked that up. So um it's, you know, speed, but it's also insights that you can get from you know these new uh text sources. But also just generally, like anything happening in the world is gonna affect the types of data sets that we're looking at. So in the past few months, you know, geopolitical data sets, military data sets, anything where you can get a sense for you know these new wars, what's going on with them, um, macroeconomic uh indicators, you know, inflation indicators. I know that's especially interesting with the inflation numbers that came out uh much higher than expected yesterday or two days ago. 
Um and then data sets that are just generally not well covered by quote unquote alternative data for many different reasons, but you know, B2B transactions is has been historically hard, um, as well as just industries that are a little bit more opaque or or more complicated, um like industrials or healthcare and pharmaceuticals. Um they're just generally if it's harder to conceptualize how to use the data set um in an investment use case, that's both both a blessing and a curse.

SPEAKER_02 27:21

With the B2B blind spot, naturally that's because of the risk of cannibalizing your core business, right? And the other uh material non-public information risks that come with that monetization effort. Do you feel like, with the normalization of these big corporates monetizing becoming more common, and uh especially how easy it is to create SaaS tools now, and you think about Spendhound building a uh payments procurement software and providing that for free and monetizing the data, those types of software have become much more common, and the sort of collection of B2B transaction and other types of B2B data has become much more uh prevalent in the market?

SPEAKER_00 28:04

Um I think that people are waking up to the value that they're sitting on. Um but I think there's a little bit of a disconnect, because the money that they think that they can get from selling the data maybe doesn't match up to the reality often. Um you know, when you're outside of the hedge fund industry and you're looking in, you think, okay, these funds are sitting on a hundred billion dollars of AUM. Um this net new data set that they're bringing on board, they can make one trade and you know make a hundred billion dollars from that trade. And so obviously they'd be willing to spend a million dollars on this data set. Um but I think that that's kind of a naive way to look at this industry. Um it's really about the ROI of your data um in relation to all the other data that these funds are already using. Um so you know maybe they have some signals based on traditional financial market data about these B2B companies. Um, and the lift that your new, you know, Spendhound data set is giving them is you know only gonna uh make them an extra 1% on that stock. And so in that case, um it's not worth a million dollars to them. The way I think about it was always, people should be willing to spend one-tenth of what they can make with that data set. Um obviously you're never gonna hear from the fund what they think they can make with the data set. Um, but if you can try to estimate that using your own methodologies, using your own platform, maybe um that's James's platform, um, then uh you know that's super useful. I think you're right. Very helpful in your negotiations.
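
As a rough illustration of the one-tenth rule of thumb mentioned here, the arithmetic can be written out as a small Python sketch. The position size and lift below are made-up inputs, and the helper names `incremental_gain` and `price_ceiling` are hypothetical; the only thing taken from the conversation is the idea that a buyer might pay roughly one-tenth of what the data set earns them on top of the data they already use.

```python
def incremental_gain(position_size, return_lift):
    """Extra profit attributable to the new data set on one position.

    return_lift is the additional return the data adds on top of what the
    fund's existing data already delivers (0.01 means one percentage point).
    """
    return position_size * return_lift


def price_ceiling(expected_gain, share=0.10):
    """Rule-of-thumb price: a fixed share (default one-tenth) of expected gain."""
    if expected_gain < 0:
        raise ValueError("expected_gain must be non-negative")
    return expected_gain * share


# Hypothetical inputs, invented purely for illustration.
position = 50_000_000   # dollars deployed in the names the data set covers
lift = 0.01             # the data adds an extra 1% return on that position

gain = incremental_gain(position, lift)
print(f"estimated extra profit: ${gain:,.0f}")
print(f"rule-of-thumb price ceiling: ${price_ceiling(gain):,.0f}")
```

The point of the sketch is that the fund's total AUM never enters the calculation; only the incremental return on the positions the data actually informs does.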

SPEAKER_01 30:02

Subtle plug for James, thank you. Subtle plug for Valtis. Thank you, Jordan. Um so I think you're totally right that AUM as a metric for fund size is massively distorting as to what a person thinks the fund's willing to pay. And so, I mean, if you're happy to talk at a high level, what would you consider to be like a low price for a data set, a medium price for a data set, and a high price for a data set? I'm aware that there's so much uh kind of idiosyncrasy across the board there, and I'm asking a very broad question, but if you were advising like a new entrant to the market and they were like, what are the kind of tranches, what types of data sets sit in those tranches? Um it would be great, yeah, if you had um any insight from that perspective.

SPEAKER_00 30:47

To make things really simple, there's way more types of buyers than I'm gonna mention, but I'm gonna mention just two specific types. There's the pure quant type of buyer, and then there's the kind of mixed quantamental type, the fundamental slash quantitative mix people. I think they look at data sets very differently and they use data sets very differently, and the types of delivery methods they want are different. So I think the pricing ends up being a lot different. With the pure quants, they often will say they want everything, the entire data set. In practice, they think they want it, but they may not actually be able to handle it in some cases. But they are typically gonna be looking at an enterprise-wide license to use everything for any purposes they want. You're not gonna learn a lot about how they're using it. But you want to think about them as your highest paying, kind of lowest hanging fruit buyers, usually. And they're willing to spend maybe $50,000 on a yearly basis for a new data set, up to, I don't know, it's hard to get over $200,000 to $300,000 these days. Maybe that's a little bit different than when I started, where there were definitely million dollar data sets out there. And then on the more quantamental side, for some reason the number 30,000 comes up, and a lot of people ask for that specific number. I don't know where that came from, but the $20,000 to $50,000 range is, I think, more realistic as a general range for pricing on a yearly basis.
And that might actually be a subscription just to a dashboard rather than the full data set itself. But as long as it gives them real insights into the names that they're trading and it's easy for them to handle, then they're willing to spend a premium on that.

SPEAKER_01 32:59

Sorry, what was that point about the number 3,000? 30, 30,000. I don't know why.

SPEAKER_00 33:03

Oh, right, 30,000, right. That number seems to come up whenever somebody asks a fundamental fund; they always ask for that number. I wonder if it's something to do with budgets.

SPEAKER_02 33:13

They've all got the same trainer.

SPEAKER_00 33:14

Yeah.

SPEAKER_02 33:15

The same negotiation trainer.

SPEAKER_01 33:16

Yeah. Always ask for 30,000.

unknown 33:19

Yeah.

SPEAKER_01 33:20

Not a penny more, not a penny less.

SPEAKER_02 33:21

If you can shoot first, you're price anchoring down. That's true. I bet it's that. So I guess while we talk about pricing models, we're seeing a lot of talk in the industry about consumption-based pricing versus the sort of traditional licensing we've just talked about. Do you see the industry going towards that sort of consumption-based model, especially as AI develops? If agents are going to come and buy data, that will move naturally towards consumption. But do you see that really happening in our industry?

SPEAKER_00 33:54

I think I'm a little bit against the grain on this compared with a lot of other people. I actually asked a very similar question to the crowd at one of my events recently. In February, I had a breakfast event, and I asked the crowd whether they saw our industry, the hedge fund industry, moving towards a consumption-based model. The crowd was a mix between data providers and data buyers. And really, I was surprised: almost, or maybe even about, half of the attendees raised their hand and said they thought the industry would go more towards consumption-based pricing going forward. That was a little bit against what I think. I think there's the funds that buy a lot of data. They could be quantamental, they could be quantitative, but they buy more than, let's say, a few alternative data sets a year. I don't think they're gonna change a hell of a lot about how they're buying data sets. They're gonna be looking to buy enterprise-wide, you know, the full data sets. And if they were planning on buying on a consumption-based model, it would just cost them way, way more. But I do think that if you're looking at real-time insights on a particular name, you may only want that insight once or a few times. Or it's a need that's burning at that moment, and maybe you're not gonna necessarily need that in the future. Then why not query an LLM and allow that LLM to procure the data for you, pay a fraction of what you would pay for the full data set, and get that insight right away?
I think a pure fundamental fund is gonna be doing that much more frequently than they are today. So they're probably gonna be net new buyers of alternative data, and they might not even know that they're buying alternative data in the background, because it might just be through their LLM portal that they're analyzing that data. But then, I had this conversation last night at my happy hour: what happens when you want to know, for example, what the average dwell time of a user on a particular website is in a clickstream data set? In terms of the amount of records you need to gather in order to do that analysis, it might be millions of records, but you're only looking for one number out of it. So it's really just one SQL query being run in the background. Should you be paying for each of those, let's say, millions of records, or should you be paying a margin on top of how much it costs to actually run that query? That's kind of an open question, I think. It really comes back to the same thing: what's the value that the end user is getting out of that insight or that data set in the end? I don't have a good answer to this. I think we're gonna be figuring out these new business models for selling data now that MCPs and LLMs are querying data on our behalf.
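Editor's sketch: the dwell-time example can be made concrete with a single aggregate SQL query and the two candidate pricing schemes side by side. Everything here is an assumption for illustration: the `clickstream` table and its columns, the synthetic rows, and all three pricing constants are hypothetical and do not come from any real provider.

```python
import sqlite3

# Build a small in-memory clickstream table (synthetic, hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clickstream (site TEXT, user_id INTEGER, dwell_seconds REAL)"
)
rows = [("example.com", i, 30.0 + (i % 5) * 10.0) for i in range(1000)]
conn.executemany("INSERT INTO clickstream VALUES (?, ?, ?)", rows)

# The buyer only wants one number, but the query touches every matching record.
avg_dwell, records_scanned = conn.execute(
    "SELECT AVG(dwell_seconds), COUNT(*) FROM clickstream WHERE site = ?",
    ("example.com",),
).fetchone()

# Hypothetical pricing constants (assumptions, not market rates).
PRICE_PER_RECORD = 0.001   # option A: charge for every record scanned
QUERY_COMPUTE_COST = 0.02  # option B: provider's cost to run the query...
MARGIN = 0.5               # ...plus a 50% margin on top

per_record_price = records_scanned * PRICE_PER_RECORD
cost_plus_price = QUERY_COMPUTE_COST * (1 + MARGIN)

print(f"Average dwell time: {avg_dwell:.1f}s over {records_scanned} records")
print(f"Per-record pricing: ${per_record_price:.2f}")
print(f"Cost-plus pricing:  ${cost_plus_price:.2f}")
```

The per-record price grows with the size of the underlying data while the cost-plus price does not, which is the gap the open question is about. Neither scheme captures the value of the insight to the end user.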

SPEAKER_02 37:37

Yeah, I really agree with your last point that the way you price data sets is largely around use case, right? You might be pulling a very small amount of data in comparison to another buyer, but the use cases could be completely different. I think the great thing about the consumption model is that it does maybe open the alternative data world to the longer tail of funds that were largely priced out. I think it's quite common knowledge in the industry that starting a competitive quant hedge fund now is a very, very expensive task. And AI and consumption models would maybe make that barrier to entry lower. But yeah, the pricing of that consumption would probably need to vary per customer, and that gets a bit complicated and convoluted.

SPEAKER_00 38:20

Yeah, I mean, there's only so many complications you can handle with pricing, and a firm needs to have standardized pricing to really scale their business.

SPEAKER_01 38:32

I've also been thinking quite a lot about fundamental firms plugging into MCPs and, as you say, maybe not even knowing what data set's being used to provide them with the output KPI, et cetera. It runs really counter to my experience of the way these funds historically have conducted their modeling exercises. For my sins, I did a brief stint on the IB side, and there is a level of obsession with correctness of data, to the decimal point level. My theory has always been that these funds, sorry, well, the sell side and the fundamental side, in my experience, don't tend to track the forecast accuracy of these KPIs. And if they do, they don't keep that track record over time and see how much more accurate or less accurate they're getting. There's therefore this sense that you use the accuracy of the historical data as a proxy for the accuracy of the forecast data. And I've kind of wondered how that mindset is going to chime with the notion that you're receiving a forecast from a sort of non-deterministic, LLM-derived system. I really don't know how they're gonna accept that, at least not in my experience of the mindset there. But, you know, maybe as more senior people retire and more junior people become the senior people, some notion of vibe forecasting becomes a thing. But honestly, it's just completely counter to my experience of the way they meticulously collate their spreadsheet models, etc.

SPEAKER_00 40:17

I feel like the trust that we have in these new AI systems, these LLMs, is gonna change very significantly over the next few years, as we get further from the hallucinations that plagued us over the last couple of years, and as we get closer to actually having more deterministic methodologies underneath the hood, having real-time access to good, clean data sets that we all kind of inherently trust already, and being able to call back to those data sets. I think there's a lot that's gonna change, even just culturally. As people coming right out of school who have been using LLMs for the last four years come into the workforce, they're gonna be more inherently trusting, I think, of LLMs. But that might be warranted based on the advancements in technology and data connections that we'll have for those LLMs in the next few years.

SPEAKER_02 41:29

I mean, it's a good point, but a really strong sign, to your confusion, James, that it's already happening is the company Rogo, right? They've nailed AI for M&A banking. They just raised 180 million to go and target the European and APAC regions, but yeah, they're flying. And that's suggestive that it's kind of already happening in that vertical. So that trust must be already growing.

SPEAKER_00 41:57

Yeah, then you have, like, bigdata.com, from RavenPack. They're really well trusted; they've been kind of the go-to news alternative data source for many years. And they're putting an LLM on top of all their data, as well as a hell of a lot of other data sources under the hood. And it's purpose-built for the investment management and hedge fund industry. You're not gonna get that necessarily elsewhere, from the big AI companies.

SPEAKER_02 42:33

Agreed. The consumption model takes us on quite nicely to something I also wanted to talk to you about, which was the length of sales cycles in the industry. It feels like the consumption model could be a contributor to shortening that very long, painful sales cycle right now. But yeah, what are your opinions on where that is today, and how do you think we could shorten that as sales reps, to help businesses build data businesses quicker?

SPEAKER_00 42:58

I think the first thing is education of data providers. First of all, a lot of companies that try to sell data to the asset management industry have never sold data externally before at all. So there's kind of a hurdle for them to get things prepared, getting their materials in place. And by materials I mean really just a handful of things that are key. One of which is just an overview, a one- or two-pager about what you're doing, where your data comes from, and potential ideas about where the value lies. Then probably one of the most key pieces of information is having a nice, clean data dictionary in place, and being able to express what each of the fields means and how they can be used, as well as an example of what the data looks like in practice. I usually call that a data snapshot. You could think about it as a small sample. Maybe it's a hundred rows or a thousand rows of your data, so you're actually able to put your hands on the data itself. And then an overview explanation for how all those materials fit together, and being able to put that into a nice, succinct email or introduction to that data provider. That's kind of the first step that we go through with the partners and the providers that we work with: just getting those necessary materials in place. That goes a long way. Then obviously having standardized pricing, or at least a good idea of where you're going with pricing, is helpful, because it will rule you out or rule you in for certain buyers, depending on whether they are willing to spend that money on this particular type of data.
And then there's kind of an overarching debate on whether it's useful to go much deeper in terms of researching your data set, putting it into practice, and doing back tests. I think it's usually a net positive to have at least a naive back test or use case report in place. And by naive I mean you don't necessarily need to do a deep dive into, you know, if I used my data set in conjunction with financial market data sets and was trading for the last three years, I would have made 10% more than the S&P and it would have had a Sharpe ratio of two. Yes, you could have all that, but just showing how your data changes what the fund would have thought of that particular stock at that particular time is helpful. And by that I mean, we were talking about this before, how does my data set change my perception of the estimates of those KPIs that are going to be published in the next quarter? Maybe those KPIs, and this is for the fund to determine, aren't as impactful for the overall stock as they want them to be, but that's for them to determine. It's not for the data provider to determine that. It doesn't always come down to how the stock price moves based on a data set. It comes down to what the fundamentals of that company are and how they are going to be affected by what's happening in real time, as shown in these data sets.
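Editor's sketch: the data dictionary and data snapshot described above could look something like the following. The field names, descriptions, and rows are all hypothetical, and the helper functions are made up for this example. The only idea taken from the conversation is that every field is documented and that a small sample (here 100 rows) ships alongside.

```python
import csv
import io

# Hypothetical data dictionary: one plain-language description per field.
DATA_DICTIONARY = {
    "date": "Observation date.",
    "brand": "Brand name as collected from the raw source.",
    "ticker": "Ticker of the mapped parent company.",
    "spend_usd": "Total observed spend in US dollars for that brand and date.",
}


def undocumented_fields(rows, dictionary):
    """Return column names that appear in the data but not in the dictionary."""
    columns = set()
    for row in rows:
        columns.update(row.keys())
    return columns - set(dictionary)


def write_snapshot(rows, dictionary, max_rows=100):
    """Serialise the first `max_rows` rows as CSV text for a buyer to inspect."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(dictionary))
    writer.writeheader()
    writer.writerows(rows[:max_rows])
    return buffer.getvalue()


# Synthetic rows standing in for the full data set.
rows = [
    {"date": "2026-01-01", "brand": f"Brand {i}", "ticker": "HYPO", "spend_usd": 100.0 + i}
    for i in range(250)
]

missing = undocumented_fields(rows, DATA_DICTIONARY)
snapshot_csv = write_snapshot(rows, DATA_DICTIONARY)
print("Undocumented fields:", sorted(missing))
print(snapshot_csv.splitlines()[0])
```

Checking the sample against the dictionary before sending it catches the common case where a column ships without an explanation.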

SPEAKER_01 46:52

Yeah, I completely agree with you. And whenever I'm asked about back tests I say exactly the same thing. The signal validation, in a sense, is more important than a paper portfolio back test. Because, as you say, the funds themselves are going to take those insights and figure it out. It's similar, actually, to what I was mentioning earlier on, when we were talking about the fact that the buy side will have multiple other data sets. So it's not only what signal you are providing, but also how that signal complements the other data sets that they have for the names that your data set covers. Right.
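Editor's sketch: one way to read "signal validation" as opposed to a paper portfolio back test is to check whether the data set would have improved KPI estimates, without simulating any trades. The quarterly figures below are invented for illustration, and the two metrics (mean absolute error and a directional hit rate against consensus) are this sketch's choice, not a method described by the speakers.

```python
from statistics import mean

# Hypothetical history for one KPI: consensus estimate, the estimate implied
# by the alternative data set, and the value the company later reported.
quarters = [
    {"quarter": "Q1", "consensus": 100.0, "data_estimate": 104.0, "reported": 105.0},
    {"quarter": "Q2", "consensus": 110.0, "data_estimate": 108.0, "reported": 107.0},
    {"quarter": "Q3", "consensus": 120.0, "data_estimate": 123.0, "reported": 119.0},
    {"quarter": "Q4", "consensus": 130.0, "data_estimate": 134.0, "reported": 136.0},
]


def mean_abs_error(rows, key):
    """Average absolute gap between an estimate and the reported KPI."""
    return mean(abs(r[key] - r["reported"]) for r in rows)


def direction_hit_rate(rows):
    """Share of quarters where the data pointed the same way as the surprise."""
    hits = 0
    for r in rows:
        predicted = r["data_estimate"] - r["consensus"]
        actual = r["reported"] - r["consensus"]
        if predicted * actual > 0:
            hits += 1
    return hits / len(rows)


consensus_mae = mean_abs_error(quarters, "consensus")
data_mae = mean_abs_error(quarters, "data_estimate")
hit_rate = direction_hit_rate(quarters)

print(f"Consensus MAE: {consensus_mae:.2f}")
print(f"Data set MAE:  {data_mae:.2f}")
print(f"Direction hit rate vs consensus: {hit_rate:.0%}")
```

Whether a better KPI estimate actually matters for the stock is left to the fund, in line with the point made above.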

SPEAKER_00 47:26

And there's a couple of other things that they really can use to speed up the process. If you have nice mapping, you know, it could be to a ticker symbol, which maybe is somewhat useful, but particular funds use different identifiers. It could be an ISIN, a SEDOL, or a FIGI, I think it's called, the Bloomberg FIGI. Each fund has their own flavor of securities master that they like, and so being able to plug very nicely into those is definitely helpful. Ticker and entity mapping is actually a much more complex topic than people realize, especially in these alternative data sets. You have brand names, and there could be multiple brands that map up to a particular parent company. There's all sorts of subsidiary companies that nobody's ever heard of that show up in different places on the internet. And so having a good methodology for how you map all those is helpful. It could be done on the provider side, or actually could also in some cases be done on the hedge fund side, depending on their expertise.
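Editor's sketch: a toy version of the brand-to-parent-to-identifier mapping described above. The companies, tickers, and identifier values are all placeholders invented for this example, and real entity mapping involves far more than a dictionary lookup (name variants, changes over time, regional listings).

```python
from typing import Optional

# Hypothetical brand-to-parent table: several brands can roll up to one parent.
BRAND_TO_PARENT = {
    "acme burgers": "Acme Holdings",
    "acme coffee": "Acme Holdings",
    "zed pizza uk": "Zed Pizza Group UK",
}

# Hypothetical security master: each fund may key on a different identifier,
# so the same parent carries several. All values are placeholders.
SECURITY_MASTER = {
    "Acme Holdings": {
        "ticker": "ACME",
        "isin": "ISIN-PLACEHOLDER-1",
        "figi": "FIGI-PLACEHOLDER-1",
    },
    "Zed Pizza Group UK": {
        "ticker": "ZEDU",
        "isin": "ISIN-PLACEHOLDER-2",
        "figi": "FIGI-PLACEHOLDER-2",
    },
}


def map_brand(brand: str, id_type: str = "ticker") -> Optional[str]:
    """Map a raw brand string to the requested identifier of its parent.

    Returns None when the brand or the identifier type is unknown, so that
    unmapped records can be reviewed instead of silently dropped.
    """
    parent = BRAND_TO_PARENT.get(brand.strip().lower())
    if parent is None:
        return None
    return SECURITY_MASTER[parent].get(id_type)


print(map_brand(" Acme Coffee "))          # ticker of the parent
print(map_brand("Zed Pizza UK", "figi"))   # same lookup, different identifier
print(map_brand("Unknown Brand"))          # unmapped
```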

SPEAKER_01 48:43

Yeah, we're smiling because earlier in the season we did a whole hour-long episode specifically on ticker mapping. So, Jordan, that is very kind of you. That was a second, accidental plug, because that's what we do: we do ticker mapping for vendors and funds.

SPEAKER_02 48:59

The funny thing that came from that conversation, which what you were just saying reminded me of, is the Domino's example: there are different entities of Domino's in different regions, and some of them are publicly traded or listed separately, which I thought was quite comical.

SPEAKER_00 49:14

Well, and then you also have subsidiary companies that are being posted in the 10-Ks. Often that's maybe the best place to go to find the subsidiary companies that are out there, but then you have all sorts of other places. One of our data partners is the biggest patent data set out there, and in patents you actually find subsidiary companies that you wouldn't find anywhere else. In some cases it's really hard to figure out how you map them up to their parent company, but it's still really important to get that right, and that's actually one of the main selling points of that data provider that we have.

You touched on it a little bit earlier, about getting to the evaluation, and the funds deciding whether it's important to them or not and how to extract value from that particular data set. What do you feel are the common misunderstandings from providers as they enter into a trial with a fund?

Common misunderstandings, let's see. I mean, overestimating the value of your data set is the most common. Thinking that a fund should be willing to spend money on a trial, I think, is a common misunderstanding. For better or for worse, the industry has kind of coalesced around free trials, and usually it's actually been 90 days specifically. I'm not sure how we got to that, but that's how it's been. The newer entrants to the space often kind of balk at both of those things. But it does often take quite a long time to trial all these and figure out what the value of those data sets is, so better not to swim against the tide, in my opinion, in that sense. Another misconception that providers might have is that their data set is useful in a much wider sense than it actually is. Every data provider thinks that their data set is the most valuable data set out there, but it may be only useful for one sector, or even one ticker maybe. And that's in some cases just fine. The hard part is just finding the right buyer for that data set. If your data set is super impactful for, let's say, a mid or a small cap name, go to the 13F filings and figure out who the biggest owners of that particular stock are. And maybe you're able to make all the money you want to make in one deal in that case.

SPEAKER_02 52:00

Yeah, I know a couple of providers that cover five, if not ten, stocks really, really well and have found the buy side investors that care about those companies, and they have managed to license the data for six figures, because the data are so valuable for those five names and those funds are quite overweight on those names. Conscious of time, I saw on LinkedIn, maybe as a sort of concluding remark, that you had launched the Data Market Network. Tell us about it.

SPEAKER_00 52:34

Sure, yeah. So, as I mentioned, I've been in the space for a while. Honestly, in every conversation with the buy side I usually get compared to the main event companies in the space, and my main response usually was: we're not an event company. We source and monetize data, but we don't really do events. And I guess I've gone against that at this point, because we've been hosting events for three years now. We did our 35th happy hour last night. They've mostly been pretty informal, just engendering networking and relationships in the space. But I've been thinking for quite a while about how we can do things a little bit more systematically and provide real ROI on these in-person get-togethers. So we decided to launch our event brand, called the Data Market Network, or DMN for short. It's gonna encompass all the events that we will host and have hosted.
So we have the happy hour, and we have a quarterly breakfast event that we've been hosting for the last year, so we've done four of them, where it's panel plus networking, mostly before market open. That's been very successful, and we've had some really interesting experts on our panels. And then we'll be launching a few new event series as part of this. The first one is what we call the DMN Exchange. The first format will be virtual, where we bring together both sides of the industry, as well as service providers to the industry, and allow them different ways to understand content. So we'll have a panel part; we'll have what we call lightning insights, small five- or ten-minute analyses where people can showcase the research they're doing with alternative data; and then a kind of Shark Tank type atmosphere, where we bring a few different providers, maybe ones that are complementary or competitive, and have some people on the buy side judge what they think about those data sets. And then at the end, breakout rooms for different topics and for different groups of people. The other two events that we will be launching are smaller-scale dinners for subgroups within our community. The two main groups of our community are what we call data investors and then the data providers, and the feedback I've been getting is that they want to network within each group as well as, obviously, between each other; they also want to share best practices and get to know each other internally. So that's one of the gaps I've seen: smaller format, nicer dinners where there's a little bit of an educational component, maybe 30 minutes of content from an expert, and then mostly just dinner and networking. So I don't think the industry needs another big conference.
That's not what we're doing. But I do think there are different forms of events and forms of content that are missing from our industry. And the main goal really is just to get people talking. Honestly, it comes all the way back to Amass's main goal, which is to get the data flowing more freely between hedge funds and data providers.

I love that. Where can we sign up?

We have a website. I haven't secured the SSL yet, so you have to go to HTTP rather than HTTPS, but it's theddatamarketnetwork.com.

SPEAKER_02 56:20

Awesome. Well, I look forward to attending some of those events. I love the New York drinks that you host semi-regularly, so I'm excited to be a part of that. And thank you for taking part in the episode today.

Yeah, I've been really enjoying the podcast. I've been hearing from you guys, and, you know, thank you for having me on.

Thanks, Jordan and Greg.