UX for AI

EP. 95 - Cowboys, Data, and the AI Gold Rush w/ Francesco Tisiot

Bonanza Studios

Enterprises are racing to integrate AI — but most aren't ready for the risks. Francesco Tisiot, Field CTO at Aiven, breaks down why real-time data, strong infrastructure, and clear governance are no longer optional. If your AI agent makes decisions in milliseconds, your data better be bulletproof.


You can find Francesco here: https://www.linkedin.com/in/francescotisiot

Interested in joining the podcast? DM Behrad on LinkedIn:
https://www.linkedin.com/in/behradmirafshar/

This podcast is made by Bonanza Studios, Germany’s Premier Digital Design Studio:
https://www.bonanza-studios.com/

1
00:00:00,000 --> 00:00:06,800
So what's going on? Tell me. What's going on? I believe we are at a turning point in the industry

2
00:00:06,800 --> 00:00:13,440
where every time you look around, you see that AI is solving any kind of problem. And

3
00:00:13,440 --> 00:00:20,160
possibly companies are now feeling the pressure of showcasing AI or embedding AI or integrating AI

4
00:00:20,160 --> 00:00:27,680
in everything they do. Yeah. And it's a valid aim because I believe AI can solve, can simplify,

5
00:00:27,680 --> 00:00:34,560
can accelerate a lot of what we do. But I believe there are various paths that you can

6
00:00:34,560 --> 00:00:40,080
take in order to get yourself into AI. If we take the two extremes, on one side you have what I

7
00:00:40,080 --> 00:00:45,600
would call the cowboy style, where you just, you know, throw AI at every single problem,

8
00:00:45,600 --> 00:00:50,560
throw AI at every single data point that you have in your company, throw AI at every single tool.

9
00:00:51,280 --> 00:00:59,360
And that might fix point problems. For example, I don't know, you have a set of

10
00:00:59,360 --> 00:01:05,360
machines that are logging everything that they do and you want a summary of those logs, and you

11
00:01:05,360 --> 00:01:10,400
have a perfect AI tool that will solve that problem. But on the other side, having a holistic approach

12
00:01:11,760 --> 00:01:19,520
to AI allows you to avoid risks, for example, of exposing data to customers who shouldn't

13
00:01:19,520 --> 00:01:25,920
see it, or exposing the wrong data to those same customers. I believe the interesting point

14
00:01:25,920 --> 00:01:33,680
is how do we mix all the innovation that is coming in the AI space with all the problems, solutions,

15
00:01:33,680 --> 00:01:40,480
little details, governance, rules and regulations that we have in the corporate world around our

16
00:01:40,480 --> 00:01:46,640
data and our customer data as well. So here is where we probably need to take a step back.

17
00:01:47,840 --> 00:01:52,880
And yes, be willing to go into AI, but take a few steps before doing that.

18
00:01:52,880 --> 00:02:01,760
So the cowboy data, sorry, the cowboy approach is the fast approach to basically bringing some

19
00:02:01,760 --> 00:02:08,080
AI into the organization. There are probably going to be some low-hanging-fruit wins for the

20
00:02:08,080 --> 00:02:13,840
organization. Some of them are going to save hours of work per employee, potentially per day.

21
00:02:13,840 --> 00:02:21,760
But what you told me just basically increased my heart rate by 2x: what if

22
00:02:22,400 --> 00:02:27,360
you don't have proper governance, you don't have proper data and infrastructure?

23
00:02:28,080 --> 00:02:35,840
I don't want to throw a lot of keywords because I'm dealing with a CTO that does this for a living.

24
00:02:35,840 --> 00:02:41,120
So I just want to, you know, be careful about what I'm saying because I don't want

25
00:02:41,120 --> 00:02:48,320
to sound stupid in front of you, to be honest. But the nightmare that you outlined is

26
00:02:48,320 --> 00:02:56,000
something very serious. What if it exposes the wrong customer data to the wrong customer?

27
00:02:57,920 --> 00:03:03,120
What if, with a customer asking about certain things, it pulls some information from other

28
00:03:03,120 --> 00:03:10,000
customers and exposes it, because you basically tried to bulldoze your way into the AI space

29
00:03:10,000 --> 00:03:13,840
without having enough governance and security measures in place.

30
00:03:13,840 --> 00:03:21,040
Yeah, that is, I believe, from my point of view, where the fun part starts. With AI,

31
00:03:21,040 --> 00:03:28,000
with all the tooling that we see now, it's extremely easy to go from zero to one to have

32
00:03:28,000 --> 00:03:33,360
the first mock-up of a solution. But then to take that solution and go into production,

33
00:03:33,360 --> 00:03:38,560
that's a whole other story. This is where all the security settings, all the governance layers,

34
00:03:38,560 --> 00:03:44,800
all the approvals need to flow, need to happen within a company to move something, to move

35
00:03:44,800 --> 00:03:50,240
something from a prototype to something that you can use internally or externally. It really doesn't

36
00:03:50,240 --> 00:03:57,440
matter. But I believe that is the piece where things get serious. And on one side, I would say

37
00:03:57,440 --> 00:04:03,680
it's all about innovation and AI. On the other side, I believe this kind of process of thinking

38
00:04:03,680 --> 00:04:09,280
about exposing your data, through AI or not through AI, is nothing new. I believe if you think with

39
00:04:09,280 --> 00:04:15,360
this kind of mindset of you need to provide the right data to the right person at the right time,

40
00:04:15,360 --> 00:04:21,600
now it's AI, before it was a human looking at the dashboards. We always had these kinds of rules to

41
00:04:21,600 --> 00:04:27,760
say this person is, for example, the manager of the Italian department. They should only look at

42
00:04:27,760 --> 00:04:33,360
Italian data. They shouldn't be able to look at German data. And now the same thing applies with

43
00:04:33,360 --> 00:04:40,800
AI, where a bot that is talking with customer X should only be able to get the data about customer

44
00:04:40,800 --> 00:04:46,080
X and nothing else. We are not changing the rules of the game, but possibly what we are changing here

45
00:04:46,080 --> 00:04:53,280
with AI and with all these agents in the AI space is the velocity at which decisions are made.
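A minimal sketch of that "the bot serving customer X only ever sees customer X's data" rule, enforced at the query layer rather than trusted to the agent; the table, columns, and IDs are hypothetical:

import sqlite3

def fetch_orders_for(conn: sqlite3.Connection, customer_id: str):
    # The agent never chooses the filter; the service injects it,
    # so no prompt can talk the agent into reading other customers' rows.
    cur = conn.execute(
        "SELECT id, item, total FROM orders WHERE customer_id = ?",
        (customer_id,),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id TEXT, item TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "cust-a", "shoes", 100.0), (2, "cust-b", "boots", 80.0)],
)
print(fetch_orders_for(conn, "cust-a"))  # only customer A's rows: [(1, 'cust-a', 'shoes', 100.0)]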

46
00:04:53,280 --> 00:04:59,600
If, for example, we were dealing with a human, let's say, let's showcase a very easy example.

47
00:04:59,600 --> 00:05:06,560
A human goes to a website, needs to buy shoes, and they will take maybe two minutes to decide which

48
00:05:06,560 --> 00:05:11,360
shoes to buy. Now compare this with an agent doing the same thing. They could arrive at the same

49
00:05:11,360 --> 00:05:19,120
decision in seconds. So if we provide the wrong data, and let's take one step back, if we want to

50
00:05:19,120 --> 00:05:24,560
empower the human to buy the shoes, we may want to give them the maximum amount of money that they

51
00:05:24,560 --> 00:05:30,560
can spend, so the budget. We give them $100 to spend on shoes, and they will take two minutes

52
00:05:30,560 --> 00:05:37,920
in order to decide which shoe is the best and buy it. And if you think about it, in 10 minutes they will

53
00:05:37,920 --> 00:05:45,920
be able to buy five shoes and spend $500. Now let's say that we made a mistake and instead of

54
00:05:45,920 --> 00:05:52,480
saying that the budget was $100, we tell the human that the budget is $1,000 a shoe. Now

55
00:05:52,480 --> 00:05:58,640
the human will still buy the shoes, but they will make a mistake which is capped at $5,000

56
00:05:58,640 --> 00:06:04,320
because they can buy at most five shoes in the 10 minutes. The agents, on the other side,

57
00:06:04,320 --> 00:06:10,960
if we make the same small mistake, since they can make decisions in seconds, the blast radius of such

58
00:06:10,960 --> 00:06:17,120
a mistake could be huge. We are not talking about $5,000, we may be talking about $100,000

59
00:06:17,920 --> 00:06:25,360
of loss because of a small detail in the data. So this is where providing up-to-date accurate data,

60
00:06:25,360 --> 00:06:32,000
doing all the steps that we did in the past to put good data in front of humans,

61
00:06:32,000 --> 00:06:37,920
we need to do the same before we expose data to AI. It's kind of the same paradigm, accelerated.
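The back-of-the-envelope arithmetic behind that blast-radius comparison; the decision rates below are illustrative assumptions from the shoe example, not measurements:

# Assumptions: a 10-minute window, and a $1,000 overspend per purchase
# when the budget is wrongly set to $1,000 instead of $100.
WINDOW_SECONDS = 10 * 60
OVERSPEND_PER_PURCHASE = 1_000  # dollars lost per wrongly-budgeted purchase

human_purchases = WINDOW_SECONDS // 120  # one decision every 2 minutes -> 5
agent_purchases = WINDOW_SECONDS // 1    # one decision per second -> 600

print(human_purchases * OVERSPEND_PER_PURCHASE)  # 5000: the human's cap
print(agent_purchases * OVERSPEND_PER_PURCHASE)  # 600000: the same mistake, accelerated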

62
00:06:37,920 --> 00:06:43,600
Got it. So what you are saying, I think we were aligned on this before when we wanted to initially

63
00:06:43,600 --> 00:06:52,480
record this podcast: the game hasn't changed. When it comes to companies, they need to become

64
00:06:52,480 --> 00:06:59,280
data-led and digitally transformed. That's the game that enterprises especially need to play,

65
00:06:59,280 --> 00:07:06,960
but now there is AI in the mix that accelerates the outputs of such digital infrastructure by

66
00:07:08,160 --> 00:07:17,920
a lot, like exponentially a lot. And hence now you could unlock very independent agents that can

67
00:07:17,920 --> 00:07:26,560
make decisions at such a scale that will correlate to potentially making mistakes at scale as well.

68
00:07:26,560 --> 00:07:31,920
Yeah, to give you another example, let's say that tomorrow your company decides to automate

69
00:07:31,920 --> 00:07:36,640
all the LinkedIn reach-outs to potential clients and leads. Would you allow

70
00:07:37,840 --> 00:07:43,760
an agent running those to write an email with a very evident spelling mistake? No,

71
00:07:43,760 --> 00:07:50,400
because that is really bad. So you need to give the same level of trust, not only to the words,

72
00:07:50,400 --> 00:07:56,000
but to the data, because the data is, again, a component of the message, a component of trust

73
00:07:56,000 --> 00:08:01,920
between you and the other counterparts. The medium will change: before it was a human, now it could be

74
00:08:01,920 --> 00:08:08,800
a human assisted by AI or an AI bot, but the overall end-to-end flow is the same. Message and

75
00:08:08,800 --> 00:08:14,320
data go together in order to arrive at the results. So Francesco, you're the Field CTO of

76
00:08:14,320 --> 00:08:22,800
Aiven. You're not new to this; you've been doing this for years now. So I would

77
00:08:22,800 --> 00:08:30,160
like to understand the sentiment shift of your clients, pre-AI and post-AI. What you are seeing

78
00:08:30,160 --> 00:08:34,800
when you're talking, you probably talk to a lot of clients or leads or prospects, whatever the case,

79
00:08:34,800 --> 00:08:43,200
what have you seen that has drastically changed, let's say, from 2022 onwards? I would

80
00:08:43,200 --> 00:08:49,840
say that the question that I keep asking, or receiving from clients, is: who are you

81
00:08:49,840 --> 00:08:57,280
building for? And that is a drastic change in the mindset of building any kind of product. Because

82
00:08:57,280 --> 00:09:03,360
if your aim is to build for humans and there is a huge market to build for humans, a lot of what we

83
00:09:03,360 --> 00:09:11,840
did traditionally still works. However, we see a new wave of product concepts that are built for

84
00:09:11,840 --> 00:09:19,360
humans and AI. There are not a lot of changes in the rules of the game, but there are some.

85
00:09:19,360 --> 00:09:26,080
The first one is that APIs become the most relevant thing that you need to think of.

86
00:09:26,800 --> 00:09:32,240
Yes, you need to have a nice website, an eye-catching website, and the AI can parse a website.

87
00:09:32,240 --> 00:09:38,320
But where the core of the interaction and possibly the acceleration is, is in the APIs.

88
00:09:40,160 --> 00:09:46,000
AI, an agent, can arrive at your website and quickly find out how to spin up your resources,

89
00:09:46,000 --> 00:09:53,200
how to use your product by itself, by using APIs. Well, that's a win, because now you don't need

90
00:09:53,200 --> 00:09:59,440
someone to come, understand your code, and start implementing. Everything can be automated. All

91
00:09:59,440 --> 00:10:05,840
the onboarding is way, way faster. All the usage is way, way faster. So you are scaling much, much

92
00:10:05,840 --> 00:10:12,880
more. So all this concept of who are you building for? What is your target? How can you build

93
00:10:12,880 --> 00:10:18,720
products that are both usable by humans and AI becomes really, really relevant because it allows

94
00:10:18,720 --> 00:10:24,720
your business, your company to scale or to propose your asset in very, very different ways.
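A minimal sketch of what "usable by both humans and AI" can look like in practice; the endpoints and fields are hypothetical, and FastAPI is just one example of a framework that publishes a machine-readable schema (at /openapi.json) that an agent can parse to learn how to call you:

from fastapi import FastAPI

app = FastAPI(title="Shoe Shop API")

@app.get("/products")
def list_products():
    # Structured data an agent can read directly, no HTML scraping needed.
    return [{"sku": "shoe-1", "price": 100.0, "in_stock": True}]

@app.post("/orders")
def create_order(sku: str, quantity: int):
    # The whole onboarding and purchase flow is callable end to end.
    return {"sku": sku, "quantity": quantity, "status": "accepted"}

# Run with, e.g.: uvicorn shop_api:app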

95
00:10:24,720 --> 00:10:30,480
That's fascinating. So basically the way I understood and perceived what you said is that

96
00:10:30,480 --> 00:10:36,160
it reminded me of the time of, you remember, the mobile websites: hey, you need

97
00:10:36,160 --> 00:10:42,400
to design your websites to be mobile friendly because a lot of users come and use your content

98
00:10:42,400 --> 00:10:49,360
on mobile. But what you're saying now is that, on top of that, you need to build your infrastructure

99
00:10:49,360 --> 00:10:55,520
that is AI agent friendly. So other agents could come in and plug into your services,

100
00:10:55,520 --> 00:11:02,000
could make sense of them and potentially offer what you're offering to the person who is prompting them.

101
00:11:03,280 --> 00:11:10,320
As fast as us? That's fascinating. Okay. And the last piece that comes on top of that is

102
00:11:10,320 --> 00:11:17,280
that it's not only about surfacing the information in a way that an agent can pick it up, but also

103
00:11:17,280 --> 00:11:25,520
it's working at the speed that those agents, those automations, operate. So it's providing

104
00:11:25,520 --> 00:11:31,280
infrastructure, providing any kind of assets in seconds, milliseconds, providing a way to

105
00:11:31,280 --> 00:11:36,240
transmit data in milliseconds, providing a way to join data coming from different sources in

106
00:11:36,240 --> 00:11:43,200
milliseconds. This is where I see a lot of the innovation, as we are in a race to become faster

107
00:11:43,200 --> 00:11:48,720
and faster. Everything is becoming faster and faster. And your notion of time, you keep repeating

108
00:11:48,720 --> 00:11:54,480
milliseconds. Is it milliseconds? It depends. This is an interesting piece where there is a little bit

109
00:11:54,480 --> 00:12:01,840
of dichotomy in the world, because nowadays, after OpenAI and all these beautiful AI tools came

110
00:12:01,840 --> 00:12:07,600
out, we somehow rediscovered the fact that we can wait for something to come back to us within a few

111
00:12:07,600 --> 00:12:14,240
seconds. Right. Deep search, deep reasoning. Yes. On the other side, the automation that we can build

112
00:12:14,240 --> 00:12:20,400
with some of these tools, if we narrow the focus, is astonishing. And the speed of these tools is

113
00:12:20,400 --> 00:12:26,640
astonishing as well. So, I was thinking about this podcast and I was thinking about what

114
00:12:26,640 --> 00:12:32,560
is an example that could resonate really well with people. I believe I have a good analogy,

115
00:12:32,560 --> 00:12:38,720
which is: nowadays, self-driving cars are something that we know exist in the world.

116
00:12:38,720 --> 00:12:44,480
So the question that I would ask companies is: would you trust a car that is driving,

117
00:12:44,480 --> 00:12:53,360
checking the image or the status as it was five or 10 seconds ago? Probably not. So why do you

118
00:12:53,360 --> 00:12:59,920
trust the same to happen within your company? Nowadays, we need to feed all our data stakeholders

119
00:12:59,920 --> 00:13:05,920
with the most complete, up-to-date status of the world. Because if we don't do that, well,

120
00:13:05,920 --> 00:13:10,320
it's not a crash in our case, a physical crash, but it's a crash in the business.

121
00:13:10,320 --> 00:13:17,520
If you keep selling tickets for a concert that is already sold out, then you need to give

122
00:13:17,520 --> 00:13:25,440
money back to people, and your fame, the perception of your brand, is altered because

123
00:13:25,440 --> 00:13:31,280
you were not dealing with live, correct data. So that also doubles down on why the cowboy

124
00:13:31,280 --> 00:13:39,040
approach might actually backfire, because if you don't have that data infrastructure in place,

125
00:13:39,040 --> 00:13:46,800
you might end up creating so much unwanted mess for yourself, because especially for businesses

126
00:13:46,800 --> 00:13:54,560
that have to cover real-time use cases, that could quickly become an unmanageable nightmare.

127
00:13:56,160 --> 00:14:02,640
Here is another interesting piece. Now we have AI agents, which are a reality. And these AI agents

128
00:14:02,640 --> 00:14:08,720
have this ability to perform function calls. So they can call system A and system B to

129
00:14:08,720 --> 00:14:14,640
get data. For example, as happens in a lot of enterprises, you have your customer data

130
00:14:14,640 --> 00:14:19,520
sitting in one place, your sales data sitting in another place, your inventory data sitting in yet another

131
00:14:19,520 --> 00:14:25,120
place. And the agent could call those three systems and retrieve the information. However,

132
00:14:25,120 --> 00:14:31,360
you are giving the agent the responsibility of getting that data and integrating those data

133
00:14:31,360 --> 00:14:37,120
and creating a unified view, which could be a sensible option in some ways. But on the other

134
00:14:37,120 --> 00:14:43,280
side, you may want to have that integration defined in just one way, in a standard way that

135
00:14:43,280 --> 00:14:49,760
is always up to date, that can be shared across 10 different use cases. Because otherwise,

136
00:14:49,760 --> 00:14:55,360
tomorrow you have a bot that answers the phone and gives some information. You have another bot,

137
00:14:55,360 --> 00:15:01,360
another AI agent that does the same for email. Those agents have slightly different ways of

138
00:15:01,360 --> 00:15:05,840
integrating the data. They end up giving completely different replies to customers.

139
00:15:05,840 --> 00:15:11,920
And again, you are back to the Tesla example where something that was true 10 seconds ago

140
00:15:11,920 --> 00:15:18,640
is not true anymore now. Wow. So there are a lot of established traditional businesses out there,

141
00:15:18,640 --> 00:15:24,480
right? And you know them much better than I do, most likely, because that's what you're working

142
00:15:24,480 --> 00:15:29,760
with them. And I think, for me at least, I don't know about you, the sales call, when it

143
00:15:29,760 --> 00:15:37,760
comes to, okay, we are lacking digital infrastructure, it's becoming way easier than before,

144
00:15:37,760 --> 00:15:42,960
because now they are sensing what's going on. They're seeing their competitors doing crazy

145
00:15:42,960 --> 00:15:50,480
things, offering services at such a scale, with decent enough quality that they need to act upon.

146
00:15:50,480 --> 00:15:56,320
So I think there is no convincing needed; it's a matter of when we can start. What problems are you seeing,

147
00:15:56,320 --> 00:16:02,640
especially within this group of companies, established traditional businesses that hinder

148
00:16:02,640 --> 00:16:10,640
them in such a way that they cannot even start thinking about integrating AI? This goes in different layers,

149
00:16:10,640 --> 00:16:17,680
depending on the maturity of the company. Okay. Let's set a baseline. In 2025, I wouldn't accept

150
00:16:17,680 --> 00:16:22,960
a company that doesn't have at least a minimum digital footprint. That is the level that I'm

151
00:16:22,960 --> 00:16:28,960
playing at. We cannot go to the local shop in town that sells flowers and records every sale

152
00:16:28,960 --> 00:16:34,800
in a book. That still exists somehow, but it's not the level we're talking about. We are talking about companies

153
00:16:34,800 --> 00:16:41,760
that have a basic digitalization already in place. But still, I would say that I've been dealing

154
00:16:41,760 --> 00:16:47,920
with this within my time at Aiven; and in my time before, I did 15 years of enterprise consulting

155
00:16:47,920 --> 00:16:54,320
in the data space. I believe that the consistent problem that I saw in enterprises, in big and

156
00:16:54,320 --> 00:17:00,720
small enterprises, in big and small companies, is that companies have this vast amount of

157
00:17:00,720 --> 00:17:07,840
different tools. The data is usually very segmented in different data tools because business

158
00:17:07,840 --> 00:17:13,200
unit A needs a specific tool to solve a specific problem, so that tool is onboarded, and then three

159
00:17:13,200 --> 00:17:20,000
years down the road, you have a set of critical information only living in that system. So this is

160
00:17:20,000 --> 00:17:28,240
the first challenge that I see is, if you want to empower consistent usage of data for AI, you need,

161
00:17:28,240 --> 00:17:35,760
first of all, to map all your data assets across all the technologies and come up with a way to

162
00:17:35,760 --> 00:17:41,040
bring them together. This is what we were saying before. You don't let the agents call three

163
00:17:41,040 --> 00:17:48,160
different places. You solve part of the problem by integrating the data yourself. So you take all

164
00:17:48,160 --> 00:17:55,840
these data silos and you say: my customer data, my inventory data, my sales data, my orders, my

165
00:17:55,840 --> 00:18:00,960
clickstream on the website. At a certain point, they all come together. Why is this crucial? Because

166
00:18:01,520 --> 00:18:06,480
AI or not AI, everything is based on data. You need to make decisions based on data. Your boss

167
00:18:06,480 --> 00:18:13,520
will make decisions based on data. For example, if my agent needs to suggest a discount to a

168
00:18:13,520 --> 00:18:20,000
certain person, I would like to analyze the clickstream of that person, the last few seconds,

169
00:18:20,000 --> 00:18:25,840
hours, days, the previous sales, the location, all these things usually come from different places.

170
00:18:25,840 --> 00:18:32,160
So I need to have this unified view about the customer in this case. And this is usually the

171
00:18:32,160 --> 00:18:38,160
first step. You start from a huge variety of places and you try to condense into one place.
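A toy sketch of that condensing step using pandas; the sources, keys, and columns are hypothetical stand-ins for the CRM, sales, and clickstream extracts:

import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2], "country": ["IT", "DE"]})
sales = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [50.0, 30.0, 70.0]})
clicks = pd.DataFrame({"customer_id": [1, 2], "clicks_last_hour": [12, 3]})

# One row per customer: the single view a discount agent would read,
# instead of calling three systems and joining the data itself.
unified = (
    customers
    .merge(sales.groupby("customer_id", as_index=False)["amount"].sum(),
           on="customer_id", how="left")
    .merge(clicks, on="customer_id", how="left")
)
print(unified)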

172
00:18:39,200 --> 00:18:46,240
A couple of follow-up questions on this. Let's say you've got, let's just name HubSpot for your CRM.

173
00:18:47,440 --> 00:18:53,200
Let's just say then you've got some resource planning tool to manage time off, whatever the

174
00:18:53,200 --> 00:18:59,200
case may be. They're integrated because whoever goes on time off, maybe your AE, then you need

175
00:18:59,200 --> 00:19:04,240
basically another AE taking care of their accounts; there are some connections if you want to look deeper.

176
00:19:04,240 --> 00:19:10,160
Then you've got a bunch of applications that you're using for creating documents, examination,

177
00:19:10,160 --> 00:19:15,920
whatever the case may be. So what you are saying is don't rely on the databases,

178
00:19:15,920 --> 00:19:23,120
pull the data from the APIs and create your own database. And basically, based on

179
00:19:23,120 --> 00:19:30,800
the criteria that you see fit for the future use cases where you can leverage AI. Basically, go from

180
00:19:30,800 --> 00:19:36,800
data silos, where you have databases on different applications, and bring them all together in a cohesive,

181
00:19:36,800 --> 00:19:43,520
organized database that you can control. That's what you're saying. Yeah, I believe that

182
00:19:43,520 --> 00:19:50,800
this is usually the first step because until you map all these data assets and you unify them,

183
00:19:50,800 --> 00:19:57,360
you also don't have any means to understand whether they talk the same language, or whether you have duplicates.

184
00:19:57,360 --> 00:20:03,440
And this is kind of the second step. Once you mix everything together, you understand what the data

185
00:20:03,440 --> 00:20:11,840
quality across your inventory of data is. For example, a customer is a customer,

186
00:20:11,840 --> 00:20:18,720
but customer across five different tools could be five different things. Absolutely. So this is where

187
00:20:18,720 --> 00:20:25,520
yes, you could rely on an AI agent to solve this, but maybe this is a business decision that you

188
00:20:25,520 --> 00:20:32,800
have to make. What is a customer for us? If I'm trying to give you a discount, should I think

189
00:20:32,800 --> 00:20:39,600
about only your data or anyone in your company and their behavior in my website in the last week?

190
00:20:39,600 --> 00:20:46,160
A customer is a customer, but it's not. It could be very different depending on who I want to talk

191
00:20:46,160 --> 00:20:52,480
with and which kind of behavior I would like to try. A follow-up question on basically the previous

192
00:20:52,480 --> 00:20:58,800
step, which is basically going from data silo to integrated organized data layer. I know for a fact

193
00:20:58,800 --> 00:21:05,040
that there are a lot of organizations right now using outdated applications that don't

194
00:21:05,040 --> 00:21:13,440
offer APIs. What should they do with those apps? If you were to consult with them, and I know for

195
00:21:13,440 --> 00:21:20,560
example, that for one of our clients, a core part of their operation runs on an outdated

196
00:21:20,560 --> 00:21:30,640
application that doesn't have any API. So basically we cannot access that data. So should they

197
00:21:31,600 --> 00:21:36,800
kill that application and build something in-house? Should they migrate to a new application

198
00:21:36,800 --> 00:21:40,640
that offers APIs? What's the best approach here, in your opinion?

199
00:21:44,640 --> 00:21:51,680
I believe here I'm in the camp of if you are using, if you're leveraging a tool that doesn't

200
00:21:51,680 --> 00:21:58,480
allow you to export your data, you're basically sitting on a time bomb. At a certain point this will

201
00:21:58,480 --> 00:22:04,080
explode, and part of your business will not work anymore. And this drives me to one thing that I

202
00:22:04,080 --> 00:22:10,240
always share with clients and prospects. There are two types of technological decisions that a

203
00:22:10,240 --> 00:22:15,360
company needs to make. One is about innovation, what they call innovation at the edge. Innovation

204
00:22:15,360 --> 00:22:22,400
at the edge is where you want to bet for the next months or years in your company. And it's, you know,

205
00:22:22,400 --> 00:22:30,080
in this time, it could be the latest OpenAI model compared to the latest model from

206
00:22:30,080 --> 00:22:36,080
another vendor. It's a bet where you can put a lot of risk in because you know that something

207
00:22:36,080 --> 00:22:41,600
completely new will change, will come in, you know, a few weeks, few months, and you need to

208
00:22:41,600 --> 00:22:46,640
go there and make this bet with the knowledge that you can also lose. And there is a potential

209
00:22:46,640 --> 00:22:52,640
rework that needs to be done for that decision to pay off or for that decision to evolve.

210
00:22:52,640 --> 00:22:58,960
What's the contrary? The contrary is where you make decisions, technological decisions, at the core

211
00:22:58,960 --> 00:23:04,560
of your company. Decisions at the core, like the choice of a database, the choice of an application,

212
00:23:04,560 --> 00:23:10,720
are choices, no matter what people say about migrating from database A to database B, that

213
00:23:10,720 --> 00:23:18,400
will stay long in the life of the company. The choice of a database is a choice that will last decades.

214
00:23:18,400 --> 00:23:24,800
And this is why I'm always suggesting, when it comes to decisions at the core, to

215
00:23:24,800 --> 00:23:31,520
choose something which allows you to have a lot of choice. And open source in this space gives you an

216
00:23:31,520 --> 00:23:37,760
amazing amount of choice. If you think about Postgres, not only are there now 20, 50,

217
00:23:37,760 --> 00:23:42,960
100 different vendors of Postgres, but the format of the data that is stored in Postgres is always

218
00:23:42,960 --> 00:23:48,000
there. The mechanisms for taking the data from one Postgres instance to another are always there.

219
00:23:48,000 --> 00:23:53,200
So you are playing a game that allows you to be completely and always in control of your data.

220
00:23:53,200 --> 00:23:58,000
So this is what I'm always suggesting: when you take critical decisions about

221
00:23:58,000 --> 00:24:04,400
stuff that will last in your company, don't look at only the features that are about the usage now,

222
00:24:04,400 --> 00:24:13,120
but always think about how portable and how sustainable this solution is in the

223
00:24:13,120 --> 00:24:19,760
future. Because those are two very different questions, and the optionality, once you define

224
00:24:19,760 --> 00:24:25,360
these kinds of metrics, becomes very, very different. Back to your question though: should they be

225
00:24:25,360 --> 00:24:30,320
rebuilding the application? In my opinion, if it's a critical application for the company, it should be

226
00:24:30,320 --> 00:24:35,920
built on top of what I would call open data formats, should be built on top of a Postgres

227
00:24:35,920 --> 00:24:41,920
database, which is fully open source, should be built with tools that are open source libraries

228
00:24:41,920 --> 00:24:48,000
that everybody can learn, study and inspect. I believe this is the beauty, where open source

229
00:24:48,000 --> 00:24:54,080
is not a cost reduction factor, or not only a cost reduction factor, open source is a pool of talent.

230
00:24:54,080 --> 00:25:00,160
It's portability, it's optionality for companies. Yeah, it's code being written without you paying

231
00:25:00,160 --> 00:25:06,320
anyone. It's just basically a collective endeavor that everyone knows that if you contribute,

232
00:25:06,320 --> 00:25:11,200
everyone else will benefit from that collective. Yeah, and, you know, from a company

233
00:25:11,200 --> 00:25:17,520
point of view, yes, it's like open source can give you a lot of good things. I would always think

234
00:25:17,520 --> 00:25:23,120
about sustaining open source, because yes, we have this idea that it is free, that someone else will develop it.

235
00:25:23,120 --> 00:25:29,360
But this someone else is a human with a family, with kids to feed, with a life. And

236
00:25:29,360 --> 00:25:34,400
there are ways, very effective ways, to feed some of the value back to these open

237
00:25:34,400 --> 00:25:40,560
source communities to keep them healthy. So back to basically our roadmap: let's say we

238
00:25:40,560 --> 00:25:47,840
moved away from data silos to an organized data layer. Now we have a data layer that we can work with.

239
00:25:47,840 --> 00:25:54,320
But you mentioned something really interesting. I think the way you phrased it stuck with me:

240
00:25:54,320 --> 00:26:02,000
every application in your ecosystem sees a certain actor differently than the others. Your Google

241
00:26:02,000 --> 00:26:11,600
Analytics sees the lead coming through your funnel differently from your HubSpot. So there is a need

242
00:26:11,600 --> 00:26:23,440
to create a shared understanding of data regarding every user type in our database.

243
00:26:24,480 --> 00:26:32,000
What does that mean exactly? Do you recommend, for example, you map fields from one app to the other

244
00:26:32,000 --> 00:26:38,480
app and sort of create a, basically create a detailed map of how we are going to pull information

245
00:26:38,480 --> 00:26:44,000
from each app and combine them and add them to our data layer. What are the best

246
00:26:44,000 --> 00:26:49,600
practices here? Yeah, I believe what you said is the basic best practice: understanding

247
00:26:49,600 --> 00:26:56,800
what kind of input sources you have, understanding at what level they work. Do you have, for example,

248
00:26:56,800 --> 00:27:04,160
I don't know, all your customers flagged with a customer ID? The customer ID from tool

249
00:27:04,160 --> 00:27:09,440
one might be equal to, or completely different from, the customer ID of tool two. So how do you map the

250
00:27:09,440 --> 00:27:17,520
two customers? How do you aggregate up the data? How do you join different tables coming from

251
00:27:17,520 --> 00:27:23,280
different feeds? How do you get a holistic view of your customers? All of this, again,

252
00:27:23,280 --> 00:27:31,200
is not something new. We have decades of history in data warehousing, for example, that allows us to

253
00:27:31,200 --> 00:27:37,600
set up a set of basic rules to collect the data first and then understand what is the data quality

254
00:27:37,600 --> 00:27:44,400
of these data feeds, what is the data quality of the join of those feeds, and then create a curated

255
00:27:44,400 --> 00:27:51,920
view of all our data sources in a data warehouse or in a similar pattern. And another point to
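A toy sketch of that "customer ID from tool one vs. tool two" reconciliation; the records and the matching rule are hypothetical, and real identity resolution is usually a dedicated master-data process:

crm_records = [{"crm_id": "C-17", "email": "Anna@Example.com"}]
billing_records = [{"billing_id": "B-9", "email": "anna@example.com"}]

def normalize_email(email: str) -> str:
    # The same person looks like two customers if keys differ
    # only in case or formatting, so normalize before matching.
    return email.strip().lower()

crosswalk: dict[str, dict] = {}
for rec in crm_records:
    crosswalk.setdefault(normalize_email(rec["email"]), {})["crm_id"] = rec["crm_id"]
for rec in billing_records:
    crosswalk.setdefault(normalize_email(rec["email"]), {})["billing_id"] = rec["billing_id"]

print(crosswalk)  # {'anna@example.com': {'crm_id': 'C-17', 'billing_id': 'B-9'}}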

256
00:27:52,880 --> 00:28:02,960
what you said earlier is that you were not inclined to expose AI to raw data because that gives

257
00:28:02,960 --> 00:28:13,840
a lot of room for errors and hallucinations. It's not only that. There are several components to this.

258
00:28:13,840 --> 00:28:19,680
When you start exposing AI to raw data, you could get good insights. For example, if you expose an

259
00:28:19,680 --> 00:28:26,400
AI tool to a raw list of logs, you will be able to get some interesting information that probably

260
00:28:26,400 --> 00:28:33,600
the agent will be able to parse much, much faster than a human. So it's a good

261
00:28:33,600 --> 00:28:40,400
use case. However, there are two things there. One is that raw data usually is at a way, way bigger

262
00:28:40,400 --> 00:28:48,640
scale compared to aggregated data. And in this world where you pay for AI per token, that can be really

263
00:28:48,640 --> 00:28:55,280
expensive. The second thing is that, by exposing AI to raw data, you could face the risk of, for example,

264
00:28:55,280 --> 00:29:02,240
exposing PII, because your customer record in your data source will have everything in it. And if

265
00:29:02,240 --> 00:29:07,840
you just put an AI bot on top of it, this means that they can do something good or something not

266
00:29:07,840 --> 00:29:13,760
good and you don't have a lot of definition. So this is where creating an abstraction layer

267
00:29:13,760 --> 00:29:19,120
on top of the raw data, which allows you to define which fields should be used, how they

268
00:29:19,120 --> 00:29:25,680
should be used, what is the level of abstraction that an AI agent should use in order to calculate

269
00:29:25,680 --> 00:29:31,920
a forecast or calculate a discount, is where you basically need to prepare your data to be fed

270
00:29:31,920 --> 00:29:38,880
to AI. And this is where, from a corporate point of view, you need to apply all the

271
00:29:40,320 --> 00:29:49,680
internal rules about what data you can use, how you should use it, and what a certain agent, or human, should

272
00:29:49,680 --> 00:29:56,160
see. Because it's not about PII data or not PII data. It's also about what we were saying before.

273
00:29:56,160 --> 00:30:02,480
If you are coming to the website and I show you the data of someone else, this is an extremely

274
00:30:02,480 --> 00:30:09,360
bad experience. This is a data breach. So yes, let's try to minimize the number of steps needed. At the

275
00:30:09,360 --> 00:30:16,160
same time, let's keep security as a top priority. Because if we don't nail security in this kind of

276
00:30:16,160 --> 00:30:22,960
interaction, again, with the scale at which these kinds of agents can work, we could create a massive

277
00:30:22,960 --> 00:30:29,600
problem. You said something earlier that was very beautiful. I think you have a good way of using

278
00:30:29,600 --> 00:30:36,080
examples that everyone could understand. In any ERP system, you have different user types and roles.

279
00:30:36,080 --> 00:30:41,600
For example, a payment officer for Spain should only see stuff about Spain. The same thing, the way I

280
00:30:42,400 --> 00:30:51,440
see it: we can apply the same rationale to agents. There will be a user type 'agent', and the agent could

281
00:30:51,440 --> 00:30:58,320
only see this type of information, right? And not the other types. That's basically where we are

282
00:30:58,320 --> 00:31:05,840
heading: okay, limit what an AI agent can get exposed to. And, you know, someone could say,

283
00:31:05,840 --> 00:31:12,000
well, there is the system prompt that allows you to do that. Because in any kind of modern LLM,

284
00:31:12,000 --> 00:31:16,720
you have the prompt itself and the system prompt that dictates, that defines, the behavior. So you

285
00:31:16,720 --> 00:31:22,000
could say in the system prompt: hey, look, only use fields A, B, and C. To me, that's

286
00:31:23,840 --> 00:31:30,160
something to investigate, but, to make another probably silly example, it's like me telling you,

287
00:31:31,200 --> 00:31:37,600
hey, I did something really bad, please don't tell anybody. And now I have to rely on your

288
00:31:38,480 --> 00:31:44,160
trust and willingness to not share that information with anybody else. If someone comes to you and

289
00:31:44,160 --> 00:31:49,440
says, no, no, I need to know everything about Francesco, you will tell them. Compare this to

290
00:31:50,080 --> 00:31:54,320
you not being able to know something, some information, there is no way

291
00:31:56,160 --> 00:32:01,840
for you to get the information in the first place. This is where we should start with security,

292
00:32:01,840 --> 00:32:07,840
not add security as a second step. I would be very hesitant to leave

293
00:32:07,840 --> 00:32:13,920
instructing the AI agent at the system prompt layer. I think that's too risky for businesses. You need

294
00:32:13,920 --> 00:32:23,520
to basically say, here's a data bucket, let's call it like this, that AI would not get exposed to.

295
00:32:23,520 --> 00:32:29,840
Here's the interpretation of that bucket they could get exposed to, but not the raw data of it.
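A minimal sketch of that idea: security enforced at the data layer instead of requested in the system prompt; the field names and record shape are hypothetical:

ALLOWED_FIELDS = {"order_status", "delivery_eta"}

def view_for_agent(record: dict) -> dict:
    # Whatever the prompt says, data the agent never received cannot leak.
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}

raw = {
    "order_status": "shipped",
    "delivery_eta": "2025-06-02",
    "home_address": "Via Roma 1",  # PII: never leaves this layer
    "internal_margin": 0.42,       # business logic: never leaves either
}
print(view_for_agent(raw))  # {'order_status': 'shipped', 'delivery_eta': '2025-06-02'}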

296
00:32:29,840 --> 00:32:38,480
Yep. And also, the more you expose raw data and then an outcome, the more you allow an external

297
00:32:38,480 --> 00:32:45,440
actor to basically be able to draw the line between the raw data and the outcome. So you

298
00:32:45,440 --> 00:32:50,960
are potentially also exposing business decisions, because, say, you may have internal rules that

299
00:32:51,920 --> 00:32:58,000
define what is the discount percentage on a certain item. If you expose the discount percentage and

300
00:32:58,000 --> 00:33:03,200
the raw data, someone with enough time and enough willingness could be able to draw the line, OK,

301
00:33:03,200 --> 00:33:09,840
you use points A, B and C to tell me that you give me a 30% discount. Whereas if I fake an account and

302
00:33:09,840 --> 00:33:15,760
I change my hair color from brown to blonde, I will get a 55% discount. Is that something that

303
00:33:15,760 --> 00:33:22,080
you want? It's a risk. I mean, curating the data is not the most exciting or newest type of work,

304
00:33:22,080 --> 00:33:27,920
but it's critical for AI. Most of the enterprises, I would say, especially the digital innovation

305
00:33:27,920 --> 00:33:34,400
officers I work with, have really good hearts. They don't see the bad actors so much in their

306
00:33:34,400 --> 00:33:40,880
decisions. What I always like to say to them is: look, yes, a lot of people are acting

307
00:33:40,880 --> 00:33:47,520
in good faith, but there are a few bad actors with so much evil energy and so much capability.

308
00:33:47,520 --> 00:33:52,880
If you hit one of those, they are going to create a nightmare that you cannot get out of.

309
00:33:53,600 --> 00:34:01,360
I don't want to scare people off, but I've seen a website called lovable.com, if I remember correctly.

310
00:34:01,360 --> 00:34:08,960
You can go there and describe the app that you want to create and it will create the app in

311
00:34:08,960 --> 00:34:16,560
seconds. I believe they were able to create 10,000 apps a day. What you're saying about

312
00:34:16,560 --> 00:34:23,680
one actor dedicating time, that might not scare people off, but when we talk about AI, AI could

313
00:34:23,680 --> 00:34:28,560
be potentially used for good, but also for bad. We have to also go away from the fact that we are

314
00:34:28,560 --> 00:34:34,400
thinking about the bad actor being a human. A bad actor could be an AI bot as well. Now, it's even

315
00:34:34,400 --> 00:34:41,200
more critical for us to take safe steps in exposing anything to the public, because what before could

316
00:34:41,200 --> 00:34:48,560
be done by a single person with enough time, now we have a huge amount of bots that could do that

317
00:34:48,560 --> 00:34:54,720
with an incredible volume, in very little time. That's a scary scenario.

318
00:34:54,720 --> 00:35:01,920
I see where you're coming from. I shouldn't scare people because data is not my specialty,

319
00:35:01,920 --> 00:35:08,000
but if it's coming from you, I think people will pay great attention to

320
00:35:08,000 --> 00:35:11,920
it because you're dealing with data. I think that's something that... I think security is

321
00:35:11,920 --> 00:35:17,440
something that has been on their plate so much, I would say, right now, because there is a lot of

322
00:35:17,440 --> 00:35:23,680
cowboyism going on. A lot of enterprises love the term innovation at the edge. They

323
00:35:23,680 --> 00:35:28,800
are paying so much attention to innovation at the edge and really not paying enough attention to

324
00:35:28,800 --> 00:35:34,880
their core infrastructure. And that, I think... It's going to be a high-risk environment,

325
00:35:34,880 --> 00:35:43,200
I would say. I mean, one thing also... I don't want to leave on this always scary image.

326
00:35:43,200 --> 00:35:48,000
I would say that... Let's go back to the beginning. None of this is new. We have a solid

327
00:35:48,000 --> 00:35:55,920
understanding on how to build secure, scalable, fast data platforms that allow us to expose safely

328
00:35:55,920 --> 00:36:01,680
everything we built. It's just a matter of thinking correctly at every step. Yes, follow the innovation.

329
00:36:01,680 --> 00:36:07,360
Yes, have a budget to follow the innovation. On the other side, prepare to expose anything you have

330
00:36:07,360 --> 00:36:14,160
to the innovation in a safe way that is not only safe to implement and use, but is also very easy

331
00:36:14,160 --> 00:36:20,240
to monitor and understand what's happening. If you build something like this, it will serve you for

332
00:36:20,240 --> 00:36:25,840
any use case, AI or not. So, I would like to go back to the real time

333
00:36:25,840 --> 00:36:32,800
you touched upon. And I think it's such a core component of any AI-powered application

334
00:36:32,800 --> 00:36:39,840
that you want to get behind. I think, out of all the discussion we had,

335
00:36:39,840 --> 00:36:47,680
this is maybe the simplest to talk about, but very important. How do you use these LLMs: locally, or

336
00:36:47,680 --> 00:36:53,440
like you tap into OpenAI APIs, or whatever the case, or whichever API you want to use?

337
00:36:53,440 --> 00:36:58,560
I could see a scenario that especially for the innovation at the edge, you want to tap into this

338
00:36:58,560 --> 00:37:06,560
API, but what kind of information you're exposing needs to be limited and obscured enough.

339
00:37:06,560 --> 00:37:12,640
What's your take on this when it comes to basically leveraging LLMs in your operation?

340
00:37:12,640 --> 00:37:18,880
So, we did an interview with, I believe, 100 C-level executives at various companies in EMEA

341
00:37:18,880 --> 00:37:24,960
and the US a few months ago at Aiven. And what we found out was that most of the people who

342
00:37:24,960 --> 00:37:29,920
replied were saying that they were using off-the-shelf models from OpenAI or other vendors.

343
00:37:29,920 --> 00:37:35,360
And it makes sense because at the time, OpenAI or other vendors were the most advanced. So,

344
00:37:35,360 --> 00:37:40,720
if you are working on the edge, on the innovation, you want to first test that there is something

345
00:37:40,720 --> 00:37:48,000
valuable there. And the burden of recreating the whole stack in a private way, maybe it's too much

346
00:37:48,000 --> 00:37:51,520
if you are just seeking the innovation and the validation at the beginning.

347
00:37:51,520 --> 00:37:57,520
On the other side, what we also saw in the same interview is that data privacy is a major problem,

348
00:37:57,520 --> 00:38:02,400
data security, data freshness, are major problems. And all this says: okay,

349
00:38:02,400 --> 00:38:08,320
you start with something that is out there in order to prototype, but then there is the willingness to

350
00:38:08,880 --> 00:38:14,960
create a more secure LLM usage within some safe boundaries. Within a boundary where you exactly

351
00:38:14,960 --> 00:38:20,960
know that this data that you're sending out is secure and the data that you're sending out

352
00:38:20,960 --> 00:38:26,480
possibly is not even used to retrain the model. Because when you send that out, you may know the

353
00:38:26,480 --> 00:38:30,400
boundaries and that might be contractual boundaries, but the world is your oyster.

354
00:38:30,400 --> 00:38:35,440
There will be so many vendors. So, what I will see in the future is probably an initial approach

355
00:38:35,440 --> 00:38:41,120
where you will use the off-the-shelf models because that's the first way to get the result.

356
00:38:41,120 --> 00:38:48,880
And then I will probably see an increase in companies that take an off-the-shelf

357
00:38:48,880 --> 00:38:57,520
open source model. They fine-tune it locally. And then they apply RAG or another system to

358
00:38:57,520 --> 00:39:03,120
make it personalized, and run it in a safe place where they know where the data is coming from and

359
00:39:03,120 --> 00:39:08,480
where it's going. Basically, a potential approach would be: okay, use the off-the-shelf models

360
00:39:08,480 --> 00:39:13,360
with limited data exposure for validation. See which one works and which one doesn't work.

361
00:39:13,360 --> 00:39:18,720
Then when you see there are real use cases that actually could help your organization move faster,

362
00:39:18,720 --> 00:39:26,560
then attempt to basically internalize it using your own setup when it comes to that LLM of your

363
00:39:26,560 --> 00:39:32,800
choice and exposing them potentially to more data. Yeah. One interesting thing that you made me think

364
00:39:32,800 --> 00:39:38,080
about is: if you use off-the-shelf models and you are sending out the data,

365
00:39:38,080 --> 00:39:43,520
think about sending the same data out to a customer because that is a good mindset.

366
00:39:43,520 --> 00:39:48,160
If you are sending that data out to a model and that model will talk to a customer, you're just

367
00:39:48,800 --> 00:39:53,760
a minute away from sending that to the customer. So when you think about sending that data out,

368
00:39:53,760 --> 00:40:00,480
is it secure? Is it okay? Should you send this piece of data out? If not, you have some work to

369
00:40:00,480 --> 00:40:05,120
do in order to understand what data you should expose to that model, to that interaction.
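A sketch of that "would you send this to a customer?" test applied mechanically before an external model call; the regex is a crude illustration, and call_external_model is a hypothetical placeholder for whatever off-the-shelf API you use:

import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(text: str) -> str:
    # Strip obvious identifiers before the text leaves your boundary;
    # a real deployment would use a proper PII-detection step.
    return EMAIL_RE.sub("[email]", text)

prompt = "Summarize the complaint from anna@example.com about order 512."
safe_prompt = redact(prompt)
print(safe_prompt)  # Summarize the complaint from [email] about order 512.
# call_external_model(safe_prompt)  # hypothetical off-the-shelf API call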

370
00:40:05,120 --> 00:40:12,560
Real-time-ism, I think, is becoming more of a religion than anything else. So why is it important, right?

371
00:40:12,560 --> 00:40:19,440
We talked about it a bit, right? Maybe if you can touch upon it again, that would be great. And like

372
00:40:19,440 --> 00:40:26,800
based on your read of the market and the companies that you do work with, how far are these companies

373
00:40:26,800 --> 00:40:33,840
from having the right infrastructure in place to provide real-time services and experiences? And

374
00:40:33,840 --> 00:40:40,800
then what could they do to basically move faster towards real-time offerings and products and

375
00:40:40,800 --> 00:40:47,920
services? Okay. So let's start from the first question, why is real-time a need? And we need

376
00:40:47,920 --> 00:40:54,480
to go back to what we were saying before. AI and computers work at a pace that is way, way different

377
00:40:54,480 --> 00:41:02,320
than the human pace. Any decision that is made with wrong data has X amount of times the impact

378
00:41:02,320 --> 00:41:12,640
of a decision made by a human with the wrong data. And when I started in my job, I was working on

379
00:41:14,160 --> 00:41:20,160
building BI dashboards, and people in the business were okay with reporting today

380
00:41:20,160 --> 00:41:24,880
on the data of yesterday. And you had the entire night to load the data from the sources

381
00:41:24,880 --> 00:41:32,080
and feed the data warehouse. Nowadays, you need to make decisions about what's happening now,

382
00:41:32,080 --> 00:41:37,920
what was happening in the last few milliseconds. This is where the real time component is critical.

383
00:41:37,920 --> 00:41:44,320
The analogy that we were discussing before is, would you trust an autonomous driver to drive with

384
00:41:44,320 --> 00:41:50,240
the information, the image, that was taken five seconds ago? Never. We are talking about

385
00:41:50,240 --> 00:41:54,800
milliseconds, even less. You need the most up to date information as possible.

386
00:41:55,760 --> 00:42:04,720
Or take maybe a B2C example: can you afford not putting out product, let's say a

387
00:42:04,720 --> 00:42:12,160
new outfit for a new hype trend that just popped up on Instagram and other competitors are leveraging

388
00:42:12,160 --> 00:42:19,120
it? Basically shrinking your revenue because your outdated products are not fitting what

389
00:42:19,120 --> 00:42:24,000
consumers want right now. Can you afford that? Because that's a serious loss of revenue.

390
00:42:24,000 --> 00:42:32,160
Yeah. I mean, I believe the betting market is more used to real time.

391
00:42:32,160 --> 00:42:39,280
But for example, if you have a certain win rate for AC Milan winning against

392
00:42:39,280 --> 00:42:45,360
Juventus, and then AC Milan scores, you need to have that rate immediately changed. You cannot

393
00:42:45,360 --> 00:42:49,920
have delays, because otherwise it's a huge cost for your company. I know not all businesses are in

394
00:42:49,920 --> 00:42:56,400
this kind of fast moving pace, but the reality is that all it takes is one mention by one random

395
00:42:56,400 --> 00:43:02,080
person that is an Instagram star. And now your business booms, and you may need to be ready for

396
00:43:02,080 --> 00:43:08,240
that. They are wearing a new pair of shoes that you are selling on your website. Now you need to

397
00:43:08,240 --> 00:43:13,040
be ready for that. You need to be ready to have all the data about the inventory immediately

398
00:43:13,040 --> 00:43:20,800
available. Otherwise your brand is affected. Real time is a criticality. Now, how do you arrive

399
00:43:20,800 --> 00:43:28,320
at real time? There are various ways that you can navigate from historical systems that were

400
00:43:28,320 --> 00:43:33,760
batch based. So every night they were loading the data from the internal system into the data

401
00:43:33,760 --> 00:43:40,320
warehouse. The first step usually is to reduce the batch time. So instead of running every night,

402
00:43:40,320 --> 00:43:46,480
you run every hour, every five minutes. That is usually the first step. However, once you start

403
00:43:46,480 --> 00:43:52,000
going into that direction, what happens usually is that in order to minimize the latency of the

404
00:43:52,000 --> 00:43:58,880
data, you are adding a lot of extra stress to the source system because your operational source

405
00:43:58,880 --> 00:44:06,960
system that has to deal with your website now has also to publish the data or to be queried by the

406
00:44:06,960 --> 00:44:14,000
extraction routine. So this is usually an intermediate step in order to move to other tools

407
00:44:14,000 --> 00:44:19,760
like Apache Kafka, Apache Kafka Connect, and implement what is called a Change Data Capture

408
00:44:19,760 --> 00:44:26,640
Solution. With change data capture, what is basically happening is that you have your internal app,

409
00:44:26,640 --> 00:44:33,200
your website app, that is backed by a database, and all these technologies will listen to any change

410
00:44:33,200 --> 00:44:38,800
happening in the database and transmit that downstream. If you implement such a solution, you are,

411
00:44:38,800 --> 00:44:43,440
first of all, only milliseconds away from the original data being written in the database.

412
00:44:43,440 --> 00:44:49,360
Second of all, those technologies have been built over the years to minimize the load on the source

413
00:44:49,360 --> 00:44:56,080
technology. So you are achieving the best of both worlds. You maintain your website operational at

414
00:44:56,080 --> 00:45:01,920
the maximum power, while at the same time, you are moving the data in real time to the other places

415
00:45:01,920 --> 00:45:06,000
in the company where you need them to be. That's exciting. That's exciting. Basically,

416
00:45:07,840 --> 00:45:14,800
minimize the latency between when the data is created and then your data warehouse is updated

417
00:45:15,760 --> 00:45:23,440
and exposing that through APIs to all the microservices that are fetching from the API.
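A minimal sketch of the consuming end of such a change-data-capture pipeline, assuming a connector (for example, Kafka Connect with a CDC source) already writes every database change to a topic; the topic name and event shape are hypothetical, and kafka-python is just one possible client:

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "shop.public.orders",                    # hypothetical CDC topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)

for event in consumer:
    change = event.value
    # A CDC event typically carries the operation and the new row state,
    # so downstream stores can be updated without re-querying the source.
    print(change.get("op"), change.get("after"))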

418
00:45:23,440 --> 00:45:31,440
Yeah, I always say that data in a company is not a static asset, it's a journey. You ingest it,

419
00:45:31,440 --> 00:45:37,840
you transform it, and you serve it in one or multiple ways. Nowadays, we just need to minimize

420
00:45:37,840 --> 00:45:44,240
the journey because we cannot wait five minutes before knowing if a customer clicked on an item

421
00:45:44,240 --> 00:45:50,800
or not. Francesco, I cannot thank you enough for being on the podcast. You really broke down

422
00:45:50,800 --> 00:45:55,200
stuff that everyone talks about and everyone probably doesn't have a clear idea of what

423
00:45:55,200 --> 00:46:00,160
they're talking about when it comes to data. For me, this was one of those episodes I have to go

424
00:46:00,160 --> 00:46:04,720
back and listen to it because it just helps me understand the right terminology and the right

425
00:46:04,720 --> 00:46:11,840
strategy. As a last remark, what's the single technology, approach, or solution that you're so

426
00:46:11,840 --> 00:46:19,520
excited about in 2025 and you think that a lot of companies are going to adopt it and use it?

427
00:46:19,520 --> 00:46:26,720
What would be the outcome unlocked as a result of that? I'm a data person. I believe that a

428
00:46:26,720 --> 00:46:34,240
technology that allows you to glue the data together in real time, is solution agnostic,

429
00:46:34,240 --> 00:46:38,720
can be reused in multiple different ways. Going back to what we were saying before,

430
00:46:38,720 --> 00:46:44,720
is a decision at the core which can last years and years and years, is Apache Kafka. What we were

431
00:46:44,720 --> 00:46:50,160
saying about taking the data from the silos, reorganizing the data, providing the data downstream

432
00:46:50,160 --> 00:46:55,520
in real time in an organized, secure way, is the bread and butter of this technology.

433
00:46:55,520 --> 00:47:02,400
More and more companies are leveraging Kafka as the backbone of the data journey. I'm really excited

434
00:47:02,400 --> 00:47:11,040
about the present, the future of Kafka because it just enables companies to do the next step in their

435
00:47:11,040 --> 00:47:16,320
data journey. Where before they were just starting to collect data and expose data manually,

436
00:47:16,320 --> 00:47:23,840
with Apache Kafka, they could jump 20 years into the new world of providing data in real time to

437
00:47:23,840 --> 00:47:30,080
the consumer when the consumer needs it. And the consumer can be humans or AI agents. It's the same.

438
00:47:30,080 --> 00:47:39,280
Now we have to basically cater to two main users, human users and agents, which is, I saw a demo of

439
00:47:39,280 --> 00:47:46,960
two agents talking in a cryptic language that was scary, yet very exciting. I think that's going to

440
00:47:46,960 --> 00:47:56,000
be, again, what's the fastest way we provide the outcome for the agents representing us. So they

441
00:47:56,000 --> 00:48:02,880
are actually thinking about how they can be faster in providing services to us. I think everyone

442
00:48:02,880 --> 00:48:09,840
operates basically to follow their own self-interest. I think if you look at

443
00:48:09,840 --> 00:48:16,880
us humans, ourselves and our self-interest, we might well be better off having agents representing us, doing

444
00:48:16,880 --> 00:48:23,680
things that would take us days, in seconds or minutes maximum. I mean, when is the last time

445
00:48:23,680 --> 00:48:29,520
that you had to deal with spam yourself? Oh my God. It's been 10 years. It's a 15-year-old problem. I don't

446
00:48:29,520 --> 00:48:36,080
recall. Nothing is new. We just need to delegate more and different tasks, so we can focus on things

447
00:48:36,080 --> 00:48:43,520
that we really care about. How can folks find you? I'm sure after this episode you'll

448
00:48:43,520 --> 00:48:49,520
probably get a bunch of really good exposure. Where can people follow you and use your content?

449
00:48:49,520 --> 00:48:55,600
There are two main places nowadays. The first one is LinkedIn. I have quite a unique name between

450
00:48:55,600 --> 00:49:00,960
my name and my surname. I don't think there are a lot of other people around as of now. So if you

451
00:49:00,960 --> 00:49:06,640
find Francesco Tisiot on LinkedIn, that's the main one. The other piece is, if you want to understand more

452
00:49:06,640 --> 00:49:13,360
about all this conjunction of real time, open source data pipelines, data journeys, my company

453
00:49:13,360 --> 00:49:20,160
offers a platform that allows you to do that. So go there. There is again plenty of content that I've

454
00:49:20,160 --> 00:49:25,200
written over the previous years, and use one of those methods to contact me if you want to discuss

455
00:49:25,200 --> 00:49:30,000
further. I appreciate it. Thank you. Thank you very much for having me. It was a real pleasure to be

456
00:49:30,000 --> 00:49:31,200
a guest on the show.