
UX for AI
Hosted by Behrad Mirafshar, CEO of Bonanza Studios, Germany’s Premier
Product Innovation Studio, UX for AI is the podcast that explores the intersection of cutting-edge artificial intelligence and pioneering user experiences. Each episode features candid conversations with the trailblazers shaping AI’s application layer—professionals building novel interfaces, interactions, and breakthroughs that are transforming our digital world.
We’re here for CEOs and executives seeking to reimagine business models and create breakthrough experiences, product leaders wanting to stay ahead of AI-driven product innovation, and UX designers at the forefront of shaping impactful, human-centered AI solutions. Dive into real-world case studies, uncover design best practices, and learn how to marry innovative engineering with inspired design to make AI truly accessible—and transformative—for everyone. Tune in and join us on the journey to the future of AI-driven experiences!
UX for AI
EP. 95 - Cowboys, Data, and the AI Gold Rush w/ Francesco Tisiot
Enterprises are racing to integrate AI — but most aren't ready for the risks. Francesco Tisiot, Field CTO at Aiven, breaks down why real-time data, strong infrastructure, and clear governance are no longer optional. If your AI agent makes decisions in milliseconds, your data better be bulletproof.
You can find Francesco here: https://www.linkedin.com/in/francescotisiot
Interested in joining the podcast? DM Behrad on LinkedIn:
https://www.linkedin.com/in/behradmirafshar/
This podcast is made by Bonanza Studios, Germany’s Premier Digital Design Studio:
https://www.bonanza-studios.com/
1
00:00:00,000 --> 00:00:06,800
So what's going on? Tell me. What's going on? I believe we are at a turning point in the industry
2
00:00:06,800 --> 00:00:13,440
where everywhere you look around, you see AI being applied to every kind of problem. And
3
00:00:13,440 --> 00:00:20,160
possibly companies are now feeling the pressure of showcasing AI or embedding AI or integrating AI
4
00:00:20,160 --> 00:00:27,680
in everything they do. Yeah. And it's a valid aim because I believe AI can solve, can simplify,
5
00:00:27,680 --> 00:00:34,560
can accelerate a lot of what we do. But I believe everything comes with various paths that you can
6
00:00:34,560 --> 00:00:40,080
use in order to get yourself into AI. If we take the two extremes, on one side you have what I
7
00:00:40,080 --> 00:00:45,600
would call the cowboy style, where you just, you know, throw AI at every single problem,
8
00:00:45,600 --> 00:00:50,560
throw AI at every single data point that you have in your company, throw AI at every single tool.
9
00:00:51,280 --> 00:00:59,360
And that might fix point problems. For example, I don't know, you have a set of
10
00:00:59,360 --> 00:01:05,360
machines that are logging everything they do and you want a summary of those logs and you
11
00:01:05,360 --> 00:01:10,400
have a perfect AI tool that will solve that problem. But on the other side, having a holistic approach
12
00:01:11,760 --> 00:01:19,520
to AI allows you to avoid risks, for example, about exposing data to customers that shouldn't
13
00:01:19,520 --> 00:01:25,920
see that data, or exposing the wrong data to those same customers. I believe the interesting point
14
00:01:25,920 --> 00:01:33,680
is how do we mix all the innovation that is coming in the AI space with all the problems, solutions,
15
00:01:33,680 --> 00:01:40,480
little details, governance, rules and regulations that we have in the corporate world around our
16
00:01:40,480 --> 00:01:46,640
data and our customer data as well. So here is where we probably need to take a step back.
17
00:01:47,840 --> 00:01:52,880
And yes, be willing to go into AI, but take a few steps before doing that.
18
00:01:52,880 --> 00:02:01,760
So the cowboy data, sorry, the cowboy approach is the fast approach to basically bringing some
19
00:02:01,760 --> 00:02:08,080
AI into the organization. There are probably going to be some low-hanging-fruit wins for the
20
00:02:08,080 --> 00:02:13,840
organization. Some of them are going to save them hours of work per employee, potentially per day.
21
00:02:13,840 --> 00:02:21,760
But what you told me just basically increased my heart rate by 2x: what if
22
00:02:22,400 --> 00:02:27,360
you don't have proper governance, you don't have proper data and infrastructure?
23
00:02:28,080 --> 00:02:35,840
I don't want to throw a lot of keywords around, because I'm dealing with a CTO who does this for a living.
24
00:02:35,840 --> 00:02:41,120
So I just want to, you know, be careful about what I'm saying because I don't want
25
00:02:41,120 --> 00:02:48,320
to sound stupid in front of you, to be honest. But the worst case, the nightmare that you outlined, is
26
00:02:48,320 --> 00:02:56,000
something very serious. What if it exposes the wrong customer data to the wrong customer?
27
00:02:57,920 --> 00:03:03,120
What if, when a customer asks about certain things, it pulls some information from other
28
00:03:03,120 --> 00:03:10,000
customers and exposes it, because you basically tried to bulldoze your way into the AI space
29
00:03:10,000 --> 00:03:13,840
without having enough governance and security measures in place.
30
00:03:13,840 --> 00:03:21,040
Yeah, that is, I believe, where the fun part starts from my point of view. With AI,
31
00:03:21,040 --> 00:03:28,000
with all the tooling that we see now, it's extremely easy to go from zero to one to have
32
00:03:28,000 --> 00:03:33,360
the first mock-up of a solution. But then to take that solution and go into production,
33
00:03:33,360 --> 00:03:38,560
that's a different story. This is where all the security settings, all the governance layers,
34
00:03:38,560 --> 00:03:44,800
all the approvals need to happen within a company to move
35
00:03:44,800 --> 00:03:50,240
something from a prototype to something that you can use internally or externally. It really doesn't
36
00:03:50,240 --> 00:03:57,440
matter. But I believe that is the piece where things get serious. And on one side, I would say
37
00:03:57,440 --> 00:04:03,680
it's all about innovation and AI. On the other side, I believe this kind of process of thinking
38
00:04:03,680 --> 00:04:09,280
about exposing your data, through AI or not through AI, is nothing new. I believe if you think with
39
00:04:09,280 --> 00:04:15,360
this kind of mindset of you need to provide the right data to the right person at the right time,
40
00:04:15,360 --> 00:04:21,600
now it's AI; before, it was a human looking at dashboards. We always had these kinds of rules to
41
00:04:21,600 --> 00:04:27,760
say this person is, for example, the manager of the Italian department. They should only look at
42
00:04:27,760 --> 00:04:33,360
Italian data. They shouldn't be able to look at German data. And now the same thing applies with
43
00:04:33,360 --> 00:04:40,800
AI, where a bot that is talking with customer X should only be able to get the data about customer
44
00:04:40,800 --> 00:04:46,080
X and nothing else. We are not changing the rules of the game, but possibly what we are changing here
45
00:04:46,080 --> 00:04:53,280
with AI and with all these agents in the AI space is the velocity at which decisions are made.
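[Editor's note: a minimal sketch, not from the episode, of the access rule Francesco describes: the agent serving customer X can only ever be handed customer X's rows, because the filter lives in the data layer, not in the prompt. Table and column names are hypothetical.]

```python
# Illustrative sketch: scope the agent's data access before anything reaches the model.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("cust-1", "shoes", 100.0), ("cust-2", "boots", 250.0)],
)

def fetch_orders_for_agent(bound_customer_id: str):
    """Return only the rows the agent serving this customer is allowed to see.

    The filter is a parameterized query applied in the access layer, so even a
    misbehaving prompt cannot widen the scope to other customers' data.
    """
    cur = conn.execute(
        "SELECT item, amount FROM orders WHERE customer_id = ?",
        (bound_customer_id,),
    )
    return cur.fetchall()

print(fetch_orders_for_agent("cust-1"))  # [('shoes', 100.0)] and nothing else
```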
46
00:04:53,280 --> 00:04:59,600
If, for example, we were dealing with a human, let's say, let's showcase a very easy example.
47
00:04:59,600 --> 00:05:06,560
A human goes to a website, needs to buy shoes, and they will take maybe two minutes to decide which
48
00:05:06,560 --> 00:05:11,360
shoes to buy. Now compare this with an agent doing the same thing. They could arrive at the same
49
00:05:11,360 --> 00:05:19,120
decision in seconds. So if we provide the wrong data, and let's take one step back, if we want to
50
00:05:19,120 --> 00:05:24,560
empower the human to buy the shoes, we may want to give them the maximum amount of money that they
51
00:05:24,560 --> 00:05:30,560
can spend, so the budget. We give them $100 to spend on shoes, and they will take two minutes
52
00:05:30,560 --> 00:05:37,920
in order to define which shoe is the best and buy it. And if you think that in 10 minutes they will
53
00:05:37,920 --> 00:05:45,920
be able to buy five shoes and spend $500. Now let's say that we made a mistake and instead of
54
00:05:45,920 --> 00:05:52,480
telling them that the budget was $100, we tell the human that the budget is $1,000 a shoe. Now
55
00:05:52,480 --> 00:05:58,640
the human will still buy the shoes, but they will make a mistake which is capped at $5,000,
56
00:05:58,640 --> 00:06:04,320
because they will buy at most five shoes in the 10 minutes. The agents, on the other side,
57
00:06:04,320 --> 00:06:10,960
if we make the same small mistake, since they can make decisions in seconds, the blast radius of such
58
00:06:10,960 --> 00:06:17,120
a mistake could be huge. We are not talking about $5,000, we may be talking about $100,000
59
00:06:17,920 --> 00:06:25,360
of loss because of a small detail in the data. So this is where providing up-to-date accurate data,
60
00:06:25,360 --> 00:06:32,000
doing all the steps that we did in the past to put good data in front of humans,
61
00:06:32,000 --> 00:06:37,920
we need to do the same before we expose data to AI. It's kind of the same paradigm, accelerated.
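[Editor's note: a back-of-the-envelope version of the shoe example. The decision rates below are illustrative assumptions, not figures from the episode; the point is that the same budget mistake scales with how many decisions each actor can make in the same window.]

```python
# Illustrative blast-radius comparison for the same $100 -> $1,000 budget mistake.
window_minutes = 10
overspend_per_purchase = 1_000 - 100           # $900 too much per purchase

human_purchases = window_minutes // 2          # one decision every ~2 minutes
agent_purchases = window_minutes * 60 // 5     # one decision every ~5 seconds (assumption)

print("human blast radius:", human_purchases * overspend_per_purchase)  # $4,500
print("agent blast radius:", agent_purchases * overspend_per_purchase)  # $108,000
```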
62
00:06:37,920 --> 00:06:43,600
Got it. So what you are saying, I think we were aligned on this before when we wanted to initially
63
00:06:43,600 --> 00:06:52,480
record this podcast: the game hasn't changed. Companies need to become
64
00:06:52,480 --> 00:06:59,280
data-led and digitally transformed. That's the game that enterprises especially need to play,
65
00:06:59,280 --> 00:07:06,960
but now there is AI in the mix that accelerates the outputs of such digital infrastructure by
66
00:07:08,160 --> 00:07:17,920
a lot, like exponentially a lot. And hence now you could unlock very independent agents that can
67
00:07:17,920 --> 00:07:26,560
make decisions at such a scale that will correlate to potentially making mistakes at scale as well.
68
00:07:26,560 --> 00:07:31,920
Yeah, to give you another example, let's say that tomorrow your company decides to automate
69
00:07:31,920 --> 00:07:36,640
all the LinkedIn reach-outs to potential clients and leads. Would you allow
70
00:07:37,840 --> 00:07:43,760
an agent running those to write an email with a very evident spelling mistake? No,
71
00:07:43,760 --> 00:07:50,400
because that is really bad. So you need to give the same level of trust, not only to the words,
72
00:07:50,400 --> 00:07:56,000
but to the data, because the data is, again, a component of the message, a component of trust
73
00:07:56,000 --> 00:08:01,920
between you and the other counterpart. The medium will change: before, it was a human. Now it could be
74
00:08:01,920 --> 00:08:08,800
a human assisted by AI or an AI bot, but the overall end-to-end flow is the same. Message and
75
00:08:08,800 --> 00:08:14,320
data go together in order to arrive at the result. So Francesco, you're the Field CTO of
76
00:08:14,320 --> 00:08:22,800
Aiven. You're not new to the thing, to basically, you've been doing this for years now. So I would
77
00:08:22,800 --> 00:08:30,160
like to understand the sentiment shift of your clients, pre-AI and post-AI. What are you seeing
78
00:08:30,160 --> 00:08:34,800
when you're talking? You probably talk to a lot of clients or leads or prospects, whatever the case.
79
00:08:34,800 --> 00:08:43,200
What have you seen drastically change, let's say, from 2022 onwards? I would
80
00:08:43,200 --> 00:08:49,840
say that I think the question that I keep asking, or receiving from clients, is: who are you
81
00:08:49,840 --> 00:08:57,280
building for? And that is a drastic change in the mindset of building any kind of product. Because
82
00:08:57,280 --> 00:09:03,360
if your aim is to build for humans and there is a huge market to build for humans, a lot of what we
83
00:09:03,360 --> 00:09:11,840
did traditionally still works. However, we see a new wave of product concepts that are built for
84
00:09:11,840 --> 00:09:19,360
humans and AI. There are not a lot of changes in the rules of the game, but there are some.
85
00:09:19,360 --> 00:09:26,080
The first one is that APIs become the most relevant thing that you need to think of.
86
00:09:26,800 --> 00:09:32,240
Yes, you need to have a nice website, an eye-catching website, and the AI can parse a website.
87
00:09:32,240 --> 00:09:38,320
But where the core of the interaction and possibly the acceleration is, is in the APIs.
88
00:09:40,160 --> 00:09:46,000
AI, an agent, can arrive at your website and quickly find out how to spin up your resources,
89
00:09:46,000 --> 00:09:53,200
how to use your product by itself, by using APIs. Well, that's a win, because now you don't need
90
00:09:53,200 --> 00:09:59,440
someone to come, understand your code, and start implementing. Everything can be automated. All
91
00:09:59,440 --> 00:10:05,840
the onboarding is way, way faster. All the usage is way, way faster. So you are scaling much, much
92
00:10:05,840 --> 00:10:12,880
more. So all this concept of who are you building for? What is your target? How can you build
93
00:10:12,880 --> 00:10:18,720
products that are both usable by humans and AI becomes really, really relevant because it allows
94
00:10:18,720 --> 00:10:24,720
your business, your company to scale or to propose your asset in very, very different ways.
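[Editor's note: a minimal sketch of the "build for humans and AI" idea. The same capability a human reaches through the website is also exposed as a plain, documented API that an agent can call. Endpoint and field names are hypothetical and not a description of any specific vendor's API; the sketch assumes the fastapi and pydantic packages.]

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Example provisioning API")

class ServiceRequest(BaseModel):
    service_type: str   # e.g. "postgres", "kafka"
    plan: str           # e.g. "startup", "business"
    region: str

@app.post("/v1/services")
def create_service(req: ServiceRequest):
    # In a real system this would provision infrastructure; here we just return
    # a machine-readable response an agent can act on in milliseconds.
    return {"status": "creating", "service_type": req.service_type,
            "plan": req.plan, "region": req.region}

# FastAPI also publishes an OpenAPI schema at /openapi.json, which is exactly the
# kind of self-describing surface an agent can discover and use on its own.
```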
95
00:10:24,720 --> 00:10:30,480
That's fascinating. So basically the way I understood and perceived what you said is that
96
00:10:30,480 --> 00:10:36,160
it reminded me of the time of, you remember, the mobile websites: hey, you need
97
00:10:36,160 --> 00:10:42,400
to design your websites to be mobile friendly because a lot of users come and use your content
98
00:10:42,400 --> 00:10:49,360
on a mobile. But what you're saying now is that not only that, now you need to build your infrastructure
99
00:10:49,360 --> 00:10:55,520
that is AI agent friendly. So other agents could come in and plug into your services,
100
00:10:55,520 --> 00:11:02,000
could make sense of them and potentially offer what you're offering to the person who is prompting them.
101
00:11:03,280 --> 00:11:10,320
As fast as us? That's fascinating. Okay. And the last piece that comes on top of that is
102
00:11:10,320 --> 00:11:17,280
that it's not only about surfacing the information in a way that an agent can pick it up, but also
103
00:11:17,280 --> 00:11:25,520
it's working at the speed that those agents, those automations, operate at. So it's providing
104
00:11:25,520 --> 00:11:31,280
infrastructure, providing any kind of assets in seconds, milliseconds, providing a way to
105
00:11:31,280 --> 00:11:36,240
transmit data in milliseconds, provide a way to join data coming from different sources in
106
00:11:36,240 --> 00:11:43,200
milliseconds. This is where I see a lot of the innovation, as we are in a race to become faster
107
00:11:43,200 --> 00:11:48,720
and faster. Everything is becoming faster and faster. And in terms of the time you need, you keep repeating
108
00:11:48,720 --> 00:11:54,480
milliseconds. Is it milliseconds? It depends. This is an interesting piece where there is a little bit
109
00:11:54,480 --> 00:12:01,840
of dichotomy in the world, because nowadays, after OpenAI and all these beautiful AI tools came
110
00:12:01,840 --> 00:12:07,600
out, we somehow rediscovered the fact that we can wait for something to come back to us within a few
111
00:12:07,600 --> 00:12:14,240
seconds. Right. Deep search, deep reasoning. Yes. On the other side, the automation that we can build
112
00:12:14,240 --> 00:12:20,400
with some of these tools, if we narrow the focus, is astonishing. And the speed of these tools is
113
00:12:20,400 --> 00:12:26,640
astonishing as well. So, I was thinking about this podcast and I was thinking about what
114
00:12:26,640 --> 00:12:32,560
is an example that could resonate really well with people. I believe I have a good analogy,
115
00:12:32,560 --> 00:12:38,720
which is: nowadays, self-driving cars are something that we know exist in the world.
116
00:12:38,720 --> 00:12:44,480
So the question that I would ask companies is: would you trust a car that is driving,
117
00:12:44,480 --> 00:12:53,360
checking the image or the status as it was five or 10 seconds ago? Probably not. So why do you
118
00:12:53,360 --> 00:12:59,920
trust the same to happen within your company? Nowadays, we need to feed all our data stakeholders
119
00:12:59,920 --> 00:13:05,920
with the most complete up to date status of the world. Because if we don't do that, well,
120
00:13:05,920 --> 00:13:10,320
it's not a crash in our case, a physical crash, but it's a crash in the business.
121
00:13:10,320 --> 00:13:17,520
If you keep selling tickets for a concert that is already sold out, then you need to give
122
00:13:17,520 --> 00:13:25,440
money back to people, and your fame, the perception of your brand, is damaged because
123
00:13:25,440 --> 00:13:31,280
you were not dealing with live, correct data. So that also, like, doubles down on why the cowboy
124
00:13:31,280 --> 00:13:39,040
approach might actually backfire, because if you don't have that data infrastructure in place,
125
00:13:39,040 --> 00:13:46,800
you might end up creating so much mess, unwanted mess for yourself because especially for businesses
126
00:13:46,800 --> 00:13:54,560
that have to cover real-time use cases, that could quickly become an unmanageable nightmare.
127
00:13:56,160 --> 00:14:02,640
Here is another interesting piece. Now we have AI agents, which are now a reality. And these AI agents
128
00:14:02,640 --> 00:14:08,720
have this ability to perform function calls. So they can call system A and system B to
129
00:14:08,720 --> 00:14:14,640
get data from each. For example, as it happens in a lot of enterprises, you have your customer data
130
00:14:14,640 --> 00:14:19,520
sitting in a place, your sales data sitting in a place, your inventory data sitting in another
131
00:14:19,520 --> 00:14:25,120
place. And the agent could call those three systems and retrieve the information. However,
132
00:14:25,120 --> 00:14:31,360
you are giving the agent the responsibility of getting that data, integrating it,
133
00:14:31,360 --> 00:14:37,120
and creating a unified view, which could be a sensible option in some cases. But on the other
134
00:14:37,120 --> 00:14:43,280
side, you may want to have that integration defined in just one way, in a standard way that
135
00:14:43,280 --> 00:14:49,760
is always up to date, that can be shared across 10 different use cases. Because otherwise,
136
00:14:49,760 --> 00:14:55,360
tomorrow you have a bot that answers the phone and gives some information. You have another bot,
137
00:14:55,360 --> 00:15:01,360
another AI agent that does the same for email. Those agents have slightly different ways of
138
00:15:01,360 --> 00:15:05,840
integrating the data. They end up giving completely different replies to customers.
139
00:15:05,840 --> 00:15:11,920
And again, you are back to the Tesla example where something that was true 10 seconds ago
140
00:15:11,920 --> 00:15:18,640
is not true anymore now. Wow. So there are a lot of established traditional businesses out there,
141
00:15:18,640 --> 00:15:24,480
right? And you know them much better than I do, most likely, because you're working
142
00:15:24,480 --> 00:15:29,760
with them. And I think, for me at least, I don't know about you, the sales call, when it
143
00:15:29,760 --> 00:15:37,760
comes to, okay, we are lacking digital infrastructure, it's becoming way easier than before,
144
00:15:37,760 --> 00:15:42,960
because now they are sensing what's going on. They're seeing their competitors doing crazy
145
00:15:42,960 --> 00:15:50,480
things, offering services at such a scale, with decent enough quality that they need to act upon.
146
00:15:50,480 --> 00:15:56,320
So I think there is no convincing needed; it's a matter of when we can start. What problems are you seeing,
147
00:15:56,320 --> 00:16:02,640
especially within this group of companies, established traditional businesses that hinder
148
00:16:02,640 --> 00:16:10,640
them, in the sense that they cannot even start thinking about integrating AI? This goes in different layers,
149
00:16:10,640 --> 00:16:17,680
depending on the maturity of the company. Okay. Let's set a baseline. In 2025, I wouldn't accept
150
00:16:17,680 --> 00:16:22,960
a company that doesn't have at least a minimum digital footprint. That is the level that I'm
151
00:16:22,960 --> 00:16:28,960
playing at. We cannot go to the local shop in town that sells flowers and records every sale
152
00:16:28,960 --> 00:16:34,800
in a book. That still exists somehow, but it's not the level we're talking about. We are talking about companies
153
00:16:34,800 --> 00:16:41,760
that have basic digitalization already in place. But still, I would say, I've been dealing with this
154
00:16:41,760 --> 00:16:47,920
in my time at Aiven, and in my time before; I did 15 years of enterprise consulting
155
00:16:47,920 --> 00:16:54,320
in the data space. I believe that the consistent problem that I saw in enterprises, in big and
156
00:16:54,320 --> 00:17:00,720
small enterprises, in big and small companies, is that companies have this variety of
157
00:17:00,720 --> 00:17:07,840
different tools. The data is usually very segmented in different data tools because business
158
00:17:07,840 --> 00:17:13,200
unit A needs a specific tool to solve a specific problem, so that tool gets onboarded, and then three
159
00:17:13,200 --> 00:17:20,000
years down the road, you have a set of critical information only living in that system. So this is
160
00:17:20,000 --> 00:17:28,240
the first challenge that I see is, if you want to empower consistent usage of data for AI, you need,
161
00:17:28,240 --> 00:17:35,760
first of all, to map all your data assets across all the technologies and come up with a way to
162
00:17:35,760 --> 00:17:41,040
bring them together. This is what we were saying before. You don't let the agents call three
163
00:17:41,040 --> 00:17:48,160
different places. You solve part of the problem by integrating the data yourself. So you take all
164
00:17:48,160 --> 00:17:55,840
these data silos and you say: my customer data, my inventory data, my sales data, my orders, my
165
00:17:55,840 --> 00:18:00,960
clickstream on the website. At a certain point, they all come together. Why is this crucial? Because
166
00:18:01,520 --> 00:18:06,480
AI or not AI, everything is based on data. You need to make decisions based on data. Your boss
167
00:18:06,480 --> 00:18:13,520
will make decisions based on data. For example, if my agent needs to suggest a discount to a
168
00:18:13,520 --> 00:18:20,000
certain person, I would like to analyze the clickstream of that person, the last few seconds,
169
00:18:20,000 --> 00:18:25,840
hours, days, the previous sales, the location, all these things usually come from different places.
170
00:18:25,840 --> 00:18:32,160
So I need to have this unified view about the customer in this case. And this is usually the
171
00:18:32,160 --> 00:18:38,160
first step. You start from a huge variety of places and you try to condense into one place.
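[Editor's note: a toy sketch of the "condense into one place" step Francesco describes. Customer, sales, and clickstream data sit in separate tools and get combined into one unified view keyed by customer. Column names and sources are invented for illustration; the sketch assumes pandas.]

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2], "country": ["IT", "DE"]})
sales = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [120.0, 80.0, 40.0]})
clicks = pd.DataFrame({"customer_id": [1, 2, 2], "page": ["shoes", "boots", "boots"]})

# Aggregate each silo to one row per customer, then join into a single view.
sales_agg = sales.groupby("customer_id", as_index=False).agg(total_spent=("amount", "sum"))
clicks_agg = clicks.groupby("customer_id", as_index=False).agg(page_views=("page", "count"))

unified = (
    customers
    .merge(sales_agg, on="customer_id", how="left")
    .merge(clicks_agg, on="customer_id", how="left")
)
print(unified)  # one row per customer, combining CRM, sales and clickstream signals
```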
172
00:18:39,200 --> 00:18:46,240
A couple of follow-up questions on this. Let's say you've got, let's just name HubSpot for your CRM.
173
00:18:47,440 --> 00:18:53,200
Let's just say then you've got some resource planning tool to manage time off, whatever the
174
00:18:53,200 --> 00:18:59,200
case may be. They're integrated because whoever goes on time off, maybe your AE, then you need
175
00:18:59,200 --> 00:19:04,240
another AE to basically take care of things; there are some connections if you want to look deeper.
176
00:19:04,240 --> 00:19:10,160
Then you got a bunch of applications that you're using for creating documents, examination,
177
00:19:10,160 --> 00:19:15,920
whatever the case may be. So what you are saying is don't rely on the databases,
178
00:19:15,920 --> 00:19:23,120
pull the data coming in from the APIs and create your own database, basically based on
179
00:19:23,120 --> 00:19:30,800
the criteria that you see fit for the future use cases where you can leverage AI. Basically, go from
180
00:19:30,800 --> 00:19:36,800
the data silos, where you have databases in different applications, bring them all together in a cohesive,
181
00:19:36,800 --> 00:19:43,520
organized database that you control. That's what you're saying? Yeah, I believe that
182
00:19:43,520 --> 00:19:50,800
this is usually the first step because until you map all these data assets and you unify them,
183
00:19:50,800 --> 00:19:57,360
you also don't have any means to understand if they talk the same language, if you have duplicates.
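[Editor's note: a small sketch of the "do they talk the same language, do you have duplicates" check. Two tools identify the same person differently, so a shared key (email, here) is normalized and conflicts are surfaced. The data and column names are invented; the sketch assumes pandas.]

```python
import pandas as pd

crm = pd.DataFrame({"crm_id": ["A1", "A2"],
                    "email": ["Anna@Example.com", "bob@example.com"]})
billing = pd.DataFrame({"billing_id": [10, 11],
                        "email": ["anna@example.com ", "carla@example.com"]})

# Normalize the shared key before comparing tools.
for df in (crm, billing):
    df["email_norm"] = df["email"].str.strip().str.lower()

mapping = crm.merge(billing, on="email_norm", how="outer", indicator=True)
print(mapping[["crm_id", "billing_id", "email_norm", "_merge"]])
# 'both'       -> same customer in both tools (candidate for one golden record)
# 'left_only'  -> exists only in the CRM
# 'right_only' -> exists only in billing
```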
184
00:19:57,360 --> 00:20:03,440
And this is kind of the second step. Once you mix everything together, you understand what the data
185
00:20:03,440 --> 00:20:11,840
quality across your inventory of data is. If you have, for example, customer is a customer,
186
00:20:11,840 --> 00:20:18,720
but customer across five different tools could be five different things. Absolutely. So this is where
187
00:20:18,720 --> 00:20:25,520
yes, you could rely on an AI agent to solve this, but maybe this is a business decision that you
188
00:20:25,520 --> 00:20:32,800
have to make. What is a customer on our end? If I'm trying to give you a discount, should I think
189
00:20:32,800 --> 00:20:39,600
about only your data or anyone in your company and their behavior in my website in the last week?
190
00:20:39,600 --> 00:20:46,160
Customer is a customer, but it's not. It could be very different depending on who I want to talk
191
00:20:46,160 --> 00:20:52,480
with and which kind of behavior I would like to try. A follow-up question on basically the previous
192
00:20:52,480 --> 00:20:58,800
step, which is basically going from data silos to an integrated, organized data layer. I know for a fact
193
00:20:58,800 --> 00:21:05,040
that there are a lot of organizations right now using outdated applications that don't
194
00:21:05,040 --> 00:21:13,440
offer APIs. What should they do with those apps? If you were to consult with them, and I know for
195
00:21:13,440 --> 00:21:20,560
example, that for one of our clients, a core part of their operation uses an outdated
196
00:21:20,560 --> 00:21:30,640
application that doesn't have any API. So basically we cannot access that data. So should they
197
00:21:31,600 --> 00:21:36,800
kill that application and build something in-house? Should they migrate to a new application
198
00:21:36,800 --> 00:21:40,640
that offers APIs? What's the best approach here, in your opinion?
199
00:21:44,640 --> 00:21:51,680
I believe here I'm in the camp of if you are using, if you're leveraging a tool that doesn't
200
00:21:51,680 --> 00:21:58,480
allow you to export your data, you're basically sitting on a time bomb. At a certain point this will
201
00:21:58,480 --> 00:22:04,080
explode: part of your business will not work anymore. And this brings me to one thing that I
202
00:22:04,080 --> 00:22:10,240
always share with clients and prospects. There are two types of technological decisions that a
203
00:22:10,240 --> 00:22:15,360
company needs to make. One is about innovation, what they call innovation at the edge. Innovation
204
00:22:15,360 --> 00:22:22,400
at the edge is where you want to bet for the next months or years in your company. And it's, you know,
205
00:22:22,400 --> 00:22:30,080
in this time, it could be the latest OpenAI model compared to the latest model from
206
00:22:30,080 --> 00:22:36,080
another vendor. It's a bet where you can put a lot of risk in because you know that something
207
00:22:36,080 --> 00:22:41,600
completely new will come in, you know, in a few weeks or a few months, and you need to
208
00:22:41,600 --> 00:22:46,640
go there and make this bet with the knowledge that you can also lose. And there is a potential
209
00:22:46,640 --> 00:22:52,640
rework that needs to be done for that decision to pay off or for that decision to evolve.
210
00:22:52,640 --> 00:22:58,960
What's the contrary? The contrary is where you make decisions, technological decisions, at the core
211
00:22:58,960 --> 00:23:04,560
of your company. Decisions at the core, like the choice of a database, the choice of an application,
212
00:23:04,560 --> 00:23:10,720
are choices, no matter what people say about migrating from database A to database B, that
213
00:23:10,720 --> 00:23:18,400
will stay long in the life of the company. A choice of a database is a choice that will last decades.
214
00:23:18,400 --> 00:23:24,800
And this is what I'm always suggesting: when it comes to decisions at the core, try to
215
00:23:24,800 --> 00:23:31,520
choose something which allows you to have a lot of choice. And open source in this space gives you an
216
00:23:31,520 --> 00:23:37,760
amount, an amazing amount of choice. If you think about Postgres, not only are there now 20, 50,
217
00:23:37,760 --> 00:23:42,960
100 different vendors of Postgres, but the format of the data that is stored in Postgres is always
218
00:23:42,960 --> 00:23:48,000
there. The mechanisms for taking the data from one Postgres instance to another are always there.
219
00:23:48,000 --> 00:23:53,200
So you are playing a game that allows you to be completely and always in control of your data.
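[Editor's note: a hedged sketch of what "always in control of your data" can look like in practice with Postgres. The same COPY mechanism works against any vendor's Postgres, so moving a table between providers is mostly a matter of connection strings. The DSNs and the table name are placeholders; the sketch assumes the psycopg2 package.]

```python
import io
import psycopg2

SOURCE_DSN = "postgresql://user:password@old-provider:5432/shop"   # placeholder
TARGET_DSN = "postgresql://user:password@new-provider:5432/shop"   # placeholder

# Export the table from one Postgres instance...
buf = io.StringIO()
with psycopg2.connect(SOURCE_DSN) as src:
    with src.cursor() as cur:
        cur.copy_expert("COPY customers TO STDOUT WITH CSV HEADER", buf)

# ...and load it into another, regardless of which vendor runs it.
buf.seek(0)
with psycopg2.connect(TARGET_DSN) as dst:
    with dst.cursor() as cur:
        cur.copy_expert("COPY customers FROM STDIN WITH CSV HEADER", buf)
```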
220
00:23:53,200 --> 00:23:58,000
So this is what I'm always suggesting: when you take critical decisions about
221
00:23:58,000 --> 00:24:04,400
stuff that will last in your company, don't look at only the features that are about the usage now,
222
00:24:04,400 --> 00:24:13,120
but always think about how portable and how sustainable this solution is in the
223
00:24:13,120 --> 00:24:19,760
future. Because those are two very different questions, and the optionality, once you define
224
00:24:19,760 --> 00:24:25,360
these kinds of metrics, becomes very, very different. Back to your question though: should they be
225
00:24:25,360 --> 00:24:30,320
rebuilding the application? My opinion: if it's a critical application for the company, it should be
226
00:24:30,320 --> 00:24:35,920
built on top of what I would call open data formats, should be built on top of a Postgres
227
00:24:35,920 --> 00:24:41,920
database, which is fully open source, should be built with tools that are open source libraries
228
00:24:41,920 --> 00:24:48,000
that everybody can learn, study and accept. I believe this is the beauty where open source
229
00:24:48,000 --> 00:24:54,080
is not a cost reduction factor, or not only a cost reduction factor, open source is a pool of talent.
230
00:24:54,080 --> 00:25:00,160
It's portability, it's optionality for companies. Yeah, it's code being written without you paying
231
00:25:00,160 --> 00:25:06,320
anyone. It's just basically a collective endeavor that everyone knows that if you contribute,
232
00:25:06,320 --> 00:25:11,200
everyone else will benefit from that collective. Yeah, I believe it is. And, you know, from a company
233
00:25:11,200 --> 00:25:17,520
point of view, yes, it's like open source can give you a lot of good things. I would always think
234
00:25:17,520 --> 00:25:23,120
about sustaining open source, because yes, we have this idea that it's free, that someone else will develop it.
235
00:25:23,120 --> 00:25:29,360
But that someone else is a human with a family, with kids to feed, with a life. And
236
00:25:29,360 --> 00:25:34,400
there are ways, there are very effective ways, to feed some of the value back to these open
237
00:25:34,400 --> 00:25:40,560
source communities to keep them healthy. So back to basically our roadmap, let's say we
238
00:25:40,560 --> 00:25:47,840
moved away from data silos to an organized data layer. Now we have a data layer that we can work with.
239
00:25:47,840 --> 00:25:54,320
But you mentioned something really interesting. The way you phrased it stuck with me: that
240
00:25:54,320 --> 00:26:02,000
every application in your ecosystem sees a certain actor differently than the others. Your Google
241
00:26:02,000 --> 00:26:11,600
Analytics sees the lead coming through your funnel differently from your HubSpot. So there is a need
242
00:26:11,600 --> 00:26:23,440
to create a shared understanding of data regarding every user type in our database.
243
00:26:24,480 --> 00:26:32,000
What does that mean exactly? Do you recommend, for example, you map fields from one app to the other
244
00:26:32,000 --> 00:26:38,480
app and sort of create a, basically create a detailed map of how we are going to pull information
245
00:26:38,480 --> 00:26:44,000
from each app and, like, combine them and add them to our data layer. What are the best
246
00:26:44,000 --> 00:26:49,600
practices here? Yeah, I believe what you said is the basic best practice: understanding
247
00:26:49,600 --> 00:26:56,800
what kind of input sources you have, understanding at what level they work. Do you have, for example,
248
00:26:56,800 --> 00:27:04,160
I don't know, all your customers flagged with a customer ID? The customer ID from tool
249
00:27:04,160 --> 00:27:09,440
one might be equal to, or completely different from, the customer ID of tool two. So how do you map the
250
00:27:09,440 --> 00:27:17,520
two customers? How do you aggregate up the data? How do you join different tables coming from
251
00:27:17,520 --> 00:27:23,280
different feeds? How do you get a holistic view of your customers? All of this, again,
252
00:27:23,280 --> 00:27:31,200
is not something new. We have decades of history in data warehousing, for example, that allows us to
253
00:27:31,200 --> 00:27:37,600
set up a set of basic rules to collect the data first and then understand what is the data quality
254
00:27:37,600 --> 00:27:44,400
of these data feeds, what is the data quality of the join of those feeds, and then create a curated
255
00:27:44,400 --> 00:27:51,920
view of all our data sources in a data warehouse or in a similar pattern. And another point to
256
00:27:52,880 --> 00:28:02,960
what you said earlier is that you were not inclined to expose AI to raw data because that gives
257
00:28:02,960 --> 00:28:13,840
a lot of room for errors and hallucinations. It's not only that. There are several components to this.
258
00:28:13,840 --> 00:28:19,680
When you start exposing AI to raw data, you could get good insights. For example, if you expose an
259
00:28:19,680 --> 00:28:26,400
AI tool to a raw list of logs, you will be able to get some interesting information that probably
260
00:28:26,400 --> 00:28:33,600
the agent will be able to parse much, much faster than a human. So it's a good
261
00:28:33,600 --> 00:28:40,400
use case. However, there are two things there. One is that raw data usually is at a way, way bigger
262
00:28:40,400 --> 00:28:48,640
scale compared to aggregated data. And in this world where you pay for AI per token, that can be really
263
00:28:48,640 --> 00:28:55,280
expensive. The second thing is that by exposing AI to raw data, you could face the risk of, for example,
264
00:28:55,280 --> 00:29:02,240
exposing PII data, because your customer record in your data source will have everything in it. And if
265
00:29:02,240 --> 00:29:07,840
you just put an AI bot on top of it, this means that they can do something good or something not
266
00:29:07,840 --> 00:29:13,760
good, and you don't have a lot of control. So this is where creating an abstraction layer
267
00:29:13,760 --> 00:29:19,120
on top of the raw data, one that allows you to define which fields should be used, how they
268
00:29:19,120 --> 00:29:25,680
should be used, what is the level of abstraction that an AI agent should use in order to calculate
269
00:29:25,680 --> 00:29:31,920
a forecast or calculate a discount, becomes important: you basically need to prepare your data to be fed
270
00:29:31,920 --> 00:29:38,880
to AI. And from a corporate point of view, this is where you need to apply all the
271
00:29:40,320 --> 00:29:49,680
internal rules about what data you can use, how you should use it, and what a certain agent, or human, should
272
00:29:49,680 --> 00:29:56,160
see. Because it's not only about PII versus non-PII data. It's also about what we were saying before.
273
00:29:56,160 --> 00:30:02,480
If you are coming to the website and I show you the data of someone else, this is an extremely
274
00:30:02,480 --> 00:30:09,360
bad experience. This is a data breach. So yes, let's try to minimize the number of steps needed. At the
275
00:30:09,360 --> 00:30:16,160
same time, let's keep security as a top priority. Because if we don't nail security in this kind of
276
00:30:16,160 --> 00:30:22,960
interaction, again, with the scale at which these kinds of agents can work, we could create a massive
277
00:30:22,960 --> 00:30:29,600
problem. You said something earlier that was very beautiful. I think you have a good way of using
278
00:30:29,600 --> 00:30:36,080
examples that everyone could understand. In any ERP system, you have different user types and roles.
279
00:30:36,080 --> 00:30:41,600
For example, a paying officer should only see stuff about Spain. The same thing, the way I,
280
00:30:42,400 --> 00:30:51,440
I mean, we can apply the same rationale to agents. There will be a user type 'agent' and the agent could
281
00:30:51,440 --> 00:30:58,320
only see this type of information, right? And not the other types. That's basically where we are
282
00:30:58,320 --> 00:31:05,840
heading: okay, limit what the AI agent can get exposed to. And, you know, someone could say,
283
00:31:05,840 --> 00:31:12,000
well, there is the system prompt that allows you to do that. Because in any kind of modern LLM,
284
00:31:12,000 --> 00:31:16,720
you have the prompt itself and the system prompt that defines the behavior. So you
285
00:31:16,720 --> 00:31:22,000
could say in the system prompt, hey, look, only use field A, B, and C. To me, that's
286
00:31:23,840 --> 00:31:30,160
something to investigate, but, to make another probably silly example, it's like me telling you,
287
00:31:31,200 --> 00:31:37,600
hey, I did something really bad, please don't tell anybody. And now I have to rely on your
288
00:31:38,480 --> 00:31:44,160
trust and willingness to not share that information with anybody else. If someone comes to you and
289
00:31:44,160 --> 00:31:49,440
says, no, no, I need to know everything about Francesco, you will tell them. Compare this to
290
00:31:50,080 --> 00:31:54,320
you not being able to know something, some information, there is no way
291
00:31:56,160 --> 00:32:01,840
for you to get the information in the first place. This is where we should start with security,
292
00:32:01,840 --> 00:32:07,840
not add security as a second step. I would be very hesitant to leave
293
00:32:07,840 --> 00:32:13,920
the instruction of the AI agent to the system prompt layer. I think that's too risky for businesses. You need
294
00:32:13,920 --> 00:32:23,520
to basically say, here's a data bucket, let's call it like this, that AI would not get exposed to.
295
00:32:23,520 --> 00:32:29,840
Here's the interpretation of that bucket they could get exposed to, but not the raw data of it.
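[Editor's note: a sketch of this "interpretation of the bucket, not the raw data" idea. The enforcement happens in code before anything reaches the model, instead of relying on a system prompt. Field names and the aggregated view are hypothetical.]

```python
RAW_CUSTOMER = {
    "customer_id": "cust-42",
    "full_name": "Maria Rossi",          # PII: never leaves this layer
    "email": "maria@example.com",        # PII: never leaves this layer
    "orders_last_90_days": 7,
    "total_spent_last_90_days": 640.0,
    "segment": "returning",
}

ALLOWED_FIELDS = {"orders_last_90_days", "total_spent_last_90_days", "segment"}

def view_for_agent(raw: dict) -> dict:
    """Return only the allow-listed, aggregated fields the agent may see.

    Anything not explicitly allowed is dropped, so a prompt injection cannot
    talk the model into revealing data it was never given.
    """
    return {k: v for k, v in raw.items() if k in ALLOWED_FIELDS}

prompt_context = view_for_agent(RAW_CUSTOMER)
print(prompt_context)
# {'orders_last_90_days': 7, 'total_spent_last_90_days': 640.0, 'segment': 'returning'}
```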
296
00:32:29,840 --> 00:32:38,480
Yep. And also, the more you expose raw data and then an outcome, the more you allow an external
297
00:32:38,480 --> 00:32:45,440
actor to basically be able to draw the line between the raw data and the outcome. So you
298
00:32:45,440 --> 00:32:50,960
are potentially also exposing business decisions because you say, you may have internal rules that
299
00:32:51,920 --> 00:32:58,000
define what is the discount percentage on a certain item. If you expose the discount percentage and
300
00:32:58,000 --> 00:33:03,200
the raw data, someone with enough time and enough willingness could be able to draw the line, OK,
301
00:33:03,200 --> 00:33:09,840
you used points A, B and C to tell me that you'd give me a 30% discount. While if I fake an account and
302
00:33:09,840 --> 00:33:15,760
I change my hair color from brown to blonde, I will get a 55% discount. Is this something that
303
00:33:15,760 --> 00:33:22,080
you want? It's a risk. I mean, curating the data is not the most exciting or newest type of work,
304
00:33:22,080 --> 00:33:27,920
but it's critical for AI. Most of the enterprises, what I will say is, especially the digital innovation
305
00:33:27,920 --> 00:33:34,400
officers I work with, they have really good hearts. They don't see the bad actors so much in their
306
00:33:34,400 --> 00:33:40,880
decisions. What I always like to say to them is: look, yes, a lot of people are acting
307
00:33:40,880 --> 00:33:47,520
in good faith, but there are a few bad actors with so much evil energy and so much capability.
308
00:33:47,520 --> 00:33:52,880
If you hit one of those, they are going to create a nightmare that you cannot get out of.
309
00:33:53,600 --> 00:34:01,360
I don't want to scare people off, but I've seen a website called lovable.com, if I remember correctly.
310
00:34:01,360 --> 00:34:08,960
You can go there and you can write the app that you want to create and it will create the app in
311
00:34:08,960 --> 00:34:16,560
seconds. I believe they were able to create 10,000 apps a day. What you're saying about
312
00:34:16,560 --> 00:34:23,680
one actor dedicating time, that might not scare people off, but when we talk about AI, AI could
313
00:34:23,680 --> 00:34:28,560
potentially be used for good, but also for bad. We also have to move away from the idea that we are
314
00:34:28,560 --> 00:34:34,400
thinking about the bad actor being a human. A bad actor could be an AI bot as well. Now, it's even
315
00:34:34,400 --> 00:34:41,200
more critical for us to take safe steps in exposing anything to the public, because what before could
316
00:34:41,200 --> 00:34:48,560
be done by a single person with enough time, now we have a huge amount of bots that could do that
317
00:34:48,560 --> 00:34:54,720
with an incredible volume in very little time. That's a scary scenario.
318
00:34:54,720 --> 00:35:01,920
I see where you're coming from. I couldn't scare people, because data is not my specialty,
319
00:35:01,920 --> 00:35:08,000
but if it's coming from you, I think people will pay great attention to
320
00:35:08,000 --> 00:35:11,920
it because you're dealing with data. I think that's something that... I think security is
321
00:35:11,920 --> 00:35:17,440
something that hasn't been on their plate so much, I would say, right now, because there is a lot of
322
00:35:17,440 --> 00:35:23,680
cowboyism going on. A lot of enterprises, they love the term innovation at the edge. They
323
00:35:23,680 --> 00:35:28,800
are paying so much attention to the innovation at the edge and really not paying enough attention to
324
00:35:28,800 --> 00:35:34,880
their core infrastructure. And that I think very... It's going to be a... It's a high-risk environment,
325
00:35:34,880 --> 00:35:43,200
I would say. I mean, one thing also... I don't want to leave everyone with this scary image.
326
00:35:43,200 --> 00:35:48,000
I would say that... Let's go back to the beginning. None of this is new. We have a solid
327
00:35:48,000 --> 00:35:55,920
understanding of how to build secure, scalable, fast data platforms that allow us to expose safely
328
00:35:55,920 --> 00:36:01,680
everything we built. It's just a matter of thinking correctly at every step. Yes, follow the innovation.
329
00:36:01,680 --> 00:36:07,360
Yes, have a budget to follow the innovation. On the other side, prepare to expose anything you have
330
00:36:07,360 --> 00:36:14,160
to the innovation in a safe way that is not only safe to implement and use, but is also very easy
331
00:36:14,160 --> 00:36:20,240
to monitor and understand what's happening. If you build something like this, it will serve you for
332
00:36:20,240 --> 00:36:25,840
any use case, AI or not. So, before we get to the real time that you touched upon, and which I think
333
00:36:25,840 --> 00:36:32,800
is such a core component of any AI-powered application
334
00:36:32,800 --> 00:36:39,840
that you want to get behind: I think, out of all the discussion we had,
335
00:36:39,840 --> 00:36:47,680
this is maybe the simplest to talk about, but very important. How do you use these LLMs? Locally, or
336
00:36:47,680 --> 00:36:53,440
do you tap into OpenAI APIs, or whatever the case, whichever API you want to use?
337
00:36:53,440 --> 00:36:58,560
I could see a scenario where, especially for innovation at the edge, you want to tap into this
338
00:36:58,560 --> 00:37:06,560
API, but what kind of information you're exposing needs to be limited and obscured enough.
339
00:37:06,560 --> 00:37:12,640
What's your take on this when it comes to basically leveraging LLMs in your operation?
340
00:37:12,640 --> 00:37:18,880
So, we did an interview with, I believe, 100 C-level executives at various companies in EMEA
341
00:37:18,880 --> 00:37:24,960
and the US a few months ago with Aiven. And what we found out was that most of the people who
342
00:37:24,960 --> 00:37:29,920
replied were saying that they were using off-the-shelf models from OpenAI or other vendors.
343
00:37:29,920 --> 00:37:35,360
And it makes sense because at the time, OpenAI or other vendors were the most advanced. So,
344
00:37:35,360 --> 00:37:40,720
if you are working on the edge, on the innovation, you want to first test that there is something
345
00:37:40,720 --> 00:37:48,000
valuable there. And the burden of recreating the whole stack in a private way is maybe too much
346
00:37:48,000 --> 00:37:51,520
if you are just seeking the innovation and the validation at the beginning.
347
00:37:51,520 --> 00:37:57,520
On the other side, what we also saw in the same interview is that data privacy is a major problem,
348
00:37:57,520 --> 00:38:02,400
data security, data freshness, are major problems. And all this speaks to: okay,
349
00:38:02,400 --> 00:38:08,320
you start with something that is out there in order to prototype, but then there is the willingness to
350
00:38:08,880 --> 00:38:14,960
create a more secure LLM usage within some safe boundaries. Within a boundary where you know exactly
351
00:38:14,960 --> 00:38:20,960
that the data you're sending out is secure and the data that you're sending out
352
00:38:20,960 --> 00:38:26,480
possibly is not even used to retrain the model. Because when you send that out, you may know the
353
00:38:26,480 --> 00:38:30,400
boundaries and that might be contractual boundaries, but the world is your oyster.
354
00:38:30,400 --> 00:38:35,440
There will be so many vendors. So, what I will see in the future is probably an initial approach
355
00:38:35,440 --> 00:38:41,120
where you will use the off-the-shelf models because that's the first way to get the result.
356
00:38:41,120 --> 00:38:48,880
And then probably I will see an increase of companies where they take an off-the-shelf
357
00:38:48,880 --> 00:38:57,520
open source model. They fine-tune it locally. And then they apply RAG or another system to
358
00:38:57,520 --> 00:39:03,120
make it personalized and run it in a safe place where they know where it is coming from and
359
00:39:03,120 --> 00:39:08,480
where it's going. Basically, a potential approach would be: okay, use the off-the-shelf models
360
00:39:08,480 --> 00:39:13,360
with limited data exposure for validation. See which ones work and which ones don't.
361
00:39:13,360 --> 00:39:18,720
Then when you see there are real use cases that actually could help your organization move faster,
362
00:39:18,720 --> 00:39:26,560
then attempt to basically internalize it using your own setup when it comes to that LLM of your
363
00:39:26,560 --> 00:39:32,800
choice and expose it potentially to more data. Yeah. One interesting thing that you made me think
364
00:39:32,800 --> 00:39:38,080
about is if you use off-the-shelf data, off-the-shelf models, and you are sending out the data,
365
00:39:38,080 --> 00:39:43,520
think about sending the same data out to a customer because that is a good mindset.
366
00:39:43,520 --> 00:39:48,160
If you are sending that data out to a model and that model will talk to a customer, you're just
367
00:39:48,800 --> 00:39:53,760
a minute away from sending that to the customer. So when you think about sending that data out,
368
00:39:53,760 --> 00:40:00,480
is it secure? Is it okay? Should you send this piece of data out? If not, you have some work to
369
00:40:00,480 --> 00:40:05,120
do in order to understand what data you should expose to that model, to that interaction.
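[Editor's note: a rough sketch of the pattern Francesco describes: an open-source model run in a controlled environment, personalized with retrieval (RAG) over internal, curated documents. The embedding model name is just an example; `local_llm_chat` is a hypothetical stand-in for whatever locally hosted model you run, not a real API. The sketch assumes the sentence-transformers and numpy packages.]

```python
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "Refunds are issued within 14 days of a cancelled order.",
    "Enterprise customers get a dedicated support channel.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question: str, k: int = 1) -> list[str]:
    # Rank internal documents by cosine similarity (vectors are normalized).
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def local_llm_chat(prompt: str) -> str:
    # Placeholder: call your self-hosted, fine-tuned open-source model here.
    raise NotImplementedError("wire this up to your locally hosted model")

question = "How long do refunds take?"
context = "\n".join(retrieve(question))
answer = local_llm_chat(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```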
370
00:40:05,120 --> 00:40:12,560
Real-time-ism, I think, is becoming more of a religion than anything else. So why is it important, right?
371
00:40:12,560 --> 00:40:19,440
We talked about it a bit, right? Maybe if you can touch upon it again, that would be great. And like
372
00:40:19,440 --> 00:40:26,800
based on your read of the market and the companies that you do work with, how far are these companies
373
00:40:26,800 --> 00:40:33,840
from having the right infrastructure in place to provide real-time services and experiences? And
374
00:40:33,840 --> 00:40:40,800
then what could they do to basically move faster towards real-time offerings and products and
375
00:40:40,800 --> 00:40:47,920
services? Okay. So let's start from the first question, why is real-time a need? And we need
376
00:40:47,920 --> 00:40:54,480
to go back to what we were saying before. AI, computers, work at a pace that is way, way different
377
00:40:54,480 --> 00:41:02,320
than the human pace. Any decision that is made with wrong data has X times the impact
378
00:41:02,320 --> 00:41:12,640
of a decision made by a human with the wrong data. And when I started in my job, I was working on
379
00:41:14,160 --> 00:41:20,160
building BI dashboards, and people in the business, they were okay with reporting today
380
00:41:20,160 --> 00:41:24,880
on the data of yesterday. And you had the entire night to load the data from the sources
381
00:41:24,880 --> 00:41:32,080
and feed the data warehouse. Nowadays, you need to make decisions about what's happening now,
382
00:41:32,080 --> 00:41:37,920
what was happening in the last few milliseconds. This is where the real time component is critical.
383
00:41:37,920 --> 00:41:44,320
The analogy that we were discussing before is, would you trust an autonomous driver to drive with
384
00:41:44,320 --> 00:41:50,240
the information, the image that was taken five seconds ago? Never. We are talking about
385
00:41:50,240 --> 00:41:54,800
milliseconds, even less. You need the most up-to-date information possible.
386
00:41:55,760 --> 00:42:04,720
Or maybe, like, a B2C example: can you afford not putting out a product, let's say a
387
00:42:04,720 --> 00:42:12,160
new outfit for a new hype train that just popped up on Instagram and other competitors are leveraging
388
00:42:12,160 --> 00:42:19,120
it? And basically shrinking your revenue because your outdated products are not fitting what
389
00:42:19,120 --> 00:42:24,000
consumers want right now. Can you afford that? Because that's a serious loss of revenue.
390
00:42:24,000 --> 00:42:32,160
Yeah. I mean, if you're taking, like, the betting market, I believe it is more used to real time.
391
00:42:32,160 --> 00:42:39,280
But for example, if you have a certain win rate for AC Milan winning against
392
00:42:39,280 --> 00:42:45,360
Juventus, and then AC Milan scores, you need to have that rate immediately changed. You cannot
393
00:42:45,360 --> 00:42:49,920
have delays because otherwise it's a huge cost for your company. I know not all businesses are at
394
00:42:49,920 --> 00:42:56,400
this kind of fast moving pace, but the reality is that all it takes is one mention by one random
395
00:42:56,400 --> 00:43:02,080
person that is an Instagram star. And now your business booms, and you may need to be ready for
396
00:43:02,080 --> 00:43:08,240
that. They are wearing a new pair of shoes that you are selling on your website. Now you need to
397
00:43:08,240 --> 00:43:13,040
be ready for that. You need to be ready to have all the data about the inventory immediately
398
00:43:13,040 --> 00:43:20,800
available. Otherwise your brand is affected. Real time is critical. Now, how do you get
399
00:43:20,800 --> 00:43:28,320
to real time? There are various ways that you can navigate from historical systems that were
400
00:43:28,320 --> 00:43:33,760
batch based. So every night they were loading the data from the internal system into the data
401
00:43:33,760 --> 00:43:40,320
warehouse. The first step usually is to reduce the batch time. So instead of running every night,
402
00:43:40,320 --> 00:43:46,480
you run every hour, every five minutes. That is usually the first step. However, once you start
403
00:43:46,480 --> 00:43:52,000
going into that direction, what happens usually is that in order to minimize the latency of the
404
00:43:52,000 --> 00:43:58,880
data, you are adding a lot of extra stress to the source system because your operational source
405
00:43:58,880 --> 00:44:06,960
system that has to deal with your website now has also to publish the data or to be queried by the
406
00:44:06,960 --> 00:44:14,000
extraction routine. So this is usually an intermediate step in order to move to other tools
407
00:44:14,000 --> 00:44:19,760
like Apache Kafka, Apache Kafka Connect, and implement what is called a Change Data Capture
408
00:44:19,760 --> 00:44:26,640
solution. With change data capture, what basically happens is that you have your internal app,
409
00:44:26,640 --> 00:44:33,200
your website app that is backed by a database, and all these technologies will listen to any change
410
00:44:33,200 --> 00:44:38,800
happening in the database and transmit that downstream. If you implement such a solution, you are,
411
00:44:38,800 --> 00:44:43,440
first of all, only milliseconds away from the original data being written in the database.
412
00:44:43,440 --> 00:44:49,360
Second of all, those technologies have been built over the years to minimize the load on the source
413
00:44:49,360 --> 00:44:56,080
technology. So you are achieving the best of both worlds. You maintain your website operational at
414
00:44:56,080 --> 00:45:01,920
the maximum power, while at the same time, you are moving the data in real time to the other places
415
00:45:01,920 --> 00:45:06,000
in the company where you need them to be. That's exciting. That's exciting. Basically,
416
00:45:07,840 --> 00:45:14,800
minimize the latency between when the data is created and when your data warehouse is updated,
417
00:45:15,760 --> 00:45:23,440
and expose that through APIs to all the microservices that are basically fetching from the API.
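[Editor's note: a minimal sketch of the consuming side of the CDC pipeline described above: a Kafka consumer reads Debezium-style change events for an orders table and keeps a small in-memory view up to date within milliseconds of the original write. The topic name, brokers, and the exact event envelope depend on your connector setup; the sketch assumes the kafka-python package.]

```python
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "shop.public.orders",                          # topic produced by the CDC connector (example name)
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

orders_view = {}                                   # order_id -> latest row

for message in consumer:
    payload = (message.value or {}).get("payload", {})
    op, after, before = payload.get("op"), payload.get("after"), payload.get("before")
    if op in ("c", "u", "r") and after:            # create / update / snapshot read
        orders_view[after["order_id"]] = after
    elif op == "d" and before:                     # delete
        orders_view.pop(before["order_id"], None)
```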
418
00:45:23,440 --> 00:45:31,440
Yeah, I always say that data in a company is not a static asset, it's a journey. You ingest it,
419
00:45:31,440 --> 00:45:37,840
you transform it, and you serve it in one or multiple ways. Nowadays, we just need to minimize
420
00:45:37,840 --> 00:45:44,240
the journey because we cannot wait five minutes before knowing if a customer clicked on an item
421
00:45:44,240 --> 00:45:50,800
or not. Francesco, I cannot thank you enough for being on the podcast. You really broke down
422
00:45:50,800 --> 00:45:55,200
stuff that everyone talks about and everyone probably doesn't have a clear idea of what
423
00:45:55,200 --> 00:46:00,160
they're talking about when it comes to data. For me, this was one of those episodes I have to go
424
00:46:00,160 --> 00:46:04,720
back and listen to, because it just helps me understand the right terminology and the right
425
00:46:04,720 --> 00:46:11,840
strategy. As a last remark, what's the single technology, approach, or solution that you're so
426
00:46:11,840 --> 00:46:19,520
excited about in 2025, that you think a lot of companies are going to adopt and use?
427
00:46:19,520 --> 00:46:26,720
What would be the outcome or unlock as a result of that? I'm a data person. I believe that a
428
00:46:26,720 --> 00:46:34,240
technology that allows you to glue the data together in real time, is solution agnostic,
429
00:46:34,240 --> 00:46:38,720
can be reused in multiple different ways. Going back to what we were saying before,
430
00:46:38,720 --> 00:46:44,720
is a decision at the core which can last years and years and years, is Apache Kafka. What we were
431
00:46:44,720 --> 00:46:50,160
saying about taking the data from the silos, reorganizing the data, providing the data upstream
432
00:46:50,160 --> 00:46:55,520
in real time in an organized, secure way, is the bread and butter of this technology.
433
00:46:55,520 --> 00:47:02,400
More and more companies are leveraging Kafka as the backbone of the data journey. I'm really excited
434
00:47:02,400 --> 00:47:11,040
about the present, the future of Kafka because it just enables companies to do the next step in their
435
00:47:11,040 --> 00:47:16,320
data journey. Where before they were just starting to collect data and expose data manually,
436
00:47:16,320 --> 00:47:23,840
with Apache Kafka, they could jump 20 years into the new world of providing data in real time to
437
00:47:23,840 --> 00:47:30,080
the consumer, when the consumer needs it. And the consumer can be a human or an AI agent. It's the same.
438
00:47:30,080 --> 00:47:39,280
Now we have to basically cater to two main users, human users and agents, which is, I saw a demo of
439
00:47:39,280 --> 00:47:46,960
two agents talking in a cryptic language that was scary, yet very exciting. I think that's going to
440
00:47:46,960 --> 00:47:56,000
be, again, about what's the fastest way we can provide the outcome for the agents representing us. So they
441
00:47:56,000 --> 00:48:02,880
are actually thinking about how to be faster in providing services to us. I think everyone
442
00:48:02,880 --> 00:48:09,840
operates basically to follow their own self-interest. I think if you want to serve
443
00:48:09,840 --> 00:48:16,880
humans, ourselves and our self-interest, we might be better off having agents representing us, doing
444
00:48:16,880 --> 00:48:23,680
things that would take us days in seconds, or minutes at most. I mean, when was the last time
445
00:48:23,680 --> 00:48:29,520
that you had to deal with spam yourself? Oh my God. It's been 10 years. It's a 15-year-old problem. I don't
446
00:48:29,520 --> 00:48:36,080
recall. Nothing is new. We just need to delegate more and different tasks, so we can focus on things
447
00:48:36,080 --> 00:48:43,520
that we really care about. How can folks find you? I'm sure that after this episode you will
448
00:48:43,520 --> 00:48:49,520
probably get a bunch of really good exposure. Where can people follow you and use your content?
449
00:48:49,520 --> 00:48:55,600
There are two main places nowadays. The first one is LinkedIn. I have quite a unique name between
450
00:48:55,600 --> 00:49:00,960
my name and my surname. I don't think there are a lot of other people around as of now. So if you
451
00:49:00,960 --> 00:49:06,640
find Francesco Tisiot on LinkedIn, that's the main one. The other piece is, if you want to understand more
452
00:49:06,640 --> 00:49:13,360
about all this conjunction of real time, open source data pipelines, data journeys, my company
453
00:49:13,360 --> 00:49:20,160
offers a platform that allows you to do that. So go there. There is plenty of content that I've
454
00:49:20,160 --> 00:49:25,200
written over the previous years, and use one of those methods to contact me if you want to discuss
455
00:49:25,200 --> 00:49:30,000
further. I appreciate it. Thank you. Thank you very much for having me. It was a real pleasure to be
456
00:49:30,000 --> 00:49:31,200
a guest on the show.