Data Point of View

AI from Implementation to Inception with Sunil Saini

September 21, 2021 Mobilewalla Season 1 Episode 1

With access to so much data, how do the world’s leading data scientists improve their processes to work faster and better? Find out some of the behind the scenes tips in this episode of Data Point of View. Guest Sunil Saini has both deep experience in data science and financial services and leads the data science team at BharatPe, an Indian leader in FinTech. 

More and more organizations are using artificial intelligence to positively impact their businesses. When others struggle to deliver, market leaders like BharatPe are creating their competitive advantage with AI. Core to understanding this approach and their success is the blend between short term tactical plans and long-range planning. 



Mobilewalla - Data Point of View - Sunil Saini - Transcript

[00:00:00] Laurie Hood: Thank you for listening today. I'm Laurie Hood, CMO of Mobilewalla, and this is Data... Okay. Let's just start again. Thank you for listening today. I'm Laurie Hood, CMO of Mobilewalla, and this is Data Point of View. Joining me is my colleague, Varun Chugh, who's Head of Product at Mobilewalla, and our guest, Sunil Saini, Head of Data Science at BharatPe. Today's

[00:00:33] podcast is AI, from initiation to implementation. Artificial intelligence is a hot topic, with corporate leadership asking how it can be used to positively impact the business. While many organizations are struggling to deliver, market leaders are using AI to establish competitive advantage. Sunil has both deep experience in data science and financial services and leads the data science team at BharatPe,

[00:01:02] one of India's leading FinTech companies.

[00:01:11] So, Sunil, you've joined a young, super high-growth company and are leading a team that's critical to the success of the business. So tell me about some of the challenges that you faced when trying to create AI solutions that could be quickly implemented and integrated.

[00:01:29] Sunil Saini: Sure. First of all, thanks Varun and thanks Laurie, for inviting me to the podcast. See, at BharatPe we enable small merchants to start accepting UPI payments within five minutes of downloading the app. And a very large set of these merchants are digitizing their businesses very quickly. And as you said, this is resulting in very high growth within the company and in the industry itself.

[00:01:56] And with this high growth comes a lot of challenges as well, because the dynamic nature of the business, the ever-evolving business processes, give you a lot of problems that you can solve, interesting data science problems as well, right? But at the same time, there is a need to manage it properly so that you can deliver the best value to the business using the data, or rather data-driven

[00:02:20] decision-making. So some of the problems that you usually face are, you know, prioritization. So there is always a maze of requirements in a, you know, in a startup that you need to navigate and come up with solutions, and also understand which one is the most important one for you to work on, so that it aligns with the organization's goals. Right? Some of the other things that organizations usually face as challenges are around data quality, because as part of the startup journey everything evolves: the processes, the systems, the business, the people themselves. So that creates a lot of technical debt in terms of the data quality that you have, and that your team, your business analysts, your data analysts

[00:03:06] or the data scientists need to work on. So that becomes a very big problem in the initial years for a startup. And apart from that, there are some issues that you face specifically in our business, because a lot of our merchants are very small merchants in India. They have just started using digital mediums, right? For their businesses, or at least for accepting payments.

[00:03:29] And that creates data sparsity for us because they have very small digital footprints. So very little data to play with, very few attributes that we can use for, you know, various decision-making, et cetera. And along with all these comes the problem of, you know, operationalizing the data solutions themselves.

[00:03:53] Varun Chugh: I think you bring up a good point. So what are some of the operational challenges that you're facing and how are you trying to solve those?

[00:04:02] Sunil Saini: Sure. See, so some of the operational challenges that 

[00:04:06] in fact startups as well as very established enterprises face, 

[00:04:10] is that there is always a big team of data analysts and data scientists who are working on various 

[00:04:16] problems, right? They are analyzing data, day in, day out, or studying the cuts of the data, understanding the impact on the business.

[00:04:23] But what happens is that 80 to 90% of the time, all this work remains, you know, on their laptops or just in some emails. And the outcome is not actually realized in the business. The work is done, the analysis is done, the outcomes are there. Maybe you have also run a model and come up with the outcomes, but these are not integrated into the business processes.

[00:04:48] Right? And that's where the operationalization becomes a challenge: okay, you have spent so much time, you have spent so much effort in understanding or coming up with a solution for a problem, but still the solution is not implemented in the best way into the business process. And that's why the operationalization of such problems

[00:05:08] is a challenge for the industry itself.

[00:05:12] Varun Chugh: Well, and you brought up a valid point earlier about data quality. So you are facing data quality challenges. So, what are you doing to solve that? Are you looking at third-party data or are you trying to improve first-party data? What kind of

[00:05:25] solutions are you looking at from a data

[00:05:26] quality perspective? 

[00:05:28] Sunil Saini: What we do is, the data quality 

[00:05:30] problem we try to solve it in many 

[00:05:32] ways. One, let's say if there are 

[00:05:34] certain features or certain data points which are not available directly, how can you infer those, or how can you impute those with the other information that is available with you?

[00:05:45] Right? At the same time, for whatever information is not available directly in the system, we tend to partner with third-party systems, third-party applications, third-party vendors who provide the very important, you know, data points about our merchants, about the industry, to help our decision making.
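
The idea of imputing a missing data point from other available information can be illustrated with a minimal sketch. This is not BharatPe's method; the merchant fields, the `impute_by_group` helper, and the choice of a per-category median are all assumptions made for illustration.

```python
from statistics import median


def impute_by_group(rows, field, group_key):
    """Fill missing `field` values with the median of the same group,
    falling back to the overall median when the group has no observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    overall = median(observed)

    # Collect the observed values per group (here: per merchant category).
    by_group = {}
    for r in rows:
        if r[field] is not None:
            by_group.setdefault(r[group_key], []).append(r[field])

    filled = []
    for r in rows:
        value = r[field]
        if value is None:
            group_values = by_group.get(r[group_key])
            value = median(group_values) if group_values else overall
        filled.append({**r, field: value})
    return filled


# Hypothetical merchants; None marks an attribute that was never captured.
merchants = [
    {"id": "m1", "category": "grocery", "monthly_txns": 120},
    {"id": "m2", "category": "grocery", "monthly_txns": 80},
    {"id": "m3", "category": "grocery", "monthly_txns": None},
    {"id": "m4", "category": "pharmacy", "monthly_txns": None},
]
imputed = impute_by_group(merchants, "monthly_txns", "category")
```

A group median is only one of many possible choices; which signal is a reasonable stand-in for a missing attribute depends on the business context.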

[00:06:04] Varun Chugh: Got it. Yeah, that makes sense. Okay. Now moving on to the actions that you've taken. So there's a lot of complexity in the process when going from analyzing a business problem to creating an operationalized solution. So what are some of the early changes that you made, either process or culture wise, and how has your strategy evolved over time to increase productivity and effectiveness?

[00:06:29] Sunil Saini: Sure. See, what happens is, in order to solve such problems at an organization level, what I feel is that there are two kinds of approaches that one needs to take. One is a short-term tactical approach, you know, where you manage some of the very high priority items through quick implementations of solutions.

[00:06:49] And these solutions can be, you know, an insight derived out of some analysis that you do in Python or Excel. And that gets implemented as a rule in one of the existing business processes, right? Or it can be a batch model, which you run at the end of the day, and the outcome of that gets ingested into the system early in the morning, and then the decisioning is done, right?
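
The short-term pattern described here, an insight turned into a rule and applied in an end-of-day batch, can be sketched as follows. The rule, its thresholds, and the field names are hypothetical and chosen only to make the example concrete.

```python
from datetime import date


def high_value_rule(merchant):
    """Hypothetical rule derived from an offline analysis:
    active merchants with no chargebacks are eligible."""
    return merchant["txns_last_30d"] >= 50 and merchant["chargebacks_last_30d"] == 0


def run_nightly_batch(merchants, run_date):
    """Score every merchant once and return records ready to be ingested
    by a downstream system the next morning."""
    return [
        {
            "merchant_id": m["id"],
            "eligible": high_value_rule(m),
            "scored_on": run_date.isoformat(),
        }
        for m in merchants
    ]


# Hypothetical end-of-day snapshot.
snapshot = [
    {"id": "m1", "txns_last_30d": 80, "chargebacks_last_30d": 0},
    {"id": "m2", "txns_last_30d": 80, "chargebacks_last_30d": 2},
    {"id": "m3", "txns_last_30d": 10, "chargebacks_last_30d": 0},
]
decisions = run_nightly_batch(snapshot, date(2021, 9, 21))
```

The trade-off is the one raised in the conversation: this is quick to ship, but every decision is as stale as the last batch run.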

[00:07:13] So that's a short and quick way to achieve a very high value

[00:07:18] for particular business processes in a quick time. But such approaches 

[00:07:23] are good enough for a short time, but if an organization keeps, you know, continuing this approach, usually you will end up with a very large workforce

[00:07:33] who will keep analyzing the data, and the processes which get decisioned by it will always run at a lag. That's what I have realized. So, when we started our data journey itself, the question was how can we increase the data-driven decision making within the company.

[00:07:52] So we started with the short-term approach itself, right? A tactical one. And in parallel, a portion of my team started working on our strategic approach, where we started building the data platform for us. And this data platform not only serves as a, you know, single point or source of truth for the insights or reporting, but at the same time,

[00:08:14] also as a feature set, a feature data set for me and my team, which enables us to create models. This enables us to quickly create models, intelligence and insights. For example, this can be a recommendation or a fraud signal, or the next best action that we need to take for the merchant.

[00:08:34] For example, it can be shown, it can be proved, it can be processed, right? But this data platform helps us in creating these solutions very quickly. And at the same time, what we have ensured is that these solutions are directly integrated into the business process, without manual intervention. And that can be achieved through, you know, a standardized interface or APIs, where a lot of these inputs, the next best action, the recommendation, the fraud signal, et cetera, can be integrated through an API or similar interfaces

[00:09:09] within the applications directly. And that's how you can operationalize the intelligence that you or your team is computing, so that it can be directly consumed by the decisioning systems.

[00:09:23] Laurie Hood: Varun, I'm going to jump in. Sunil, quick question. So on the data platform, was that something that you evolved to kind of need as your team grew and the need of the business grew? Like what was the driver behind that? And how would you sort of gauge it? You know, y'all are a very, very sophisticated organization.

[00:09:46] Did you grow into that? Could you talk a little bit about the history?

[00:09:50] Sunil Saini: See, as all organizations do when they are on the journey of building the application or building the solution itself, right, as a startup, all the, you know, ways you try to solve things are ad hoc, right? And how you analyze the data, how you model your data, et cetera, is again, you know, done on a need basis, and it is always required within half an hour or so,

[00:10:16] when you want to, you know, check something. But these kinds of approaches, if continued for a long time, create

[00:10:29] an ambiguity 

[00:10:30] of the KPIs or the metrics themselves, right? If there are multiple people who are looking at it and they are trying to answer the same question, the answer comes out to be different, right?

[00:10:39] That's where we realized that we need a single source of truth for our data, where the data is structured in a warehouse. Right? In a scalable warehouse. And on top of that, we create our KPIs, the business KPIs, and we create our merchant attributes, the transaction attributes, et cetera, or you can say, at a generic level, entity-level attributes, and these entities can be various entities, right,

[00:11:06] in a standardized fashion. And now everyone consumes these KPIs or these pre-computed attributes for the reporting, for their analysis. So, this is why the thought of creating a single source of truth came into the picture, right? And once this was created, what we realized is that, "Okay,

[00:11:24] this becomes, for me, a very good and a quick way to train my models."

[00:11:29] Because at a single place, I have all the features of a particular merchant that I need, right? I know how many transactions someone did in the last six months, three months, two months. When was the last transaction time? The standard RFM metrics, et cetera, are available for me, and we are pre-computing them for various purposes.
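
As a rough illustration of the pre-computed merchant attributes mentioned here, the standard RFM metrics (recency, frequency, monetary value) can be derived from a transaction list like this. The record layout and the `rfm_features` function are assumptions for the sketch, not a description of BharatPe's data platform.

```python
from datetime import date


def rfm_features(transactions, as_of):
    """Compute recency (days since last transaction), frequency (count)
    and monetary value (total amount) per merchant."""
    acc = {}
    for t in transactions:
        f = acc.setdefault(
            t["merchant_id"], {"last": t["date"], "frequency": 0, "monetary": 0.0}
        )
        f["last"] = max(f["last"], t["date"])
        f["frequency"] += 1
        f["monetary"] += t["amount"]

    return {
        merchant_id: {
            "recency_days": (as_of - f["last"]).days,
            "frequency": f["frequency"],
            "monetary": f["monetary"],
        }
        for merchant_id, f in acc.items()
    }


# Hypothetical transaction log.
txns = [
    {"merchant_id": "m1", "date": date(2021, 9, 1), "amount": 100.0},
    {"merchant_id": "m1", "date": date(2021, 9, 15), "amount": 250.0},
    {"merchant_id": "m2", "date": date(2021, 8, 20), "amount": 40.0},
]
features = rfm_features(txns, as_of=date(2021, 9, 21))
```

Computing such attributes once, in one place, is what lets reporting and model training read the same numbers.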

[00:11:47] Right? So then it becomes for me a very quick way to train, test and deploy models, and these models are at various levels. And, you know, we had a centralized team, and these models get integrated into various business processes within the organization itself. And then on top of the data platform, what you can do is you can create your data science platform itself, right?

[00:12:13] So the data platform becomes the base for your data, and the data science platform is the playground for the data scientist. And at the same time, it becomes a way of one-click deployment of the model into the processes where you utilize it. You train your model, you study the feature importances, et cetera.

[00:12:31] You can version your model and then deploy it as an endpoint. And that's where the real power comes. When a model can be trained and deployed as an endpoint, then that endpoint can be consumed within the applications. And that's where the operationalization comes into the picture because, before this, we had a lot of EOD models, like end-of-the-day models, which you will run, the output is generated in CSVs

[00:12:58] and then those CSVs are loaded somewhere. And, you know, as the number of handshakes between the systems increases, the more time it takes to see the outcome and the more failure points there are in the system. So that's, you know, why operationalization is important: you can cut down all this and directly,

[00:13:18] you know, deliver value to the process. 
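
The contrast between CSV handoffs and a versioned model served as an endpoint can be sketched in a few lines. Everything below is hypothetical: `ModelRegistry`, `handle_request`, and the toy fraud rules stand in for whatever model-serving platform is actually used, and the handler is a plain function rather than a real HTTP service.

```python
import json


class ModelRegistry:
    """Keeps versioned prediction functions under a model name."""

    def __init__(self):
        self._models = {}

    def register(self, name, version, predict_fn):
        self._models[(name, version)] = predict_fn

    def latest(self, name):
        versions = [v for (n, v) in self._models if n == name]
        if not versions:
            raise KeyError(name)
        version = max(versions)
        return version, self._models[(name, version)]


def handle_request(registry, body):
    """Endpoint-style handler: JSON request in, JSON response out,
    so an application can consume the model without any file handoff."""
    request = json.loads(body)
    version, predict = registry.latest(request["model"])
    return json.dumps(
        {
            "model": request["model"],
            "version": version,
            "prediction": predict(request["features"]),
        }
    )


registry = ModelRegistry()
# Two hypothetical versions of a fraud signal; the newest one is served.
registry.register("fraud_signal", 1, lambda f: f["amount"] > 10_000)
registry.register("fraud_signal", 2, lambda f: f["amount"] > 5_000 and f["new_device"])

response = handle_request(
    registry,
    json.dumps({"model": "fraud_signal", "features": {"amount": 7_000, "new_device": True}}),
)
```

With a single call path like this there is one handshake between the model and the consuming application, instead of one per exported file.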

[00:13:21] Laurie Hood: No, that's great. I love your comment about the single point of truth, because we talk to so many of our customers that really struggle with that, and I could see why it would be important.

[00:13:33] Varun Chugh: And Laurie, we at Mobilewalla have faced the same issues, right?

[00:13:37] Exactly touching on that point. We have a small data science team, and tactically they're doing a bunch of things on their own, but we could make it more efficient by creating a common platform. You know, everyone reading off the same databases and doing their modeling on top.

[00:13:50] So absolutely, for optimization and to achieve scale, I guess you have to achieve operational efficiencies to meet business goals.

[00:13:59] Sunil Saini: Yeah,

[00:13:59] that's the same. There were a lot of business processes, like it can be in sales, right, or marketing, et cetera, which used to take three to four days of effort. Right? Even after generating the output from a model. And that effort would be, you know, spent by a lot of people massaging the data, uploading it somewhere, et cetera, et cetera.

[00:14:21] But what we realized is that once we started actually integrating into the business process itself, all that inefficiency is gone, and we can actually scale up the processes now. And the time to deliver an additional value becomes very small, so we can keep adding more and more attributes.

[00:14:43] We can keep adding more and more intelligent insights to this interface that we are creating. And that can be very easily consumed by the systems. 

[00:14:58] Laurie Hood: Sunil, I want to go back to, to an earlier point that you made talking about prioritization. You know, you're continuing to do all these things to increase the efficiencies for your team, your team's growing, but I'm guessing, you know, you become a victim of your own success and the organization continues to say, "Well, deliver this, deliver this, deliver this."

[00:15:20] So how are you looking at the initiatives that you take, what you're going to deliver, and how you're determining what to work on, when, and how to scale up?

[00:15:34] Sunil Saini: Sure. See, as I was saying earlier, in a, you know, high-growth environment, there are always numerous high priority requirements. Right? Everyone has their own priorities, and for them it is the thing to deliver, right? So that always becomes the case. But how we handle it, how we prioritize all these requirements, from our point of view, the best way is to align it to the organization's priorities,

[00:16:04] and see, okay, what are the goals of the organization, what do we want to achieve in the short term, in six months, in nine months, and the number of things that we have to solve for. Where is the alignment? Which one is the highest priority according to that, right? And also see which one can deliver the most value for the company.

[00:16:24] So let's say the company is in a phase where we want to give a lot of loans, right, and that's the growth phase for us, then we will spend a lot of our efforts on the credit decisioning part. And we will be deprioritizing the other things. But at the same time, if we want to, you know, do a lot of cross-sell, then we will prioritize the recommendation part of it for some time and then, you know, focus our efforts there.

[00:16:51] Laurie Hood: But you built this platform that's giving you the ability to be really agile. So that's going to help you to be both proactive and reactive to broader organizational demands. 

[00:17:05] Sunil Saini: Yeah. 

[00:17:07] Laurie Hood: Very cool. So when you think about your next steps, what are you looking at doing in the future with your team?

[00:17:19] You know, where do you see taking your platform, or other areas where you're investing, or how are you growing and scaling?

[00:17:26] Sunil Saini: Sure. See, what I see as the next step for me, for my team, is, my goal is to have a very small team. Right? A very small and niche team, but at the same time still scale at the organizational level. And that's where a lot of tools, a lot of strategic calls come into the picture. My goal is to automate a lot of intelligent decisioning. Right? So for example, we have a very big,

[00:17:58] you know, workforce on the street, which is reaching out to merchants and, you know, helping them use our QR, selling products to them, et cetera. So starting from there and throughout the customer life cycle, right? It's the prospecting, it's the onboarding, and then how they are utilizing our products, doing the cross-sell, doing the support: how can we embed the data-driven decision-making throughout the customer life cycle,

[00:18:25] right? And that too, in an automated fashion, because that's the only way we can achieve scale and we can achieve, you know, driving value for the organization in real time. Otherwise, as I said earlier, 

[00:18:41] that we will end up with a very big workforce, which will be inefficient in the long term.

[00:18:51] Laurie Hood: No, very interesting. Varun, do you have another question, any questions from your side?

[00:18:57] Varun Chugh: No, I mean, this all makes sense and we can relate to all of this as well. Being a growing organization, we are facing the same challenges on our end. And the last part about scaling is very important, that you can only do so many things with

[00:19:11] individual people; you need platforms to help

[00:19:13] scale the organization.

[00:19:14] Sunil Saini: Yeah, I think that's very important. 

[00:19:19] Laurie Hood: No, no, no, go ahead.

[00:19:20] Sunil Saini: So the thing is, the platforms are very important, because with the availability of so many cloud platforms, right, and with a lot of features, the scaling can happen very easily. And with a lot of MLOps also getting automated, right, you can configure that very

[00:19:37] easily. And at the same time 

[00:19:39] the training, et cetera, can also happen on the same platform.

[00:19:42] So all in all, earlier you literally needed three to four people just to manage the MLOps part for the model itself. And now that is not the case; it can be handled from a single system. There are cloud solutions which can automatically scale, and you

[00:19:55] don't have to manage them as well. And I think the platforms play a big role in that.

[00:20:00] Varun Chugh: Yep. And Sunil, to that point, I guess, with so much emphasis on AI and model

[00:20:06] building and so on, there are innumerable platforms coming up. Right? If you go to Amazon, you'll see, like, hundreds

[00:20:12] of products, some open source, some paid. So I guess, can you touch upon, how are you trying to evaluate some of these, you know, platforms and products in your data 

[00:20:22] pipelines?

[00:20:22] What kind of decisions, what kind of metrics do you look at to decide whether you use platform A versus platform B?

[00:20:30] Sunil Saini: See, one of the most important aspects is integration into the existing data platform, right? After a time, when you have big processes, you don't want to change, or don't want to shake up a lot of things. And that's why the integration with the existing technologies, existing tools, the existing platform that we are using becomes very important.

[00:20:58] But at the same time, it should have the features, you know, which enable a small team to deliver a lot of solutions. Which means, let's say there are platforms or applications which can help you automatically understand which one is the best model, right? If that solution is there, then I can save a lot of my time in understanding and identifying the best model for a given problem. Because earlier we used to spend a lot of time in understanding, okay, which one will be the best model, but with these solutions coming, people can identify the best model

[00:21:38] and anyway you end up spending a lot of time on, you know, doing the feature engineering, doing the data cleanup, et cetera. So the model training part can be taken care of. And

[00:21:49] so the integration into the existing 

[00:21:51] pipeline, the scalability or the automation that it provides 

[00:21:56] for the data science pipeline or the data science management framework is important.

[00:22:00] So those will be the two criteria. And obviously the third will be the cost, how much it will cost overall.

[00:22:07] Varun Chugh: Okay. And what are your views on open source platforms or 

[00:22:11] libraries or products? Does BharatPe also invest time in

[00:22:14] exploring those?

[00:22:16] Sunil Saini: We do, we do, you know, explore a lot of open source tools. For example, we right now are exploring a graph database and understanding how it can help us in solving some of the, you know, fraud detection related issues, et cetera. And we keep identifying, we keep exploring such solutions and see how these can be integrated into our business processes and help us make better decisions.

[00:22:46] Laurie Hood: Cool. So Sunil, now, how do you quantify the success of 

[00:22:50] your approach? Like what makes you at the end of the day 

[00:22:55] go, "Yeah, we were getting at." What's, what, how does, how do you think about your success and the success of your team?

[00:23:03] Sunil Saini: See, I think it's important to quantify success in business terms. Because then only as a, as a data science team

[00:23:12] you can see the actual value of the work that you are doing. And that in a way, also, as you were asking earlier, helps us in prioritizing, right? Because once you have defined success in business terms, for example, let's say for credit decisioning

[00:23:26] it can be an increase in the approval rate or decrease in the default rate. For recommendation systems it can be the additional revenue that you have generated, or for fraud it can be the amount of money that you have saved. Because once you start putting the value along with it, then you actually realize that, "Okay, this is the final, this is the best way to quantify it."

[00:23:48] If no other business term, you know, or

[00:23:51] dollar value can be assigned, I think another way can be to understand how many man-hours you have saved by automation, et cetera, because that can then be translated into dollar value finally. But it's important to, you know, quantify in the business terms finally.
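
The business-term measures mentioned here (approval rate, default rate, and man-hours saved converted to money) reduce to simple arithmetic. The sketch below uses made-up counts and a made-up hourly cost purely to show the calculation.

```python
def credit_metrics(applications, approved, defaulted):
    """Approval rate over all applications, default rate over approved loans."""
    return {
        "approval_rate": approved / applications if applications else 0.0,
        "default_rate": defaulted / approved if approved else 0.0,
    }


def automation_value(hours_saved, hourly_cost):
    """Translate man-hours saved by automation into a money value."""
    return hours_saved * hourly_cost


# Hypothetical before/after figures for a credit decisioning model.
before = credit_metrics(applications=1000, approved=400, defaulted=40)
after = credit_metrics(applications=1000, approved=500, defaulted=40)

approval_lift = after["approval_rate"] - before["approval_rate"]
default_change = after["default_rate"] - before["default_rate"]
saved = automation_value(hours_saved=120, hourly_cost=25.0)
```

In this made-up case the approval rate rises by ten percentage points while the default rate falls by two, which is the kind of statement a business stakeholder can act on.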

[00:24:09] Varun Chugh: I love the focus on dollars. 

[00:24:11] Laurie Hood: That's a great point. Great point. So as we wrap this up, three takeaways when you look at what you've been able to accomplish, three takeaways for our listeners, maybe thought provoking or things they could look at doing in their business today.

[00:24:34] Sunil Saini: Having a good understanding of organization goals, for the data science team, or in fact for any team, will help in navigating, as I was saying, you know, the day-to-day work itself, because if you are clear on those priorities, then you can actually prioritize your work. And

[00:24:52] most of the time you end up having a lot of work in a startup and you need to prioritize, right?

[00:24:58] You need to see where exactly will be the most value. And I think the second will be, people need to understand there is always a trade-off between the, you know, best solution for a problem and a quick solution that can

[00:25:15] have a very high impact, right? Well...

[00:25:17] Varun Chugh: Perfect, perfect is the enemy of 

[00:25:18] progress. 

[00:25:19] Sunil Saini: Yes, it is. It is. 

[00:25:20] Varun Chugh: I guess, you're drawing to that point.

[00:25:22] Sunil Saini: There is not always a need to build a model to solve the problem.

[00:25:27] If it can be solved with a simple rule, go ahead with it. Implement that, see the impact, and then make that rule more probabilistic, right? Make it more intelligent by creating a model, but you don't have to create a model by default. And I think that's what a lot of, you know, new data scientists who are just coming out of college or have done a course do not understand: it's important to come up with a solution, and it need not be a model-based solution.
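
One way to picture the step from a simple rule to a more probabilistic version is to replace a hard threshold with a smooth score around the same threshold. The limit, the scale, and both function names below are invented for illustration; in practice the parameters of the probabilistic version would be fitted from data rather than set by hand.

```python
import math


def rule_flag(txn_amount, limit=10_000):
    """Starting point: a hard rule that flags anything above the limit."""
    return txn_amount > limit


def probabilistic_flag(txn_amount, limit=10_000, scale=2_000):
    """Softer version of the same rule: a logistic curve centred on the limit,
    returning a score between 0 and 1 instead of a yes/no answer."""
    return 1.0 / (1.0 + math.exp(-(txn_amount - limit) / scale))


at_limit = probabilistic_flag(10_000)
above = probabilistic_flag(12_000)
far_above = probabilistic_flag(14_000)
```

The rule can ship first and have its impact measured; the scored version only becomes worth the effort once the rule has proven that the signal matters.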

[00:25:57] Laurie Hood: That's great perspective.

[00:25:59] Sunil Saini: Yeah. After that, I think the third point 

[00:26:02] can be that 

[00:26:03] whenever you are trying to solve a problem using data science, try to think of the end-to-end operational solution, right? As I'm emphasizing on the operationalization of the problem, because you can always create a solution which is, you know, a broken one or which works in a silo.

[00:26:23] And then you go and try to, you know, integrate into the business process or something. But if you, from the get-go, try to create a solution which integrates with the business process in real time, right, that will be great. Because if you are able to achieve that, then the batch or the other cases are automatically solved, and that can lead to great scale in the longer term for you.

[00:26:47] Laurie Hood: Fantastic. Thank you so much. Thank you both for joining us. For those of you listening, I'll let you in on a little secret that Sunil and Varun are actually good friends from college and university. And as a condition of Sunil's participating, Varun had to join us as well. So, Varun, you can consider yourself on the hook for a subsequent podcast. But wrapping up, Sunil, I cannot thank you enough

[00:27:16] for sharing your thoughts. This was so interesting and educational. And for someone in your position, with your experience at a company that's experiencing tremendous growth, you know, we're so fortunate to have you join us. Varun, thank you as well. And to those of you listening, thank you for joining Data Point of View and please continue to follow us for new episodes.

[00:27:45] Sunil Saini: Thank you. Thank you, Laurie. Thank you, Varun. Thanks for... 

[00:27:47] Laurie Hood: Thank you. 

[00:27:47] Sunil Saini: ..inviting me to the podcast.

[00:27:50] Laurie Hood: Thank you, guys.