Catch the insights from our recent LinkedIn Live session held on November 9th: "From Battlefield to Data Fields: Integrating Military Precision with Data Science," featuring Hanh Brown & Nirmal Budhathoki. This engaging discussion showcased how military strategy's structured approach can significantly enhance the realm of Data Science.
In this session, you'll learn how the structured approach of military strategy can be applied to the field of Data Science. We'll delve into the balance between ideal data practices and the realities of their applications in the real world. The discussion will cover the challenges and opportunities found at the crossroads of Security and Data Science, providing insights into the role of Data Scientists in various sectors.
This isn't just a dive into Data Science; it's a comprehensive look at how the meticulousness and discipline from military experiences can enhance data analytics. You'll get a clearer picture of what Data Science really entails, especially in areas involving high security. Our speakers will share their knowledge on the tools, techniques, and ethics shaping the future of this field.
Whether you're just starting in Data Science or are a seasoned expert, this session will offer valuable perspectives on collaboration and its role in successful Data Science practices.
This talk is more than an information session; it's a guide for those aiming to excel in Data Science. Tune in to gain insights that could redefine your approach to data.
Follow Nirmal on LinkedIn: https://www.linkedin.com/in/nirmal-budhathoki/
👉 See our Website: https://podcast.boomerliving.tv/
🎙 Boomer Living Podcast: https://hanhdbrown.com/
👉 LinkedIn: https://bit.ly/2TFIbbd
Hello, welcome to Boomer Living Broadcast. I'm your host, Hanh Brown, also the founder of AI50. As we approach the year's end, we're excited to bring you a lineup of remarkable thought leaders, concluding the year on a high note. After a reflective summer and fall, where we took some time to regroup and
refocus, our schedule has been packed with a series of engaging speaking events. The highlight for me was NIC, the National Investment Center for Seniors Housing & Care, where I had the opportunity to discuss the integration of AI in senior living. Behind the scenes, we've been actively strategizing a unified AI ecosystem for senior care.
This is through AI50, in collaboration with Microsoft for Startups, democratizing AI and providing enterprise AI capabilities, without the heavy price tag, to all the sectors involved with the aging population. So imagine a unified AI ecosystem. It's not just about cutting-edge technology; it's about creating a network
that amplifies the ability of businesses to deliver exceptional care to seniors. With AI, we can anticipate health issues before they become emergencies, personalize care to each individual's needs, and streamline operations so caregivers can focus on what they do best: caring for our seniors. This isn't just about the present; it's a long-term vision.
And by harnessing the power of AI, businesses in the aging sector can evolve, becoming more proactive, predictive, and personal in their services. So stay tuned as we explore these themes and much more with our guests. Today, we have the pleasure of introducing Nirmal Budhathoki, whose life story is as captivating as his work. Nirmal grew up in Nepal and has
become a star in data science. Imagine climbing a mountain; that's what Nirmal did, but in his career. Starting out during a tough economic time, he didn't give up. Instead, he joined the U.S. Army, and that experience shaped him into who he is today: a top expert at Microsoft, making sense of
complex data to keep us safe online. But there's more. Nirmal also teaches others, helping students and professionals get better at understanding data. He's a go-to person for advice, or when you need a helping hand in the tech world. So sit tight as we chat with Nirmal, a true guide and guardian in the journey of data science.
So Nirmal, welcome to the show.
Thank you so much. That was a great introduction. Thanks for covering everything, not only my military career but also the mentoring work that I have been doing; I usually like to highlight that as well. But you covered everything, so thanks for the introduction.
Absolutely, I'm happy to be here.
Yeah. Well, thank you. Thank you so much for your time. So share with us, I know I captured just a little bit, but share with us something personal about yourself that maybe many people might not know.
Yeah, definitely. So, as you mentioned, I grew up in a very small town. Some of the personal things, you know, come from the conditions you grow up in, in a small country; Nepal is still a developing nation.
One thing is there's a lot of talent coming up there, and most of their goal is to come here and do their higher study, whether it be undergrad or grad. My thing is, I never realized that I would be a data scientist or work in the data field. But if you look back, a lot of the work that we do in our daily life somewhere
kind of resonates with data science. You know: is it going to rain tomorrow, or is it not? Even the farmers are doing data science in their daily life, right? So for me, growing up in a small town, if I look back, my personal secret is that I was actually playing
with data from childhood. Like when I was playing games, trying to get a better hand; back home, during festival time, there are quite a few card games. So now I realize why I was calculating those kinds of stats in my head, calculating my odds, you know.
So I think data science was there since childhood, and it's probably true for many people. They don't realize it, right? But it surrounds us.
Absolutely. I see it as equivalent to electricity, you know, how it's becoming very integral to our daily routines. And of course, when we talk about AI, it's hard not to mention the recent OpenAI updates, and there are many. So what are your thoughts on their new changes? For example, GPT-4 Turbo, the Assistants
API, and now a simpler interface without a model picker, and so forth. What's your take?
Yeah, yeah, that's a good segue. So data science has been seeing a tremendous wave of improvements. I still remember when computer vision first came in; it drew the same kind of rise in interest from people. Right now, even at a couple of recent
conferences I went to, I would say every talk was based around that: LLMs, the large language models, and GPT models like you mentioned. My take, in maybe a few words, is that we are heading in a direction where everything is mostly natural language processing. If we look into many systems
surrounding us, like smart home systems and voice assistants, we are building tools in a way that helps human life or makes it easier: tracking all the transcripts and converting the text into some sort of understandable insights or metrics. So I'm thinking that's where GPT is going to shine; it's going to bring a lot of value.
In one of the talks, I heard someone say the next big programming language is going to be human language, which is obviously your natural language.
it's all about prompting.
It's all about prompting.
It's not Python.
Right; Python, it will create for you.
SQL, it will create for you. Any language is going to be secondary.
Yeah, but the commanding language is English, you know?
Yes. That's, that's amazing. Right?
So I think at the end of the day, no one has to be high-tech or an expert in coding to get something done, right? And that's the power that these large language models are bringing.
I agree. Yeah. You've got to get good at prompting: be directive and very clear on your, I guess I'll call it, commands. So in terms of the recent OpenAI updates, I see that customer service using GPT-4 Turbo provides smarter features now, and you can save costs
with more affordable AI services. And of course, you can give customers a simpler help system through ChatGPT's interface. Those are a few things. And then you can also answer complex customer questions more effectively with GPT-4 Turbo's expanded memory.
Right? And now we've got this Copyright Shield. That's huge, because many businesses may be shy about getting onto AI; hopefully this will assure them, right, with the Copyright Shield. There are many updates, but that's great. I was tuning in and still having to listen to it again just
to make sure I heard it right.
Yeah. No, I think you said it properly there. Right now it's still in a phase where things are ramping up, right? A lot of startups are being founded, and the open-source market is heating up. As
you mentioned in the beginning, it's like electricity, or sometimes I even compare it with the telecom industry: when we had only one cell phone provider, the competition was low, and there was still huge value because people were connected through cell phones or other communication, right? But
once there are many players in the market, the competition gets a little higher, and for customers, the competition is better, because, like you mentioned, GPT is coming out at a better rate, since they now have to recognize that there are open-source market players as well. So definitely, at the end of the
day, with more companies doing it, more players in the market, I think it's better for customers. And we always have to embrace the open-source market.
So, with a background as distinct as yours, particularly from the military, I'm curious about its influence on you. How did serving in the U.S. Army shape your approach to data science?
Yeah, that's a good question. So in the military, as I look back on my career now, at that point I made a choice to go to the military because that was one of the options I had on my plate. But looking back now, that was probably one of the best decisions I made in my career; it not only shaped my career but also
how I've grown professionally. A lot of things that we do with attention to detail, you know, the leadership skills, the teamwork; these may look like soft skills, but in the tech world, most of these soft skills resonate very well
in your daily work, right? Because you have to take leadership or ownership of a project, for example, and then figure out how you fit in and plug in with the other leads, other project managers, and product leaders. In the military, we are always taught in a way that, hey, you follow the instructions, but at the same time, try to use your own common sense to
figure out what is right versus wrong. So a lot of these things go a long way. But if I have to tie it back to data science, I think we usually come back to accuracy and precision, things like that. In the things we do in the military, precision matters; we have to achieve higher accuracy in
anything we do, right? Especially on any of the missions, those are a few of the factors being tracked. The other thing that is tracked well is deadlines, right? There's no slipping the turnaround on some of the projects or missions we do. And that kind of serves me well; even
when I talk to other prior-service military members at Microsoft, they also bring up those kinds of things: not only in data science but in any other role, they want to track all the attributes that matter to make a project a success from start to finish.
So I think the project management life cycle is one of them, and then doing things with the highest level of accuracy we can offer is also important in data science. Those are the two
major things that shaped my career from the military to data science, along with the other soft skills I mentioned.
Mm hmm. Very true. Discipline, structure, accuracy, coordination, and I think mostly respect and order, right? That defines life, and it sets you up for such a great launch in life once you have all that instilled from the military. So that's awesome.
All right. So moving from such a structured environment into the tech industry must have come with some hurdles; transitioning between fields is often a complex process. Can you talk about the challenges you faced while moving from the military to a data science role, and what you did to overcome them?
Yeah, definitely. The first challenge, I think, when I was coming out, was asking myself: am I ready for the outside market? Did I do enough investigation; should I continue my military career, or am I ready for the job market? That was the first challenge I started with. When you are in the institution, you get
used to the culture, used to your daily routine, right? And as much as we understand how the outside market works, especially in the tech world, we don't get much exposure, depending on your own interest level. I was interested in transitioning my career to tech; that was my goal in my mind from the
beginning, so I used to always look around at what was happening. But I think the first challenge I had was that, even though I had a higher degree in the computer science field, the more time you stay away from that field, the more there is to catch up on, because in the military we have to do a lot of other things, depending on whatever
criteria the military requires of you. I was at least lucky to work on some projects that dealt with data, but it was nothing close to what you'd expect in the market, to what data scientists or other folks in the data world do.
So for me, the skill bridging was something challenging. One thing is just to identify where I'm at currently and what gaps I need to fill. And as you know, data science is a fast-moving field, so it's always a moving target; even with the continuous learning plan I have,
there's always a gap, right? That was my first and main challenge. The way I overcame it was to be proactive about looking at what kinds of tools and requirements each job category needs. And I already had the mindset that I would start my career as a data analyst.
I wasn't going to go to data science directly; that would have been a steeper hill to climb. I would rather take it slow, one step at a time. So I prepared everything I could for a data analyst role, and that's the role I got picked up for and
transitioned into. I think that was the best decision I made, because as a data analyst I was in the industry, working in the data field. That gap was getting closer, and my goal was just to close the next gap, to data science.
Great. Great. So at the heart of security is the ability to detect when things don't quite fit the pattern. What are the key challenges in building anomaly detection models at Microsoft Security, and how do you tackle them?
Yeah, that's a good question. I think anomaly detection has always been one of the most popular, or even most challenging, problems, I would say. When you work in security, it becomes a little more challenging for a few reasons. One is that the data is vast:
in almost every industry today we deal with big data, but in security, the big data becomes much bigger, in a way. A lot of devices are sending system logs, creating data in every fraction of a second. Now, what is genuine behavior versus anomalous behavior? Sometimes it's
not easy to figure out. However, because of these new advancements, we are able to do some good work. When I was first working on anomaly detection, one approach was: as much as the statistical approach matters,
I could just use the three-sigma rule, or another standard deviation rule, to clear out the first low-hanging fruit. Then, for us, precision matters. So when we go to figure out whether some anomalies are real, there are two things I have found very useful. One is the feedback mechanism:
your model is only as good as how continuously you retrain it and bring the feedback back in. There's no model that starts out very well in the beginning, so the more feedback we can get from the human, the better; once we get better data, the model will start to perform well. That's one thing we did.
The other thing in anomaly detection is that behavior matters more than anything, because identity can be compromised. I always give this example: if I lose my house key, someone else can use that key to come in. The lock is doing its own thing, right? The lock wasn't broken; they can get in.
But once they get in, their behavior, the way they move around in the house, or whether they are looking for high-value assets, will probably deviate from the normal behavior I exhibit on a daily basis. So capturing the baseline matters when it comes to anomalies. The baseline is the historical
behavior; in this example, what I usually do in my house. So those are the two things that have helped us. One, feedback is important: you have to keep retraining and fine-tuning your model; it's not going to start off well. The other is to create a good, strong baseline based
on behavior and pattern, right? We call it user entity behavior; it can be applied to device entity behavior, it can be applied to many things. Yeah, those are the things.
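The three-sigma idea Nirmal describes can be sketched in a few lines. This is only a minimal illustration, not Microsoft's actual pipeline; the sign-in counts and the choice of three standard deviations as the cutoff are assumptions for the example:

```python
import statistics

def three_sigma_anomalies(baseline, new_values):
    """Flag values more than three standard deviations from the baseline mean."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return [v for v in new_values if abs(v - mean) > 3 * stdev]

# Baseline: a user's typical daily sign-in counts (illustrative data)
baseline = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6]

# A burst of 40 sign-ins deviates sharply from the learned baseline
print(three_sigma_anomalies(baseline, [5, 6, 40]))
```

In practice, the baseline would be rebuilt continuously as analyst feedback relabels false positives, which is the retraining loop described in the conversation.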
Okay. Great. Great. Now, there's a lot of noise about what data scientists do, and I think, I could be wrong, but in one of your LinkedIn posts you mentioned that some popular beliefs about the data science role are false. So explain to us: what would you
say is the most misunderstood aspect of a data scientist's job?
Yeah, this is a good question. I get this a lot: hey, I'll get a data science role, and building models will be my daily job. I wish that were the case. That's a part of the job, but it's not everything. People think that data cleaning
or data engineering is not their role; that is the biggest myth, or the biggest lie, I have heard. And I'm helping people understand these definitions, not by trying to clarify what the definition of data science should be, but by helping them in their careers: hey, if you have a misconception about data science, landing the job and then
not meeting your expectations isn't because of the data scientist title; it's because you started with false expectations, right? That's something I wanted to clear up. There's going to be a lot of data cleaning work. If you are in a startup environment, they
may not have enough budget to hire specific titles for data engineering, so you may be wearing multiple hats to own your own data. This is where I tell data scientists that SQL is not only for data analysts. That is another big misconception in the market: hey, as long as I know Python, as long as I know
the machine learning frameworks, I should be good to go; why should I learn SQL? But in my own experience, first we need to prove that any project is worth putting into production. Is it generating any value? Everything starts with the proof of concept, then the MVP. During the proof-of-concept stage you are on your
own; the company is not going to assign additional resources or additional engineers to help you bring in the data and do everything. At that stage, all the skills will help you out, because you need to prove the value of the project before it can move into the production stage, where they can assign resources afterwards.
Initially, though, it's on your own. The other one is that some people think a PhD, or an advanced degree in data science, is mandatory. Some of the folks I talk to ask: is it even the right career for
me, and why is it so competitive? I tell them that those are not mandatory. I have talked to folks coming from various backgrounds; one of the beauties of data science is that it welcomes many different backgrounds. Every industry needs data science people.
Those are some of the misconceptions out there. It's not how it's promised by the title; it's how it's interpreted by people, and that becomes a pretty big gap.
Yeah, I agree. I think roles are evolving so fast, right? What I thought of a role a year ago has been changing, and it will continue to change. I remember when my kids were younger; they're 26, 23, and 21 now. Just think of yourself as a lifelong learner, regardless of what profession you're in.
And then be a problem solver: think of developing yourself and solving problems. I know it's very high level, full of philosophy, but it's true in all demographics and all fields: solve problems, keep learning, and develop yourself by whatever means, whether that's data science, art, and so forth. So, in developing models,
aligning with business objectives is crucial, and ensuring that models serve their intended business purpose requires a clear understanding. How do you ensure that the data models you develop at Microsoft are grounded in business understanding?
Yeah, this is another great question. I love to touch on this for various reasons, because when we learn data science, when we practice it, when we build our skills using publicly available datasets, Kaggle competitions, or anything like that,
one of the things we lack, and even I look back on my own learning journey and wasn't much aware of it at that point, is how business metrics fit in. We evaluate our model with model performance metrics, obviously aiming for a high level of accuracy, precision, and recall if it's classification, or whatever other evaluation metrics are
designed for a specific algorithm. We are only focused on that; we are completely blind to the business metrics. But when you get to the real world, especially at Microsoft or in other roles I've had, the first thing we do comes at the initial stage of problem definition.
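For reference, the model-side metrics mentioned here (precision and recall for a binary classifier) reduce to simple counts; a quick sketch with hypothetical labels, not real production data:

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical ground truth vs. model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)  # 0.75 0.75
```

As the discussion points out, hitting good numbers here says nothing by itself about business value; that requires the business metrics captured at problem definition.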
I prefer to capture the business metrics, the success criteria or exit criteria, whatever we call it. The key performance metrics are not based only on your model's performance; they have to be based on some sort of impact you're going to create. Are you ultimately going to save time for the analyst? If I'm building a model that
does some sort of email classification, for example, where emails are cases that have to be routed to a specific team, then how much time am I saving with this auto-triage machine learning algorithm, right? We can calculate that; everything depends on different scenarios.
Some cases may take longer to process; some may take hours to process. When we build that model, there are going to be model metrics, for sure, and we will set up thresholds, some acceptable range for those, but then the business metrics matter a lot.
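The time-saved business metric for the auto-triage example can be estimated with simple arithmetic. The case volume, triage rate, and handling time below are hypothetical, purely to show the calculation:

```python
def analyst_hours_saved(cases_per_day, auto_triage_rate, minutes_per_case):
    """Estimate analyst hours saved daily by auto-triaging a share of cases."""
    auto_triaged = cases_per_day * auto_triage_rate
    return auto_triaged * minutes_per_case / 60

# Hypothetical figures: 500 cases/day, 80% auto-triaged, 3 minutes each
print(analyst_hours_saved(500, 0.80, 3))  # 20.0 hours/day
```

A success criterion could then be stated in business terms ("save at least 15 analyst hours per day") alongside the model-metric thresholds.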
The reason is that you may be able to build a highly accurate model, but if it is not serving the business purpose, there's no point, right? A lot of people don't do this during the problem formulation stage, and then sometimes it's already too late; you don't want to go down the route of building the model, evaluating, retraining, and reiterating if it's
not creating much business value, right? So yeah, in my experience, business objectives are what matter most for the success of your ML models, or any project, and you have to be very proactive to make sure you bring them in. Another mistake I've seen people make: every
project starts with requirements gathering, right? Once you have that done, you start building a model, and then, regardless of how much we call ourselves agile, how much we're using CI/CD or the latest MLOps pipelines, I have seen folks
just get caught up in building the model and improving the accuracy until the last stage. They are not bringing the stakeholders to the table. I would rather bring them to the table and have frequent updates with them: this is how it's performing.
Right. Feedback has to be integral.
Yeah, that's key.
Yes. Yeah. And that's not just the definition of agile. If I'm following an agile methodology to build a model, you know, continuously retraining it, bringing in more data, data augmentation and all, the feedback should be the outer layer
that covers everything. And I think I will never feel shy about bringing it in, or feel like, hey, this may hinder my progress on the project. Rather, I'll take in any feedback that I get. But you have to find the right balance, too.
And that is where a lot of it comes down to data scientists and product managers working hand in hand. If you are surrounded by a lot of feedback, some of it could be sidelined for now; otherwise it's already hindering the progress of the project, right?
So you just have to find that perfect balance, which is always hard. It's not easy to bring stakeholders to the table, but at the same time, how do you figure out which feedback is helping you now versus later, right?
No, that's key, because I always think of it in these terms: we're not in the business of building a model, right? The whole purpose is to solve problems, and anything you do has to contribute toward solving that problem. All the key stakeholders for that problem or solution need to be
integral in that upfront conversation. And I'll share a recent example. Yesterday, we had some automations running that I thought were pretty good, but my gosh, they were producing garbage. We couldn't figure it out, until I checked
Twitter: the API was down. Who knew? It was down for a couple of hours; I'm sure you probably know that. As a result, we were picking and poking: what's wrong with your automation? What's wrong with your prompt? We were kind of pointing fingers.
But we never thought that, right after the DevDay, the GPT-4 API was actually down for many hours. So I guess what I'm saying is that it's a synchronized system: it doesn't matter if you've got a great prompt, or a great automation, or whatever; all the contingencies you
have need to work in harmony, and it all has to produce value at the end.
Right. Yeah. The model is just one piece. Yes. Right. I agree. But some people say, well, my model is good;
maybe it's your problem, you know. But we all have to be in harmony. Yeah. Exactly. This is where, as I was saying at the start, you have to look at the big picture and the business objectives, how it's lining up with your other objectives, right?
The model performing well, even at its best, is not the only solution we're after, because ultimately we have to think about all the factors. Yes.
Mm hmm. Very true. So the human element in tech is something that's gaining more attention, especially emotional intelligence. Emotional intelligence is often highlighted as a key skill in the workplace. But what role does emotional intelligence really play in a
successful data science career, and how do you go about cultivating it?
Yeah. So this is another key area. A lot of us, especially in the tech world, face a stereotype that you have to be highly skilled mostly on the tech side. In fact, I usually don't even like to use the word "soft" for soft skills, right?
Because everything is important, right? So how do we define it? But the industry definitions are there. However, emotional intelligence has in fact played a key role in my own career, because, especially once you
are at a career stage where you are going through a lot of things, right? Yeah, sorry, did I...?
I was. Oh, okay. You hit me. Okay. Yeah. So. You're emotionally stressed, right? So that's one thing like I usually like people in in the career, the eight hours of work doesn't
define their life cycle, right? They they are going through other things, right? Like, obviously, your your work life. You know, matters, right? So how much time you have for, for your kids, uh, things like that. And also the other factors like, Hey, uh, how do I balance the stress? You know, uh, how do I handle this?
One thing that the military has taught me is some sort of patience. Sometimes we are wrapped up in a "move fast, fail fast" style of doing the work, but at the same time: are you at 100 percent? That's the first question I usually ask. Am I giving 100 percent of everything to this project? If not, then what are the factors hindering me? Take care of those factors as well. One thing I suggest to young folks especially, if I look back at my own younger version, is that I used
to just sideline this emotional intelligence and focus more on building my tech skills, delivering the project on time, trying to get a raise, trying to get that promotion in time, a lot of stuff. But I might have been missing the chance to develop other skills that have to be developed naturally over time. You cannot just decide, "Hey, I want to be very patient today," and not tomorrow. Those kinds of skills, like stress management, having patience, understanding your team culture: are you doing enough to fit into the culture?
Sometimes we as human beings tend to say, "Hey, I am not a good fit for the team." But have you done your part? What does the team culture look like, and how do you fit in there culturally? You don't want to make yourself the odd one out for your own reasons, right? So a lot of factors actually come in when it comes to emotional intelligence. And this is one area where AI probably won't help.
Yeah. Well, you know, for me, I felt like I had to upskill my emotional intelligence when learning how to prompt, because I used to think things were linear. If it's a formula... well, it isn't. To become better at prompting, you really need to think 360 degrees around how to arrive at your solution. So I had to learn a lot in developing my emotional intelligence to become a better prompter. Again, people talk about how AI could replace your job and so forth; I guess to each his own. I'm sure it's very possible, but I think if you use AI to develop yourself, in this instance your emotional intelligence, it can increase your value in all areas of life, not only in the marketplace, right?
Yeah. One thing I would like to quickly throw in there: one factor of emotional intelligence that I always find fascinating, and that is very important not only for building your own emotional intelligence skills but is also being integrated into every product we build, is empathy. Microsoft also contributes a huge role toward that, with empathy being one of the emotional intelligence factors. You always have to take a step back and think from the perspective of everyone else on the team. You have to have that empathy. And it actually resonates when you're building a product, whether it's an AI-related product or not. Whatever you are building, the ultimate question is how you are going to solve some problem, whether it is a customer-facing tool or something that the disabled community is going to use. You are probably building a lot of innovative tools, but if you do not factor in empathy, I think it's not going to work out well. So empathy belongs in your product as well as in your own skill set, as an emotional intelligence skill. So.
That is, yeah.
Wholeheartedly agree, especially if you are creating products for older adults, let's say 50-plus; there are considerations you need to take into account. So, very much so.
Yeah, your whole set of requirements changes based on your consumers. The product doing something doesn't mean that you built it right. The product is doing something, but is it doing it the way people want? The way they like it? The way they can embrace it? All these factors matter, right? So this is where empathy plays a big role.
Mm hmm. I agree. So the integration of data can be as complex as it is transformative. Data comes in all shapes and sizes, and sometimes from multiple sources. Can you share an example of a project or a situation where you were integrating multiple data sources, one that proved to be both challenging and rewarding?
Yes. I can actually talk about my first data science project when I joined Wells Fargo, with the official title of data scientist. We were on a project where, like I mentioned before, your data engineering skills come in handy. When you are building the initial proof of concept, you are trying to showcase that this is an ML-solvable problem, right? We discuss the problem in the room, we come out of the room with the stakeholders, and then we ask: okay, how are we going to solve this problem, and what data sets do we have? So the first thing we did was a brainstorming session: imagine an ideal world where you have all the data sets, and forget about what data we actually have or not. If we start from what we have, we are already biasing ourselves; we may not be thinking of all the hypotheses. So the first step is a clean slate: if you had all the data in hand right now, what hypotheses do you think would help us make this prediction better? The project we were doing was trying to predict whether a loan will default or not, right?
So we start thinking about various data sets: the transaction data, the credit bureau report for the consumer. Then the next step comes: how do you get this data? One of the challenges I had, especially in a big company like Wells Fargo, and in my experience in many other companies too, is data ownership. Everyone maintains the data they produce, and sometimes it is not well integrated; it's siloed. I think that's why the data lake companies are doing well in the startup market, like Snowflake and Databricks. They realized that was the problem, so they came up with a solution: hey, we can bring you the analytics workspace, we can help you build that integrated platform. But when you don't have that, or when things are not that well integrated, data stays siloed.
The challenges are then things like connecting with those individual owners to set up the data contracts. Some of the data could be PII-related: how do you make sure you are anonymizing or redacting it, and how are you using it? You don't want customer information exposed, because whether we're building the POC or the model itself, there's a lot of data that is not relevant for the model, and we don't really have to bring it into our system where it could be exposed, right? So I think
there are various tools we used, and that's when we brought everything to one integrated spot. I think we used PySpark to run the batch-level jobs, every night or at whatever time frame we set up, to orchestrate them. We touch all the data sources we need to. After creating the contracts with the owners, signing the proper paperwork, and setting up the right access, we make sure the data we bring lands in a secure location with role-based access, so that only the data scientists working on this project can get to it. And then that data needs to be cleaned. What keys can I use to join these multiple tables? Everyone has their own definitions of attributes; the way they define certain keys or fields is different. So many challenges were around data formats and data types, even after bringing the data together. The data transformation and cleaning is actually, I would say, the heaviest lifting a data scientist will do. And if that is not done properly, whatever model you build in the successive stages is going to fail, because ultimately it's garbage in, garbage out, right?
So that's true for everything.
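The key-normalization and join work described here can be sketched in a few lines of pure Python. This is only an illustration under assumed, hypothetical field names; the actual project used PySpark over siloed enterprise sources at a much larger scale.

```python
# Two siloed sources describing the same customers, with inconsistent
# key formats and attribute definitions (hypothetical field names).
transactions = [
    {"cust_id": " C-1001 ", "monthly_spend": "2500.50"},
    {"cust_id": "c-1002",   "monthly_spend": "900.00"},
]
bureau = [
    {"CustomerID": "C1001", "fico": 710},
    {"CustomerID": "C1002", "fico": 640},
]

def normalize_key(raw: str) -> str:
    """Strip whitespace, uppercase, and drop separators so keys match."""
    return raw.strip().upper().replace("-", "")

# Index one source by its normalized key, then join the other against it.
bureau_by_key = {normalize_key(r["CustomerID"]): r for r in bureau}

joined = []
for t in transactions:
    key = normalize_key(t["cust_id"])
    b = bureau_by_key.get(key)
    if b is None:
        continue  # unmatched keys are a data-quality issue, not a crash
    joined.append({
        "customer": key,
        "monthly_spend": float(t["monthly_spend"]),  # fix type drift at the boundary
        "fico": b["fico"],
    })

print(joined[0])  # {'customer': 'C1001', 'monthly_spend': 2500.5, 'fico': 710}
```

At scale, the same idea becomes a join over normalized key columns in a PySpark batch job, with unmatched rows routed to a data-quality report rather than silently dropped.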
Yes. That's very true. So now, keeping up with the tools of the trade is part of any job, and the toolkit for data scientists is always expanding. How important is it for data scientists to stay updated with new libraries and tools, and how do you do it?
Yeah, this is a very good question. I get this a lot from my mentees on mentorship calls. They say, "Hey, this is a little overwhelming; how do I keep up with this train?" And I always tell them there is so much we could learn, but you don't want to flood yourself. I actually read somewhere in one of the books that multitasking is a way to do everything wrong at the same time. There are some cases where multitasking helps, but if you spread your focus across too many areas, in a way you are limiting yourself in each specific area as well.
So what I suggest, and the same thing has been working for me, is: keep most things as information rather than distraction. I may not be able to chase everything, and I may not have to, because it depends on the area of specialization I want to focus on. Depending on what your area of specialization or focus is, and what you think will be valuable, come up with your own subset of it. Don't go after the superset. Being up to date at the informational level is always good, because you are working in a data industry where tomorrow GPT-4 may release some new feature that none of the other models cover.
It's good to know how that kind of feature works at the surface level, but you don't really have to commit yourself to learning it right away. Whether you are ready to learn it right away depends on whether you are building your skills in a linear way, because a lot of this is foundational. You have to have enough foundation. If someone is just learning classical ML today, I'm not going to suggest they jump straight into LangChain, setting up agents, and doing the new GPT stuff. I will probably suggest: hey, once you finish classical ML, go take some classes in natural language processing, because the base foundation stays the same: how words are tokenized, what embeddings are, how embeddings work. You need to learn all those basics, and from there you can start learning what a transformer model is and how it works. And then you go to the GPT world. Don't go from classical ML to the GPT world directly; that is the biggest mistake one can make. That is why you set up your learning journey, or roadmap. And everyone's roadmap should be different. That's why
it has to be personalized. I usually tell people to do a self skill assessment first: what are the things I want to learn now? Only then do you do the gap analysis. You have to measure that first.
Very true. So I want to touch upon the foundational aspect of data quality. Clean data is the backbone of accurate analysis, and you emphasize the importance of data cleaning in the machine learning journey. What's your go-to technique for ensuring clean and reliable data?
Yeah. So this is a very important question, and some of these things we have to define by setting up criteria you can measure. There are some standard things to look into. Is my data complete? How many missing values are there? And if there are missing values, you should not blindly apply the rule of thumb we learn in industry: "if it's only a few hundred rows out of thousands of data points, just drop them." A lot of people blindly drop them, but that's not a standard. It could work in some experimental data set we use to build something, but you need to understand the implications of dropping those data points. What are you dropping, and why are you dropping it? Sometimes it could be an easy fix. It could be just a diagnostic issue: for a few days the sensor was out. You've got to get back to the data engineer saying, "Hey, I'm having missing values, but it seems like only for a few days. It's not missing at random; it's missing for that particular time frame." They will check for you, and it could be just a diagnostic issue. In some cases it could be that a particular field was not populated, that the thing was not ready in the pipeline, so everything is missing. That doesn't mean you drop everything, right? So: is your data complete? That's one thing. And is your data consistent?
It's always good, if you're bringing data from multiple sources, like the example I gave before, to add another timestamp for when the data was added to the system, or when it was modified. Having multiple timestamps gives you enough confidence that the data is consistent over time, that there are no gaps. The other thing is validation: you random-sample some data, come up with some hypotheses to test against it, and validate them. And in today's
market, I think data observability is becoming very popular. And data observability is actually dealing with the same thing, the same question you asked; it sounds like a single-phrase answer, but data observability means a lot of things. It handles everything I mentioned: is your data complete, is it consistent, is the quality good, how are you validating it, how often are you validating. And I would say you should set up these guardrails even in your ML pipeline. If you depend on getting data from multiple sources and you are batch-processing it, like I said before, using Spark or any other tool, then after you batch-process it you have to have some sort of guardrails or checkpoints to make sure the data is still good and meets all these requirements before you pass it to the model for retraining. Every time we retrain our model, we have to make sure we are furnishing the right data. That is very important.
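The guardrail checkpoints described here can be sketched as a small pre-retraining quality gate. This is a minimal pure-Python illustration; the field names, thresholds, and function name are hypothetical, and a real pipeline would run such checks inside its orchestration tool.

```python
from datetime import datetime, timedelta

def quality_gate(rows, required_fields, max_missing_ratio=0.05,
                 max_gap=timedelta(days=1)):
    """Return (ok, issues): completeness and timestamp-consistency checks
    to run after batch processing and before model retraining."""
    issues = []

    # Completeness: per-field missing-value ratio against a threshold.
    for field in required_fields:
        missing = sum(1 for r in rows if r.get(field) in (None, ""))
        ratio = missing / len(rows)
        if ratio > max_missing_ratio:
            issues.append(f"{field}: {ratio:.0%} missing exceeds threshold")

    # Consistency: no large gaps in the ingestion timestamps.
    stamps = sorted(r["ingested_at"] for r in rows)
    for a, b in zip(stamps, stamps[1:]):
        if b - a > max_gap:
            issues.append(f"ingestion gap from {a} to {b}")
            break

    return (not issues, issues)

rows = [
    {"amount": 10.0, "ingested_at": datetime(2023, 11, 1)},
    {"amount": None, "ingested_at": datetime(2023, 11, 2)},
    {"amount": 12.5, "ingested_at": datetime(2023, 11, 6)},  # 4-day gap
]
ok, issues = quality_gate(rows, required_fields=["amount"])
print(ok)  # False: a third of 'amount' is missing, plus a timestamp gap
for issue in issues:
    print("-", issue)
```

The point is that a failed gate blocks the retraining step, surfacing the missing-value or gap question back to the data engineers instead of silently training on bad data.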
Mm hmm. We're getting close to the end, so I have one more question.
So with great power comes great responsibility, especially in AI. As you know, AI is pushing the boundaries of what's possible, raising questions about data privacy. With advancements in AI, like ChatGPT's multimodal capabilities, what's your take on the balance between innovation and data privacy?
Yeah, I think this is a very important question that everyone should touch upon. If someone is working on an AI tool with today's GPT-related, advanced LLM models, they always have to figure out the responsible AI piece of it: how this AI is interacting, and ultimately how humans are using it. Are there enough security checks regarding data privacy? Regarding the prompts: are these prompts being polluted or not? If we only talk about the ML side, there are various things we can work on, like data poisoning and model poisoning; are those things
considered? At a high level, I suggest people set up a responsible AI assessment, an impact assessment. At Microsoft, we have a dedicated team of professionals with that expertise. Any project of ours that goes to production has to go through that loop with the responsible AI champs for review. So I think even at the startup level, any company has to somehow come up with its own process to make sure of that. And now it is coming with the executive order as well. Previously this was just a voluntary commitment from the companies, but now the administration has rolled out the executive order saying: hey, if you are dealing with an AI system using these advanced models, you have to make sure you are submitting all these test results for validation by the proper authorities. And I think that is important, because even without a mandate, I would probably do it anyway. Is the model leaking any of your sensitive information? For example, if we are using private data to fine-tune advanced models like GPT models, where is that data residing? Can we lock down that data and turn off the logging, so the model will not take it back for its own bigger training purpose? You may not want to release all your data to OpenAI to be available for training. Some of it could be data you don't care about, but for the data you do care about, there has to be a way, when you interact with the API, to make sure logging is not on and memory caching is done in a proper way. Everything has to be confined to a system someone has established. Things like that. In summary, I would say responsible AI is following those guidelines. Companies like Microsoft and others are doing it. We have also published a lot of informative documents on how you do responsible AI and how you follow those guidelines. At a minimum, you can follow those guidelines and make sure every project goes through that process.
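One small, concrete precaution in this spirit is redacting obvious PII before any prompt leaves your system for an external model API. A minimal sketch; the regex patterns and function name below are illustrative only, not a complete PII solution, and production systems would use a dedicated PII-detection service.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Mask known PII patterns before the text is sent to an external API."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Customer jane.doe@example.com (SSN 123-45-6789) asked about her loan."
print(redact(prompt))
# Customer [EMAIL] (SSN [SSN]) asked about her loan.
```

Redaction at the boundary complements, rather than replaces, the account-level controls the speaker mentions, such as disabling logging and confining data to an approved system.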
Yeah. No, I agree. To put it simply, how I look at it is: be mindful of your work and its output. If you wouldn't want your output or your solution imposed on your own children, then don't do it. It's as simple as that. It's being responsible with empathy, and it's humane. So always think, in terms of the work you do and the AI output you produce: would you impose that on your children? And there's your answer about what you want to do with it. Well, thank you so much.
Do you have anything else you would like to share before I close up?
No, I think it's good. I loved the session, to be frank. I didn't prepare many questions or anything, and it naturally went very well. I liked the way the questions were asked; it was very nice. And I'm happy I was able to come here and share my experience, not only related to data science but various aspects touching on emotional intelligence, things like that, which are always very important and crucial for a professional career in any field we are in, not only data science. So thanks for having me. Yeah.
Yeah. Absolutely. Well, thank you. It's been wonderful learning from your expertise and your journey from Nepal to the world of data science. You've shown us that no matter where you start, you can reach great heights. And your passion for teaching and helping others is really something special.
So thank you so much for giving us a lot to think about and for sharing your wisdom and what you do. This conversation is what makes this podcast what it is: sharing stories and helping across the board. A lot of the listeners here are from the aging sector, so it's been wonderful. For you folks, tune in for the next couple of weeks, where we will have more LinkedIn Live events. The next ones will be "AI Empathy - the Synthesis of Technology and Compassion," then "Data Alchemy: Turning Numbers into Insights," then "Navigating AI's New Frontier," and then "Empowering the Golden Generations." Thank you so much. Until next time.