Conversations on Applied AI

Miles Porter - Perspectives on Analytics, AI and Data Science

May 03, 2022 Justin Grammens Season 2 Episode 9

The conversation this week is with Miles Porter. Miles is an experienced data scientist focused on image processing, time series analysis, and combinatorial optimization opportunities. He is currently a lead data scientist at Trimble Central AI. He holds a BA in math and did graduate work in applied math at Colorado State. He's worked as a consultant at a range of companies such as Medtronic, Best Buy, and Bluestem Brands. He's also a musician who plays jazz bass, and he teaches taekwondo.

If you are interested in learning about how AI is being applied across multiple industries, be sure to join us at a future AppliedAI Monthly meetup and help support us so we can put on future Emerging Technologies North non-profit events!

Enjoy!

Your host,
Justin Grammens

Miles Porter  0:00  

I actually giggle about that a little bit. You know, I've never known a scientist who wasn't a data scientist, because if they're not using data, what are they doing? But it's interesting, you know, some of the terminology, I think, has sort of morphed and changed and become popular and sort of a fad. And then it will kind of ebb and flow and be called different things. But a lot of the principles, I think, are still there, and have been there, like, going all the way back to Student's t-distribution and the guy measuring Guinness beer in England.


AI Announcer  0:32  

Welcome to the Conversations on Applied AI podcast, where Justin Grammens and the team at Emerging Technologies North talk with experts in the fields of artificial intelligence and deep learning. In each episode, we cut through the hype and dive into how these technologies are being applied to real-world problems today. We hope that you find this episode educational and applicable to your industry, and connect with us to learn more about our organization at appliedai.mn. Enjoy.


Justin Grammens  1:03  

Welcome, everyone, to the Conversations on Applied AI podcast. Today we have Miles Porter on the show. Miles is an experienced data scientist focused on image processing, time series analysis, and combinatorial optimization opportunities. He is currently a lead data scientist at Trimble Central AI. He holds a BA in math and did graduate work in applied math at Colorado State. He's worked as a consultant at a range of companies such as Medtronic, Best Buy, and Bluestem Brands. He's also a musician, playing jazz bass, and teaches taekwondo. Although, Miles, I would guess probably not at the same time. No. Anyways, I really look forward to our conversation today. And thanks for being on the show.


Miles Porter  1:41  

Yeah, thanks a lot. It's great to be here.


Justin Grammens  1:42  

Awesome. Well, like I said, I gave a brief intro about where you are today, and also that you studied math in college. I actually have an undergrad in math as well. Maybe you can fill in some of the dots for us, though, sort of between those two points in your life.


Miles Porter  1:56  

I did get my bachelor's at UNC. And then I went on to Colorado State, where I studied applied math. And I had a professor there, Michael Kirby, who did a really important piece of work as a graduate student on something called eigenfaces. And you can Google that. But it was really a fascinating thing that he did back in the early 90s, actually it might even have been the late 80s at that point, where he was using people's faces and basically coming up with ways to sort of classify people based on some really fancy linear algebra matrix kind of math stuff, and something called eigenvalues. But I took this course from him, and it really got me fascinated. One of the things we had to do in this course, it was long ago, but neural networks existed at that point, and we had an exercise where we had to create our own neural network from scratch in C. And it was brutal. Brutal. I mean, it's not like you could go into TensorFlow and say, yeah, I want, you know, the Adam optimizer, and just go. You had to, like, write the optimizer yourself. But I loved it. I thought it was really great. And that kind of got me really interested in thinking about analytics and math. I took a detour in my career at that point, because, you know, you've got to live, right? So I got into computer networking, and then I went on to do some stuff at the community college level, just organizing computer networks, and then I ended up in the Twin Cities. And I think, maybe like you, I bopped around to a bunch of different companies, first as an employee, I was one of the first employees of techies.com here, and then I got into more consulting kind of stuff, and then eventually ended up at a company called PeopleNet. That was kind of at the height of the IoT time, and PeopleNet was a really cool place to be. PeopleNet has a device that they install on semi-trucks, and it tracks the location of the truck and a bunch of information about the truck. So I worked in that space for a little bit.
But I always really missed the analytics piece, the data science piece. We didn't call it that back then. But I really missed that. So at one point in my career, I just kind of said, hey, you know what, at the encouragement of my wife, she was like, why don't you do what you want to do? And so I took some time off and started to do my own independent research in data science. I was very fortunate and had some connections back at PeopleNet, because they offered me the opportunity to come back and consult as a data scientist. So I did that. That actually turned into being an employee for PeopleNet, and then that evolved into being part of Trimble, the corporate umbrella for PeopleNet, and working in that corporate group, the group name is Central AI, and being a lead data scientist there. So that's kind of how I got all into it. And then, just to add one thing, I just recently graduated from Georgia Tech with a master's in analytics too, so I was actually able to kind of finish the graduate work that I had started way back in the early 90s. So it's great, you know.


Justin Grammens  4:57  

Wow, that's awesome. Yeah, no, I mean, very proud of you deciding to kind of come back, I guess, full circle, you know, and put in the dedication and the hard work to get that master's degree, for sure. It's funny in some ways, you know. Again, like I said, my undergrad was in math. And back then, I mean, I graduated in the mid-90s, it felt like most of the math careers were either actuarial science, or, you know, basically going to grad school and being a professor or teaching somewhere, or nothing else, right? I don't even know if this whole term data science really, you know, existed or was well known, and certainly machine learning and AI were not areas that a mathematician would drop into back then, it felt like. I don't know, did you feel the same way?


Miles Porter  5:41  

Yeah, you know, back then there was this field called operations research. And it still exists. There's the INFORMS group, and I'll actually be speaking at an INFORMS conference coming up in April. That's sort of like the interest group for operations research that's out there. And that has been there for a very long time. And I think a lot of what we do bumps into that kind of stuff, operations research, you know, a lot of data science too. In the past, it's had other names, it's been sort of maybe disguised as other things, but I think you could make the argument that some of Six Sigma could be considered data science. You know, if you look at the DMAIC sort of paradigm, the define, measure, analyze, improve, control, well, define, measure, analyze, improve sounds a lot to me like the scientific method. And that's the underpinning of data science, right? You can't really call it science unless you're doing the scientific method. And I actually giggle about that a little bit. You know, I've never known a scientist who wasn't a data scientist, because if they're not using data, what are they doing?


Justin Grammens  6:44  

Right? Yes, sure. Sure. Yeah. It's kind of redundant.


Miles Porter  6:48  

Yeah, some of the terminology, I think, has sort of morphed and changed and, you know, become popular and sort of a fad. And then it'll kind of ebb and flow and be called different things. But a lot of the principles, I think, are still there, and have been there, going all the way back to Student's t-distribution and the guy measuring Guinness beer in England. So using statistics, using math to sort of model the real world, and then taking that model and trying to make descriptions or predictions or prescriptions back in the real world, has always been there. Well, for a long time.


Justin Grammens  7:22  

For sure. I mean, one of the things that you had shared that interests you is, I mean, obviously, probability and statistics, you know, some of the core underpinnings, I guess, of mathematics. But how then does the analytics side of that relate to these broader terms that people use, like AI and machine learning and deep learning?


Miles Porter  7:39  

Yeah, you know, it's really interesting, because I went back and I listened to a couple of the podcasts, and there's some really interesting and great insights back there. But I was struck by how people have kind of different mindsets based on where they're coming from and the problems that they're solving. I'll share with you mine, and it's probably going to be largely informed from academia, just because I just finished my master's. But I sort of think of it this way, and I'm a visual person, so I'll sort of describe it to you, and maybe your listeners can draw it out. If you draw a circle, and you cut it into three parts, like the Mercedes-Benz sign, that represents analytics. In the upper left-hand corner, that's AI, and inside of AI you're going to have machine learning, which is a subset of that. And then inside of machine learning, you're going to have supervised and unsupervised machine learning, and now self-supervised machine learning, which is a new kind of thing. And then inside of supervised machine learning, you'll have deep learning. So that's in the upper left-hand corner. The upper right-hand corner, I think of that as simulation. And at the very bottom, I think of that as probability and statistics. You know, if you pick it apart, it can break down, because probability and statistics are absolutely critical to simulation. And you absolutely have to have probability and statistics, particularly if you're doing any kind of, like, metrics for accuracy in artificial intelligence and machine learning, you know, any of those kinds of things. But that's sort of my universe. Another thing, if I made it even simpler: if you Google NVIDIA deep learning, and then you go to images, I think the first image that you'll find is, like, this really cool graphic that has a timescale at the bottom. And it has, you know, artificial intelligence, machine learning, and then deep learning, sort of in chronological order.
I think I kind of relate to that too. But I still think about analytics as being even bigger than that.


Justin Grammens  9:38  

Interesting. Like you say, it's even bigger than that, right? So this analytics field sort of encompasses all of these areas that you're talking about. And the three things you said were AI, simulation, and then probability and statistics, that was the third one.


Miles Porter  9:51  

Yeah. And I'm a real stickler about that probability and statistics thing, because they're so fundamental. There's this really important concept, and I always ask people in interviews this question, and it's amazing how often people don't give me the answer I'm looking for. I'll say, what's the difference between probability and statistics? And the answer is really pretty straightforward. If you're doing probability, what you're really doing is you have a population, and you're trying to make some kind of statement about a sample. Now, if I grab a sample, a subset of that population, say I have 6 million marbles, how likely is it, if I pick 100, that I'll have five red ones? That's a probability question. Statistics goes the other way. So I have a sample, and I have five red marbles and three green marbles, and I think there are a million, you know, in the population. And typically what you'll do with statistics is you'll have more than one sample. So I can't count all the million, so I'll get 15 samples of size 10. What kinds of things can I do to sort of reason back the other way? Can I use the little sample to make statements about the population? So that's the difference. And sometimes when you hear people talk about probability and statistics, technically the terms are not interchangeable. And I don't know that the terminology is really important, but it's this idea of going from the big to the little and the little to the big that I think is so critical, so critical. It is part of the underpinning of machine learning too, right? I mean, if you do a classification problem, you could, theoretically, try to count everything, and then just classify things that way. But sometimes we can't know the population. And so that's when you start to use tools to sort of predict, you know, okay, based on these training samples, what do I think the population is going to have?
And there you are, you're in an artificial intelligence, machine learning kind of scenario.
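[Editor's sketch] The big-to-little versus little-to-big distinction can be tried out in a few lines of Python. This is only an illustration under made-up assumptions: a jar of 1,000 marbles with 50 red ones (not the 6 million from the conversation), and the function names are hypothetical. The probability direction uses the hypergeometric formula for drawing without replacement; the statistics direction pools 15 samples of size 10 into a plain proportion estimate.

```python
from math import comb
import random


def prob_red_in_sample(population, red_in_population, sample_size, red_in_sample):
    """Probability: the population is known, ask about a sample (hypergeometric)."""
    ways_red = comb(red_in_population, red_in_sample)
    ways_other = comb(population - red_in_population, sample_size - red_in_sample)
    return ways_red * ways_other / comb(population, sample_size)


def estimate_red_fraction(samples):
    """Statistics: only samples are known, reason back to the population."""
    total = sum(len(s) for s in samples)
    reds = sum(s.count("red") for s in samples)
    return reds / total


# Big to little: a known jar (assumed numbers), chance of exactly 5 red in 100 draws.
p = prob_red_in_sample(1000, 50, 100, 5)

# Little to big: pretend the jar is hidden and we only see 15 samples of size 10.
rng = random.Random(0)
jar = ["red"] * 50 + ["other"] * 950
samples = [rng.sample(jar, 10) for _ in range(15)]
estimate = estimate_red_fraction(samples)

print(f"P(exactly 5 red in 100 draws) = {p:.4f}")
print(f"Estimated red fraction from samples = {estimate:.3f} (true value is 0.05)")
```

The estimate will wobble around the true fraction from run to run with different seeds, which is exactly the uncertainty that statistics tries to quantify.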


Justin Grammens  11:54  

No, I love that. I love that viewpoint. I've actually never heard that before. It's kind of like, what do you know today, and what are you trying to move towards? In probability, like you said, you want to figure out what's the likelihood. It's more of a prediction, it seems like. You know, like you said, if I reach inside here, what's the likelihood I'm gonna grab these marbles? I love that. And then, hey, here's the people that I find. I mean, again, I'm just sort of spitballing, but, you know, think of COVID today, right? So there's these tests that are being done. So that's hard data that we have with regards to people that are actually testing positive. But now we need to triangulate the other way. Like, we know there's a lot of people out there that aren't being tested, that basically aren't showing up. And so what's the likelihood that the entire sample size is x? Kind of, what are you solving for, for x, I guess, right?


Miles Porter  12:38  

Right. And there's some really interesting tools out there, actually, to kind of help you get to those kinds of statements, particularly those kinds of questions when you're dealing with disease and the likelihood of infection and not infection and things like that. There's a thing called Bayesian statistics that was sort of pioneered by this guy Bayes. And if you're interested, you can look into that as an interesting little area of statistics. You know, there are people that really buy into that Bayesian statistics philosophy, and there's a whole bunch of machine learning tools out there to kind of help you use that Bayesian stuff. I'll tend to get off in the weeds, so you're gonna have to pull.


Justin Grammens  13:16  

But yeah, sure. That's good. We've got listeners from, you know, all sorts of skill sets and backgrounds here, so we can definitely go in and geek out as much as we want, for sure, on this stuff. And you mentioned, like, regression and, like, traditional analytics. Is that kind of what you do in your day-to-day job?


Miles Porter  13:33  

Yeah, so I have my dream job. It's kind of cool, working at Trimble. So Trimble, the organization, you may have heard of Trimble Navigation. We originally started out doing just geospatial, GPS, or, sort of the bigger term, GNSS, for positioning. And what happened with Trimble is we continued to expand and expand. And now we have businesses like the PeopleNet business, which is now called Trimble Transportation. We do stuff in agriculture, we do stuff in, of course, geospatial, a lot of stuff in construction. There's all kinds of applications of the technology. So in my job, on any given day, I might be talking with people in New Zealand about a construction problem, and then people in Germany about agriculture, and then maybe somebody up in Canada about forestry. And so the problems are really widely varied. And then within our Central AI group, we actually have divided ourselves into folks that are particularly focused on deep learning and computer vision problems, and then everything else. And I'm the lead data scientist for that everything else. Now, there's a lot of overlap. And I have peers on the deep learning side that are absolutely brilliant and, you know, frequently, thank goodness, help me out when I get in over my head on some of these problems. But just because of the type of work that I do, I will typically focus on the non-deep-learning problems. And so that can mean anything from exploratory data analysis. So, being in a central group, sometimes we have people come to us and say, here's my data, give me insights. And it's like, okay, let's take a step back. And the first thing we always do is start off with exploratory data analysis. Because, you know, everybody's data is kind of like their kid. Everybody thinks their kid is beautiful, right? So you've got to kind of go into the data and kind of peel it apart and say, hey, let's look at this data. Do you have outliers in your data? Do you have collinearity in your data?
Do you have missing data, bad data? I make a big distinction between bad data and outlier data. How is that? So outlier data is data that is probably right, but just doesn't fit in the context of everything else. Bad data is like, you have a field structured in your database to be a string, but really it's containing floats, and then you come across, like, some crazy ASCII characters in there. So it's, like, impossible data that doesn't make any sense. It's, like, you know, the wrong types.
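[Editor's sketch] The bad-data versus outlier-data distinction can be made concrete with a small Python example. It is a sketch under stated assumptions, not how the Central AI group's Data Quality Report actually works: the function name, the sample column, and the modified z-score rule with a 3.5 cutoff are all choices made here for illustration. "Bad" means the value cannot be a number at all; "outlier" means it parses fine but sits far from the rest.

```python
import math
import statistics


def split_bad_and_outliers(raw_values, threshold=3.5):
    """Separate unparseable ("bad") values from parseable but extreme ("outlier") ones."""
    parsed, bad = [], []
    for value in raw_values:
        try:
            number = float(value)
        except (TypeError, ValueError):
            bad.append(value)          # wrong type or garbage characters
            continue
        if math.isfinite(number):
            parsed.append(number)
        else:
            bad.append(value)          # "nan" or "inf" stored in a numeric field

    # Modified z-score based on the median absolute deviation (an assumed rule).
    median = statistics.median(parsed)
    mad = statistics.median([abs(x - median) for x in parsed])
    clean, outliers = [], []
    for x in parsed:
        score = 0.0 if mad == 0 else 0.6745 * abs(x - median) / mad
        (outliers if score > threshold else clean).append(x)
    return clean, outliers, bad


# A string column that is supposed to hold floats (made-up readings).
column = ["12.1", "11.8", "12.4", "??\x07", "12.0", "97.3", None, "11.9"]
clean, outliers, bad = split_bad_and_outliers(column)
print("clean:", clean)
print("outliers:", outliers)
print("bad:", bad)
```

The two groups usually call for different handling: bad values point at a pipeline or schema problem, while outliers are worth a conversation with whoever owns the data.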


Justin Grammens  16:05  

Are those typically things where maybe it's a sensor or something like that, that was incorrect?


Miles Porter  16:09  

Sometimes it's sensors. Sometimes it's just, you know, in some of these huge data pipelines, you have enough moving pieces going around in there that sometimes you just get data that gets corrupted. I came across one the other day that I thought was fascinating. We have these handheld devices that are used all over the world, and we have a database that our customers can agree to log information into. And I was looking through the database where we were capturing the error message, and I was like, oh my, look, I've figured out how to say null pointer exception in 67 different languages, because it was in there in German and English and French and Japanese and Korean. So anyway, the data stuff, I think, is always really interesting. And we do this thing called the Data Quality Report, and that always is a very helpful thing when we engage with our businesses, because it starts them thinking really in the details about their data. And sometimes, what they kind of thought they wanted us to engage with them to explore, it turns out they dig into their data and go, oh, whoa, what's that? That's really interesting, we should explore that. So that exploratory data analysis, I think, is a very helpful, handy tool. And a lot of times, it's just plain old statistics.


Justin Grammens  17:21  

You don't even need to get into any of this other stuff, really, when it comes to machine learning and neural nets.


Miles Porter  17:27  

Yeah, and we typically end up getting there. But it's just, you know, a really good thing to start out with. The other thing that I think is really interesting, and you might be aware of this too, having been in the IoT space, is that sometimes when you come up with these really cool, sophisticated models, particularly deep learning models, sometimes they don't work so well in the IoT space, right? So if you have, like, a sensor that's detecting an anomaly, say, you may want to manufacture that sensor as cheaply as possible. And as a result, you may favor some kind of machine learning model like, you know, I'm a big, big fan of one-class support vector machines, which is just a way to do anomaly detection. But support vector machines are really, really lightweight, and you can kind of deploy that on an IoT device and it will work really, really well. Whereas if you tried to deploy even, like, a three-layer, completely connected neural network, you know, for one thing, it's going to take a long time to train it. And for another thing, it's going to take time even in inference mode.


Justin Grammens  18:32  

Yeah, no, that's funny that you mention that. I actually interviewed a guy just last week, and it'll be on the podcast here, dealing a lot in the industrial IoT space. You know, he was saying that with a lot of these, in particular just, like, actuators or motors, they don't want to increase the cost very much, number one. And then also, you know, you can do some prediction before these things are going to fail, right? This is kind of what the whole deal is. But he's like, you know, it can be very, very simple. You basically can use statistics and some sort of a regression analysis to basically say, hey, once the voltage falls out of this envelope, you know, this window, then there's gonna be a problem. And he's like, you do not need to, like you're saying, deploy a neural net, you know, at the edge on this thing. It can be very, very lightweight, which is kind of what they want in those spaces. So sometimes it's, you know, kind of like, use the right tool for the right time.
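[Editor's sketch] The voltage envelope idea needs very little machinery, which is the point being made. The Python below is a minimal illustration with made-up readings and hypothetical function names; it assumes the envelope is the baseline mean plus or minus three standard deviations, which is the usual control chart convention but is not something stated in the conversation.

```python
import statistics


def fit_envelope(baseline, sigmas=3.0):
    """Learn a (low, high) window from readings taken while the device was healthy."""
    center = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    return center - sigmas * spread, center + sigmas * spread


def out_of_envelope(readings, envelope):
    """Return the indices of readings that fall outside the window."""
    low, high = envelope
    return [i for i, value in enumerate(readings) if not low <= value <= high]


# Made-up motor voltages: a healthy baseline, then new readings to check.
baseline = [24.0, 24.1, 23.9, 24.2, 23.8, 24.0, 24.1, 23.9]
new_readings = [24.0, 24.2, 23.7, 23.2, 24.1, 24.6]

envelope = fit_envelope(baseline)
flagged = out_of_envelope(new_readings, envelope)
print(f"envelope: {envelope[0]:.2f} V to {envelope[1]:.2f} V")
print("flagged reading indices:", flagged)
```

At run time the device only needs two stored numbers and two comparisons per reading, which is why this kind of check fits on very cheap hardware.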


Miles Porter  19:25  

Right. There's things like control charts that were popular maybe back in the 80s and 90s for industrial processes that really work for a lot of things. Sometimes I think people sort of gloss over some of those in favor of some of the more, I'll say, sexy stuff. Neural networks are cool. Just the other day I was playing around with a neural net, where I had to do a time series problem for one of our groups. And it's a really interesting problem of trying to classify a set of time series. So, you know, a time series is just measurements across a time domain, right? And I was like, well, I want to do something for this person, but I don't really want to solve the problem for them. I want to do something that I could kind of maybe do a blog post on or something. So it's like, well, what kind of time series could I create? And then I was like, oh, I know. So I don't know if you're familiar with OpenAI Gym, but it is a reinforcement learning framework that you can use, and it's based on video games. So because I'm older than dirt, I remember the video game called Lunar Lander. And Lunar Lander basically had a left thrust, a right thrust, and a bottom thrust, and you tried to land this little thing on the surface of the moon. And you had this constraint of, you can't run out of fuel, but if you land too fast, or at the wrong angle, you blow up. I was like, okay, I've got it, here's how I'm going to do my time series. I'm going to just look at the velocity in any direction, just the speed of that lander, and I'm going to create a bunch of time series. I'm gonna have two classes. I'm going to have a reinforcement learning neural network that I'm going to train to land the thing, and then I'm going to land the thing. And then I'm going to see if I can use this time series model to classify between the two. Well, one of the problems is I'm so bad at playing the game that it was just painfully obvious.
So finally, I said, okay, I'm not going to have me do it, I'll train another reinforcement model. So I had five different guys, you know, I had five different agents land the lunar lander, from the super expert to the super novice, which was still better than me, and then used that for the time series stuff. So, I don't know, I'm just sort of mentioning this because I think the reinforcement learning stuff is really, really cool, and I really enjoy that. I don't get to do too much of that in my daily job, so I sort of figured out a way to create a problem for it.


Justin Grammens  21:46  

Well, yeah, you're kind of talking about unsupervised learning versus supervised learning as well, when you say reinforcement?


Miles Porter  21:51  

Yeah, so reinforcement learning is really kind of an interesting phenomenon. I'm not really sure if you would classify it as supervised or unsupervised, or maybe self-supervised. But what it is, is you basically have the neural network try to play the game. And if it fails, or if it succeeds, you either give it a reward or sort of a punishment. And so you're kind of doing this. And because we have computers now that are super, super fast, we can do this kind of thing. You know, I can train that model a couple million times, and eventually get the thing to actually learn how to land the lunar lander. Some of the reinforcement learning models will use just, like, the image of the screen as the sort of input vector, right? Others will use more detailed information. Like, I wasn't doing the input of the screen, I was just using the position of the lunar lander, and its velocity and angle and stuff like that, to kind of make the training a little bit more simple. But this is, you know, reinforcement learning. And again, I think it sort of falls in that gray area between supervised and unsupervised, and I would consider it maybe self-supervised learning, because you do have a reward, punishment kind of thing, but you don't have, like in supervised learning, this de facto knowledge of, here's your training set with your labels.
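[Editor's sketch] The reward-and-punishment loop can be shown on a toy problem in plain Python. To be clear about the assumptions: this is not OpenAI Gym's Lunar Lander and not a neural network. It is a made-up one-dimensional lander whose state is just (height, downward speed), with two actions, and it uses a Q-table (tabular Q-learning) in place of the network. All names, rewards, and settings are invented for illustration.

```python
import random

ACTIONS = (0, 1)   # 0 = coast (speed up), 1 = fire thruster (slow down)
START = (4, 0)     # (height, downward speed)


def step(state, action):
    """Toy physics: return (next_state, reward, done)."""
    height, speed = state
    speed = max(speed - 1, 0) if action == 1 else min(speed + 1, 2)
    height -= speed
    if height <= 0:
        # Touchdown: reward a gentle landing, punish a crash.
        return (0, speed), (1.0 if speed <= 1 else -1.0), True
    return (height, speed), -0.01, False   # small cost per step, like burning fuel


def train(episodes=5000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: try actions, then nudge value estimates toward the outcome."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        state, done, steps = START, False, 0
        while not done and steps < 50:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)               # explore
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(q.get((next_state, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state, steps = next_state, steps + 1
    return q


def land_greedily(q, max_steps=50):
    """Fly one episode using the learned values only; return the final reward."""
    state, reward = START, 0.0
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        state, reward, done = step(state, action)
        if done:
            break
    return reward


q_table = train()
print("final reward when following the learned policy:", land_greedily(q_table))
```

Nothing in the loop ever tells the agent which action was correct in a given state; it only sees the reward at the end, which is the contrast with a labeled training set drawn in the conversation.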


Justin Grammens  23:13  

Yeah, exactly. Everything's been cleanly labeled. And in these cases, you still are exploring. Like you said, it's like, hey, you're nudging closer and closer to this thing, so here's a reward, you're getting close, right? Or you're getting, you know, way away from it, so change your weights or biases or whatever it would be, I guess, right?


Miles Porter  23:31  

Yeah, it's an interesting area too. I mean, I think one of the other key differences between the space that I'm in and the deep learning space is that in the models that I do, the datasets tend to be smaller, and we tend to be able to do a lot more towards explainability than you can do with some of the really sophisticated deep learning models. You know, it's kind of hard to explain, like, a 55 million node neural network as to why it does what it does. But if you have, like, a random forest, or a decision tree, or a support vector machine, it gets a little bit easier to sort of explain, and it gets a lot easier to reproduce, which is something that can potentially be meaningful. You know, I was enjoying Hugo's podcast a while back, when he was talking about a lot of the things that they do in pharmaceuticals, you know, and I know he's spent some time at Medtronic too. But, you know, being able to thoroughly explain what you're doing, why you're doing it, and the outcome is really, really critical, particularly in the eyes of the FDA. So yeah, it's a challenging problem.


Justin Grammens  24:35  

Yeah, and you know, in my experience doing a fair amount of stuff with TensorFlow, there's just so many knobs you can turn, and you're right, it is difficult to know, oh, I turned this knob, now all of a sudden it did this. Well, why, right? Because it is kind of a black box. You're trusting a lot of the math going on under the covers, I guess. It's just sort of working. And that can be problematic.


Miles Porter  24:58  

You know, like if you do any kind of neural network optimization and you use any kind of, like, stochastic gradient descent, or anytime they use that word stochastic, that just means random. And if it's truly random, it gets really hard to repeat. You know, you can set seeds and stuff like that. But, like, I have colleagues in the deep learning space, and they're like, yeah, I can do a set seed, but you know what, it's still crazy, because there's so many random number generators sitting down below the covers that it's really hard to make sure you absolutely recreate everything precisely the same every single time, for sure.
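[Editor's sketch] Here is what setting a seed buys you in the simplest possible case, written in plain Python with a hypothetical function name and made-up data. The sketch has exactly one random number generator, which controls both the starting weight and the order in which samples are visited, so one seed is enough to repeat a run exactly. The difficulty described above comes from real deep learning stacks having several generators in different layers of the software, which this toy deliberately does not have.

```python
import random


def sgd_fit_slope(xs, ys, seed, epochs=20, lr=0.01):
    """Fit y = w * x with stochastic gradient descent, one sample at a time."""
    rng = random.Random(seed)            # the only source of randomness in this sketch
    w = rng.uniform(-1.0, 1.0)           # random starting weight
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)               # random visiting order is the "stochastic" part
        for i in order:
            gradient = 2 * (w * xs[i] - ys[i]) * xs[i]
            w -= lr * gradient
    return w


# Made-up data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

run_a = sgd_fit_slope(xs, ys, seed=42)
run_b = sgd_fit_slope(xs, ys, seed=42)   # same seed: same shuffles, same arithmetic
run_c = sgd_fit_slope(xs, ys, seed=7)    # different seed: may land somewhere slightly different

print("same seed, identical result:", run_a == run_b)
print("seed 42 slope:", run_a)
print("seed 7 slope: ", run_c)
```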


Justin Grammens  25:33  

Well, what are some projects that you've been working on recently?


Miles Porter  25:38  

uh, you know, I guess one of the first ones that I did, well, I'll start with maybe sort of the recent ones that kind of work backwards. So I am working on a project right now really interesting project that is really more analytics, but actually does involve machine learning. And that is something called S curves. So if you ever imagine a construction site, and you think about over the course of that construction site, the company that's building whatever it is, has costs that they have to pay for, they have to pay labor, they have to pay for materials, so on and so forth. If you were to graph out over the course, over the time of that construction project, and the cumulative, the cumulative cost, there is this group called the PMI, the Project Management Institute, that is talked about how that graph formed something called an S curve, and it looks like an S curve or a logistics curve. And there's a whole bunch of articles out there about why that happens, and how that happens. There's even books written on it. So if anybody that ever takes construction, project management probably has come across this stuff. And, and so one of the things that construction companies really should try to do is they should try to figure out ways to manage their projects in a way to get that S curve. But construction projects typically don't happen that way. They typically happen by project managers or four people on site that are managing the project that are doing a lot more by get. And so what we're attempting to do in Trimble is we have in our construction business, a product where we help our customers capture all the information associated with construction projects, including cost. And this data tends to be pretty accurate, because the construction companies will use that data for tax reporting. So it kind of has to be right, what we are able to do is then take and kind of figure out those S curves. 
Now, where the machine learning part of that comes in is that if I have an S curve, you can follow this process of doing something called a logic transformation. But essentially, you you apply a mathematical function to the x and the y value. And you can turn your S into a line because of the way that that transformation works. If you know the line, you can go back to the s. Well, the cool thing about the line is that like any line that has two things that can be used to describe it, and intercept and a slope, if I can look at the intercept in the slope of that construction project, then one of the things that I can do is look on a graph of a company and see the slope and intercept of all of their construction projects over time. And based on where those individual. So in that graph, each construction project is a draw. If I look at that graph, I can say oh, here, projects over here to the right, are really slow to get started, right, they start and they chug along along along and then it's a mad dash at the end. And then there are other ones maybe over to the left that start and they shoot way up. And they do a lot of work. But then they don't close the project out. So in each of these cases, the construction project is not functioning optimally. But what we can do in trouble is we can take all of this data, the way ml comes in is I take the you know, I take the S curve, get the line, and then I fit a regression, a linear regression, just a simple regression to that to get that x and y. But I can do that for these construction company projects, and then show that information back to the customer. And then you can do some really cool things with this graph. You can say, okay, let's color code the dots. Based on the project manager. Do you have some project managers that are always in the sweet spot in the middle and some that tend to be going left or right, let's color coded based on the zip code. 
Let's color code it based on the type of project: is this a heavy highway project? Or is this like some kind of specialty subcontractor? Is this government buildings? Or is this residential construction? And you can do, you know, lots of other things. One of the things I've been most recently doing is looking at this in terms of time. So you imagine you've got this scatterplot of all these dots. Well, what if you had a little slider at the bottom and you could say, I'm going to add these dots gradually over time: are my projects starting to drift? And it's kind of interesting because, like in one of the recent ones that I could see, the dots start to drift off to the right. And the reason why is because of supply chain problems. So they're not able to get product in, they're not able to actually account for that cost. And so the project starts dragging on and on and on and on. That's a real analytics-focused kind of thing. But I like it because it's really applied. I mean, the customers love that kind of thing, to be able to dig into their data like that.
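The S-curve-to-line step described above can be sketched roughly as follows. This is a minimal illustration, not Trimble's implementation: it assumes time and cumulative cost are both expressed as fractions between 0 and 1, applies a logit to both axes, and fits an ordinary least-squares line. The function names (`fit_s_curve`, `s_curve`) and the synthetic project are made up for the example.

```python
import math

def logit(p):
    # Maps a fraction in (0, 1) onto the whole real line.
    return math.log(p / (1.0 - p))

def inv_logit(z):
    # Inverse of logit: maps any real number back into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def fit_line(xs, ys):
    # Ordinary least squares for y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_s_curve(time_frac, cost_frac):
    # Logit is undefined at exactly 0 and 1, so keep interior points only.
    pts = [(t, c) for t, c in zip(time_frac, cost_frac) if 0 < t < 1 and 0 < c < 1]
    return fit_line([logit(t) for t, _ in pts], [logit(c) for _, c in pts])

def s_curve(t, intercept, slope):
    # Going back from the line to the S: undo the transform on the y axis.
    return inv_logit(intercept + slope * logit(t))

# Hypothetical project: cumulative cost generated from a known line, then refit.
times = [i / 20 for i in range(1, 20)]
costs = [s_curve(t, -0.3, 1.8) for t in times]
a, b = fit_s_curve(times, costs)
print(round(a, 3), round(b, 3))
```

Each project then reduces to one (intercept, slope) pair, which is what would be plotted as a single dot on the scatterplot.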


Justin Grammens  30:24  

When you mentioned construction data, you mentioned cost. Are there other things that Trimble has out in the field to capture some of this stuff, like sensors, or if you ever


Miles Porter  30:33  

go by a construction site, and you see a guy standing there with a pole with a dish on the top? That's Trimble. We make, like, 90% of the market share of those kinds of things. So we have all kinds of information about the survey location of the construction site. I don't know if you Google Spot, Google Trimble Spot, you'll see the Boston Dynamics dog with the 3D point scanner for a head. Actually, just a funny story, our headquarters is in Westminster, Colorado. And I was out there for a meeting and I was walking down this one floor, and I walked into this floor and I looked over here and, like, in this cube bay is like eight of these Spot dogs with the little Trimble heads on them. And I was like, oh, man, this kind of feels like the Terminator, a little scary. You don't realize how big those dogs are. They're big. But anyway, the idea there is that those robots will wander around a construction site and create a 3D point cloud. And then, you know, we do things like 3D point cloud segmentation. And we use all of that kind of data to do modeling. One of the things, like when the Notre Dame Cathedral burned down, they used Trimble to create a 3D point cloud of the structure that remained, both inside and outside the cathedral. And then they used that information to help figure out how they were going to do the construction and rebuild the spire and things like that. So awesome. Yeah, there's all kinds of point cloud data out there. That's sort of the physical construction part. But then also for the business part, there's all kinds of data. You know, we really do help our customers from the very beginnings of design, and BIM and layout for construction sites, through actually building a thing, with our Viewpoint division that really tracks construction projects and kind of helps those companies manage their projects. So it's a really, really rich space. 
And, again, it's one of the neat things about Trimble that's just one little part, but we have similar kinds of things in transportation and agriculture, natural resources and stuff. So


Justin Grammens  32:30  

Sure. Oh, yeah, I'd like to hear about another one. But I got a, I got a question about S curves, like, is that something that you uncovered internally? Was it like a customer saying, hey, I really wish I could have this, you know, what was the process around sort of coming up with this stuff? Yeah, so


Miles Porter  32:47  

The S curves, the idea of S curves is really well known. And it was well known to our division. So they were the first ones. They have this idea... anybody that's maybe studied a little bit of business understands this concept of something called WIP, or work in progress. And it's part of sort of your accounting process to think about raw materials and work in process and manufactured product and accounting for all that kind of stuff. But being able to quantify this work in process is really important. So the business knew about it. And they were like, well, we want to try to apply this. And it was kind of a neat opportunity, because I was like, oh, cool, I don't know about that, but I'm gonna go find out about it. So I did some research. There's several guys out there. I say guys just because they're all men. There probably are women; I haven't found any women that are in this field, but I'm sure there are. But anyway, there is a couple of researchers out of Cambridge, one of them is A. P. Kaka and the other one is Kenley. And they've written a bunch of white papers about doing this kind of S curve analysis. The problem that they ran into is they didn't have enough projects. So for us, it was like, oh, we have tons of projects, we have maybe too many projects. In fact, we got to the point where my coworker and I were looking at the data, and we were like, okay, here's three customers, we have to get all of the individual costs associated with their products, or their projects. So those three customers had 76,000 projects and 2.3 million costs, right? So the data starts to get big. And we have to start thinking about big data tools. You know, we use Databricks and Spark to kind of chug through all that data and get it to where we can present it in the UI and make it meaningful. But there's a lot that we do there that isn't, you know, to be really, I think, successful in sort of analytics and data science, you kind of have to be able to wear a bunch of hats. 
And in that case is sort of like you kind of have to put on the DevOps hat to be able to kind of build these tools and figure out how to get all this stuff to talk together and, you know, get it out the door. So


Justin Grammens  34:51  

Cool. Well, yeah, I mean, you got another project you wanted to touch on.


Miles Porter  34:55  

Sure. There was another one. This was a transportation one that was really fascinating, and it's something called Trimble Dispatch Assistant. And so one of the challenges that a transportation fleet has is trying to match their truck drivers and their trucks to possible loads. In order to do that, it's something called a work assignment problem, or a combinatorial optimization problem, and it's really, really interesting. Sometimes we talk in data science and analytics about greedy algorithms. And a greedy algorithm is one that sort of does the next best thing without thinking about anything else down the line. And there's some opportunities in this work assignment problem to say, okay, I'm just going to take this driver and find his best load, I'm gonna do that. And then I'm gonna take the next driver and find his best load. And I'm going to do that. That'd be a greedy algorithm. It's not optimal. It's really bad. But it's really a fascinating problem. Because, you know, on any given day, some of these fleets have lots of drivers and lots of loads. So you might have 100 drivers and 100 loads. Well, if you want to think about the number of permutations of 100 drivers and 100 loads, it's actually a number greater than the assumed number of atoms in the universe. It's huge. So figuring out ways to solve that problem is kind of fascinating. Actually, just even getting to the number, of saying, if I have to quantify this driver carrying this load, I need to put a number on that. 
Well, that's a really complicated problem, too, because you have to start thinking about, you know, there's this thing called ELD, which is electronic logs for drivers, and rules that drivers have to follow for the number of hours that they can drive. You have to think about the distance. You have to think about, you may not want to have this driver take this load, maybe they can't, because maybe it's a hazmat load, and they're not able to do it. Just coming up with that matrix is one thing, and then using tools to use a heuristic to solve that matrix, so basically shuffle the rows around in the matrix until the diagonal becomes a minimum, is really fun and really interesting. And there's some interesting tools out there. This is actually an operations research kind of problem. And there's a tool set out there called Google OR-Tools that has some stuff that you can use to solve it. That's actually how we went about it. That was a fun project, because it sort of started off as, we need to figure out how to give this advice to our customers. And it really went from that phase all the way through to actually implementing the thing and then having it delivered in such a way that people that are working on a fleet management system on an AS/400, right, if you remember those things, can actually tie into the system, and somebody that's using a cloud-based solution from Trimble could tie into it, too. So that was a fun project.
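To make the greedy-versus-optimal point concrete, here is a small sketch with a made-up 3×3 driver-by-load cost matrix. It is only an illustration: the brute-force search over permutations stands in for a real solver such as the OR-Tools library mentioned above, and it is only feasible for tiny inputs, which is exactly why the 100-driver case needs better tools.

```python
import math
from itertools import permutations

def total_cost(cost, assignment):
    # assignment[d] is the load given to driver d.
    return sum(cost[d][load] for d, load in enumerate(assignment))

def greedy_assignment(cost):
    # Each driver in turn grabs the cheapest load still available.
    taken = set()
    result = []
    for row in cost:
        free = [load for load in range(len(row)) if load not in taken]
        best = min(free, key=lambda load: row[load])
        taken.add(best)
        result.append(best)
    return result

def optimal_assignment(cost):
    # Brute force: try every permutation. Only workable for very small n.
    n = len(cost)
    return list(min(permutations(range(n)), key=lambda p: total_cost(cost, p)))

# Hypothetical costs: rows are drivers, columns are loads.
cost = [
    [1, 2, 9],
    [1, 9, 9],
    [9, 9, 3],
]

greedy = greedy_assignment(cost)
best = optimal_assignment(cost)
print(greedy, total_cost(cost, greedy))  # [0, 1, 2] 13
print(best, total_cost(cost, best))      # [1, 0, 2] 6

# 100 drivers and 100 loads give 100! possible assignments,
# far more than the roughly 10**80 atoms usually quoted for the universe.
print(math.factorial(100) > 10**80)      # True
```

The first driver taking the cheapest load looks fine locally but forces the second driver into an expensive one, which is the failure mode of the greedy approach.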


Justin Grammens  37:51  

Yeah, when you mentioned about that, I just got to think about Yeah, what does user interface look like? Like, you know, how does it break their current state because people can build technology, and then it forces people to try and relearn and which means likely not so much adoption? You know, you can kind of have these really awesome tools. And if people need to change their flow of their job to sort of accommodate to them, I think, as we build software and build AI, machine learning, you know, analytics tools, it just it sort of needs to be sort of complementary to what we do. 


Miles Porter  38:19  

Yeah, and you know, that's an interesting thing. One of the things that we did in that system was we allowed the fleet manager the ability to say, okay, here's my loads, here's my drivers, click a button, here's our recommended solution. And then the fleet manager can look at that and go, oh, boy, if I take this driver, and I make him go here again, they're probably going to quit. So I've got to fiddle with this a little bit. But allowing the human to sort of step in has been really, really huge, and I think helpful. There was another project, kind of similar, well, in transportation that we did, and this is a thing called Intellivue. If you have a semi truck going down the road, it has sensors inside of it that will detect a collision or a potential collision situation. And those sensors are typically based on LIDAR or radar. Now you could go out and theoretically get a bunch of potentially self-driving trucks, you know, but a lot of times, fleets don't want to spend multiple billions of dollars replacing their entire fleet. They've got to try to do business with what they have. So for a very large logistics company, a trucking company, they have these trucks going down the road. The problem was, if the trucks go under an overpass or next to a big billboard, the sensors go crazy and they think that the truck is going to get in a wreck. Well, not a big deal, except that it triggers a video to be recorded, a forward-facing video off the front of the truck. And they do that for insurance purposes. A lot of times that video then goes back to a safety manager at a fleet, and they have to review those videos. The videos are 20 seconds long. And for one of our biggest customers, it was resulting in 80,000 videos that they had to watch in a month. That's like, all you did, right. And most of them were, you know, false alarms. 
So what we did was we took the videos and then used deep learning technology to essentially watch the video, do object detection and classification, and get a little bounding box on the person. And then based on the size of that bounding box, and the location of that bounding box, and some other special secret sauce that we have in there, we were able to then make a recommendation back to the fleet manager: hey, really watch this video. But if there were others, we let them see them all. But we could bubble those ones that seem to be the most critical up to the top. That was a fun one. I worked on that with a colleague of mine, and hunt. And we were able to get a patent on part of it, not the deep learning neural network part, but the overall system part, particularly how we were able to define the bounding boxes and all these other kinds of pieces in there. So that was another fun one, too
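The idea of bubbling the most critical videos to the top based on bounding-box size and location might look something like the sketch below. The scoring rule here is entirely hypothetical (the actual system's "secret sauce" is not described): it simply assumes that a larger box closer to the horizontal center of the frame is more urgent, and it assumes the detections have already been produced by some object detector.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    video_id: str
    x: float  # left edge of the box, as a fraction of frame width
    y: float  # top edge of the box, as a fraction of frame height
    w: float  # box width, as a fraction of frame width
    h: float  # box height, as a fraction of frame height

def risk_score(det):
    # Hypothetical rule: box area weighted by how centered the box is.
    area = det.w * det.h
    center_x = det.x + det.w / 2
    centrality = max(1.0 - 2.0 * abs(center_x - 0.5), 0.0)  # 1 at center, 0 at edge
    return area * centrality

def rank_videos(detections):
    # Keep the highest score seen per video, then sort most critical first.
    best = {}
    for det in detections:
        best[det.video_id] = max(best.get(det.video_id, 0.0), risk_score(det))
    return sorted(best, key=best.get, reverse=True)

# Made-up detections, one per video.
detections = [
    Detection("overpass", x=0.0, y=0.0, w=0.1, h=0.1),
    Detection("pedestrian", x=0.4, y=0.3, w=0.2, h=0.5),
    Detection("merge", x=0.7, y=0.4, w=0.2, h=0.2),
]

ranked = rank_videos(detections)
print(ranked)
```

Nothing is filtered out in this sketch; every video stays in the list and only the order changes, which matches the point that the reviewer can still see them all.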


Justin Grammens  41:05  

Really cool. Really, really cool. Well, yeah, I'm so happy that you mentioned you found your dream job. It definitely definitely sounds like that. If you look back, are you glad you still sort of took that foray into just software and development? And back? Are you kind of wishing Oh, man, I wish I would have jumped into this in the 90s.


Miles Porter  41:21  

No, I'm really glad that I did it. Because when you do data science, you know, being able to talk to the business and understand their problems is really important. But the further that you can carry the ball towards a solution, the better off you're going to be. So, you know, it's great if you can do a linear regression in R, that's fine. But if you can do a linear regression in R, and then translate it to Python, and then put it inside of a Flask app running in a Docker container and deploy it to Azure, and then secure it with two-factor authentication, or some other kind of authentication, and then put that out. And then, you know, maybe do like a really simple 3D, or D3-based, JavaScript visualization. You know, maybe you're going to ship all that to production. But the more real you make it, the more likely people are going to be to go, oh, yeah, let's take this and run with it. So I would have no skills in that area if I hadn't spent a lot of time, you know, working at a lot of different places on a lot of different problems. So it's not to say you can't get into data science without that. But every day, I'm glad. I'm like, even, yeah, I was writing some SQL queries on MySQL. And it was like, you know, if I didn't know how to do SQL, or if I didn't know how to do a sudo apt install on Ubuntu to get MySQL running, it would be hard to have this conversation. It's


Justin Grammens  42:45  

Yeah, I think back, I mean, I tell people, one of the best tools I ever learned was vi, actually. It was just like, I mean, the fact that I sat down to learn vi back in the late 90s, or whatever, it saved my bacon so many times. You know, log into a Unix system, and it's always there, man. And you can always edit whatever you need. And people are like, how do you understand the cryptic, you know, this stuff? I'm like, you just gotta learn it, man. But I'm telling you, it is, you know, I don't use it as a day-to-day coding, you know, IDE, but man, I am so glad. It's always there. Right. So always there, it's always there to be used. Well, anyway, thinking back, just a couple more questions here at the end. But I mean, you know, do you have any advice or classes that people should take, like, if people are just getting into this field? What, you know, as you've explained your journey, what advice would you give to people?


Miles Porter  43:30  

Yeah, I think, you know, there's tons of resources out there. And Mike and his webcast mentioned it. Frankly, Justin, I think this series is great to help people: listen to videos and podcasts and maybe, you know, the meetups, and get interested in an area and just dig into it. I wanted to call out one other podcast. Sabina Stanescu did a podcast on MLOps, basically. That's incredible. That was a great podcast. And I actually sent her a message and said, thank you for doing that, now I don't even have to mention that. It's hugely important, but I didn't mention it. But, you know, listen to Sabina's podcast. And then in terms of resources, I do have some books that I'd recommend. One is kind of like the baby book for statistical learning. It's just called An Introduction to Statistical Learning, and it's by James, Witten, Hastie, and Tibshirani. I have the one that has applications in R. This is the baby book. The daddy book is The Elements of Statistical Learning. It does cover the mathematical foundations of, you know, even the most simple algorithms, right, have some kind of mathematical underpinnings. And if you're curious, this is a great book for it. The other one that I would mention, it's kind of expensive, but it's called Data Mining for Business Analytics. And there's a version of it for Python. This is a really good book because it talks about different kinds of machine learning algorithms, not deep learning, but this will be all of the other, like, boring stuff like, you know, k-means and support vector machines and regression and decision trees and random forests and boosted trees and all of that kind of stuff. It's just a really good reference to have, but it will give you some idea about, here's a problem, here's how you can solve it, here's some gotchas. And if you get that book, go to page 420, because there is a section that I refer to quite often. So there's this thing called spurious correlations. 
I don't know if you've ever seen that or not. But if you Google spurious correlations, it does things like, I was trying to think of some of them, like the divorce rate is highly correlated with the price of blue cheese or something. I mean, it's, sure, yeah, crazy things. This book has a section in here that talks about, not necessarily about spurious correlations, but about when you're trying to do predictions on things that are not predictable. You can do tests to show that certain things are random walks, and one of the things that is a random walk is the behavior of the stock market. So if anybody ever tells you they have an algorithm to predict the outcome of the stock market, you can say, huh, I think there's a statistical test that you can run that says it's a random walk. So anyway, that's a really good book that I'd recommend people check out. And, you know, go Jackets, check out the OMSA program at Georgia Tech.
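One simple way to check whether a series behaves like a random walk is a variance-ratio check, sketched below. This is not necessarily the test the book describes; it is just one common idea, shown on simulated data. For a random walk, the variance of k-step changes is about k times the variance of one-step changes, so the ratio is close to 1. For a series that keeps snapping back to a mean, the ratio falls well below 1.

```python
import random
import statistics

def variance_ratio(series, k):
    # Ratio of the k-step change variance to k times the 1-step change variance.
    one_step = [series[i] - series[i - 1] for i in range(1, len(series))]
    k_step = [series[i] - series[i - k] for i in range(k, len(series))]
    return statistics.pvariance(k_step) / (k * statistics.pvariance(one_step))

rng = random.Random(0)  # fixed seed so the simulated data is reproducible

# Simulated random walk: each value is the previous value plus a random shock.
walk = [0.0]
for _ in range(5000):
    walk.append(walk[-1] + rng.gauss(0, 1))

# Simulated mean-reverting series: independent noise around zero.
noise = [rng.gauss(0, 1) for _ in range(5000)]

vr_walk = variance_ratio(walk, 4)
vr_noise = variance_ratio(noise, 4)
print(round(vr_walk, 2), round(vr_noise, 2))
```

A ratio near 1 is consistent with a random walk, meaning past changes carry essentially no information about future ones. A formal test would also attach a standard error to the ratio, which this sketch leaves out.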


Justin Grammens  46:18  

Okay. Sure, which is the one that you went through, for sure. Are you guys hiring within your group?


Miles Porter  46:24  

Not within my group right now, but we will be, probably towards the end of the year. And we are, Trimble always has opportunities for people. I saw we have a data science job that was posted not too long ago. We have a joint venture with Caterpillar, you know, the big truck things, and they were looking for data scientists in that group. And so yeah, you can go to Trimble careers and find all kinds of interesting positions all over the world.


Justin Grammens  46:24  

Yeah, cool. Cool. Yeah. I mean, it seems like it's a hot field. And you were wise to sort of follow your heart, I guess, you know, in some ways.


Miles Porter  46:57  

A little bit, a lot. A whole lot of following my wife's recommendations, I think,


Justin Grammens  47:01  

What is it, "happy wife, happy life" is what I like to say. So


Miles Porter  47:06  

Yeah, there's a little bit of that. But no, my wife is a longtime Java contractor in the Twin Cities. And she's actually a far better programmer than I am. But she's the one that kind of really encouraged me to go down this path. So I have her to thank.


Justin Grammens  47:20  

Well, awesome. Awesome. Well, cool. Are there any other topics, or things you've just wanted to share, before we start to close it out here?


Miles Porter  47:25  

No, not really. I really appreciate the opportunity to chat with you. And thanks, again, for doing this stuff and helping organize this community. I think it's really huge and tremendously important for the Twin Cities and the Upper Midwest, too


Justin Grammens  47:40  

Excellent, well, I know, we're gonna have you at the meetup. The first Thursday in March, this will probably air after that, but we typically record those so people can be able to check you out and talk all about S curves, then I will obviously post all your contact information. What is the best way to do you know, I just should people find you on LinkedIn? Yeah, LinkedIn works.


Miles Porter  47:59  

miles_porter@trimble.com works. Either of those two ways, probably the best way to get a hold of me, I found


Justin Grammens  48:06  

Your blog, right? You've seen like, as you've been learning this for a number of years, you've just been able to just write a lot of blog posts and share. I mean, that's so awesome. I also,


Miles Porter  48:15  

Oh, thank you. Yeah, it's data science, dot Netlify dot app. For people that ever go and look at that, that really is a journey. You know, when I first started writing that, I had no clue about a lot of things. And then over time, I started to learn more and more, and realize how much I don't know, which is kind of where I'm at right now. I mean, one of the fascinating things about this field is that it changes so fast. I was thinking, like cube, Hugo was mentioning transformers, and BERT, and all of these things for natural language processing. There's Hugging Face, and all of these other kinds of things are just coming up constantly. So if anybody ever tells you they know it all, they don't. But it's just cool. I just love feeling like I'm in the stream, and being able to kind of, you know, drink from that stream and learn whatever I can learn. So I encourage people to check out that blog. I'm just trying to put a disclaimer there: if you see something that seems wrong, it's probably wrong.


Justin Grammens  49:09  

Right, because you were figuring it out along the way. Well, no, again, I appreciate you being on the program, sharing your knowledge, you know, attending our meetups, and for everything that you do. Yeah, I definitely will point people in your direction and look forward to maybe having you on the program in the future. And we couldn't agree more about what we've learned. So, well, thanks again, Miles. Appreciate it. And we'll talk to you soon.


AI Announcer  49:34  

You've listened to another episode of the Conversations on Applied AI podcast. We hope you are eager to learn more about applying artificial intelligence and deep learning within your organization. You can visit us at appliedai.mn to keep up to date on our events and connect with our amazing community. Please don't hesitate to reach out to Justin at appliedai.mn if you are interested in participating in a future episode. Thank you for listening.