SNIA Experts on Data

Q&A Podcast for "From Data to Decisions: Understanding How AI Models Learn"

SNIA Episode 27

In the most recent webinar in the SNIA Data, Storage & Networking “AI Stack” webinar series, “From Data to Decisions: Understanding How AI Models Learn,” Cal Foshee and Eric Gamble provided an in-depth look at how computer vision models learn. They shared specific techniques and concrete examples of how to train and test these models, drawing on their many years of experience. In this interview, Cal and Eric take a deeper dive into how computer vision really works in production: small models working in sequence, bounding boxes as signal gates, and the relentless pursuit of pattern over assumption. Practical tips cover OCR, rotation, error bucketing, versioning, and why “fail fast” is more than a slogan. And webinar moderator Eric Smith explains what the “AI Stack” series is all about.

SNIA is an industry organization that develops global standards and delivers vendor-neutral education on technologies related to data. In these interviews, SNIA experts on data cover a wide range of topics on both established and emerging technologies.

Welcome And Guest Intros

SPEAKER_04

All right, welcome folks to another amazing SNIA Experts on Data podcast. My continuous pleasure to be the host of this fantastic community, and this is gonna be a great one, mostly because there's a plethora of Erics available on this call. That isn't always a requirement. I like to have a two-Eric minimum, so it's even better that we've got three plus. And Cal, you'll be an honorary Eric today. For folks that are brand new to what we do here, SNIA is an amazing community, and I love the fact that we're sharing so much information with our technology world, both in the community and beyond. My name is Eric Wright. I'm the co-founder of GTM Delta and also the host here of the SNIA Experts on Data podcast. And I'm joined by three other amazing humans. We're gonna talk today about an amazing webinar on vision AI. I took a look at the content, and this is a deep dive; this is what we need. People always have questions, and they want to be able to get to the good stuff and go deep, but also understand the why. This is such a beautiful merger of those two things. So we're gonna have a bit of a chat about the subject matter, and of course we've got links down below for getting in touch with the folks here as well as seeing the webinar itself. So with that, I'm gonna get onto the good stuff here and start. Eric Smith, if you want to introduce yourself for folks that are new to you, and then we'll make our way around.

SPEAKER_03

Sure. Thanks, Eric. Eric Smith, I'm a distinguished engineer and I work for Dell as a member of their CTIO team. I'm also the chair of the DSN, the Data Storage and Networking community. That's the SNIA community that is sponsoring the AI Stack webinar series. Thanks for having us.

SPEAKER_04

And Eric Gamble, just to keep the Eric flow going.

SPEAKER_01

I'm Eric Gamble. I work with IBM. I'm an AI technologist, and technically I do data science, but my specialty is an area of AI modeling and predictive modeling. In addition to my day job, I spent four years as an adjunct professor with Wake Forest University teaching a graduate analytics course.

SPEAKER_04

Fantastic.

SPEAKER_00

And Cal. Thanks. So my name's Cal Foshee. Unfortunately, I'm not one of the three Erics here, but like Eric Gamble, I also work for IBM. My day job is storage procurement engineering manager; I manage our team of engineers that work with our suppliers. But I'm also one of IBM's subject matter experts in computer vision, and I do that both for internal use cases and with clients.

SPEAKER_04

Amazing. Excellent. All right. There's a reason why I call it Experts on Data: we've got experts in the room. So Eric Smith, I'd love to have you talk about the webinar series really quickly. What is the value of the AI Stack webinar series, what have we already covered, and what's coming up? Because for me, I'm excited as heck, but I'm a bit of a captive audience here. So let's tell the rest of the world about it.

AI Stack Series Overview

SPEAKER_03

Yeah, thanks for the question. The AI Stack webinar series is really about introducing people to the basic concepts involved with AI. We started with an intro to AI/ML concepts webinar that included a binary digit trainer and gave people an overview of how training a model actually works. We subsequently followed that up with a webinar on inferencing, which was a few weeks ago. And we've just now done one on training a model, an actual model used in production, and a vision model at that. It was a phenomenal session, and I think everybody should check it out. Going forward, though, it really becomes about how you practically deploy AI/ML solutions. What does it take to deploy a model? What does it take to use it for inferencing? How do you set up an environment such that you can run a model? What are the hardware requirements, all those things. And there's a practical webinar coming up related to creating a chatbot based on RAG. So we've got a bunch of interesting things coming up over the next year.

Why Vision AI Is Different

SPEAKER_04

And what I really adore about this, and I'm blessed to be a part of this community, is that it has such a strong focus on education and on engaging and connecting people. We are birds of a feather, all trying to solve similar problems for our friends, clients, customers, and so on, and really advance the world through shared standards and shared outcomes. And what I love is that even though we all come from different brands, when we're in SNIA, that's all that matters. So very, very cool to see this webinar and many more like it. Maybe I'll have you start off, Eric Gamble. Let's talk about the vision challenge and what is really distinct about vision AI, because it isn't just like every other AI. Unfortunately, when we say AI, the first thing people think of, we won't even say it, is the iPhone of AI, right? We think of generative AI and immediately go to a brand. But it's not that. And especially beyond generative chat, vision is fundamentally different. So I'd love to get your sense of the challenges and the kind of cool breakthroughs that we're sharing through the webinar.

SPEAKER_01

So one of the unique things about vision: I like to characterize AI as not being a problem of ones and zeros, but a challenge of trying to get ones and zeros to behave like people. And in the human context, vision is something all people can relate to. As far as trying to understand how models work, how we process algorithms, how we do prediction, we can do a lot in the education space just by talking about vision and putting that into action. So in our specific case of computer vision, one of the challenges we went through in the webinar is that these algorithms operate at a low resolution. We're talking in the ballpark of 600 by 600 pixels. Well, when you go buy your high-resolution iPhone or Android, what do you do with that? Cal and I can talk about that more, but the reality is that these algorithms are going to downsample or upsample pixels just to fit their models. As people, we process whatever we see, good or bad vision, and we're able to make all those decisions. So the real challenge is, knowing that these algorithms work the way they do, knowing camera technology, knowing everything else, how do we really get it to work like a person and as well as a person? And one of the benefits is that once you can actually do that and automate it, the AI model can operate in a way that a person can't. I'm going to get tired; if I did this for 16 hours, my eyes are gonna get really tired. The AI model doesn't get tired; it will perform consistently. So there's a benefit and a bridge once we figure out how to do this.
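Eric's point about models operating around 600 by 600 pixels can be made concrete with a quick sketch. This is a hypothetical helper, not something from the webinar, showing how much linear resolution a phone photo gives up when scaled to fit a square model input:

```python
def letterbox_scale(src_w, src_h, target=600):
    """Uniform scale factor and resized dimensions that fit a source image
    inside a target x target square while preserving aspect ratio."""
    scale = min(target / src_w, target / src_h)
    return scale, (round(src_w * scale), round(src_h * scale))

# A 4032x3024 phone photo shrinks to 600x450: the model sees roughly 15%
# of the original linear resolution, which is why fine detail is often
# cropped out before detection rather than resizing the whole scene.
scale, (w, h) = letterbox_scale(4032, 3024)
```

This is one reason the multi-model cropping strategies discussed later matter: each crop restores pixels-per-object that a whole-scene resize throws away.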

SPEAKER_04

Well, that's a very interesting thing. Even when we talk about LLMs in general, it's tough for people to map what the model is trying to achieve to the way it behaves, and then we complain, oh, it behaves this particular way. But that's what we've told it to be; it's representative of the training data. There's a reason, and unfortunately it's also the average of us, so it's not always the best. But vision is such an important thing because we do this today. I've got a green screen, which no one can tell other than by the general lighting over my head. And when I do camera work, we use things like depth of field. Quite often, podcasters and even cinematographers lean on depth of field to force focus into one area, because we're trying to tell the person this is the only thing in the image that should matter to them. We forcibly move people toward that as humans, but meanwhile the human eye takes in a staggering number of objects in the scene at any moment in time, and like you said, Eric, we can't even consciously process it all, but it's there, it's being consumed. So this is interesting, and I'd love to have you keep going on this idea, you or Cal: what are the limitations, why is this neat, and how does computer vision take this on?

SPEAKER_01

Well, you just hit on something here as far as all the processing. A lot of it is signal and noise. We don't think of it that way; we just look and we see and we process things. But the reality is that in our heads, we are targeting signal and extracting noise all the time at a very high rate. I'll let Cal jump in here.

SPEAKER_02

Yeah, yeah.

Multi‑Model Pipelines Explained

SPEAKER_00

So, you know, as people, and this is an aspect of humanity we often don't talk about, we have our senses: sight, smell, hearing, taste, touch. Oftentimes those senses are more effective at filtering out information than at gathering it, right? You can see and hear and smell a million different things. We have evolved as people to understand what we need to be listening for. We need to hear that tiger sneaking up on us; we don't really care about the birds flying in the trees. Computer vision has none of that evolution, that background, that context. So as modelers, we have to figure out how to either give the model that context or simulate it. One strategy we use often: let's think of a door. You might want to see, is the door locked? And to see if it's locked, you want to look at the handle and see if it's turned to the locked position. As people, we can immediately see a door and focus in on the doorknob, but computer vision doesn't know how to do that. So we may first teach it what a door is. Hey, this is a door. Take a picture of the room, just find the door. So it finds the door. Then we have another model. Okay, now that we've found the door, we've eliminated windows, we've eliminated open entrances and exits, we've eliminated posters, all the rectangular things the model could get confused by. We just have a door now. Now we have a model that says, go find a doorknob. That model specifically goes and finds the doorknob. That's all it's doing. And then we might have a third model, and that model is looking: is it in the locked position or the unlocked position? Now, to a user who's taken that picture, they probably don't even realize all of that has happened. All they know is that at the end of the day, their inspection told them the door is either locked or unlocked.
We've used a series of simple models: a model that just finds a door, a model that just finds a doorknob, and then a model that looks at the position of the switch. And as Eric talked about with signal and noise, we've eliminated all of that other noise in the room: all the windows, the posters, the entryways. We've eliminated things that might be on the door, the panels, the windows, the deadbolt, if that's what you're looking at. We've narrowed the focus, but the model wouldn't do that itself. We had to train three models to narrow in on what your eye would naturally do, just because of all that context we have. We know what a door is, we know what a doorknob is. We don't get confused by the window, but the model does. And that's a key part of filtering out all that extra noise for the models.
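Cal's door example can be sketched as a cascade, where each stage only ever sees the crop the previous stage produced. The model callables and the `crop` helper below are hypothetical stand-ins, assuming images are stored as lists of pixel rows and each detector returns an (x0, y0, x1, y1) box or None:

```python
def crop(image, box):
    """Cut an (x0, y0, x1, y1) region out of an image stored as rows of pixels."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def inspect_door(image, find_door, find_knob, classify_lock):
    """Chain three small models; each one narrows the signal for the next."""
    door = find_door(image)                 # stage 1: just find the door
    if door is None:
        return "no door found"
    door_img = crop(image, door)            # windows, posters, exits are gone now
    knob = find_knob(door_img)              # stage 2: just find the doorknob
    if knob is None:
        return "no knob found"
    # stage 3 only ever sees the knob, never the rest of the room
    return classify_lock(crop(door_img, knob))
```

The user who snapped the picture just sees "locked" or "unlocked"; the noise elimination happens entirely inside the pipeline.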

SPEAKER_03

I just wanted to jump in. That was actually one of the things I walked away from the webinar with a healthy appreciation for: the concept of multiple models, and how each model, as Cal was just walking us through, does things progressively, and the value of doing that. I didn't have a good appreciation for that with vision models specifically, but I think there are many, many other applications of that same concept.

SPEAKER_04

Well, that's funny, Eric; I was thinking even from your side, we've seen so much happening at the standards and methods level, but then we get to the very specifics of how the outcome happens, the technologies and processes to get there. And it's not even about taking in more; in this case, it's about eliminating the noise quickly. We're seeing LLMs that are being augmented with SLMs and other ways to get very purpose-built things. And like I said, humans are interesting. When we see things, it's obvious, completely obvious. We know exactly why things behave a certain way, but even when that's the case, we're still wondered by it. What we miss sometimes is that these models are dealing with that wonder too, and they have to present it back. As model builders, how do we build these systems? We have to extract the very human required elements from them, because we can't just take it all in and ship it all back. And we're a long way from blueberry muffin or Pomeranian, which is kind of where we all started with that vision question: can these vision models actually work? I know every time I accidentally lose my security key and have to do a CAPTCHA, I'm like, really? We don't know what a fire hydrant is yet? We're still doing this thing. But it's because we have to continue to tune and train; that stuff all has a purpose. People often forget that those CAPTCHA tests are not just for us; they're doing a neat thing behind the scenes. So, Eric, I'd love to hear: what is it that we do now? What are some of the methods?
Things like understanding what the bounding box is, and how we manage labeling. There are so many intricate activities in how we achieve the end result with vision AI. So maybe walk through a couple of the technical hoops we're jumping through.

Bounding Boxes And Patterns

SPEAKER_01

Yeah, so just as background, one of the things we have to stay focused on with AI is that at its fundamental core, it's about patterns and detecting changes in patterns. All the other stuff is built around that. So on bounding boxes: they're simply rectangular. Most of what we do in the world of, I'll say, object detection, which is the bulk of this, and even in classification, is based on rectangular boxes. There is an area of detection known as segmentation where you can draw polygons with lots of sophisticated shape cutouts and things like that. What happens is that if you actually do all that sophisticated polygon work and feed it into an object detection model, I'll use YOLO, I used that a lot in the presentation, it's just gonna make it a box. It's gonna take all the time you spent drawing and just box it. So you have to be aware of tools like that, because some of the tools out there don't advertise it. But the key is to think of this box as a signal and noise thing. One of the strange metaphors I use is that if you're a baseball player, a professional batter, you work with a batter's box. If you had the ability to force that pitch where you want it every time, inside that box, you would do that. That's our goal. That's what we're trying to do. We're trying to create these models that can really perform inside that box and do really well. So in some ways, if we sit down with a client and the client tells us, I want this model to perform with an error rate of, I'll say, one in 10,000, a lot of people would stop and walk away from the table at that, especially with vision. The reality is you have to assess: well, how could we do that? And this is why we go back to systems of small models. We can very effectively get these models to do a task really well and then pass the next task off to the next model.
That's how we target those error rates and work in that range. I've kind of stepped away from the question on the bounding box, but that's essentially how we're trying to think of the bounding box. It goes back to signal, noise, and patterns. We want that box to be signal. We don't want a lot of noise in there. We put in a little bit of noise just to keep margin on it, but we want to box in the pattern. Now, when we say a pattern, we have to think in patterns, and sometimes those patterns are, like the example I used there, the Coke bottle: maybe the only thing I care about in my decision is the bottle cap and the label. Well, maybe my pattern just needs to key on that and leave everything else behind. Sorry, I'm not trying to advertise for Coca-Cola; it's just an example. But the point is that you think in patterns, and you want to think about signal and noise and just get rid of noise all the time. All the time, get rid of noise.
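Eric's point about keeping "a little bit of noise for margin" inside the box amounts to padding the tight label before training. A minimal sketch, assuming boxes are (x0, y0, x1, y1) in pixel coordinates:

```python
def pad_box(box, margin, img_w, img_h):
    """Expand a tight (x0, y0, x1, y1) box by a fraction of its own size
    on each side, clamped to the image, so the label keeps a sliver of
    surrounding context without drowning the pattern in noise."""
    x0, y0, x1, y1 = box
    dx = (x1 - x0) * margin
    dy = (y1 - y0) * margin
    return (max(0, x0 - dx), max(0, y0 - dy),
            min(img_w, x1 + dx), min(img_h, y1 + dy))
```

The margin fraction is a judgment call: zero margin clips the pattern's edges, while a large margin lets the noise Eric warns about back into the box.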

SPEAKER_04

And you're trying to make it, as you said, behave in a human way. Part of it is that we're learning how humans behave in the process, because we don't fully understand it ourselves. We're building neural networks, and we still have to figure out where the end result comes from, because we don't quite know exactly how it works up here. We're very far along, thanks to a ton of really amazing research. But with vision, there are even simple things: there are assumptions we make when we look at a scene. It's a small thing that people don't realize, but we have to get rid of those assumptions. How do you train around that human problem? You want to take that, Cal, or I'll take it?

Training Around Human Assumptions

SPEAKER_00

It's up to you. So training around a human problem is always one of the biggest things we have to do, and a big part of what Eric and I do is teach people how to use vision tools. He and I are the experts in it, but our overall goal when we work with clients is that they end up able to do it themselves, without us. Teaching people to overcome human assumptions is a challenging one. A big thing we run into, and this is especially with the engineers we work with, is good and bad. They want the model to find that it's good or it's bad. And we have to remind them it's about patterns, right? We can find a pattern. We're gonna teach you how to find a distinct pattern. Now, this pattern may represent a defect in your manufacturing process, but in itself it's not good or bad. It's just a pattern. And once we get the engineers or the modelers to start thinking in patterns and make that mind shift from go/no-go, good/bad, to pattern A and pattern B, then you can really get some cool things going with your inspections, because you're not limited to that binary thinking. Maybe you have three patterns for good and four patterns for bad. So you may end up with seven objects where you're really just trying to say this is a good piece of equipment or a bad piece of equipment, but you end up with seven different objects that you have decisions to make on. And that's the key one. As Eric said, it's all about the patterns.
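Cal's shift from binary good/bad to named patterns can be sketched as a mapping layer sitting on top of the detector's output. The seven pattern names below are hypothetical, matching his three-good, four-bad example:

```python
# Hypothetical label scheme: seven distinct patterns, collapsed to a
# decision only at the end. The model never learns "good" or "bad";
# it only learns patterns.
GOOD_PATTERNS = {"pattern_a", "pattern_b", "pattern_c"}
BAD_PATTERNS = {"scratch", "dent", "corrosion", "partial_seat"}

def decide(detected):
    """Map a set of detected pattern labels to an inspection outcome."""
    if detected & BAD_PATTERNS:
        return "reject"
    if detected & GOOD_PATTERNS:
        return "pass"
    return "needs review"   # nothing recognized: escalate to a person
```

Keeping the decision logic outside the model is what makes the inspection flexible: new patterns can be added, or re-bucketed, without touching the detector.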

Accuracy, Error Bucketing, And Tradeoffs

SPEAKER_01

Yeah, and what Cal mentioned there is another strength in how you approach the modeling: if I have three different patterns of bad, I don't care if they get confused with each other. I'm trying to reject on bad. So there are actually modeling techniques where we'll let the per-pattern accuracy drop, because all the models are going to have error, but if we can bucket that error, bucket the confusion among the bads with the bads, and bucket the confusion among the goods with the goods, that's the game. That's what we're trying to do. So some of this is gaming the algorithms, figuring out how to do that and bucket the error.
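That bucketing idea is easy to measure: collapse the labels into the good and bad buckets before scoring, so confusing one bad pattern for another still counts as a correct reject. A sketch, assuming predictions arrive as (true_label, predicted_label) pairs and reusing hypothetical pattern names:

```python
def bucketed_accuracy(pairs, bad_labels):
    """Accuracy after collapsing labels into good/bad buckets: an error
    that stays inside its bucket doesn't hurt the go/no-go decision."""
    hits = sum((t in bad_labels) == (p in bad_labels) for t, p in pairs)
    return hits / len(pairs)

# Per-label accuracy here is only 50%, but every confusion stays inside
# its bucket, so the go/no-go call is right 100% of the time.
pairs = [("dent", "scratch"), ("scratch", "scratch"),
         ("pattern_a", "pattern_b"), ("pattern_b", "pattern_b")]
```

This is the metric to watch when "gaming the algorithm" the way Eric describes: raw class accuracy can fall while the decision-level accuracy holds.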

SPEAKER_04

That's an interesting one too. We've come a long way from hot dog or not hot dog to being able to say, is this a manufacturing deviation that could lead to a rocket exploding under heat and pressure, and being able to use that vision. But it's not just the capability to do it; it's optimizing the compute and the time required to do so. And I think this is one of the most amazing innovations we're seeing, certainly in the SNIA organization and far beyond, where people are coming together: we've already got the thing, but now how do we make the thing better, more accurate, faster, cheaper, ethical, and safe? There's a lot of other criteria now. Because, as you said, if I take a vision model, do I need to see the whole scene, or can I narrow the focus by setting that bounding box to a particular area? There are minor things, or using multiple SLMs with some kind of agentic handoff between them, that could in the end mean we use less compute and less power while getting higher accuracy. There are many, many ways we're testing now, because the fundamentals, I think, are pretty sound, and now we can really experiment. So what's the fun you're seeing around some of that stuff, Eric?

SPEAKER_01

Well, in the vision space, I mentioned the YOLO models, and they keep evolving. I'll use an example like YOLO version 3. I mentioned in the webinar that there are no licensing fees to use it in production, but if you use version 8 or version 11, there are. The gain is that as these algorithms evolve, some of them are faster and some are more accurate, depending on the use case. And with that whole idea of the total cost of inferencing, total cost in time and compute, it does make sense to play with, I'll say, the shiny new golf club. Go test the new tool, see what it can do, right? Take those use cases you know, test them again, and see if you can improve. A lot of that can ripple out into benefit and savings.

SPEAKER_04

And from that, maybe Cal, how does that evolution of a model play out? How do you define the start and finish of some tuning evolution, the point where it's now better than it was? When do we see that goal achieved? There are a lot of directions you could go, and it seems tough to measure, in my mind.

Model Evolution And Versioning

SPEAKER_00

Yeah, so the evolution of a model is always a fun one to talk about. Evolving a model is the primary job of a modeler, the one who creates the models, and you might be evolving that model in a few different ways. A common one: we have it running, we're testing it, and we see that it works really well on pictures captured during the day on the manufacturing line, but at night it still gets it right, only the confidence score is lower. It was 97, 98, 99 during the day, and it might be 88, 89, 90 during the evening. Still correct, everything's going, and if you weren't looking at it, you wouldn't know. But that lower score suggests a weakness in the model. So the next evolution: you take those pictures captured at night, you correctly label them with the bounding boxes like Eric talked about, and you retrain your model with those new images. Now your model is better trained on what it was previously weak at, those nighttime images. You test that model, and if it's good, you deploy it, and that's now your new main model for your inspection. Now, I like to caution people here: a lot of the applications that create models will have a feature that allows the model to tune itself. This is something Eric and I tend to warn against, and I like to quote Dr. Malcolm from Jurassic Park: you were so preoccupied with whether or not you could, you didn't stop to think if you should. You can let these models update and evolve themselves. Our warning is that your original model may have had an innate error in it, and that error is gonna propagate every time it iterates on itself. If you're not paying attention, if you're not observing it, that error is gonna get worse and worse and worse. I'm sure we've all seen it playing with some of the different AI models online, right?
You feed it a picture of a duck and say, redraw this, and then you keep feeding it the redraw, and redraw again, and before long you don't have a duck at all. That's because the little error in the first one gets bigger and bigger and bigger. The second part of a self-evolving model: one of the key jobs of a modeler is understanding what the model does and how it works, where it messes up and where it works well. If the model is changing itself, you're gonna lose that knowledge of how the model functions, and you may not be able to deploy it properly to do what you want in your inspection. We have our model that finds the objects, but then we have the inspection that takes the objects it found and makes real-life determinations. Sometimes that's shut the line down, and the quickest way to get your inspection turned off is to shut a line down when it didn't need to be shut down. So we advise, at least in our area of expertise in manufacturing: don't usually let your models change themselves. It's much better to have oversight from somebody who can see what's happening. And then you asked about deploying a model. I'm gonna keep it fairly simple: it depends on your use case. Sometimes you have use cases that need 99.99% accuracy. Say you're spitting out a thousand vehicles a day; at 99.9% accuracy, that means you're losing one vehicle a day, and that vehicle might be worth $70,000. So 99.9 is not good enough. You have other use cases where the client has really no other way to find these defects. They might be dents, scratches, corrosion, or partial seating of connectors. If you can find 80% of the most common ones, you're helping them out a ton, because those 80% of errors wouldn't have been found any other way. So for that, you're a little loosier and a little goosier on what you allow.
There, you're gonna focus on your middle 80%, the 80% most common defects. For that 99.99% use case, you're really stress testing with the edge cases, because those are what's gonna trip up your model. Once you feel like you've tested enough, you let it run and you monitor it. As I said, you've got to watch it, see where it's performing well, see where it's struggling.
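The nighttime pattern Cal describes, correct answers with sagging confidence, is the classic trigger for a retraining round. A minimal triage sketch, where the field names and the 0.93 floor are hypothetical, assuming each monitored detection carries its confidence score and a verified correctness flag:

```python
def retraining_candidates(detections, floor=0.93):
    """Pick out images the model got right but with low confidence,
    exactly the weakness Cal describes (88-90% scores at night versus
    97-99% in daylight). Outright mistakes would go through a separate
    relabel-and-review path."""
    return [d["image"] for d in detections
            if d["correct"] and d["score"] < floor]
```

Those images then get labeled with bounding boxes and folded into the next training set; the retrained model is tested and, if better, deployed as the new main model.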

SPEAKER_01

I'm gonna add to that. There's a ton of iteration in this, and Cal was explaining how we do that. There's also diminishing return. As we're doing these iterations, we want to do them quickly and take big chunks off the table. But once we get into a range where the return is diminishing, we push it more toward a maintenance type of effort. Versioning is a key thing too. As we're doing these iterations, or even changing models over time, if I need to get back to where I was six months ago, that version needs to be there, that ability to go back to that version. People who jump into this and just keep iterating and changing and don't do the versioning work will eventually regret it. Eventually you'll need to go back to a known version to reset.
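Eric's versioning warning boils down to keeping an immutable record of every deployed model. A minimal, hypothetical registry sketch (real deployments would use a model registry service, but the discipline is the same):

```python
class ModelRegistry:
    """Append-only record of deployed model versions, so any past
    version can be restored months later."""

    def __init__(self):
        self._versions = []

    def publish(self, weights_path, metrics):
        """Record a new version; entries are never edited or deleted."""
        entry = {"version": len(self._versions) + 1,
                 "weights": weights_path, "metrics": metrics}
        self._versions.append(entry)
        return entry["version"]

    def rollback(self, version):
        """Rolling back is just a lookup, because nothing was overwritten."""
        return self._versions[version - 1]["weights"]
```

The key property is the append-only rule: the moment a version can be overwritten in place, the six-months-ago reset Eric describes stops being possible.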

SPEAKER_04

Yeah, even in code generation, this is one thing people often have a problem with. Eventually the good code gets rolled off the back, and you no longer have your safe reference. And with vision it's so different, especially because of the real-time aspect of what most vision systems are doing. It's very unlike generative AI in the text sense. Yeah, we want that to be fast, but spitting out words at a reasonable rate to keep up with human consumption is not hard. Being able to consume, process, and then inform somebody based on vision that's happening in real time, especially in a manufacturing sense, where you might suddenly need to pull the line without a human having to see the thing, that has to happen in real time. And there's so much difference in the effort to train, and in the challenges. So when you have clients saying, hey, don't I just take a foundation model and keep going? What's the surprise you hear when you tell people what training will actually look like as we begin to implement?

SPEAKER_01

The biggest thing, especially when we're training clients for the first time, is that whole bridge of thinking in patterns. We go back to that because, again, they make a lot of assumptions in their head. We do too. It's interesting: when we're training, we make a lot of mistakes. We make a ton of mistakes creating models. The key is not to cling to bad assumptions. If you're wrong, you're wrong. Pivot. Pivot fast. It's not a fight to be right; it's a fight to get it to work and have good results.

SPEAKER_04

It is important that we keep that objective and connect the technology and the methods. Erik Smith, you've probably seen a lot of people taking this content in and learning from Eric and Cal and folks in the community, people who are hopefully surprised and educated and whose journeys are eased, even if they're not working directly on foundation models and just want to consume this stuff. As technologists in the community, what are some of the fun things you're hearing back when people encounter these kinds of tips and tricks?

Real‑Time Constraints On The Line

SPEAKER_03

I think that's a great question. Let me start with what I've been surprised by in the feedback we've gotten from the sessions in general, and then I'll get to this one specifically. I think people really appreciate understanding how the technology works first, getting a good hands-on sense for what they can do with it, and this session was perfect for that. We spent a lot of time looking at rectangles and talking about the challenges of working with each level of picture granularity, scaling images up and down. It was really interesting. But one of the biggest surprises for me came when we were planning the session. We had started off with the idea of a single training webinar: let's train a model. I had trained a simple binary digit classifier in our very first session, and I thought we'd bring on an LLM expert alongside Eric and Cal as vision experts and do one session on training. It was clear almost immediately that that was not going to work, because of how wide the topic was. So we focused on vision. The other thing I took away from this session, and from these sessions in general, is that a lot of the concepts are transferable. Not exactly, but a lot of the principles are similar. I like the idea of failing fast, of not getting hung up on a specific assumption, of being very flexible and willing to pivot. All of those things are important, I think, and they're common.
So I think giving people a basis to build upon, and helping them understand the differences between training a large language model and training a vision model, has been interesting just from that perspective.

SPEAKER_04

And Cal, maybe I'll call on you and take you into a deep, dark topic, because one thing we hate is when text shows up in images. It's the most brutal thing to consume, because there's a lot of risk when text is treated as something other than text. As a human, I see that's a word and I treat it like a word, but we have to train vision models to do that in among graphical elements. And this touches on OCR. Even the orientation you run OCR in, this way or that way, will fundamentally change the accuracy. I've had folks tell me that getting documents into the right directional format will speed up and increase the accuracy when consuming and rendering text out of data with OCR. So how do text in images and OCR come into play when we're looking at the vision challenge?

OCR, Rotation, And Regex Tricks

SPEAKER_00

Yeah, it can be a deep, dark topic. Eric and I have banged our heads against many walls over the years when it comes to OCR and text-based use cases. One thing I'll throw out there to begin with: don't be hesitant to use existing OCR technology. It's a form of vision, but it's a honed-in, focused form of vision. It can recognize digits and letters, versus a more generic vision algorithm that can be trained to see almost anything. That's not to say you can't train almost any vision model to learn what letters, numbers, and characters are, but is it going to be better than an existing open-source OCR? Probably not. It really depends on the characters, I suppose. You talked about rotation. That's a big one. We've actually had models whose only job is to check whether an image is rotated the right way, and if it's not, code rotates it before it even gets fed to the OCR. It goes back to what Eric talked about, like a baseball player: if I could stand at the plate with my bat and ask for a 70-mile-an-hour ball right down the middle, I'm probably going to hit it almost every time, because I know exactly how fast it's coming in and where. That's always our goal as modelers, no matter what the use case: get that consistent pitch, same speed, same place, same time, every time. Then the model doesn't have to work very hard. When it comes to the accuracy of OCR, there are a lot of tricks. Eric and I had to become experts in regular expressions when dealing with OCR, because again, we wanted to tee it up for the model. We knew we were looking at a serial number, and we knew it followed a pretty specific alphanumeric pattern. It could be any letters or numbers, but it followed a specific pattern: number, number, letter, letter, whatever it is.
So we use regular expressions, or regex as you might see it in the industry, to tee it up for the model: hey, you're going to see three letters, and then you're going to see four numbers, and guess what, none of the letters will ever be O or A, or whatever the rules were. Those are all just little things. You have to become an expert in a lot of random areas to really get effective, functional models, because it's all about reducing the noise and increasing the signal.

SPEAKER_01

Yeah, just to add to that: when Cal and I get called in to do modeling work and to help teach people, they are focused on making these models work right, not just adequately. They want them working really, really well. Some of the techniques he's talking about are exactly what we do.

SPEAKER_04

Yeah, and I think so many people's exposure to this stuff is basically one-shotting off a foundation model in some service, without understanding what the training was. One thing those models became very good at early on, of course, was generating human figures in certain ways, but some things remained a challenge, like hands. Somebody actually sells a little fake finger you wear, so that if someone says, "Hey, I don't like what you said," you can reply, "Oh, that was an AI-generated video. Look at the extra fingers. That wasn't me." A great way to get out of crime cameras, I suppose. But the reason those models are very good at generating human figures, if not fingers and toes, is that they were trained on a lot of data containing a lot of human figures. I won't go deep into what that data was, but there's a lot of training data you didn't have control over that led to that first foundation model. And then what do you do with that? This is why, for anybody thinking of getting started, it's not just grabbing the latest and greatest model. You have to understand what led to the current state, how to achieve your first goal, and then how to keep evolving yourself, the model, and your process, because there's a lot we don't necessarily know about what got us to step one.

Know Your Tools’ Limits

SPEAKER_00

Yeah. If you're in this line of work long enough, one of the key skills that doesn't often get talked about is your ability to break things. I don't mean grabbing a hammer and smashing something, but when we have a new technology, one of the first things we do is ask: how do we think we can make this not work properly? Where do we think it's weak? Where will it get confused? Because you can't properly use a tool until you understand its full limitations, whatever that tool is. If you try to hammer in a screw, that's not going to work very well. It's the same with these tools. A lot of them are amazing, but they might not be good at some specific thing. You talked about generative AI having trouble making a five-fingered hand consistently, while doing a lot of other things amazingly well. If we're using that tool and we understand that limitation, we might ask it to generate images that don't show people's hands. It's the same for any of these AI tools, these vision tools that we use. It goes back to the saying often attributed to Einstein: if you judge a fish by its ability to climb a tree, it won't seem very smart. The key is to break the tools and understand their limitations. If you've broken it, if you know how it won't work and what it's not good at, then you can deploy it in a way that's going to be effective, and you'll get the full value out of it, whatever its ceiling is.

SPEAKER_01

I laugh at what Cal said about breaking things, because I coach high school track and field, and kids will ask me, "So what do you do for your job?" And I tell them, "Well, I break AI so that we can make it work."

SPEAKER_04

That's awesome. Well, I'm glad we have fine folks like you both breaking AI so we can make it work better. I only wish we could go eight hours and keep going with all of you, so thank you very much for sharing what you did. I don't want to take away from the webinar, because you covered great material there, so we'll have links. Go check out the webinar and stay in touch with the series. We'll also make sure we have links out to the folks here on the podcast, because this is a really exciting topic, and I love that in the SNIA community there are opportunities to keep participating and having these conversations. I look forward to that. If you don't think we do assumptive things with vision, here's a very simple test. Everybody knows what an elevator is, how it works, and we trust that it generally works as expected. But if you think your trust in that output is so good, go up to an elevator, turn around backwards, press the button, and close your eyes. When you hear the ding and the door open, jump in backwards. You will never feel a feeling like that moment in your life. Because what's the first thing you do when you get on an elevator? You check to see that the elevator is there. You don't know you do it, but you do. And if you skip that check, your heart will make motions you've never experienced on any roller coaster as you jump backwards into what you know is an elevator. I don't actually suggest people do this, just in case there's no elevator there; I recommend having a friend present just in case. But it tells you this is a very challenging technology problem with a very human element to it.
And tech will get there with human help; it will continue to bounce back and forth between human and tech, and that's the value of what these folks are doing. So with that, I'll ask you all to do a quick wrap and remind folks where we can find you, along with any quick exiting words. Eric Gamble, I'll go round robin here, starting with you.

SPEAKER_01

With any analytics, any modeling, whether it's vision or any other kind of AI, it comes down to patterns and changes in patterns. Go back to that core concept every time.

Takeaways And How To Connect

SPEAKER_04

Absolutely, absolutely awesome. And Eric, when folks do want to get connected with you, what's the best way to do so? Beyond, of course, the links we'll have to your SNIA persona, and people can get to the SNIA events and meet you IRL, hopefully. But what's the best way to catch you online? You can catch me on LinkedIn for sure; you'll find me. I love how LinkedIn became, unironically, my favorite social media network, because everybody disbanded and disappeared off the other one. LinkedIn is the last bastion.

SPEAKER_01

Yeah, yeah. There are multiple of me out there, but I think you'll see my background right now has a track on it.

SPEAKER_04

Very nice. All right. And Cal, what would you leave with the folks before we go? And where do we find you to keep this conversation going?

SPEAKER_00

I'm going to steal a quote that Eric used in our presentation: all models are wrong, but some are useful. If you can remember that going into any models you use, in my case vision models, you'll be successful. The models are never going to be perfect, but if you understand how they can be used, you can do a lot of cool things with them. And if you have any interest in getting in contact with me, LinkedIn is also the way to do it. There are not too many Cal Foshees on there, so it shouldn't be too hard to find me.

SPEAKER_04

Nice, nice. It's like the old saying: democracy is the worst form of government except for all the others. Precisely. And it's fun to contextualize this, because that's exactly it. We assume everything just works. Instead, assume it doesn't, and work upwards from that assumption. And Erik Smith, last and certainly not least, where do we find you, and what do you want to share with the folks about this session and what's ahead in 2026 for the SNIA community?

SPEAKER_02

Sure. I'm not sure if you can hear me; the headset does that thing. I would say: go out and break things.

SPEAKER_03

Seriously, go out and break things. Try doing things, fail fast, and see what works and what doesn't; you're going to learn a lot. I think that's the best way to get introduced to AI and these topics in general. We have a great set of webinars coming up; consider attending them and looking at the demos. And the best way to get hold of me is also on LinkedIn. So thanks, everyone.

SPEAKER_04

LinkedIn's where the party's at. And if you want to connect with me, there are way too many Eric Wrights in the world, but there's only one DiscoPosse; that's the easiest way to find me. If you search for DiscoPosse, there's definitely only one of those on LinkedIn. Please connect, and watch the other great podcasts. Thank you all for joining and sharing your knowledge today, and check out the webinar. It's fantastic, and these are practical things. This isn't talking for 45 minutes about the ethics of AI with no outcomes at the end; you walk through hard problems, show us how to solve them, and give us ways to experiment. So this is practical for the best of us and for the rest of us, and thank you all for sharing it. With that, make sure you get to your latest SNIA online and in-person events. Go to SNIA.org and check out all the links to upcoming in-person events. 2026 is going to be a great year in technology, and I look forward to seeing all of you on many, many more of these podcasts and IRL somewhere at a SNIA event.