Digital Pathology Podcast

147: Non-Generative AI – Predictive Analytics & ML – 7-Part Livestream 3/7

Aleksandra Zuraw, DVM, PhD Episode 147




What if I told you the biggest AI breakthroughs in pathology aren’t coming from ChatGPT or generative tools—but from the quiet power of predictive analytics and machine learning?

In this episode, I explore the non-generative side of artificial intelligence in pathology. These are the tools that detect tumors, segment tissue, classify images, and make predictions—without generating a single word.

It’s the third chapter in our guided AI series, and this time we focus on the models you’re more likely to use in real-world diagnostics. You’ll hear about object detection, segmentation, anomaly detection, and how these models are built using supervised and unsupervised learning—plus the pros and cons of different annotation strategies.

We’ll also cover why no one model fits all, and how combining simple tools like decision trees with more complex neural networks is often the key to building reliable, usable AI in pathology.

Whether you’re training your first model, selecting an algorithm for rare disease detection, or just want to understand what “unsupervised clustering” means—you’ll find something useful here.

🎯 HIGHLIGHTS WITH TIMESTAMPS

  • [00:00] Welcome and global audience shout-outs
  • [02:00] News: The authors of the AI paper series are coming on the show!
  • [04:00] Booth 528 @ USCAP—join me live
  • [06:00] Live annotation workshop announcement
  • [08:00] AI Hierarchy: ML → Deep Learning → Foundation Models
  • [10:00] Use Cases: Object detection, segmentation, anomaly detection
  • [14:00] Supervised vs. Unsupervised Learning explained
  • [18:00] Common algorithms: Regression, Trees, SVMs, KNN, Neural Networks
  • [26:00] Feature learning and CNNs in pathology
  • [33:00] Pattern detection with unsupervised learning
  • [37:00] Annotation strategies: Fully, weakly, self-supervised
  • [45:00] Multi-modal AI: Text + image + omics data
  • [50:00] No single tool solves everything—toolkit mindset matters

📚 Resource from this episode:

📖 Main Article: "Non-generative artificial intelligence and predictive analytics in medicine"

🔧 Tools & Mentions

  • Digital Pathology Trailblazers Book – Free visual eBook
  • QuPath & V7 Labs – Annotation tools mentioned
  • University of Pittsburgh Medical School – Authors’ institution
  • Muse Microscopy – Booth 528 at USCAP 2025

This episode is all about real-world AI—how it's already helping us in digital pathology, where it struggles, and how we can use it more responsibly and effectively.

🎧 Listen in to learn why annotation isn’t just a pain—it’s a power move.
🎤 Stay tuned for part 4, where we talk AI + statistics in pathology.

Support the show

Get the "Digital Pathology 101" FREE E-book and join us!

Introduction and Welcome

Aleks: [00:00:00] Welcome. My digital pathology trailblazers. I'm gonna switch off the music 'cause last time I forgot I had it on.

Starting Off the Live Stream

Aleks: When you join, let me know if you are. Let me know if you're here. Let me know where you're tuning in from. I actually don't need the headphones if I don't have music on.

I see you joining. Let me send you a hi in the chat and just let me know where you are tuning in from what time it is and.

Technical Issues and Audience Interaction

Aleks: I have this problem always that I want this live stream to be high resolution and it is not. Now it's too late to fix this problem.

Current Session Overview

Aleks: So couple of updates as you are joining and as you are saying hi and letting me know where you're tuning in from.

So we are today gonna be doing this third part. We're on the third article. Let me see if we have... yeah, my pen. [00:01:00] The pen is working; I think last time it was not working. And let me know if you're hearing me well and if you are seeing me OK. If I drop off this live stream, which has happened in the past, just wait for me to get back.

Should be fine today. But we are today in this place: non-generative AI and ML in medicine. ML is machine learning. And we are not yet halfway through, but getting there. The next one is gonna be tough: it's gonna be statistics. I need to really prepare for that one. The first three, I have enough knowledge to basically teach without too much preparation.

The statistics. This is gonna take some preparation. Oh, welcome. Trying. This is so cool to have you. Friday 5:00 AM club. Yeah, there was a book, the 5:00 AM Club. I never read it, but I think I belong to the club.

Guest Announcements and Podcast Updates

Aleks: So the updates for today, other [00:02:00] than that we're doing the third paper, are that the authors of the papers...

Not all of them, but the main initiators of this whole series (series, not session) are these guys, let me show you. It's gonna be Liron, it's gonna be Hooman, and it's gonna be Matthew Hanna. They work together here, at this University of Pittsburgh Medical School, and they are establishing an AI Center of Excellence for Medicine.

They were the main driving forces behind this series, and I managed to schedule a date to have them on the podcast. So be excited with me. I'm super excited to have them all on the podcast and talk about the series: what was the goal of this, how did it happen that they did it, how long did it take?

And [00:03:00] all this stuff behind the scenes of writing a paper, which, even if you haven't written a paper, if you have read one: it's a lot of work. It's tough. Anyway, so they did seven of them, they are working on the center of excellence, they're gonna be the center of excellence, and they're gonna join me on the podcast.

Conference Details and Invitations

Aleks: So I am super, super excited. And we have Atlanta on the line, welcome; so cool to have you here, Scott. And if you're just joining, and I see that you guys are joining, let me know in the chat where you're tuning in from and just say hi. So that was the first update: they are gonna be on the podcast and you will be seeing them on video.

I'm gonna be at USCAP and I'm gonna be working with... sorry, wrong window, of course. I'm gonna be supporting a sponsor, Muse Microscopy, and then I'm gonna be creating content with some other sponsors. So most [00:04:00] of the time I'm gonna be at this Muse Microscopy booth, and I'm gonna tell you the booth number later.

Once I find it, you can meet me there, and probably we will also have a setup for you to join me on my podcast and basically talk about your digital pathology journey. It doesn't matter if you're a beginner or if you're advanced, if you've already implemented something, published, not published, or just exploring.

I wanna see you there, and I want you to be a guest on my podcast. We're gonna be doing mini episodes with you, with the digital pathology trailblazers. So let me just check the booth number, because if you're going there (and you probably already know that you're going, because the registration is probably closed), the booth is 528.

Let me just write it here.

[00:05:00] Booth 528. It's gonna be Aleks and Muse, so join me, just come and say hi, whatever you're doing there. Even if you're there just for one day, come to booth 528. And we have an international crowd joining, which is fantastic. Hi to Cambodia! Wow, I don't think we have had anybody from Cambodia say hi yet; that's exotic to me.

Okay.

Annotation Workshop

Aleks: So that's the second update. And the third update is: I think one time I asked if you wanted an annotation workshop, and the answer was yes, we want an annotation workshop. And I had a shout-out in this livestream: oh, if there is a company who would like to facilitate that and sponsor this livestream, we can do that.

So there was a company that reached out to me, and we are gonna have a call next week to figure out how to do this livestream. They have [00:06:00] images, they have the software to annotate, so you guys can just join me and annotate live. And speaking of annotations, that is gonna be part of the topic of what we're gonna be talking about today, because today we have non-generative AI in medicine. Non-generative; and I'm gonna make it big.

Let's do this here

and there are guests who are gonna be present at USCAP. So join me at booth 528; let's see each other. I'm gonna be running around like a crazy person, but I will be at that booth for a lot of the time. So you just say hi, you say, I'm a digital pathology trailblazer, and the conversation starts, even if we have never met in person, because it doesn't matter.

I met a lot of my digital pathology trailblazers at conferences, and it feels like we knew each other for a long time. So yeah, the live stream with annotations is gonna happen. I will let you know when.

Deep Dive into Non-Generative AI

Aleks: [00:07:00] And now let's talk about non generative ai. Let me not destroy my setup. And also let me like make better lines.

What is this line? My 6-year-old does better lines than me. Let me remove it. Okay. So if you're just joining, let me know. Oh my goodness, we have many people today; say hi in the chat and let's do it. And maybe we're not gonna sit here for an hour today, because they have cool images. Okay. So, non-generative AI.

Non-generative AI in medicine: advancements and applications in supervised and unsupervised machine learning. So last time we talked about generative, and now we're talking about non-generative. Let me take you to the first image. And we are here: this is our third session. I skipped a session last week; I'm sorry.

There were, I don't know what, I had to do something crazy for other [00:08:00] stuff and I had to skip. Apologies. So when we look at this image, let me just make it even bigger. Yeah,

one more time. We can do it guys. We can do it. I wanna make it bigger and move it around. I think, yes, I'm tech savvy enough to do it. Okay, so starting with this: the hierarchical structure of AI technologies. And today we're talking about non-generative AI. AI is the whole thing; everything that computers do can be described as AI. At the very beginning we have machine learning, and then part of machine learning is deep learning. Then within deep learning we have the foundation models; [00:09:00] large language models and vision models are within foundation models.

It can be both generative and non-generative. Last time we talked about generative; now it's non-generative, meaning it doesn't make anything new, it predicts from data. And what can we use this non-generative approach for? And I have a tendency, which I think I have overcome, but maybe I have not overcome it yet, to be super excited about every new technology.

And it's I don't know, let's call it new tech toy syndrome where I see a new method and I'm like, oh, now I can forget about all this other stuff and I can just use this new method for everything. Never happens. And I keep being so excited. Doesn't matter. What do I wanna say with that?

When the generative AI came on the scene, I was like, oh, we can like, throw everything away from the non generative. And of course, as I learned about the technology and about the [00:10:00] limitations of the technology, I'm like, ah, we would pull this one approach and use it in combination or this other approach.

Or maybe you don't really need to deploy generative AI if your task at hand is simple. So every time a new thing shows up, this happens to me. I control myself now, but not always; I always get this excitement. So what can we use these non-generative AI foundation model tasks for, in pathology and medicine?

We can use them for quality control: ensuring data quality and detecting artifacts in whole slide images. We can use them for classification. So here we have a couple of words that have double context. Classification, for example, is a normal word where you classify something, but it's also a concept in computer vision, where you classify things, but by a computer, right?

So these are called tasks, the things that the computer does. [00:11:00] A very common classification task is identifying disease and detecting abnormalities based on the whole pathology or radiology image: cancer versus normal. So these are the algorithms that will help pathologists do something with the images.

Then we have object detection. Like the name says, it detects objects; it spots specific cells or features in pathology or medical images. And in computer vision tools, for this object detection there is something called a bounding box. So let's do it: if I wanna do object detection of these... what are they, pentagons? Pentagons? I don't know. Send me in the chat; I know a hexagon and I know a rectangle. What's the five-sided one? Let me know in the chat. Anyway, if I wanted to do object detection of these, and if I was a computer, I would place a bounding box [00:12:00] around each of them; each of them would have a bounding box, right?

So this is a real thing, a bounding box; you basically place a box around the stuff that you're detecting. And this can be used for different cell detection tasks: Ki-67 quantification, blood smear quantification, cell quantification. Whenever you just need to identify an object, you would use this.
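As an aside on the bounding-box idea: in code, a box is just four coordinates, and predicted boxes are commonly scored against annotated ones with intersection-over-union (IoU). This is a minimal illustrative sketch; the coordinates are made up and not from the episode or the article.

```python
# A minimal sketch of bounding boxes as used in object detection.
# Boxes are (x_min, y_min, x_max, y_max); IoU (intersection over union)
# is the standard score for how well a predicted box matches a labeled one.

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (may be empty)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A predicted cell box vs. an annotated one (made-up coordinates):
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.333...
```

An IoU of 1.0 means a perfect match, 0.0 means no overlap; detection benchmarks typically count a detection as correct above some IoU threshold.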

If you need to outline a region, you would use segmentation. And this is another computer vision word, which means that you're gonna be outlining a region. Maybe I do this one: outlining a region of interest in medical images. For example, tumor in pathology images, where in image analysis you would do tumor-stroma separation if you only wanna, for example, identify immune cells within the tumor, or within the pentagon.

Yeah, pentagon was correct. Okay, thank you so much [00:13:00] Pedro, for pentagon. I was like, Pentagon, was that not the military place? Anyway. But yeah, so segmentation is for regions: for example, segmenting cancer, segmenting, I dunno, specific regions of an organ, segmenting vessels. Something that you wanna extract out of an image, but it's not an object, it's a region. So objects are the cells, and regions are regions, right? And then we have something called anomaly detection, which, when I heard about it, I was also super enthusiastic about.

Because I'm a toxicologic pathologist, and the job of a toxicologic pathologist is... well, most of the stuff is normal. And not only in toxicologic pathology; also in diagnostics, most of the stuff is normal. And then you wanna train a model on normal, so that it can tell you when it's abnormal. And I was like, this is a fantastic concept.

Let's just do this and I will keep just [00:14:00] looking at the abnormal, as long as I don't have to visually screen. It didn't perform that well, because the changes that toxicologic pathologists evaluate are very subtle, so they don't look like that much of an anomaly; you have to hunt for the change.

So these are the things you can do with the classical non generative ai and they are still like legitimate things to do where you can use these tools.
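The train-on-normal idea can be sketched in its simplest statistical form: learn what normal looks like, then flag anything far outside it. This is only an illustrative sketch; the measurement values and the three-standard-deviations threshold are invented, not from the episode.

```python
import statistics

# Sketch: an anomaly detector "trained" only on normal measurements.
# It learns the mean and spread of normal data and flags anything far outside.

def fit_normal(values):
    """Learn the center and spread of the normal examples."""
    return statistics.mean(values), statistics.stdev(values)

def is_anomaly(x, mean, std, k=3.0):
    """Flag x if it lies more than k standard deviations from the normal mean."""
    return abs(x - mean) > k * std

# Hypothetical "normal" nuclear area measurements:
normal_nuclei_areas = [50, 52, 49, 51, 50, 48, 53, 50]
m, s = fit_normal(normal_nuclei_areas)
print(is_anomaly(51, m, s))  # False: within the learned normal range
print(is_anomaly(90, m, s))  # True: flagged as abnormal
```

The limitation Aleks describes shows up directly here: a change only slightly outside the normal spread never crosses the threshold, which is why subtle toxicologic findings were hard to catch this way.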

Supervised and Unsupervised Learning

Aleks: And we can divide the AIs into supervised and unsupervised learning. And we're gonna start with supervised learning. So, supervised learning. Lemme just check if I have some text that I wanna mention as well. Yeah. But we focus on the images, 'cause an image is worth a thousand words. Supervised learning.

And if you have any questions, let me know what questions you have about what I'm speaking about right now, so that I can [00:15:00] address them immediately, or anything else related to digital pathology. So: supervised learning is with labels. The labels are "this is this"; they are like names of things.

This is a phone. So if I had an image and the label, it would be an image of this phone, and the label would be "phone." And in medicine, the label is gonna be the diagnosis. For example, for a pathology image, it can be the diagnosis, it can be an annotation, it can even be the whole report now, with generative AI.

So now the generative and non-generative are gonna be intertwined, but the whole report can be a label for this image, where you have descriptions of what's on the image. So basically pairs of something, and then something that's telling you what that is.

Supervised Learning Techniques

Aleks: In supervised learning, the types that we have are classification and regression.

And now you can see... these images are brilliant, I love them. So classification: you have a bunch of dots and you wanna divide them into groups, classify them into blues and oranges. Yeah, so that is gonna be classification; you classify them into one of the classes, and classes are the categories. And regression is: oh, they're all together.

So you draw a line around which these things cluster. So this is linear regression. So these are the types of supervised learning.
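The regression picture, dots clustering around a line, can be sketched as an ordinary least-squares fit in a few lines of pure Python. The numbers are invented for illustration and not from the article.

```python
# Sketch of the regression task: fit a straight line y = a*x + b through
# points by ordinary least squares (illustrative, made-up data).

def fit_line(xs, ys):
    """Return slope a and intercept b of the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Hypothetical continuous outcome, e.g. a lab value vs. dose:
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))  # roughly 1.96 0.14
```

Classification would instead return a discrete class for each point; regression, as here, returns a continuous number.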

And there are also different algorithms, which, every time... so, I've been doing this, or I've been in that space, since 2016. Oh my goodness, that's nine years already. Crazy. Next year I can say I have a decade of experience in, blah blah blah, in this space. Anyway, what I'm getting at is, I remember when [00:17:00] random forest, a decision-tree method, came out and we started using it at the image analysis company I was working for, for tumor-stroma separation. And I saw a few images and they were so good. And I'm like, let's scrap it again, right? Let's scrap all the other stuff that we did, because this is so much better. And then on other images, it didn't perform. So then again, when deep learning came on the scene, I was like, oh no, this random forest is like so random. Let's not use it. Let's do deep learning. And then I learned about the limitations of deep learning.

Let's not use it. Let's do deep learning. And then I learned about the limitations of deep learning. With, before I tell you what it is all about, we say hi to Argentina. Hello. But I'm just, making fun of myself to emphasize that there is a right tool for everything. There is no one tool for everything, right?

And I'm just super excited about stuff, and I think one thing is gonna solve the problems of the world. So: decision tree, where you basically make little decisions; support vector machine, and we have a little text about it. I'm gonna go into that text, because there is a little [00:18:00] bit more explanation needed.

We have linear and logistic regression: for linear, you can see that there's a line, and for logistic, we have two groups of points and there is this S-curve. I'm gonna remove my marks; I'm drawing on my images everywhere. Then we have K nearest neighbor, and here is our neighbor in question: we decided that it's actually closer to the blue group, so it's no longer yellow. And the neural networks.

They are better than the other approaches for image analysis. They're not perfect, right? But they're definitely better. And here we have a real-life example of detection, and segmentation, and classification as well, of a digitized Pap test slide, right?

Here we have what cells are here: normal cells, abnormal, metaplastic. And then we have also [00:19:00] squamous cells and glandular cells. So squamous are from the surface, from the surface of the cervix in this case. But also we have glandular cells, endocervical and endometrial. And these are our classes, the categories.

And there are different features of these things. They can be geometric features, nuclear-to-cytoplasmic ratio; we can have HPV status. And this deep neural network does feature detection. A deep neural network has this input layer, then stuff is happening in between, and this is what is often referred to as the black-box quality of AI: what's in the black box? You have input and then you have output, but what happened in here? Things happen, like convolutions; basically, different computational calculations happen [00:20:00] in there. And it's called a neural network because these circles are so-called neurons, and it resembles the connections in the brain.

And then, from this output layer, we're gonna get some quantification. We're gonna have classification, we're gonna detect the cells, and we're gonna segment them.
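One of the computations inside that black box, a convolution, can be shown in miniature: a small filter slides over the image, and each output value is a weighted sum of the pixels under it. The tiny "image" and filter below are made up for illustration.

```python
# One hand-rolled convolution step, the basic operation inside a CNN.
# A filter (kernel) slides over the image; each output pixel is the
# weighted sum of the input pixels under the filter.

def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

image = [[0, 0, 1, 1],   # toy grayscale image with a vertical edge
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[1, -1],
          [1, -1]]       # responds strongly at the vertical edge
print(convolve2d(image, kernel))
```

Stacking many such filters, interleaved with pooling steps, is how a CNN builds progressively higher-order visual features from raw pixels.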

Application of AI in Digital Pathology

Aleks: So that's what can happen with deep learning and our deep learning neural networks, which are one of the common supervised algorithms. And here you give a lot of these examples. So that's gonna often be a human labeling it, and in the case of pathology, it's gonna be a pathologist or a histotechnologist or whoever has the domain knowledge to consistently recognize and classify these cells. So let's talk a little bit about the theory: decision trees.


Audience Questions and Interaction

Aleks: And if you [00:21:00] have any questions, let me know. Decision trees: these were a popular non-parametric choice for supervised machine learning in medicine and pathology, and building such tree-based machine learning models can predict patient outcomes, diagnose disease, and sometimes help identify new biomarkers.
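The "make little decisions" idea can be sketched as a single one-question node, a decision stump, the smallest possible decision tree. The feature, threshold, and labels below are invented for illustration, not taken from the article.

```python
# Sketch of the simplest decision tree: a one-question "stump" that
# splits samples on a single feature threshold (illustrative values).

def stump_predict(feature_value, threshold, label_below, label_above):
    """Answer one yes/no question, like a single node of a decision tree."""
    return label_below if feature_value <= threshold else label_above

# Hypothetical rule: nuclear size above 12 suggests "abnormal"
samples = [8.0, 10.5, 13.2, 15.0]
preds = [stump_predict(v, 12.0, "normal", "abnormal") for v in samples]
print(preds)  # ['normal', 'normal', 'abnormal', 'abnormal']
```

A full tree chains many such questions, and a random forest averages many trees trained on random subsets of the data, which is where the "random" comes from.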

Do you notice there's no mention of "can analyze images"? We used them for image analysis anyway; you take the newest thing and you check if it works. But yeah, these are still popular and useful. Then we have the support vector machines, and I'm gonna go back to the image so that you have it: the decision tree was here, then we have the support vector, right?

And you see you have two groups of dots and some line in between, right? So let's go to the theory of support vector machines. They excel at handling high-dimensional [00:22:00] data and can be used for classification tasks. The main goal is to find an optimal boundary, a hyperplane; that was the line in the middle, the hyperplane that separates labeled data points of different classes.

And in two-dimensional space, the hyperplane is a line, right? It doesn't have to be flat; it can be in a higher-dimensional space, but in two dimensions the hyperplane becomes a line. And the data points closest to this line are referred to as support vectors. And then, if a trained support vector machine model is presented with a completely new data point, it can be used to predict on which side of the hyperplane it'll end up.

It can classify disease. And for continuous data, a regression approach can be used with a support vector machine; it's a combination, called support vector regression. Let me show you. Here was regression. This was regression: like [00:23:00] retrospectively saying around which line these points are clustering.

So you can combine this support vector and you can probably combine everything. But that's the example we have here.
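What a trained linear SVM does with a completely new point, deciding which side of the hyperplane it lands on, can be sketched in two lines. The weights below are invented for illustration, not trained on anything.

```python
# Sketch of SVM prediction: a new point is classified by which side of
# the learned hyperplane (in 2D, the line w.x + b = 0) it falls on.
# These weights are illustrative, not a trained model.

def svm_side(point, w, b):
    """The sign of w.x + b decides the class; the boundary itself is 0."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return "class A" if score >= 0 else "class B"

w, b = (1.0, 1.0), -10.0  # hypothetical separating line x + y = 10
print(svm_side((8, 7), w, b))  # above the line -> "class A"
print(svm_side((2, 3), w, b))  # below the line -> "class B"
```

Training an SVM means choosing w and b so this boundary sits as far as possible from the closest labeled points, the support vectors.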

Regression and K-Nearest Neighbor

Aleks: And so let's talk a little bit about regression. Linear regression is denoted by a straight line, as we have seen, whereas logistic regression... logistic is the other one. I'm gonna be scrolling a little bit.

So this one is the logistic. And you can see barely, because it's small, maybe I can make it bigger. We will lose resolution, but it doesn't matter. This one is an S shaped curve, right?

Okay. And this linear versus logistic: linear is straight, logistic is S-shaped. In contrast to linear regression tasks, logistic regression is often employed for binary classification tasks, such as diagnosing the presence or [00:24:00] absence of disease; you had the points clustered above and the points clustered below. Whereas linear is gonna be for continuous outcomes, such as blood pressure or glucose levels.
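That S-shaped curve is the logistic (sigmoid) function: it squashes any score into a 0-to-1 probability, which is why logistic regression suits present-versus-absent questions. A minimal sketch, with an invented one-feature model:

```python
import math

# Sketch of logistic regression: the S-shaped sigmoid turns a linear
# score into a probability between 0 and 1 (weights are illustrative).

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def predict_probability(x, weight, bias):
    """Logistic regression for one feature: probability of the positive class."""
    return sigmoid(weight * x + bias)

# Hypothetical model: higher marker level -> higher disease probability
for level in [0, 5, 10]:
    print(round(predict_probability(level, weight=0.8, bias=-4.0), 3))
```

Thresholding the probability at 0.5 turns the continuous S-curve into the binary diseased/not-diseased call.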

Then we have K nearest neighbor, where the yellow dot became... whatever, it became blue, I think. Yeah, we decided it's blue because it's closer to the blues. And this can be employed for classification and regression tasks. And a K nearest neighbor model does not require a training phase.
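The nearest-neighbor vote can be sketched in a few lines: store the labeled points, and when a new point arrives, let its k closest neighbors vote. The points and labels are made up to mirror the blue/yellow dots example.

```python
from collections import Counter

# Sketch of k-nearest-neighbor classification: no training phase, just
# store the labeled points and vote among the k closest neighbors when
# a new point arrives (illustrative, made-up points).

def knn_predict(points, labels, query, k=3):
    """Classify `query` by majority vote of its k nearest labeled points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, query)), lab)
        for p, lab in zip(points, labels)
    )
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["blue", "blue", "blue", "yellow", "yellow", "yellow"]
print(knn_predict(points, labels, (2, 2)))  # nearest neighbors are blue
```

All the distance computation happens at prediction time, which is exactly the "lazy" behavior described next.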

And they are also sometimes called lazy algorithms, because computation is delayed until the prediction phase. That means prediction can be computationally expensive, because all the work is delayed until then. And what can we use it for? Diagnose, classify. So here, see, they often say this diagnose-classify together: diagnose is the medical [00:25:00] term, classify is gonna be the computer vision term. And let me show you something.

The classify is in because diagnose is medical [00:25:00] term, classify is gonna be computer vision term. And let me show you something.

I have it in my book. A little diagram. I call it computer vision to pathology, vision translation. With giraffe images. If you have the book, it's on page 77. Maybe I can just

look, let's see if you guys can see it. I should have a PDF ready, but basically you will not see it. The drafts right, and this draft has just a box around all the drafts. And this is a classification. It's classifying, these are gaffes here. This one has a bounding box. Around each of the drafts there are three.

So this is object detection. Then you have all the drafts together. We actually, deli, do you hear me? Let me know if you don't. We delineated the drafts. So that's gonna be the [00:26:00] segmentation. We segmented them. And here a certain type of segmentation, semantic segmentation. And here we actually segmented out every single giraffe, which is called instance segmentation.

So these are the computer terms that. You used to solve pathology problems and pathology problems will be structures of interest, which are the dots and cells and tasks of interest and structures. We will have regions and objects. We will have cells and next time if I wanna show it, I'm gonna prepare it in the digital version.

But if you have the digital, you can get the digital version on digital pathology place.com. There's like a big button and you can just download this book for free right now if you want to. I should have a QR code for that next time. Next time. Okay. Going back to our task at [00:27:00] hand.

If you have the book, let know in the chat. Just write book in the chat. Or, and, or if you want it, write, want the book. And I'm gonna send you the link because I didn't prepare the QR code. Apologies. And then we have the neural networks, the almost magic and neural networks. That's okay. It was the, we had the good image for this one.

So we're gonna skip the theoretical explanation. And then application and performance of supervised learning models.

We're gonna talk about performance next week, but let's talk about the application. What can we use these models for? And. Like, how will you use it before we, we dive into it? Like, how will you use it as a pathologist?

Talking to Your Computational Pathology Team

Aleks: Like, now you're gonna be going to your IT people or to your computational pathology team and saying, oh, let's use regression for this [00:28:00] and let's use this one for that.

I wouldn't do it myself, because I'm not a computer scientist, but I can now constructively discuss different approaches, because I have the knowledge of the pathology tasks and of what I want: what do I wanna get, what information do I wanna get? And I know the names of the tools and what those tools are good for.

And I can now explain to my computational pathology team, Hey, this is a task at hand.


Aleks: These are the computer vision terms for the things that I wanna achieve; which approach do you think is best? Maybe that one, maybe some other one. Maybe don't use random forest for tumor-stroma separation.

At least when I was doing this, it didn't work that well. But anyway, the example is like you need to have enough background knowledge to work in a multidisciplinary team. And digital pathology team is a [00:29:00] multidisciplinary team, and we are part of this team, so we need to know how to talk to others and what can we use the stuff for?

So for example, regression models, generating continuous numerical value outputs, can be trained on a data set of patient demographics, medical history, and laboratory test results, along with, for example, the cost of each test. And then labs can predict cost savings from this kind of data.

Supervised Learning Techniques in Pathology

Aleks: In pathology, supervised machine learning is used for classification models, with discrete class outputs, that diagnose disease from labeled data. And convolutional neural networks have been a cornerstone of modern AI within image classification studies, right? Both radiology and pathology.

You wanna screen, you wanna say benign, malignant; cancer, no [00:30:00] cancer. This is gonna be classification, and the method that is leveraged is gonna be convolutional neural networks, because they can convert pixel data from images into progressively higher-order visual features over the course of sequential convolution and pooling steps.

And these steps are what is happening in the black box. And if you are just joining, let me know in the chat.

Unsupervised Learning: Clustering and Anomaly Detection

Aleks: And then we have unsupervised learning. Supervised was with labels; unsupervised is without labels. And what do we have here? We have clustering, we have dimensionality reduction, and anomaly detection. Clustering: we cluster some data points into groups, right?

And in an unsupervised fashion these groups are being selected. And I would do this grouping as well here; yeah, probably I would do the same as the algorithm does. And now [00:31:00] we have three groups and we can check: okay, what do they have in common? Do they have some gene expression in common?

Do they have some test results in common? Based on the data that you have, the algorithm is gonna cluster them, based on this information.

Dimensionality Reduction in AI

Aleks: Sometimes we have a lot of information, and not all of this information is relevant. So then dimensionality reduction would be something that could be used. And anomaly detection, which I still believe is gonna get better.

And then we can detect all these red dots, even if they're very subtly abnormal. But that's an option as well: the model only knows normal, and flags everything that is abnormal. And the algorithms used here are K-means clustering, so making these clusters, and principal component analysis.

So here you figure out which groups these belong to: principal component analysis, and t-SNE. One time I [00:32:00] remembered what it stands for; let me know in the chat, or you can Google it and put it in the chat. But basically it is a way of visualizing stuff, like 3D relationships between data, in a 2D plane: a t-SNE plot. This is a t-SNE plot.

So clustering is gonna group similar data points into clusters (easy name, right?) based on their feature similarities. And this is fun, well, interesting: randomly initializing k centroids, or cluster centers. And the keyword here is randomly; it just randomly initiates a center of a cluster and makes a cluster. Which brings me to something about the unsupervised methods: it's not always easy to interpret why things were clustered together.

You have to investigate does it make sense or is it being clustered based [00:33:00] on irrelevant information or relevant? And which kind of information that like takes takes us again to the explainability aspect of ai. In the supervised learning there is a label, right? So you can compare it to the label, whereas in unsupervised it's.

just figuring out which data belongs together. And then you, as a researcher, need to check: does it make sense? In pathology, clustering has been applied to identify subtypes of disease, such as different cancer subtypes, or patient subgroups with distinct clinical features or gene expression profiles.

And the cool thing is that it can enable researchers and clinicians to uncover hidden patterns in different data sets. So if there is a lot of data, if it is multidimensional, uncovering these patterns with your eyes or with your own mind is difficult, so you leverage the algorithm. [00:34:00] And then dimensionality reduction, the name says it all: reduce the number of dimensions, the features, in

relation to the original number. So basically eliminate everything that does not give you additional information to classify this data into categories, while preserving the most important information. For example, this is used to simplify models, compress data, reduce noise, improve performance, prevent overfitting, and reduce computational costs.
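As a tiny illustration of "eliminate everything that does not give you additional information": one of the simplest dimensionality-reduction tricks is dropping near-constant feature columns, since a feature that is the same for every sample cannot help separate them. This is a much simpler cousin of PCA, shown here only to make the idea tangible; the data table is hypothetical.

```python
def low_variance_filter(rows, threshold=1e-6):
    """Drop feature columns whose variance is below `threshold` --
    a (nearly) constant column carries no information for telling
    samples apart, so removing it loses nothing important."""
    n = len(rows)
    keep = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        if var > threshold:
            keep.append(j)
    return [[r[j] for j in keep] for r in rows], keep

# toy "patient x feature" table: column 1 is constant, so it is dropped
data = [[1.0, 7.0, 0.2],
        [2.0, 7.0, 0.9],
        [3.0, 7.0, 0.4]]
reduced, kept = low_variance_filter(data)
print(kept)        # → [0, 2]
print(reduced[0])  # → [1.0, 0.2]
```

PCA goes further: instead of just dropping columns, it builds new combined axes that preserve the most variance, but the goal is the same, fewer dimensions, most of the information kept.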

And there is a trend or tendency that I'm seeing, and this is often mentioned as a drawback. Oh, and we have more international guests now, from Cameroon. Fantastic, Cameroon, welcome. So the trend that I'm seeing, I think it's just a general tech trend that's happening, and then it's entering the pathology AI [00:35:00] world as well.

It's actually everywhere. Like, life is a funnel, right? You have an algorithm that consumes a lot of computational resources, and then you figure out how to streamline it. Like with this dimensionality reduction: you have a bunch of data that you classify into something.

Oh, can it use less data to get the same result? Can it use less compute to get the same result? Can it not use the whole image at one time, but stream the image? Different ways to make these new computational discoveries lean, which is super important in medicine, but not only there: in general, to put them in devices that have less computational power than computers and servers and things like that.

Very important for medical devices to be lean and trim off the unnecessary computational [00:36:00] resources that are being used. Then we had our anomaly detection, or outlier detection, which is unsupervised algorithms used to identify outliers. And this can be applied to pinpoint unusual patient records with a particular health issue,

errors in data entry, detect rare genetic mutations, identify errors, or flag abnormal lab results. So you immediately, excuse me, I dropped my pen, immediately have something flagged for further workup. Super useful.
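A minimal sketch of the "model only knows normal, flags everything abnormal" idea: flag any value whose z-score (distance from the mean in standard deviations) exceeds a cutoff. The lab values below are invented; real anomaly detectors (isolation forests, autoencoders) are far more sophisticated, but the principle is the same.

```python
def flag_outliers(values, z_cutoff=2.0):
    """Flag values that sit far from the bulk of the data.
    No labels are used: 'normal' is defined by the data itself,
    and anything beyond z_cutoff standard deviations is abnormal."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if abs(v - mean) / std > z_cutoff]

# hypothetical potassium results (mmol/L): one likely data-entry error
potassium = [4.1, 4.3, 3.9, 4.0, 4.2, 4.1, 4.4, 40.0]
print(flag_outliers(potassium))  # → [40.0]
```

The flagged 40.0 would immediately go for further workup, exactly the use case above, whether it turns out to be a typo or a genuinely critical result.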

Annotation and Supervised Learning

Aleks: And let's look here: we have fully supervised learning, weakly supervised learning, and self-supervised learning.

So let's start with fully supervised. Who has done annotations in this group? Let me know in the chat. Have you annotated [00:37:00] pathology images, or any kind of images, and not just for hours at a time? Then you know why I'm making faces. And this is also the bottleneck of this approach. It's a blessing and a curse at the same time. Blessing: you have an expert, or maybe you have some other orthogonal label, which is fantastic, because then you don't have to have a human label stuff.

But then you have different costs associated, pros and cons of everything. But basically this is gonna be a slide plus an annotation, or region of interest labels. And then you subsample regions of interest using annotations, and then you have a predictor. And then you update this predictor using more regions, more annotations.

So you give the model some annotations, then it trains a little bit, you see where it is making mistakes, you [00:38:00] give more annotations, and then you can basically train a very good image analysis model. But if you wanna use less human resources and not annotate so much, you can just use a slide label.

And what's gonna be the slide label? The slide label is gonna be the diagnosis of the cancer that's on the slide. Or a case label, where we have a bunch of slides together: not all of them have cancer, but in general the case has cancer. And then here we're gonna use an attention mechanism.

And then the model is gonna predict on the slide, and it's gonna go back to these predictions and update them as it trains.
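To sketch the attention idea in the weakly supervised setting: the slide gets one label, individual tiles get none, and the attention weights decide how much each tile contributes to the slide-level call. The scores below are made up, and real attention-based multiple instance learning (e.g. attention pooling over learned tile embeddings) trains these weights end to end; this just shows the pooling step.

```python
import math

def attention_pool(tile_scores):
    """Combine per-tile evidence into one slide-level prediction.
    Attention weights = softmax over tile scores, so the most
    suspicious tiles dominate; the slide probability is the
    attention-weighted average of per-tile probabilities."""
    exps = [math.exp(s) for s in tile_scores]
    total = sum(exps)
    weights = [e / total for e in exps]                 # sum to 1
    tile_probs = [1 / (1 + math.exp(-s)) for s in tile_scores]
    slide_prob = sum(w * p for w, p in zip(weights, tile_probs))
    return slide_prob, weights

# hypothetical tiles: only tile 2 looks tumor-like, yet it carries the slide
scores = [-2.0, -1.5, 3.0, -2.5]
prob, weights = attention_pool(scores)
print(max(weights) == weights[2])  # → True (tile 2 gets the most attention)
print(prob > 0.5)                  # → True (slide flagged as positive)
```

This is why a single cancer-bearing tile among many benign ones can still drive the slide-level prediction, and why the model can "go back" to the tiles: the attention weights show which tiles it relied on.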

Self-Supervised Learning and Future Trends

Aleks: So this is the weakly supervised, and then we have self-supervised. Let me remove my, too [00:39:00] many, too colorful annotations on this image. A little bit too colorful. And if you have any questions, let me know in the chat.

Just wanna remove this.

I don't know if it helped, but what I wanna say is that self-supervised learning basically eliminates the person labeling. And Thomas, I know you feel my pain about annotations, you're smiling. Yeah, it's a blessing and a curse. It's a tedious task. And you do need high quality data.

So you do need to like, put some effort into it if you wanna have a good model at the end of the process, right? And the more data you feed the model, the better. So the more you annotate, the better. The more you annotate, the more tired you are. And that's, I guess that's life. That's okay. Anyway, so self supervised learning.

We have the slide tiles, and we have a perturbation: a perturbed slide and tile. So what [00:40:00] did we have? Okay, here it had yellow and here it has green. Then we have an encoder model, a training loss with no label, and downstream use may require further training. And here it's basically matching stuff and also reconstructing stuff. So it didn't match the circle and this other thing, but it matched the two stars, right?

And then it's reconstructing, and you update the encoder model using loss functions that do not need class labels. And this borders on magic for me, but it's not magic, it's just new computational ways of doing things, which I love. Because in the last, I think, two years, I stopped thinking about this, one second.
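The "matched the two stars" idea, a loss that needs no class labels, can be sketched very simply: perturb each tile, then score the model on whether each tile is closest to its own perturbed copy rather than to someone else's. Everything below is a toy (hand-written 2D "embeddings" instead of encoder outputs, noise instead of real augmentations), meant only to show that the supervision signal comes from the data itself.

```python
import random

def perturb(vec, rng, eps=0.05):
    """Stand-in for augmentation: jitter a tile's embedding slightly."""
    return [x + rng.uniform(-eps, eps) for x in vec]

def match_loss(tiles, perturbed):
    """Label-free objective: each tile should be nearer to its own
    perturbed copy than to any other tile's copy. The 'loss' here is
    simply the number of mismatched tiles."""
    mismatches = 0
    for i, t in enumerate(tiles):
        dists = [sum((a - b) ** 2 for a, b in zip(t, p)) for p in perturbed]
        if dists.index(min(dists)) != i:
            mismatches += 1
    return mismatches

rng = random.Random(42)
# toy tile "embeddings" (in a real model these come from the encoder)
tiles = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
perturbed = [perturb(t, rng) for t in tiles]
print(match_loss(tiles, perturbed))  # → 0 (every tile matched its own copy)
```

In real self-supervised training this matching signal is turned into a differentiable contrastive loss and used to update the encoder, no pathologist annotations required.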

Sorry, I didn't take water and I have to [00:41:00] mute myself to cough. So in the last two years, I would say maybe even more. Okay, so there was deep learning first. I learned about it around 2017, 2018, it existed before, but that's when I learned about it. And that bordered on magic for image analysis. But then the large language models came out in 2021, 2022.

And I'm like, okay, if there is something that's not working right now in terms of AI, I don't have to get frustrated. I just need to wait, because those people are constantly iterating and figuring out how to make these things better. And sure enough, there was something, I think it was molecular prediction from images, because you can have molecular information as a label and then have a predictor of molecular features in an image.

And I started seeing these publications. I'm like, okay, you just wait, keep following this field, and they will come up with something cool, a cool method. And then us as scientists, pathologists, healthcare professionals, whatever expertise we have, being aware of those methods, we know how to leverage them for our fields.

We're not done yet, let's finish up. And then I'm gonna go over the comments I have from you, Thomas.

So, digital pathology and non-generative

Practical Applications and Challenges in AI

Aleks: AI, machine learning, leverages whole slide images, right? We know it leverages whole slide images. As we said, they can be labeled by linking the slide with a diagnosis or with metadata. And metadata is basically data about data. So it can be patient demographics, clinical findings, tumor grade and stage, immunohistochemical results, response to therapy, and you name it.

You can figure out what other data can be [00:43:00] there. And the term annotation refers to the process of marking or labeling specific structures: cells, mitotic figures. Oh my goodness, can you imagine sitting for five hours and marking mitotic figures? People did that, and they did that well, and then came up with a good algorithm for mitotic figures.

And these annotations can be performed using digital tools. And when we have the annotation workshop, if you have some data that can be shown and you would like me to do an annotation session on your data, we can do it in QuPath, or if you're a vendor, we can do it in your software as well.

Then you can reach out to me, and we can organize that and do something that's gonna be a fun annotation session. I know I'm complaining about those annotations all the time, but on the other hand, you can put music on and hang out while annotating, and that's what we can do. So [00:44:00] yeah, digital tools, overlaying bounding boxes, depending what the task is, right?

Different colors on H&E images. It can be done in open source, QuPath is the main open-source tool, or in commercial tools, V7 is one. And then you can export those annotations in text format, for example JSON and XML. So here I need to highlight something, guys. And Thomas, you tell me if you guys are doing this, because what I have seen so far is that those annotation efforts basically get
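To make "annotations as reusable data" concrete, here is a minimal sketch of storing annotations in a plain, platform-neutral JSON format. The field names and file name are hypothetical, not any specific tool's schema (QuPath, for instance, uses GeoJSON), but the point stands: a plain-text export survives a software change, the proprietary project file does not.

```python
import json

# hypothetical annotation records: one bounding box and one polygon,
# each tied to a slide and a label, independent of any one platform
annotations = [
    {"slide": "case01_HE.svs", "label": "tumor",
     "type": "bbox", "coords": [[1200, 3400], [1900, 4100]]},
    {"slide": "case01_HE.svs", "label": "mitotic_figure",
     "type": "polygon", "coords": [[500, 610], [520, 605], [515, 640]]},
]

# write them out as plain text...
with open("annotations.json", "w") as f:
    json.dump(annotations, f, indent=2)

# ...and any other tool, years later, can read them back
with open("annotations.json") as f:
    restored = json.load(f)
print(restored[0]["label"])  # → tumor
```

Thousands of hours of specialist time serialized to a few kilobytes of text, which is exactly why it is worth exporting rather than leaving locked inside one model-training pipeline.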

embedded into the model. But annotation is data, and I don't think at the beginning of this process we were treating it as data that can be reused. The annotations would just go into the proprietary software, the annotations were done, you trained the model, the model was nice, good, perfect. Then you changed the software, and your annotations are gone.

You need to start the [00:45:00] effort from scratch. So my big ask to everybody who's annotating: if you can keep it as data that's reusable across different platforms, do it. Because it's thousands of hours of specialist time, and, I don't know, energy sunk into it, and then it's used once for one model, and then it's gone and you have to redo it.

And I've been doing that several times, redoing it. We should store them. Right now the standardization for slide formats is better than for annotations, but if there's an option to store them and reuse them, let's start doing this, 'cause it's very time consuming and subject to human interpretation, right?

So let's treat our annotations as the precious data they are.

And let's talk a little bit [00:46:00] about the future, the emerging paradigms in AI. Obviously foundation models, and, I can see my heart, right? You can see we're gonna be talking about it. We're just at session three, there is more. But foundation models are an emerging technology: multimodal tasks that can involve a variety of data types in medicine.

And we said that the main categories would be image, text, and then we have genomic data: image, text, tabular data. I think genomic would be in tabular data, right? And combining all of these from different disciplines. Oh, and we have a lot of questions. I'm gonna be addressing them, 'cause we're almost done.

But what I wanted to emphasize here, right, multimodality, is I wanna tell you some names. Sometimes people ask me, oh, what do you think about histomics? And I'm like, what is histomics? So here it defines that algorithms that merely extract select features from a histopathology slide were called histomics, whereas those able to analyze a hierarchy of novel [00:47:00] features in entire whole slide images are called pathomics, or tissue phenomics.

So I think tissue phenomics was invented by the company I was working for, by these definitions. I think we had a symposium on tissue phenomics. But yeah, basically a hierarchy of different features, next-generation histoinformatics. A lot of terms, but basically what they wanna say is:

different data types, including the image data, and combinations of them, right? Another piece of magic: vision transformers. Don't take "piece of magic" literally. But these models, these ways of analyzing images with a self-attention mechanism, can perform both patch-level and slide-level image classification, segmentation, and object detection.

So they still need to divide images into patches, but here is what they do [00:48:00] after: each patch is linearly embedded into a vector, and positional information gets added to retain the spatial relationships between patches. I think this is novel, it wasn't the case before. It definitely wasn't the case for multiple instance learning, where you would tile an image into patches and then let the model predict: is it a cancer image or not?

And it would have to sort them, and there can be just one patch that has cancer. So anyway, the transformers do get the patch location information. And the next revolution that I'm seeing happen in consumer AI, also in medicine, will be AI orchestration. And oh, you guys are having a cool discussion in the chat.
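The patch-plus-position step can be shown in miniature. Below, a toy 2D "image" is cut into non-overlapping patches, each patch is flattened into a vector, and its grid position is attached, a very reduced sketch of the vision-transformer recipe (real ViTs use a *learned* linear embedding of pixel patches and learned or sinusoidal position encodings, not raw values and raw coordinates).

```python
def patchify(image, patch):
    """Split a 2D image into non-overlapping patch x patch blocks,
    flatten each block into a vector, and record its (row, col)
    grid position so spatial relationships between patches survive."""
    tokens = []
    for r in range(0, len(image), patch):
        for c in range(0, len(image[0]), patch):
            flat = [image[r + i][c + j]
                    for i in range(patch) for j in range(patch)]
            tokens.append({"pos": (r // patch, c // patch), "vec": flat})
    return tokens

# a 4x4 "image" cut into 2x2 patches gives four position-tagged tokens
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
tokens = patchify(img, 2)
print(len(tokens))                         # → 4
print(tokens[1]["pos"], tokens[1]["vec"])  # → (0, 1) [3, 4, 7, 8]
```

It is exactly this attached position information that plain multiple instance learning lacked: a MIL model sees a bag of patches, a transformer sees patches *and* where they sit relative to each other.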

Super cool. So I'm gonna show it as well. Yeah, AI orchestration: a combination of text, image, and multimodal data. And then, in addition, as noted [00:49:00] earlier, the classic non-generative foundation models are of great value, and they are of great value in combinations.

And what combinations? We will figure out what combinations make sense: where does it make sense to combine classical with generative, non-generative with generative, classical with deep learning, and all these things. And the heart wants to show you that ultimately the most likely best solution for a responsible AI approach is one that simplifies the approaches as much as possible, but may also employ a combination of traditional and advanced approaches, with or without generative AI,

in addressing the task at hand. And so that's something to pay attention to, especially in marketing messages where they say: drop everything else, this is the new thing on the block, just use this. You know, there is a reason why handymen have a lot of tools in their toolboxes, right?

And that counts for AI and for science as well. I recently talked to a group of toxicologic pathologists dealing with molecular pathology. That was when I was a podcast host for another show, the TXP podcast from the Toxicologic Pathology journal. And one of my questions was: oh, is there the main method?

I knew there is no main method, no solution that solves everything, but I asked them: is there like the one thing that people now should be using? And they said no. They said basically the same: you have your problem, and you need to pick the right tool to solve your problem.

And let me check the chat. Here it is: what about the representation when one patient has multiple slides and tumor types, and the consequences of the predictions? It should detect multiple tumor types. It depends [00:51:00] how we use it. Let me rephrase the question, and tell me in the chat if that is what you mean. But basically, when we have the non-generative, non-foundation models, okay.

And she's talking about the case of heterogeneous tumors, which have multiple divergent differentiations, and she has experience with that. I don't have experience, and you guys keep talking in the chat, which is a fantastic discussion, thank you for this. But I'm gonna present you a concept that could be applied, or the pattern that I'm seeing.

So, when those algorithms are being developed, those non-generative AI, non-foundation models, or non-generative AI foundation models, they have very narrow applications. So it needs to be fit for purpose. And I think we had one publication where they developed an algorithm that will check the data for you and say: oh, is this data [00:52:00] good for whatever your algorithm is gonna detect?

So if we have something that just does, I don't know, I don't wanna give a bad example, so let's keep it very high level: if there is an algorithm that does breast cancer, and there are certain types of breast cancer that it was trained on, it's not gonna perform on a heterogeneous tumor that has multiple components, right?

There are two options. Assuming the option that I just mentioned from the literature is available, you screen the data with whatever method of data screening you have, to check if the tool that you have is gonna perform on this data. If it will not perform, don't use the tool.

Meaning the case either goes back to the pathologist, and let me know if this is helpful, or it goes to another algorithm, if we have one. But that would require upfront knowledge or [00:53:00] screening: okay, do we have the tools that can be used for those images? And if we don't, then those images or those cases are set aside as specialty cases, or something that must be reviewed by a pathologist.

And that's how I would approach it, because there is no point in using a tool that's not gonna be good for the job. Let me check if you have any other questions. And Thomas is saying that in pharma R&D it's quite difficult.

Weakly or self-supervised learning methods? Yes, I think so too. It was in 2019 when I saw the first anomaly detection, and I was like, oh, now that's gonna be the next thing. And then it didn't perform in pharma R&D, because tox path has a [00:54:00] different visual

presentation of changes. So if it's very obvious in the images... "See you at the USCAP", that's so cool. So if you guys are going, how do you pronounce it, by the way? Is it "US-cap", "oo-scap", or "U-S-C-A-P"? I think I'm mispronouncing or mixing the pronunciations constantly. If you can send me in the chat the correct pronunciation of USCAP, then I'll use that pronunciation from now on.

And I'm gonna be at booth 528.

Closing Remarks and Store Announcement

Aleks: Thank you so much for joining me today, Digital Pathology Trailblazers. And I wanna show you something before we go, in case you wanna check it out and do some shopping, because our digital pathology store is live. Ah, I wanted to show [00:55:00] myself here. This QR code, the DPP store, you can check it out.

So I'm just telling you, it's a beta version, it's work in progress. But I thought, if you are interested in checking out the courses, or checking out little things like these earrings, you can check out the store. You can also give me feedback on what you would like to see there, or what is totally off, because as I'm saying, this is work in progress. I would love for you to go there, check it out, and let me know what you think. And I'll talk to you in the next episode.