Digital Pathology Podcast

207: Deep Learning for Classifying Molecular Markers in Adult-Type Gliomas from Histopathology Images

Subscriber Episode | Aleksandra Zuraw, DVM, PhD | Episode 207


Paper Discussed in this Episode:

The Performance of Artificial Intelligence in Classifying Molecular Markers in Adult-Type Gliomas Using Histopathological Images: Systematic Review. Almaabreh O, Al-Dafi R, Tabassum A, Othman A, Abd-alrazaq A. J Med Internet Res 2026; 28: e78377.

Episode Summary: In this deep dive of the Digital Pathology Podcast, we explore the intersection of human limitations and computational power. Following the 2021 World Health Organization mandate requiring molecular data to diagnose adult-type gliomas, pathology has faced a massive bottleneck. Can artificial intelligence look at a standard pink-and-purple tissue slide and accurately predict hidden genetic mutations to serve as a diagnostic shortcut? We unpack a massive 2026 systematic review that evaluates the architectures, the "data diets," and the structural hurdles of using AI to "see the invisible".

In This Episode, We Cover:

The 2021 WHO Diagnostic Shakeup: How the World Health Organization shifted glioma diagnosis from pure visual morphology (judging a book by its cover) to requiring precise genetic spelling (finding a typo on page 42), making the diagnostic process incredibly slow and expensive.

The Targets - IDH vs. 1p/19q: Why AI models are highly proficient at spotting the metaphorical "canyon" carved by early metabolic IDH mutations, but struggle to find the subtle visual clues of 1p/19q chromosomal codeletions.

The AI Toolkit - CNNs, MIL, and Transformers:

◦ CNNs (like DenseNet121): The heavy lifters of medical imaging, analyzing local cell structures and edges by constantly reusing foundational visual features.

◦ Multiple Instance Learning (MIL): The brilliant algorithmic solution to the excruciating human labor of pixel-by-pixel tumor annotation, allowing the AI to mathematically figure out what cancer looks like using only slide-level labels.

◦ Hybrid Models: By combining the microscopic focus of CNNs with the zoomed-out, global contextual awareness of Transformers, these models achieved the highest average accuracy at 92.80%.

The "Data Diet" and Domain Shift: The critical danger of training AI exclusively on single, homogeneous databases like the TCGA. We discuss why an algorithm that performs perfectly in a pristine "test kitchen" completely panics and drops in performance when faced with the varied stains, slice thicknesses, and scans of real-world community hospitals.

Multimodal Medicine: The revelation that AI models perform vastly better when fed diverse data streams, such as combining slide images with MRI scans and clinical notes. Implementing this necessitates a monumental structural integration between historically siloed hospital departments like radiology and pathology.

Key Takeaway: AI is not replacing pathologists tomorrow; it is stepping into the co-pilot seat. While hybrid models show immense promise, their true standalone clinical adoption depends on breaking free from narrow training data, overcoming domain shift, and fundamentally restructuring our hospitals to feed these algorithms the multimodal context they need to thrive.

Get the "Digital Pathology 101" FREE E-book and join us!

You know, um, usually when we talk about a medical diagnosis, there's this underlying expectation of just like pure mechanical precision,


right? Yeah. Like engineering.


Exactly. Like if you fall and break your arm, the X-ray shows that jagged white line on the screen and the doctor just points at it and says, you know, there it is.


Yeah. It's a totally binary state. It's either broken or it's not


right. It's clean. And frankly, it's comforting. We like things to be visible. We want them neatly categorized so we can just, you know, fix them and move on.


But, uh, biology rarely cooperates with our need for neat categories.


No, it really doesn't. And when you step into the world of neurooncology, suddenly that clean binary X-ray machine is just useless. We're looking at a diagnostic landscape that is incredibly murky, highly subjective, and intensely high stakes.


Oh, absolutely.


So, welcome back, trailblazers. This is the Digital Pathology Podcast. We're here to take a massive stack of cutting edge research and distill it down to the essential insights that you actually need to know.


And today's topic is a big one,


huge. If you are sitting in a pathology lab right now or even if you're just, you know, fascinated by where human limits meet computational power, you know exactly how murky this gets. We are looking at a space where human eyes are literally being pushed past their biological limits.


Yeah, we are specifically unpacking the diagnosis of adult-type gliomas. And just for context, these are incredibly complex and tragically devastating primary brain tumors,


right?


They account for about uh 75% of all malignant central nervous system tumors.


Wow. And the survival statistics there are just grim.


They are. We're talking about a 5-year overall survival rate of less than 35%. Time is literally everything here.


So, to guide us through how medicine is trying to buy these patients more time, we're diving into a brand new really comprehensive systematic review. It was published in the Journal of Medical Internet Research just this year, 2026.


Right. By an incredible team.


Yeah. Almaabreh, Al-Dafi, Tabassum, Othman, and Abd-alrazaq.


And the core mission of their paper is to answer one massive, field-altering question


which is


can artificial intelligence look at a traditional histopathological slide, I mean just the pink and purple stained tissue under a regular microscope, and accurately predict the underlying molecular and genetic markers of these gliomas.


Okay, let's unpack this.


Yeah. And they didn't just, you know, write an opinion piece here. They started with a mountain of data.


A literal mountain.


Yeah. Like 24,453 initial reports and they filtered, they scrutinized, they assessed for bias until they distilled it down to just 22 key studies,


which is an insane amount of work,


right? But before we get into the AI models themselves, I think we have to talk about why this distillation is such a big deal. Like why is the neurooncology world so desperate for an AI assist right now?


Well, it's because of the clinical bottleneck. To understand why those 22 studies even matter, you have to understand what happened to pathology back in 2021.


Oh, the WHO update.


Exactly.


Historically, the way we diagnose these tumors was purely morphological. So, a highly trained pathologist would look at a tissue section under a microscope and classify the glioma based on what the cells literally look like,


like their shape, their density, stuff like that,


right? Their visible behavior. And that was the gold standard, just pure histology.


But human eyes have limitations obviously.


Exactly. You inevitably get interobserver variability. You can have two brilliant pathologists looking at the exact same slide, especially if it's a borderline case with mixed cellular features,


right? And they come to completely different conclusions


or at least slightly different diagnostic conclusions. Yeah.


But then the World Health Organization stepped in and fundamentally changed the rules of the game.


They totally flipped the script. The 2021 WHO classification shift integrated molecular features directly into the diagnostic criteria.


So you can't just look at it anymore.


No, it's no longer enough for a pathologist to just say what the tumor looks like. You have to know its exact genetic makeup to officially categorize it.


Okay.


Specifically, a diagnosis now absolutely mandates knowing the status of the IDH1 and IDH2 mutations as well as something called the 1p/19q chromosomal codeletion.


Wait. Okay. So, it's like we used to judge a book by its cover, right? We'd look at the binding, we'd look at the font on the dust jacket, and a pathologist would say, "Okay, based on the cover, this is a tragedy."


That's a great way to put it. Yeah.


But now the WHO is telling us, no, the cover isn't enough anymore. You actually have to read the exact genetic spelling of the text inside. You have to know if there is a specific typo on like page 42.


What's fascinating here is that that is a perfect analogy to conceptualize it.


But the problem is getting that genetic spelling takes so much time.


So much time.


I mean, human eyes get tired, reading the font is subjective, and running the actual molecular sequencing in a lab to find that typo requires specialized, really expensive equipment


and it takes days, sometimes even weeks.


Which brings us to the promise of this systematic review. Like, what if you didn't have to wait weeks for a sequencer?


Exactly. What if an AI could just process the digital image of that tissue cover and notice such incredibly subtle microscopic patterns in the binding and the font


that it could accurately predict the typo on page 42


just by looking at the picture.


Wow.


You are essentially creating a diagnostic shortcut.


Rapid, highly accurate molecular classification without waiting for the lab to do the sequencing.


Okay, so the AI is essentially attempting to see the invisible. Let's look at the actual tools these 22 studies built to pull this off.


Yeah, let's get into the tech


because we aren't just talking about a single algorithm here. We're looking at an entire toolkit of neural networks. And the pooled average performance across these studies is pretty striking.


It really is.


The AI achieved an average accuracy of uh 85.46%, a sensitivity of 84.55%, and a specificity of over 86%.
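
For anyone who wants to see what those pooled numbers actually measure, here is a minimal sketch of how accuracy, sensitivity, and specificity fall out of a binary confusion matrix. The counts below are made-up placeholders for illustration, not data from the review.

```python
# Toy confusion-matrix counts (placeholders, not the review's data):
tp, fn = 85, 15   # mutation-positive slides correctly / incorrectly flagged
tn, fp = 86, 14   # wild-type slides correctly / incorrectly cleared

accuracy    = (tp + tn) / (tp + tn + fp + fn)   # overall fraction of correct calls
sensitivity = tp / (tp + fn)                    # mutated cases actually caught
specificity = tn / (tn + fp)                    # wild-type cases correctly ruled out

print(f"accuracy={accuracy:.2%}, sensitivity={sensitivity:.2%}, specificity={specificity:.2%}")
```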


Which is huge. Those numbers confirm that the AI is absolutely seeing a real biological signal in the morphology. But um the AI landscape here is highly varied.


How so?


Well, when we look at the architectures, like the brains of these models, over half the studies, about 54.5%, relied on convolutional neural networks, or CNNs,


right, CNNs. They're essentially the heavy lifters of medical image analysis. Right.


Absolutely. They act like a scanning magnifying glass moving across the image looking for specific local spatial features


like the edges of a cell wall, the texture of the nucleus,


the density of the tissue. Exactly. And the review noted that among the pure CNN models, one called DenseNet121 was the absolute standout.


Oh yeah. Hitting nearly 91% accuracy.


Yes. And the reason DenseNet121 performs so well is because of its underlying mechanism, which is dense connectivity.


Okay, break that down for us.


So in a traditional neural network, information passes sequentially from layer one to layer two to layer three,


right?


But by the time the image gets to say the 50th layer, the network might actually start to forget the high resolution details of the cell edges it saw back in layer one.


Oh, I see.


It's called the vanishing gradient problem.


So it essentially loses the plot the deeper it thinks about the image.


Exactly. But in a DenseNet architecture, every single layer is directly connected to every subsequent layer.


Wait, every single one?


Yes. Layer 1 passes its feature maps not just to layer 2, but directly to layer 3, layer 4 all the way to the end.


Oh, wow.


So, the network is constantly reusing the foundational visual features, which makes it highly efficient at extracting those local cellular details without requiring, you know, massive computational power.
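
To make that dense-connectivity idea concrete, here is a toy PyTorch dense block. This is a minimal sketch of the mechanism only, not the actual DenseNet121 used in the reviewed studies, and the name TinyDenseBlock is ours: every layer receives the concatenated feature maps of all earlier layers, so the early edge and texture features keep getting reused.

```python
import torch
import torch.nn as nn

class TinyDenseBlock(nn.Module):
    """Toy dense block: each layer sees the concatenation of all earlier
    feature maps, so foundational visual features are reused rather than
    fading away in deeper layers."""
    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
            ))
            channels += growth_rate  # the next layer sees everything produced so far

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # reuse all prior maps
            features.append(out)
        return torch.cat(features, dim=1)

block = TinyDenseBlock(in_channels=3, growth_rate=8, num_layers=4)
print(block(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 35, 64, 64])
```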


Okay, that makes a lot of sense. But CNNs weren't the only tool evaluated here.


No, they weren't.


Because half of the studies utilized something called multiple instance learning or MIL. And honestly, when I was reading through this, I found the MIL approach just brilliant.


Oh, it really is.


Yeah.


Because it solves a massive human bottleneck.


The annotation problem.


Exactly. The annotation problem.


Right. Because in traditional AI training, if you want the computer to recognize a tumor, a human pathologist has to sit at a monitor and painstakingly draw circles around every single pixel of cancer on a slide.


Yes. It takes hours per slide. It is an excruciatingly labor-intensive process


and MIL completely circumvents that need for pixel level handholding.


It does. It handles what we call weakly labeled data. So instead of giving the AI a perfectly colored-in map, the model just gets a single slide-level label.


So the human just says somewhere on this whole slide there's an IDH mutation.


Right. Just go find it.


That's wild.


Precisely. The MIL algorithm then breaks that massive gigapixel slide down into thousands of tiny patches.


Okay,


it treats the whole slide as a bag and the tiny patches as instances inside the bag. And through repeated exposure, the model mathematically figures out on its own which specific patches correlate with the overall molecular diagnosis.


Wow. So, it essentially teaches itself what the cancer looks like without a human ever pointing to it.


Yeah, it's incredible.
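
Here is a minimal attention-based MIL head as a sketch of that bag-of-patches idea. Attention pooling is just one common MIL variant, not necessarily the formulation used in any specific reviewed study, and names like AttentionMIL are ours for illustration.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Toy MIL head: a 'bag' of patch embeddings from one slide goes in,
    a single slide-level prediction comes out. Training only needs the
    slide-level label; the attention weights reveal which patches the
    model decided were informative."""
    def __init__(self, feat_dim: int = 512, hidden: int = 128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.classifier = nn.Linear(feat_dim, 1)  # e.g. mutation present vs absent

    def forward(self, patch_feats: torch.Tensor):
        # patch_feats: (num_patches, feat_dim) -- one bag = one slide
        weights = torch.softmax(self.attention(patch_feats), dim=0)  # (N, 1)
        slide_feat = (weights * patch_feats).sum(dim=0)              # weighted pooling
        return self.classifier(slide_feat), weights

bag = torch.randn(1000, 512)        # 1,000 patch embeddings from one slide
logit, attn = AttentionMIL()(bag)
print(logit.shape, attn.shape)      # torch.Size([1]) torch.Size([1000, 1])
```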


But even with CNNs and MIL, the undisputed champions of the systematic review were the hybrid models.


Oh yeah, the hybrids.


These are architectures that combine the local microscopic feature extraction of a CNN with the global contextual awareness of transformers. And these hybrids hit the highest average accuracy at 92.80%, with sensitivity approaching 90%.


Yeah. And this is really where we see the cutting edge of computer vision. Transformers rely on something called self-attention mechanisms,


right?


So a CNN is great at looking at a single cell and understanding its immediate neighbors. But a transformer can look at a cell in the top left corner of a tissue slide and mathematically calculate its relationship to a completely different cluster of cells in the bottom right corner.


So, it's basically zooming out to see the forest while the CNN is busy analyzing the veins on the individual leaves.


Exactly. It understands long range dependencies across the entire tissue architecture.


That's so cool.


When you fuse those two capabilities together, you just get a much more comprehensive holistic understanding of the complex tumor structures.
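
A stripped-down sketch of that fusion, assuming a standard PyTorch and torchvision setup: a CNN backbone encodes each patch locally, then a Transformer encoder lets all the patches attend to one another for slide-wide context. This is illustrative only, not one of the hybrid architectures benchmarked in the review.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class HybridCNNTransformer(nn.Module):
    """Toy hybrid: local CNN features per patch, then global self-attention
    across all patches of the slide, then a slide-level classifier."""
    def __init__(self, embed_dim: int = 512, num_classes: int = 2):
        super().__init__()
        resnet = models.resnet18()  # randomly initialized small backbone
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # -> (B, 512, 1, 1)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patches: torch.Tensor):
        # patches: (num_patches, 3, H, W) from one slide
        feats = self.backbone(patches).flatten(1)    # local CNN features per patch
        ctx = self.transformer(feats.unsqueeze(0))   # global self-attention over patches
        return self.head(ctx.mean(dim=1))            # slide-level logits

model = HybridCNNTransformer()
print(model(torch.randn(16, 3, 224, 224)).shape)  # torch.Size([1, 2])
```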


Okay. Wait, hold on though.


Yeah. I read these numbers, almost 93% accuracy, and it sounds amazing, but let's be real for a second.


Okay,


these hybrid models are massively computationally heavy. I mean, they require serious GPU processing power to run those self-attention calculations.


They absolutely do.


So, my question is, does the average pathology lab, say, at a smaller regional hospital, actually have the server racks to run these models, or is this just a fancy academic exercise that only lives in a university supercomputer somewhere.


That is the exact grounded reality check we need to have here


and honestly it is something the authors of the review are very mindful of


because it's a real barrier.


You're absolutely right that these hybrid models demand significant expensive computational resources.


And while cloud computing is expanding and you know offloading some of that on-site burden,


these models still have to mathematically justify their operational costs with unassailable clinical accuracy.


And are they doing that?


Not entirely. And this is crucial for the trailblazers listening. The review clearly notes that even these highly advanced hybrid models are generally hovering below that 90-plus-percent accuracy threshold.


And 90% is the magic number, right?


Yeah. In clinical practice, 90 to 95% is widely considered the strict minimum threshold for standalone diagnostic adoption in high stakes environments.


Makes sense.


So, while these algorithms are incredibly capable, they are not ready to fly the plane solo. They are still very much in the co-pilot seat. Which is fair, but if we want them to fly solo eventually, we have to look at how we're actually training them.


Yes.


Because a brilliant hybrid architecture is completely useless if you feed it garbage information.


Let's look at what these models were actually studying. The data diet.


Oh, the data diet is everything. The quality and diversity of the training data fundamentally dictate the success or failure of the model.


Right. The architecture is just the engine. The data is the fuel.


Exactly. And this review highlighted a massive difference between unimodal and multimodal data inputs.


Okay. So unimodal meaning we just feed the AI the pink and purple histopathology image.


Correct.


Multimodal meaning we give it the slide image, but we also hand it the patient's MRI scans, their age, their clinical demographics,


all of it.


And the results weren't even close. The models that used multimodal data significantly outperformed the purely image-based models. I mean sensitivity jumped from 84.31% all the way up to 90.15%.


Which if you think about it, perfectly reflects how medicine actually works. Correct. Context matters.


Oh, totally.


A human pathologist does not look at a slide in a sensory deprivation tank, right?


They look at the clinical notes. They consider the tumor's location from the radiology report. They factor in the patient's age. When you allow the AI to integrate those diverse data streams, it just builds a much richer, more robust internal representation of the disease.
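
One simple way to picture that integration is late fusion: encode each data stream separately and concatenate the representations before the final classifier. This toy sketch and its dimensions are ours for illustration; the reviewed studies used a variety of fusion strategies.

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Toy late-fusion model: separate encoders for a slide-image embedding,
    an MRI embedding, and tabular clinical data (e.g. age), concatenated
    before a shared classification head."""
    def __init__(self, img_dim=512, mri_dim=256, clin_dim=8, num_classes=2):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, 128)
        self.mri_enc = nn.Linear(mri_dim, 128)
        self.clin_enc = nn.Linear(clin_dim, 32)
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(128 + 128 + 32, num_classes))

    def forward(self, img_feat, mri_feat, clin_feat):
        fused = torch.cat(
            [self.img_enc(img_feat), self.mri_enc(mri_feat), self.clin_enc(clin_feat)],
            dim=-1,
        )
        return self.head(fused)

model = LateFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 256), torch.randn(4, 8))
print(logits.shape)  # torch.Size([4, 2])
```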


Okay. But speaking of the data diet, the authors of this review raised a massive flashing red flag regarding where all this data is actually coming from.


Yes, they did.


When they dug into the methodology of these 22 studies, they found a startling lack of diversity. 15 out of the 22 studies used the exact same open-source data set to train their models.


Yeah. The Cancer Genome Atlas, or TCGA,


right? And this is a huge problem.


If we connect this to the bigger picture, this is perhaps the most significant vulnerability identified in the entire review


because everyone is drinking from the exact same well. Exactly. The TCGA is a phenomenal foundational resource for the scientific community, don't get me wrong,


but relying on it so heavily across the board introduces a critical flaw regarding generalizability.


It's like um it's like training a master chef exclusively on one specific brand of premium ingredients inside one highly controlled climate regulated test kitchen.


I love that analogy,


right? Like they might cook a perfect Michelin star meal in that specific kitchen. But what happens when you drop that same chef into a totally different chaotic restaurant in a different country with local ingredients they've literally never seen before? The meal is going to suffer.


Yes, exactly. And in the world of machine learning and digital pathology, that different ingredient scenario is known as domain shift.


Domain shift. Break that down for us. What does domain shift actually look like under a microscope?


Well, think about the chemical reality of preparing a slide. A tissue sample is stained with hematoxylin and eosin,


right? Eosin is pink, hematoxylin is a bluish purple. But, um, a lab in Boston might leave the tissue in the stain for 30 seconds longer than a lab in Tokyo.


Okay.


Or they might slice the tissue a fraction of a micron thicker.


Yeah.


Or they might digitize the slide using a totally different brand of scanner.


So to a human pathologist, it just looks like a slightly darker purple cell. No big deal.


Exactly. A human brain easily adjusts to that,


right?


But to a computer vision algorithm, the exact RGB pixel values are now completely different. The underlying math has changed.


Oh wow.


So when you take an AI model trained heavily on the specific quirks and color profiles of the TCGA data set and you apply it to a sample from a community hospital with different prep protocols, the model panics.


It just doesn't know what it's looking at,


right? Its performance often drops significantly


because it didn't actually learn the universal truth of what cancer looks like. It just memorized the specific interior design of the TCGA test kitchen.


Precisely. And the authors of the review are adamant about this. If we want these AI models to work in the messy, varied reality of global clinics, future research desperately needs to prioritize large, highly diverse, multi-institutional data sets.


We have to train the AI on the variation.
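
One common way to expose a model to variation during training is to simulate it. The sketch below uses basic torchvision color jitter and flips as a crude stand-in for stain, thickness, and scanner differences; dedicated H&E stain-normalization or stain-augmentation methods go further, and none of this is prescribed by the review itself.

```python
import torch
from torchvision import transforms

# Crude lab-to-lab variation simulated at training time:
# hue/saturation/brightness jitter mimics staining differences,
# flips add orientation variety. A stand-in only, not full stain augmentation.
train_augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.15, contrast=0.15,
                           saturation=0.15, hue=0.05),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
])

patch = torch.rand(3, 256, 256)      # a fake H&E patch (RGB values in [0, 1])
augmented = train_augment(patch)
print(augmented.shape)               # torch.Size([3, 256, 256])
```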


We have to,


which brings us to the final and frankly most fascinating clinical insight from Almaabreh and the team.


Yeah, let's get into the targets.


We've talked about the algorithms. We've talked about the data they eat. Now, how do they actually perform on the specific molecular targets the WHO demands: the IDH mutations versus the 1p/19q codeletions.


Well, they do not perform equally. Not at all. The AI models demonstrated a very clear proficiency for identifying IDH mutations, achieving an accuracy of 86.13%.


Pretty good.


But they struggled significantly more with the 1p/19q codeletions, dropping down to 81.63% accuracy.


And before we get into the biology of why that is, there was a really counterintuitive quirk in the data regarding the IDH mutation. Oh yeah, this was so interesting.


When the AI was trying to detect IDH, the models designed for multiclass classification actually outperformed the models doing simple binary classification.


Yes, that was a fascinating finding.


Just to clarify for everyone listening, binary classification is asking the AI a yes or no question, right? Is the mutation present or absent?


Correct?


Whereas multiclass classification is asking the AI to sort the tumors into three or four distinct subcategories all at once,


right?


You would think the simple yes no question would be easier for the computer.


You'd think so. Yeah.


But the multiclass models hit almost 92% accuracy compared to just 84% for the binary models. Why does a harder task make the AI smarter?


It comes down to how neural networks build their internal logic. If you just ask a model, is this a dog or not? It might just learn to look for fur,


right?


It finds a lazy shortcut. But if you force the model to answer, is this a husky, a golden retriever, or a cat? looking for fur isn't enough anymore.


Uh,


it is forced to look at snout shape, ear geometry, tail length. By forcing a neural network to distinguish between more complex, nuanced categories, you force it to learn much more robust, deeply discriminative features of the tissue.


It prevents the AI from taking the lazy way out. I love that.


Exactly.
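
In code, the difference between the two framings is mostly the size of the output head and the choice of loss: a single logit with a binary loss versus several logits with a cross-entropy loss over subtypes. The class names below are illustrative placeholders based on the 2021 WHO adult-type glioma families, not the exact label sets used by the reviewed studies, and the features and labels are random.

```python
import torch
import torch.nn as nn

feat_dim = 512
features = torch.randn(8, feat_dim)              # slide-level features for 8 cases

# Binary framing: "is the IDH mutation present or absent?"
binary_head = nn.Linear(feat_dim, 1)
binary_loss = nn.BCEWithLogitsLoss()(binary_head(features).squeeze(1),
                                     torch.randint(0, 2, (8,)).float())

# Multiclass framing: sort the same cases into several subtypes at once.
classes = ["astrocytoma_IDH_mutant",
           "oligodendroglioma_IDH_mutant_1p19q_codeleted",
           "glioblastoma_IDH_wildtype"]
multi_head = nn.Linear(feat_dim, len(classes))
multi_loss = nn.CrossEntropyLoss()(multi_head(features),
                                   torch.randint(0, len(classes), (8,)))

print(binary_loss.item(), multi_loss.item())
```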


But I want to go back to the performance gap itself. Why does the AI play favorites? Why is an IDH mutation so much easier for a computer vision algorithm to see than a 1p/19q codeletion? What is actually happening to the cells?


It's all about the underlying biology and the timeline of how the tumor develops.


Okay.


An IDH mutation is generally considered a very early event in gliomagenesis. It fundamentally alters the cellular metabolism from the very beginning


day one


pretty much. Specifically, it causes the overproduction of an oncometabolite called D-2-hydroxyglutarate.


Okay. So, it changes the internal chemistry of the cell


drastically. And because this metabolic shift happens so early and is so profound, it influences the way the cells grow, divide, and organize over a very long period of time. It literally leaves a distinct morphological footprint on the tissue architecture.


It's like a river carving a canyon over time. Like the earlier the river starts flowing and the stronger the current is, the more obvious that canyon is going to be to anyone flying over it in an airplane.


That is exactly the dynamic. The IDH mutation carves a deep visual canyon into the tissue. The 1p/19q codeletion, on the other hand, is a chromosomal alteration.


So it's different


Very different. Pieces of chromosomes 1 and 19 are missing, and while it is critically important for determining how the patient will respond to chemotherapy, it simply does not produce as pronounced a visual change in the physical shape of the cells.


So the AI is scanning the tissue looking for clues but it's like playing where's Waldo except someone forgot to actually print Waldo on the page.


Exactly.


The visual evidence just isn't there in the standard pink and purple stain. It's a much harder puzzle because the morphological clues are either incredibly subtle or completely non-existent in a standard H&E.


So what's the solution?


Well, the authors caution that while IDH detection is a fantastic candidate for immediate clinical tool development, the 1p/19q models are going to need serious refinement,


right?


They will likely need to lean heavily on those multimodal data streams we talked about bringing in the MRI scans to compensate for the lack of visual evidence on the slide itself.


Got it. So, we've covered a lot of ground today.


We really have.


So, what does this all mean? What are the core takeaways for you listening right now? Let's synthesize this. First, AI is not replacing pathologists tomorrow.


No, definitely not.


That 90% clinical threshold is a strict bouncer at the door, and most of these models are still waiting in line outside. Right now, AI is shaping up to be a highly capable co-pilot.


Co-pilot. Exactly.


It is particularly adept at detecting IDH mutations, where it can serve as a rapid triage tool or a tireless second set of eyes.


But its potential is entirely capped by the data sets we use to train it. If the field doesn't break its reliance on single homogeneous data sets like the TCGA, we will continue to build brittle tools that completely fall apart when they encounter the domain shifts of the real world.


Right. This systematic review by Almaabreh and colleagues is essentially a treasure map for the next decade of digital pathology.


It really is.


It shows us the immense computational promise of hybrid transformer models, but it also clearly marks the massive potholes, the data set overlaps, the domain shifts, and the strict biological realities of what an algorithm can and cannot see.


Which, you know, this raises an important question, a critical structural challenge for the future of medicine.


Oh,


we discussed how these AI models perform best when they are multimodal. When they integrate the histology slide with the MRI scan and the clinical demographics,


they're combining the data.


But historically, radiology, pathology, and oncology have operated in distinct silos within a hospital.


That's true.


They have their own databases, their own proprietary software, their own distinct workflows.


Yeah. The radiologist looks at the screen in a dark room, the pathologist looks at the slide in the lab, and the oncologist talks to the patient in the clinic.


Exactly. But if the most powerful life-saving diagnostic tool of the next decade explicitly requires all three of those data streams to be synthesized simultaneously,


that's not going to work.


This will force our hospital departments to fundamentally restructure. How do you seamlessly and securely merge a radiology PACS system with a digital pathology server to feed this multimodal AI?


Wow.


It is no longer just a software engineering problem. It is a massive structural medicine problem.


It really is. It forces the whole system to integrate. But you know, thinking about all this, especially the part about how the IDH mutation alters the cells early on, leaving that distinct footprint like a river carving a canyon, it makes you wonder about where this is all heading.


What do you mean?


If an AI eventually gets so sophisticated that it can reverse engineer a tumor's entire metabolic history just from the subtle shapes of its cells, will we one day use these computer vision models not just to diagnose the cancer, but to predict exactly how many months or years ago it started growing?


Oh wow.


Like a digital time machine for oncology. Think about that for a second.


That is an incredible thought.


We highly encourage you to read the full paper in the Journal of Medical Internet Research. It is an absolute masterclass in evaluating the real-world readiness of diagnostic AI. Keep questioning the algorithms, keep pushing for diverse data, and keep learning. We'll see you next time on the Digital Pathology Podcast.