Digital Pathology Podcast
196: DigiPath Digest #39 - If AI Sees More Than We Do, What Makes It Clinically Trustworthy?
If AI can detect patterns we cannot see, how do we know when its answers are clinically trustworthy?
In this episode of DigiPath Digest #39, I explore a big-picture question in digital pathology and medical AI. Many models now match or even exceed human performance in specific diagnostic tasks. But most of that evidence comes from controlled or retrospective datasets. So what happens when we try to bring these tools into real clinical workflows?
I review four recent papers that help frame this challenge and point toward the next steps for trustworthy AI in healthcare.
You will hear about the role of prospective validation, real-world effectiveness, transparent reporting standards, and multimodal data integration as recurring themes across these studies.
Key Highlights
00:00 – Introduction
What do we do when AI detects signals that humans cannot see? The core challenge is verifying those outputs before trusting them in clinical decision making.
03:32 – AI Across the Healthcare Continuum
A narrative review shows AI achieving clinician-level performance in well-defined imaging tasks, including digital pathology. But most evidence comes from retrospective or controlled environments, and prospective validation remains limited.
08:34 – Multi-Omics and AI in Gastric Biopsy Diagnostics
Morphology alone cannot fully capture molecular heterogeneity or predict disease progression. Integrating genomics, proteomics, metabolomics, and other omics with AI is shifting gastric pathology toward data-driven precision gastroenterology.
13:38 – Hyperspectral Imaging for Real-Time Surgical Guidance
Spectral imaging can analyze tissue composition during surgery without staining, freezing, or contact with the tissue. Studies show promising sensitivity for detecting malignancy and supporting intraoperative decision making.
17:20 – REFINE Reporting Guideline for Foundation Models and LLMs
An international consensus guideline introduces a 44-item reporting checklist to standardize how AI studies are described. The goal is transparent, reproducible, and comparable research in medical AI.
22:35 – Big Takeaway
AI should be viewed as clinical decision support, not a replacement for clinicians. Real-world validation, ethical governance, and reproducible research standards will determine how these tools enter pathology workflows.
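The diagnostic performance figures quoted in the hyperspectral imaging segment (specificity 30 to 99%, sensitivity 75 to 100%) are easier to interpret with the definitions in hand. Here is a minimal sketch with invented counts; these numbers are not taken from any of the reviewed studies:

```python
# How sensitivity and specificity, the metrics quoted for the spectral
# imaging studies, are defined. All counts below are invented for
# illustration only.

def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of malignant samples the method flags."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of benign samples correctly cleared."""
    return tn / (tn + fp)

# Hypothetical intraoperative evaluation: 80 malignant, 120 benign samples.
tp, fn = 76, 4    # 76 malignant samples detected, 4 missed
tn, fp = 90, 30   # 90 benign samples cleared, 30 false alarms

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 0.95
print(f"specificity = {specificity(tn, fp):.2f}")  # 0.75
```

A wide specificity range like 30 to 99% across studies usually reflects differences in case mix, decision thresholds, and reference standards, not just the imaging hardware itself.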
References (Articles Discussed)
Artificial Intelligence in Healthcare: From Diagnosis to Rehabilitation
https://pubmed.ncbi.nlm.nih.gov/41755929/
Transforming Gastric Biopsy Diagnostics: Integrating Omics Technologies and Artificial Intelligence
https://pubmed.ncbi.nlm.nih.gov/41751306/
From Image-Guided Surgery to Computer-Assisted Real-Time Diagnosis with Hyperspectral and Multispectral Imaging
https://pubmed.ncbi.nlm.nih.gov/41750768/
REFINE Reporting Guideline for Foundation and Large Language Models in Medical Research
https://pubmed.ncbi.nlm.nih.gov/41762555/
If you enjoy staying current with digital pathology and AI research, this episode will help you connect the dots between promising algorithms and practical clinical adoption.
00:00:00
Aleks: Welcome, my trailblazers, to the DigiPath Digest number 39. Today we're going to be talking about kind of a high-level picture: what do we do when AI sees stuff that we don't see, and how can we verify it? And "high-level" maybe is not the right description here, because we have a lot of reviews today, and many of them are literature reviews, so these are meta-analyses, or they call it something else. So yeah, there's going to be a theme. I'm going to talk
00:00:40
about this theme after we go through the papers, through the abstracts; then you're going to know what the overarching thing is here: when AI is better than people. Let's do a few updates. So before this I was listening to this new subscription-based podcast option that I have for you, and it is so good. It's AI-assisted audio summaries of papers, and they are not in the main feed; they're subscription-based, because the amount of them is going to grow exponentially, because every week we're doing the DigiPath
00:01:16
Digest, and in every DigiPath Digest we review like three to five abstracts. So what I'm aiming at is to take these abstracts, then take the papers, and make an audio summary. It doesn't replace the reading, but it's so good. I listen to every single one of them before they go out. And why do I think this is the next best thing since sliced bread, as they say? A friend of mine was analyzing, I don't know, a couple of people who advanced very much professionally, and she was asking them what their
00:01:51
secret was, and what they said was that it's reading papers every day. When I heard that I was like, there is no way I have time to read papers every day. But then we started doing the DigiPath Digest, so that was already a step closer. And then I discovered these audio summaries. The audio summaries are the closest that I have found so far to reading papers every day. And now, out of the eight that we have covered in the last two weeks, six are already out there, two I'm still listening to, and there are
00:02:21
going to be four more after today. That's update number one. And if you're interested in that, let me give you... and then a big update: USCAP in San Antonio, we are going to be working together. Digital Pathology Place is going to be working together with Hamamatsu. The booth number to remember is 312; 312 is the booth where you can find me. And we're going to do a special thing for you. I'm going to bring books, and there's going to be a book giveaway. Digital Pathology 101 is the book that
00:02:59
we're going to be giving away at booth 312 in San Antonio in like three weeks, something like that. And if you already have the book, you can bring it, and I'm going to be honored to sign it for you. So if you don't have it yet, get the PDF; there is an audio version too. And the physical books are going to be at booth 312 at USCAP. We're going to be together with Hamamatsu, trying to promote digital pathology. Now let's focus on the science and on the reviews. The first one is
00:03:32
"Artificial Intelligence in Healthcare: From Diagnosis to Rehabilitation." I love this one, because I'm planning another book that's going to be basically informing patients how AI is used in healthcare, and this paper is definitely going to be part of my references. I also love it because this publication is from Poland. I've seen so many AI, digital pathology, and digital health publications from Poland recently. I'm super proud of my people from my country. So here it's
00:04:09
all places in Poland. Not so close to where I live, but you know what, it's Poland. So, AI is increasingly integrated into modern healthcare, and the applications are medical diagnostics, laboratory medicine, rehabilitation, and patient-centered digital health solutions. So not a meta-analysis, but what they call a narrative review, to provide a critically curated overview of current clinical applications of AI across the healthcare continuum. I like that very
00:04:44
much, because digital pathology is obviously a part of that, from diagnosis to rehabilitation. They did a targeted narrative literature search of all these different databases (PubMed, MEDLINE, Scopus, Web of Science, Embase) and they found influential studies published in the last decade, so in the last 10 years. They synthesized the evidence across clinical domains: diagnostic imaging, laboratory diagnostics, rehabilitation technologies, and conversational agents. All the previous ones are okay, the normal care
00:05:24
continuum, the healthcare continuum, but conversational agents, hey, this is new, right? This is ChatGPT, this is Claude, and all these chatbots, chat agents. The reviewed literature indicated that AI systems can achieve diagnostic performance comparable to healthcare professionals in selected well-defined tasks, particularly in imaging-based specialties. So we're talking radiology, mammography, ophthalmology, dermatology, and our beloved digital pathology as well. And here's another important thing:
00:06:04
most of this evidence comes from retrospective or controlled study conditions. This has been mentioned as a challenge several times: we are operating on retrospective data in controlled study conditions. In laboratory medicine they focused on workflow optimization, result interpretation, and clinical decision support. While in rehab, this is also next level, a little bit futuristic: we have AI-enabled systems, including robotics, motion analysis platforms, and large language models, that facilitate personalized
00:06:41
therapy and functional recovery, but there is heterogeneous evidence and limited prospective validation. So again, here: limited prospective validation. Whereas chatbots, they demonstrated potential to support patient education and mental health interventions. This one is also a new thing, where people get access to mental health guidance through chatbots and they can talk to it like to a therapist, but also communication workflows, specifically when adjacent to clinician-led care. How cool
00:07:18
is that, if we have this as our support? And despite these advances, challenges related to generalizability, algorithmic bias, ethical implementation, and regulatory oversight persist. So we have these little injections of AI in the continuum of care, but we still have the challenges. And the conclusion, or the last sentence here that I have in orange, is: AI should be regarded as supportive clinical decision support technology rather than a replacement for
00:07:52
healthcare professionals, and we need to prioritize prospective validation, real-world effectiveness, and responsible integration. So here we can already start seeing this pattern that answers the livestream question, which is: okay, AI can see more than we see, maybe with multimodal data sets it's a lot more accurate, a lot more specific, but how do we decide that it's valid clinically? Well, we need prospective validation and we need real-world effectiveness. So you know, the tool being super cool
00:08:34
and having high performance metrics, if it's not going to have real-world effectiveness, including incorporation into workflows, being easy to use, explainability, transparency, then it's not going to be useful, right? Let us look at the next one. The next one is "Transforming Gastric Biopsy Diagnostics: Integrating Omics Technologies and Artificial Intelligence." This is also a review, and the background here is that gastric biopsy remains central to diagnosing Helicobacter pylori infection, autoimmune
00:09:15
gastritis, intestinal metaplasia, dysplasia, and gastric cancer, and often other gastric conditions, and morphology-based assessment is limited by interobserver variability, sampling constraints, and an incomplete ability to capture molecular heterogeneity and predict progression. So we cannot do it just from an H&E image and an H&E diagnosis. The objective of this mini review is to summarize how multi-omics technologies and AI are modernizing gastric biopsy
00:09:54
diagnostics. Here in the methods, they also used a narrative synthesis across the literature on gastric pathology multi-omics: genomics, transcriptomics, epigenomics, proteomics, lipidomics, metabolomics, microbiomics, and spatial approaches. I don't know, there are so many omics; I guess you can make an omic out of every life science domain. But they also included AI in endoscopy and computational pathology. Computational pathology is our domain, and the results were that multi-omics
00:10:31
profiling enhances mechanistic understanding and refines disease classification, and it can capture clonal evolution, pathway dysregulation, immune microenvironment interactions, and metabolic remodeling, with potential for biomarker discovery and therapy prediction. AI applications demonstrate strong performance across the gastric diagnostic pathway. And here we have this word again: diagnostic pathway. So we're not doing the whole care continuum, but we're also not just doing one application, for example
00:11:06
Helicobacter pylori. We are looking at the diagnostic pathway, so we're looking at these things more holistically. And so it improved lesion detection during endoscopy, reduced miss rates, segmentation of lesions, classification of precancerous conditions, H. pylori recognition. We recently had a podcast with Dr. Andrew Janowik, who wrote a publication on validating and deploying his H. pylori detection tool in a large hospital in Geneva; there's a full podcast episode on that. But it was so
00:11:47
interesting that the development of this tool to detect H. pylori, a pretty straightforward image-analysis-based tool, took like several weeks, maybe six weeks, while the deployment of this whole thing and the validation took them three years. So here we have this: okay, is AI, even fantastic AI for a task that could make our life easier, going to be deployable, and how much effort does it cost? It costs a lot more effort to deploy these things than to actually develop them. But when you develop
00:12:21
them, they have to work, and be good enough to even be a candidate for deployment. Another thing they mention here is that evidence from systematic reviews supports robust diagnostic accuracy, while prospective studies highlight real-time feasibility. So they had some prospective studies here, which is in contrast to the first paper that we just reviewed. And the integration of multi-omics is shifting gastric biopsy from descriptive histology toward data-driven precision gastroenterology.
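To make "integrating omics with AI" a bit more concrete, one common pattern is early fusion: standardize the feature vector from each modality, then concatenate them into one matrix for a downstream classifier. This is a generic sketch; the modality names and dimensions are invented, not taken from the review:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-patient feature vectors from three modalities.
histology = rng.normal(size=(10, 8))        # e.g. features from H&E image patches
transcriptomics = rng.normal(size=(10, 20)) # e.g. selected gene expression values
metabolomics = rng.normal(size=(10, 5))     # e.g. metabolite concentrations

def zscore(x: np.ndarray) -> np.ndarray:
    """Standardize each feature so no modality dominates by scale."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

# Early fusion: concatenate standardized modalities into one feature matrix
# that a classifier (for example, one predicting progression) could consume.
fused = np.hstack([zscore(m) for m in (histology, transcriptomics, metabolomics)])
print(fused.shape)  # (10, 33)
```

Real multi-omics pipelines also have to handle missing modalities, batch effects, and very different feature counts per modality, which is part of why deployment takes so much longer than development.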
00:12:58
But we have barriers, as always: dataset quality, standardization, interpretability, cost, and regulatory and ethical governance. When you look at these papers, these are going to be the keywords that show up over and over again. Okay, these are our challenges that we have to overcome. Now the next one is pretty interesting as well, and it's a review as well: "From Image-Guided Surgery to Computer-Assisted Real-Time Diagnosis with Hyperspectral and Multispectral Imaging." So we are looking at different
00:13:38
ways of imaging, hyperspectral and multispectral, that are live or intraoperative modes of imaging. Let's have a look at what this type of imaging, in contrast to light microscopy or in addition to light microscopy, can help us with. For this one, there is a need for intraoperative image guidance, and they are focusing on gynecologic oncologic surgery, to provide accurate identification of malignant tissue and ensure negative resection margins, and emerging imaging technologies can complement standard histopathology. So
00:14:15
if we had something: you will still do the histopathology for final diagnostics, but if there was something intraoperative, other than frozen sections, that could help you with margins, with anything that's happening during the surgery, that would be fantastic. And if it was real time, that would be even more fantastic. And this spectral imaging can extract information on tissue composition and physiological status in real time. This is important, and there's no need for tissue contact, contrast agents, staining, or
00:14:49
freezing, which is huge in an intraoperative setting, when you don't need to do all that. These imaging methods analyze the wavelengths of light coming from the tissue, and depending on the type and state of the tissue and all different factors, the wavelength composition is different, and you can extrapolate this information from just the intraoperative imaging. This review synthesizes current clinical applications in gynecological oncology, decision support utility, and diagnostic performance, with
00:15:22
data processing frameworks for tissue classification. They did a systematic review, and they had 29 studies and two clinical trials that met the inclusion criteria. So look, they even had two clinical trials; this is amazing. Most of them focused on cervical neoplasia and ovarian cancer detection, and then there was also assessment of the fallopian tubes, the endometrium, and vulvar skin, and pathology was used as the gold standard. So they had the information from the pathology report. They had the
00:15:59
imaging from during the surgery, and it's interesting because the overall specificity ranged from 30 to 99%, that's a big range, and sensitivity from 75 to 100%, with particularly high sensitivity for cervical lesions, 79 to 100%, and ovarian cancer, 81 to 100%. Among the included studies, 13 used data interpretation algorithms, 11 applied machine learning, one deep learning, and one combined both. Interesting. So not cutting-edge vision transformers, but just
00:16:43
classical machine learning, and then one with deep learning and one as a combination. What they conclude here is that spectral imaging supported by computational methods has shown promising results in the diagnostic evaluation of gynecologic disease. That is amazing, because getting as much information as possible during the surgery, without being invasive or taking too much time, is improving care. That's a very objective metric of improving care. We have one more; we're going to be talking about a reporting checklist
00:17:20
for foundation and large language models in medical research. It has the acronym REFINE, R-E-F-I-N-E, an international consensus guideline, and it's very international. Let me show you the list of authors here. Very international: Turkey, Switzerland, US, Germany, Italy, and these are not all the countries, because after number 20 I stopped highlighting. But they tell you it was 57 contributors from 17 countries. Can you imagine how much back and forth and coordination this
00:17:58
thing required, and then to publish it? I have so much respect for these projects that require such a high level of coordination, but if you want to have something that's going to be applied globally, that's what you need to do. So kudos to all the people who organized that and who made this guideline possible. Let's look at how they did it, because I'm going to show
00:18:28
you the guideline in a second. The purpose of this was to develop a reporting checklist for foundation and large language models. I love the acronym: reporting gives RE, foundation gives F, and then they took a letter from the middle and the last one. Acronym creation used to be the first letter of the words, or the first two letters, and now you can do freestyle, I guess. Anyway, I'm just laughing; it doesn't really matter. It's easier to remember: REFINE. So the
00:19:02
international reporting guideline is for transparent and reproducible reporting of foundation model and large language model studies in medical research and imaging artificial intelligence applications. That's so crucial to comparing the results of the studies. I wanted to say, okay, if you want to take this tool and see which one is better... but often the research tools never end up being deployed. In general, though, to advance the research and
00:19:36
see, okay, is the thing that we just developed better than something else, you need to have a framework to compare. So they do have one, and this protocol was prespecified and publicly archived. A modified Delphi process was conducted to establish the reporting standards. The Delphi process is a process to reach consensus when you have multiple contributors, multiple experts, discussing and trying to agree on something like guidelines, like a checklist, and there is a specific process for that. And they wanted to
00:20:09
establish the reporting standards for unimodal and multimodal foundation models and large language model applications involving text, imaging, and structured data. There was a steering committee that coordinated protocol development, expert recruitment, all the Delphi rounds, and the harmonization phase. Look how complex this is. I'm baffled; I'm really respectful of these types of efforts, any type of international consortia. If you go to the Digital Pathology Podcast, you will find
00:20:43
an episode on Big Picture. Big Picture is a public-private consortium that tries to create the biggest European image repository, and they also have, like, over 20... I don't want to get the numbers wrong, but if you Google "digital pathology podcast" and "big picture," you will see it's a multi-year process with multiple stakeholders. Respect. But going back to our checklist: this had 57 contributors from 17 countries, and then 54 panelists from 16 countries completed rounds one and
00:21:24
two. 54 people voted and came up with it. Anyway, I know I'm repeating myself, but it's impressive. Then the harmonization phase was completed by three expert panelists and the steering committee. So they decided on everything they needed to harmonize, and this process produced a 44-item, six-section framework with standardized terminology and detailed reporting instructions. When you get this abstract and look at this link, don't use the HTML link; it didn't work for me.
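Since each REFINE item gets marked as reported yes, partial, no, or not applicable, along with where in the manuscript it is reported, here is a hypothetical sketch of tracking that in code. The field names and example items are my own shorthand, not REFINE's exact wording:

```python
from dataclasses import dataclass

# Hypothetical model of one REFINE-style checklist entry; the example
# items below paraphrase the episode's walkthrough, not the guideline text.
@dataclass
class ChecklistItem:
    section: str          # e.g. "Model specification", "Prompt design"
    item: str             # what must be reported
    status: str           # "yes" | "partial" | "no" | "n/a"
    location: str = ""    # where in the manuscript it is reported

items = [
    ChecklistItem("Model specification", "Model name, vendor, developer", "yes", "Methods 2.1"),
    ChecklistItem("Prompt design", "Prompt engineering protocol with versioning", "partial", "Suppl. S2"),
    ChecklistItem("Dataset integrity", "Ethics and consent statement", "no"),
]

# Completion count, mirroring the downloadable summary-by-section table.
done = sum(item.status == "yes" for item in items)
print(f"{done}/{len(items)} items fully reported")  # 1/3 items fully reported
```

Scanning a study's completed checklist like this is exactly how you can immediately see what stage of maturity a given model report is at.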
00:22:01
I'm going to show it to you in a second. But the conclusion was that REFINE provides a comprehensive consensus-based reporting standard for medical foundation model and large language model research, including imaging AI studies. And it enables transparent, comparable (exclamation mark here: comparable!) and reproducible reporting of foundation model and large language model studies. Let me show you how it looks and what's actually in this checklist. If you're just joining, you can still say hi in the chat and let me
00:22:35
know where you're tuning in from and what time it is for you today. For me it's now 6:33 in Fairfield, Pennsylvania. And look at this beautiful mug that I have: it's our Digital Pathology Trailblazer mug. Let me show you the list. So we have the REFINE checklist, and what do we have there? Can you guys see my mouse or not? Yes. Okay. So we have model specification, and it is really a checklist: model name, vendor, developer, all the information, and is it reported: yes, partial, no, not
00:23:15
applicable, and where is it reported. Model architecture, key characteristics, and everything on model specification. Then we have prompt design, and they explain everything. Okay, prompt engineering protocol with versioning, what is that? Report the protocol used for prompt engineering and the development process, for example, iterative testing of prompt variants with predefined success metrics, human-in-the-loop review with inter-rater checks. So there are methods to do it in a standardized way, right? So then you're
00:23:50
going to report: did you do it, did you partially do it, did you not do it, or was it not applicable. Prompting strategy, format and length, and so on. Then stochasticity control: generation parameters, prompt operator characteristics, number of prompt attempts, output selection. So yeah, stochasticity controls. Then dataset integrity; what do they have in dataset integrity? Name, versions, access type, source, license, dataset origin, ethics and consent statement. Wow. Missing data extent, mechanism, and
00:24:29
handling. Amazing. Oh, wow. Reference standard and annotator qualifications. How cool is that? Output evaluation, implementation. Okay: did they integrate it into the clinical workflow? Did they measure clinical utility or added value? Wow, so cool. So basically, when this checklist is applied, you can immediately see which stage of maturity these things are at. Then there is a summary of checklist completion by section you can download, and there's a note on which article to cite when you use this checklist. So
00:25:06
it's pretty impressive. I'm really impressed with this work. So if you are interested in getting the audio summaries of the papers whose abstracts we just discussed, I'm putting a QR code on the screen that's going to take you to the subscription-based page of the Digital Pathology Podcast, if you're interested in diving a little bit deeper into an audio summary. And at USCAP, this is the United States and Canadian Academy
00:25:44
of Pathology annual meeting in San Antonio, Texas, there's going to be a book giveaway at the Hamamatsu booth, booth number 312. So visit us, and if you see me, just wave, let me know that you're there; I would love to interact in person in addition to our weekly DigiPath Digest interaction. Thank you so much for joining me today, and I'll talk to you in the next one.