New Matter: Inside the Minds of SLAS Scientists

The Thinking Microscope By Steven Finkbeiner | SLAS 2026 Innovation Award Winner

SLAS Episode 199


Our guest is the 2026 recipient of the SLAS Innovation Award, Steven Finkbeiner, MD, PhD, of Gladstone Institutes and the University of California, San Francisco. Steven won the prestigious award for his podium presentation, "Development and Application of AI-powered Label-free Imaging for Assays and Screening."

Our discussion takes us through his lab's development of AI-powered, label-free imaging and a closed-loop "thinking microscope" that uses optogenetics and reinforcement learning to perform thousands of single-cell experiments in a single well, dramatically accelerating research into neurodegenerative diseases and beyond.

Key Learning Points:

  • AI-powered label-free imaging for assays
  • Deep learning models in biomedical research
  • Prognostic markers and disease diagnosis using AI
  • Closed-loop automated microscopy platforms
  • Overcoming challenges and limitations of AI in research

Stay connected with SLAS:

www.slas.org | Facebook | X | LinkedIn | Instagram | YouTube

About SLAS
SLAS (Society for Laboratory Automation and Screening) is an international professional society of academic, industry and government life sciences researchers and the developers and providers of laboratory automation technology. The SLAS mission is to bring together researchers in academia, industry and government to advance life sciences discovery and technology via education, knowledge exchange and global community building.

Upcoming Events:

SLAS Europe 2026 Conference and Exhibition (19-21 May 2026 | Vienna, Austria)

SLAS Meet-Ups

SLAS 2026 Sample Management Symposium (October 21-22, 2026 | South San Francisco, California)

SLAS2027 International Conference & Exhibition (January 30 - February 3, 2027 | San Diego, California)

View the full events calendar

Emily Yamasaki (00:03)

Hello and welcome to New Matter, the SLAS podcast. I'm your host, Emily Yamasaki. I am delighted to be joined by the SLAS 2026 Innovation Award winner, Steven Finkbeiner of Gladstone and the University of California, San Francisco. The Innovation Award recognizes one exceptional podium presentation honoring research that is exceedingly innovative and contributes to the exploration of technologies in the laboratory.

 

Steven Finkbeiner (00:09)

Thank you.

 

Thank you.

 

Emily Yamasaki (00:26)

exceeds a benchmark or milestone in screening or the lead discovery process, or demonstrates advanced and integrated use of mature technologies. Steven, huge congratulations on receiving this year's award and welcome to the podcast.

 

Steven Finkbeiner (00:29)

Thank you very much, Emily. Pleased to be here.

 

Emily Yamasaki (00:40)

Fantastic. So your winning podium presentation, "Development and Application of AI-powered Label-free Imaging for Assays and Screening," brought together AI, imaging and autonomous research to address some of the complexities of neurodegenerative disease biology. So just to start us off here, could you tell us a little bit about your research background and what led to your focus on the application of AI in unraveling some of this complex biology and the diseases you study?

 

Steven Finkbeiner (01:06)

Yeah, I did an MD and PhD at Yale, and my PhD was in neuroscience. I then went on to do clinical training in neurology at the University of California, San Francisco. So I've had an abiding interest in neurodegenerative diseases in particular, but neurological diseases in general. And unfortunately, the clinical trial success rate for those disorders is really low. It's like a 99, 90% failure rate or higher.

So for me, I think it was pretty clear that we needed to come up with new approaches that might be better at creating a preclinical pipeline that was more predictive of what's going to happen in the clinic. And so that was a major motivation. And then more specifically in this case, one of the issues that became apparent to me early in my research career was that it's very difficult to infer dynamic biology by just looking at snapshot pictures, which is what the field had conventionally done.

And there was a particular question in one of the fields I worked on that people argued about for 10 years with no resolution. And so as I thought about it, it seemed to me the only way we could answer that question was if we could create a system that would allow us to observe a disease model as it unfolds in time, so that we could both understand the cascade of events and, importantly, have a data set that would let us unravel cause-and-effect relationships.

So in order to make that technology possible, we built robots that could do imaging and reproducibly get back to the same cell as often and as long as you wanted. We called those robotic microscopes, and it was successful in creating the sort of data sets and bringing resolution to the question that motivated the invention. But one of the things that it also did, and I wasn't smart enough to see this in the future, was that it created really large, very high-quality data sets that were very suited to artificial intelligence analysis. So oftentimes in academia, the data sets are too small to really use those tools effectively, but this particular technology did it really well. And I got a call out of the blue from Google, who told me that I'd generated enough data to be interesting.

And so I became their first academic collaborator. And that was sort of how we got into the applications of AI. It took us six months. The computer scientists had never seen a neuron before, and my biologists had never written code before. And so there was some handholding that was necessary. But once we got to the end of that first six months, it was jaw-dropping. We started to see results using the AI analysis of our images that humans couldn't see, but that we knew were there. So we knew it had a lot of promise.

 

Emily Yamasaki (03:47)

That's amazing. And I'm curious at a high level, and we'll get a little more into some of the research that you presented a little later, but what is your perspective on how these AI tools and approaches can accelerate research and drug discovery?

 

Steven Finkbeiner (04:02)

Well, I mean, it's a huge topic. I think I would say the most important point for me is that a major reason why the clinical trial failure rates are so high is because human biology is so complex. And historically, in science, we've used reductionism to make basically the science small enough so that we can actually do it in the lab. And I worry that, in an effort to make those problems feasible, we've come up with systems that aren't as relevant. And so we can get results, but it turns out that they're not very translatable.

And so for me, the one very important take-home point is that for the first time in my career, I really feel like in AI, I've found a tool that is capable of handling complexity on its own terms, and that the chief problem is just generating enough high-quality data to really harness that power.

But more broadly, to address your question, I think the thing that's impressed me is that it's an extremely versatile tool. And so it really touches on almost every stage of drug discovery. We're talking today because we've used it on images and for screening and things like that, but it can be very useful for chemistry, for toxicology, for finding the targets of your drugs, integrating clinical, and really just every single stage. You can use it in one form or another.

 

Emily Yamasaki (05:29)

In the first half of your talk, you really focus on a deep learning model for label-free imaging. Could you tell us a little bit about the significance of this model, how it works and how it can be used in screening?

 

Steven Finkbeiner (05:40)

Yeah, so it actually was the product of the first project we did with Google. Since then, we've been so impressed with the technology that we've developed our own internal computer science group, and we're doing a lot of the things now completely independently. In that first project, Google was basically just interested in whether AI would be useful for biomedical research. It sounds like a silly question now. But back then, in 2015, it was unclear.

And so one of the things they were interested in was, could you get AI to do something literally a human can't do? We thought, well, an interesting potential application would be, could you train an AI model to be able to predict, from unlabeled images, labels that humans normally need to use to be able to see things like the nucleus or different subcellular organelles?

We took basically bright-field images and then used, at that time, a form of deep learning architecture called convolutional neural networks, and pairs of basically unlabeled images and then an image of the same cell after labeling was done. And then we fed those image pairs into the deep learning network to see if it could discover a correspondence between the unlabeled image and the labeled image that was sufficiently strong so that, in the future, you could just show it the unlabeled image and it could actually predict the labels.

And the results were stunning. With some of the labels, like nuclear dye, it was essentially 97% accurate. And it not only could predict whether a pixel in the image was positive, but how positive it was. And we did that for a variety of labels of cell structure, cell state, cell type. It seemed pretty versatile. It depends a little bit on the quality of whatever the label is that you're using for the paired analysis. But we were impressed. It worked almost every time and for a variety of labels. So we were able to answer the first question, which was that, yes, it definitely can reveal things in images that humans can't see.

But to your question, I think there's a lot of immediately useful applications for this. One of the things in screens is you ideally would like to get as much information as you can about the cells you're studying. That brings up the issue of multiplexing, bringing multiple labels together to see things. In this case, if you've already pre-trained the system so that it can predict labels, it's possible to do, you know, many, many more labels, or be able to get that information out of your system just by looking at bright-field images without having to do the labeling. So it saves you time, it saves you money. And as everyone knows, every time you have to do a manipulation to a biological sample, there's the opportunity for batch variation and all sorts of technical issues in addition. So it really simplifies the whole assay workflow, and you're able to get more robust and reproducible results.

And I think a lot of people are familiar with being in a situation where you have an antibody that's like the best antibody for a particular thing you've ever found, but maybe there's a limited supply, and once it's done, you either have to change your assay or something. In this case, you can actually train your models based on your very best antibody, and they live in perpetuity. So you're no longer so dependent on particular reagents.

And the other thing that we've been able to show with this approach is you can use it not only for fixed cells, but for live cells too. And so that's exciting because I think a lot of people know intuitively that there's a lot of information in cell dynamics. And so now you have a way to just look at a bright-field image, even of a live cell, and follow it, you know, look at it longitudinally or dynamically, and be able to make these predictions. So it's a non-destructive assay, so it's possible to really see dynamics as well.
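The paired-image setup described above can be illustrated with a small sketch. This is not the lab's code: the function names, the tiny 3x3 images and the `radius` parameter are all assumptions made for illustration. It only shows how aligned unlabeled and labeled images might be turned into (context patch, label intensity) training pairs, which is the kind of input a convolutional network would then learn from.

```python
def extract_patch(image, row, col, radius):
    """Return the flattened neighborhood around (row, col), zero-padded at the edges."""
    height, width = len(image), len(image[0])
    patch = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            inside = 0 <= r < height and 0 <= c < width
            patch.append(image[r][c] if inside else 0.0)
    return patch


def build_training_pairs(brightfield, labeled, radius=1):
    """Pair each pixel's bright-field context with the label intensity at that pixel."""
    pairs = []
    for row in range(len(brightfield)):
        for col in range(len(brightfield[0])):
            context = extract_patch(brightfield, row, col, radius)
            target = labeled[row][col]  # how positive the label is at this pixel
            pairs.append((context, target))
    return pairs


# Hypothetical toy data: one bright spot in the unlabeled image that
# coincides with a nuclear-dye signal in the labeled image of the same cell.
brightfield = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.2],
    [0.1, 0.2, 0.1],
]
nuclear_label = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]

pairs = build_training_pairs(brightfield, nuclear_label, radius=1)
print(len(pairs), pairs[4])
```

A larger `radius` gives the model more surrounding context per pixel. A real pipeline would feed whole image pairs to a deep network rather than build patches by hand.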

 

Emily Yamasaki (09:34)

This might be a little bit of a silly question, but I'm curious what it is that the models are able to recognize that they're able to find those patterns in.

 

Steven Finkbeiner (09:41)

Yeah, it's funny you asked that, because when we got our first results, the biologists were so blown away, they literally tried to stare at the images and see if they could figure that question out. I think the power of AI, there's several things it can do. One is it can look at the image at different scales. So it can look all the way down to individual pixels.

But importantly, deep learning for images especially allows you to look at context. So not just a single pixel, but the neighborhood of the pixels, how that's related. And that scale can extend from, like I said, a single pixel to a group of pixels to the whole image. So there may be information at different levels of the image that could help answer your question, basically, that would help it predict what that particular pixel is.

And so I think that's a power basically that humans will never be able to have. We can't really look at multiple scales, we can't even resolve individual pixels with our eyes, let alone put them in some sort of context and be able to compute a kind of interaction that would allow you to use that information to make accurate predictions. But AI can, and it's tireless, and it's very good at doing things reproducibly. It will literally apply the same algorithm to whatever the image looks like in a way that humans just can't do.

 

Emily Yamasaki (11:12)

It's really fascinating to think that the model is able to pick up on all of those different things. It's really exciting. Another aspect of this that I think is really interesting, from that more clinical, translational perspective, is being able to identify prognostic indicators of disease using these models. How successful are they in identifying prognostic markers, and how is that done?

 

Steven Finkbeiner (11:36)

It's possible with AI. So one of the things that we did, maybe the second project we started to work on with AI, was: could you diagnose disease in a dish? So could you take cells? We do a lot of work using patient-derived cells to create models of disease. We often turn them into induced pluripotent stem cells and then into specific cell types that are relevant to disease. We have also just used fibroblasts or blood cells and things like that. But you can set up a problem for the AI to see if it can find anything in images of cells from patients with a particular disease and those without. And there are ways to do it in a highly controlled fashion, particularly for familial diseases, where you can use CRISPR engineering to create highly controlled lines.

And we've been really successful at being able to train AI models to tell cells from a patient with a particular disease from healthy controls. And I think we've done it in a way, either at scale or with CRISPR engineering, where we feel reasonably confident that it's truly discovered some underlying biological phenotype that represents the disease or that connects these cells. So that's one example.

On the one hand, you know, that's a remarkable feat. On the other hand, I think one criticism of AI that comes up is that it's a bit of a black box. It does amazing things, but once it does those things, sometimes it's a little difficult to know how it did it. And so it can in some ways be a little bit of a scientific dead end. But that has changed, I think, especially in the last five years. There are new approaches that generally are called explainable AI, where you have new tools that you can apply once you've developed a model or a performant network that will help reveal to the human eye what the network found in the data that's driving the accuracy of the model.

And that's been really exciting for us. I'm talking today, and I talked solely at SLAS, about cell-based imaging and AI, but we also do digital pathology and AI. In both those cases, once we develop performant models, we've been able to apply explainable AI and get it to show us the features that drove the distinctions. And in some cases, we've seen things no one's seen before. In one case, we trained a successful model using H&E staining of Alzheimer's disease brain tissue, and it was able to diagnose Alzheimer's disease with the same accuracy that pathologists can with various specific labels. But importantly, with this sort of disease-agnostic labeling approach, with XAI tools, we were able to have it discover that in a subset of cells, there was a signal around the nucleus that was really driving the classification. And that hasn't been described before.

I think that's exciting because not only can it potentially do diagnosis and prognosis, but importantly, we now have a path that lets us go from making those exciting performant models to something more specific that could allow us to generate hypotheses about underlying mechanisms and potentially get to new mechanisms of disease, especially early mechanisms of disease, I think, early markers of pathology.

One of the things, too, that I mentioned in the talk was that with AI, you can of course use it to analyze the images that are in front of you and train models based on specific images. But if you have longitudinal imaging data like we generate, you can also ask the AI to kind of work across time. So you can ask it, is there some correspondence between an image of that cell early in the process and late? And you can go both ways. So you can predict the future, and you can actually look back in time. And I think both of those directions are very powerful, and something of course humans can't do without the help of AI, but could lead you to understand exactly your question: what is the prognostic value of this particular change? Like, what will the future be for this cell or this tissue? And likewise, if I know an endpoint that I really care about, what's the earliest that I could see a change in that cell that would have predicted that outcome?

 

Emily Yamasaki (15:54)

Yeah, that was actually one of my questions: how early have you seen this approach be able to identify prognostic markers? Is it significantly earlier than, you know, the approaches that are currently in use today?

 

Steven Finkbeiner (16:06)

Yeah, I would say that most of the work that we've done has been focused on our cell-based models. And we can identify things that appear very early in the course and that predict something that will happen at the sort of final fate of the cell, which in our case, a lot of the experiments we do are just over a period of a couple of weeks. But nevertheless, I think in principle, if there is a correspondence, it can detect it.

 

Emily Yamasaki (16:31)

It's such a powerful potential to be able to make those calls earlier and ultimately, I guess, down the road, look at getting people treated and into therapies earlier for anything that is identified.

 

I know your focus is primarily on neurodegenerative disease models and, as you said, on the cell-based approaches. What are some of the applications outside of neurodegenerative diseases? Could this be applied across other conditions like, for example, cancer, or looking at other diseases where you could get this longitudinal view of progression?

 

Steven Finkbeiner (17:04)

As you point out, we are focused on neurodegenerative diseases because of my interest and training. We partner with biotech and pharma companies a lot. And we've been asked by some companies to help them with problems that are outside neuro. And in some cases, I felt like the question was really interesting and worth doing with them, and it reflected, I thought, a very thoughtful application of some of this technology. So this is really a cell biology platform. It's not limited to neuro in any sense.

I will say that to be able to longitudinally follow cells, you need to come up with a way that allows you to track them. And so we're fortunate in most cases with neurons because they hold still for the most part. Some cells professionally run around, like microglia or other immune cells and things like that. And so you do have to either image more frequently or figure out other solutions. For example, one approach that we've developed is using face recognition approaches to be able to recognize a cell independent of its location, so a unique cell, and be able to track it.

But yeah, so we've done projects, for example, on cancer. So one company in particular had a phase two clinical asset, and it killed some cancer cells, but it didn't kill other cancer cells. And it wasn't clear to them why there was a difference. And so one of the things that we were able to do was to train an AI that could predict which cells would die and which cells wouldn't, but do it in time to be able to collect them and then analyze them with other approaches like transcriptomics to see if we could help them answer that question. We've done stuff with GI and cardiomyocytes and things like that. So it really is a pretty flexible approach, and AI doesn't really care what cell type you're working on. It's an image at the end of the day. And so you can apply these tools pretty broadly.
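For cells that mostly hold still, the tracking problem mentioned in this answer can be sketched as nearest-neighbor matching of cell positions between two imaging time points. This is a simplified stand-in, not the face-recognition-style approach the speaker describes. The cell IDs, coordinates and `max_distance` threshold are hypothetical.

```python
import math


def match_cells(previous, current, max_distance):
    """Greedily match known cells to detections in the next frame by distance.

    previous: dict mapping cell ID -> (x, y) centroid at the earlier time point
    current:  list of (x, y) centroids detected at the later time point
    Returns a dict mapping cell ID -> index into `current`.
    """
    candidates = []
    for cell_id, (px, py) in previous.items():
        for index, (cx, cy) in enumerate(current):
            distance = math.hypot(cx - px, cy - py)
            if distance <= max_distance:
                candidates.append((distance, cell_id, index))

    candidates.sort()  # closest pairs are assigned first
    assignments = {}
    used = set()
    for _, cell_id, index in candidates:
        if cell_id in assignments or index in used:
            continue
        assignments[cell_id] = index
        used.add(index)
    return assignments


# Hypothetical centroids (in pixels) from two consecutive scans of one well.
previous = {"cell-1": (10.0, 10.0), "cell-2": (50.0, 40.0), "cell-3": (80.0, 80.0)}
current = [(51.0, 41.0), (10.5, 9.5), (200.0, 200.0)]

assignments = match_cells(previous, current, max_distance=5.0)
print(assignments)
```

Cells that move farther than `max_distance` between scans are left unmatched. That is the case where imaging more frequently, or an appearance-based identifier, would be needed.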

 

Emily Yamasaki (19:06)

Thinking a little down the road, do you think it's feasible for these kinds of tools and approaches to become more commonplace in things like diagnostic screens or more patient specific applications?

 

Steven Finkbeiner (19:18)

Yeah, I definitely do. I think it's still a situation where the skill sets or the expertise needed to be able to do this work effectively have been mostly in the computer science arena, and biologists don't necessarily have that background or training to be able to understand and grasp what the possibilities are and to be able to integrate them. But I think that's changing rapidly. I mean, in particular, I think that actually the AI companies are moving toward the biologists faster than the biologists are moving toward the AI companies. Companies like Google have created something called a co-scientist. You know, Anthropic is trying to really adapt Claude to be able to help scientists. And so my prediction is that as biologists become more comfortable with the use, for example, of large language models and using some of those tools to kind of strengthen some of the scientific approaches that they're doing, I think it's just going to naturally infuse and disseminate into some of these other applications, and people will learn how to do this effectively. So I think it's an exciting time.

 

Emily Yamasaki (20:30)

Very exciting to see some of those applications come to fruition. So switching gears a little bit, in the second part of your talk, you also describe a really innovative closed-loop automated microscopy platform that uses AI to kind of design future experiments based on the data that's collected. Could you tell us just a little bit about how that system works and what the significance of that technology is?

 

Steven Finkbeiner (20:53)

Yeah, sure. I think the idea for it came around the time Google came up with its chess player. And if people in the audience aren't familiar with this, it was pretty amazing. So the thing about machine learning that's important to know is that, instead of the old way of writing computer programs, where you basically put in the code fairly well-defined instructions to the computer about do this, not that, sort of thing, with machine learning, computers learn by example. So you basically just show them things you want them to do and things you don't want them to do, and they will figure out what the best approach is to achieve those goals.

And so with Google chess, it was the same thing. There was the old IBM approach of just brute force, computer programming to get it to play, calculating the value of each move and things like that. But all Google did was just show the AI people playing chess, and over time it figured out how to play, and they could even set up two AIs that would play against each other and just keep getting better.

And it was amazing. There was a game I'll never forget against a computer called Stockfish, which is basically based on the old approach, where they gave Stockfish all this extra time and everything, and the Google chess player still beat it handily. And so it's become the best player on earth. And what was remarkable to me was that there was one match in one of the games where it did a move people hadn't seen before. So it was as if it had learned some underlying principles of playing chess that humans hadn't even grasped.

And with all of what I've told you today about how we've been able to use AI to see things in images and data that humans can't see, and we've applied it to genetics too, we started to wonder: what if we could put it in charge of doing the science, a little bit like Google had figured out a way to get it to do chess? Would it be able to solve problems more quickly, maybe better, maybe come up with solutions or do experiments that humans wouldn't even think to do? And so the idea was spurred by a lunch I had with one of the original Googlers, who was just wondering out loud if you could do closed-loop machine learning to do science.

And closed-loop machine learning is just this idea that basically you have the AI make observations, then you have the AI do an experiment, and then you have the AI analyze the results of the experiment and come up with the next round of experiments that it does. And so that's what we endeavored to do. And the thinking microscope is just a funny name for the platform that does that.
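The observe, experiment, analyze, redesign cycle described above can be sketched as a short loop. Everything here is a hypothetical stand-in: `simulated_response` replaces a real measurement with a made-up saturating curve, and the "design the next round" step is a simple narrowing of the dose range, not the platform's actual method.

```python
def simulated_response(dose):
    """Hypothetical stand-in for a measured cell response (a saturating curve)."""
    return dose / (dose + 4.0)


def closed_loop(target, low, high, rounds=6, doses_per_round=5):
    """Repeat experiment -> analysis -> redesign until the rounds run out."""
    best_dose = None
    for _ in range(rounds):
        # Design: spread this round's doses evenly over the current range.
        step = (high - low) / (doses_per_round - 1)
        doses = [low + i * step for i in range(doses_per_round)]

        # Experiment: apply each dose and observe the response.
        observations = [(dose, simulated_response(dose)) for dose in doses]

        # Analysis: keep the dose whose response is closest to the target.
        best_dose, _ = min(observations, key=lambda item: abs(item[1] - target))

        # Redesign: narrow the next round's range around the best dose.
        low = max(0.0, best_dose - step)
        high = best_dose + step
    return best_dose


best = closed_loop(target=0.5, low=0.0, high=16.0)
print(best)
```

Each pass through the loop uses the previous round's results to decide what to test next, which is the essence of closing the loop. A real system would replace the simulated curve with imaging readouts.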

 

In order to enable the microscope to do experiments, we equipped it with hardware called a digital micromirror device. And this is a device that allows you to basically reflect excitation light from a light source onto the sample plane. But it has a thousand mirrors, each smaller than a human hair, and we can put them under the control of a computer. So we can direct the light to very specific locations on the sample plane.

So the way the system works is you put your microtiter plate on with cells. The microscope will scan the well, identify every cell that's in that well, and give each one essentially a little social security number so that it can tailor the experiments to each cell. And then it controls those mirrors to be able to deliver light to each cell, independent of the others. It can even do subcellular regions.
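One way to picture per-cell light delivery is as a binary on/off mask over the mirror array, with mirrors switched on only over the cells chosen for stimulation. The sketch below assumes a simple grid of mirrors, integer cell centroids in mirror coordinates and a circular illumination spot. None of these details come from the talk.

```python
def mirror_mask(width, height, centroids, target_ids, radius):
    """Return a 2D list of booleans: True where a mirror should direct light.

    centroids:  dict mapping cell ID -> (x, y) position in mirror coordinates
    target_ids: IDs of the cells to illuminate in this exposure
    radius:     spot radius in mirrors around each targeted centroid
    """
    mask = [[False] * width for _ in range(height)]
    for cell_id in target_ids:
        cx, cy = centroids[cell_id]
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    mask[y][x] = True
    return mask


# Hypothetical well with two identified cells; only cell 1 is illuminated.
centroids = {1: (2, 2), 2: (7, 5)}
mask = mirror_mask(width=10, height=8, centroids=centroids, target_ids=[1], radius=1)

for row in mask:
    print("".join("#" if on else "." for on in row))
```

Giving different cells different doses would then amount to showing different masks for different durations or intensities.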

 

And there turns out to be a whole toolbox of molecules that you can put into cells that will make them convert light into some sort of biological pathway. So it's a way basically to use light, or optogenetics, to be able to trigger biology in cells. And so it can do an experiment. It can control the frequency and the dose and the intensity of the light that it delivers to each cell.

And with this system, you can ask it basically to now explore what the relationship is between the stimulus you're delivering and the response that it's producing. I think in the example I showed at the SLAS meeting, we were looking at the role of oxidative stress, which is thought to be a mechanism both for aging and for neurodegenerative disease. And we had a little molecule called miniSOG that we could put in cells, and with light it generates reactive oxygen species.

And the thinking microscope was able, in a single well, to do dose-response curves, because it could basically deliver different doses to each cell, and literally do thousands of experiments in a single well. So something that would take much, much longer to do using conventional approaches. So it dramatically accelerates science and gives us a really high-resolution answer to the original question.

And then the way we're closing the loop is to use reinforcement learning, which is a way that the system can use the observations it's making and be tasked to solve a problem. So it will use the data that it's collected to look for perturbations that seem to be going in the direction of a solution, and then use that as the basis for designing the next round of experiments, and keep doing that iteratively until basically it's figured out a solution.

So, yeah, that's the big idea. And I'm hopeful, as I said, that it will allow us to miniaturize science to the level of single cells, so getting much more data for much less cost and being able to get high-resolution answers. We've already done small-molecule screens where we can actually do dose-response curves in a single well. And the results are incredibly robust because you have in-well controls. And so the results you get in our hands so far have been extremely reproducible, much more so than a conventional high-throughput screen.
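As a rough illustration of using reinforcement learning to pick the next round of perturbations, here is an epsilon-greedy bandit, one of the simplest reinforcement-learning setups. It is a sketch under stated assumptions, not the method used on the platform. The reward function, dose values and round sizes are hypothetical, and each "cell" is simulated rather than imaged.

```python
import random


def simulated_reward(dose, target_response=0.5):
    """Hypothetical reward: higher when the simulated response is nearer the target."""
    response = dose / (dose + 4.0)
    return -abs(response - target_response)


def epsilon_greedy(doses, cells_per_round, rounds, epsilon=0.2, seed=0):
    """Assign one dose per cell, mostly exploiting the best dose seen so far."""
    rng = random.Random(seed)
    totals = {dose: 0.0 for dose in doses}
    counts = {dose: 0 for dose in doses}

    # Initial pass: try every dose once so each has an estimate.
    for dose in doses:
        totals[dose] += simulated_reward(dose)
        counts[dose] += 1

    for _ in range(rounds):
        for _ in range(cells_per_round):
            if rng.random() < epsilon:
                dose = rng.choice(doses)  # explore a random dose
            else:
                dose = max(doses, key=lambda d: totals[d] / counts[d])  # exploit
            totals[dose] += simulated_reward(dose)
            counts[dose] += 1

    best = max(doses, key=lambda d: totals[d] / counts[d])
    return best, counts


best_dose, counts = epsilon_greedy([1.0, 2.0, 4.0, 8.0, 16.0], cells_per_round=20, rounds=5)
print(best_dose, counts)
```

Because different cells in the same well receive different doses, exploration and exploitation happen side by side within a round. The well itself supplies the controls.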

 

Emily Yamasaki (27:05)

So with that kind of iterative experimental design and execution and just the amount of data that you're generating through that and through the miniaturization and an increase of throughput that way, how do you go about identifying actionable targets from the data that's generated in these kinds of processes?

 

Steven Finkbeiner (27:21)

Yeah, we have a couple of different approaches in general to that problem. So I think, like a lot of people, we have both unbiased screens that lead to targets as well as more candidate approaches. And we also have something we call a top-down or bottom-up approach. We are strongly interested in targets that have some sort of genetic connection to the disease. And part of that is there's evidence that things that have a genetic connection may have up to a two-fold increased chance of ultimately being FDA approved. And then there's, I think, already a strong bias by some of our industry partners about really looking at targets that have some sort of genetic connection.

 

In terms of identifying actionable targets, we use machine learning to look at genetics, for example, with ALS. We can use it to do multimodal data integration. That can be a way to look for targets from our screens that have some connection to human disease. For example, we do a lot of family-based whole-genome sequencing to look for genes that might modify the severity or age of onset of certain diseases, the penetrance. And then we can take those in sort of a top-down approach and put them into iPS cells to see if they actually mitigate or modulate the phenotypes that we think are disease associated. And then we developed Bayesian tools to help us basically integrate data from these different sources to be able to raise or lower the confidence we have for a particular target.
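The idea of raising or lowering confidence in a target as evidence accumulates can be illustrated with a basic Bayesian odds update. This is only a sketch of the general principle, not the lab's tools. It assumes the evidence sources are independent, and the prior and likelihood ratios are made-up numbers.

```python
def update_confidence(prior_probability, likelihood_ratios):
    """Combine a prior with independent evidence using Bayes' rule in odds form.

    Each likelihood ratio is P(evidence | real target) / P(evidence | not a target):
    values above 1 raise confidence, values below 1 lower it.
    """
    odds = prior_probability / (1.0 - prior_probability)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical example: a 10% prior, supportive genetic and screening
# evidence (ratios 3.0 and 2.0), and one unsupportive result (ratio 0.5).
posterior = update_confidence(0.1, [3.0, 2.0, 0.5])
print(round(posterior, 3))
```

In practice, data from sequencing, screens and imaging are rarely independent, so a real integration scheme would need to model how the sources overlap.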

 

And so that's one approach to being able to prioritize targets, anyway. We do a lot of genetic screens too. We really think that those are very useful for beginning to understand, in an unbiased way, the genetic architecture of the biology or biological modifiers of disease phenotypes. And when we have the hits from those screens, there are some simple things we can do just with bioinformatics to see if chemical matter already exists for some of those targets. Often we do find, especially when we do genome-wide screens, small molecules that already can target things, so that we can have a sort of pharmacological validation of some of the hits that we have.

And then with small-molecule screens, we can do chemogenomic or unannotated screens. Oftentimes, it's challenging, especially for unannotated screens, to get to the mechanism of action or the target. But we've also developed AI, which I think I mentioned in that talk, to be able to leverage data that other consortia have developed so that you could use imaging to actually predict the target and mechanism of action of the drug. So those are some of the examples of ways that we get to targets.

In terms of whether they're actionable or not, because we're in academia, we have certain limitations. So we don't have the same depth of expertise and skills in terms of going from hit to lead, in terms of medicinal chemistry and things like that. And so one of the things we've done is form pretty close relationships with a lot of drug companies to get their opinions, because in the end, somebody needs to pay for a clinical trial if we get that far. And so it's important to us to find a good partner who is enthusiastic about the target. Oftentimes they'll base those decisions on druggability. What are the strengths of their company? Some prefer well-defined druggable targets. Others actually like things that might be targeted with antisense oligonucleotides or other technologies that don't depend on small molecules. So that's important.

The novelty of the target is important for some. And a lot of times these companies also have a lot of information that's not public around toxicology. So they are aware of baggage around certain targets that sometimes isn't well known. And so again, getting their feedback, especially early on, is really helpful. Because if they're aware of some side effect profile that might really prevent that target from ever becoming a drug that would be a clinical candidate, it's great to know that early so we don't waste time and resources.

 

Emily Yamasaki (31:24)

Absolutely. So a lot of the tools and approaches you've described and that your group is working on are really exciting in that they have the potential to accelerate our understanding of disease pathology and biomarker identification. I'm curious to hear, because this area also has a lot of hype around it right now, what is your perspective on some of the limitations of these approaches?

 

Steven Finkbeiner (31:45)

Yeah, that's a great question. I think one of the main things that we've encountered time and again is a problem called overfitting. The issue here is that AI is so powerful and so sensitive that it can oftentimes detect features in your data sets that really aren't pertinent to the biology or the question that you're trying to answer, but by dumb luck end up being features that AI can use to make an accurate classification, or do whatever the task is that it's asked to do. So I think people who haven't done this before get really excited because they see, oh, your model is 95% accurate. But when you start to dig deeper, it turns out that it's 95% accurate for that data set. If you now apply that same model to a data set that it hasn't seen before, the performance can drop down to 60%, barely above chance. And so that would be an example of overfitting. That's a case where you've developed a model that's found something, but it's probably not the important biology, and so it doesn't generalize.

And it's still, I must say, challenging to defeat this problem. For cell-based screening, I think one advantage we have is that it's possible to develop really large data sets, large enough that you can train on part of the data and then reserve unseen data that can be used to really test the model to make sure that that's not happening. And so that's the best way, I think, to really get at this issue. There are some other strategies. Sometimes people will use something called k-fold cross-validation, where you have a large data set, you divide it up into segments, and then you keep shuffling them and see how the performance changes between the portion that you use to train and the portion that you use to validate.
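To make the cross-validation idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the lab's pipeline: the data are synthetic noise, the "model" is a toy nearest-neighbor classifier that simply memorizes its training examples, and all function names are hypothetical.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for held_out in range(k):
        val_idx = folds[held_out]
        train_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train_idx, val_idx

def predict_1nn(train_x, train_y, x):
    """Toy model that memorizes: return the label of the nearest training point."""
    nearest = min(range(len(train_x)),
                  key=lambda j: sum((a - b) ** 2 for a, b in zip(train_x[j], x)))
    return train_y[nearest]

def accuracy(train_x, train_y, eval_x, eval_y):
    """Fraction of eval examples whose predicted label matches the true label."""
    correct = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(eval_x, eval_y))
    return correct / len(eval_y)

def cross_validate(xs, ys, k=5, seed=0):
    """Return one (training accuracy, validation accuracy) pair per fold."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k, seed):
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        val_x = [xs[i] for i in val_idx]
        val_y = [ys[i] for i in val_idx]
        scores.append((accuracy(train_x, train_y, train_x, train_y),
                       accuracy(train_x, train_y, val_x, val_y)))
    return scores

# Synthetic data: random features with labels that carry no real signal.
rng = random.Random(42)
xs = [[rng.random() for _ in range(5)] for _ in range(200)]
ys = [i % 2 for i in range(200)]

for fold, (train_acc, val_acc) in enumerate(cross_validate(xs, ys), start=1):
    print(f"fold {fold}: train={train_acc:.2f} validation={val_acc:.2f}")
```

Because the labels here are unrelated to the features, the memorizing model scores perfectly on the data it was trained on while its validation accuracy stays near chance in every fold. That gap between training and validation performance is the signature of overfitting.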

 

And if the performance varies a lot, and especially if it drops every time you go from your training set to your validation set, that suggests that you have an overfitting problem. But I'm hopeful. One of the things, for example, that we've used explainable AI to do, although it's primarily meant to help you interpret your AI, is to serve as a bit of a sanity check, because if you apply that XAI and it's showing you features in the image that clearly aren't in the cells that you're trying to use, you become suspicious that the model's found an artifact.

Another thing we can do is shuffle labels. Essentially, the AI may work really well diagnosing disease versus control. But we've been asked by folks, well, maybe the AI is just so sensitive that it found something in there, but it's not really meaningful. Maybe, in fact, it doesn't even really matter: you could arbitrarily divide the data set into any segments, and it would find a difference because it's so sensitive. So we've done that too, where you take a subset of the control data and call it disease and vice versa and see if the AI can still tell them apart. And in good models, we see the performance go to chance when you do that. That's how it should work if it's really found something real.

So I would say that's one of the big issues. And unfortunately, it's a very easy problem to fall into. Sometimes you can certainly demonstrate that overfitting is an issue, but sometimes it's hard to completely eliminate it as a concern. And so I think that's something people have to keep in mind.

Probably the more famous concern that people have is something called hallucination. This is more relevant to generative AI models, especially large language models and so-called foundation models. There the idea is that you've trained an AI on a really large data set, and you've trained it so that it learns the features of those data sets and learns the relationships within the data well enough that you can query the AI to get answers to questions that you may not even have had in mind when you trained it. So you think the answer is in the data, but you didn't really train the AI to answer that specific question. It's more of a general one. And the problem that can come up is that AI will pretty much always give you an answer. It will never say, "I don't know." And so sometimes, unfortunately, it effectively makes up the answer. So that's the hallucination problem.

And I think some of the same solutions to overfitting, in terms of using large, high-quality data sets, can be helpful, but it is a big problem, and it's something that's hard to avoid entirely. For a lot of the things we do, we have independent ways to validate whatever answer the AI gives us, and we can do that even experimentally. I think that's one of the key things with the thinking microscope: the AI may give us an answer in terms of what it thinks is going on and design an experiment based on that, but then we do the experiment and see if it's right or not. And so I think that can be really helpful. So those, I think, are the two big concerns in terms of reliability. I think another limitation that's worth mentioning is that the size and the shape of the data set really matter in terms of what the AI can do.

 

It's often the case, especially when it comes to clinical data, that people put a lot more emphasis on cases than on controls. So oftentimes the data sets are very imbalanced. You may have 10 times more cases than controls. And that's not a good situation for AI, because what you want to do is force AI to be honest. The best way to do that is to create balanced data sets where it's 50-50. So if it's able to make predictions that are much better than that, you can have reasonable confidence that it's found something in the data set. Whereas AI can be quite lazy. If, for example, you had 95 cases and five controls, well, it's already going to be 95% accurate if it just says they're all cases. So I think that's important.

The other is that people who haven't used these tools sometimes think of them primarily as tools to handle large amounts of data. And it is true that these tools are remarkable in terms of being able to process complex data quickly. But the shape of the data matters: it matters whether you have lots of different individuals with some data versus a few individuals with lots of data. The data set size may be the same, but AI is going to do a lot better in the first situation than the second, because the more independent examples you have, the better job it will do at finding the common features that cut across those things and avoiding that overfitting problem that I mentioned earlier.
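Both points about data shape can be checked with a few lines of code. The sketch below first reproduces the 95-cases-to-5-controls arithmetic, then shows one common precaution for data with many samples per individual: holding out whole individuals, so the test set contains only people the model has never seen. The speaker does not describe that second step, so treat it as an assumption, along with all function names and the synthetic IDs.

```python
import random
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of a 'lazy' model that always predicts the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def split_by_individual(individual_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no individual lands in both train and test."""
    individuals = sorted(set(individual_ids))
    random.Random(seed).shuffle(individuals)
    n_test = max(1, round(len(individuals) * test_fraction))
    test_people = set(individuals[:n_test])
    train_idx = [i for i, p in enumerate(individual_ids) if p not in test_people]
    test_idx = [i for i, p in enumerate(individual_ids) if p in test_people]
    return train_idx, test_idx

# The imbalanced example from the discussion: 95 cases, 5 controls.
labels = ["case"] * 95 + ["control"] * 5
always_case = ["case"] * len(labels)
print(majority_baseline_accuracy(labels))      # 0.95 without learning anything
print(balanced_accuracy(labels, always_case))  # 0.5, which exposes the lazy model

# Hypothetical data shape: 10 individuals with 20 samples each.
individual_ids = [f"person_{n}" for n in range(10) for _ in range(20)]
train_idx, test_idx = split_by_individual(individual_ids, test_fraction=0.2)
print(len(train_idx), len(test_idx))           # 160 40
```

Raw accuracy rewards the always-"case" model with 95%, while balanced accuracy puts it at chance. The grouped split keeps samples from the same person from appearing on both sides of the train/test boundary, which would otherwise make performance look better than it is.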

 

Emily Yamasaki (39:06)

Well, your talk was certainly a hit on site. I know there was a lot of buzz among people who were able to attend and listen to it. And obviously it really impressed our judges as well. So I want to just ask you, what does winning the SLAS 2026 Innovation Award mean for you and your work?

 

Steven Finkbeiner (39:23)

Yeah, well, the first thing to say is it was a wonderful surprise. To be honest, I wasn't even aware of the award. I was asked to speak at this conference by someone at Amgen who knew me from work I'd done at Genentech with a team there, and he just wanted me to talk about label-free imaging, and so I happily agreed. But then they had a little box when you submit the abstract about whether you would like to be considered for an innovation award, and I checked the box, and that began the long journey to ultimately the presentation and getting the award. So very unexpected, but very appreciative.

And so, yeah, as hopefully you've gathered from some of my comments, we think innovation is absolutely critical to make progress with neurodegenerative disease (diseases in general, I'm sure, but neurodegenerative diseases are the ones on my mind), given especially the complexity of human disease, the slow progress of conventional approaches and the high failure rate. I don't think I know anyone who doesn't like innovation, but there are lots of challenges to actually pursuing it. It usually involves pursuing high-risk endeavors that often fail. In my experience, sometimes if the idea is really novel, it can be hard for others to actually grasp what you're doing. In fact, I barely got tenure when I first invented the robotic microscope, because no one really understood what I was doing until it was done. And so there is that huge challenge, and it can be a little terrifying at times to take those risks. Getting to proof of concept can be challenging. It can be a scary time.

The other thing too is, just in terms of the support that's available, at least for academic researchers, the NIH still remains the largest federal source of support, but it's pretty conservative by nature. A lot of times the review panels will just focus on ways that a project could fail. And so sometimes it's difficult to do high-risk work with the support of NIH. The other thing is that NIH in general has been much more inclined, I think, to fund applications of new technology rather than the development of new technology. So that also makes it challenging if you're trying to really build something from scratch and get an idea that you have in mind materialized.

So I think the Innovation Award at SLAS 2026 was really meaningful to me because, even across the whole meeting, I think they really emphasized the importance of innovation. I really appreciated the value they assigned to it and the role that it plays in terms of doing breakthrough research and finding solutions to these really challenging problems. And it's very gratifying to receive some recognition for the risks we've taken over the years. And I guess I'm hopeful also that awards like this can really give people the courage to take those risks. So hopefully it helps other innovators too.

 

Emily Yamasaki (42:26)

Thank you so much for joining me today and congratulations again on winning the award. We can't wait to see what comes out of your research next.

 

Steven Finkbeiner (42:33)

Thank you so much, Emily. It was a pleasure.

 

Emily Yamasaki (42:35)

Thank you.

 
