Trends from the Trenches

Episode: 43 - The Hard Truth about Digital R&D: Patterns, Pitfalls, and the Next Wave of Innovation

Bio-IT World Season 1 Episode 43

At last month’s Bio-IT World Conference & Expo, Eleanor Howe, founder and CEO of Diamond Age Data Science, presented the Trends from the Trenches session, based on input gathered from colleagues, clients, and advisors across the life sciences field. Coding agents were the central topic: how they’re speeding up software development, open source maintenance, and analysis workflows while also shifting new types of responsibility to the human reviewer. Now, “just coding” is no longer enough, and Howe offers advice for early-career scientists on how to get hired in a different environment. Plus, she covers why simulating data can backfire, why smaller purpose-built models often win, and why better data curation, active learning, ELN automation, and federation may matter more than another flashy model demo.

Links from this episode:  
Bio-IT World Conference & Expo
Bio-IT World
BioTeam
Diamond Age Data Science 

 

Bio-IT World’s Trends from the Trenches podcast delivers your insider’s look at the science, technology, and executive trends driving the life sciences through conversations with industry leaders. 

Conference Session Setup And Context

Announcement

Welcome to Bio-IT World's Trends from the Trenches podcast, your insider's look at the science, technology, and executive trends driving the life sciences. In this special episode, we bring you the Trends from the Trenches session, hosted by Eleanor Howe, held last month at the Bio-IT World Conference & Expo. Let's listen in.

Eleanor Howe

Hi, everybody. My name's Eleanor Howe. I am the founder of Diamond Age Data Science. I'm a computational biologist myself. And I'm going to tell you about what I'm seeing in the field these days. I tried to focus on the last 12 months or so. One of the differences in how I'm going to give my talk is I don't talk as fast as Chris does. No one has seen my slides until now, actually. All right. And I did a bunch of interviews with folks. I relied on information from my team and my clients, but then I also reached out to my friends and colleagues and asked them, like, what are you seeing as far as trends in the industry? So this is informed by trusted advisors of mine as well. I am a consultant. I want you guys to think I'm smart. Do you guys read Dungeon Crawler Carl? It's amazing. Yes? Okay, all right. So it's awesome. It's a science fiction book, and one of the characters is very straightforward: don't trust people unless you know where they're coming from. So I'm coming from a scientific background. I'm coming from a space of providing services into our industry. I don't have a product to sell. And I'm going to tell you what people are seeing and what we are seeing. I hate this word. I hate it so much. AI is an overused marketing term that is no longer informative. And so what I'm going to try to do today is to not use it very much. And instead, I'm going to talk about the specific sub-technologies that are actually useful and talk about what they can do and what they can't. And I recommend that anytime somebody tries to sell you on something in AI, you ask them, well, what kind of AI are you talking about? Tell me more than that, because you could be talking about a linear model. This is the big news.

Coding Agents Change Daily Work

Eleanor Howe

Coding agents. Everyone I talk to, everyone on my team, every one of my clients: this is the thing that's changing the way that we work. Bioinformaticians are able to write more code in less time, and they're able to build bigger things in the same amount of time. Biologists are able to make their own plots. They're able to do some analysis, for better or for worse. And it's changing everything about our day-to-day work. We spend our time interacting with the coding agent rather than directly manipulating code, which is a pretty intense activity, actually. Coding can be relaxing, but working with an agent is not. There used to be a pipeline for talented, smart, interested people who could write code but were not biologists to come into our field because they could code and we needed their help. And they were welcome. That path is closed. Being somebody who just codes now is actually not helpful in our field. You need to know some science. There's a huge benefit to open source software development now. This is just one example of somebody who has ported an entire gigantic, well-loved, detail-oriented, incredibly essential library into another programming language in a week for $500. It would have taken him, according to him, months to do this without a coding agent to help him, but he could do it now. So he could carve out a week to do this work. He could not carve out three months to do it.

Eleanor Howe

This is going to be a huge benefit to open source software package maintenance and upgrades and everything. And that's really good because there's not a lot of money going around supporting open source software. There never was. And now maybe it's going to be worse. Chatbots, LLMs, are also good for things that are not coding agents. Literature summarization is way, way more fun to do. Now it takes less time. You can ask Claude to poke holes in your hypotheses. That's also kind of fun. I'm hearing that from people. They're setting up specific RAGs with Claude or other LLMs to help them understand their biology and then to act as, I don't know, an interested opponent in discussing what they do, what they're trying to do, and what their hypothesis is. Results summarization, probably everybody is doing this. You pile a whole crap ton of data into the prompt and you just say, what is the theme here? Of course, the problem with chatbots is well known. They will be confidently wrong and tell you something. And you have to be on guard for that at all times. That's another one of the reasons that working with these systems is quite tiring.

Jobs Anxiety And Junior Advice

Eleanor Howe

Some folks are pretty worried about their jobs. That's what I hear. I get it. Our jobs are changing. People are worried that there won't be as many jobs for bioinformatics folks. It's true, it's changing. I'm not convinced that we're going to disappear because there's still a lot of work to do, and these agents generate more data than they used to. There are more results to look at, there's more stuff to do. And I'm getting mixed input from the people that I'm talking to about how this change in our work is affecting hiring or not. I don't think bioinformaticians are gonna go away, right? My team uses these agents every day, and they spend a lot of time correcting what's wrong and guiding the agent. They're essential. I pay them. I know that I need them. I would not pay them otherwise. I'm a business person, right? Really, they're essential. I really feel bad for the junior folks who are trying to get jobs in our field because it is a tough, tough, tough job market out there for them. For those folks, I'm sorry, it's rough out there. Your resume is not the issue. Don't waste all of your time tuning it. Try to get some real-world experience because that's the thing that really matters. And boy, look twice at any paid degree program because the degree programs that I'm most familiar with don't provide the kind of training that is actually really useful now. They teach you how to write code, how to string pipelines together, and that kind of thing is not as differentiating as it used to be. Go for an education that teaches you how to manage yourself, manage complex projects, persevere in the face of things getting really messed up and not working for long periods of time, and handle complex interacting priorities. So I'm thinking PhD programs are actually not bad at this. The right thesis-based master's program potentially could do this as well.

Eleanor Howe

Oh, there's something wrong with my slides. Yeah, okay, there's supposed to be a quote on one of these slides that says, I don't think I'll ever hire another junior bioinformatician. Sorry, it didn't come through. And that's from a senior bioinformatics leader at a major pharma company, when he was explaining to me how his perspective on jobs in our space has changed. And then on the next slide, this one that is completely empty, it is supposed to say, I don't remember the quote exactly, but something to the effect of some companies are only hiring junior people because they are more AI forward. And that came from a person who works with venture startup companies that are generating brand new small biotechs. So opinions vary. I don't think we really know what's going to shake out from this. But you know, there's hope. There are jobs out there for junior people. Okay,

Enable Tools Or Lose Your IP

Eleanor Howe

so the upshot of coding agents from my perspective is you must enable them in your organization. If you're in an organization that's saying we don't do chatbots, do not use ChatGPT, don't use anything, your people are copying and pasting into the free version of these tools, and you're losing stuff. You're losing your IP. It's really important to go and license something that's going to protect your company and insist that people use that. I also recommend not marrying one of the products right now. They're changing very rapidly. Each of them is good at something different. And next year it'll be totally different which one is the best for your current use case. If you are building platforms, I would say try to make them LLM agnostic. Build a flexible interface that could be switched between many of them or use all of them at the same time, possibly. Pricing is in the Uber subsidization mode. Remember when Uber was cheap? The investors were subsidizing it because it was in a fight with Lyft and they were trying to get market share. And then once Uber stopped being able to get investors to pour money in and needed to make a profit, the prices rose and rose. And that's where we're going here. Every week I feel like I get another notice from Google saying your pricing on your Gemini account is going up. I don't know if you guys have all seen 2001, but spoiler alert, HAL 9000 is the AI that runs the spaceship and goes crazy at the end of the movie and basically kills everybody. And it is very confident about its decision to do so because it's preserving the mission. This is the problem that we all know about LLMs. They are confidently wrong and they need a lot of guidance in order to do the right thing. Also, I really wanted to put this picture in my slide deck. All right. So
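
The advice above about keeping a platform LLM agnostic can be sketched in a few lines. This is a minimal illustration, not anything shown in the talk: the `LLMRouter` class and the two stand-in backends are hypothetical, and real vendor SDK calls would need to be wrapped in functions of the same shape.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# A backend is any callable that takes a prompt and returns text.
# Real vendor SDK calls would be wrapped in functions with this shape.
Backend = Callable[[str], str]


@dataclass
class LLMRouter:
    """Hypothetical provider-agnostic front end for a platform."""
    backends: Dict[str, Backend]
    default: str

    def ask(self, prompt: str, provider: Optional[str] = None) -> str:
        # Swap providers by name without touching the calling code.
        name = provider or self.default
        return self.backends[name](prompt)

    def ask_all(self, prompt: str) -> Dict[str, str]:
        # Query every configured provider, e.g. to compare answers.
        return {name: fn(prompt) for name, fn in self.backends.items()}


# Stand-in backends so the sketch runs offline; these are not real APIs.
def fake_vendor_a(prompt: str) -> str:
    return f"[vendor-a] {prompt.upper()}"


def fake_vendor_b(prompt: str) -> str:
    return f"[vendor-b] {prompt[::-1]}"


router = LLMRouter(
    backends={"a": fake_vendor_a, "b": fake_vendor_b},
    default="a",
)
print(router.ask("summarize this assay"))
print(router.ask_all("hi"))
```

Because callers only see `ask` and `ask_all`, changing which product is the default next year would be a one-line configuration change rather than a rewrite.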

AlphaFold And Structural Biology Wins

Eleanor Howe

non-LLM-based systems. So I'm moving more into foundation models here. AlphaFold is one of the technologies that has properly changed one of the fields in drug discovery, and that is the structural biology field. People who are working with antibody-based drugs, they all use AlphaFold. Everybody is using it, it's all over the place, and it's causing real reductions in the amount of time it takes to develop a new drug. I know of a company that designed a new antibody, and it wasn't a standard antibody, it was one of these fancy ones with modifications. They designed it in silico and then grew it up in cells and sent it up to be crystallized, got the crystallized structure back and compared it to the designed structure, and it matched. It matched within an angstrom, and they were able to go right into trials with it. And that is game-changing for an early stage biologics company. So that cuts, I don't know, a year, maybe two years off of development time. It is huge. And AlphaFold and friends, there are others, are all over the place. Any company that's doing any kind of structural biology is using these in great depth. Boltz-2 is a tool that is used for chemical affinity binding. It is also heavily used and it's very good at its job. So there are a lot of good structural biology models out there that are really helping move drug discovery forward.

Eleanor Howe

On the other hand, AlphaGenome is a model that predicts transcriptional profiling or transcriptional outputs after mutations in DNA, right? So this is another model, one that doesn't do as well as AlphaFold. It cannot predict meaningfully outside of its own training data, and it doesn't accurately predict the outcome of mutating housekeeping genes. Housekeeping genes are the ones that definitionally are required to keep the lights on in the cell, and knocking out one of these housekeeping genes kills the cell. And so you can see why this system wouldn't be able to predict the outcome of that, because you can't train on dead cells. So there's a hole in the training data, and so you can't predict anything there. Why the difference? AlphaFold succeeded because they had a really tractable biological problem. It's a clear outcome that they're looking for, they want structure, and then they had a beautiful data set to use, the Protein Data Bank, which was 30, 40 years, I forget how long, of thousands of scientists building structures, measuring these protein structures really very precisely, and then depositing them into the database in a structure that was enforced. You weren't allowed to put the data into this database in any other way except in the appropriate format. And so when Google came along and decided to build the AlphaFold system, they had a pretty darn good data resource available. They still had to do a lot of work to clean it up. There was a lot there, but they had something really good.

Eleanor Howe

And then they also built a biologically aware algorithm. It includes a bunch of multiple alignment data, so phylogenetic data to supplement its findings, and it does a great job. Whereas to me, transcriptional profiling is a much bigger problem. There are 20 amino acids. If you want to predict a protein structure, you have 20 types of things to work with. And yeah, they can move together in a lot of different ways, but fundamentally that problem is much closer to physics than transcriptional profiling is. In transcriptional profiling, you have 25,000 genes-ish, and however many transcripts per gene, so you've got a lot of degrees of freedom. And each of those transcripts is affected by, you know, the sex of the organism it came from, the tissue type, the age, whether or not the air conditioning was on in the lab that day, everything. There are just tons and tons of inputs that don't really apply to that structural data. So those things don't really help very much yet. So

Why Big Biology Models Struggle

Eleanor Howe

we could solve that by simulating data, right? Why don't we simulate the data and then train on that? This is a terrible idea. This is the worst idea. If you simulate data, okay, let's go generate some data. Is it perfect? No, it's not gonna be perfect. There's gonna be problems with it. And then if you train on that imperfect data, you learn all the imperfections. And then you generate more data with more of those imperfections and then train on that. Like you're gonna just keep building and building and creating bigger and bigger problems, like this guy who's chewing on his tail here.
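
The feedback loop described above can be illustrated with a toy simulation. This is a sketch under stated assumptions rather than anything presented in the talk: the "model" is just a Gaussian fitted to data, and each generation is trained only on samples drawn from the previous generation's fit. The function name and parameters are hypothetical.

```python
import random
import statistics
from typing import List


def recursive_training(generations: int, n_samples: int, seed: int = 0) -> List[float]:
    """Refit a Gaussian to its own samples and track the fitted spread."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0          # the "real" distribution at generation 0
    spreads = [sigma]
    for _ in range(generations):
        # Simulate data from the current model...
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # ...then train the next model on that simulated data only.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        spreads.append(sigma)
    return spreads


spreads = recursive_training(generations=200, n_samples=20)
print(f"fitted spread: start={spreads[0]:.3f}, end={spreads[-1]:.3g}")
```

With small samples, each refit tends to underestimate the spread a little, and the errors compound: after many generations the fitted distribution has typically lost most of the diversity of the original.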

Eleanor Howe

There's a metaphor for this that I love: inbreeding. The Habsburg family. Are you guys familiar with these folks? They ruled Europe for like three, four hundred years, and they kept power by marrying each other, right? So there were lots of cousin marriages and uncles marrying nieces and all kinds of stuff. And they became known for epilepsy and insanity and early death and the Habsburg jaw, right? Look at this picture here. This guy has got some problems. There's a zoom in here. So this is not a problem with perspective. This painter knew what he was doing. These are royal court painters that made these paintings, and they were incentivized to make their subjects look good, right? And this is the best they could do. So don't simulate and try to train on that. I mean, in small use cases that can work, but they're very limited. Okay, so the things that are working are these smaller purpose-built models, right? That's the stuff that can work. People have a specific scientific goal, build a small disease-specific or tissue-specific or whatever model, and then generate data specifically to train that model. And then that becomes a useful predictive tool. The kitchen sink method of collecting all of the data and putting it into a model and trying to build a foundation model out of that is not going to work in biology the way it has worked for human language. And there are a couple of reasons for that. One of the reasons is I think biology is more complicated than English.

Eleanor Howe

And also, humans talk all the time and everything they say goes on the internet, right? And so the training data is like everything any person has ever written on the internet ever, plus all the books ever published, right? So anything humans care to talk about has been put into those big foundation models. We have not put all of biology onto the internet. There is way less of it. There's a very limited set of data. People talk about, oh, but GEO, the Gene Expression Omnibus, it's so big, there are so many things in there, but it's nothing compared to the complexity of an actual cell, let alone a body full of cells or a population of people full of bodies. So there are efforts to remedy some of these situations, to try to figure out what is needed in order to build some meaningful foundation models, at what scale, and how we can generate the right data to do some drug discovery. So, this project from Illumina has a consortium with AstraZeneca and Lilly, and at least one more pharma company, I can't remember, and they're collecting single-cell RNA-seq data to try to build some good models that will accelerate drug discovery. I think this is the right approach. They are planning ahead of time to build these models, and so they're generating the data in a consistent way, in a consistent format, with a specific set of lines and conditions that they want to go over, and they're planning to try building more models to predict, in this case, response to perturbogens. Perturbogens being drugs, something that would perturb the cell. It's basically poking the cell in some way. So a gene knockout, a gene overexpression, or a drug.
This, I think, is one good way to go, but a billion cells is not enough, in part because, and I don't know how many cells they're going to use per sample here, a typical single-cell experiment would have 5,000 cells per sample, which would mean that there are actually only 200,000 samples in this data set, and that's not very much. I don't know what they're actually doing as far as replicates go. Okay.
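
The back-of-the-envelope arithmetic above, written out. The cell and per-sample counts are the figures quoted in the talk; the replicate count is a purely hypothetical assumption added here to show how quickly the number of distinct conditions shrinks.

```python
total_cells = 1_000_000_000      # the billion-cell figure quoted in the talk
cells_per_sample = 5_000         # "typical experiment" figure from the talk

samples = total_cells // cells_per_sample
print(samples)  # 200000

# Hypothetical assumption: three replicates per condition.
replicates = 3
distinct_conditions = samples // replicates
print(distinct_conditions)  # 66666
```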

ELNs With Chatbots Plus Active Learning

Eleanor Howe

Another trend that I'm seeing, and this is a trend that goes way back: scientists hate ELNs. They have always hated ELNs because curating data is a pain in the neck, right? And many companies can't pay their scientists enough to actually put the data in the ELN. But now we have these chatbots, these LLMs, that can assist with that, and that is really effective, right? Say you've got a problem with getting data into the ELN, you've got high-throughput screening data, you've got anything else large. One of the things we've had success with is building a chatbot and then some back-end code that talks to the scientists and sort of says, hey, we need to know X, we need to know Y, and then can take a kind of arbitrarily structured Excel spreadsheet and turn it into the right format so that it can actually be uploaded into the right database. And then that data becomes usable for training foundation models. Imagine that. So LLMs are the solution to their own problem.
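
The back-end half of that workflow might look something like the sketch below: map whatever headers the scientist used onto a target schema, then turn any gaps into questions for the chatbot to ask. Everything here is an illustrative assumption (the `SCHEMA` fields, the header variants, and the function names); the talk does not describe the actual implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical target schema: canonical field -> header variants seen in the wild.
SCHEMA: Dict[str, List[str]] = {
    "compound_id": ["compound", "cmpd id", "compound id"],
    "concentration_um": ["conc (um)", "concentration", "dose_um"],
    "readout": ["signal", "response", "readout"],
}


def normalize(header: str) -> str:
    # Lowercase, treat underscores as spaces, collapse repeated whitespace.
    return " ".join(header.strip().lower().replace("_", " ").split())


def map_headers(headers: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (canonical -> original header, canonical fields still missing)."""
    lookup: Dict[str, str] = {}
    for canonical, variants in SCHEMA.items():
        for variant in variants + [canonical]:
            lookup[normalize(variant)] = canonical
    mapping: Dict[str, str] = {}
    for header in headers:
        canonical = lookup.get(normalize(header))
        if canonical and canonical not in mapping:
            mapping[canonical] = header
    missing = [field for field in SCHEMA if field not in mapping]
    return mapping, missing


def follow_up_questions(missing: List[str]) -> List[str]:
    # These are the prompts a chatbot front end would put to the scientist.
    return [f"Which column holds '{field}'?" for field in missing]


mapping, missing = map_headers(["Cmpd ID", "Signal", "Plate"])
print(mapping)
print(follow_up_questions(missing))
```

In a real system the LLM would likely handle the fuzzy header matching and the conversation, with deterministic code like this enforcing the schema before anything is uploaded.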

Eleanor Howe

Active learning is one of the things that's actually functional. This is a whole area of expertise: assessing a data set, figuring out what data would most meaningfully improve its predictive power, and then surfacing those experiments. This is how you direct your research program to do fewer measurements. It reduces the number of experiments that you have to do in order to get the right data in-house to build a useful predictive model. This is something that I first heard about in the small molecules space. I was actually here at Bio-IT World on the show floor, and one company, I forget the name, built something like this for small molecules. It was very impressive. I think they won best of show, actually.
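
One common way to pick the next experiments is query by committee: train several models, then measure the candidates they disagree about most. The sketch below assumes that technique; the talk does not say which selection method any particular product uses, and the compound names and predictions are made up.

```python
import statistics
from typing import Dict, List


def rank_by_disagreement(committee_preds: Dict[str, List[float]], k: int) -> List[str]:
    """Pick the k candidates whose predictions vary most across models."""
    scores = {
        candidate: statistics.pvariance(preds)
        for candidate, preds in committee_preds.items()
    }
    # Highest variance first; ties broken by name for reproducibility.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:k]


# Hypothetical predicted potencies from three models for four untested compounds.
preds = {
    "cmpd_A": [6.1, 6.0, 6.2],   # models agree: little to learn
    "cmpd_B": [4.0, 7.5, 5.5],   # models disagree: informative to measure
    "cmpd_C": [5.0, 5.1, 5.0],
    "cmpd_D": [3.0, 6.0, 4.0],
}
next_experiments = rank_by_disagreement(preds, k=2)
print(next_experiments)  # ['cmpd_B', 'cmpd_D']
```

Measuring only the high-disagreement compounds is what lets a program reach a useful model with fewer experiments.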

Announcement

Are you enjoying the conversation? We'd love to hear from you. Please subscribe to the podcast and give us a rating. It helps other people find and join the conversation. If you've got speaker or topic ideas, we'd love to hear those too. You can send them in a podcast review.

Eleanor Howe

Okay. Federation is starting to work.

Federated Weights Negative Results Public Data

Eleanor Howe

One of the problems with federation is people don't want to share their data. But if instead you build a model, the model is a bunch of weights, and you can't get the data back. Once you make the weights, you can hand those weights around, and the people who have them cannot extrapolate back to the original data. And so folks are realizing that that is a safe way to federate. So people can combine data sets from different companies together and have a more powerful system this way. Lilly's TuneLab is one example of one of these places, and this is for ADMET compound assessment. That's what TuneLab is for, but there are other examples. Lifebit is doing something like this too. There's a huge bias in publication. I think everybody has heard about this. There is this missing data: people don't publish negative results, right? If the results from your lab experiment don't work, that just gets shoved aside and never published. And then the next scientist goes and does that experiment and it doesn't work again and they don't publish it. And so nobody knows. And then the third scientist goes and runs that experiment because it's a good idea and it doesn't work and still nobody knows about it. This negative results problem has persisted for a very long time.
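
The idea of handing weights around instead of data can be sketched with a one-shot weighted average of locally fitted parameters, a simplified form of federated averaging. This is an illustration under assumptions, not a description of any named platform: the two sites, their data, and the function names are hypothetical, and the sketch makes no claim about privacy guarantees.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept, fitted locally at one site."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def federated_average(updates: List[Tuple[Tuple[float, float], int]]) -> Tuple[float, float]:
    """Combine (weights, n_samples) pairs, weighting each site by its data size."""
    total = sum(n for _, n in updates)
    slope = sum(w[0] * n for w, n in updates) / total
    intercept = sum(w[1] * n for w, n in updates) / total
    return slope, intercept


# Two hypothetical companies; raw measurements never leave their own site.
site_a = ([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])              # y = 2x + 1
site_b = ([0.0, 1.0, 2.0, 3.0], [1.0, 4.0, 7.0, 10.0])   # y = 3x + 1

# Only the fitted weights and the sample counts are shared with the coordinator.
updates = [
    (fit_line(*site_a), len(site_a[0])),
    (fit_line(*site_b), len(site_b[0])),
]
global_slope, global_intercept = federated_average(updates)
print(global_slope, global_intercept)
```

Production systems usually iterate this exchange over many rounds of training; the single round here is only meant to show what crosses the company boundary.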

Eleanor Howe

And I have no idea how to solve this problem. But if we published all of our negative results, it would help save all of us a bunch of time in repeating experiments that don't work, and would also be quite informative for all of these foundation models and other kinds of models that people are trying to build. That negative data is actually quite informative. The data sets that are funded by NIH are really helpful for all of this AI stuff we all want to do. If the United States stops generating data, well, we're one of the biggest generators of useful biological data out there. Small companies utterly rely on the public data sets, like, for example, TCGA and GTEx. We use those every day. Every single cancer company, every single cancer biotech uses the TCGA data set. And if NIH doesn't get a budget, if NIH can't be funded, there will be no future TCGAs.

Eleanor Howe

And that's going to be a huge loss because that means that biotechnology companies won't be able to develop drugs in any kind of cost-efficient way because they're not able to do the big licensing deals with the large players that have data sets, right? The public open source data sets are real enablers in this space. Every one of our customers uses these things. So that's a problem. I was on a panel yesterday. There's a long panel talk in the conference app that you can watch if you want more details. And then the Global Alliance for Open Science ran a survey and is doing some work in this space. So I'll put the link in here if you guys are interested. I was not smart enough to do a QR code. Sorry about that. The old

Old School Models Still Deliver

Eleanor Howe

models still work. Okay, not everything has to be a transformer model or deep learning or a deep neural network or whatever. The random forest is a great method. It actually works really well. And you know, we've done some testing. We found that a lot of the transformer models out there aren't any better than the cheap, old, well-established machine learning things that we've had for a long time. So yeah, we try the older stuff first. That's my takeaway from that. We have saved our clients a lot of money by saying, why don't we try a linear model for this and see if that does what you need it to. And then you're done in, well, it would be a day now. It was a week at the time. All
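
The "try a linear model first" habit can be made concrete with a baseline check like the one below. It is a minimal sketch: the data points, the `good_enough` rule, and its 0.8 threshold are hypothetical, and a real comparison would score every candidate model on held-out data.

```python
from typing import Sequence, Tuple


def linear_baseline(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = slope*x + intercept by least squares; return slope, intercept, R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def good_enough(r2: float, threshold: float = 0.8) -> bool:
    # Hypothetical acceptance rule; the right bar depends on the use case.
    return r2 >= threshold


doses = [0.0, 1.0, 2.0, 3.0, 4.0]
responses = [1.1, 2.9, 5.2, 6.8, 9.0]
slope, intercept, r2 = linear_baseline(doses, responses)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.3f}")
print("stop here" if good_enough(r2) else "try a bigger model")
```

If the cheap baseline already clears the bar the project needs, the more expensive model has to justify itself against that number.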

Drug Discovery Reality And Bad Actors

Eleanor Howe

right, drug discovery. There's a lot of hype about how AI is impacting the drug discovery business. I think that it's a misconception that there is an all-AI drug out there, like there is a HAL 9000 that's discovering drugs for us and just going right through the clinic with them. That's not happening. What's actually happening is every company out there is integrating all of these different technologies that get lumped together under the label AI. They're integrating them in various places throughout the drug discovery funnel. Every step of it is being impacted by these technologies. I talked already about the structural biology impact. That is a real thing that is happening all the time. People are building foundation models to do things like predict transcription, with limited success. People are using coding agents all the time, everybody's using that all the time, and then all the scientists are also using LLMs to help them. And they're writing code too, and a lot of the time that works well. But the clinic has not yet proven out whether or not AI as a unit can actually do drug discovery. This is another one where there should be a quote.

Eleanor Howe

So I apologize, folks. I should have checked this. There is a quote that says, some of these companies give me Theranos vibes. You see why I wanted it on there. I have a list of kind of anecdotal stories about companies behaving badly in this space. And I want to say first that not all of them are doing this, obviously. But I am absolutely gonna bag on the bad players, and they're out there. You know, I've heard of and seen companies scrub the attribution section of a paper and just take out the other people's names and present it as if it was their own work. And here's something that I think is really worth watching out for and being careful about. Some companies want to collaborate with you and they will ask for all of your data. And they assure you that they're not going to do anything untoward with it, but I'm not really convinced that everybody is entirely trustworthy. Certainly some people are. So I would recommend being super careful who you share your data with. Maybe consider building one of these models and sharing the weights with the other organization instead. Okay, so a couple of things that I think are coming up, but are not yet actually having huge impact on day-to-day work. So

Autonomous Agents And Virtual Cells

Eleanor Howe

I'm seeing a lot of stuff about autonomous scientific agents. I saw a talk about it today, yesterday, I guess. And certainly there's a lot of talk about the AI scientist. I have not yet seen any AI scientists in action. And in fact, all of my pictures are gone. I'm really sad about that. Hmm, that's disappointing. I have not yet seen any AI scientists in action. I've seen the insides of a lot of companies. So maybe they're coming. They are not here yet. Protein affinity modeling is, you know, does this antibody bind to this protein or not? That is something that's not really done by the new AI tools, right? AlphaFold doesn't really do that yet. The old school stuff like Schrödinger is still what people are using for that. Virtual cells are fascinating. There is an entire talk on the virtual cell topic that I'm not going to give right now. But this link here, this Substack link, is by a guy named Gerald Chendesh, who is at a company called turbine.ai. This blog post is well worth checking out. It's got an incredible summary of the current state of virtual cells as a concept, and then a roadmap for what would have to happen for us to actually have a meaningfully useful virtual cell in the future. Spoiler alert, we don't have it right now. You start with that model that predicts transcriptional profiling changes from perturbations, right? And so we have some limited functionality there now, but, you know, companies and other organizations are working on building out from that and haven't moved much beyond it. I interviewed a bunch of people for this. I've got a list at the end of the talk. And I would always end every interview by saying, what are we missing? Because we talk about AI all the time. And the answer I get is, well, oh yeah, that's a really good question.
So I thought about it myself, and I decided that this is one of the things that people are not paying enough attention to. I did my PhD on bulk RNA-seq data, or other transcriptional profiling anyway, microarrays. I've been working intimately with bulk RNA-seq for a long time.

Eleanor Howe

And so this is my favorite, right? And now it is really cheap. This stuff used to be so expensive. And now we've got some super ultra cheap versions of low-pass, three-prime-only bulk RNA-seq readouts. This is one of those things where, you know, quantity has a quality all its own. These less expensive experiments enable a whole bunch more diagnostic testing of ongoing experiments, and building out the data sets that you would need to build a proper transformer model for transcriptional profiling data, right? It's like 80 bucks a sample. I remember when these were $1,000 a sample. And a three-day turnaround time is pretty nice. The LINCS project from NIH is, I don't know, 10 years old. No, way more than that, actually. It's quite old. This was a project to measure the transcriptome in a systematic way across many cell lines and under the effect of many perturbogens, and to build a model of the state of the cell based on a profile of a thousand transcripts, right? The folks tried to pick the thousand transcripts that were most representative of the cell's state and all that. And they built a tool that's actually pretty good for saying, hey, here's my new compound. I'm gonna drop it on these cells, I'm gonna measure those thousand genes, I'm gonna compare it to what's in the rest of the LINCS database. And that's going to show me what gene knockouts are most similar to my compound's impact.
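
The comparison described above, matching a new compound's expression signature against reference perturbation signatures, can be sketched with a simple similarity ranking. Plain cosine similarity is used here as a stand-in; it is an assumption, not the scoring method of the actual database, and the gene knockouts and values are toy data.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two expression signatures of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest_perturbations(
    query: List[float], reference: Dict[str, List[float]], top: int = 2
) -> List[Tuple[str, float]]:
    """Rank reference perturbations by similarity to the query signature."""
    scored = [(name, cosine(query, sig)) for name, sig in reference.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Toy 4-gene signatures (log fold changes); real profiles cover ~1,000 genes.
reference = {
    "KO_geneX": [2.0, -1.0, 0.0, 1.0],
    "KO_geneY": [-2.0, 1.0, 0.5, -1.0],
    "KO_geneZ": [0.0, 0.0, 3.0, 0.0],
}
compound = [1.8, -0.9, 0.1, 1.1]
hits = nearest_perturbations(compound, reference)
print(hits[0][0])  # KO_geneX
```

A knockout whose signature closely matches the compound's is a candidate for the compound's mechanism of action, which is the use case described in the talk.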

Eleanor Howe

And that's actually a really great way to do some mechanism of action studies. How much better would this project be if we had an actual full transcriptome, right? And that's approachable now. So that is something that I think we're going to see. I hope. I hope somebody, maybe LINCS, will pick this up and actually do the bulk RNA-seq experiment. Even if they don't, pharma companies could afford this, right? And I think this would be a really great way to push forward basic drug discovery work like mechanism of action studies, and also building those transformer models.

Omics Costs Drop And New Assays

Eleanor Howe

In the other omics space, I mean, you know how this works. You know, there's the beginning of the lifespan of a typical omics technology. It's fantastically expensive. People only use it if they absolutely have to, nobody knows how to analyze the data. And then over here on the other end is the situation where it's cheap, it's commoditized, everybody does it, every CRO has it, everybody knows exactly how to analyze it. So the NGS stuff, these are not in any useful order at all. I apologize. Next gen sequencing is the one that's furthest on the commoditization scale here. I think you all know this. Single cell transcriptomics, still dominated by 10x. It is fairly commoditized now, right? You can get that done at any CRO you want. The first, say, half of the analysis pathway is pretty standardized. You don't have to intervene in it, you can automate it completely. And then the later stages of analysis are the ones where you have to have somebody sitting there and saying, okay, what do I actually want to do with this data? Because the analysis that you do from then on changes depending on what kind of question you're trying to ask.

Eleanor Howe

That is not the case with bulk RNA-seq as much because single cell just has more degrees of freedom. There's more you can do with it, it's more complicated. Spatial sequencing technology, mostly spatial RNA-seq, when I think about it, is about where generic single cell was in 2020. So it is more expensive, it is harder to work with. The tools that you might use to analyze the data are less well-defined, less complete, and choosing what to do with it is totally Wild West. It's all over the place, things are not very standardized. It is fascinating data, and also nobody uses it unless they have to, because it costs a lot. But it's great stuff. And that is something that, when the cost comes down, is going to also be transformative for all of us. Proteomics is one where Olink is still the dominant player. Olink is a hybridization-based protein detection tool. Direct mass spec is still too expensive to be really done en masse, although people are doing it. But I don't see any kind of inflection point in changing how these technologies are applied or deployed. They're pretty steady right now, as far as I can tell. One of the things that is interesting, though, is that people are just being real creative about combining various kinds of omics together and building specific assays that are for whatever particular problem they have at the time. I just have a couple examples.

Eleanor Howe

Immunopeptidomics. I love when people stick together like six different words to make a new omics, right? This is great. This is one where you want to know what's being presented by a white cell. Like what is the white cell cutting up inside itself and holding out for the rest of the immune system to look at? So you kind of salt wash the peptides off of the MHC complexes and then run them through a mass spec, and then you can see what's in there. And some companies need this, like absolutely need this, because they're trying to target one of these proteins. They want to know if it's being presented. So this is one of the things that I think is extremely cool about where the omics world is going, is that there's a billion mini omics that are being created, and they're all different and they are all analyzed completely differently. It's fascinating. There's this nutty combination omics where you do long read sequencing. There's a particular library prep where you mark the methylation sites, and then you do long read, and on the same reads, you can detect both the sequence and the methylation state. And then in post-processing, you can assess the phasing of the variants and detect, you know, which parent did each of these variants come from? It's a lot of data for one library prep. I think that that's kind of awesome. And then people also keep inventing new stuff. One customer came to us and said, Can you detect copy number from single-cell RNA-seq data? And we said, maybe. And yeah, it turns out you can. There's a couple of R packages. They're in my speaker notes, which are gonna be available on the site after.

Eleanor Howe

Obviously, you can't see them. Numbat is one of them, CopyKAT is the other. They're R packages, both of them. So you can do that. Anyway, I think that this is where our field is going. It's not just NGS DNA, it's not just RNA-seq, it's all of these ultra-specific technologies where, you know, the work is in figuring out, well, what am I asking of this biological system? What do I want from these cells? And then going and looking at the huge list of possible assays that you can run and picking the right one, and then figuring out how to manage that data when it comes out. All right, what's next? Ha.

Leadership Whiplash And Human Accountability

Eleanor Howe

All right, I'm back to the AI thing. Yeah, you can't escape it. Again, in my interviews with people, I get different stories about how people are approaching the AI transformation. And, you know, middle of the road is usually the best way, it turns out. I've gotten stories about people basically feeling like they're on a ping pong table because their leadership keeps changing their mind about how they're approaching AI. At first, it's no, you're not allowed to use any chat bot, anything, we're not touching it, nobody's allowed to do anything. And then a month later, oh, we're AI all the time, we're gonna onboard this vendor company that does AI, you know, we're AI forward, and people are getting whiplash from these changes. And then in other places, folks who are at all critical, like me, of what AI can do are being sidelined for being insufficiently enthusiastic. And that's a mistake as well, because the critics, you know, they sometimes have a point, right? And so I think what I'm saying is, listen to the critics too. There's some real criticisms there. Obviously, you can tell where I stand on that. This is a bacterial culture, by the way.

Eleanor Howe

All right. What I tell my team is, I want you to use these LLM tools. We're licensing two of them so that we're protected. You know, the lawyer reviewed all the documents and we decided we're gonna pick these two. And, you know, by the way, Gemini, in my not-a-lawyer opinion, has the better indemnification terms and is a better legal stance as a licensee. Again, I'm not a lawyer, you shouldn't listen to me, but I prefer Gemini for that reason. But my team wants Claude because according to them, Claude is the best for coding by far. That will probably be different next year. And I told them, like, here are these tools, I want you to use them, learn how. It's really important that we stay ahead of this stuff and we know how to use these tools effectively. And everything that you generate is your responsibility. Even if the AI wrote the code, it's your work. And so if you can't assess whether this is right, you shouldn't be doing it. I love that arXiv has put in place this code of conduct change where if the submission that you give them has evidence of having been insufficiently reviewed after LLM generation, they will just ban you for a year. All the authors, the whole author list, they'll ban all of them. And the way that they check is they look for hallucinated references, they look for notes like, "I wrote this paragraph and I wanted you to review," you know, the kind of meta talk that the LLMs do for you. They're like, "Oh, I did this." Any sign of that stuff, and you're permabanned. Not permabanned, it's banned for a year, and then there's like a probationary period of some kind after that. I think that's really smart. You have to be responsible for what the LLM made on your behalf. And that is another reason that I say that working with these tools is exhausting, because reviewing all of that work takes a lot of thought and energy.
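As a rough illustration of the kind of check described above, here is a minimal sketch that scans a draft for leftover LLM "meta talk" before a human reviews it. The phrase patterns and the sample draft are hypothetical, chosen only for the example. This is not a description of how any repository actually screens submissions, and it does nothing about hallucinated references, which still need a person to verify them.

```python
import re

# Hypothetical phrase patterns that often signal unreviewed LLM output.
# Illustrative only; a real list would need tuning for your own documents.
RESIDUE_PATTERNS = [
    r"\bas an ai language model\b",
    r"\bhere is the revised (paragraph|section|text)\b",
    r"\bi have (rewritten|revised|updated) the\b",
    r"\blet me know if you(?:'d| would) like\b",
]


def find_llm_residue(text):
    """Return (line number, line) pairs that match any residue pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(re.search(p, line, flags=re.IGNORECASE) for p in RESIDUE_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Hypothetical draft with two lines of leftover assistant chatter.
draft = """We measured expression in 12 samples.
Here is the revised paragraph with a more formal tone:
Differential expression was assessed with a linear model.
Let me know if you would like further changes."""

flagged = find_llm_residue(draft)
for number, line in flagged:
    print(f"line {number}: {line}")
```

A scan like this only catches the most obvious residue. It is a cheap first pass, not a substitute for the author reading and taking responsibility for every line.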

Eleanor Howe

Okay, I wanted to talk about a few other talks that I saw at BioIT World. I'll just click through these because the slides are on the website. You can find them later. Oh, yeah. So, Doa Mugaheed, you are in the audience. I know you are, I saw you. There you are. Hello. She ran a panel yesterday about this Global Alliance for Open Sciences work and about the concerns about scientific funding and what do we do about it. That's all on the website. This talk is an excellent overview of how to actually use a coding agent to help you build something. I loved this talk. This was super practical and really great. And then John Q, full disclosure, was one of my co-PIs for my PhD. He gave a great talk about network biology. Again, transcriptomics, you can sense a theme here. And I think he did a great job of showing the complexity of the transcriptome a bit. And I think it shows you a little bit about why the data sets that we have are not sufficient for training a foundational model, and maybe gives you a hint about what we would actually have to do to generate that data. All right. Oh wow, 30 seconds. Okay. Yeah. Okay. All right, fine. I got the go-ahead. So this is just a summary though.

Rapid Fire Summary And Closing

Eleanor Howe

We're basically at the end. Coding agents are changing things. Kitchen sink model training, not so much. Small models, focused models, yes, working much better. But try the simple, stupid models, those small, dumb models, like, you know, linear models. Random forests are not as dumb, but, you know, simpler than a neural network. Try those out. The data that gets curated so carefully in order to feed these AIs, that benefits the cheaper and easier models too. You may find that you get a lot more bang for your buck out of random forests or lasso or something. I do think that these technology advances will continue to enable us to actually build models that are useful for things. And so I'm watching closely how that market is changing. And then don't simulate your data. It's a terrible idea. Make more or share with others. All right, here are some of the folks I talked to. And yeah, thanks to everybody for listening and to everyone who talked to me about this. You guys were super informative. And to my team who does the work with our clients, thanks to them as well, everybody here. And there's a lot more names that I'm not putting up. I can't name everyone, but they all contributed to this talk. So thanks everyone.

Announcement

If you enjoyed this conversation, we'd love for you to leave us a review and subscribe wherever you get your podcasts, or visit us on the web at www.bio-itworld.com.