Infinite Machine Learning: Artificial Intelligence | Startups | Technology

Biochemistry AI

November 16, 2023 Prateek Joshi

Nicolas Tilmans is the cofounder and CEO of Anagenex, an AI-powered drug discovery platform. They have raised $37M in funding so far from investors such as Lux Capital, Khosla Ventures, Air Street, Menlo, and Catalio. He was previously the VP of Engineering at Lumiata. He has a PhD in Biochemistry from Stanford.

In this episode, we cover a range of topics including:
- What are small molecule drugs and why are they challenging to develop
- The founding of Anagenex
- Data generation engine
- DNA Encoded Libraries (DELs)
- Affinity Selected Mass Spectrometry
- Identifying the right targets
- What's next for AI-infused drug discovery

Nicolas's favorite books:
- Tomorrow, and Tomorrow, and Tomorrow (Author: Gabrielle Zevin)
- Letters by Abraham Lincoln (Author: Abraham Lincoln)

--------
Where to find Prateek Joshi:

Newsletter: https://prateekjoshi.substack.com 
Website: https://prateekj.com 
LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19 
Twitter: https://twitter.com/prateekvjoshi 

Prateek Joshi (00:01.257)
Nicolas, thank you so much for joining me today.

Nicolas (00:01.929)
All right. You're welcome. Thank you so much for the opportunity.

Prateek Joshi (00:07.781)
Let's start by defining small molecule drugs. What are they?

Nicolas (00:13.549)
Yeah.

Small molecule drugs are essentially what most people will have taken if they've ever taken a medicine. Unless you have an oncology condition, or you're diabetic, or you have an autoimmune condition, the probability that you've taken anything other than a small molecule or a vaccine is very, very low.

Prateek Joshi (00:44.802)
Amazing.

Nicolas (00:46.614)
So, just for definitional purposes, we talk about large molecules as things that are heavy and large. So thousands of kilodaltons, which is a unit of weight, or hundreds of kilodaltons anyway. And then for the weight of a small molecule, the rule of thumb is that you try to keep it under 500 daltons. There are some that are a little heavier, but those are the exceptions. So you typically have something around 500 daltons. It's very small, you can draw it on a screen, and it looks like a chemical.
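The 500-dalton rule of thumb can be written down as a tiny check. This is only a sketch of the heuristic as stated in the conversation: the function name, the constant, and the example masses (rounded, approximate values for aspirin and for a typical antibody) are assumptions added for illustration.

```python
# Rule-of-thumb cutoff mentioned above: small molecules are usually kept under ~500 Da.
SMALL_MOLECULE_MAX_DA = 500.0  # assumed constant name; the cutoff is a heuristic, not a definition


def classify_by_mass(mass_da: float) -> str:
    """Label a drug by molecular mass alone, using the 500 Da rule of thumb."""
    if mass_da <= 0:
        raise ValueError("mass must be positive")
    if mass_da <= SMALL_MOLECULE_MAX_DA:
        return "small molecule"
    return "large molecule"


# Approximate masses in daltons, for illustration only.
examples = {
    "aspirin": 180.2,              # a familiar pill
    "typical antibody": 150_000.0, # antibodies are on the order of 150 kDa
}
labels = {name: classify_by_mass(mass) for name, mass in examples.items()}
print(labels)
```

As noted in the conversation, a few approved small molecules sit above the cutoff, so a real pipeline would treat this as a soft filter rather than a hard rule.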

Prateek Joshi (01:23.197)
Right. Amazing. You take these things very regularly, and you almost never think about how they're bifurcated. So just for listeners, what are the big molecule drugs used for? What's the most common use case for those?

Nicolas (01:43.402)
Yeah, I mean, so the other thing you can do is use a rule of thumb. Did you take a pill? Small molecule. If it's not a pill, maybe a macromolecule. Not always true, but... And that's why we like small molecules. They're the easiest to deliver. They're the easiest to get to people. They require less storage. They're the best way to help the most people, just full stop, I think, most of the time.

Prateek Joshi (01:51.967)
Alright.

Nicolas (02:11.462)
Insulin would be considered a macromolecular drug. It's not quite... I guess you could call it a biologic. But when people think about macromolecular drugs, probably the biggest thing they think about is antibodies and antibody therapeutics. Humira, I think, is probably the biggest one that people think about, because it was the biggest-selling drug in the world. It might be losing that now, but for a long time it was the biggest-selling drug in the world.

Prateek Joshi (02:40.569)
Amazing. And you mentioned that small molecule drugs are easy to deliver and have a huge impact on a lot of people. But they are challenging to develop. So what are some of the challenges of developing small molecule drugs?

Nicolas (02:59.891)
Oh boy, um...

So let's talk about the difficulties of developing a drug in the first place. Any drug, small molecule or macromolecule, it doesn't matter. You have to interact with a biological system. And that's step one. Are you potent? For our purposes, we work in an area called target-based drug discovery. A target is a protein in the body that people think has an effect on disease. There are some exceptions to that, but let's just keep it at that simple version: a protein in the body that has some effect on disease, and we want to interfere with it with some molecule or in some way.

Number one, do you interact with it? Number two, when you interact with it, do you do the thing that you want to do for the disease? Do you shut it down if you want to shut it down? Do you activate it if you want to activate it? Whatever the case may be, do you do that? So that's potency, but that's not even half the battle. You have to do a lot more. If you think about it, I'm taking a pill. Let's assume I have a molecule that does the thing I want it to do, and it does it well. I take the molecule by mouth. It's going to go into my stomach, it's going to have to survive the stomach, it has to get across either the intestinal lining or the stomach lining, something, and it has to get into the bloodstream. It then has to stick around in the bloodstream long enough to get to the place where it needs to have an effect.

It needs to not kill you in the meantime, which is just a real pro tip. That's a basic requirement of your medicine. And it needs to stick around long enough to do all of that. So all of that is just a lot harder. It can't get trashed by the liver, and it can't just go through your kidneys and into the bladder immediately.

Prateek Joshi (04:34.993)
Kind of like a basic requirement. Yeah. Yeah

Nicolas (04:58.182)
If you want to get to the brain, that's a whole other question. Getting across the blood brain barrier is very hard. So you're talking about a lot of things here that you've got to get right. And yeah.

Prateek Joshi (05:13.569)
That is a funny thing. We live with our bodies, obviously, all our lives. And most people know it's complicated, but this is very complicated. Even for a simple tablet that we take, there are so many things that need to align and go right for it to function. It's miraculous that we have these amazing drugs. Okay, I think it's a good point to stop. For listeners who don't know, can you explain what Anagenex does?

Nicolas (05:45.006)
At Anagenex, we are a machine learning and AI-enabled drug discovery company. Our mission is to find a molecule for every malady, to get molecules for pretty much anything you would want in disease. We believe that the best way to do that is to radically accelerate our ability to generate data in the real world. This is where we start to differ a lot from many machine learning companies.

We care deeply about being able to create real measurements for ourselves, and about those real measurements being translated into insights through AI and machine learning. We are focusing on small molecule drugs because we think it's the best way to help the most people. We can generate billions of data points. Our current collection of compounds is around two billion data points. We can test all of those in a single tube in a matter of days.

Prateek Joshi (06:24.47)
Amazing.

Nicolas (06:40.906)
And that allows us then to generate just an enormous amount of real data from the real world to inform the machine learning. But then the next thing that we do, and what's really the special sauce of Anagenex, is we can take those insights and then generate an entirely new data set. We're capable of building a hundred million compounds in two weeks. And that allows us then to say, hey, all of that stuff, the machine learning works. What's the next set of compounds that are going to be hopefully better than the first set?

Can we project that out? Can we learn that? Can we then go out and test it again, reinforcing the model with yet more data that is now really testing its own predictions against itself, which is the best way to really reinforce a model to work with.
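The loop described here (measure, learn, propose new compounds, measure again) can be sketched with toy stand-ins. Nothing below is Anagenex's actual system: the integer "compounds", the noisy `run_assay` function, and the nearest-neighbor `predict` model are all hypothetical placeholders chosen so the example runs on its own.

```python
import random

HIDDEN_OPTIMUM = 7_000  # unknown to the "model"; only the toy assay sees it


def run_assay(compound: int, rng: random.Random) -> float:
    """Toy stand-in for a wet-lab measurement: a hidden signal plus noise."""
    signal = -abs(compound - HIDDEN_OPTIMUM) / 1_000
    return signal + rng.gauss(0, 0.1)


def predict(compound: int, measured: dict[int, float]) -> float:
    """Toy model: reuse the score of the nearest compound measured so far."""
    nearest = min(measured, key=lambda m: abs(m - compound))
    return measured[nearest]


def design_test_learn(rounds: int = 4, batch: int = 20, seed: int = 0) -> dict[int, float]:
    rng = random.Random(seed)
    library = list(range(10_000))
    # Round 0: measure a random starting batch.
    measured = {c: run_assay(c, rng) for c in rng.sample(library, batch)}
    for _ in range(rounds):
        untested = [c for c in library if c not in measured]
        # Ask the model which untested compounds look best...
        proposals = sorted(untested, key=lambda c: predict(c, measured), reverse=True)[:batch]
        # ...then test them for real, so new data checks the model's own predictions.
        for c in proposals:
            measured[c] = run_assay(c, rng)
    return measured


measured = design_test_learn()
print(len(measured), "compounds measured across all rounds")
```

The point of the design is the last step of each round: the proposals come from the model, but the values stored are always fresh measurements, never the model's own guesses.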

Prateek Joshi (07:24.549)
You have built an iterative system that alternates between lab experiments and AI-powered predictions, and you just explained how that informs what you build and how you build it, and obviously makes sure it has an impact. Now, I want to talk about the data generation piece here. It's very interesting. You just mentioned you assess billions of compounds in parallel and generate high-quality data. Can you talk about how much data is already available, so you're just collecting it, versus how you are enriching this data? Do you need to generate new data? So can you talk about the balance, and also the nature of the data that gets generated?

Nicolas (08:09.55)
Sure, I don't know how to give a simple, straightforward answer, so this could meander a little bit. The data that is available, one of the reasons I think machine learning has failed for drug discovery, not failed, but has had a limited impact, I think, in drug discovery, is because if you look at the data sets that...

Prateek Joshi (08:17.952)
Let's do it.

Nicolas (08:34.446)
companies like Pfizer have at their disposal. So if you are the biggest players in the world, you have maybe a couple million compounds that you can test. And you basically never do. You usually test a smaller subset of it and then see what happens, and then you can maybe test the bigger parts of the bigger subset. You don't really generate data generally beyond a few hundred thousand compounds, maybe a million compounds.

So that's a key bottleneck. The next thing though, is that once you've got that, that's it, you are following up the information that you just generated at a rate of about 10 compounds per chemist per month. So that's just not very fast. You talk about things like LLMs, when we're doing things like reinforcement learning with human feedback. Those are millions of examples that you start to generate after the fact, right? Not to mention whatever the users are inputting.

So you can see how these larger-scale models really benefit not only from an enormous data set, but also from this ability to iteratively update themselves. So what does the data look like? You generate these data sets, and they're usually relatively small on the scale of machine learning, at best. And they'll be pretty binary. The largest versions of these data sets are: does it do it, or does it not? So it's a fairly binary classifier task. There's some regression information, but mostly it's a binary classifier.

That's your first pass, and that's totally fine. That works pretty okay, as long as you have a path to making it better. In our case, the specific shape of our data is that we will have about two billion different compounds that each have a number associated with them. The vast majority of those two billion are zeros. And then you have some that have some number associated with them that is loosely correlated with how well the compound is interacting with the target. We do some pre-processing for that. We do some post-processing after certain parts of the engineering and data prep pipeline. And then you have your data set that says: compound, score. That's what the data looks like.

Nicolas (11:01.318)
You feed that to the model, and then hopefully when you present it with a new compound, it comes up with a score, whether that's a binary score or a regressor score, and then you can say, hey, this is good, this is bad.
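To make the "compound, score" shape concrete, here is a minimal sketch of a mostly-zero score table turned into binary labels. The compound IDs, scores, and threshold are invented for illustration, and the class-weight formula is a common generic remedy for imbalance rather than anything described in the interview.

```python
# Hypothetical (compound_id, score) rows; most scores are zero, as described above.
rows = [
    ("cmpd_001", 0.0),
    ("cmpd_002", 0.0),
    ("cmpd_003", 7.5),
    ("cmpd_004", 0.0),
    ("cmpd_005", 1.2),
    ("cmpd_006", 0.0),
]


def to_binary_labels(rows, threshold=0.0):
    """1 if the score is above the threshold, else 0 (the 'does it or does it not' view)."""
    return {compound_id: int(score > threshold) for compound_id, score in rows}


labels = to_binary_labels(rows)
positives = sum(labels.values())
negatives = len(labels) - positives

# Fraction of compounds with any signal at all; tiny in a real screen.
positive_rate = positives / len(labels)

# One common way to compensate for imbalance: weight each positive by negatives / positives.
positive_class_weight = negatives / positives

print(positive_rate, positive_class_weight)
```

The raw scores stay available for the regression view; the binary labels are just the first-pass framing.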

Prateek Joshi (11:13.561)
Amazing. And I want to take a quick stop here just to get your view on biochemistry. It's a quick sidebar, but for listeners, you know, it's not something you come across on a daily basis, and it's fairly complicated. So when it comes to building a machine learning platform that specifically targets biochemistry, those processes that happen within organisms, what are the things that you need to keep in mind as a developer of this product? And again, this is deliberately kind of broad. It's mostly: if there's a machine learning developer, what should they know about biochemistry so that this becomes successful?

Nicolas (11:59.496)
Oh man.

Prateek Joshi (12:00.914)
Hahaha

Nicolas (12:04.24)
Ugh.

Nicolas (12:08.738)
I... where to begin? I'll start with this. Most people who come from a machine learning background come from a world... this is well known, I think, for any machine learning practitioner, but when you're in school and you take classes around machine learning, all of the data sets are pretty simple. They just work. There is an easy way to train a model that will work, and the data is relatively clean. And there's usually a good balance of positives and negatives or whatever. It's simple. And the reality of real-world machine learning is that that is almost never true. So that's the case for just anybody. I think what starts to get more real-world, and starts to approach the truth of what machine learning is in industry, is that there's not a lot of OpenAIs.

Prateek Joshi (12:43.995)
Right.

Nicolas (13:06.082)
There's not a lot of Mosaics. We believe that we could be something like that for chemistry, but it's gonna take a long time for us to get to that level of data to even make a shot at it. So I think that when you think about what biochemistry means in a machine learning context, everything is a lot noisier than you think. That's number one. Everything is way noisier than you've ever seen before. Two.

Biochemistry is incredibly idiosyncratic and incredibly prone to batch effects and hidden effects. If you're going to do machine learning well in a biochemistry context, you have to really understand the nature of your data, well enough that you can stratify your data properly and understand the batch effects. All of those are super complicated. So I guess the practical advice for anybody who's trying to get into machine learning for biochemistry is: talk to the biochemist, validate your ideas with the biochemist.

Ask, and try to really sniff out the ways that this data could be contaminated. I'll give you a bunch of examples. Here's a simple one. I'm going to be testing everything in a 96-well plate. Frequently in biology we have a plate of compounds that are laid out on a grid. Now that grid sits on your bench, flat. The things on the outer edge are very slightly more concentrated than the things in the middle, because solvent evaporates more from the edges than it does from the center of the plate. Even if it's not a volatile solvent, water is volatile, right? So what you end up getting is, you say, okay, I'm going to predict these things. The compounds on the edge of the plate will appear to be slightly more potent because you thought they were more dilute.

Nicolas (15:27.018)
but they were actually slightly more concentrated, and therefore they seemed to be more powerful than they actually are. So that's a thing you could not really know unless you really think about how the experiment was generated. And there's no going back, by the way. You can't go back and say, hey, I wanna collect with randomization across the plate, I wanna randomize the positions for a few things. You can't do that. The experiment was already run.

So you want to design your experiment upfront with as many of the controls as possible to make sure you're not getting the wrong thing inadvertently.
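One upfront control for the edge effect described above is to randomize which compound goes in which well and to record whether each well sits on the edge. The sketch below assumes a standard 96-well layout (rows A to H, columns 1 to 12); the function names and compound IDs are hypothetical.

```python
import random
import string

ROWS = string.ascii_uppercase[:8]  # "A" through "H"
COLS = range(1, 13)                # 1 through 12


def all_wells():
    """Every well label on a 96-well plate, e.g. 'A1' ... 'H12'."""
    return [f"{row}{col}" for row in ROWS for col in COLS]


def is_edge(well: str) -> bool:
    """True for wells in the outer ring, where evaporation is strongest."""
    row, col = well[0], int(well[1:])
    return row in (ROWS[0], ROWS[-1]) or col in (1, 12)


def randomized_layout(compounds, seed=0):
    """Assign each compound to a random well so position is not tied to identity."""
    wells = all_wells()
    if len(compounds) > len(wells):
        raise ValueError("more compounds than wells")
    rng = random.Random(seed)
    rng.shuffle(wells)
    return dict(zip(compounds, wells))


compounds = [f"cmpd_{i:03d}" for i in range(96)]
layout = randomized_layout(compounds)

# Keep the edge flag alongside each measurement so it can be checked as a covariate later.
records = [(compound, well, is_edge(well)) for compound, well in layout.items()]
print(sum(edge for _, _, edge in records), "of", len(records), "wells are on the edge")
```

Because the layout is fixed before the experiment runs, the position information exists when it is needed, which is the point made above about not being able to go back afterward.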

Prateek Joshi (16:06.893)
That is phenomenal advice. You really need to know the domain. Some of these things are hard to learn from a structured data set in a class. You only encounter them when you actually try to deploy a system in the real world. It's fascinating. Okay, so we talked about biochemistry, the processes. What should people know about biochemists, the people you deal with when you commercialize something like this?

Nicolas (16:42.21)
They are guaranteed to not believe you.

Prateek Joshi (16:46.414)
Hehehe

Nicolas (16:47.054)
Biochemists, scientists, med chemists, anybody in that world. Like, it's these people.

Nicolas (17:05.502)
If you are a scientist in a lab, the way data works is you're going to generate maybe one experimental result a day. It's like, I'm gonna come in in the morning, I'm gonna do some stuff, and by the end of the day I might have some measurements. And then maybe something went wrong. The moon was in declension with Venus rising and it didn't work. And you don't really know why. So that is the life that they live. They live in a world where stuff is sort of random, and it takes an enormous amount of effort to just get through all of that randomness and really know something for sure. So when a machine learning person comes along and says, I've got this map, it's gonna be great. Look at this.

It just works. They don't believe you, they can't. And they're right to not believe you, because you're probably wrong. So like, how do you deal with chemists and biochemists and the like? You have to meet them where they are. You have to go to them and you have to say, hey, you...

Prateek Joshi (18:13.279)
Right.

Nicolas (18:28.851)
You have to meet them where they are and say, look, here's some information. What do you think are the possible flaws, and what can I show you to try and convince you? And they won't know. It's a human thing. They don't understand machine learning. The call I was on just before now was with one of our medicinal chemists, who is really smart, really talented, and she's actually pretty good at understanding machine learning. She's getting there. But she has questions that are completely logical, that make a lot of sense in a world where you're making experimental decisions. Everything that you do in the lab is essentially a decision tree. Like, this is going to happen, then I will do this, then I will do that, then I will do this. But machine learning is not a decision tree. It's like I'm trying to figure out some sort of very, very sophisticated averaging across latent space, and that will work in ways that are slightly different. So trying to bridge that gap, trying to get her to understand, okay, maybe this could work. And the thing is...

It might not. And it could be that latent space is total horse shit and we learned the wrong thing. And so, I'm rambling a little bit here, but you have to meet them where they are. You cannot overpromise, and you have to take their criticism seriously and try to really focus on the data-generating process. And then when you turn around and you say, hey, here's some ideas, you spent the day outputting a list of compounds. If they wanna test that list of compounds, it might be a week of work for them, so it's gotta be worth it.

Prateek Joshi (20:06.333)
Right. That actually provides a great insight. And again, biochemists are smart people, and it's complicated work. So, as you said, it's important to understand where they are and how you can meet them there. I also want to get your thoughts on this. Biochemists need tools to do the work, and then there's the actual work, which is the output. Now, as a service provider, do you provide the tools to do the work, or do you actually do the work? Because in that case, you don't have to convince them that the tools work, you just deliver the work. And again, there are pros and cons to both. But what's your thought on this dynamic of, hey, if they are not convinced that the tool works, I'll just do the work, and then they can just look at the output? Obviously there are other challenges, but what are your thoughts?

Nicolas (21:04.485)
Yeah, I think.

Nicolas (21:08.354)
So first of all, we're not a service provider. We are a drug discovery company, and the product that we generate will be molecules that we will eventually take forward into the clinic. Whether we commercialize them ourselves or partner at the commercialization stage is an open conversation. But we are not a company that's selling to 100 different people to use our platform. We're gonna be selling to our clients, which ultimately are patients that we're hopefully gonna make better, and then, to a limited extent, to potential biotechnology partners who will be looking to have us help solve one of their chemistry problems or one of their drug discovery problems. So just level-setting that up front. But the question you asked is effectively: how bad is it to have a black box?

A black box can be great if it's super reliable. Almost nothing is that reliable. I think that if you're going to have something...

I think there's two dimensions to it, and I actually think there's only one that matters. I think the only thing that really matters in machine learning is how expensive it is to check your prediction. If you think backwards from that, if it's very cheap to test a prediction, you can give a thousand predictions that are bad and one prediction that's right, and that'll be totally fine. If it's very expensive, if it costs a lot of money to test the prediction, you can't have one in every two be wrong. It has to be more like one in every hundred. You've got to be like 99% accurate if it's going to cost you a million dollars to check. If it's expensive, it's got to be right. So I think that will play a little bit into the black box question, which is, hey, how do I know, when I'm just starting to build what is effectively a relationship? You're building a relationship with the machine learning.
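The trade-off between test cost and accuracy can be made concrete with one line of arithmetic: if every prediction costs the same to check and a fraction `precision` of them turn out right, the expected spend per confirmed hit is the cost per test divided by the precision. This is a simplified sketch under those assumptions, with invented dollar figures.

```python
def cost_per_confirmed_hit(cost_per_test: float, precision: float) -> float:
    """Expected spend to find one correct prediction, assuming each test costs the same."""
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    if cost_per_test < 0:
        raise ValueError("cost_per_test must be non-negative")
    return cost_per_test / precision


# Cheap checks: one good prediction among a thousand bad ones is still affordable.
cheap = cost_per_confirmed_hit(cost_per_test=1.0, precision=1 / 1001)

# Expensive checks: at a million dollars per test, every miss is painful.
coin_flip = cost_per_confirmed_hit(cost_per_test=1_000_000, precision=0.5)
near_certain = cost_per_confirmed_hit(cost_per_test=1_000_000, precision=0.99)

print(f"cheap assay: ${cheap:,.0f} per hit")
print(f"expensive, 50% right: ${coin_flip:,.0f} per hit")
print(f"expensive, 99% right: ${near_certain:,.0f} per hit")
```

Under this framing, the same model can be perfectly usable in front of a cheap assay and unusable in front of an expensive one.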

Nicolas (23:12.898)
How do I build a relationship? We'll have to build some trust. We have to communicate. And the building trust and communicating part is, can I get an explanation from you? Why did you think this was true? And if I don't have that, it's just super hard to build trust. If you were to interact with me right now and you said, hey, what are small molecule drugs and why is it hard, and I said, oh, it's just hard, that's not helping anybody. You'll believe me maybe, but it's not going anywhere, right? And it's the same thing with the model.

So I think it's actually a real problem that we can't explain our modeling. So how do you get around that with models that are not possible to explain? Neural nets of all sorts are notoriously difficult to explain. There are ways to do, call it retroactive continuity, like retcons from comics: some explainability after the fact, using things like Shapley explainability, which I think is the most popular framework.

You could do that, and I think that helps some. I do think that the way you can build it, particularly with scientists, is to show them where it's working and where it's not. To say, hey, I tested it on the things that are at the very top of my list, and it does very well with the things that are at the very top of my list. And when I look at the bottom of the list, it doesn't do very well.
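Showing where a model works and where it does not can be as simple as reporting how often its predictions were confirmed at the top of the ranked list versus the bottom. The sketch below uses made-up (model score, confirmed in the lab) pairs; the function name and the data are assumptions for illustration only.

```python
def confirmation_rate_by_rank(scored, k):
    """Return (top_rate, bottom_rate): the fraction of confirmed predictions
    among the k highest-scored and the k lowest-scored compounds."""
    if not 0 < k <= len(scored):
        raise ValueError("k must be between 1 and the number of rows")
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)

    def rate(rows):
        return sum(confirmed for _, confirmed in rows) / len(rows)

    return rate(ranked[:k]), rate(ranked[-k:])


# Hypothetical (model_score, prediction_confirmed_in_lab) pairs.
scored = [
    (0.95, True), (0.90, True), (0.85, False), (0.60, True),
    (0.50, False), (0.30, False), (0.20, True), (0.10, False),
]

top_rate, bottom_rate = confirmation_rate_by_rank(scored, k=3)
print(f"top 3 confirmed: {top_rate:.0%}, bottom 3 confirmed: {bottom_rate:.0%}")
```

Reporting both numbers side by side is one way to present the model as a set of controls rather than a single headline accuracy.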

I took the model and I blinded it to this part of the data because I talked to my biochemist as we were talking earlier and the biochemist told me and said, hey, you know, we're really worried about this particular thing. In the data, there's this control. So think about that. It's like, okay, well, I blinded myself to some aspect of that or I took the control into it, made it challenging. I kind of steel manned myself with the model. And when I did that, I still ended up doing okay. Not as well, but I did okay. Right.

If you can start to show examples of that and say, hey, here's a bunch of controls I ran. Again, coming to where they sit, they sit in a world where they're constantly interrogating the world. And a world that is, it's a little too far to say that's a black box because the whole point is that we believe as scientists that the world is explainable and we just don't know the explanation yet. But like a chemist and biochemist is often interrogating a black box.

Nicolas (25:33.79)
and they do it by having a bunch of different control experiments, like, okay, this should work this way, doesn't work this way, this should work this way, doesn't work this way, okay, here's the unknown, now let's see how that unknown works. So if you can go to where they live with that language, you're gonna do a lot better at convincing them, even if it's a black box. And then you can all jump together. You can say, hey, it's fine that the model sucked, only 10% of its predictions were accurate. In some domains, that's fucking fantastic, it's amazing, you want that, right? But in other domains, it's maybe too little. But say, okay, we came back, and we scored 10%, or we scored even zero. But at least together you said, OK, well, we all had conviction that this could work. This had to change. It didn't. But at least we did the right thing to try.

Prateek Joshi (26:16.957)
Right. Thank you for the explanation here. It's fascinating to see the process behind how it works. Okay. So, shifting gears a little bit, on your website you mention that you test these compounds using DNA-encoded libraries and affinity-selected mass spectrometry. So, can you quickly explain these two terms?

Nicolas (26:46.002)
Yeah, so a DNA-encoded library is a way to test a huge number of compounds in one place. When you have a molecule in a tube, you don't actually know what you have in your tube. The only way you know what you have in your tube is because you wrote on the front of it, you said, this is in this tube. If I were to say, hey, here's a tube, and this is actually a common thing in classwork and stuff. It's like, I have a tube. It's got some stuff in it. Tell me what it is. Oh, that is a lot of work. You've got to go do all sorts of analytics, all sorts of instrument measurements, and say, okay, well, it can only be this. It becomes a detective story. So you need to have a barcode that is associated with your compound at all times when you're doing an experiment. In practice, that means a positional barcode. It is in this plate, this position in my plate. And that is how I know what the compound is.

But that means that you can only test one compound at a time per tube or plate or whatever. And that means everything is going to be inherently slower. That's why you cap out at the million-compound scale. It's just physically hard to assemble that many compounds and test them. What we've done at Anagenex, and we're not the only ones who do this, but I think we're the best at it, is we barcode the molecules at the molecular level.

So every molecule has a barcode that is in the form of a piece of DNA that is attached to it. And that DNA sequence is unique to the compound. If you see this DNA sequence, you will always see this compound. I'm oversimplifying a little bit, but let's just go with me on that. Every time I see this barcode, I will see this particular compound.
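The barcode idea maps naturally onto a lookup table plus a counter: each sequencing read is matched to a barcode, and each barcode points back to one compound. The sketch below is a deliberate oversimplification with invented six-letter barcodes and exact matching; real DNA-encoded library decoding is more involved.

```python
from collections import Counter

# Hypothetical barcode table: each DNA sequence stands for exactly one compound.
barcode_to_compound = {
    "ACGTAC": "cmpd_A",
    "TTGCAA": "cmpd_B",
    "GGATCC": "cmpd_C",
}


def decode_reads(reads, barcode_to_compound):
    """Count how many sequencing reads map to each compound."""
    counts = Counter()
    for read in reads:
        compound = barcode_to_compound.get(read)
        if compound is not None:  # skip reads that match no known barcode
            counts[compound] += 1
    return counts


# Made-up reads from one pooled tube, including one unreadable sequence.
reads = ["ACGTAC", "ACGTAC", "GGATCC", "ACGTAC", "NNNNNN", "TTGCAA"]
counts = decode_reads(reads, barcode_to_compound)
print(counts.most_common())
```

Because the identity travels with the molecule rather than with its position on a plate, every compound can share one tube and still be told apart afterward.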

And that means I can take all the compounds, I can put them in one single tube, and I can always sort out what's in there because I can just run it on a sequencer and see what barcodes happen to be there. And so how does this help me test a billion compounds? I can take a protein, remember we talked about a target as a protein that may have some relevance to disease, I can put that protein in a position that I can pull it back out, and I can pour my library over it.

Nicolas (29:02.57)
and see what sticks to the protein. Like I'll pour the collection of compounds over the protein. I wash the stuff off, then I pull the protein away, and then I try and use a sequencer to say, what are the DNA sequences that are stuck to my protein? And then I know those are the compounds that were stuck to my protein. So that's how a DNA-encoded library works. Affinity-selected mass spectrometry is a little bit similar. In this case,

there is no DNA barcode, and the barcode ends up being a mass. So it's something that you can just read out with a mass spec. Necessarily, that means that you can do way fewer things in a single tube, because there are so many things that have the same mass. You have way fewer things that you can mix in a tube, but you can still mix up to a few dozen compounds in a single tube. And that allows us to very quickly follow up our big experiment and say, hey, which ones actually worked? And then push that.
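When mass is the only identifier, two compounds with the same mass cannot share a tube, which is why pools stay small. A minimal greedy sketch of that constraint follows; the compound names, masses, tolerance, and pool size limit are all hypothetical values chosen for illustration.

```python
def pool_by_distinct_mass(compounds, tolerance_da=0.01, max_pool_size=24):
    """Greedily assign compounds to pools so that no two compounds in the same
    pool have masses within `tolerance_da` of each other."""
    pools = []
    for name, mass in sorted(compounds.items(), key=lambda item: item[1]):
        for pool in pools:
            has_room = len(pool) < max_pool_size
            distinct = all(abs(mass - other) > tolerance_da for _, other in pool)
            if has_room and distinct:
                pool.append((name, mass))
                break
        else:
            # No existing pool can take this compound, so start a new one.
            pools.append([(name, mass)])
    return pools


# Made-up compounds: "a"/"b" and "c"/"d" are too close in mass to tell apart.
compounds = {"a": 300.10, "b": 300.10, "c": 412.25, "d": 412.255, "e": 250.00}
pools = pool_by_distinct_mass(compounds)
for i, pool in enumerate(pools, start=1):
    print(f"pool {i}:", [name for name, _ in pool])
```

Each pair of indistinguishable compounds forces a split across pools, so here five compounds need two tubes.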

Prateek Joshi (30:03.289)
Amazing. I love your explanations. I know it's not easy to take fairly complicated concepts and help people visualize them. I know you had to simplify it way down, but it works, and it's really, really fun. So when it comes to drug discovery, how do people decide what targets to go after? And also,

Nicolas (30:18.53)
So.

Prateek Joshi (30:31.009)
Part B, how do you identify biochemically active compounds?

Prateek Joshi (30:39.881)
I don't know.

Nicolas (30:42.676)
Okay, alright. Identifying and picking the targets is one of the hardest things to do in drug discovery. It's probably the hardest thing to do in early stage drug discovery, picking the right target. Because...

Nicolas (31:05.435)
So how do you... because if you don't pick the right target, that's going to govern whether or not you succeed in patients in the end. So, no pressure. The way... Let's talk about how...

Nicolas (31:26.158)
So how do you pick a target? Anything you do in early-stage drug discovery is insanely distant from a human. So you wanna figure out the best way to increase confidence that the thing you're choosing is going to have an effect on a person. So how do we do that? Is there a human genetic component? So I have this weirdo family in the wilds of Peru that has this mutation, and they never gain weight. That's the kind of thing you'd really like. Okay, so that means they're missing this target gene. They're all super healthy and live to 150 years old. Definitely wanna go after that and say, screw the target, right? For a variety of reasons, that is extremely rare. Evolution really doesn't like to have situations like that. But that's one thing you wanna see.

A step below that is, hey, I have a bunch of oncology patients and I've looked at their tumors, and all of them have this mutation. And then when I go back and I try to grow the tumors in the lab, if I try to mess around with that mutation, the cancer cells die. And that gives you good confidence. Hey, that's a pretty strong indication that it's gonna work in a human. So you try to look for things like that. It has a secondary benefit, which is, if I can say, oh, there's a gene associated with this, that means I can pick my patients in the clinical trial and say, you don't have that gene? Sorry, you're not in. And that gives me a whole lot more of a chance to be successful at a clinical trial level.

The next thing you want to do is, hey, can I get the protein? Is it well-behaved? And that gets to your second question of, how do I know biochemically does it work? I'm a biochemist. I love purified proteins. I always want to have my protein work in a tube and does the thing it's supposed to do. And then I can see in the tube, purified, does it work or not? That's my happy place. That's not always possible.

A lot of other companies have different philosophies about this. I would say Recursion is the most famous exponent of phenotypic drug discovery, which takes a very, very different approach. I would argue that all phenotypic drug discovery eventually becomes target-based drug discovery, because once you have a target in hand it's just so much easier to turn over a bunch of experiments and say, hey, does it work? Does it not? Those are so much quicker and easier than doing cell-based experiments, as amazing as the Recursion platform is. So like doing that sort of optimization works better, but you try and

Nicolas (33:53.234)
So that's the assay. So everything comes down to the assay. Even in the simplest assays, where I have a protein in the tube, I need to have some sort of readout that is telling me whether the protein is doing the thing it's supposed to do. Usually that's some sort of fluorescence. Maybe it's some light emission, some color change. Like you don't, so can I reduce my assay into something like that?

And that's a challenge. There are people who spend their entire careers just developing assays to test compounds. We are fortunate to have some amazing people doing that here at Anagenex.

Prateek Joshi (34:32.309)
Amazing. I have one final question and it's about an interesting nugget I found. Your platform has identified small molecule compounds for a validated undruggable target. Now what makes a target undruggable in general?

Nicolas (34:51.742)
Yeah, I mean, and I would say, look, when you pick your targets, there's a lot of things out there that you're going to be careful about, right? Is the undruggable target the thing that you're going to go forward with into the clinic? Maybe not, right? It might be the better idea that you should go after something that's a little bit better understood because undruggable necessarily means it's not well understood. What is undruggable? I don't think there's any target out there that is actually undruggable. There are a lot of targets that are not yet drugged and that are super hard to drug.

What makes it hard, there's simple versions of it. You can think a little bit of a target as a lock and key. And if you have a traditional lock that's sitting on your door, right, that's a deep, there's a deep hole inside the lock that something fits into. And that's the ideal world. For small molecules, if I have a deep hole, that means that the molecule can go in there. It's gonna have a lot of stuff that's around it. And the environment.

that is around it is very, very different than the environment that is not around it. So you can exploit that chemically. That generally contributes to more druggable targets. The thing that contributes to not being druggable is you don't have that sort of really deep hole. In the parlance of the industry, there isn't a good pocket. You have some sort of surface that has a little bit of three-dimensional structure, but it's very, very hard to actually get something to interact.

Interacting with the surface of the protein at that level is not that different than the protein interacting with nothing but water. So that makes it much harder to bind. I think that covers most of what you would call undruggable from a small molecule perspective. Some of this stuff is druggable with things like antibodies, which actually can bind a fairly flat surface very effectively. So

But antibodies and macromolecular drugs by and large do not make it inside the cell, and about two-thirds of targets are intracellular. So you can't use an antibody for that. So what are you going to do? Right. And that's why it's undrugged.

Prateek Joshi (37:02.977)
Right, amazing. Again, I think another interesting thing that I found was just interesting how, or what makes something, again, as you said, it's not quote unquote undruggable. It's just very, very difficult because of the structure that you mentioned. All right, with that, we're at the rapid fire round. I'll ask a series of questions and would love to hear your answers in 15 seconds or less. You ready? Let's do it. Question number one, what's your favorite book?

Nicolas (37:31.947)
Oh my, um...

I don't know, my favorite book that I read recently is a book called Tomorrow, and Tomorrow, and Tomorrow. I really love this beautiful story that was very affecting. I think that one of my favorite books that I always loved is a set of letters of Abraham Lincoln because it really humanizes the guy. He's like writing a love letter, he's writing a letter to his buddy, he's like, the girl didn't like me. And it's like, Lincoln is a guy in marble, and so I love that book to make him a human.

Prateek Joshi (38:01.405)
Yeah, amazing. I think that description is nice. I'll look for it. Right, next question. What has been an important but overlooked AI trend in the last 12 months?

Nicolas (38:16.271)
Jesus.

Nicolas (38:21.71)
I don't know if it's super overlooked, but I think that the trend of, I would use Replit as the biggest example of creating a code-based model that is very focused. And so I think smaller, more focused industry-specific models that are much more economical to run is the trend. I think that will be the future in so many ways, at least for the next five, 10 years. But that's, you know.

I'm not on the... I was so wrong about DALL-E and the potential of everything coming after DALL-E. So I could be entirely wrong about that.

Prateek Joshi (39:01.217)
What's the one thing about biochemistry that most people don't get?

Nicolas (39:08.878)
How hard it is to get data and when you get that data, how difficult it is to trust it. The other thing is that everything is in equilibrium. Nothing is 100%. Things are interacting with each other all the time. We think of things as on-off switches, and biology generally is not a lot of on-off switches. It's a lot of dimmers.

Prateek Joshi (39:32.813)
Right. Amazing. What separates good AI products from the great ones?

Nicolas (39:41.554)
You mean the great ones from the good ones? I think the great ones are able to deliver value immediately. I think that the great ones really communicate in an easily understandable way, hey, this totally is helpful. And they're anchored in real problems that real people have. I don't think that, for example, a chatbot...

and ChatGPT being an amazing piece of technology. Like it's an awesome thing. It's an interesting product. It's an awesome product. I think the great products are yet to be built because I don't know what to, after I've talked to it and had it make a Shakespearean version of my press release, like what next, right? So there's, I think the great products have to be a little bit more tailored and designed into what is gonna be the problem they answer.

Prateek Joshi (40:39.553)
Right. I think that that's actually a really good point. Right, next question. As a founder, what have you changed your mind on recently?

Nicolas (40:50.742)
Where's my mind at?

Nicolas (40:54.906)
Um, I think that I have had a huge amount of nuance injected into a very particular question around biotech platforms. Biotechnology companies that are small usually fall into what's called a single-asset company. There's a molecule or a biological hypothesis that you form around and you are going to hammer that and you will rise or fall on that one thing. We are a platform company that intends to have multiple different things. And that.

is a different approach that can be very successful. Understanding how you do the trade-off though of proving the platform, which in and of itself is actually not very valuable, the asset, the compound you put into the clinic, that is saving lives, that is super helpful. The platform is a way to get there, but the thing that is very valuable, the product itself is the asset. How you do the trade-off of building those two up simultaneously, I've gone back and forth on so many different things. I actually think that you have to focus.

a little bit more on platform than people conventionally say. People say focus on the asset, not the platform. I think you actually have to do a little bit more platform work in a particular way than that statement implies. I think it's correct, but there's a, there's a lot of tweaks to it. That's where I've changed my mind.

Prateek Joshi (42:12.089)
Amazing. What's your biggest AI prediction for the next 12 months?

Nicolas (42:18.518)
Oof. Um...

Nicolas (42:23.114)
Biggest AI prediction.

Nicolas (42:29.624)
I have no idea what's gonna happen over the next... I've made a lot of predictions and been wrong a lot. I think that...

This is a, I don't like making negative predictions. I think that the wave of pure chatbots is going to fade. I think that the biggest prediction I would make is something's going to come up. People are going to come up with a whole slew of apps that just blow open niche applications and they will crush it. Um, I think that's going to be the next thing that happens in the next 12 months.

Prateek Joshi (43:05.921)
Final question, what's your number one advice to founders starting out today?

Nicolas (43:14.211)
Hehehe

Right.

Nicolas (43:21.846)
Don't get high on your own supply.

It, uh, there is a lot of pain that is going on in the industry right now, especially in biotech, but definitely in tech as well, around valuations because everybody was super happy and I think it's actually kind of hard to do in the moment to take a smaller valuation.

And I don't know that I would have. Let's be very clear. If I was offered an enormous valuation, would I have said no? I'm not sure I would have. But I would say really be careful. I think that the advice I gave is, what is the real TAM? You say this is a trillion dollar TAM. What's the real total addressable market you have? And work backwards from there and choose a fairly conservative version of that for yourself and for how you're going to raise.

and set your valuations and set your KPIs and all of that. I think it's actually not my suggestion. I think it's maybe Peter Thiel who's said things like, oh, I actually don't like it when people have an enormous TAM at the front because it's really hard to compete that way. The best way to do it is you find something that's under-addressed that can potentially grow into something much bigger, but really focusing on crushing a small problem is the thing that I would...

It's always true, but extra true today and really think about how that TAM works.

Prateek Joshi (44:50.109)
Nicolas, this has been such a wonderfully rich discussion on biochemistry and basically drug discovery and how platforms should work in this sector. So thank you so much for coming onto the show and sharing your insights.

Nicolas (45:05.622)
Sure, thank you so much, have a good rest of your day.

Prateek Joshi (45:08.822)
uh...