Age of Information

We share information.

Ardavan Tells Us About Machine Learning For Drug Discovery And Combating Climate Change

March 05, 2021 • Vasanth Thiruvadi & Faraz Abidi • Season 1 • Episode 6

Ardavan Farahvash is a PhD candidate at the Massachusetts Institute of Technology studying theoretical and computational chemistry. His research involves the fields of statistical mechanics and machine learning applied to chemistry.

LinkedIn: https://www.linkedin.com/in/ardavan-farahvash-512a85bb/

So Vasanth, you might've noticed that I'm talking kind of weird. Right now, I actually have a couple of pieces of plastic that have been folded onto my teeth.

I stopped doing that when I was six, to be honest, Faraz.

I mean, well, it's actually prescribed by an orthodontist. So yeah. You've probably heard of Invisalign. The generic form of Invisalign is clear aligner therapy; Invisalign is just a brand. So basically you're putting these plastic trays in your mouth to straighten your teeth.

I mean, I think I've heard of it. My mom uses Invisalign, or she did use Invisalign. What I know is that Invisalign makes your teeth straighter. What else does it do?

Well, I'm not going to comment specifically on Invisalign, but clear aligner therapy, generally speaking, is straightening your teeth, and there's also, like, wear-down from grinding. Anyway, I'm not here to tell people to get aligner therapy. The reason I'm bringing this up is that I'm actually super hyped to get this, because that's what my company did. I worked for a startup for six years. I was the first employee, and we built 3D printers for the dental industry. So finally, the chance to, like, eat my own burger, so to speak, is really exciting. I mean, part of it I did to straighten my teeth, but a big part of it was just to use my startup in my own mouth, which I bet Mark Zuckerberg can't say.

Yeah. Well, you're sort of like the doctors who developed their own medicines and then injected themselves.

Yeah, exactly. Exactly. Well, this is very safe and, you know, used by like a million people. But yeah, it's exactly like the measles vaccine, for sure.

Yeah. And you waited seven years before you tried them out, so that tells us everything we need.

Yeah. This is a great segue to our guest today, who also uses software to do really, really cool things. Ardavan is a PhD candidate at MIT.
His research involves the fields of statistical mechanics and machine learning applied to chemistry. I know he's going to be talking to us about drug discovery, climate change, practically everything I don't know much about. So I think it's going to be super interesting. I'm really looking forward to it. Let's get started.

Welcome to the podcast. Thank you for joining us.

Glad to be here.

Let's just jump right into it. What are you working on right now? What are you researching?

Right now, so my research generally is in the interplay between statistical physics and machine learning. And right now I'm researching ways to essentially solve equations of motion in statistical physics using machine learning. So it's very abstract. That's kind of what's gotten my interest right now. But we can talk about other projects that maybe are more concrete, too.

So what department are you in exactly? Are you in the physics department, or are you in...

Well, I'm in the chemistry department, but I've discovered over the course of my research that I enjoy practical problems. So I like to have at least one project that's practical, but I also enjoy abstract problems a lot too. So as I've gone through my PhD, I've kind of put my hands in a little bit of both: a little bit of practical chemistry, like how do we make better solar cells and better batteries, but also a little bit of abstract stuff, like how do we solve really hard equations that will eventually help us build better materials and better drugs. So right now, the primary project that I'm working on is more on the abstract side. I'm working on a class of equations that are called stochastic differential equations. These show up a lot in finance, because, you know, if you think about stock prices, what is that? That's a bunch of jagged lines, you know, and you want to predict the future.
And the future is, stochastically, some probabilistic function of the past, right? The same kind of model has actually ended up being very useful in chemistry, when you want to describe something that's in a very noisy environment. There are a lot of problems like that in chemistry. There are problems where you have a protein in water, so the water is a very noisy environment. Another example would be something like ions in an electrolyte in a battery. An electrolyte in a battery is just some kind of big, continuous medium, and that medium kind of interacts with the ion, and it kicks it around and it binds it, but it's very noisy. So understanding how that noise affects, you know, the performance of the battery would help us build better batteries.

You know, this is very interesting, because I heard about this fund called the Medallion Fund. Do you know who these guys are, Vasanth?

I do. I do know about these guys. These guys are basically physicists, chemists, and other hard-science PhDs that built a hedge fund. And they've had the greatest returns of all time.

Like 70% a year, right? It's ridiculous, for like the last 25, 30 years. If you research those, Ardavan, I'm sure this will fit right into what you were talking about right now with the stochastic statistics and whatnot. So is this what you're researching?

Oh, it's exactly the same kind of thing. Jim Simons, who was the founder of that (I forget the name of the hedge fund, I think it's Renaissance), yeah, his models were exactly these kinds of models, like these stochastic mean-reverting models, and I think this was the 1980s. And at that time, those simple models were fantastic. You could write down a simple stochastic differential equation, Black-Scholes or whatever, and predict the stock market to wonderful accuracy. But these days, they're doing really, really advanced stuff.
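
A minimal sketch of the kind of "stochastic mean-reverting model" mentioned above, assuming an Ornstein-Uhlenbeck process integrated with the Euler-Maruyama scheme. The function name `simulate_ou` and all parameter values are hypothetical illustrations; nothing here is taken from the guest's research or from any fund's actual models.

```python
import math
import random


def simulate_ou(x0, theta, mu, sigma, dt, n_steps, seed=0):
    """Euler-Maruyama integration of dX = theta * (mu - X) dt + sigma dW.

    theta: how strongly X is pulled back toward the long-run mean mu
    sigma: noise strength (the "kicks" from a noisy environment)
    """
    rng = random.Random(seed)
    x = x0
    path = [x0]
    for _ in range(n_steps):
        # A Brownian increment over a step dt has standard deviation sqrt(dt).
        dw = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        x = x + theta * (mu - x) * dt + sigma * dw
        path.append(x)
    return path


if __name__ == "__main__":
    # Illustrative values only: start far from the mean and watch it revert.
    path = simulate_ou(x0=5.0, theta=1.5, mu=1.0, sigma=0.3, dt=0.01, n_steps=2000)
    tail = path[-500:]
    print("average of the last 500 steps:", sum(tail) / len(tail))
```

With `sigma=0.0` the noise term vanishes and the path simply decays toward `mu`, which is a quick way to sanity-check the drift term on its own.
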
They're combining, like, state-of-the-art math with state-of-the-art machine learning to stay ahead of the ball game. It'd be really interesting to know what they're doing.

It's a complete black box over there, and they've got a bunch of math professors, basically, right? Working on secret software none of us have ever even imagined.

Right. And so maybe to ground this in a real-world application: with the research that you're doing, when you graduate and you're looking for a job, what kind of job would you be doing?

I think definitely I'd be coding a lot. If I was to go into industry, what I'd probably be most interested in is something like computational materials design or computational drug discovery. And yeah, that involves a lot of coding to build programs that can predict the properties of materials before you ever produce them in a lab.

Sure. So your background before getting into your PhD program was not coding, is that correct?

It was a little bit of coding. I started out as a biochemistry major, kind of convinced I wanted to become a doctor. And there was sort of a gradient of coding along my undergraduate years, where I started out first year like, hell no, I want to do medicine, and I want to heal people, and I don't want to touch a computer, and I'm just going to take chem classes and, like, human health and molecular biology. And then after I took my first CS course, it just kind of sucked me in more and more. And then I eventually just became someone who codes all day.

Wow. It sounds like, you know, there are all of these fields in the hard sciences where you really can't get that much done without knowing how to at least process data through languages like R. But I mean, with what you're doing, that's way more. That's basically straight-up programming. I mean, you're a straight-up coder.
Would you say that's common in your field, that everyone needs to code to make an impact in the hard sciences?

Yeah. So what I'll say is that physics and engineering have been friends of coding for a long time, basically since the beginning of programming languages. But in chemistry and biology, this sort of revolution in the applicability of computational sciences is relatively recent. And in chemistry in particular, I'd say maybe just 10 years ago is where you start to see the real applications come through. And part of the problem is that a lot of chemistry is based on qualitative insight. It's based on a lot of intuition. When you go into a chemistry lab and you say, okay, how do we build a synthetic pathway for a new molecule, you don't solve a bunch of equations to figure out how to do that. You have a lot of chemical intuition for how a certain reaction should go a certain way, and based on having however many years of experience, like decades of experience, you can do a pretty good job of guessing what to do and when in order to get to the final product you want. And so a process like that was really not amenable to software and coding until the recent revolution in machine learning and deep learning, where we can really attack problems that have very abstract spaces, like the space of all molecules, with computer programs.

Can you give me some examples of the problems that you had much more difficulty with before ML and DL?

So let's say you want to build, for example, a new battery, a new electrolyte for a battery. How do you do that? Well, when you want to explore chemical space, what you want to figure out is what combination of atoms connected to what other atoms gets me the maximum for X property. So for a battery, that would be the maximum conductance for the electrolyte in the battery.
But then how do you do that? How do you put that in terms of an algorithm? You have to realize that chemical space is enormous. It's much larger than even the space of English words in the dictionary; it's larger than the space of images. It's this massive, unwieldy space. Not only do you have to think about how these atoms are connected to each other, you have to realize that there are different types of connections. There are double bonds and triple bonds. You in principle have access to the entire periodic table. The 3D geometry of the molecule matters a lot for its properties. If a molecule is oriented with, like, left-hand symmetry, it's very different from a molecule that's oriented with right-hand symmetry. So all of these come into play, and you have to figure out a way you can encode a molecule into a computer as some kind of feature vector, and then somehow build some kind of statistical algorithm for predicting the properties of that molecule. And before this recent revolution in deep learning, where you can actually start to do that, where you can build massive representations and have massive data sets and do inference on such large amounts of information, that was basically impossible. It was purely in the regime of human learning. You know, a human had to do that after decades of experience. It is just in the past few years becoming something that you can do with computer learning.

So it sounds like, to some extent, this is an optimization problem. So why are you using ML and DL techniques rather than something like a convex optimization solver?

When I look at ML and DL, I see a combination of three things. The first thing is a way to encode information, you know, featurization: a way to encode information about whatever quantity or whatever thing of interest you want.

Like the orientation of the molecules, for example?

For example. The second is an incredibly powerful
algorithm, or an incredibly powerful mathematical structure, for representing arbitrary functions. That's really what a neural network is. A neural network is essentially a universal map, which takes one input set and turns it into another output set. And you can make that map as large as you want, as heterogeneous as you want, as long as you have enough computational power. So those are the first two aspects. And then the third key aspect of deep learning is optimization. It's stochastic gradient descent, it's Adam, it's whatever you want to call it. It's how do we fit that incredibly powerful mathematical structure to some data. So you're precisely right. It is an optimization problem, but it's exactly the kind of optimization problem that you can only solve with the techniques of machine learning and deep learning, where you have this mathematical structure that allows you to generate arbitrary maps between very complicated input and output sets, and then you have a way of optimizing the parameters of that map, that function.

All right. That sounds super interesting and a little over my head, if I had to be honest. I've just been trying to keep up.

No, no, no, this is great. This is great information.

I think one thing you mentioned earlier is one of the key ingredients for all this to exist, and for you guys to really operate in this space, and that is the data. And I guess I'm curious, where are you guys getting this data? In this advent of the space where you have techniques like ML and DL, who's providing you this data, and really how big is this data? How far back does it go?

You know, this is actually a fantastic question. I mean, what's the main problem with any application of DL? It's how do you get the data, where do you get the data, and do you have enough data. So that is the problem we're having right now, too. Data can come from two sources.
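
The three ingredients described above (an encoded input, a flexible function, and an optimizer) can be sketched in a few lines. This is a toy under stated assumptions: the "feature" is a single number `x`, the network is a one-hidden-layer tanh model, the target `y = x**2` is made up, and the optimizer is plain stochastic gradient descent rather than Adam. All names (`init_params`, `sgd_step`, `train`) are hypothetical.

```python
import math
import random


def init_params(n_hidden, seed=0):
    """Random weights for a 1-input, 1-output network with one hidden layer."""
    rng = random.Random(seed)
    return {
        "w1": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def predict(p, x):
    """Return the network output and the hidden activations for input x."""
    hidden = [math.tanh(w * x + b) for w, b in zip(p["w1"], p["b1"])]
    y_hat = sum(w * h for w, h in zip(p["w2"], hidden)) + p["b2"]
    return y_hat, hidden


def mse(p, data):
    """Mean squared error of the network over (x, y) pairs."""
    return sum((predict(p, x)[0] - y) ** 2 for x, y in data) / len(data)


def sgd_step(p, x, y, lr):
    """One stochastic gradient descent update on a single (x, y) example."""
    y_hat, hidden = predict(p, x)
    err = 2.0 * (y_hat - y)  # derivative of squared error w.r.t. y_hat
    for j, h in enumerate(hidden):
        grad_w2 = err * h
        grad_pre = err * p["w2"][j] * (1.0 - h * h)  # back through tanh
        p["w2"][j] -= lr * grad_w2
        p["w1"][j] -= lr * grad_pre * x
        p["b1"][j] -= lr * grad_pre
    p["b2"] -= lr * err


def train(n_hidden=8, lr=0.02, epochs=1000, seed=0):
    """Fit the toy target y = x**2 on a small grid; return (initial, final) loss."""
    data = [(i / 10.0, (i / 10.0) ** 2) for i in range(-10, 11)]
    params = init_params(n_hidden, seed)
    initial = mse(params, data)
    rng = random.Random(seed + 1)
    for _ in range(epochs):
        rng.shuffle(data)  # "stochastic": visit examples in random order
        for x, y in data:
            sgd_step(params, x, y, lr)
    return initial, mse(params, data)


if __name__ == "__main__":
    initial, final = train()
    print("loss before training:", initial)
    print("loss after training: ", final)
```

In a chemistry setting the scalar `x` would be replaced by a molecular feature vector and `y` by a measured or simulated property; the structure of the loop stays the same.
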
Data can either be generated experimentally, so the experimentalists can try a bunch of different molecules in the lab and measure their properties, you know, measure the conductance of the electrolyte. And then we can go and type that data in, or record that data, or pull it from the existing literature, and then use that to build a model. That's one approach. The other approach would be to generate the data yourself. There are algorithms that use quantum mechanics and statistical mechanics and first-principles physics to predict the properties of an electrolyte before you ever do anything in the laboratory. So you can go onto a computer yourself and design an electrolyte on the computer, and then run a simulation and get a decent prediction for the properties.

Wow.

Both of these have the same problem, though, which is that, oh my God, is it expensive. Experimentally, obviously, experiments are expensive. Nobody wants to just test a hundred thousand different permutations of a molecule during their PhD. First of all, that's extremely boring for people, so scientists just don't want to do that. And then also, I mean, it's expensive. The materials costs, the human labor costs, they're huge. You need specialists to do these kinds of experiments. On the computational end it's much cheaper, but you still run into this problem where potentially you're burning hundreds of thousands or millions of CPU hours just to produce the data set you need to do the ML. And sometimes you get into the problem where you've done so much computational work to produce the data set, what's the point anymore? You've kind of already explored this space, doing all the work you need to produce the data set. So those are kind of the problems that we're trying to tackle right now.

Yes, the solution.
So the solutions are, first of all, you have to realize that you can't use ML to attack all of chemistry. You need to have some sort of well-defined problem that doesn't explore all of chemical space, but a subsection. So you don't need to use billions of data points; maybe you can get away with hundreds of thousands. And then the other solution is you need to adapt ML and DL algorithms specifically for computational chemistry. When these things first started out, people would just use the methods that existed, like convolutional neural networks, and just slap a chemistry problem on them. In the past couple of years, we've learned that we can do a lot better than that if we use some of the framework of DL while we build aspects of those algorithms that are specialized for purposes in chemistry. So instead of just using a convolutional neural network, for example, you can use a graph neural network, where you can actually represent a molecule as a graph, and there are specialized convolutional layers you use on that graph neural network that have been shown to work really well with chemistry. Essentially, what you want is some kind of prior that reflects the intrinsic symmetries of chemistry. And that way you don't need as much data. You know, the better your algorithm reflects the symmetries of your data, the less data you need.

In the case of drug discovery, do you think that also applies there?

Oh, yeah, this applies to drug discovery just as much. I'm not as much of an expert on drug discovery, but the basic principles are almost exactly the same. You have some kind of protein that you've discovered is a particularly good target for inhibiting, say, a virus or something.
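
A minimal sketch of the "molecule as a graph" idea described above. It is deliberately simplified: atoms are one-hot vectors over a tiny, assumed element vocabulary, one round of message passing is an unweighted sum over bonded neighbors (not a trained graph convolution), and the readout is a sum over atoms. The names `ELEMENTS`, `message_pass`, and `embed` are hypothetical. The point is the symmetry prior: the result does not depend on how the atoms are numbered, but it does depend on how they are connected.

```python
ELEMENTS = ["C", "N", "O"]  # toy vocabulary, an assumption for this sketch


def one_hot(symbol):
    """Encode an element symbol as a one-hot vector over ELEMENTS."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def message_pass(features, bonds):
    """One round: every atom adds the feature vectors of its bonded neighbors."""
    updated = [list(f) for f in features]
    for i, j in bonds:
        for k in range(len(ELEMENTS)):
            updated[i][k] += features[j][k]
            updated[j][k] += features[i][k]
    return updated


def readout(features):
    """Sum over atoms, so the result is independent of atom ordering."""
    return [sum(column) for column in zip(*features)]


def embed(atoms, bonds, rounds=1):
    """Turn a molecule (atom symbols + bond index pairs) into a fixed-size vector."""
    features = [one_hot(a) for a in atoms]
    for _ in range(rounds):
        features = message_pass(features, bonds)
    return readout(features)


if __name__ == "__main__":
    chain = [(0, 1), (1, 2)]  # three heavy atoms bonded in a row
    print("ethanol, C-C-O:       ", embed(["C", "C", "O"], chain))
    print("ethanol, renumbered:  ", embed(["O", "C", "C"], chain))
    print("dimethyl ether, C-O-C:", embed(["C", "O", "C"], chain))
```

Ethanol and dimethyl ether contain the same heavy atoms, so a plain atom count cannot tell them apart; the neighbor sum can, because it sees which atoms are bonded to which.
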
And you want some kind of small molecule to bind in a pocket of this protein to disrupt its function. So in that case, you run up against the same problem. You want to optimize the small molecule. You have this sort of chemical space you're searching over in order to find something that optimally fits in this pocket and, you know, disrupts the protein, for example. The problem you run into with drug discovery is only that biology is so multifaceted. Not only do you have to worry about stuff like how well this drug binds, you have to worry about the bazillion other proteins this drug could possibly bind to that might cause side effects. You have to worry about whether this drug can, you know, pass the blood-brain barrier, if it has to work in the brain, for example. So ultimately the problem with drug discovery is that you always need to do clinical trials. Computation can work. It can reduce your costs a lot. It's played an extremely important role in some really important drugs. Like, anti-HIV drugs were first discovered on a computer before they were made in the lab. So yeah, there's a long story of Merck and how they built this first kind of revolution in HIV drugs on a computer.

Can you expand on that?

Yeah. So comp chem prior to, let's say, 2005 was mostly an academic thing. The industry didn't really take it seriously, and for good reason, because it was so expensive and so abstract that nobody thought you could ever use this stuff to make something that you can make money with. And then along comes Merck in sort of the late 1990s and early 2000s, and their comp chem team plays a really pivotal role. They're the ones who first make this molecule and propose it as an anti-HIV drug. And that ends up revolutionizing anti-HIV medication.
I mean, you've heard about people like Magic Johnson, who now essentially have no HIV in their bloodstream. And then once everybody realized what had happened (I forget when the clinical trials ended, I think it was the mid-2000s), there was an article written, I think in Scientific American or Forbes, which was like, hey everybody, did you know these drugs were discovered on a computer? And then it just exploded. Then all the drug companies wanted to get in on this. So I think at every drug company now, every major pharmaceutical, there is a small to medium-sized team of computational chemists.

You were saying that the industry came in, they built the space out, they made it even more famous, they made it more interesting for people to study. Can you name other fields or careers or areas that have expanded because of industry jumping into the hard sciences?

One example I'd say in biology might be something like sequencing, where the Human Genome Project started off mostly as an academic thing. Like, let's sequence the human genome to figure out more about human biology. But then I think a company called Illumina came in, and they figured out a way you could sequence the genome super fast and unbelievably cheap. And so they really democratized sequencing. And now you have companies like 23andMe, which will just do it for anybody. The original Human Genome Project cost, I dunno, however many millions of dollars. I think now they're saying that the human genome costs, I forget if it's a few thousand to sequence or maybe even less now. Maybe it's become a few hundred.

And is that because of machine learning and deep learning?

Partially. I think machine learning and deep learning have made it a lot more reliable to do the sequencing.
Like, you can de-noise the data a lot better. But it was a lot of just fundamental advancement in, I think it was fluid dynamics, honestly, and just methods in molecular bio that helped Illumina build these next-gen sequencing devices.

So, you know, back to what you were saying about Merck. The revolutionary thing that happened to our economy in the last 20 years is that startups became incredibly cheap to found, because they basically just required a computer and someone who knew how to use it. You know, with Facebook, Instagram, all of these types of companies. So when you talk about computational drug discovery, computational biochemistry, are there sort of a couple of kids in their garage, with almost no capital, founding companies that could really blow up?

There are quite a few I know. In Boston right now there's one I was just looking at the other day, Reverie Labs, which was founded by a couple of Harvard undergrads that graduated a couple of years ago. There's another one I know over in Soquel, Anton. That's the same kind of idea. These are a bunch of just small startups. Reverie Labs is computational drug discovery. I forget exactly what, but they have a class of diseases. They're, I think, betting on the fact that they can discover potential drug targets much faster than the pharma companies can, and get some patents out early. And then I don't know, I'm not part of their team. I don't know exactly what their business plan is. And I think Anton is more on the materials side of that. But to your question of whether they can make it big and go from their garage to making billions of dollars, I mean, we don't know yet. It's so new. It's the past five years, past three years, past two years. It hasn't been done yet, so maybe it will be in the future. It's something I struggle with, actually, when I think about what I want to do after my PhD.
I think a lot about whether or not the technology has gotten to the point where it's really possible for you to go make a startup and, just by the fact that your machine learning framework is the best in the business, out-compete the big companies, out-compete the big pharmaceuticals. I will say, I think that it's probably easier on the materials side than it is on the drug discovery side, because the big pharma companies are just huge and massive, and they have so many resources, and they can try so many different potential drugs. On the materials side, it's a little bit easier, because I don't think the big chemical companies are as big, and you don't have to do any clinical trials to get a new material. You just have to make it in a lab, prove that it works, and boom, you have your patent.

What's an example of a material that's used in the real world that was a result of machine learning and deep learning?

What seems to work the best is catalysts. There's a famous MIT lab in chemical engineering called the Kulik lab that has, I think, already submitted patents for the catalysts they've made purely on the computer. Catalysts are very nice because, like I said before, their chemical space isn't as large. You typically have just a few different atoms you're trying to arrange to make the catalyst. And it's not that hard to generate tons of data. You can do very fast calculations on a computer to produce an absolutely massive data set.

Do you think there are any roadblocks or artificial gatekeepers or challenges that have been put in place to stop teams like the one you described in Boston from really succeeding? Is there something that can be removed, that's just forcibly been created in the first place, that's stopping these people from doing better?
I think there are roadblocks, but I don't think most of them have to do with stuff like policy, for example, or culture. I think most of them are just really hard problems that you have to try and solve. If anything, the roadblock is just that we have to get this stuff to the point where the broad public understands its potential use and power. I mean, you're talking about problems that have the potential to actually change the face of human life on this planet. Like drugs, like better solar cells, better energy storage. With these problems, you don't have to solve them completely. Even a 10%, 5% improvement revolutionizes everything, changes the face of society. So I think that's what gets me interested. And the more we can convince other people that this is something we can do, by virtue of having success, the better it will be. And I think people are doing that. If you look at stuff like these venture funds, Andreessen Horowitz, for example, just hired a guy, Vijay Pande, who was like the most famous computational chemist at Stanford, to be one of their general partners, to basically go and look for companies in this space, startups in this space, and give them money. So I think it's slowly starting to seep into the mind of the big software companies and the big software giants and the Silicon Valley tech world that this is something cool and we should pursue this further.

Along those lines, do you think that the only people who can found these types of startups must have a hard biochem background, or can someone like me, who's like a software guy, be able to found something?

I think you need teamwork. I think what's unique about these problems is they're just the hardest of hard problems.
You have problems that essentially are at the interface of three to four scientific fields at the same time, like physics, chemistry, biology, and computer science. So first of all, you need really smart people at these companies. It is basically a race to build the best possible models. But people generally cannot do all of these things. Often you're going to have somebody who is a really great expert at the chemistry and the physics and building the models, like writing out the form of the models and the math, but who is going to be not that great of a programmer, who may have written some research code, you know, a couple of Python scripts here and there, a couple of R scripts, but doesn't understand how to make industry-scale software. So I think, more than the expertise, what matters is curiosity and your ability to work in a team of people who have different skills.

And it sounds like part of this is getting these fields that didn't speak to each other as much to link up. So getting people in the computer science department to get lunch with people in the chemistry department. So maybe they start talking and a startup comes out of that.

Yeah, exactly. I'd say that's one of the main issues: how do we get people who have traditionally, for the past hundred years, not that they didn't like to talk to each other, but just had no reason to talk to each other, and how do we get them to sit across the table and solve problems with each other? And I think there are tremendous opportunities if they do that. And at places like MIT, I'm starting to see it happen.

Well, maybe we can say this.
If any of our listeners are strong software people who are thinking of founding a new social media company, maybe consider going into computational biology and chemistry instead.

A great place for us here is to dive into bioinformatics. It is a subject that I think is the definition of what you just described. What is bioinformatics?

So I'm not a bioinformatics PhD, but I can give you the sort of basic lowdown. Bioinformatics is how we represent biology in terms of data. Easy examples would be stuff like DNA. I mean, you represent it in terms of, you know, a four-letter code, like A, C, T, and G. So how do we represent biology as data? And then how do we use that data to make inferences about the molecule, or sorry, inferences about an organism? So for example, how do we take the set of all mRNA molecules you have, and look at them, and see that there's an abnormality in them, and that tells us that you have a specific type of disease.

So this really sounds like straight-up coding bio. Did this even exist before people were writing code in this field?

It's a little bit older than cheminformatics, because bioinformatics, I mean, as early as the 1980s, or ever since the discovery of DNA, people knew, oh, there's like a code for life. It's A, C, T, and G, and that encodes all the information in all of your cells. So bioinformatics has been around for a while. And they didn't always use deep learning and stuff. They used sort of more traditional statistics. Bioinformatics has gone through a long evolution to get to where it is now. The new thing in bioinformatics is just that nowadays we kind of have a more nuanced vision of data and biology, where we understand that actually just looking at stuff like the genetic code, your genome, and maybe your transcriptome, actually doesn't give us all the information about
what makes you, you. There's a bunch of other really interesting stuff, like the space of all metabolites in your body, the metabolome, or how your DNA is packed. So this is really interesting. It's not just your sequence of DNA that matters. It's how your DNA is folded together and packed within the nucleus of the cell, and which areas of that DNA have enough open space that something can come in and read them. These days, what we've discovered is that the 3D structure of your DNA is one of the most important things for understanding your gene expression. I think there's a new type of sequencing called Hi-C, and that allows you to not only sequence the genome, it allows you to look at how the genome is structured. So in 3D space, what parts of the genome are next to what other parts of the genome, and where do they contact, and stuff. And I think in the past few years, people have been looking at, well, okay, how do we use that to control which genes we can turn on and off, or even, what is the science of gene expression? Why do certain genes turn into skin cells or brain cells or whatever?

So very practically speaking, just trying to understand that, how do you get information like the genome into code, and then how do you turn that code into actionable intelligence that you can use?

Okay. So a simple example might be something like, there's a cancer cell, and you want to understand what mutation has happened that made this thing cancerous. What you do is you sequence a bunch of normal cells, wild-type cells, and then you go and you sequence a bunch of cancer cells. And you have this huge amount of data, and you look for abnormalities. You look for biomarkers of the cancer. Sometimes you need deep learning, sometimes you don't. Sometimes it's very simple. Sometimes you say, oh, there's a single mutation. There's a SNP right here.
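
The simple case described above, comparing sequenced normal cells against sequenced cancer cells to spot a single-base change, can be sketched as follows. The sequences are made up for illustration, the reads are assumed to be already aligned and of equal length, and `consensus` and `find_variants` are hypothetical helper names; real pipelines work with aligned reads and statistical variant callers rather than a per-position majority vote.

```python
from collections import Counter


def consensus(sequences):
    """Most common base at each position across equal-length sequences."""
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))


def find_variants(normal_seqs, tumor_seqs):
    """List (position, normal_base, tumor_base) where the two consensus sequences differ."""
    normal = consensus(normal_seqs)
    tumor = consensus(tumor_seqs)
    if len(normal) != len(tumor):
        raise ValueError("normal and tumor sequences must have the same length")
    return [
        (pos, n, t)
        for pos, (n, t) in enumerate(zip(normal, tumor))
        if n != t
    ]


if __name__ == "__main__":
    # Toy data: the lone A at the end of the third normal read is treated as
    # noise by the majority vote; the A -> T change at position 4 is shared by
    # every tumor read and is reported.
    normal_reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
    tumor_reads = ["ACGTTCGT", "ACGTTCGT", "ACGTTCGT"]
    print(find_variants(normal_reads, tumor_reads))
```

Positions here are zero-based offsets into the toy strings, not genomic coordinates.
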
And this mutation right here is what's causing the cancer. But sometimes it's a lot more complicated and you need a lot more data to, you know, figure out what's going on, depending on the disease and stuff. Between my second and third years of undergrad, I did an internship at Harvard Medical School where I did something like this. What we were looking at that time was not cancer. I think it was Alzheimer's and other neurodegenerative diseases. And it was kind of known that the genome itself didn't give us any information. So we were looking at the transcriptome and the proteome. The transcriptome is the space of all possible RNA molecules, and the proteome is the space of all possible protein sequences in your body. And we were looking at whether there are any abnormalities in that data which might, you know, give a definitive marker that someone has Alzheimer's, for example. And once you've figured out why someone has something, like, okay, there's this abnormality in this protein that's causing this disease, then you can start to look at stuff like drugs. So you can say, okay, I understand what's causing it. I can build a drug that hopefully, you know, goes and chemically binds somewhere and fixes the problem. Do you think this has actually... before we even get into that, can you tell the people what CRISPR is? I'm not a CRISPR person. I don't know much about CRISPR. What I understand is that it's a much better way of editing genes. So we've been able to edit genes for a while now. CRISPR just allows us to do that with much higher accuracy and much faster than we could before. Do you think that this falls into the field of computational chemistry? Is that really being benefited by computation? I think it falls a little bit into bioinformatics, but really we're talking about experimental stuff here.
I think where it falls into is trying to think of stuff like gene therapies. Like, oh, if we edit this gene here, we can get rid of whatever genetic disease somebody has. But I mean, I'm not an expert, but as I understand it, the applicability of CRISPR to human health is still in the early days. CRISPR has wonderful potential, but the hard problem with any new biological technology is, how do you turn potential into, you know, something that's widespread and usable? Right. And it sounds cool. Right. Can you expand on that? Well, I mean, it's just in biology. Like, there was this Chinese scientist who, I heard, was performing CRISPR tests on human embryos. And I would say the risk is probably low, like the risk that you produce some kind of horrible genetic abnormality is probably low, but there is still some risk that you, for example, really mess up some poor child's life because you were testing on them in the laboratory. And these were human embryos that they, you know, implanted inside actual people and then let grow into full-sized human babies. I'm not sure if that's the story, but I think that's what I heard. So anyway, anytime you have some new biology, you have to make sure you're not crossing any horrible ethical lines into, like, crazy scientist territory. Otherwise, I don't know, it's just bad. You become like an evil supervillain and Batman needs to come stop you. Yeah. I mean, when are computational chemists going to get any playtime in these Marvel movies? I'm waiting for that day. I think we already have, you know. Like, you could say Vision and stuff, like building, like, a... I don't know.
No, I think for us as a group, we claim Vision for ourselves. We claim Vision and Iron Man for ourselves. You can have Captain America. I don't know. But Iron Man is a little bit more mechanical, I feel like. Unless, actually, he created Vision, didn't he? Optimization problems for flight paths. Anyway. So I think you've done a really good job of explaining these very difficult challenges that you guys are facing. But let's say, at its highest aspiration, you guys were able to overcome all these challenges. How do you think the world is going to change in 10 years because of these new advances in these fields? Well, I think that if you look at the existential problems facing humanity, they are exactly these problems. They are: How do we store energy better? How do we extract energy better from the environment? How do we do all of that cheaply? When we have a new virus or a new pandemic, how do we go from that to having a vaccine in under a year? For each one of these problems, I think you don't need to make, you know, revolutionary progress. Even five or 10% increases can change the quality of life for everybody enormously. So I think the biggest change that will happen, for example, is that there are one or two or a handful of real winners from this race to make new materials and drugs computationally, and that handful of applications that really work are what will really produce a couple of crazy new businesses.
Where, for example, with the new catalyst thing, if we can figure out a way to catalyze CO2 a lot faster, that really alleviates the problem of global warming and the potential economic risks a lot. It makes green technology a lot easier to adopt. It buys us a few years before we really run into the problem of having these feedback loops of global warming cause some kind of catastrophe. So I think it's very hard right now to predict exactly what areas of computation are going to be the winners of this game. I don't think it's going to be all of them. I think a lot of people will try to attack certain problems in computational chemistry and computational drug design that will just be too hard. And until we spend like 10 years, we won't know that it's too hard. But whichever are the couple that really end up working, whether it's catalysis or, you know, a specific drug discovery program, I think they will produce huge industries that will actually make people's lives better. And probably what will happen is that no one will really understand exactly what happened. Like, the amazing thing about technology is that if it's really revolutionary, within like 20 years it's become so normal. Like Google: you don't think about how crazy it is, what the Google search engine is doing. You just kind of assume it as part of your life. It's so revolutionary that you don't even think about how revolutionary it is. That's true for almost any technology. Like, you don't look at your cell phone and think about how amazing it is that radio waves in space are being read by a micrometer-long antenna that somehow can transform oscillating electromagnetic waves in space into text information on a liquid crystal display.
You don't think about that, but that's insane. There is absolutely nothing that, if you just looked at the natural world, would tell you that's possible. So I think that's probably what will happen with computational chemistry. You'll have a couple of crazy advances that will really improve people's lives. And within 10 to 20 years, everyone will just take them for granted. It will just be like, oh yeah, that's the thing we can do now. And they'll complain about it, like they complain about getting bad cell phone service, even though you have beams coming from space to let you talk to people. Yeah. When you understand what goes into innovation, it kind of blows your mind. Like, wow, humans were able to do crazy stuff. Right. Ardavan, you've mentioned to me before that maybe there are ways we're hurting innovation and there are ways we can make innovation faster. Can you speak on that? Yeah. Going back to Faraz's point that, you know, these problems we're solving are so multifaceted and they involve a lot of hard science and physics: part of what could make a lot of this stuff easier is just if we had the ability to educate people in a little bit more of a modern sense. We currently have this education system that I think is tailored a little bit to, you know, the science of 10 to 20 to 30 years in the past, where you're learning calculus in the final year of high school. But in order to tackle the problems that we need to tackle, I mean, there are people who need to be coming into college already knowing computer science and, I would say, calculus and linear algebra, just ready to go. And then from that fundamental groundwork, building up whatever your particular interest is, wherever you want to go from there. I think we struggle a lot in this field because, you know, in college.
And the first two years, we're just trying to play catch-up with the new students, kind of teach them: okay, here's the basic stuff you need to be able to do in order to innovate. And then by the time you've graduated college, you know, you want to go make things now. You don't want to learn forever. But that's really hard to do in four years: spend two years just teaching you the basics of the basics, and then in the other two years get you caught up to make stuff at this level. So I think one of the things that holds us back is just the fact that we need to educate people in a way that allows them to tackle these problems sooner. Because not a lot of people want to do a PhD for five years to get to the point where they really understand these problems deeply. And that's fine. To me, that makes perfect sense. By the time you're 22, you really want to go out and start to, you know, make your own life and innovate and build things. Not everybody wants to spend the majority of their twenties in school. But if we're going to be doing that, and you're not some kind of superstar undergraduate who's going to, like, Harvard and just came into college already knowing these things, we need to do a better job of getting the general populace up to speed on fundamental concepts in math and science. Isn't that the role of Advanced Placement classes, like AP classes? Isn't that what they're supposed to do, recreate college classes for high school kids? I think it is, partly, and I think they're doing a good job. But I mean, how many people take AP classes? First of all, I think it's not that many people. I don't think it needs to be something that's very stressful, where we're pushing kids to learn more and more hard math. I think we can.
You know, reorient the curriculum a little bit, and we could probably do a better job. I mean, I'm not an expert on education, but I sometimes wonder. Like, we spend a lot of time drilling students on how to do basic computations in math classes, how to multiply matrices, and a little bit of that is important, but nowadays you do all of that on the computer anyway. So, for example, I think it would be really good if you could just have math and CS almost be the same class. You do some of the math, and then you go and you build it out on a computer. You see how that math works right there on the computer and you can see the applications of it. And then you go back and you learn the math. I mean, there are so many ways you could hybridize those two. You can hybridize them in algebra and calculus and statistics. There's a natural sort of interlocking there that I think could be really good for young students and really prepare them, so that when they go to college and they see stuff like machine learning for the first time, they're like, oh, well, this is easy. I already had all this computers and math in high school. This is just the next application. You're just doing a bunch of linear algebra right here. This is breakfast for me. Yeah. I don't think it has to be something where we're pushing students and saying, oh, do all these homework problems, spend three hours a day doing them. I hope it's not like that. Because having been a 14-to-18-year-old person myself not too long ago, I know that's not what teenagers want to do. Right. Ardavan, it's been amazing. I've learned a lot. I say that every episode, but I'm biased. I have to. I don't say that every episode, but I've definitely learned a lot. Oh, okay. Okay. We're going to talk about that. Last question for you.
What is the best piece of software that's ever been built, either in recent history or of all time? This is a question I struggle with, because what is the best is different than what is the most useful for me. Yeah. The most useful for me would probably be something like Python, because Python is really a wonderful revolution in democratizing a lot of CS. Like before, if you wanted to do some quick prototyping, you know, some people used MATLAB, some people, depending on what you wanted, I mean, Perl, if you were interested in, like, text stuff. But Python kind of puts that all under one umbrella. And then it's so easy to just interface with really common libraries, so you don't need to reinvent the wheel. You don't need to write, you know, the 80th version of the fast Fourier transform. There's a super fast version of the Fourier transform. You just get it from SciPy and boom, you're done. You're off to the races. So for me, it's probably Python. In my opinion, a really wonderful programming language. It's a little slow, but that's fine. The computational inefficiency is offset well by the improved coding and prototyping efficiency. You know, I suspect that if people in your field had to use C++, innovation would take a lot longer? Oh, for sure. For sure. I used to code in C and C++ a little bit in undergrad and a little bit in my first year of chemistry grad school, but in the past year, year and a half, I've pretty much only used Python. And the thing is, I always kind of run into this question when I start a new project: is the efficiency bonus I get worth the amount of time it will take me to build this in C++? And for like 99% of the questions I answer, it's usually not worth the extra time.
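As a rough illustration of the point about getting the Fourier transform from SciPy rather than writing it yourself (this is not something shown in the episode), here is a minimal sketch, assuming NumPy and SciPy are installed. The function name `dominant_frequency` and the test signal are made up for this example.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq


def dominant_frequency(signal, sample_rate):
    """Return the frequency (in Hz) with the largest FFT magnitude, skipping the constant term."""
    spectrum = np.abs(rfft(signal))                      # magnitudes of the real-input FFT
    freqs = rfftfreq(len(signal), d=1.0 / sample_rate)   # frequency of each FFT bin
    peak = np.argmax(spectrum[1:]) + 1                   # ignore bin 0 (the constant offset)
    return freqs[peak]


# Hypothetical test signal: a strong 50 Hz sine plus a weaker 120 Hz sine,
# sampled at 1 kHz for one second.
sample_rate = 1000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

print(dominant_frequency(signal, sample_rate))
```

For this made-up signal the strongest component is the 50 Hz sine, so that is the frequency the sketch should report. The whole transform is two library calls, which is the "don't reinvent the wheel" trade-off described above.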
Thank you so much. We really appreciate it. Thank you guys. This was fun. Thanks for listening, guys. That's our episode for this week. Make sure you leave a comment. Tell your friends about us. Review us on Apple Podcasts, subscribe on Spotify or wherever else you get your podcasts, and we'll see you for another episode.