Phase Space Invaders (ψ)

Episode 4 - Modesto Orozco: Computations driving experiments, opening simulation data, and integrating knowledge across sources and scales

March 12, 2024 Miłosz Wieczór Season 1 Episode 4

In the fourth episode, Modesto Orozco and I talk about the rough path to the predictive power modern computational science has achieved in biology, and the immense possibilities it opens to today's computational biologists working across the scales of space and time. Modesto also reflects on why sharing simulation data is crucial to make sure that our results are trustworthy, and how access to other people's simulations can become a gold mine in the data-driven era of computer modeling.


Milosz:

Welcome to the Phase Space Invaders podcast, where we explore the future of computational biology and biophysics by interviewing researchers working on exciting, transformative ideas. Today, I'm talking to Modesto Orozco, who, aside from being my boss here at IRB Barcelona, is mainly known for his contributions to the multiscale study of nucleic acids. From his early quantum chemical research on the energetics of individual nucleosides, through breakthroughs in all-atom and coarse-grained modeling of DNA, all the way to describing chromatin organization in the nucleus, his career is a great example of integrating multiple levels of physics modeling with cutting-edge experimental findings. In our conversation, Modesto points out that we are now at a moment in history where insights from computational research can finally drive experiments and provide testable predictions, and that it is important for us as a field to take initiative and not limit ourselves to merely reproducing what is otherwise known. We discuss the importance of open data for trust in science, even if that means that mistakes will be made public, and how the upcoming revolution in biology really relies on bringing together different types and massive amounts of experimental and computational data. I hope by now you are curious enough, so let's get started. Modesto Orozco, welcome to the podcast.

Modesto:

Thank you.

Milosz:

So before I joined the lab, uh, you know, we had met at a few conferences, and I have to admit, I couldn't help but notice how often people looked up to you for all sorts of advice or opinion on the latest developments, and how your opinions on both theory and experimental techniques tend to be really well informed. But then I learned that you really started your scientific career in quantum chemistry, which is quite a tiny slice of what you're doing today. So I was thinking you could share your reflections on how one goes from a bunch of atoms, you know, to a chromosome. That's a big change during a career. Not to mention understanding what happens in the wet lab.

Modesto:

You're right. I mean, I started as a quantum chemist. Actually, my first top paper was on electron correlation of mono-electron operators, or something that I think nobody's really interested in right now. And then I made, I would say, quite a lot of advances in self-consistent reaction field theory, so in what are now called continuum models of solvent, and on the calculation of electronic properties, yes. And I think it was a kind of continuum, in the sense that, at the time, it was a long time ago, it was interesting to know about biological systems. And then you realize, it was very obvious then and it's even obvious now, that if you wish to tackle biological systems, you need to go to things that are faster, more efficient from a computational point of view. Otherwise you just study a very tiny detail with a very high level of resolution, but you never get a complete picture. And this is what my entire career has been: trying to get as wide a view as possible of biological systems. And so I have been navigating between different methodologies, from quantum to QM/MM, to classical, even to bioinformatics. So there is a kind of continuum. It is the curiosity, this idea to look behind the curtain, to see what is there. Like if you were a kid and you have a toy, you have a car, and you break it into pieces, you know, to see what is inside. This has been my main motivation. This is still my motivation. Of course, on this journey, there is a point where you realize, and for me it was a couple of decades ago, that you should not look anymore at the experiments as a kind of enemy, or as a kind of thing that someone else did. There is a point where you realize that experiments can actually be fantastic to complement all these things that you don't know, all these boundary conditions of your simulation. And this is why we started doing the simulations coupled with experiments.

Milosz:

That very nicely fits into the pattern of people from physics moving deeper into biology and applying, you know, always the same methods of physics to more and more complex problems in biology. And so you've been on top of this revolution for the last, was that three decades now?

Modesto:

Well, I think my first paper is from, uh, 1986.

Milosz:

So, four.

Modesto:

1987. So yes, some time. Yeah.

Milosz:

And then you still managed to be on top of things, right? So let's talk about what are the upcoming things that we have to grapple with. What do you think will drive our specific, or your specific, research in the next five years?

Modesto:

I think that this is a very exciting period. I'm really ashamed that I'm too old. I would like to be 20 years younger just to catch this revolution, because for many years, I mean, we were, like, proving that things could be possible. I remember my former supervisor, Bill Jorgensen, said that the first step in all theoretical development was to publish completely irrelevant research, but research that proved that something could be done. He called it phase one. I'm talking now about, you know, the 90s, when the most that the people who were the pioneers in the field were able to do was to run something that was already known for decades and prove that this could be reproduced by theory. It was extremely exciting, and it was a Nobel Prize-winning idea, right? Then we passed to another wave. And when I started working, really the main objective was to give a rationale to things that you already knew. So the idea is that you have a system that is well characterized, and then you run simulations to really explain what was already known. And this was actually quite a cool type of simulation, but you didn't really learn new things, you just explained. And I will say that at the beginning of the 2000s, we started having predictive power, but predictive power at a very detailed level. If you look at my field, nucleic acids, until the 2000s you just ruined the structure. As Peter Kollman said, you take a beautiful X-ray structure, you run a simulation, and you ruin the structure. So, in the 2000s, we started with simulations being able to provide something that had some predictive power, but it was predictive power at the shortest scale. The idea was, yeah, I mean, this hydrogen bond is important for something, or this stacking interaction is important for something, or this loop movement may capture some important biology, right? But I think in the last years, we are actually in the middle of a new wave, of a true revolution, and this is obvious. I mean, I'm not going to talk about AlphaFold because everybody knows AlphaFold, but take a look at what has been done on RNA and the prediction of folded structures. We are in a revolution where simulation, where theory, can actually be at the center. It's no longer just going to provide more details about how a biological system works, but can actually be the core of the discovery. Now you see the big journals, and in every issue there are several papers from people doing landmark calculations. So we are in a really exciting period where we can be not only the partners of experimentalists doing big research, big discovery, advancing science, but we can be on the front line of this discovery. Of course, we are still far, and there are many things that need to be improved, but people like you, you are lucky guys, because it has been a long period of very difficult work. You said at the beginning, and you were right, that somehow now people value my opinions, right? In different fields, you know. But believe me, in the nineties it was absolutely irrelevant what I believed. Even if I was completely right, nobody paid attention. And not because of me, it's just because nobody really paid attention to a simulation. It was, yeah, it's beautiful for making a video, but not really providing real information. And things have changed dramatically.

Milosz:

So it's a lot of convincing to be done with experimentalists, right? Slowly, slowly, what we do will become relevant, and the same way that we always had to look up to experiments for some sort of ground truth, they might one day need to learn how to interpret our results to stay on top of their field. That's true.

Modesto:

I mean, it has been a big change. You have seen how often people come into my office and explain problems to me, and you have tackled some of the problems, right? And they come when something is very difficult, very complicated. We are in a situation where we can drive a full experimental lab to follow our ideas and to actually validate our ideas. And yeah, so it's a very exciting period. Of course, I mean, there are still some areas where science is advancing very quickly and we still don't have the power. I don't know, single-cell biology, right, is advancing very quickly. We still don't have this ability to really have an impact in these fields. In others, yes, we are actually having an impact, uh, very, very seriously.

Milosz:

Right. I remember the first time it really impressed me how far we can depart from, let's say, standard simulations and be relevant for the experimentalists was when I saw the Hi-C results that you had already been modeling in, what was it, 2016, 17, right? Trying to tackle the question of chromatin organization in the nucleus.

Modesto:

Yeah, yeah, I think this is a wonderful example. I mean, this is the point: the experimentalists used to apply this to the interaction maps, really, and they didn't have a clue that what you have behind them was actually a very complex molecule. And I think it would now be impossible to publish in a top journal if you don't put a structure behind this. But at this point, I think we can actually go farther than experiments, because we can go to the single cell, where it's very difficult to tackle with experiments. And in the case of chromatin, for example, we can tackle things like lesions in the DNA, non-equilibrium processes. The chromatin is activated based not on equilibrium things. It's based on how fast a signal arrives at a certain place. And this triggers something, you know, and this can actually divide the cells into two different clones in terms of, uh, and this is the type of thing we should be able to simulate. I know you are aware of people doing entire-cell simulations. Of course, these are still proof-of-principle types of things, but different groups, Siewert Marrink's probably the most relevant one, are dealing with systems close to 1 billion, and the expectation is to simulate a minimal cell with everything there. Of course, the information they are going to get will not be ultra accurate or ultra exciting, but this is a proof. This will be done sooner or later, you know. If we have enough computer power and the functions are good enough, it will be possible to actually get, you know, well, almost a sort of first-principles simulation of the physics of the cell, something that was just impossible to dream of five years ago. So we are going in this direction, to actually make the system more complicated, closer to reality.

Milosz:

mm

Modesto:

You know, a few years ago, people simulated a piece of, uh, 50 base pairs of DNA, and that was fantastic, no? And now, I mean, we are simulating thousands and thousands of these base pairs. I'm sure we'll be able to simulate the entire chromatin at a base-pair resolution level. And so this is a very exciting period. Yeah.

Milosz:

Yeah, there's a very clear kind of structure to the problem that we face, right? From simulating things in vacuum to things in solution, in vitro, then maybe in cells. We are probably at the level of trying to approach the cellular level, where experiments can distinguish between single-cell and ensemble data. And then eventually we'll probably try to approach something that, uh, resembles, you know, the entire organism in its context, but this will require a lot of integration of data, right? So, right.

Modesto:

You know, it's complicated, but I used to give my students the example of physics, when physics was classical physics. I mean, when classical physics was mature, it became engineering. So when people make a bridge, they don't make the bridge and then see whether it falls down, because before making the bridge there are all the equations that show that this bridge will work. In biology, as the body of knowledge behind it increases, sooner or later it will become engineering. It will become dictated by the basic rules of physics. Something that, uh, you know, Dirac said more than a century ago. It's just a question of time. And as we increase the amount of data, the amount of information that we have, we are closer and closer to being able to explain the biological system and the biological reality. I mean, for me, it's just amazing that something so, so stupid as, you know, being able to predict the secondary structure of RNA and the stability of RNA can fuel the design of vaccines, to make vaccines a hundred times more powerful. And this is something that, from a theoretical point of view, is so evident, and then it has such a dramatic systemic impact. So we are going, as you say, to the systemic level, to the organism level, from the very small. You know, it's going to take a generation or two generations more, but it's just a question of time, I think. There is nothing magic in biology.

Milosz:

Yeah. That's a very good phrase to keep in mind. Yeah. I always feel impressed by, on the one hand, you know, how much crosstalk there is in a cell, so how everything depends on everything else, and how, at the same time, as you mentioned in the vaccine case, we can still make a single molecule that goes somewhere and does something, right? So I think the integration of data is really the frontier that you're trying to point at. As you say, for example, we can look at the physical location of the chromatin, then we can look at the modifications, then we can look at dynamic processes and bring it all together. Uh, which is something we had so far always looked at in isolation, so to say.

Modesto:

It's that science always follows this reductionist approach. It has been driven by structural biology, really, for many years. You start from the very small, and then you have all the pieces, and now, I think, it's time to combine all these pieces and really make the entire puzzle, you know, make the entire picture. I would say that we are approaching this situation. Of course, we still need to be guided by experiments. I mean, we are not self-sustainable. We need to use all the techniques to limit the complexity of our ensemble space. But take a look back. It's a good exercise. I always tell my students: take a look at papers that were published 10 years ago, and you will see how primitive, you know, the science may be good, but how simple everything was. And then take a look at one recent paper and see, you know, wow, the complex picture, how everything is integrated, how experiments are mixed. You know, it's a very clear change.

Milosz:

Yeah, and that's a great segue to the second part of the conversation, which is, uh, you suggested that we should be thinking about how to integrate all those things, all those pieces of information, all these data. And you're also deeply engaged in some of those efforts, right? So how can you imagine we will move on as a community, integrating data and making better use of what is already known?

Modesto:

Yeah, this is a very, very important point. The point is that somehow our field is stuck in the 70s, you know. I was not there in the 70s, it was a little bit later, but at the time when I started, you ran a simulation or you ran a calculation, then you wrote a paper, and that's it. And you published the paper eventually. And the point is that a lot, a lot of information is lost. And this was because somehow we were not confident at all in the type of things that we were doing. I'm thinking now of some of the work published during the pandemic: the magnitude of the effort that was made, the type of data that could be derived, and how this data could guide new experimental efforts. All this cannot be wasted. And this is something that even the Protein Data Bank recognizes. I'm sure that in four or five years it will be not only, you know, a structural genomics database, it will also be a simulation database, coordinating everything with everything. It is important. If our simulations are useful, our simulations should stay there. It's the same as sequencing a guy: if it's useful, it's not only because, you know, it solves a problem in pathology, but because all this information may be useful for something else in a few years. And all this is actually making the simulation and the results of the simulation a central resource for further developments, you know. And this is a different way of working, a different way of thinking: the simulation, or in general the theoretical calculation, is something that you do, then you process, you try to get the most out of the results that are there, but it should be kept, because then another theoretician or another experimentalist can look at the data and actually get information that you didn't get. You cannot be so arrogant as to believe that you analyzed everything that is in the simulation. This is absolutely incorrect.

Milosz:

Yeah. So even the recent thing that, well, you published, our lab published, the database of simulations. This has the potential to be quite revolutionary, in the sense of giving people access to all the simulations that have been made, to maybe learn something out of a collective body of knowledge like this, right? We, I think we

Modesto:

Yeah. Yeah. And it's

Milosz:

we haven't really explored the possible consequences of having all this data at our fingertips.

Modesto:

I think we didn't trust that our simulations were veritable. So I was doing some kind of back-of-the-envelope calculation, right? You know that we are working with the Ascona B-DNA Consortium, the ABC consortium, which we have been involved in for many years. And the last one, I mean, it was an effort of 10 groups, now I think they are close to 30, and in the end we published a paper, and the paper was something on bimodality or whatever, you know. It was okay. But in this case, we keep all the data in a small database, and from the data that we obtained there have been probably around 100 papers that were using this data. We were using it, in our case, for developing coarse-grained models for DNA, but other people used it for many other things, you know, from interpreting force microscopy data to interpreting circularization experiments to interpreting chromatin, uh, nucleoids, everything. So you cannot predict why and how the data will be used. When a crystallographer or an EM person or an NMR person puts a structure in the PDB, he has no idea. It's just science, and it's data, it's information, and he doesn't have an idea what people will use it for. But he trusts that the results are right, are correct, are useful, and this is something that we should start doing. Okay, trust our results. If they are wrong, no problem. Someone will tell you that they're wrong, but it can really help in many, many different things, from drug design to very biophysical studies, even to cell biology, very basic mechanisms, microbiology. Many things can be derived from these data, if the data is shared, if it is in a good format, yeah. It is a kind of maturity in the field.

Milosz:

You bring up an interesting question. I always thought that there's enough incentive to make those things public, because if other people refer to your work, you are being cited. And that's kind of the incentive structure of academia, right? But people can also fear being revealed as someone who made a mistake or misconstructed the system.

Modesto:

But this is the same thing that happened in the beginning with the Protein Data Bank. There are many structures in the Protein Data Bank that are wrong, and in the beginning there were many that were wrong, and then these were detected, and they were removed and they were corrected. So in our case, yes, you can actually more or less recognize which simulation is clearly wrong, you know, just by looking at the profile. But sure, there will be simulations that look right but get to the wrong solution, the wrong conclusion, for different reasons. But this is science.

Milosz:

I remember always being quite paranoid as a young person, like younger than I am now, about possibly building a system in the wrong way and, you know, coming up with a conclusion that's completely based on a false assumption. But then, as you say, it's part of the process. We should try our best to avoid mistakes. But when the mistakes are there, it's not the end of the world, as long as you can identify them and call them out and maybe refine things.

Modesto:

It may happen. You know, I mean, I have published 500 papers. I don't have a single retraction, but it may happen in five years that someone finds that they did something wrong and they have to retract the paper because they made a tremendous error. I don't know. Um, it may happen, but it happens continuously in experiments and nobody really, um, you know,

Milosz:

Simulations, most of the systems, are approximations anyway, usually quite coarse approximations, with protonation sites, with, I don't know, some cleaved fragments, or glycosylation, or all those things we don't have most of the time. So, yeah, there's always an aspect of something being slightly wrong. The question is how wrong, right? Okay, so great. Modesto, it was a pleasure to have you on the show. Thank you for sharing your insights and for taking the time to be here.

Modesto:

Okay, thanks a lot, Milosz.

Milosz:

See you soon in the lab.

Thank you for listening. See you in the next episode of Phase Space Invaders.