Phase Space Invaders (ψ)

With the convergence of data, computing power, and new methods, computational biology is at its most exciting moment. At PSI, we're asking the leading researchers in the field to discover where we're headed, and which exciting pathways will take us there. Whether you're just thinking of starting your research career or have been computing stuff for decades, come and join the conversation!

Episode 1 - Pilar Cossio: Modeling experimental setups, overpublishing, and maintaining code

February 20, 2024 • Miłosz Wieczór • Season 1 • Episode 1

In the first episode, Pilar Cossio and I discuss the radical progress in integrating simulations with experiments, and the excitement about recent progress in modeling cryo-EM tomography data. We share thoughts on the sustainability of our publishing practices, and comment on the challenges of funding the maintenance of scientific code libraries.

Milosz: 0:00

Welcome to the first episode of the Phase Space Invaders podcast, where we explore the future of computational biology and biophysics by interviewing researchers working on exciting, transformative ideas. Joining me today is Pilar Cossio, a project leader in the Center for Computational Mathematics at the Flatiron Institute in New York City. Pilar kindly agreed to be the first guest of the podcast, as about two years ago we started working on some common projects. We met at the first in-person, post-pandemic Biophysical Society meeting and ended up applying a method for calculating kinetics, co-developed by her, to simulations of the spike protein I was running at that time, pretty much like everybody else. In fact, Pilar has an incredible history of working on computational models that keep bringing us closer to the world of experiments, and she's been collaborating with researchers across the board and around the world to bring together the best of the two domains, getting our simulations ever closer to the reality out there. And so Pilar is giving us a peek into the future of in situ cryo-EM tomography and how computational methods can give us insights into the molecular organization of living cells. On top of that, we get to discuss the current state of scientific coding, as the writing, testing and maintenance of software is becoming an increasingly tedious and chaotic component of our everyday scientific work. So if you're anxious to hear more about it from Pilar Cossio herself, let's go.
Pilar, I remember a moment when quite a few people who have been working on integrative biophysics, so reconciling data from different experimental and computational sources, and on free energy methods, suddenly started saying, "Oh, have you seen, you know, that paper from Pilar?" And that was the idea of Bayesian reweighting of cryo-EM data. But you have worked along these lines for quite a while, and together with some of my personal idols. So can you first briefly summarize your path to becoming a group leader at Flatiron?

Pilar: 2:21

Yes, sure. So I am a physicist by training. I was born in Medellín, Colombia, and that's basically where I grew up and where I did my undergrad studies. I really liked almost all the areas of physics, and so I was doubtful what to do. And so I had the luck of going to Erasmus Mundus in Paris, and I took a lot of courses like astrophysics, particle physics, condensed matter, and among them, I took introduction to biology for physicists. And I was really, really amazed with the world of biology, and these things where the biologists typically tell stories of "here comes the ribosome, and here comes the tRNA, and then things move around." And I'm like, how does this work? How can biology work from, like, a physical perspective? And so it really opened the curiosity of trying to understand biology with a physics perspective. And so I got into a PhD program at SISSA in Italy, and I worked in computational biophysics with Alessandro Laio, working on protein folding, on enhanced sampling, basically using atomistic simulations to understand protein folding. And then I started my postdoc at NIH with Gerhard Hummer. And there we started thinking about experiments, analyzing experimental data, and trying to basically develop tools to extract information from these types of experimental techniques. And in particular, I worked on cryo-electron microscopy, I still work a lot on cryo-electron microscopy, and also on force spectroscopy, on pulling. So basically very different perspectives of how we look at biomolecules: from simulations, from single-molecule dynamic information that is very low resolution, and from the high-resolution perspective of snapshots that are very noisy, but where you don't have time information. And so basically after that I also went to Germany, did a postdoc, followed Gerhard when he became a Max Planck director there, and then had an independent group leader position, Max Planck, and back home in Colombia.
And after a lot of diffusion around the Atlantic, I'm here, since two years ago, at the Flatiron Institute, basically trying to join forces between simulation techniques and experimental techniques, and in particular cryo-EM.

Milosz: 4:52

So as you say, it's a very complicated journey. It might look like free diffusion from the top. But it's really amazing how physicists in general go into any field and make huge contributions by using the tools of physics, always getting inspired by the mysteries of other fields like biology, as you say, right? Yes. I see this in anything from sociology to chemistry to any field you can imagine. So, yeah, you already hinted at this with your scientific interests, but what are the areas that you see as the most promising for the next several years?

Pilar: 5:35

So I am very excited about finally having simulations and theory hand in hand with experiments. Before, like 10 years ago or something, my feeling was that it was the experimentalists that were leading. They would do some type of experiment and then didn't understand something, or wanted a slightly different, complementary view. And then they would go and ask the theoretician or the computational person: "Oh, can you please help me explain this?" or "Can you please do this simulation?" So it was always like the theory was behind. And I think now theoreticians are starting to guide the experiments more, and the experimentalists and the computational people are working hand in hand, and I think it's going to be a path that has to be taken together. And so I'm very excited about that. Yes, I think that the computational power, the computational resources, and all the new advances in machine learning type of algorithms, where you can now handle tons of data, where you have functions that are very complex and you can fit very complex phenomena, and where you also have some generative models or forward models from the physics perspective that give you an idea of how the experiment is really taken, all of these things will help us a lot.

Milosz: 7:01

As I say, I share the frustration sometimes when people come from the experimental side with everything already done, and they just expect a sort of explanatory story. It even has a name, HARKing, if I remember correctly, so hypothesizing after the results are known, yes, where we were often left with this kind of "here is what the result has to be, find a way to make it this way." But I agree also that these days we have much better tools. I mean, you have been involved in actually modeling the experimental setups, in a way. So if we're talking about single-molecule spectroscopy, for example, you explicitly try to model, oh, I don't know, the physics of the linker, or the kinetics of the experimental additions, not just the molecule, right?

Pilar: 7:51

Yes. Yes, so if we really want to extract good, accurate information about these tiny biomolecules that move around, that are like frozen in ice, I think we need good forward models, good biophysical models of: this is my protein, it is linked here, then there's a DNA tether, and then there's this big, gigantic apparatus that exerts force on it. And obviously all of these processes are very complex. But I think one of the arts is having the minimal representation of factors that best describes the experiment, and that's not trivial at all. Yes. But very important.
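Editor's note: as an illustration of what a minimal forward model for one piece of a pulling experiment might look like, the sketch below uses the worm-like chain interpolation formula of Marko and Siggia to relate the extension of a DNA tether to the force it transmits. This is not the model discussed in the episode. The function name `wlc_force`, the 50 nm persistence length, and the 1000 nm tether are assumptions chosen only for illustration.

```python
import numpy as np

KT_PN_NM = 4.11  # thermal energy at roughly 298 K, in pN*nm


def wlc_force(extension_nm, contour_nm, persistence_nm=50.0, kt=KT_PN_NM):
    """Worm-like chain (Marko-Siggia interpolation): force in pN at a given extension.

    extension_nm and contour_nm are in nanometers; persistence_nm defaults to a
    value commonly quoted for double-stranded DNA (an assumption here).
    """
    x = np.asarray(extension_nm, dtype=float) / contour_nm  # relative extension
    if np.any((x < 0) | (x >= 1)):
        raise ValueError("extension must satisfy 0 <= extension < contour length")
    return (kt / persistence_nm) * (0.25 / (1.0 - x) ** 2 - 0.25 + x)


if __name__ == "__main__":
    # Hypothetical 1000 nm tether stretched to 50%, 80% and 95% of its contour length.
    for ext in (500.0, 800.0, 950.0):
        print(f"{ext:6.0f} nm -> {float(wlc_force(ext, 1000.0)):.2f} pN")
```

A fuller forward model would also have to describe the protein, the linker attachment, and the apparatus itself. The point of the sketch is only that each component enters as a small, explicit function with a handful of physical parameters.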

Milosz: 8:36

Yes. And it really exposes our sometimes lack of understanding of even just the fundamental physics of the experimental setup, right? I remember these discussions around, for example, what is the actual cooling rate of the matrix in cryo-EM? This is something that nobody knew, and the estimates could vary by orders of magnitude, right?

Pilar: 8:59

Many orders of magnitude. Yes.

Milosz: 9:02

And this is not something you can easily measure. So people had to rely on kind of common-knowledge approaches: is it evaporating? What are the gradients, given all the physicochemical processes that are happening?

Pilar: 9:19

Yes, and it's essential to know this, because if we don't know the cooling rate, we don't know how much our conformations are affected by where they are in the free energy landscape. And the funny thing is that when you start getting more into these types of experiments, you realize how much it's all, like, empirical. It's like: I throw this, it works, but I have no idea why; and I throw this and it doesn't work; and then I modify the setting and then something works. But understanding why it works is still nontrivial.

Milosz: 9:53

Yes, I agree. It can sometimes look like it's something trivial, like it's a third-order correction to the main issue. But then when you try to model those things, as you say, you often have a free parameter that depends on some experimental condition, and then there are limits to what you can model if you don't know those very fine details. Yes. Right, so what was the other thing that you wanted to share?

Pilar: 10:22

So I wanted to tell you, another thing that I think is going to be very exciting in my field is in situ cryo-EM. So typically, in single-particle cryo-EM, which has been around for many years, and lately there's been a huge boom in structures that have been resolved with cryo-EM, you have copies of your biomolecule in a purified sample in solution. So you know that everything you see is basically your molecule of interest. And then what you do is you flash-freeze the sample and then take images of it. And there are many well-established techniques for single-particle cryo-electron microscopy. Now, what is going to be very exciting, and what the field is moving toward, is that instead of a sample of purified protein, you freeze a cell. Sometimes you have to mill it: you have to make a very, very thin slice of this frozen cell. And then you take an image, or many images, of the cell. And so there you have everything that happens in the native environment, and it's very challenging because basically you have a lot of overlap of proteins and molecules that you have no idea what they are. The conformations are different, the cellular environments are different, so it's a field that is very nascent, and the methods are only being developed, and the techniques of what is the best way of, for example, should I take one single image or should I take a tilt series of many images? All of these things are still being explored. But I think looking at the biomolecules now in their cellular context, that's going to be amazing. That's going to be really, really cool.

Milosz: 12:19

Right. So I remember having those research questions or problems where people would not know, for example, the oligomeric state of something. And very often there is no easy way of knowing. So I can see that you could just take a snapshot and do statistics. But then, as you alluded to, there are so many molecules in a cell that it really becomes a combinatorial, probability-based problem, right? So you would have to have some sort of database of possible molecules and try to do a maximum likelihood assignment?

Pilar: 12:55

Yes. But we're not even there yet. So just identifying... you have a very crowded environment, you have no idea what's there, and there's tons of noise, because it's a thick sample, and you can't shoot your electrons for a long time. You can't irradiate your sample for a long time, so basically you have a very noisy image. And the main problem for the moment is: can I find the molecule of interest? Not even doing statistics, but can I see it? Can I find it in this very... So one molecule at a time, right? Yes, one molecule at a time, can I find it? And for the moment, it's still very, very challenging for things that are smaller than, say, a ribosome. So we can find big stuff, but we can't yet find small stuff with high precision. So this is the first problem. And then imagine that you have many, and you can find them, and then you can do statistics and see where they are located in space, in what compartments of the cell, and things like that, and what conformations they're in. So there's, like, a million questions. But for the moment, the simple question is: can we find the protein that we think is there?
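Editor's note: the "can I find it?" question can be made concrete with a toy version of template matching, where a known shape is cross-correlated against a noisy image and the best-scoring position is reported. The sketch below is a simplified 2D illustration under stated assumptions, not the method used by Pilar's group or by any particular cryo-EM package: the "particle" is a plain disk, the noise is Gaussian, there is a single copy, and its orientation is known. All function names (`match_template`, `disk_template`, `synthetic_image`) are hypothetical.

```python
import numpy as np


def match_template(image, template):
    """Return the (row, col) offset where the template best matches the image.

    Scores every offset with an FFT-based circular cross-correlation against
    a zero-mean copy of the template and picks the highest score.
    """
    t = template - template.mean()
    padded = np.zeros(image.shape, dtype=float)
    padded[: t.shape[0], : t.shape[1]] = t
    score = np.fft.ifft2(np.fft.fft2(image) * np.conj(np.fft.fft2(padded))).real
    best = np.unravel_index(np.argmax(score), score.shape)
    return tuple(int(i) for i in best)


def disk_template(size=16, radius=5.0):
    """Toy 'particle': a filled disk on a size x size grid (purely illustrative)."""
    yy, xx = np.mgrid[:size, :size]
    center = (size - 1) / 2.0
    return ((yy - center) ** 2 + (xx - center) ** 2 <= radius ** 2).astype(float)


def synthetic_image(template, position, shape=(128, 128), noise_sigma=0.5, seed=0):
    """Plant one copy of the template at `position` and add Gaussian noise."""
    rng = np.random.default_rng(seed)
    image = rng.normal(0.0, noise_sigma, size=shape)
    row, col = position
    image[row : row + template.shape[0], col : col + template.shape[1]] += template
    return image


if __name__ == "__main__":
    template = disk_template()
    image = synthetic_image(template, position=(40, 70))
    print("best match at", match_template(image, template))
```

The difficulty described in the conversation shows up directly in this picture: the smaller the template, the fewer pixels contribute to the score, so the true peak becomes harder to distinguish from peaks produced by noise and by overlapping neighbors. Real data adds unknown orientations, a crowded background, and projection through a thick sample.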

Milosz: 14:07

I can imagine. Have you tried resorting to labeling? Or is the whole idea of this to be label-free?

Pilar: 14:13

Yes. No, so you can do labeling. The problem is that if you have something that's optical, like fluorescence or something, the resolution is way coarser than basically the image that you take. So we're imaging with electrons, and so we can see higher-resolution things. But then if you have a fluorescence probe, it tells you, ah, it could be within like 100 nanometers. And that's basically the size of where you're searching. Right. Optics doesn't work that much. So basically you have this thing that's frozen, and people are trying to do some type of correlative microscopy and stuff, but it's not trivial, and I haven't seen a technique that, say, everybody now uses, no. You could do DNA labeling, like you can build DNA origami and try to put them there. But these guys are also very big in comparison to what you're looking for. So it's not trivial.

Milosz: 15:17

And I imagine it's hard to label with nanoparticles or something that would bind nanoparticles, right?

Pilar: 15:22

Especially in a cell. Yes, but people are trying. Yes, people are trying.

Milosz: 15:26

Yes, yes. And by the way, you're saying it's a thick layer of ice. Is the thickness of the layer a sort of technical limitation?

Pilar: 15:37

Yes. So this would be a thick layer of basically frozen cell. And actually there's a lot of technical development on trying to make this as thin as possible without breaking it or damaging it. And now the normal thickness of these slices of cells is around 100 to 200 nanometers. So it's not gigantic, but still, because it's a crowded environment, it makes it very difficult.

Milosz: 16:08

Yeah, it can be many, many molecules stacked upon each other. So I can see how this is a problem. But at the same time, seeing how much cryo-EM itself progressed in the last, say, 20 years, I think maybe in 20 years we'll be really counting molecules in cross sections of cells. That would be exciting. I think that's something definitely to be excited about in the near future. So yeah, changing the topic a bit: how do you think the way we're doing science is evolving, or how should it be evolving? What do you think we could improve in that regard?

Pilar: 16:46

So there's two main worries. The first one is that things get more attention in flashy social media ads than from really going deep and digging into a paper. And that worries me because, I don't know if this is true, but my worry is that this will be caught by the journals. So the journals will, like, say, ah, this person has a lot of followers, gets a lot of attention in social media, and so probably it's good to have him or her, their papers, in this journal. So I'm worried that everything is becoming much more superficial with all the social media stuff, and that science is not really dug into deeply. So I worry about that.

Milosz: 17:40

Funding policies also go in that direction, right? Yes. Funding the things that are flashy and that get a lot of media attention. Yes. So yeah, I've heard concerns about that, that people from fields that are getting a lot of this media attention are actually worried that this disrupts the faith in the field, because there's a lot of attention and then there are retractions or things like that. Yes. That eventually undermines the public trust in science. So something was another miraculous drug or technology breakthrough, and then a couple of months later, it turns out to be a hoax or a fabrication just to attract more attention. Yes. I can see that we should be definitely wary of that.

Pilar: 18:27

Another thing that worries me is the amount of information that's out there. It's like, so many publications that, say, normal people get... So one of the questions that I have is: should we just publish less? Like, say every group can only publish one paper a year, and that's it. And then you have to put all of your year's advances in that paper. I think that might be better. I don't know.

Milosz: 18:56

That's a radical proposal, because on the other extreme, I hear a lot from people who say, oh, you should publish negative results and all those things. And it kind of goes against this. I mean, I have no idea which strategy would be best. Perhaps we should have tiers of publications: there should be, I don't know, technical reports or preprints or comments that would, for example, report those negative things.

Pilar: 19:23

Yes. Yes, yes, yes.

Milosz: 19:27

And maybe there should be actual articles that, as you say, capture the most important takeaways and things that are validated and pretty clear. Yeah, I wonder if we have the infrastructure to kind of foster this sort of publishing. Well, at least with the advent of preprints, we have much more freedom to kind of self-sort into actual papers and maybe things that are more work in progress, or more technical papers.

Pilar: 19:55

Yes, yes. I think the arXiv is great, but I have to confess, I don't have time to read all the papers that I would like. My feeling is that sometimes it's, like, too much.

Milosz: 20:08

Yes, I agree with that. I definitely don't have a feeling I'm catching up with the field. But I also wonder if this can be somehow remedied by AI models that will aggregate or maybe point you to important articles if you ask a question in regular human language. There are already tools out there that can take your question and point you to interesting papers, or even comments or solutions to the question. I'm thinking if we can get to a point where there will be some sort of collective knowledge encoded in AI models. What do you think about it?

Pilar: 20:45

Ah, I don't know. So I'm a little bit worried, because you get your information filtered, and then sometimes you lose stuff. So that's my worry. It's like the balance of how much can I have that is good and reasonable, the best access to the information that I can have. But yeah, the worry is that you might lose things, or they get filtered out, or, yeah.

Milosz: 21:16

So maybe it should be kind of competitive, or definitely not a monopolistic model. We should not rely on, I don't know, Google to do all the work. I think it also comes from the fact that, you know, back in the day, very few people were actually contributing to science in terms of the global population, those from the countries that had easy access to institutes, universities, research agendas. So I think as the world becomes more equal, there's a good component to that, because actually more people are participating in science.

Pilar: 21:48

Yes, yes, yes.

Milosz: 21:50

And then we have the absolutely technical problem of how to deal with that.

Pilar: 21:53

Yes, I agree. I agree.

Milosz: 21:56

So, yeah, I think we have to think hard. I know that people tend to think that publishing models and communicating are not real scientific issues, but I think they are very much at the heart of our practices and how science will be not only perceived but also effective, right?

Pilar: 22:16

Yes. Yes.

Milosz: 22:18

And then you also mentioned something along similar lines in terms of coding.

Pilar: 22:25

Yes. So one of the questions that you asked me was what do I think is undervalued in science. And I was thinking of having a very good, functional, state-of-the-art code, say, for example, for doing molecular dynamics simulations or doing cryo-EM. I think that's undervalued. I think it's undervalued in terms of publications, because, okay, the first version of the code you can publish in a very good journal. But then, say, version 5? Not really that much. And to keep maintaining it requires tons of human power, updating it. You really need a team of people that are software engineers that can take care of your code. And I think that in science that is not so much appreciated. Like, you can't go to a university and say, okay, I'm going to apply for tenure track, and my objective is to maintain this code. I think that's rare. But we need good codes. We need good codes to do our simulations, to analyze our data. Everything is codes nowadays. And so I think it should be more appreciated.

Milosz: 23:46

Yeah, that's very true. I often talk to people from GROMACS, and I'm really impressed by how they managed to secure the funding to have the right mix of people: developers, GPU experts, actual biophysicists who know what methods have to be implemented. And that's a huge collaborative project that has been going on for many years, and it's not very frequent. Although I really think that projects like this should be done just like they do it: long-standing, long-term projects that create a sort of standard in the community, right? Because there's also the danger of having a lot of small codes that will each have their own bugs.

Pilar: 24:29

Exactly. Each one their maintenance issues. The code that only you and three others understand. And then there's another code that does very similar stuff.

Milosz: 24:39

Yes, I agree. And you have to actually write to the developer to understand the reason why this error is coming up. Yes, we've all been there. But I think at least the popularization of GitHub and GitLab has made it so much easier to maintain code, because, I don't know about your experience, but I'm often so frustrated by finding out that there is a web server that was released five years ago, and you try to go to the website and it's not there anymore, and the code is nowhere to be found. Yes. So at least these days it's better in the sense that you can download the codes. Maybe you can even download, I don't know, Python version 2.3 on a 32-bit machine and still make it work somehow.

Pilar: 25:25

Yes. No, I agree. I think GitHub is great. Yes.

Milosz: 25:29

Yeah. But there's also the human side, as you say: people who have to keep things working. And I know that for many labs it has been a funding issue, as you say. So you get funding for two years to develop the next version, but then the funding goes away and nobody is there to take care of it. So, I don't know, I've been following the European legislation at least, and they try to identify those long-standing collaborations that have to be maintained. I don't know if the US also has some explicit plans to do that.

Pilar: 26:02

Yes, I think there are some grants that are specialized for code development and code maintenance, but it's not trivial to get these, yeah.

Milosz: 26:13

Okay, yeah, so again, mostly affecting the biggest labs. Yes. With the biggest impact, which is, well, still good, but not perfect. No, I agree. I do my own code development in my free time, and I always think my code is under-tested and under-documented. So, thank you so much for your time, for your insights and comments.

Pilar: 26:37

It was great speaking to you.

Milosz: 26:40

Thanks for agreeing to be one of the earliest guests on the Phase Space Invaders podcast.

Pilar: 26:45

Okay. Thank you.

Milosz: 26:47

Thanks a lot, Pilar. Pilar Cossio. Have a great day, and thank you for joining us.

26:53

Thank you for listening. See you in the next episode of Phase Space Invaders.