Phase Space Invaders (ψ)

With the convergence of data, computing power, and new methods, computational biology is at its most exciting moment. At PSI, we're asking the leading researchers in the field to discover where we're headed, and which exciting pathways will take us there. Whether you're just thinking of starting your research career or have been computing stuff for decades, come and join the conversation!

Episode 29 - Jérôme Hénin: Free energy methods, building useful software, and human learning from biomolecular systems

April 22, 2025 • Miłosz Wieczór • Season 4 • Episode 29

Jerome starts our conversation by reviewing the history of the ABF method and its advantages compared to the main competitors, and connects it to the development of COLVARS, historically very parallel to how the development of the Plumed tool stemmed from the needs of the metadynamics community. We discuss the benefits of graphical interfaces in biomolecular workflows, and touch upon the question of connecting multiple software environments and communities. We then move on to discuss membrane systems and the challenges they pose, both historically and today, and end up on the alchemical side, talking about the latest approaches to alchemical free energy calculations from several exciting angles. Eventually, we agree that regardless of software developments, it's learning and helping others learn to understand molecular systems that's the most rewarding part of the job of a biophysicist.

Milosz: 0:00

Welcome to another episode of Phase Space Invaders. It's the 29th conversation on this podcast, and my guest today is Jerome Henin, research director at CNRS in Paris at the Laboratory of Theoretical Biochemistry. Early in his career, Jerome worked under Chris Chipot to develop the adaptive biasing force method, one of the main approaches in free energy calculations for biological systems, and has spent considerable time working on flavors, extensions, and implementations of ABF. He worked on the COLVARS library that initially provided collective variable definitions and free energy methods for the NAMD ecosystem, now also including a VMD plugin. But the interface has now been expanded to other simulation programs such as Gromacs or LAMMPS. In parallel, Jerome did a lot of impactful applied work on both lipids themselves and transmembrane proteins. So Jerome starts with the history of the ABF method and its advantages compared to the main competitors, and connects it to the development of COLVARS, historically very parallel to how the development of the Plumed tool stemmed from the needs of the metadynamics community. We discuss the benefits of graphical interfaces in biomolecular workflows, and touch upon the question of connecting multiple software environments and communities. We then move on to discuss membrane systems and the challenges they pose both historically and today, and end up on the alchemical side, talking about the latest approaches to alchemical free energy calculations from several exciting angles. Eventually we agree that regardless of software developments, it's learning and helping others learn to understand molecular systems that's the most rewarding part of the job of a biophysicist. So I think we're both equally happy to share this conversation with you, hoping you learn something new. Jérôme Hénin, welcome to the podcast.

Jerome Henin: 2:16

Thank you.

Milosz: 2:17

So Jerome, your name already came up in our conversation with Lucie Delemotte a while back, where she cited you as one of the top experts on free energy methods and enhanced sampling. And myself, I grew up in a world that was dominated by two main approaches at the time. There's umbrella sampling where we keep our system, you know, restrained at various points and kind of extract, and integrate local tendencies and Metadynamics where we systematically kick the system out of its comfort zone to make it explore more diverse configurations. But your main contribution to the field revolves around the now multiple flavors of the adaptive biasing force algorithm or ABF. And so what was the history behind it? What was the initial motivation, uh, to develop it? How does it fit in in this grand landscape as of now?

Jerome Henin: 3:06

So I think that's a very good way to look at the landscape. So, first, ABF and Metadynamics appeared roughly at the same time in 2001. But if you say that umbrella sampling localizes the collective variables somewhere and Metadynamics just, you know, pushes them around, uh, ABF takes a third way, which is to let the variables evolve spontaneously, maybe quote unquote, or diffuse. So preserve the diffusive behavior of these variables, but, uh, flatten the barriers along the way. So this was designed by Andrew Pohorille, who was really a physicist and had a physicist's approach. And so he was concerned with keeping, like, equations of motion that behaved as much as possible as they would in the unbiased system. So there was kind of, you know, minimizing the perturbation, um, that was the idea. And so in ABF, you just, you know, let the system diffuse and have the same kind of stochastic dynamics and very smoothly erase the free energy gradient along the way. So that's the basic idea. In terms of my own motivation, it was not very precise because I was just a young grad student, and so I had no background in all that. But I was working with Chris Chipot at the time. He was my mentor. He did a postdoc with Andrew Pohorille. So he got exposed to the very early idea of ABF and offered me to work on this new exciting method at the time, and implement it in NAMD, which sounded like fun. So that's how I got started. And then, you know, I became a believer in the strength of this approach and mostly the approach of working at the level of the free energy gradient. I think that's very powerful because if you look at the free energy gradient, you can have a purely local estimate that's very significant. But if you try to estimate free energies, you need a non-local estimate to get something significant, 'cause otherwise it's just, you know, up to an additive constant. So any information you get on the local free energy gradient is something that's gained for good, I guess, something you've learned about the system.
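Editor's note: the idea of accumulating a local free energy gradient and erasing it on the fly can be illustrated with a small toy model. The sketch below is not the NAMD or COLVARS implementation; it assumes overdamped Langevin dynamics with kT = D = 1 on a hypothetical two-dimensional potential U(x, y) = h (x^2 - 1)^2 + (k/2) (y - x)^2, where x plays the role of the collective variable. For this potential the free energy along x is h (x^2 - 1)^2 up to a constant, so the barrier should come out close to h. All names and parameter values (run_abf, full_samples, the bin layout) are illustrative choices.

```python
import math
import random


def run_abf(n_steps=300_000, dt=1e-3, h=5.0, k=10.0, seed=1):
    """Toy ABF along x on U(x, y) = h (x^2 - 1)^2 + k/2 (y - x)^2, with kT = D = 1."""
    rng = random.Random(seed)
    lo, hi, n_bins = -1.5, 1.5, 30
    width = (hi - lo) / n_bins
    force_sum = [0.0] * n_bins  # running sum of the instantaneous force on x, per bin
    count = [0] * n_bins        # number of samples per bin
    full_samples = 200          # the bias is ramped up until a bin has this many samples
    x, y = -1.0, -1.0
    noise = math.sqrt(2.0 * dt)
    for _ in range(n_steps):
        fx = -4.0 * h * x * (x * x - 1.0) + k * (y - x)  # -dU/dx
        fy = -k * (y - x)                                # -dU/dy
        bias = 0.0
        if lo <= x < hi:  # no bias is applied outside the binned range
            b = min(n_bins - 1, int((x - lo) / width))
            force_sum[b] += fx
            count[b] += 1
            ramp = min(1.0, count[b] / full_samples)
            bias = -ramp * force_sum[b] / count[b]  # cancel the running mean force
        x += (fx + bias) * dt + noise * rng.gauss(0.0, 1.0)
        y += fy * dt + noise * rng.gauss(0.0, 1.0)
    # Local free energy gradient estimate in each bin: dA/dx = -<F_x>
    grad = [-(force_sum[b] / count[b]) if count[b] else 0.0 for b in range(n_bins)]
    return lo, width, grad, count


def integrate(width, grad):
    """Integrate the binned gradient into a profile on the bin edges, shifted so min = 0."""
    profile = [0.0]
    for g in grad:
        profile.append(profile[-1] + g * width)
    base = min(profile)
    return [a - base for a in profile]
```

Calling run_abf() and then integrate(width, grad) gives a profile on the bin edges; with the default grid, edges 5, 15 and 25 sit at x = -1, 0 and 1. The gradient in each bin is meaningful on its own, while the profile only exists after integration and a choice of zero, which is the local versus non-local distinction made above.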

Milosz: 5:21

All right. That's a fair point and a good distinction between the methods. And then over the years you built multiple flavors of ABF, right? So just as Metadynamics, say, started evolving by incorporating kind of well-tempered, multiple walkers, so did ABF. Can you summarize what are the most successful approaches that you would recommend, for example, or you would use today?

Jerome Henin: 5:45

Yeah, I mean, so I think it went into several directions. One direction was to make it applicable to more systems and more types of coordinates because, uh, the initial formulation was a bit, uh, demanding in terms of mathematics. So you needed, essentially, to compute a Jacobian derivative. So it was kind of based on second derivative calculations, and that was a bit heavy. And there was, I mean, there were other formulations, but, so that was one part of the work, which led to the, um, extended-system variant. And then another part of the work, uh, really was to get better exploration behavior. And that has largely occupied, uh, Chris Chipot and his, uh, coworkers with all these combinations of ABF and different flavors of Metadynamics, because Metadynamics has this property of never getting stuck or being very good at getting out of pitfalls, I guess. Whereas with ABF dynamics, if you have a slowly relaxing orthogonal coordinate, then ABF is trying to estimate a local free energy gradient, but that local estimate is drifting slowly over time as things relax. And because this estimate is drifting slowly, it's never really up to speed with what the coordinate is currently feeling. And so you get trapped. So that's a property of ABF that I think is kind of a, you know, a double-edged sword or maybe a blessing in disguise, because, sure, your dynamics is getting trapped, but it's a very useful diagnosis. It tells you that your free energy estimate is somehow incorrect or slowly converging. So what I've always appreciated about ABF is that ABF will not confidently give you a bogus answer. Whereas, you know, you could say if you run umbrella sampling, umbrella sampling always gives you an answer and then you need to dig, uh, if you want to make sure that it's significant. So with ABF, the quality of the answer is directly visible in the ABF dynamics. You know, if the ABF dynamics is diffusive and you get something, uh, more or less flat, then you know that your answer is correct.

Milosz: 8:02

Oh yeah. If you spend enough time on any simulations forum, you'll see quite a lot of implausible

Jerome Henin: 8:09

mm-hmm.

Milosz: 8:10

that people get from umbrella sampling.

Jerome Henin: 8:11

Absolutely.

Milosz: 8:12

That's definitely true. There's always something that's coming out.

Milosz: 8:16

No, this kind of self consistency is, a nice property. I can see how that helps that to, to like know that you're not getting what you

Jerome Henin: 8:25

mm-hmm.

Milosz: 8:25

getting. And so, What are, the common pitfalls that you, kind of, you already mentioned that, of course getting stuck is one common problem. Right. But especially when you go to the multidimensional, aspect of it. Right. Um,

Jerome Henin: 8:39

That's right.

Milosz: 8:40

We kind of, we have a very simplified understanding of multidimensional free energies, I believe,

Jerome Henin: 8:47

Mm-hmm.

Milosz: 8:47

and probably you can have a much more refined narrative about it. You can share.

Jerome Henin: 8:53

Yeah. So the assumption when you do one of these methods that enhance the dynamics in a low dimension space is that, so I guess you just start from the perspective of configuration sampling. You know, the idea is that you have a very high dimension space, but the relevant areas, the regions of high probability, are located in a low dimension sub-manifold. And so you're saying, okay, our simulation is going to evolve in this low dimension manifold. Um, but then if you want your enhanced sampling dynamics to work well, that means that you have to parameterize this low dimension manifold with coordinates that will also reproduce the dynamics well. So in other words, you need a good description, a good projected dynamics. So that's something that I got interested in recently, mostly, uh, talking to Fabio Pietrucci, whose work is, I mean, part of his work is fitting low dimension dynamical models, like a low dimension Langevin equation, to the projected dynamics, so like collective variable dynamics from a high dimension simulation. And that's a tricky thing to do, but when it works, it's very rewarding because then not only do you have a convincing, you know, statistical model, you have a convincing low dimension dynamical model. I think that's the ideal setting. So of course, that requires two things: that the actual dynamics is actually well described in a low dimension space, and that you can find that space and parameterize it properly. So that opens the whole Pandora's box of collective variable discovery, which is a field in itself. And it's funny 'cause, you know, many people work in that field and there is, you know, I don't know, it's a huge literature and I haven't done a lot of work myself. And I tend to work, like, downstream from that. Like, once you have collective variables, then I provide tools to work with them. Just because there is so much work, I'm not sure in what way I could contribute at this point, you know, and everybody has to find their own little niche. Um
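Editor's note: fitting a low dimension Langevin equation to a collective variable trajectory can be sketched in its simplest form as follows. This is not the method used in the work mentioned above; it assumes an overdamped model with a linear drift and a constant diffusion coefficient, dx = -kappa x dt + sqrt(2 D) dW, and a trajectory sampled at a fixed time step. The synthetic trajectory stands in for a real projected trajectory, and the function names are hypothetical.

```python
import math
import random


def simulate_ou(n, dt, kappa, diff, seed=0):
    """Synthetic 'collective variable' trajectory from an overdamped Langevin model."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * diff * dt)
    x = [0.0]
    for _ in range(n - 1):
        x.append(x[-1] - kappa * x[-1] * dt + sigma * rng.gauss(0.0, 1.0))
    return x


def fit_overdamped_langevin(x, dt):
    """Least-squares estimate of (kappa, D) from the increments of one trajectory."""
    xs = x[:-1]
    dx = [b - a for a, b in zip(x, x[1:])]
    sxx = sum(v * v for v in xs)
    sxd = sum(v * d for v, d in zip(xs, dx))
    kappa = -sxd / (sxx * dt)                       # slope of the drift, <dx | x> = -kappa x dt
    resid = [d + kappa * v * dt for v, d in zip(xs, dx)]
    diff = sum(r * r for r in resid) / (2.0 * dt * len(resid))  # <resid^2> = 2 D dt
    return kappa, diff
```

On real data the hard part is the one described above: the projected dynamics is only Markovian on a well-chosen variable and at a suitable time resolution, and the drift and diffusion generally depend on position, so a fit like this is a starting point rather than a validated model.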

Milosz: 11:03

because we often just look at the system and we think we know what the collective variable should be, right? Okay. We take this distance that's changing. but then, yeah, I, I, I've been in this situation so many times where you, you think you're simulating something and then you look at a simulation, you realize, okay, the simulation is clearly doing something that I didn't expect, right? Because.

Jerome Henin: 11:20

Absolutely.

Milosz: 11:21

You find yourself in a very different region of the phase space or the configuration space than you were imagining in your head. So how close are we to, like, automating or maybe improving these things in a robust way?

Jerome Henin: 11:36

Yeah. Right. My first question would be like, how much do we want to automate them? Uh, I think this first step where you say like, oh, I think this distance must be changing, it's an important step. And I think it means that there is a scientist somewhere looking at the system and developing ideas and, you know, mental models of what's happening. So, uh, I like that. And that's what I talked about in this paper from a couple years ago, whose title is like human learning of collective variables or something like that, you know, that's as opposed to machine learning. So I think we need humans to learn. Otherwise, I'm not sure what we're useful for. So in this case, okay, you have an idea about some collective variables. You run a simulation and then something happens that you didn't expect. To me, that's good news, right? When something happens that you didn't expect, it means you're learning something. That's kind of the point. But of course there's a tedious process then of adjusting your computational model, uh, with what you've learned, and then running again, maybe learning something new and everything. And so the whole iterative loop of refining your collective variable choice and running more simulations is exactly what I wanted to streamline when I started working on this VMD plugin, this COLVARS dashboard, which is all about working with the data set that you have and exploring freely, I would say freely but manually, the space of collective variables, to extract maximum insight from it. And then refine your own, uh, low dimension model of what's happening.

Milosz: 13:10

Right. We can talk about COLVARS a little bit 'cause this is something, as you mentioned before, that's kind of parallel to Plumed, serves a very similar, uh, purpose. Right. And you say it was again developed roughly at the same time and it's now the default free energy plugin for NAMD. Right. But also has been recently implemented in Gromacs. And what else? Uh,

Jerome Henin: 13:33

Uh, also LAMMPS. So LAMMPS is very good because it's really used in different communities of materials, you know, like soft matter. So that really gives people from different communities access to those algorithms. And then recently Tinker-HP, which implements in particular, um, some polarizable force fields really well. So that's, again, that kind of extends the type of applications you can do with these methods. So the spirit of COLVARS, as of Plumed, really was to remedy the situation that existed before, which was that different codes implemented different methods, and for each method, like collective variable based methods, they implemented some collective variables. And so you had this very, very fragmented ecosystem where, given the molecular dynamics engine you wanted to use, you had some methods available, and for each method you had some choice of collective variables. But all of that was extremely limited. And so the idea was to make that extremely modular, just defining variables on one side and different biasing algorithms on the other, and possibly having a flexible interface, so these codes could be interfaced with different MD engines, and that's, yeah, that's what's achieved by these codes.
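Editor's note: the modular split described here, variables on one side and biasing algorithms on the other, can be sketched in a few lines. This is a hypothetical illustration of the design idea only, not the COLVARS or Plumed API: a collective variable returns its value and its gradient with respect to atomic positions, a bias only ever sees the value, and the chain rule turns the bias force on the variable into atomic forces that any MD engine could add to its own.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Vec = Tuple[float, float, float]
CVResult = Tuple[float, Dict[int, Vec]]


def distance_cv(i: int, j: int) -> Callable[[Sequence[Vec]], CVResult]:
    """Collective variable: distance between atoms i and j, with its gradient per atom."""
    def evaluate(pos: Sequence[Vec]) -> CVResult:
        d = [pos[j][a] - pos[i][a] for a in range(3)]
        r = math.sqrt(sum(c * c for c in d))
        grad = {i: tuple(-c / r for c in d), j: tuple(c / r for c in d)}
        return r, grad
    return evaluate


@dataclass
class HarmonicBias:
    """One possible bias; it knows nothing about atoms, only the CV value."""
    center: float
    k: float

    def force_on_cv(self, value: float) -> float:
        return -self.k * (value - self.center)


def biased_forces(cv, bias, pos: Sequence[Vec]) -> Dict[int, Vec]:
    """Chain rule: force on each atom = (bias force on the CV) * d(CV)/d(position)."""
    value, grad = cv(pos)
    f = bias.force_on_cv(value)
    return {atom: tuple(f * g for g in gvec) for atom, gvec in grad.items()}
```

Because the bias never touches coordinates, swapping HarmonicBias for any other object with a force_on_cv method, or distance_cv for another variable, requires no change to the rest of the code, which is the point of the modular layout.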

Milosz: 14:46

Now this had definitely great developments. I had Max Bonomi before, and we

Jerome Henin: 14:49

Mm-hmm.

Milosz: 14:49

about, how Plumed contributes to that goal. And it's also great that you, you can move

Jerome Henin: 14:55

Mm-hmm.

Milosz: 14:56

part of the space of molecular engines, right, like NAMD and uh.

Jerome Henin: 14:59

Yeah. Absolutely.

Milosz: 15:00

and everything can be just plug and play

Jerome Henin: 15:03

Yeah.

Milosz: 15:04

I also told you before, but I want to credit the COLVARS dashboard with inspiring me to, you know, believe that you can actually write a plugin into VMD, uh, that could be useful. So yeah, I'm really, really happy that it's out. And yeah, actually I wanted to mention this kind of utility of graphical interfaces in what we do, right? Because many of us are people who work with the command line. We are so used to doing things the hard way, and that's probably a very convenient tool for people who have a lot of experience with command line tools, right? How do we make things easier for us, though, by integrating visualization, and by using those, uh, tools like VMD to make sure that things are visual and accessible?

Jerome Henin: 15:56

Yeah, I think that goes beyond that. It's true that having a visual interface or a graphical interface essentially gives a smoother learning curve and makes it more accessible to more people. But sometimes, no matter how experienced you are, some tasks are just easier in an interactive way, in a graphical way. So the reason I started working on this COLVARS dashboard is that I had been dreaming of a tool that would, you know, I was getting tired of doing these things manually. And, uh, so I wanted shortcuts. And those shortcuts took the form of a graphical interface I imagined. And at some point I got tired of waiting for someone to write it, and I sat down and wrote it. And I was real surprised because I wrote a first prototype that was maybe like, I don't know, 200 lines of code. I still have it lying around somewhere. And these 200 lines of code were like just one little window with a few buttons. And it immediately became useful to me on a day-to-day basis. And so I loved it and I just kept improving it and extending it until I wanted to, uh, unleash it upon the world. And that's where it becomes really interesting, 'cause once you distribute it. So that's another thing in my work, is that I like to write code that is as widely distributed as possible and to get as much feedback from users as possible because I notice that I do much better work when I'm connected with people who use it for real stuff. And sometimes I write code and end up, like, many years later realizing that the interface I designed, you know, sounded good in my head, but really it's not helping people or it's not at all the best solution, but because I've, you know, done something in a disconnected way from users, then it's very suboptimal. So I try to be as connected as possible. That's one thing I learned, to avoid doing too much work on my own, you know, without talking to users.

Milosz: 17:50

Oh yeah. That's great advice. I have been working with Barry and Diego from the VMD

Jerome Henin: 17:56

Mm-hmm.

Milosz: 17:57

And having their feedback on like what they consider good practices and what users want and how streamlined things should be, was a huge change from

Jerome Henin: 18:07

Mm-hmm.

Milosz: 18:07

exactly. My own mentality there. So like I can definitely this contribution of people who are routinely working with those tools and are connecting with users and, you know, know the user experience side. There's this common question I think, and I've been talking about it recently, about, you know, when do you want to start writing your own software versus contributing to someone else's software? And yeah, I think it is a kind of timely question, 'cause there are so many packages now that you can contribute to. But then sometimes you want more control over something. I think you struck the right balance where you have a module that is a part of a bigger package. Right.

Jerome Henin: 18:47

Mm-hmm.

Milosz: 18:48

In this way, you have control over the module that is still useful to people who make it popular and distribute it. And in this way it really has the best of the two worlds.

Jerome Henin: 19:00

Yeah, that's right. And really, getting our library module integrated into Gromacs was a very interesting experience and honestly, a kind of difficult one because Gromacs is this very, I mean, recently they've raised their software engineering standards quite a lot, and so they have a lot of demands, uh, you know, like they have a, uh, well thought out reviewing process and testing process and everything. And so getting code into Gromacs, like, you know, one does not simply send code into Gromacs. And, you know, we already had this well developed library, pretty large and dealing with a lot of constraints from other packages we interact with. And interacting with VMD, you know, VMD was at the time limited, we had to use pretty old compilers and old language standards so we could not use the more modern stuff, but Gromacs pretty much demanded the more modern stuff. And so we had to strike a compromise and we had to negotiate a lot of stuff with Gromacs developers. Uh, fortunately, you know, we had to do a lot of work on the interface, but the library was well separated enough that it was our own turf. And so what happens in the library is basically of no concern because it doesn't contaminate the Gromacs code base. So yeah, it was convenient to have that, even though the interface gave us a lot of work. It took several years to get it smooth.

Milosz: 20:25

I see. I still haven't tested it in Gromacs, but, um, yeah, a big shoutout to those guys, 'cause I've been to one of those writing-code-in-Gromacs tutorials, and they really put a lot of effort into streamlining the standards

Jerome Henin: 20:38

Mm-hmm.

Milosz: 20:39

them modern and, and kind of explicit.

Jerome Henin: 20:43

Mm-hmm.

Milosz: 20:44

as you say, now we can just take all the standards and start writing code for them, but it's still a lot of review, a lot of

Jerome Henin: 20:50

Yeah. And even though they've modernized things, they still have, you know, the code still has a long history and so there is a lot of legacy code and design in there. And so, you know, it is very complex. I mean, all of these major packages are very complex, so it takes a lot of time to get used to them.

Milosz: 21:11

Oh yeah. Very often people ask me like, oh, do you write your code for simulations? I'm like, you don't know people you know who are outside of science and just hear that, oh, I

Jerome Henin: 21:18

Yeah,

Milosz: 21:20

Do you write that in Python? And I'm like, you know, that would be, you can write something that runs in Python, but uh, yeah, that's not going to be competitive with

Jerome Henin: 21:30

uh, for sure.

Milosz: 21:32

Yeah. So a lot of the work you've done was also driven by the requirements of the biological systems, right? And for you, these were mostly membrane systems where you have this really slow lateral diffusion and, um, of course you want to enhance the sampling of certain things, binding, unbinding. Um, what were the best or most interesting takeaways that you explored there?

Jerome Henin: 21:55

Yeah, so that landscape has changed a lot. When I started as a grad student, the challenges were immense. So as you said, the systems were slow. So the sampling challenges were just, you know, we had big membrane proteins and lipids and stuff, and we ran a handful of nanoseconds and hoped that something would happen, right? And it did happen and we did get, you know, some insight on some local processes. But then the modeling challenges were huge. Uh, the force fields were not that great, especially lipid force fields. And the thing with lipid bilayers is that they're exquisitely sensitive, like mechanically sensitive, because a bilayer exists as a balance of lateral forces, and you have very strong repulsion forces and attraction forces. So if you look at the pressure profile, you have like positive pressure and negative pressure, and the balance between these, uh, positive and negative pressures gives you a certain area per lipid, and a certain bilayer structure, and that is not just like a random, you know, unimportant parameter. Because if you change this balance, you might go into a gel phase. So you might get completely different overall behavior. So to get that balance right, you need to get this very large positive number and very large negative number exactly right, because a very small relative error will give you completely broken bilayers. So when I was a grad student, lipid models were a big deal and I started pretty early on working with the CHARMM family of, you know, lipid force fields, which were and are still, like, spearheaded by Rich Pastor at NIH, and Rich pretty much spent his life making lipid models, and that's a level of dedication that's amazing. And I think that has paid off, because I think the CHARMM lipid models are the best out there in my view. And back then, you know, some lipid models were quite bad. Uh, and even the CHARMM force fields didn't have exactly the right area per lipid, so you had to tweak them. But, you know, they took the very difficult decision of making, like, ground-up models, like using this strong, fragment-based approach. So basically the models were not explicitly tuned to get the right area per lipid. They were tuned to get proper molecular properties and then they were hoping that they would turn out to have the right area per lipid because the physics would be right. And that was extremely difficult. And most other force fields just tweaked the parameters until they got the area per lipid right. But of course the physics was otherwise quite wrong. So anyway, nowadays we have incredible lipid models and we can simulate them for a long time. So that's cool. And then the last part of the puzzle was the membrane proteins, because of course to do membrane protein simulations, you need membrane protein structures. And back then they were few and far between. So a very common sport in membrane protein simulation was doing homology models of everything, right? Everything had to be a homology model based on something more or less related. You know, sometimes it was very, very, uh, hazardous. And so

Milosz: 25:00

would fall apart beautifully.

Jerome Henin: 25:01

Yeah, they may, or sometimes they didn't. And it was incredible, and honestly, I'm quite amazed at how well homology models have held up, you know, in that area. We've done some very sensible things based on homology models, even though some people in the field just did not believe homology models and, you know, never had, especially lab structural biologists. Uh, so that was a time when there were two kinds of experimental biologists. Some of them would not believe anything that came from modeling or simulation, and others were believers and believed everything that came out of simulations, which was way too much trust to put in us. So I think by now, you know, the field has matured a whole lot and anyone who works with molecules has a much better grasp of what simulations can do. So,

Milosz: 25:54

Yeah, we have to say that very often those transmembrane regions were actually the most, conserved in the terms of evolution, right? So maybe it wasn't ridiculous or not, even conserved, but conserved in mechanical terms or.

Jerome Henin: 26:07

Yeah,

Jerome Henin: 26:08

exactly.

Milosz: 26:09

structure, sir?

Jerome Henin: 26:09

That's right, because the sequences were a bit more degenerate because, you know, you have a smaller vocabulary of amino acids, you know, more hydrophobics and stuff.

Milosz: 26:19

Mm-hmm.

Jerome Henin: 26:19

So on the sequence level, yeah, you would have not much sequence conservation, but you're right that the folds were pretty well conserved. Um, so that was probably what saved us back then, that's true. Uh, but on some occasions, like, there were some really unfair processes at work. So I've seen people painstakingly come up with homology models that looked pretty reasonable and try to publish them and have them, you know, rejected by fancy journals. But then just after that, X-ray crystal structures came out, and as long as you had just modeling predictions, people would ignore them because they were just from modeling. And as soon as they were confirmed by experiments, people would ignore the predictions because we had the experimental results. And so it was like a lose-lose situation that could be unpleasant at times. Yeah.

Milosz: 27:15

Yeah. I see. I think well, the CASP was already running at a time, I guess. Right. But unless you make it somehow formalized that you make a prediction and you make a bet,

Jerome Henin: 27:25

Yeah. Exactly. And

Milosz: 27:25

uh, it's, it's really hard to formalize in this, uh, this

Jerome Henin: 27:29

yeah.

Milosz: 27:30

way.

Jerome Henin: 27:30

And even if you validate formally, like, biologists are just not going to be interested, you know, they were not interested in the prediction before it was validated and they were not interested in it after it was validated either, because they had their experimental result. So, and,

Milosz: 27:47

right.

Jerome Henin: 27:48

and you cannot, like, you might be able to argue to command people's agreement based on very like, well. Uh, formulated like scientific reasoning, but you cannot command their interest. You know,

Milosz: 28:00

Yes,

Jerome Henin: 28:00

and at some point it becomes just an emotional issue or something, or not rational.

Milosz: 28:05

fair to say that there was a time where we were not really in the predictive regime, right? Like where we probably had a coin flip chance

Jerome Henin: 28:12

Yeah. We were teetering on the edge of prediction. Yes.

Milosz: 28:18

so this is an attitude that people have to change now because now we're doing, I guess, much better as a community, and some people are still like catching up to that, uh, as I see,

Jerome Henin: 28:30

Yeah.

Milosz: 28:31

uh, talking to experimentalists. But, uh, yeah, they have the reasons and they might not be the worst reasons.

Jerome Henin: 28:37

But I remember a talk I gave about, I don't know, six or seven years ago maybe, about, uh, ligand-gated ion channels, which is a, you know, family of proteins I did a lot of work on, and basically that talk, it was for a mixed audience of experimental and, uh, computational researchers. And most of the talk was explaining what we predicted and how it turned out to be confirmed by experiments, and then how it predicted something else, which again turned out to be confirmed. There was like just a series of those things, trying to hammer home the message that yes, we could predict stuff. And also that maybe, you know, I'm trying to claim a bit more credit than we were given at the time, but,

Milosz: 29:19

Yeah, I mean, I think the best research from our community was always somehow predictive, but there is

Jerome Henin: 29:28

Hmm.

Milosz: 29:28

of, quality and people collaborating, uh, at different levels that I guess message might have been different.

Jerome Henin: 29:36

Yeah, that's right.

Milosz: 29:37

then, uh, going back a bit to, the ABF, but also the,

Jerome Henin: 29:40

Mm-hmm. Mm-hmm.

Milosz: 29:42

mentioned. How about the alchemical side, right? Because now you mentioned that you're kind of converging on sampling, and also alchemical simulations, for those of the listeners who don't know, are simulations where you essentially morph one molecule into another by creating or destroying dummy atoms and making this unphysical transformation, right? So now this so-called lambda coordinate for you is just another coordinate that you can sample dynamically in a simulation.

Jerome Henin: 30:13

That's right. Uh, and in our case, in most applications, we're not even morphing a molecule into another, but we're morphing it into nothing, because we use it mostly for absolute binding free energy calculations. So really computing the free energy difference between a molecule interacting with an environment and the same molecule not interacting with it. So to get a binding free energy of a ligand to a macromolecule like a protein, you want to decouple it from the binding site and then decouple it separately from, like, bulk solvent. And then comparing these two free energies gives you, essentially, the binding free energy. So yeah, that's a very old method. It's interesting. The first paper that really does this is from 1986, and it's really, literally, the affinity of a small molecule. Really, that was an inert gas, I think it was xenon. Yeah, a xenon atom to a protein site. So it's almost as old as protein simulations. Not quite, but nearly as old. And I read that paper again just a couple years ago and I, um, I'm trying to remember the name.
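As a sketch, the thermodynamic cycle described here can be written in one line. The symbols are illustrative labels rather than notation from the conversation, and the restraint and standard-state corrections that a practical absolute binding calculation also needs are deliberately left out:

```latex
% Decoupling the ligand in bulk solvent and in the binding site closes the cycle:
%   ligand (bulk) + protein  ->  complex             : \Delta G_{bind}
%   ligand (bulk)            ->  decoupled ligand    : \Delta G_{decouple}^{bulk}
%   complex                  ->  protein + decoupled : \Delta G_{decouple}^{site}
\Delta G_{\mathrm{bind}} \approx
  \Delta G_{\mathrm{decouple}}^{\mathrm{bulk}}
  - \Delta G_{\mathrm{decouple}}^{\mathrm{site}}
```

A ligand that binds favorably costs more free energy to decouple from the site than from bulk solvent, so the difference comes out negative.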

Milosz: 31:28

It wasn't Zwanzig.

Jerome Henin: 31:29

it wasn't Zwanzig. It was, uh, it is very confusing because it's by this Indian scientist who published some papers where, at some point, there was a mix-up between the first and last name. And so some papers have his first name as author name and the others have his last name. So basically he has papers under two different names. So Shankar is one of his names. Whatever it is. Um,

Milosz: 31:56

it.

Jerome Henin: 31:56

yeah, I. Yeah. Anyway, that's, and that's pretty amazing because in that paper there is a discussion of many issues that have, that reappeared like a decade later or even two decades later, and that people kept arguing. But really that guy had a pretty solid idea of how the whole thing worked even back then at the very beginning. So, yeah. So sometimes, uh, we think we're doing something new and then we're just not aware of all the old literature that's out there.

Milosz: 32:27

Oh yes. And uh, how predictive are we there now? Because I remember we always struggled to break this say, one kcal/mol barrier, right? Which is considered, say, chemical accuracy or,

Jerome Henin: 32:39

And that's interesting, 'cause really there's two branches to that question. One of them is, how well do these free energy methods work and how well can we sample? And so how low can we get the statistical error bars or the variance of our estimation? And then the second branch is what is the bias of our estimation, which is the bias of our model. So the force field mostly, but not just the force field, maybe the, um, other things that are like, hmm,

Milosz: 33:09

the binding,

Jerome Henin: 33:10

yeah, I mean the binding pose and.

Milosz: 33:12

molecules. Yeah,

Jerome Henin: 33:13

In principle, all of that falls under the sampling part. If, if we had, you know, proper infinite sampling, the

Milosz: 33:19

Okay.

Jerome Henin: 33:19

if the binding pose, so that depends if you call the binding pose just a starting point or something you use to define the binding site that you are characterizing. So recently, we've moved on. The whole binding affinity prediction work is something I've mostly collaborated on with, uh, Grace Brannigan at Rutgers, Camden. And with Grace we have this framework that we call SAFEP, uh, Streamlined Alchemical Free Energy Perturbation. And in that, really, it starts with characterizing a bound state ensemble. So the idea is you need a statistical definition of the bound state. First you need a definition of the bound state to begin with, because it makes no sense to try and quantify something that you haven't defined first in a quantitative manner. So really it all begins with running a simulation of the complex you want to characterize. Right. Just a plain MD, you know, unbiased simulation to see how the complex behaves. Sometimes, you know, depending on your choice of modeling parameters, force fields and stuff, the ligand might just run away immediately, in which case maybe there's no point in trying to estimate a binding free energy. And that really goes back to a very general principle that I really like. I forgot who told me that first. That was many years ago, and basically it is: the less you know about a given system, the less complex the method you should be using on it. So, you know, if you don't know anything about a system, start with just plain MD. Don't try to do fancy stuff, you know, because you have no idea what's going to happen and you will be unable to interpret what's happening. If you do fancy stuff, it can go wrong in so many ways. So do the simple stuff, get a rough idea, like a qualitative idea, and then you can move on to more fancy things, right? Fancy methods and enhanced sampling and whatnot. So I really like that idea.
And I think it makes sense to not try and quantify anything for a system that you don't understand qualitatively. And so again, we go back to the human learning part and, uh, it's interesting. I wouldn't say I'm skeptical, but, you know, in the ligand binding free energy prediction literature there's a lot of, um, high-throughput papers, or, you know, papers that characterize large data sets. And so you have, I dunno, many binding sites or many ligands, and you get big tables with a lot of, uh, figures. I have nothing against those papers on a technical level. I think, you know, they can be really well done and have some pretty solid statistics. But again, like what I said before, you can't command interest. And so the problem I have with these papers is that I'm not interested, you know. I don't find them fun to read because, you know, maybe they're great tables of great data, but I don't find looking at tables of data fun, because I don't feel like I'm learning anything from them. So that's the problem I had. But again, I think they're necessary though. I mean, for many applications, people who do drug design need to be able to do high-throughput prediction of affinities and so on. So I'm not claiming they're not

Milosz: 36:35

Jerome Henin: 36:36

important with

Milosz: 36:36

something like. Ablation in the sense of removing of the pipeline, right? And showing that, okay, this part of the pipeline really, really changes the result. That's

Jerome Henin: 36:46

hello.

Milosz: 36:47

But just seeing, I, I agree that just seeing a bunch of numbers

Jerome Henin: 36:51

But yeah, I think,

Milosz: 36:52

error is, is like, yeah, nice.

Jerome Henin: 36:55

I mean, I think it's also a different field. You know, I'm a physical chemist by training. So when I do these things, I like to see some physical chemistry happening, and to me that's really looking at molecular interactions in detail and with my own eyes. So that's why I always go back to VMD, you know. Whenever I have students working with me and learning how to do these quantitative predictions, my first instruction to them and my last one, I mean, something I keep telling them sometimes throughout their PhD, because some people just won't learn, is: don't run a simulation and look at a free energy value or free energy surface or quantitative prediction that came out of a simulation without first loading up that simulation into VMD and checking with your own eyes what's happening in there. Because sometimes you can get very interesting quantities and then you're not realizing that something's happening in the simulation that's completely different from what you expected. And so you're not looking at what you think you're looking at. So, yeah, I really try to get my students to learn that too. First, again, get a sense of what's happening qualitatively and make sure that it's really what they expect before looking at the numbers.

Milosz: 38:08

Yeah, we always want to know why, and some people always ask, why do we want to know why? But

Jerome Henin: 38:14

Yeah. That's the thing. Yeah.

Milosz: 38:15

of character.

Jerome Henin: 38:16

Yeah. But I want to feed my intuition with something, right? And numbers only go so far. Uh, but for example, you could be computing these, uh, ligand binding free energies, and in the meantime, you know, your ligand could be exploring a different binding pose that you didn't expect. And so you are quantifying something, but it's not what you thought you were. Or maybe, I don't know, something is happening with water molecules in the site, as you said. Maybe there's a counterion dropping by, you know, who knows? So many things can happen. So that brings us back to the question of sampling in these alchemical simulations. By far the most common way of sampling these: you have this lambda parameter, usually between zero and one, that connects two different states with different potential energy functions. And typically we discretize this, and we fix this parameter at intermediate values and we run simulations at those intermediate values. And that's what I'm trying to move away from now, this discretization, and to keep this continuous and dynamical. And there's several benefits. One of them is you don't have to pick a particular discretization, which has been a headache in the past because it's really hard to find a good rationale on, uh, you know, what is the best discretization of this lambda space. And the second part is, the dynamics in lambda is very tightly coupled to the dynamics of other degrees of freedom. So if you let that freely fluctuate and diffuse, then you are less likely to have metastability in other degrees of freedom. Or maybe you have these orthogonal barriers, but if you have a dynamical lambda, you can just go around the barrier instead of having to cross it, um, because the barriers are themselves dependent on the value of lambda. And that's something that was my main issue with umbrella sampling from the very beginning.
In umbrella sampling, because you're localizing the coordinate you're interested in, you're also making many other tightly coupled coordinates more metastable. And so you're really preventing them from sampling anything, which is why I've always objected to this name of umbrella sampling, 'cause I think umbrella sampling is really like not sampling very much, at least the way it's typically done,

Milosz: 40:32

yes.

Jerome Henin: 40:32

know? and so at some point a student asked me not long ago why it was that, umbrella sampling was so popular and to that I said, well, Maybe it's, it's so popular. in the same sense that a, like a TikTok influencer might be popular. It doesn't mean that they're particularly good at anything in particular.

Milosz: 40:55

Right. It's, uh,

Jerome Henin: 40:56

Just

Milosz: 40:57

as, as we said before, it'll always give you an answer, right?

Jerome Henin: 41:00

Exactly,

Milosz: 41:00

for

Jerome Henin: 41:01

exactly. And it was,

Milosz: 41:02

answer.

Jerome Henin: 41:03

and, you know, it's been around forever. There's a lot of exposure. And also, there is nothing really complicated conceptually about it. That was also one of the benefits of metadynamics early on: metadynamics is very easy to understand in principle. So then it's easier to, I don't know, to trust, I think, a method that,

Milosz: 41:26

And

Jerome Henin: 41:26

that's simple.

Milosz: 41:27

making the connection because we haven't mentioned that. well, the standard methods are probably the most explicit method of calculating free energies alchemical free energies. Right? Is pretty similar to umbrella sampling because it's just sampling the derivative

Jerome Henin: 41:40

Yes, exactly. Yes. Yeah,

Milosz: 41:43

discrete points.

Jerome Henin: 41:44

exactly. So you're fixing lambda and computing the derivative. That's right. Or maybe either the derivative or computing the finite differences between adjacent points. Uh, that's if you're doing something like MBAR. But anyway, it's, uh, it's related.
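The two fixed-lambda estimators mentioned here can be compared on a toy problem. The sketch below is not taken from any of the packages discussed: the system (a one-dimensional harmonic oscillator whose spring constant depends on lambda), the parameter values, and all function names are assumptions chosen because the exact answer is known. It averages the derivative of the energy at each window (thermodynamic integration) and, separately, accumulates exponential averages of the energy difference between adjacent windows (finite differences).

```python
import numpy as np

# Toy alchemical system (an assumption for illustration only):
#   U(x; lam) = 0.5 * k(lam) * x**2,  with  k(lam) = (1 - lam) * K0 + lam * K1.
K0, K1, KT = 1.0, 4.0, 1.0


def k_of(lam):
    return (1.0 - lam) * K0 + lam * K1


def energy(x, lam):
    return 0.5 * k_of(lam) * x**2


def sample(lam, n, rng):
    # Exact Boltzmann sampling: x is Gaussian with variance kT / k(lam).
    return rng.normal(0.0, np.sqrt(KT / k_of(lam)), size=n)


def free_energy_estimates(n_windows=11, n_samples=20000, seed=0):
    rng = np.random.default_rng(seed)
    lams = np.linspace(0.0, 1.0, n_windows)
    samples = [sample(lam, n_samples, rng) for lam in lams]

    # Thermodynamic integration: average dU/dlambda at each fixed lambda,
    # then integrate over lambda with the trapezoid rule.
    dudl = np.array([np.mean(0.5 * (K1 - K0) * x**2) for x in samples])
    ti = float(np.sum(0.5 * (dudl[1:] + dudl[:-1]) * np.diff(lams)))

    # Finite differences: exponential average of the energy change from
    # each window to the next one, summed over all adjacent pairs.
    fep = 0.0
    for i in range(n_windows - 1):
        du = energy(samples[i], lams[i + 1]) - energy(samples[i], lams[i])
        fep += -KT * np.log(np.mean(np.exp(-du / KT)))
    return ti, float(fep)


# Analytic reference for the harmonic oscillator: 0.5 * kT * ln(K1 / K0).
exact = 0.5 * KT * np.log(K1 / K0)
ti, fep = free_energy_estimates()
print(f"exact={exact:.3f}  TI={ti:.3f}  finite differences={fep:.3f}")
```

Both estimators depend on the choice of windows, which is the discretization headache raised earlier in the conversation; MBAR itself combines samples from all windows and is not implemented in this sketch.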

Milosz: 41:58

then, and then there's also this fast switching method, which is based on Crooks' theorem,

Jerome Henin: 42:05

Yeah.

Milosz: 42:06

is another alternative. Again, we have like three strong methods that can compete for.

Jerome Henin: 42:11

Yeah. And that's interesting. I'm

Milosz: 42:13

their own strengths and uh, weaknesses

Jerome Henin: 42:15

absolutely, uh, it's funny, 'cause recently I've gotten interested again in these nonequilibrium switching methods. I think they have a lot going for them, which is interesting, 'cause all the work I did on ABF was all about being as close to equilibrium as possible and, you know, not inputting arbitrary work into the system, to minimize bias. But, uh, yeah, I think if you do nonequilibrium switching well, you can sample a lot of diversity in the trajectories you see. So that's something I've seen with a method called ABMD, adiabatic bias MD, which is also known as ratchet MD, or ratchet-and-pawl MD. So basically it's this method where you have a single progress coordinate and you let it diffuse forward, but you have this harmonic potential that follows it and prevents it from going backward, ever. So what's clever about it is that you do get pretty rapid reactive trajectories, but you are exploiting the system's own fluctuations, thermal fluctuations, and so you're not directly

Milosz: 43:23

little work

Jerome Henin: 43:24

right, and you're not directly imposing the timescale on which transitions happen. So if you do conventional, like, steered MD, then you pick a timescale and all your transitions will happen more or less on that timescale, and you may get a bit more lag in some trajectories than others. But when you do this ABMD dynamics and you have a barrier, you will see crossing times that are very, very different, and these different crossing times correspond to different pathways. So you get a nice exploration of many pathways in a short time. So I think that's promising.
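The ratchet idea described here can be illustrated on a one-dimensional toy model. This is a simplified sketch under stated assumptions, not necessarily the formulation used in published ABMD implementations: the double-well potential, the overdamped Langevin integrator, the parameter values, and all names are hypothetical. A one-sided harmonic bias follows the furthest value the coordinate has reached and acts only when the coordinate falls back behind it, so forward motion comes from thermal fluctuations alone.

```python
import numpy as np

KT = 1.0           # thermal energy (reduced units)
DT = 1e-3          # integration time step
K_RATCHET = 100.0  # stiffness of the one-sided harmonic ratchet (assumed value)


def force_double_well(x):
    # U(x) = 5 * (x**2 - 1)**2: minima at x = -1 and x = +1, 5 kT barrier at x = 0.
    return -20.0 * x * (x**2 - 1.0)


def ratchet_crossing_time(rng, target=0.9, max_steps=200_000):
    """Overdamped Langevin dynamics with a one-sided ratchet on x.

    Returns the first time at which x >= target, or None if that never
    happens within max_steps.
    """
    x = -1.0
    x_max = x  # furthest point reached so far; the ratchet follows it
    for step in range(1, max_steps + 1):
        # The bias pushes forward only when x has fallen back behind x_max.
        f_bias = K_RATCHET * (x_max - x) if x < x_max else 0.0
        noise = np.sqrt(2.0 * KT * DT) * rng.normal()
        x += (force_double_well(x) + f_bias) * DT + noise
        x_max = max(x_max, x)
        if x >= target:
            return step * DT
    return None


rng = np.random.default_rng(1)
times = [ratchet_crossing_time(rng) for _ in range(20)]
print("crossing times:", times)
```

Because no pulling speed is imposed, each trajectory crosses at its own time. In a one-dimensional model there is only one pathway, so this sketch shows the spread of crossing times but not the diversity of pathways mentioned above.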

Milosz: 43:59

Oh yeah. I can see, I mean, my, from my experience, the more work you put, the harder it is to get overlap between distributions of work. That's,

Jerome Henin: 44:07

Exactly.

Milosz: 44:07

it was the

Jerome Henin: 44:08

Yeah.

Milosz: 44:08

concern that I have seen in those fast switching simulations. So

Jerome Henin: 44:12

Yeah.

Milosz: 44:12

something to explore. You're right.

Jerome Henin: 44:14

Yeah.

Milosz: 44:15

But again, going back to this question of human learning, I like the idea that we want to learn from every experience. 'Cause yeah, if you think about the way, I know, Feynman, uh, was approaching science, right? He was reading the title of a paper and trying to guess the conclusion.

Jerome Henin: 44:33

Hmm.

Milosz: 44:33

And I think we all have this, we all try to get trained in this approach where like we look at the project, we try to guess what's happening, and then if we don't see what we guess should be happening, we're really, really interested in in that. Right.

Jerome Henin: 44:45

Mm-hmm.

Milosz: 44:46

As you say, this visual inspection and, all these visualization tools are helping with that.'cause, uh, it's a long, long journey from, an undergrad who's doing their first steps to someone who can really, really look at a c and predict what's going to

Jerome Henin: 45:01

Mm-hmm.

Milosz: 45:02

particular scenario.

Jerome Henin: 45:03

Absolutely.

Milosz: 45:05

so yeah, we should invest in human learning in parallel to, to machine learning.

Jerome Henin: 45:09

And it really as.

Milosz: 45:10

I agree there.

Jerome Henin: 45:11

you know, as I grow older, I find more and more that the most rewarding part of the work is mentoring young scientists. You know, it's really part of, uh, I don't know, helping people grow. Yeah, I find that very rewarding. And, you know, helping computers get better numbers is fun, but you get tired of it eventually, so

Milosz: 45:40

Absolutely. Okay. Thank you so much for the conversation. Jerome Henin

Jerome Henin: 45:45

Great pleasure. Thank you.

Milosz: 45:46

your time and your expertise and the insights. Thanks so much.

Jerome Henin: 45:50

Thank you.

Milosz: 45:51

And have a great day.

Jerome Henin: 45:52

You too.

45:54

Thank you for listening. See you in the next episode of Phase Space Invaders.