
The Translational Mixer
Episode 6: Veronique Kiermer on open science and a White Negroni
Veronique Kiermer, Chief Scientific Officer and Executive Editor at the Public Library of Science, talks about the myriad ways in which open science is changing the face of research and some of the challenges it poses for AI and the translational arena.
01:55 What is open science?
03:55 What are barriers to openness?
07:28 Early adopters
10:30 Open challenges for AI
11:35 Registered reports and publication bias
14:20 PLOS’ priorities for open science
18:40 The Open Science Village beyond data access and sharing
24:25 Reproducibility and reuse in drug research
27:30 Can biotech companies be as open as pharma?
29:44 Pre-competitive consortia for rare disease
32:14 Moving the needle
38:00 Professional data curators?
39:53 Opening science around the world
41:05 COVID-19, infectious disease and open science
45:34 Veronique’s favorite tipple
The White Negroni
1 oz gin
1 oz Lillet Blanc
1 oz Suze
DIRECTIONS:
Add ingredients to a mixing glass and stir over ice for 45 seconds. Strain over fresh ice in a rocks glass and garnish with a lemon peel.
Sources mentioned in the podcast
Mehra, MR et al. RETRACTED: Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis. Lancet (May 22, 2020). https://doi.org/10.1016/S0140-6736(20)31180-6
AlphaFold3—why did Nature publish it without its code? Nature 629, 728 (2024). Good question!
Abramson J et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature (8 May 2024).
Promoting reproducibility with registered reports. Nat Hum Behav 1, 0034 (2017). https://doi.org/10.1038/s41562-016-0034
The Yale University Open Data Access (YODA) Project at the Center for Outcomes Research and Evaluation advocates for the responsible sharing of clinical research data
All Trials (https://www.alltrials.net/news/)
Gordon, D.E., Jang, G.M., Bouhaddou, M. et al. A SARS-CoV-2 protein interaction map reveals targets for drug repurposing. Nature 583, 459–468 (2020).
Nature’s podcast on Registered Reports: Nature's Take: Can Registered Reports help tackle publication bias?
The Mixer music “Pour Me Another” courtesy of Smooth Moves!
Andy Marshall: Hello everyone, my name is Andy Marshall. Welcome to The Mixer. I’m here with my good friend Mr Juan Carlos Lopez. How are you today JC?
Juan Carlos Lopez: Very good, Andy. Very happy to be here.
Andy: Great. So what's on the menu today?
JC: Today, Andy, we have a special treat. We have our friend, Veronique Kiermer, who is Executive Editor at the Public Library of Science. And you and I have both cortical and limbic reasons to be happy about having Veronique on the show. Cortical reasons because she has been instrumental in advancing open science from her position at the Public Library of Science. As you know, that organization for a long time has been advocating first for open access to the scientific literature and more recently for open science and all the things that we're going to be hearing about today. And we also have limbic reasons because she is a very good friend from our time at Nature. She was the launch editor of Nature Methods. We've known her since then and she's been a great friend of both of us during all these years. So I'm looking forward to having her on the show and learn about what's going on in the world of open science.
Andy: Yeah, I think the drive for openness in science has really been gaining prominence in recent years. And certainly, I think, a younger generation of scientists are really embracing some of these principles. So, I'm really looking forward to delving a little deeper into some of the different elements of research and development that this is having an impact on. So let's get started.
JC: Let's go.
01:55 What is open science?
Andy: Veronique, thanks for coming on to the podcast today. We're delighted you agreed to come on and talk to us about open science. So can we start off by you telling us a little bit about what open science is?
Veronique Kiermer: Yes, well happy to be here. Thanks for having me. There isn't a single definition of open science. I like to think about open science as everything that can be done to make the research process more open and transparent. And that includes making what we tend to think of as intermediary outputs of research available: data, code, protocols, as well as a lot of details about experimental methodologies, analytical methodologies and things like that. And that's important to make the research process actually more collaborative, so that others can actually build upon the research that is being communicated, and more inclusive, if you think about people who might have great ideas but maybe less means to participate or generate large data sets and things like that. And also it's a process that allows us to accelerate discovery because it prevents duplication, it can prevent going down alleys that are non-productive and it can really ultimately lead to more reliable results because results are more open, more open to scrutiny and there can be more eyes on the results and how they are achieved.
Andy: It's kind of accessibility. It's transparency. Equity? Does that come into it?
Veronique: Absolutely. It's about equity in access to knowledge that is generated, but also equity in participating in knowledge creation, if you want. So, yes, having a much more diverse set of participants in research and having more access to the results of research and ultimately it's to benefit society.
03:55 What are barriers to openness?
JC: Veronique, there has been a lot of progress in open science, but I'm sure you'd agree that we're not there yet. So I'm wondering, what are the barriers that you think prevent the community from embracing open science? You know, I can think of a few; issues related to intellectual property, for example; technological, it may not be technically easy to share all the stuff that we would like to share; curation, so that the data meet certain standards to be useful. Would you agree with these concerns or you think there are other barriers? And if you think there are other barriers, which are the main ones?
Veronique: All of these factors are potential obstacles. Actually, to practice open science takes time and effort. Sharing your data, your code, your protocols so that they're useful to others actually takes an extra effort. These are efforts that are not necessarily valued in the current system of academic incentives and so on. There is also the question of access to infrastructure. Guidance about how to share data, where to share it to make it most useful. There is the need to add value to data. And then there are very legitimate concerns about certain types of data, human data in particular, about privacy issues and things like that. And then there are cultural issues as well. We see that open science is practiced very differently in different disciplines, for example. And I think that this really comes from the norms that are in play in the field. If you look at data sharing in particular, you see more data sharing in biology than in medicine and pharma, for example; more in political science than in sociology, right? And so you have these differences that are not necessarily explainable by a very concrete thing about the nature of the data, but that really comes from the norms that are in play in the field…cultural factors. And there is also environments that are more conducive to that, right? Institutions that have more resources, more librarians that really dedicate themselves to support the researchers in sharing practices and things like that. And that is very inconsistent in different places. So there's a whole host of potential obstacles from the practicalities of sharing data to the norms and the environment and the culture and to the fact that ultimately it's not necessarily something that is rewarded in the current system and therefore it's time and effort that you spend on something that you wouldn't spend on something else and so you have that competition for researchers’ time and efforts. 
So I think all of these things remain obstacles to having more open science.
On the other hand, if you look at motivations, you know, what motivates people to share their outputs, for example, their data? And there really are three categories. One is policies. If I'm required by my funder or my institution or the journal in which I publish to share my data, that's one of the main drivers. The other one is much more an alignment with personal values and personal commitments, right? And you see, actually much more openness towards sharing in the younger generation. There is quite a bit of divide there. And then the third one is, I'm motivated to share because there is a benefit for me. And that can be a citation advantage, for example, which has been shown for data sharing. Or it's to give my research more impact and visibility. And so these are the big motivations that are there and the obstacles are still very much in play.
07:28 Early adopters
Andy: I'm interested that some fields adopt open science, open data sharing, open publication much more than other fields. So can we talk a bit about some of the kind of early adopter fields? Because I mean one thing that seems to really strike me is how, I think probably back from the 90s with the Human Genome Project and genetics, you had, you know, the Bermuda Principles in the 1990s. And since then, there's been many, many kind of different databases built both by, kind of, you know, the NIH, EBI. And also, you know, you've had things like the Global Alliance for Genomics and Health (sic), I think it's called. So that's genetics. And then I remember with the kind of pre-registration trials, and we should talk a little bit about them and explain them for our listener. That really got kicked off by the field of psychology and psychiatry. That whole field was really into that. So could you talk a little bit about which fields have been the early adopters?
Veronique: Yeah, and you’re right that sort of different practices find early adopters in different fields, right? So, sharing data, I mean, genetics and genomics have really been the leaders in that, in the biomedical field, of course, right? You have physics and astronomy and all that is yet a different way of working. But actually, there are some parallels there because I think that with genomics, what started all that was the technology advantage, right? It was the ability to sequence and the ability to... get these data on a much higher throughput than before. And what you saw in that field is a field coming together and establishing new norms of working because they had a new technology. And these norms were about sharing, but they were also about the standards of how you share, where you share, what information you report with your data. They were about benchmarks and protocols for data quality, and things like that. So there was this whole infrastructure that was developed around that, as well as norms, so that the field started sort of self-policing around that, right?
If you weren't sharing your data as a genomicist in the 90s, you were not in a good place, right? And similarly, if you took advantage of data that was shared by someone else without giving credit, that wasn't seen as proper. And so all this etiquette of the field and the norms that were established really helped push that forward. And obviously that's a field where the technology has kept apace and has really accelerated and then it became impossible to not rely on these very large data sources. And so you have really this high quality data that is the main resource of that field and everybody sort of contributes to that and uses it. And so I think that's a really nice example of a field coming together around norms to make that feasible.
10:30 Open challenges for AI
Veronique: Another example, before we go into the (trial) registration: what we're seeing now is the fields of AI and machine learning very much committed to open source code, for example. And the commitment is about sharing, but it's also about reporting norms and protocols: you share in certain repositories where everybody is, and you give credit. You also have, you know, the field coming together to have benchmarks for assessing the performance of tools and things like that. So the community comes around these resources and competes to beat the benchmark. You have a different way of working because you have these standards that are established.
And when you talk to AI and machine learning experts, they're actually very frustrated that they're not seeing the experimentalists adopt the same thing. So they're lacking high-quality data sets to train their tools on. They're very much open from the software and code perspective, but they're lacking these high-quality data sets, which goes to show that it's not all open science practices in the field. It's a very specific thing.
11:35 Registered reports and publication bias
The pre-registration movement that you've seen in psychology in particular, that really came from a reckoning in the field about the lack of reliability and reproducibility of a lot of the results that were published. And this unreliability and these problems were traced down to poor practices of analysis and things like that, that were related to not controlling experimental biases well enough. And so the answer to that was something that has been used in clinical research for a long time, which is basically registering the protocol of your study before the data collection starts, right? And having that protocol registered, including how you're going to decide what data is included, your exclusion criteria, how you're going to perform your analysis and things like that. And having described that upfront, you're then in a better place to mitigate against biases because you're not tempted to do things like p-hacking or hypothesizing after the results are known, which is another big problem that was happening in that field. And so that's really sort of a response of a field to an issue that the field was having with really losing trust in a lot of the published results.
And that is taking place in psychology a lot. Another very positive implication of preregistration is that in that field, the notion of a registered report emerged, and that is related to the publishing process, right? In the sense that you pre-register your study, your study protocol, before you start collecting your data, and you submit it to peer review at that point. And so the journal evaluates the quality of the question, the quality of the methodology that is going to be deployed to answer that question, the quality of the analysis, the control of biases and all that. And you have an opportunity to revise your study plan at that point. But if the study plan is accepted by the journal, you then get an acceptance in principle. As long as you follow that protocol, if your results are positive or negative, whether they confirm your hypothesis or not, the journal will publish your paper. And so that is a commitment very early in the process that even if you end up with negative results, you have an opportunity to publish. And that is a very effective tool to combat publication bias. And there's been studies done in psychology where you see that actually the proportion of negative results since the introduction of registered reports is much higher than in the general literature. That's a very good incentive against publication bias, which is a massive problem. And we could do with more of that in preclinical research, for example.
14:20 PLOS’ priorities for open science
JC: Your organization, the Public Library of Science, you have been pioneers in open science. And the original driver for this quarter of a century ago was open access to the scientific literature, but clearly we have come a long way from that. So, from all these areas that you've been touching upon, which are the current priorities at the Public Library of Science? What are you focusing your efforts on?
Veronique: You're right that we very much started with the focus on open access to the published literature, and I think open access has now gathered a lot of steam, and we're really... seeing a lot of progress across the publishing landscape. We are now focused on really trying to promote open science more. And we do this in different ways. We have a number of interventions and experiments really that go to address all the obstacles that you were mentioning originally at the beginning of this conversation. We're deploying things to try to make it easier to share your data, share your code and so on. We’re integrating repositories within the submission, the journal submission process, for example. We are trying to make it more rewarding to do that. So for example, we are signaling when there is reusable data that is shared in an article so that we can drive traffic to these data, we can drive traffic to the article, we make it more visible. And that usually results in citation advantages and things like that. And then we also have policies where editorial boards come together and decide that for their field and community, it is time to move to something that is required. Across all of our journals, we have a data-sharing requirement, which is that at the time of publication, your data must be made available, taking into account some legitimate exceptions around privacy and legal issues. But, you know, we have some journals, for example, that have started mandating code sharing as well, because in their community they see that the community has matured to that point and it's really important.
So we're really working with the communities to try to see where, you know, where is the appetite, where is the readiness. And the policy is usually very helpful when you already have quite a bit of adoption and you want to push it to the next level and you get sort of the laggards and the people who are resisting. That is the most effective tool. In other places where the readiness is not there, there is maybe appetite, but it's very difficult. Then it's more about the tools to facilitate it and make it more rewarding, make it much more visible as well. The other thing that we are trying to do is to make all these contributions that are important to make data, code, protocols available, visible, so that people who contribute to that within a team of researchers actually get credit specifically for these kinds of contributions. So that's another thing that I'm fairly passionate about: really trying to make sure that these diverse contributions get surfaced, get recognized, and end up normalizing sort of this type of thing.
And then the other thing that we're doing as well, because we're trying all these new interventions, and some things work, some things don't work as well. We've developed tools to look at a corpus of literature and identify sharing practices in there. So which proportion of this corpus of articles actually has shared data? Which proportion has generated code and which proportion of that has shared their code? Similarly, we have an indicator coming for protocols. We're going to start looking at registration as well. We have established this registered report workflow that I was talking about earlier on different journals. We currently have a trial going on with Cancer Research UK where the evaluation of the protocol by the journal happens in parallel with the evaluation of the grant by the funder. And so if you're successful as a researcher you might get money to do the experiments as well as an acceptance in principle of the paper. And so we want to look at the quality of these types of articles where actually there has been no desire to adapt the results to the editorial criteria of a journal because that has already been decided early on. So that's another thing we're doing.
18:40 The Open Science Village beyond data access and sharing
Andy: Veronique, it seems like some parts of open science are further along than others, yeah? So in some ways, open access has kind of got all of the headlines and also seems to be further along the path, whereas data sharing, it's very kind of spotty. Some fields are doing it, other fields aren't. Then you were talking about kind of protocols and code. Computational biology and you know machine learning very much kind of pro talk towards that, but then if you're going through experimentalists and how they use algorithms and code, like sometimes it's difficult to convince them that they need to go through all the hoops of sharing their code. And then you've just been talking about these preregistered reports, which again a few fields kind of have adopted and they're kind of slowly creeping, that's my impression anyway, into these other fields, like Nature Human Behaviour for instance, which started to use these as tools as well. What's your feeling as to where each of these aspects of open science are?
Veronique: You're right that they are in different places, but they're also very... very intricately linked. Like if you look at funder policies, for example, there is a lot now, a lot of traction on open access. There's a lot of traction on open data. These are the things that are really at the forefront of like the policy landscape. But data without very detailed methodology about how the data was generated is not necessarily that helpful. Data without the code to analyze the data is not necessarily that helpful. So you really need to be able to move these different levers at the same time to really realize the benefits of open science. And I think that that's the challenge and that's the multi-pronged approach that we need to have in those areas.
Another way that I find helpful to think about these challenges is really to think in terms of what are we trying to achieve. And ultimately, the benefits of open science is really a society that benefits from scientific advancement more quickly and more openly and with more equity. I think that that's the prize at the end of the chain, but what is the first step in that direction and how do you prioritize these first steps? And I think it's very important to think, we have a tendency to try to boil the ocean and want to apply a blanket policy to everything. And sometimes that works, but it's also very, very difficult because there are all of these obstacles and challenges and cultural issues and things.
So what I find helpful is to think about where do these practices really add value and why and how. And I think that that's helpful because if you think about the value of open science in an academic setting or at the interface between academia and pharma, for example, I see the benefits of that really along two categories. One is transparency, reliability; the ability to interrogate the results, the ability to really look at the data to get a sense that you don't get from seeing the summary data in a figure in an article. That's one aspect. And then the other aspect is really the data that turns into a resource for the community to reuse, for other researchers to reanalyze in different ways, so a more continuous resource. And I think that the treatment that you give to the data sets, for example, that have these two potentials is slightly different.
When it's just about transparency, an Excel spreadsheet in supplementary information is not a bad thing, right? It's probably all you need for that, right? A protocol that is a Word document with a link: it's not necessarily bad for that transparency purpose. It ticks the box. It doesn't make it reusable, right? But not everything needs to be reusable, because not everything has reuse value.
So really trying to see where the efforts are really going to have more return on investment, if you want. That's really a way of starting to make the case for these things.
And if you think about pre-registration, the area of preclinical research, which is just before you go into drug discovery, for example, to me is a very important area. If you're talking to biologists and saying you're going to have to pre-register every experiment you do in your lab, they look at you like you have three heads, and they laugh you out of the room, right? This is never going to happen.
Now, if you have a preclinical animal study to test a drug in a certain condition in an animal model and that you are hoping that this is actually going to get you into clinical development, that's a completely different situation. And that's a place where pre-registration, checking for biases, making sure that you have randomization, blinding, all of these things done really correctly, that's where it's really important. And that's where we need to see the negative results, right? There is a huge issue about publication bias because we don't see these negative results. And then it goes into clinical development on the basis of what's published.
And what's published is only a small fraction of the universe of experiments that have been done on this particular drug in this particular model. So that's, for example, an area where I think you can usefully focus resources from multiple stakeholders, like from the publishers, from the funders, from the institutions, to really try to implement a change that is going to have a real-life value to it.
And then you start normalizing that practice and then eventually you bring it more upstream in the discovery process.
24:25 Reproducibility and reuse in drug research
JC: Our audience, as you know, is translational scientists and the people who are interested in drug discovery. Do you think this community is ahead of the curve in terms of open science, or are they playing catch-up? As you know there are a lot of drug discovery data that are proprietary and must remain confidential, but at the same time there have been very important public-private partnerships and pre-competitive consortia and the spirit of these consortia aligns very well with open science. So I'm wondering what your thoughts are on the translational community and open science?
Veronique: It's a community that has actually moved quite a bit. And as you say, I think pharma has been a big driver on that. All these pre-competitive agreements on sharing, you actually have quite a bit of data sharing from pharma. You also have had investment in the ability to share more sensitive data and things like that, right? You have some specific repositories, you know, YODA, for example, at Yale that really help control access. It's not a completely open environment, but it's a controlled access environment, which is also very beneficial for some types of data. So there is an appetite for that.
I also think that it's interesting because when we're talking about the two advantages, there is the reuse, and those big data sets are really important for this aspect of reuse and things like that.
And then there is a reproducibility issue. And if you think back to the early 2010s, the debate around reproducibility in the biomedical community really started with pharma. It's pharma who raised the alarm, saying, we can't reproduce the stuff that comes out of academia and is published in top journals.
They are really the ones who started that conversation. This is how I came into open science. I was interested in reproducibility and through talking with people in pharma, in academia, talking to funders, it became very obvious that there is a huge difference in how research is practiced in academia and how it's practiced in pharma or in biotech industry.
That's not necessarily surprising. I've worked in both environments. It's very different. But what's surprising is that I think we minimize those differences. I use the "we" as journal editors. We tend to minimize that. We tend to treat academic results very often as if they have all the rigor of an R&D process in pharma, which is not always the case. And it's too easy to make assumptions about the fact that surely this study was randomized and blinded, and there was a proper sample size calculation at the beginning to make sure that we would see an effect. And very often the result in academia is that, no, it is not randomized, it is not blinded. And there was no sample size calculation. We used six mice because that's what the people before us did. And it's just that you have that huge difference, and I think we've been minimizing that.
And so I think that you really have a vested interest from people in pharma to push these practices in academia and to really advocate for more transparency and more openness and higher quality of reporting and stronger ability to scrutinize results that are coming out of academic labs. And I think that you also have had a lot of sort of advocates within pharma that have been very active in trying to make more data available.
27:30 Can biotech companies be as open as pharma?
Andy: It's really interesting, you know, but again it's really so specific. So one of my bugbears is people conflate biotech companies (small or medium-sized enterprises) with pharma, yeah? So a pharma has, basically compared with a biotech, unlimited resources; thousands and thousands of people that they can devote to these things. It's a very different entity from a small biotech. So if a biotech is kind of doing a small trial (phase one, even phase two, efficacy data), then there's going to be certainly concern on the part of a biotech if that kind of data is then made available and a pharma can go, oh, look at that. That's very interesting. Now I'm going to kind of come in, put all of my resources into this and, you know, basically leapfrog where the biotech is. So, to your point, it's about us finding the right balance as to how open and who gets access. Because open works in certain contexts, it doesn't work in other contexts in biomedicine. That's my observation. What do you think?
Veronique: I found it very interesting that this whole conversation around reproducibility started from pharma, right? Obviously there was a vested interest in that. There was the fact that they were competing for these things that were not reliable and I think that that's important. There is certainly still a lot of competition. And I don't know, right? I don't know if open is possible in that context but I think it would be interesting to see more investment from pharma, where there are resources, in pushing these practices in academic settings. I think that that would be quite interesting to see.
29:44 Pre-competitive consortia
JC: Another area in which open science can make a difference is advanced therapeutics. As you know, manufacturing of advanced therapeutics can be very expensive and complex, and there are already a few initiatives and public-private consortia that are trying to improve on these technologies. The idea is that all of the new knowledge from these initiatives will be shared and it will be interesting number one to pay attention and see if these advances are really shared broadly and number two maybe these initiatives could be some sort of sandbox in which to test new experiments in open science. I wonder if this is something we're thinking about?
Veronique: I think that's a very good point. Another area that you know better than I do, but it's the rare diseases area, right? I mean, that's another place where the ability to share this data is extremely important because you don't have that many patients. And you have some of these foundations who are really supporting that kind of research, which are very much oriented and thinking about the incentive system in a very different way than the typical academic-incentive system, right? I'm thinking CHDI and these kind of foundations who have really a focus on the shared resources.
JC: It's interesting that you mentioned rare disease because as you know, the Foundation for the NIH is leading a consortium called the Bespoke Gene Therapy consortium, and they are working on a playbook to speed up the development of therapies for ultra rare diseases. The idea is that this playbook will be publicly available, but I think it would be interesting to see how large is the delta between their vision of sharing this information with the community and the vision of organizations like yours that advocate for open science. Hopefully this delta is small, but it would be interesting to find out.
Andy: So, it's probably the subject for another entire podcast to talk about clinical trial data availability, but that's definitely another area, going on through into clinical research and how that data is shared and how people could perhaps reanalyze it, and then finally going into clinical practice and which clinical trial data clinicians have available. Ben Goldacre (https://www.alltrials.net/news/) has been a big proponent of getting clinical trial data out into the open on all these trials that have negative data in them that just never get into journals. Let's put that aside for now.
32:14 Moving the needle
Andy: Let's just take the final part of the podcast to talk a little bit about the areas where we can make the most difference. So you talked a little at the beginning, Veronique, about areas where you think this model of doing research is particularly important. Obviously, there are things to be done to kind of strengthen what's going on there. And then there's this kind of question of how do we bring those principles out into new fields and push those. And who are the people that we work with to do that? I mean, you're at a publisher, but there are the funders and there are the researchers themselves. What are some of the things that we could do to make open science work better and kind of galvanize research progress?
Veronique: It's a systemic problem, right, and it needs a systemic solution. So there is no one solution with, like, one stakeholder doing something unilaterally which is going to change the whole landscape. But I do think that there's a lot of actors who can actually really have a very important contribution. And if you coordinate that a little bit, a loose coordination, I think we could really make a huge difference.
One obvious area is the coordination between funding policies and journal policies. So we've had a data sharing policy at PLOS since 2014. If a paper is submitted and the data was not collected, maintained, recorded, with a view to share it, it's really late in the process to actually try to do something, right? So we do reject papers because there is no good data that are available. We do that. But that's only, it's the only recourse, right? So you really need to get to the researchers much earlier in the system.
And so if you see things like the NIH policy, which now demands a data management plan as part of the grant submission, now that's a huge step forward, right? Because you get people, you get researchers in the spirit of thinking about that at the very beginning of their experiment. And ideally what you want is that they're being asked about that at the time of submitting the grant. They are being reminded of their accountability to that when they get the grant. When they get to submit to a journal, there is a policy at the journal to make sure that this is actually implemented. And then you have some compliance monitoring from the granting agency, right? But you need to talk to researchers throughout the chain. And so it's not just one actor that needs to do that. It's really having a coordination. And I think that there has been some work in trying to coordinate language and coordinate things between an organization like the NIH and journals in terms of their policy. And I think that that's very important.
The other very big player in that is the research institutions. You have a link to the funders because they get funding and educational grants and things like that. But they are really important in terms of providing the support and the infrastructure for this practice through the libraries, through their core facilities; I mean, all of that is really important. So again, aligning these different things will make it easier for researchers. The other thing that I experience, especially when we're looking at our open science indicators and things like that, is that it's very rare, and obviously there is some self-selection, but it's very rare that people don't want to share their data when they come to submit to us. Actually people are quite okay. It's just that they don't know where to put it, they don't have the time or the tools to really make it most helpful. There are all these practical issues associated with it. And so I think that we can facilitate all that by having a little bit more coordination between different players in the system throughout the research cycle, right? Funders, institutions, journals, aligning the incentives and the rewards and the expectations throughout the research cycle would be very helpful.
Andy: It's kind of like inconvenience, yeah? This stuff is really inconvenient. It's like, I mean, I saw as a Nature editor that the amount of time that it took a lab to upload all the materials for a paper over a decade. And it's always the poor postdoc, yeah? The poor postdoc has to kind of spend hours and hours uploading all the data files, all of the code, all of this stuff. It's really, really painful at the moment. That's my impression anyway.
Veronique: It's also not rewarded, right? I mean, there is the part about it which is that it's very burdensome and inconvenient, but you also have all this amazing work that goes into it. Like, think about postdocs and early career researchers who spend a huge amount of time optimizing code packages. And this happens after publication. There is no reward for that. Nowhere is it captured, that type of activity, because we have an academic incentive system that is so obsessed with the published paper that all of these activities that are around the paper are actually not captured, not rewarded. And I think that that's another big shift that would have a huge impact: capturing these activities, making them visible, crediting individuals for that kind of thing, and making that important as part of the researcher assessment system, the tenure and promotion process, the hiring process. What are your open science practices? What have you done? What have you contributed to the community in terms of these kinds of resources? These would be some of the most important questions to ask, instead of what's the impact factor of the journal you've published in.
38:00 Professional data curators?
JC: I wonder if there might be an opportunity here for creating a company or a service provider that manages all these data for scientists? The same way that you can outsource the creation of, for example, a mouse model, there may be companies that specialize in managing your data, particularly if there are proper standards for sharing and for reporting. Or maybe this can be an opportunity for an alternative career in science, something like a data steward position in charge of managing all the data that comes out of the lab, the same way that there are lab managers that take care of managing the day-to-day running of the lab. Maybe this position already exists and I simply don't know about it, but I would say that it makes some sense.
Veronique: Yeah. I know that some repositories do that, right? They offer that service of curation and things like that, sometimes for a fee, so that's going in that direction. I know that some institutions, especially in Europe, I think Europe is ahead of the US on these questions, but there are a number of institutions in the Netherlands in particular that have data stewards as part of the institutional sort of environment, right? And that's the person advising all the labs about these kinds of questions, and they're really the expert on advising, not processing it for you, but they are providing guidance and things like that.
You have some institutions who are also very much developing like a parallel career track, right? You have the tenure career track for academics, and then you have like a technical career track, right? Where you have people who become very important in terms of running core facilities and providing these kinds of things. I think it's (the University of) Edinburgh that has done that, and you really have a career path which is parallel to the typical academic tenure where you actually have that kind of thing. And I think that that's like, let's elevate these really important contributions.
39:53 Opening science around the world
JC: Well, it's great to hear that that's already happening. Now that you mentioned Europe and your observation that they are ahead of the US in terms of their attitude towards open science, I wonder what you have seen in other countries, particularly in China, which, as you know better than us, is producing a lot of papers, but also has faced criticisms about data integrity in the past.
Veronique: So there certainly are open science grassroots movements in China as well. I think that the systemic issues are slightly different as well. First of all, you have a very conservative researcher assessment system that is really not taking into account these kinds of things as far as I know, at least in the majority of places. So from that top-down reward and incentive approach, that's not very progressive.
You have other issues related to openness about sovereignty over data and sharing data across international borders is more problematic coming from China. Another layer of complication is like who owns the data, how does the data travel or who can access the data, that's a different type of issue.
41:05 COVID-19, infectious disease and open science
Andy: It's interesting. That reminds me, one of the key areas for open science is pathogen outbreaks, yeah? And so COVID, and the release of the sequencing data of the ‘Wuhan pneumonia’, that coronavirus strain, was key to kicking off the entire kind of scientific endeavor around trying to work out countermeasures and antibodies and finally the mRNA vaccines, yeah?
Veronique: Yeah. I think the COVID pandemic had an urgency and the stakes were so high that it really kicked off unprecedented ways of working for a lot. And so I think that there's been very, very good examples of things that have worked well.
There also have been a lot of examples of things that haven't worked that well. There's been counterexamples of why data is important and it's not available, both in terms of data that has value for reuse and data that has value for reproducibility and scrutiny and all that. The famous Lancet paper that was retracted because the data from the chloroquine trial was not available. That was an example where the community needed to be able to look at the data because they couldn't make heads or tails of the analyses without actually looking at the data and this data was not available. And so that was a very prominent example of why you actually need data just for transparency, just to be able to scrutinize what has been done.
Obviously, the sharing of isolate sequences has been a very positive thing, but it's not without issues. The issues that have existed around sharing pathogen isolates for vaccine development and therapeutic development have been about the countries that are the sources of this information not benefiting from the developments when they come to fruition. That has been a long-standing issue, right? It started with influenza and, you know, the WHO (World Health Organization) has put into place this massive PIP (Pandemic Influenza Preparedness) framework about sharing influenza isolates, because there was a real call for action from some countries where the disease is endemic: they're sharing isolates, and then they don't have access to the vaccines and the drugs when they are being developed. So that's a massive issue. That was an issue with pathogen isolates; it's now the same thing with sequences. And that is creating a really difficult environment, because you have the policy at the country level and then you have the individual researchers, right? I mean, researchers in disease-endemic countries that sequence pathogens and are good citizens, and they're putting their sequences in a database, and then the analysis comes out of ‘Harvard’, and they're not on the paper.
Again, it's a question of credit, right? Can we credit the data producers and elevate these to really what they deserve in terms of contributing to the research? These are the types of social problems that we need to be able to solve. It's not a technology problem. It's actually a social problem. It's about building capacity in different places to be able to do analysis locally and things like that, to be able to level the playing field. And it's giving credit for different types of contributions. I think that's very important.
Now, at the same time, you also have seen extreme examples of collaboration between groups that typically don't collaborate, right? Between different disciplines, different labs, different institutions, between pharma and academia. A big example of that was the protein interaction maps led from UCSF, but with massive, massive lists of contributors. This was a new way of working for these researchers across all of these silos. But it demands a different type of social construct and a different type of reward system. And I think that in times of pandemic, you have this huge urgency and obvious catalyzer of aligning everybody towards one goal. And then I hope that not all of that is lost after we come out of this.
JC: It's very interesting to think that when people started talking about open science or even earlier about open access, the way to get people on board was through punitive measures. For example, if you don't make your science publicly available, we won't publish your work. But now it seems that we are moving away from punishing people to rewarding their behavior. This is a pretty interesting change in the way open science is moving.
45:34 Veronique’s favorite tipple
JC: Veronique, this has been a very interesting conversation and we thank you for your time. As you know, the podcast is called The Mixer because we always ask our guests about their favorite cocktail. We were therefore wondering if you wouldn't mind sharing with us the identity of your go-to drink?
Veronique: Oh, it would have to be a White Negroni.
JC: Very nice. The white Negroni, as you know, is also one of my favorite cocktails. Do you know how to make it, Andy?
Andy: I have no idea.
Veronique: I'm sure JC will have some idea.
JC: Yes, no worries, Andy. We can surely prepare it.
Andy: So this has been so great, Veronique. Thank you so much for sharing your insights. Learned a lot. I think there's so much more to talk about as well, yeah? Also, this seems to be an area where the younger generation of PIs is really gonna be driving this, yeah? It's really a generational thing.
Veronique: Absolutely, absolutely.
Andy: Really exciting. Thank you so much for spending this time with us. We really appreciate it.
Veronique: Thank you. It was fun to talk. And I guess my only question is like, how do I get my White Negroni?
Andy: You have to come to New York!
Veronique: Yes, that can be arranged.
JC: Andy, that was a very interesting conversation. A lot of progress in the field of open science, but I'm sure you would agree that it remains a work in progress.
Andy: Absolutely. I mean, we just recently saw this paper published in Nature, the AlphaFold 3 paper from DeepMind, and there was a big controversy over the lack of code that was published along with the paper there, so these debates about openness of science still rumble along.
JC: Yeah, I agree. It's a bit unfortunate that even a publication like Nature, which has pioneered a lot of these initiatives, in this case they seem to have dropped the ball. The other thing that I thought was very interesting from the conversation was the increasing recognition that there is a need to reward people for embracing open science.
Andy: Yeah, if you think about it, at the moment there really aren't many incentives for people to spend their time on this, other than being a good scientific citizen. Many research groups’ postdocs are spending hours on end correcting code, lodging data in different types of data banks. This all takes time and there really are very few ways in which carrying out this type of work is given the recognition and accreditation that it deserves. And it needs to come both from editors and publishers, it needs to come from funders and it needs to come from tenure committees, if we're going to be truthful about it. Like, tenure committees need to look beyond just papers published in high-impact journals; they need to start rewarding some of these activities as well. I don't know what you think, JC?
JC: Yeah, I agree. And look, even though there's still work to be done on this front, I'm quite optimistic. And we've seen a lot of progress, right? We've seen several disciplines that are really on board with this. Genetics is one, and there are a few others. And we mentioned some of them during the conversation with Veronique. So overall, I'm optimistic that an increasing number of scientists and organizations will embrace open science and show more goodwill.
Andy: Talking of goodwill, what do you think of Veronique's favorite cocktail, the White Negroni?
JC: Yeah, an excellent choice. I must confess that I'm also partial to the White Negroni. It's probably one of my three favorite cocktails. It's a very straightforward cocktail. It's a three-part cocktail. But interestingly enough, it's seldom available in your regular cocktail bar.
And that's because one of the ingredients is called Suze, which is a French bitter liqueur that is not that well known. So, if you go to your local bar and see that they serve the White Negroni, stick around, Andy; that means that they know what they're doing. The other thing that I've seen about the White Negroni when I go out is that some places do carry it, but they don't stick to the three basic ingredients, right, which are gin, Suze, and Lillet Blanc. They start adding other liquors and do variations, and no, please, just stick with the basic cocktail.
Don't mess around with the ingredients. Just enjoy it for what it is. So, Andy, as always, next time you come to the house we'll prepare a White Negroni for you, to ponder with me the question of whether it's worth sticking to that formula or if we need the experimentation.
Andy: Well, you're such a purist, JC, when it comes to cocktails and beyond, eh?
JC: I know, right?
Andy: Well, I'd like to thank Veronique for a fascinating conversation, wide-ranging. I learned a lot. I'm sure you did, JC. Thank you to our listener and we look forward to our next guest. Cheers, JC.
JC: Cheers, Andy. See you next time.