
The Conversations
Wiley’s interview series features bright minds and leading experts from the world of academic publishing. The Conversations is all about sparking lively discussions on thought-provoking subjects, challenging the status quo, and embracing bold perspectives. Together with our guests, we dive into subjects shaping the future of scholarly communications. Don't miss out on expert insights.
Past the PDF | Innovations in Publishing Technologies
The Conversations, a new series from Wiley, acts as a platform for dialogue and sometimes healthy debate on industry hot topics. Wiley’s own Jay Flynn sits at the table and hosts fellow experts as they explore the various ways in which we are collectively reimagining how information is created and communicated.
In this episode, we sit down with Nicole Bishop, founder and CEO of Quartolio, and Chris Reid, director of product and publishing development at AAAS, to explore how publishing platforms can evolve to meet customer needs.
Hi, everyone, and welcome to The Conversations, a show brought to you by Wiley, a global leader in research and education publishing. This series is about exploring the biggest opportunities in the world of academic publishing. It's about asking tough questions and getting into meaningful debate about where our industry needs to go.
Jay Flynn: With me today are Nicole Bishop and Chris Reid. Nicole is the founder and CEO at Quartolio, and Chris is the director of product and publishing development at Science. Now, let's start the conversation. Nicole, Chris, welcome. Thanks a lot for coming. Really appreciate you taking time out of your busy schedules to meet with us. Before we get into the meat and potatoes of our discussion today, just tell us, our audience and each other, a little bit about your backgrounds. I'm really curious about how you got here. So we'll start with you, Nicole. What brought you into the space you're at in your career right now?
Nicole Bishop: Sure. Thanks so much. So first, we help scientists and doctors leverage 100% of the health data to make decisions, versus the less than 1% up until now. I got into that, leveraging AI, from over 25 years in tech, which started in digitizing academic content at university libraries. But one day I woke up for work and I couldn't see out of my left eye. Due to a misdiagnosis as a child and all the wrong treatments throughout my adulthood, I was days away from going blind. But I met the right doctor who had done his research, and he saved my sight. And that led me to transform 25 years in tech into an organization that leverages all of science to move research forward faster.
Jay Flynn: When you say, I'm super curious about this, when you say 1%, 99%, tell us what you mean.
Nicole Bishop: Yeah, absolutely. So there's a massive amount of scientific data, like that produced by the 17 national laboratories in the U.S., you know, articles, clinical trials, patents, policy documents. And all these documents, as a collective, possess knowledge that can solve our biggest problems. But it's just simply sitting there. So far we've only gone as far as asking a simple question of that data. But I believe that data can guide us to the next steps in science. What are we doing this for if not to move science ahead? And so that's what we leverage AI to do.
Jay Flynn: I want to come back to that conversation about unlocking knowledge that's locked away, because it is, for me, one of the big drivers of this project of digital transformation. But Chris, just tell us a little bit about how you got where you are and your work at Science.
Chris Reid: Certainly. Yeah. So I have a slightly different background; I come from much more of the publishing side. I've been with AAAS and Science for about five years now, and prior to that I was with Oxford University Press. Reflecting on it, actually, I think one of the interesting things about my journey through this career is how pervasive technology has been in the background. I was very fortunate to start working in publishing at a point where technology was really beginning to take over everything we did. And looking back at that, and then looking at what Science is trying to do now, there are a lot of commonalities. Ultimately, what we're trying to do is communicate science. That sounds very easy. It's massively complicated. I think one of the things that we all struggle with is the vast volumes of content and of data and how we help scientists, and that's really ultimately what we're trying to do day to day. Publishing technology is instrumental to that. It's a fascinating place to be. Science is part of the AAAS, a society publisher, so we are very uniquely placed. I hope we're a very well-known brand; people know us. But that comes with a lot of responsibility. So the challenge in technology is getting it correct as well: building trust in science, reinforcing trust in science.
Jay Flynn: It is. I mean, this data question is one of the things that connects what both of you do. And I think data is a really interesting place to start our conversation. Do you think we've got, as a group of people working on these same problems, our arms around the data that doesn't show up in the papers? You say there are clinical trials, there are reports, there are case studies, but isn't there something in academia that kind of gives scientists incentives not to share their data? We only see the tip of the iceberg in your journal or in the data that the national labs produce. How do you think about it in your world? How do you get behind this big data problem?
Nicole Bishop: Absolutely. So even when we just think about the number of submissions to be published, and the perhaps 1% that is actually published, all that massive data is useful to solving science's problems, but it's being left alone. And all the data as well that, when it comes to science, shows what went wrong. It's very valuable to understand what went wrong and have the ability to analyze that against what is going right, and perhaps unlock that to lead us forward. So one of the things that we did is develop a platform that allows that from the moment you're creating that science: whether you're publishing a protocol or an article, via open access or another platform, you have the ability to move it seamlessly to a platform that can leverage natural language processing and different aspects of AI to transform it into knowledge that can be utilized to move science ahead. So the way we look at it is that data should not simply be a data point or a visualization on a dashboard; it's truly a key component of how we solve this problem. And it's humanly impossible otherwise. You know, there's about one billion one-terabyte hard drives' worth of health data. It's impossible to simply ask a question of that data, or search through it and get what you need. And so by bringing all that data together and leveraging AI in a very specific way, we can use that data to guide us to the next steps instead.
Jay Flynn: So Nicole brings up an interesting point, Chris, and I hear it all the time, and I'm sure at Science you guys hear it. Nobody's interested in publishing null results. You can't make a career, you can't get tenure by explaining to the world all the things that didn't work. They're only interested in the new and the novel, not the sound science, not the case reports. Those aren't the things that the academic reward system keys off of. How do we even begin to wrap our arms around all the work that we're not seeing, that isn't showing up and getting submitted to your journals or Wiley's journals?
Chris Reid: Gosh, it's a great question. And I think some of the responsibility sits with societies and publishers, but a lot of it sits with academics themselves and the institutions and the reward mechanisms and the incentivization. There is no incentive to publish that data. As you just said, if you want to build a career, get tenure, and so on, you need to publish in the very best journals. I know that can be quite a loaded term at times. But if your experiment fails, there's no win in sharing that. There is a win, of course, for society, because if another PhD student does the same thing and didn't realize you tried, that's inefficient, essentially. How we solve that is a challenge. I think it has to start in the academy itself, within the institutions, with a reward mechanism for doing that. In fairness, I think preprints and other things do go some way to helping with that, but they are not as recognized.
Jay Flynn: Go ahead.
Nicole Bishop: Speaking to inefficiencies, research productivity has been declining 7% every year. Over $200 billion in biomedical research alone is spent just on redundant research. So when we think about this catching net we could have for all the data that might not get published or might not have worked out, that may be where we can recover that $200 billion: by not wasting those funds on research that's already been attempted, and perhaps going in a different direction. So, you know, speaking to preprints, I think it does come down to technology and platforms. There can be incentives, from a revenue perspective or what have you, to have platforms that feature that exact type of data. And having that data available makes it available for platforms like my own AI platforms to analyze what has worked and what hasn't, and also, from an administrative standpoint, to say, okay, we're not going to go in this direction because it's already been explored.
Jay Flynn: I'm going to show my age here. Okay. All right. But let's talk about preprints as an example of the challenge with the pace of adoption of technology in academic publishing and the scholarly publishing space. I'm old enough to remember when the first preprint servers were set up, and they were called servers. They were set up at the Los Alamos National Laboratory, and it was called the LANL preprint server. It was physics, right? Today it's called arXiv. But 25, 30 years ago, it was set up on the early internet as a way for physicists just to share work before it got submitted to journals. Today, 30-odd years later, we still don't have uniform preprint coverage across all of the domains. Now, what is it about this industry or this space that makes it so hard for new technology to get adopted uniformly? Why are we in this world where we have all this consumer technology that is super high end, and yet the technology we use to manage the most important research done in the world is maybe not where it ought to be?
Chris Reid: Yeah, it's a great question. I mean, my instinct there is, I'll push back slightly, because I think there are areas where actually we've adopted things pretty quickly. I think the question mark is the need for adoption, right? We think about this quite a lot at Science. There are always hot areas within publishing and new technologies; research integrity is actually a really good case study of that. The thing we find in our internal discussions is trust. So if we pick maybe AI as a good example here: AI, as a catch-all term, has so much promise. I'm sure we will talk about it a lot today. One of the things that we're potentially looking at doing is saying, well, actually, is there a way of using AI to create plain-language summaries to help the accessibility of our content? The classic question there: if there's a physics article and you're a biologist, maybe there's value in it, but you aren't trained in that field, so a plain-language summary could help you. AI is a great way of doing that. But even if it hallucinates one in 10,000 times, that lack of trust is a problem. And going back to the preprint point, that is a challenge there as well. Preprints are a very divisive topic within scholarly publishing and within academia; people feel very strongly both ways about it. But it's about getting content that's as trusted as possible. And I have to say as much as possible, because peer review is not a perfect process. I think it's the least worst process, but it's how to build that trust. And to answer your question perhaps more directly, that's maybe why the adoption of technology is slow.
As an industry, we have to be a little more conservative at times and not rush to do things, because if we rushed and damaged in some way the integrity of the scientific record, we would build problems for ourselves in the future. And so I think that does speak to the sometimes slow progress we see with technology. Although I do actually think that is probably about to change quite significantly.
Nicole Bishop: Absolutely. I think trust is definitely one of the factors, along with legacy: legacy ideas of how things should work, legacy research procedures, legacy technologies that have been in existence for decades and that are not capable of moving research and publishing forward in a way that's truly progressive. And legacy is okay. If we approach the technology from a certain perspective, we can essentially add layers to that legacy, so that we can transform it into something we can really build on, whether it's the preprint platforms that exist or existing publishing platforms. So legacy, I think, is deeply important. But when it comes to innovation, we have to shift our ideas of how we approach it. And that's one of the things that we've done. Speaking to AI, for instance, I think one of the main use cases we've seen is that we can summarize everything. But my personal standpoint is that's what abstracts are for. We can do so much more with AI, with the massive body of science and academic content that we have, beyond a summary. And the hallucinations are very real when it comes to science, from making up authors that don't exist to making up enzymes, which we've seen when we've tested different models against our own. So I think it comes down to very specific use cases, and building AI for those use cases, discipline by discipline, so that we can solve those problems according to that specific discipline without those hallucinations.
Jay Flynn: At the end of the day, we're talking about data. Every item in a scholarly article, right? The name of the author, where that author works, who paid for the research, the number of co-authors, the email addresses, the content in the abstract, the methods, the figures, the tables: all these things could be formulated as data. But I would argue that today, the vast majority of that gets treated as text and images, without a lot of metadata and without much semantic enrichment to help us understand it. One of the things we're working on at Wiley is to change that with our new submission system, where we actually break everything down and treat it as data, as opposed to just plain, if you will, dumb text. But what do you do with all that data once you get it? And where do you draw the line? Data mandates from funding agencies, Nicole, require scientists to deposit their data in certain places. Should publishers be doing that? Should institutional libraries be doing that? How do we get our arms around this big data problem? Because all the ocean buoy data in the world that's collected to make one research paper isn't something I think Wiley can afford to host. But I'm also not sure we can afford not to disclose it, to your point about making research more efficient. How do you think about that problem?
Nicole Bishop: Yeah, if we are truly in the business of what we say we are, advancing science and advancing research, then we need that data, and we need to figure out how to host it, where to host it, and things like that. There's also the great AI race, whether it's startups or others: everyone's looking for data. And if you're in possession of that data, you can monetize it. So there's that incentivization from the corporate side: we have this data, and would you like to access it to train your AI models? And then we can all go about the business of science and solving it, while also making that data more valuable from multiple standpoints, in the research and then, of course, from the potential to monetize it. So, indeed, I do believe that all of that data is valuable, it can be made more valuable, and it should be made available. Absolutely.
Jay Flynn: And Science, how are you guys thinking about this?
Chris Reid: It's tough for us, actually, because I think it starts with the authors: getting the authors to want to do it and incentivizing them to do it. You know, we're in a fortunate place at Science because of who we are and the brand that we have. People are happy to jump through hoops to get published. And so we do use that. And what I mean by use that is to say, well, actually, let's promote best practice. Let's say, we do need your data, we do need this. We do end up in some situations where we're perhaps competing for papers, where that can be difficult, but we are very clear with our instructions and our desires and say, we need you to follow these guidelines.
Chris Reid: We need you to provide the data to us. But it is a challenge, and it is going to become more of a challenge. And I think actually there's an opportunity there as well. So maybe when we're talking about challenges, it's an opportunity to say, well, we're not asking for the data for the sake of it. This is actually helpful. This is useful. This is good for transparency. This is good for integrity. This shields us from situations where, if you don't provide your data, there's a question mark around your research. So it's good for the scientists, and it's good for the publisher. That doesn't make it easy to manage at the same time, so it is something that we think about. The other part of it is economics, and I'm going to come back to economics a few times: the cost of these things. We work with Dryad, and we work with other people who have that data storage, but that comes at a cost. So how do we continue to be sustainable while also incurring these extra costs and providing them to our readers and to our authors as well? So it's a challenge now. And I was thinking about this a little earlier: with a lot of AI development, as AI begins to be able to access real-time footage, there's this idea that if you're doing an experiment, an AI can watch you and then write out your methods section. That avoids human fallibility to a certain point, but it's going to generate terabytes of data. And so how do we manage that? Exciting, interesting, but challenging.
Jay Flynn: You brought up methods, you brought up protocols a second ago. Wiley publishes a big database of protocols. This question around methods, around reproducibility of methods and the fine-grained detail that you need to provide: you're absolutely right. It's terabytes of information. In the biomedical field, it's not just how many patients, but all that demographic data on those patients. It's a whole ecosystem of medical records, procedures, prescriptions, all this other stuff. I mean, where does the boundary between, let's call it, work at the bedside, work in the field, data collection in the lab on an instrument, and what gets published, where does that boundary need to move? If we could take a blank piece of paper and start over again, we probably wouldn't say the right vessel for all this information is a PDF, right? I think all of us can agree that that's probably true if we really cared about efficiency and communication. So three years from now, or five years from now, where would we be experimenting to try to redraw those boundaries?
Nicole Bishop: Yeah, absolutely. I think the future is formatless. If we look at the evolution of publishing, going all the way back to the Gutenberg press, it's been about format, right? Then we had paper and we got to digital. And now we're in a space where we can leverage all of this knowledge to do more, whether it is to ask a simple question or something beyond that. The PDF has been very helpful in disseminating information, and I think that's been very much the job of publishing. But in publishing, we have knowledge. So what if we look at it from a different perspective: not just disseminating knowledge, and not just making it more discoverable, but actually helping move things forward? Format is often the hindrance to that. We talked a bit about legacy; with the right technologies in place, you can lift that up and make those things formatless, make them more ready for the web, web-native, if you will, with the PDF being, indeed, sort of a print option to move the data along. The common approach is to throw the data into a database, a repository. But what we look at is a couple of different layers: we have the data, we have domain expertise, and then we have a decision layer that we put on top. And that can sit on existing platforms, because we understand very well that, due to legacy, a lot of things might not change. What we can do is add a layer that makes everything formatless, with natural language processing and other technologies, so that when you publish something, it seamlessly becomes part of a more collective knowledge and more accessible to more people.
Jay Flynn: So how do journals react to that? I mean, formatless makes a ton of sense, right? We're breaking down boundaries between point A and point B on the journey of a piece of scientific information. But journals have a front and a back. They have a cover and a final page. They have a methods section, an abstract, and a list of citations. I mean, the formats are almost prescribed ritualistically. So how do journals evolve to meet that opportunity, I guess?
Chris Reid: Yeah, challenging question, let's put it that way. The way I think about it is increasingly to think about production versus consumption, right? As you've just been saying, I think not too far away, two, three, four, five years, consumption and how we interact with that content is going to be quite different. I very much agree in terms of having this overlay that builds on top of that content to allow connections that might not otherwise be made. The production side, I think, is slightly different, because fundamentally, and I don't think I'm controversial in saying this, scientists like doing science. They do not like writing articles. So how do we continue to make that process easier for them? And actually, I think that's where having a format does help them. Having a culturally, to use your word, legacy approach of doing it in a very structured way is actually helpful for scientists. I'm sure no one enjoys writing a methods section, and there are lots of ways to improve that, but by having a set format that comes in and continues through, we can then build a layer on top of that that interrogates it. Maybe that's not the long-term place, but it's certainly the medium-term place. So, yeah, it's a slightly different perspective. I think one thing we should be a little careful of is the PDF. The PDF has existed in its form and been so well liked for a reason. And this is, I think, a perennial complaint about publishing: people say, well, the article hasn't changed in the last 400 years, and that's a bad thing. I'm a little careful about that. There's a reason why it's persisted for 400 years, and that is a very big cultural legacy to move away from. That is not saying there shouldn't be change. It's just being aware.
I think, Jay, I'm sure you've experienced this at Wiley as well. Every time someone tries to say, well, let's do something other than the PDF, the audience wants the PDF. It's very hard to move away from that. And there is a reason behind that.
Nicole Bishop: Yeah, and I think the beauty of technology is that we can have legacy and keep it in place. When I say formatless, it's not throwing away how we do things; it's actually enshrining it and allowing you to do exactly what you've always been doing. But we have that layer that says, this is how you're doing things, and we're going to take all of these PDFs, because what we do is process PDFs, CSVs, all the different formats. In that layer on top, that's where it becomes formatless, so things can flow more freely and we can extract valuable points of connection. And I think it also adds the ability to have layers of interactivity when you can take away those formats at that level. Not at the science level: at the bench, at the bedside, do exactly what you've been doing as it evolves. But with technology, we can evolve that while maintaining legacy, I think.
Chris Reid: Yeah. And actually, to go back to your question a little bit: I think this is one of the challenges for publishers, and a particular challenge for societies, with our different societies and different journal portfolios. And I think this is going to be the big challenge in the next maybe three, four years. Right now, if you want to engage with Science's content, you go to science.org and you read our content there. Is that going to be the same in five years' time? Maybe not. You'll probably go to, who knows which winners will evolve in five years, but let's say, for the sake of argument, Perplexity. You'll ask Perplexity the question, and it'll pull from our sites. Is that damaging for our brand? That is a question for us. We don't want to be a commodity that fuels these tools. How do we actually get into that tool business as well? That's a very difficult question, and it's something that's occupying a lot of thought internally. We don't want to be commoditized. We don't want our brand to be devalued because of that. But at the same time, we need to be part of it, because if that's where researchers are finding their research and getting their answers, we need to be there. We need to meet the researchers where they are. And so squaring that circle is certainly going to challenge us over the next decade, probably.
Jay Flynn: Is this just another version of the same dilemma that publishers were facing 20 years ago, when we built all these websites to host our journals and then discovered that nobody was coming there to do any searching, and that all the searching was coming from PubMed and Google and those other things? Is this another version of that, or is it more fundamental? I'll tell you how we're thinking about it. We don't want these large language models trained on bad stuff. We want them trained on good stuff. We think it's crucial that we embrace AI, because these models must have access to peer-reviewed data. They must have access to peer-reviewed content, because we don't want them going out to the Common Crawl dataset on the public internet, or to pirate websites, or other things. We need to create legal, copyright-respecting pathways for this, but we also just want the models to be smarter. So, I mean, are we worried about the wrong thing, in the sense that I'm worried about click-throughs to my PDFs, or should I be more outcomes-focused? I'm struggling with this topic, so I'm just interested in how you guys think about it.
Nicole Bishop: I think that's a bit where legacy, speaking to both points, lends its hand. There is a very prescribed way: your citations, you know. So the progression of science is always going to note that Wiley did it, that Wiley helped with this innovation. Whether you're finding it via Perplexity or some other platform, there are procedures, and you're going to need to follow those procedures. And I do think there's a bit of, you know, the period of time that you mentioned. Some of the work that I did in academia was actually a bit of gaming those results, working with Google on academic content. And then I had a change of heart. So I was...
Jay Flynn: You needed to use your powers for good.
Nicole Bishop: There are a number of different opportunities, whatever, be it a platform that is leveraging AI or some of the repositories that we have right now; ultimately, they're leading back to and clicking through to that PDF, whether Wiley's hosting it or another publisher. So that's more around discoverability, and that's just a channel to discover data. I think it's opening up the possibilities. And wouldn't you want more opportunities for Wiley's articles to be found?
Jay Flynn: So, I mean, you and I get to think about that, Chris, but we also have to worry about the economics, as you mentioned, right? And so that disruption question is big, but all of the doomsday scenarios, if you will, of disintermediation when we went online, those didn't play out. Are these things, in your opinion, a flash in the pan, or is it more serious?
Chris Reid: I do think it's more serious, yeah. And I think if we look back to, you know, 1996, or a little bit later with Google, there's a kind of deal you make with Google, right? You make your content available so it can be discoverable. And I think discovery is a word we'll come back to a lot. But with AI, that isn't the same deal. And I think the problem, going back to the sustainability side and the economic side, is that the majority of business models that have sustained journals for hundreds of years are based around usage. And so if you take the usage away, if you change the equation so that discovery no longer drives usage and usage no longer drives the business model, that raises a big question about sustainability. And the worry I have is that if we think working with institutions can be challenging at times, in terms of subscriptions and other things, working with giant AI companies is going to be far, far more challenging. So there's a circle that's very hard to square there, because these organizations, the Perplexities of the world, need great content in order to work. Therefore, they're probably going to have to in some way pay for that great content, but they're not very likely to be willing to. So how does that play out in the future? I think there are different business models out there which start to address that, but no business model is flawless, right? So, to go back to your point, I think it is a very serious point. We keep talking about AI, and if you talk to different people about AI, there's a lot of hype around it. There are the bulls and the bears, and the bears are saying, well, the hype's overblown and it's all going to come crashing down. I don't see that. I think there is probably too much hype and we'll probably plateau. But I do see it changing things.
And I think it's particularly changing our industry, because, to get back to what we were saying at the beginning, how researchers interact with our content is going to fundamentally change. And I think that's a good thing. There should be disruption. Like we said in the introductions, we're all in the scientific communication business, and so we can't just say a PDF is good enough. None of us are saying that.
Jay Flynn: I mean, yeah, I think we're in the communication business, but we're also in the facilitation-of-impact business, right? So I want to pivot a little bit and talk about one of the things that keeps me up at night, which is this question around standards. And I know standards can sound really boring, but they're the thing that makes everything work. Just take biomedicine, or take healthcare. The way that a prescription dispensing system in a hospital connects to an electronic health record in that same hospital, there's a standard for how those two machines talk to each other. And those standards are set by some industry groups and a couple of technology providers. And that works in the US, and maybe in the UK and a couple of other places that buy these software packages, and they're all the same software packages. We publish in everything from astronomy to zoology, and that means we've got multiple standards for multiple domains. The physicists and the chemists can't agree on anything, and they're only talking about atoms, right? And that's to say nothing of the society role in all this, the society convening role. So let me start with you, Chris. As a convener of societies, AAAS convenes not just its own journals and its own journal community; it's a society of societies in a lot of ways, and that's a big part of what you do. As we think about data standards for bringing data off a machine into a PDF, or going formatless, taking the head off, publishing a slideshow and a spreadsheet, publishing some Python code and a PowerPoint, doing an AI-augmented methods section, whatever you want to talk about: who sets the standards for how all that works across so many different domains, geographies, regulatory regimes, funding regimes? When you sit there and do your blue-sky work, which I know you do at AAAS, at Science, how do you get your head around it?
Chris Reid: With difficulty is the short answer. To flesh it out: there are so many initiatives, and NISO does fantastic work in that area of standards. If we pick on ORCID, for example, they have made great strides with their initiatives. I think the challenge... actually, maybe I won't talk about the challenge; maybe I'll talk about the optimistic side, the hope. Again, without putting all the eggs in the AI basket, I'm very excited that the way AI can interpret text and look at the context around it means maybe we don't need the... I mean, I think we do need the...
Jay Flynn:going to plow through it with
Chris Reid: Well, yeah, I'm not sure about that. I mean, maybe. But I think, to your point, there's a classic cartoon, I'd have to dig it out, where there are five standards, none of them works for everyone, so they create a super standard, and now there's a sixth standard. And then a seventh and an eighth. No one can ever agree. So maybe that's our baseline: let's say there isn't actually going to be a standard, and so we need a tool that sits in the middle somehow, interprets this, and converts it. It's something I really struggle with, because in some ways it's very easy, and in reality it's very difficult. Even when we're looking at institutions, right? The way an author writes their institution varies almost every single time. And ultimately we're dealing with humans, and so without forcing them to do something, which is very unpopular and very challenging, is there a way of having an intermediary step that says, well, they're saying this, and they probably mean this? I'll do a shout-out here to an article in The Scholarly Kitchen quite recently by Tim Vines from DataSeer, who talked about probabilistic metadata. I haven't read the comments, but I think it's very interesting. You can say: if this and that, then you can probably make the assumption, and it's 99% true. Now, as I said at the beginning, trust is everything, so we want to be as close to 100% as possible. But where do we draw the line between 99% at a cost that's fine versus 100% at a cost we can't afford, and therefore we'd get zero? So yeah, I'd recommend anyone go and look at that article. It's a really, really powerfully written one.
Jay Flynn: In the context of publishing, right, one of the things we're investing a lot of time in is getting that right from the point where we first see the content, the point of submission. You keep talking about the number of submissions; probably since we sat down, my organization has received 60 submissions in the time we've been talking. And that's 24/7, 365. So when you bring in all this data and it's relatively unstructured and you're trying to standardize it, where do you put the burden? We've decided to put the burden on machines to do a lot of that inference, a lot of that probabilistic work. But we're also using reference tools to say we're going to describe MIT the same way every time, we're going to describe the University of Pennsylvania or Cambridge University the same way every time, so that we can at least have some standards around author metadata and institutional metadata. But the science data, the data that's coming in: one of the things that makes me crazy, and maybe you can tell me how to stop being crazy, is this. We get data from, let's say, a mass spec machine. We get a picture of an atom, and we have some infrared spectrometry on this thing. It's got all this metadata from the machine: time of day, when it happened, temperature in the room, method used, sample size, all this great stuff. And it all gets thrown away by the time it shows up in a Wiley journal or in a Science journal. We have just a dumb TIFF image with a little bit of a wrapper on it, or a JPEG, and then we smash that into a PDF and go, ta-da, and all that data is gone, left on the cutting-room floor. Tell me how we fix this, because I am stuck on this.
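[Editor's note: the probabilistic institution-name matching discussed above can be sketched in a few lines. This is a hypothetical illustration, not any system Wiley or DataSeer actually uses; the canonical list, threshold, and character-level scoring are all assumptions, using only Python's standard-library difflib.]

```python
from difflib import SequenceMatcher

# A tiny reference list of canonical institution names. In a real
# pipeline this would be a large authority file (such as ROR);
# these three entries are purely illustrative.
CANONICAL = [
    "Massachusetts Institute of Technology",
    "University of Pennsylvania",
    "University of Cambridge",
]

def normalize_institution(raw: str, threshold: float = 0.6):
    """Return (canonical_name, confidence) for the best fuzzy match,
    or (None, confidence) when no candidate clears the threshold."""
    raw_clean = raw.lower().strip()
    best_name, best_score = None, 0.0
    for name in CANONICAL:
        # Character-level similarity in [0, 1]
        score = SequenceMatcher(None, raw_clean, name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score
```

Note the limits: plain character-level similarity handles abbreviations like "Univ. of Pennsylvania" but fails on reordered names ("Cambridge University" vs. "University of Cambridge") and acronyms like "MIT", which is one reason a production pipeline would combine alias tables with probabilistic scores and, as discussed above, record the confidence alongside the match.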
Nicole Bishop: Yeah, we confront that every day. We love to get into the thick of it; that's why I talk about formatless, because we built the technology to even those odds you're consistently facing. So our approach is: you may be left with the image, but we take everything there is. We might pin it for later and put it in a different place. And it comes down to contextualization. You were talking about the different disciplines. AI can do far more than summaries, far more than what I'd call the popular, current perspective of it. AI has existed for decades; for us data, analytics, and AI nerds, this has existed for decades. So the currently popular approach, large language models, is not the only approach. We take a causal AI perspective that looks at things entirely differently. It isn't this brute force of analyzing all the data and then letting you ask it a question. And that goes to the user experience, because that's what it's been so far: you're having a chat, a conversation. But this is very nascent, right? That's the experience you have today, and it's constantly evolving. We have a very different perspective on how the AI shares knowledge with you, and thus changes how you discover data, and indeed it puts up front where the data came from. So we very much take an approach of giving you cues along the way: perhaps this is the drug you should look into, perhaps this is the lab test you could look into, and as well, here's the evidence, whether it comes from Wiley or another publisher. So we take all of that data, understanding there's going to be an opportunity to leverage it at some point, and we're actually leveraging some of it right now.
But AI can solve those problems, including contextualizing the data per discipline, which is exactly what we do.
Jay Flynn: Right. So it's like the AI version of that probabilistic metadata point.
Chris Reid: Yeah. Just to jump in, I do wonder if there's a fork in the road. What I mean by that is, and maybe this should have happened, or has happened already and I just don't think about it this way: journals, fundamentally, are produced for humans. That's a very obvious thing to say, of course. But should we start thinking about two forks? A human format and a machine format. This version is consumable by humans, but that version, with all the extra data, is much more machine-friendly. How we do that is challenging. But one of the things we've talked about before, and keep coming back to, is the volume of research, right? AI has great potential to help with that and to build connections. But at some point there will still be people reading things, evaluating those things, judging them. So when we're having these discussions at Science, if a scientist goes into Perplexity, and I keep picking on Perplexity, to get a question answered, our estimation is that the majority will still want to go back to the original paper. What we'd say is that the people who were just trying to find a single fact, or who had found the wrong paper, will be filtered out, but we still need that human-digestible piece. So that's an element of hope. At the same time, we have to acknowledge, and I think this is what we're all saying, that machines will be a much greater part of this, that we need this content to drive advances. In other words, only machines can sift that vast amount of data.
Jay Flynn: Is there a version of the world where we as publishers aren't even a part of that value chain? Where the CAT scans run on a bunch of adolescent brains for a PTSD study, right, all that data just gets put into an AI, the output gets created, and the methods get described and publicized. We change the academic reward, and the journal sort of goes, you know, 400 years is a pretty good run, right? Maybe those formats just evolve beyond their useful life, and we think about other ways this stuff gets communicated. Is that a possibility, do you think?
Nicole Bishop: I think it's a transformation that can occur. What does it mean to be a publisher? When it comes to paper, thinking about the audience, there's always going to be someone who wants that journal in their hand; there's always going to be the person who wants that PDF. So I think it's going to be about personalization for specific audiences, what it is they want to consume and how they want to consume it, but in a universal way, making it available to machines so that consumption can actually increase, because there's a greater degree of discoverability. Look at a repository like DOAJ, or the NASA repository: we've taken data from every one of the labs that produce science, and that data is just sitting there. Something has to be done with it. There's data being published, but it's simply sitting there, not being consumed. On average, say 86,000 articles are published per discipline. Who's really reading all of that? It's next to impossible.
Jay Flynn: I spend a lot of time on the weekends, and it's very busy.
Nicole Bishop: You know, there's a great possibility of that. But I think the transformation is this: when you transform that data into something new, that's what opens up new revenue models. So publishers will still exist; how they make money may change, and hey, that can be a very good thing. These are new revenue streams. I think the transformation is an inevitability, but I don't think the publisher disappears. I just think they morph into something new.
Jay Flynn: Okay. What do you think?
Chris Reid: Yeah, so I was thinking about it from a brand perspective. And I know this is a very controversial topic within the open access movement, and within scholarly publishing as well; the term "prestige publication" or "prestige journal" is certainly used as a dig against us. But when we think about it, it is interesting. The reason our brand works is because you can make a certain set of assumptions about an article published in Science. Now, we're humans, everyone makes mistakes, nothing is perfect. But on average, you can assume that if an article is published in Science, in our family of journals, or in a Science Partner Journal, it has gone through a number of steps. It's been peer reviewed in a certain way. It has high quality, which I know is a dangerous word, attached to it. And so you have that brand. And I think that continues, because the counterpoint to the viewpoint that says all we should do is build one mega-database and have all scientists put their content in it, is that there has to be some way of filtering it. And I think what you'd find by doing that is brands emerging; you'd have different layers. Not everything is novel. And I want to be careful here, because just because something isn't novel doesn't mean it's not important. We talked about that earlier, right? The failed experiment is important. But if you're a scientist, your time is extremely finite. How do you find a signal that says, this paper is a breakthrough I need to read? That's where, and I think this, and Science thinks this, having a paper published in Science is a way to do that. So that's where journal brands and journals do continue. Do I think they'll be in the same shape? We said 400 years.
If I look 100 years into the future, which is madness, do I think we'll be in a similar place? No, absolutely not. But as custodian of a brand like Science, which is, I should know the age, 125 or 150 years old, we do expect it to be around for at least another 100. That's something that occupies our thinking, and, as a slight digression from your question, it's something a lot of societies deeply care about: we see ourselves as custodians of the journal today, but also for the future. There's a standing-on-the-shoulders-of-giants element to it. And, going back to our very first question, sometimes the slowness of technology adoption comes from that legacy piece: we expect this to be around, we're custodians of it for a certain moment, and we'll pass it on to the next generation. I think society publishers particularly feel that; it's very important to us. And there is quite a lot of fear about the future in that respect.
Jay Flynn: One final question, and maybe I'll start with you, Chris. Wiley as a publisher has been around since 1807, but our oldest journal goes back to 1787. So yes, we're totally invested in that legacy, and I completely understand it. But we're also a technology company that supplies you with a platform to publish Science, the Science Partner Journals, that whole family of journals. We're venturing into things like AI for research integrity and checking for image manipulation. And we're spending a lot of time and energy trying to make the submission and peer review process better, because that technology is old, right? The technology that's standard in the market today is 25 or 30 years old. If you had one wish for that domain, at the point of data collection, the things that enable you to get through peer review, preservation, archiving, all of that, what would it be? If you could wave a magic wand, what problem would technology solve for you in that space in five years?
Chris Reid: Good question. It's probably a human answer rather than a technology answer, but I think technology can solve the human problem. We very much think about things as upstream and downstream, as I'm sure you do: the more information we can collect upstream, the more it saves us downstream. The challenge, of course, is that we don't want to create too many barriers for authors. So many journals, and we do this as well, add extra questions, questions, questions, because we need all that information. So the technology piece, and maybe I alluded to this earlier, would be this: is there a world, maybe a decade off rather than five years, where AI is helping you with your experiment? It's watching you, it's written your methods section, it's generated a huge amount of data. Because when humans write these things, they're very selective; it's a problem riddled throughout methods sections. So many times you read a methods section that says this, this, and this, and then, hang on, they've missed a step here. A machine doesn't do that. So that would be my wish: that we find a way, and I really want to focus it around helping authors, or maybe I should say helping scientists, to make that process as easy as possible. Because, as I think I said before, they want to be scientists. The publication is almost an afterthought, work where they have to follow all these guidelines. So if we can ease them through that, make it easier, that would be the technological dream, I think.
Jay Flynn: Right. It's focused on the user. That's good product management, right? So, okay, Nicole: if you had one wish, if you could say to the publishers, God, just get your act together here, what would it be at that point of data collection, that upstream place Chris was talking about?
Nicole Bishop: Yeah, coming back to legacy and adoption: there are technologies out there looking to create this formatless world, if you will, so that we can leverage this data in new ways and actually solve some of those problems. And it comes down to conversations of "let's give this a try." Modernizing the submission process, and having tools at that exact point, will solve a lot of the issues we've talked about during this wonderful conversation. So I'd say legacy definitely needs to be addressed, and adoption too, whether it's with startups or other partners. As long as minds stay open to that, we can evolve publishing and bring it to the future we've imagined here today. It starts with those conversations, because the technology is here. It's here today. It's just a matter of adoption.
Jay Flynn: Well, it does start with those conversations, and I think we've had a wonderful one today. I want to thank you both for taking the time; we don't take it lightly. It's a busy world we live in, so I'm personally grateful to both of you for coming and sharing your insights and thoughts. Thanks a lot. It was a lot of fun, and I really, really appreciate it.
Chris Reid: Thank you. Thank you for having us.