Dr. Chris Tucker:

Welcome to the Arthroscopy Journal Podcast. I'm Dr. Chris Tucker from the Walter Reed National Military Medical Center and the podcast founding editor. Today we'll be, again, discussing artificial intelligence and machine learning. Last episode, we spoke with Dr. Prem Ramkumar and covered a wide range of topics with clinical and research implications. In this episode, we'll dive even deeper into some of the more specific details of AI. I'm very excited to be joined for this discussion by a leading biomedical statistician, Mark Cote from Mass General Brigham. Mark's the associate editor for statistics for the Arthroscopy family of journals, and I think he's truly a gifted and talented scientist and researcher. But beyond that, he's an incredibly talented educator.

I've personally had the great fortune to listen to him speak on this topic and many others. He's got a unique knack for boiling down complicated and confusing statistical concepts into fairly digestible and understandable language, and he's often got some good humorous and relatable anecdotes. I look forward to him sharing his knowledge with us today. We're going to be referencing several of his publications to include two editorial commentaries, Machine Learning and Orthopedics: Venturing Into the Valley of Despair from the September 2022 issue and Artificial Intelligence, Machine Learning and Medicine: A Little Background Goes a Long Way Towards Understanding from the June 2021 issue. Mark, congrats on your work and welcome to our podcast.

Mark Cote:

Thanks, Chris. Happy to be here and excited to talk about this topic.

Dr. Chris Tucker:

As I mentioned, I'm excited for you to share your thoughts on AI in orthopedics because I think as a researcher and a statistician, you have enough of a different perspective than us surgeons that you often shed some light on these potential blind spots we may have looking at issues from our vantage point. On my last episode, I covered many of the basics of AI and discussed some of the potential applications it can have in the clinical realm as well as in medical research. We discussed its potential and also some of its potential pitfalls and dangers. I'd like to discuss some of the other nuances of this emerging technology starting first with the larger picture of AI. Just what is the big picture here with respect to our approach, the process and its structure?

Mark Cote:

Yeah, I think that's a good question, and I think Prem did a really nice job explaining AI as the idea of emulating or automating human behavior. So within the realm of AI, a subset of it is machine learning, which involves the use of data and a model essentially to make predictions. As far as the larger picture, I think it's helpful to start by looking at the differences between what we know or think of as statistics and what is meant by the term machine learning. Neil Lawrence, who is currently the DeepMind Professor of Machine Learning at the University of Cambridge and prior to that was actually the director of machine learning for Amazon, had a great tweet on this difference. So to quote him, he said, "Statisticians want to turn humans into computers. Machine learners want to turn computers into humans. We meet somewhere in the middle."

What I really like about this tweet is it gets after the philosophy guiding each of these approaches. So historically, statistics has really been about removing our own biases and tendencies that we all have when we look at data, and this is a part of being human. In a sense, we all have these internal models in our mind about how the world works, and this is really what's helped us survive. The problem is, though, that when we're presented with say, just a random batch of data, we'll often look at it and see a pattern. So historically, what statistics tries to do is remove these biases, and I like an example Neil gave, which is drug approval. So without statistical analysis, if we looked at say, data from a drug trial, an ineffective drug may seem effective just because we can get fooled when we look at data and see patterns in the data that aren't really there.

But I think there's a little bit more to statistics than just hypothesis testing and thinking about drug data. I think what's key is that when we're doing some type of statistical analysis, especially those that pop up in our papers, oftentimes, what we're doing is we're trying to estimate, measure or explain some type of phenomena that we observed. So say in the context of a paper, we may use something like logistic regression to model the propensity for infection after some type of surgery. So the variables we put into that model and the choices we make try to help us explain what might be contributing to the risk of infection. Once we identify those, we try to estimate their effect. To use the example Prem used, which was smoking, we might say that relative to non-smokers, maybe smokers had 2.0 times the odds of rotator cuff repair failure. Then we would surround that with some level of confidence.

Maybe we would say that this estimate goes from 1.1 to 4, meaning that this data is compatible with odds ratios that reflect very little risk, 1.1, all the way through a high level of risk, an odds ratio of 4. But I think the key there is that we're really trying to estimate and explain what's happening with the data and really trying to understand at least maybe from a clinical perspective what's happening with the patients you see. But conversely, with machine learning, it's really about emulating human behavior. In some ways, it's about leveraging all those intuitions and tendencies that make us human because we're actually pretty good at a lot of things, and it would be really great if we could automate them.
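[Editor's illustration] The odds ratio and its surrounding interval described above can be sketched in a few lines of Python. This is a minimal sketch, not an analysis from any paper discussed here: the 2 x 2 counts are hypothetical, the function name `odds_ratio_ci` is made up for illustration, and the interval uses the simple Wald approximation on the log-odds scale, which is only one of several ways to build such an interval.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2 x 2 table.

    a = exposed with the event      (e.g. smokers with repair failure)
    b = exposed without the event   (smokers, no failure)
    c = unexposed with the event    (non-smokers with failure)
    d = unexposed without the event (non-smokers, no failure)
    z = 1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Wald approximation).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts chosen only so that the odds ratio comes out at 2.0.
estimate, lower, upper = odds_ratio_ci(a=25, b=50, c=25, d=100)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these made-up counts the interval runs from just above 1 to a little under 4, the same kind of "compatible with very little risk up through a high level of risk" reading described in the conversation.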

So where we describe statistics with words like explain and estimate and measure, and we also talk about how to reduce bias or at the very least account for our biases, when we're thinking about machine learning, we use words more like predict and recognize to characterize it. Then we also discuss actually how to transfer our own tendencies and intuition to a computer so that we can automate and emulate human behavior. That's not to say statistics can't be used to make predictions or that machine learning can't be used to estimate things. Of course, they can; however, the philosophy of each is different. I think this really plays an important role and provides some context when we read these machine learning and AI-related research papers. Specifically, they have different objectives, and I think this plays a key role in how we interpret them.

Dr. Chris Tucker:

I think that's a nice segue into my next set of follow-on questions where I wanted to dig into a few of the aspects of AI that we don't often talk about. The first being ethics. So with your nice explanation there as a backdrop, I was wondering what your thoughts are on the ethics of AI as it pertains to both clinical practice, but maybe more so medical research from your perspective.

Mark Cote:

So as far as clinical practice, I think we first have to consider how we would be using or are using an AI. So in the context of clinical practice, I think it really should be a tool that assists the clinician in decision-making and not necessarily one that replaces it. If you think about what's really the strength of some type of AI, and it's really that the methods associated with them have this ability to ingest and hold really large amounts of data while they're making predictions. It's much more than we could hold or consider in our own minds. So if you'd imagine if you're seeing a patient trying to hold all the research that's been published on this particular clinical problem, plus all the information that may exist in your registry or in your practice data about this clinical problem, plus all the data that's in the medical records of patients that have this clinical problem and then try to work through all these various connections and the data to make some type of prediction for the patient in front of you is impossible. We just can't do that, but computers can.

So in some ways, the computer is like an upgrade to our own limited capacities. But counter to that, if you think about it, computers don't possess some contextual understanding that we have. You can train a computer or some type of AI to recognize images. In fact, you could maybe train it to recognize images of people in an image. However, if that picture was taken say, during the Great Depression, it would be very difficult for a computer to recognize the emotional response we may feel when looking at that photo. So in the context of AI and clinical practice, this difference between the computer and the human can really be problematic. So I think that the next thing to consider with ethics is that we need to hold AI or these AI systems accountable. Dr. Doshi-Velez is another person I want to talk about. She's a professor of computer science at Harvard, and she has talked about this I believe in a TED Talk. But she gave this great example where they had trained a model to predict when patients in the ICU were going to die, or when they were likely to die.

They used volumes of medical record data. They used a bunch of physical measures, a bunch of biophysical sensors, stuff like that. It actually arrived at a model that was highly predictive, which was great, but when they dug a bit deeper, they discovered something interesting. So really what their question was is, "Wow, we have this highly-predictive model, but what is this model using when it's making this prediction?" Interestingly, it wasn't using any of those physical measures or things you might think about that might correspond to predicting when someone in the ICU who's under some distress may die. What the model was really keying on was when the word chaplain was written in the medical record. I think her larger point was that we need to hold AIs accountable, because on the one hand, they're powerful tools. They can do what we can't do, which is integrate and hold all that information at once and synthesize it in a way that can make a prediction, but it still lacks that ability or that thing that makes us human.

Her advice, and I think it's good, is to focus on local explanation. So what I mean by holding AIs accountable is to look. There's ways to look under the hood and see what's happening in an AI, but perhaps what's better is to ask, when a specific recommendation is made and you're using this AI ethically as a tool to assist your decision-making, what was the data specifically that the model evaluated, and how much weight did it put on that data when it made this prediction? Much like in that previous example, where the prediction that the patient was going to die in a few hours was probably made based off the fact that someone in the medical record wrote, "Consult clergy," or, "Find the chaplain."

So in that case, when we're speaking about AI as a tool to assist and not replace decision-making and the ethics surrounding it, I think the idea of local interpretation and really, local explanation is important. I think you also asked about ethics and medical research. When we're thinking about AI in a research sense, and not really in how we're applying it, I think we need to be mindful that if we have an AI that we want to use clinically, we want to make sure that it's representative of the entire population. So I'll give you a quick example. We were recently looking at a machine learning approach for predicting overnight stay. There's a lot of models out there that do that very well.

Our goal was not to develop a model for clinical use, but rather to explore how well does this globally-performing model that does well when it's given all the data, how well does it do when it's stratified by race and ethnicity? Interestingly, you can assemble a model that's very good predictively across all your data, but when you start breaking it down by race and ethnicity, some of those racial groups and ethnic groups suffer from poor accuracy. So really, a seemingly well-performing model can actually perform poorly when you look at things like race and ethnicity, and I think that raises ethical concerns. So I think it's important on both sides, one clinically to recognize that it's a tool, and it's a tool we need to hold accountable and things like local explanation can help us. On the other side, there's what we're doing from a research perspective and trying to think about as we develop these models, are they equitable across everybody that we're interested in?
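[Editor's illustration] The stratified check described above, where a model looks good overall but not within every group, can be sketched as follows. This is a toy under stated assumptions: the group labels, the records, and the helper name `accuracy_by_group` are all hypothetical, and accuracy stands in for whatever metric a real study would report.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return overall accuracy and accuracy within each group.

    records: iterable of (group, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group

# Hypothetical predictions: a large group the model handles well
# and a small group it handles poorly.
records = (
    [("group_a", 1, 1)] * 45 + [("group_a", 0, 0)] * 45 + [("group_a", 1, 0)] * 10
    + [("group_b", 1, 1)] * 6 + [("group_b", 0, 1)] * 4
)

overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"{group}: {acc:.2f}")
```

Because the smaller group contributes few records, its weaker accuracy barely moves the overall figure, which is why the overall number alone can hide the problem.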

Dr. Chris Tucker:

Yeah, your comments relate nicely to some of the discussion I had had with Dr. Ramkumar where we talked about the big categories of AI being generative and non-generative. Obviously, there's more capacity for risk associated with the generative models, I think, is a general consensus, but then also supervised versus unsupervised AI. So I think if you had a 2 x 2 matrix, I feel like the intersection of generative and unsupervised has the highest risk for creating ethical dilemmas or breaches in ethics versus the supervised non-generative AI is where you're talking about using it as a tool, having human supervision or interaction and use of it.

That seems to be where the lowest risk resides as you described in your examples there, which both make logical sense to those of us when it's pointed out. But to a novice, you may not think of those sorts of things unless you're involved in it, like you said. So that being said, what kind of concerns do you have about this current rapid expansion of the use of AI, particularly in the realm of AI-related publications? What are the main concerns you have about that trend right now? Is it expanding too quickly, misuse of it, misunderstanding, misinformation? What do you think the biggest potential pitfalls are?

Mark Cote:

No, it's a good question. In some ways, the increasing publications, I think it reflects an interest in the area, which generally, I think is a good thing, but there are concerns. So I think a lot of the papers, it's just a re-analysis of existing registry data. I think Prem touched on this, and I loved his example of the idea that you're really just holding up a mirror to our data. It's important because I think you're not really finding anything new, you're just analyzing it in a different way. But I think when you consider the philosophy we spoke about earlier, about machine learning being about emulating human behavior, sometimes to me it's not clear why we need some of the studies that have been published. For example, Machine Learning Outperforms Logistic Regression. To me, it makes sense that if you tried several different algorithm approaches to predicting something, it's likely to do better than just one.

The bigger question is, does it really advance our understanding of the clinical problem or move us towards a solution? I think context is important also. It's not unusual to see a paper that promotes a machine learning algorithm for predicting, say like patient reported outcomes or satisfactions, but then they often refer the reader to a Shiny app or some type of web link that you can go to and use the model. I get it. That's cool and I think that's great, but oftentimes, these Shiny apps only take in a small handful of variables, for example, maybe age, current smoker, BMI, whether someone's diabetic and they use that to produce a prediction. I think this presents a few problems. One, just on face it's hard to believe that a patient's outcome would be determined by their age, their smoking status, the BMI and whether or not they have diabetes. What this actually probably reflects is these are the variables that maximize a set of metrics the authors were using when constructing these machine learning models.

In some ways, you can think it's almost like a poor representation of machine learning because I'm not sure it's really reflecting what it's intended to do. I think the other issue though is external validation, and this was touched on a bit in the previous podcast, but I think it's really critical for these types of models. So patient expectations, surgeon preferences, comorbidities, they all vary from practice to practice. So when a machine learning algorithm is developed exclusively on a single practice's patients, it's unlikely, or I would even say at best unknown whether it will hold when used outside the practice. To give authors credit, a lot of them note these limitations, but I think the message gets mixed when the readers are referred to the Shiny app. Then while these Shiny app webpages do provide some external validity disclaimer, it's like it's hard to unsee what you've already seen and it's tempting to just start plugging in some numbers and getting predictions.

That's not to say that they're all bad. Many provide Shapley values which provide local explanation for prediction like I was talking about previously, and we can talk about it a little bit more in a minute. But nevertheless, I think the message gets mixed and the intentions are unclear. The other concern I would have about the increasing amount of publications has to do with just data science practices themselves. There's an entire field focused on data science. They have well-established practices and guidelines. There are ethics that guide them. There's well-educated professionals who do this type of work for a living. They focus on things like fair and equitable models, transparent methods, et cetera. These are all core components of what they do. So while there may be some good machine learning papers in our literature, many of them fail to adhere to these basic practices. A good example is the provision of a repository for the code and software used. Again, this was touched on previously, but I think it's important.

This is a basic practice among data scientists, yet, it's one that almost is universally ignored in our literature. When you think about it, it's really not all that surprising that many authors aren't doing this because many authors aren't data scientists, but rather they're just very well-meaning researchers that are really using parts of a practice of another discipline, that being like data science, but not all of those things when they're generating this AI-related research and subsequent manuscript. So if we're going to develop research that uses machine learning models to support some type of AI, it would probably serve us well to understand the data science practices and do things that they do like making the code available, like addressing and mitigating bias. Otherwise, I guess my concern would be that we're going to continue to populate the literature with machine learning papers about really for the sake of publishing papers about machine learning. I don't see it really moving forward. I just think it's more of a demonstration of what it can do and not a clear mission of how we're moving forward.

Dr. Chris Tucker:

Yeah, I think that's an interesting point you bring up at the tail end of your response there where you talked about the intersection of basically two fairly different fields of science, orthopedic surgery and computer science. While there's probably some overlap, at least from an intellectual interest perspective, when you picture an orthopedic surgeon in training focusing on the biological sciences and medicine and anatomy, and then you're picturing the computer scientist who's a very hard scientist working on code and such, we're now discussing an intersection in research of those two types of folks.

Like you said, their approach to publishing is just conceptually different in terms of what the requirements are for a publication. I know at the editor's meeting we talked about that and touched on that. From an editorial board perspective, what's the responsibility to have this minimum standard for publishing on a computer science-related topic in an orthopedic journal? So I know we're going to talk about that a little more later, but let's just hammer it home now. From your perspective as a researcher and an editor, what do you think are the minimum necessary standards for a paper on AI or machine learning to achieve the intended goals in an orthopedic journal?

Mark Cote:

So I think there's a few. For one, I think it's important to specify what problem is being addressed, why a machine learning or AI-driven approach is needed, and what its intention is. What should it be trying to accomplish? So as we discussed, some of these papers are just re-analyses of existing registry or practice data. They seem to serve, and I get it, to demonstrate that a machine learning approach is powerful, like it can outperform some of the traditional approaches. So I think to answer your question, I think as a start, it really should be first, what is the problem and why do we need this type of approach? In addition, I think we need to consider that oftentimes these papers are trying to emulate human decision-making. That's the, I don't want to say the guiding principle, but the motivation or the objective of why you would venture down a machine learning road. So the question is when we see these papers and some of them are very good, then what's next?

If indeed there's potential to improve upon the current state of a problem with some type of AI-driven research, you'd expect a line of research looking at the process from end to end; specifically defining a problem, developing a model, deploying the model externally, evaluating its performance, and then moving towards some type of implementation. I think what often happens, though, is that we get papers that seem to be like one-offs. They show the potential for machine learning to provide greater predictive power than what we've seen before with traditional methods. Then investigators seem to move on to the next clinical problem. Whether it's posterior meniscal root tears or Bankart lesions, sometimes I feel like that data is grabbed and then again, redemonstrate the predictive power of a suite of algorithms that come from the machine learning community to show that, in fact, that a machine learning approach can definitely boost your predictive power.

So I'd say at a minimum we should try to discern what the goal of the research is and whether the machine learning approach is needed and rather than whether it's just a simple demonstration of what machine learning can do. I think another important concept is to address external validation, which simply refers to the performance of the model in different settings. So I don't necessarily think that a submission needs to have developed a model, internally validated it, meaning they've trained it on their data and then tested it on other data that they have that wasn't included in the model and then went out and externally validated before they can make a submission. But I think that discussion needs to be up front, and I think that idea of external validation needs to be up front, especially in these papers looking at patient expectations, patient satisfaction and patient reported outcomes because these things are so apt to vary from setting to setting.

Then finally, I think it's a tough topic, but I think you should consider incentives. For better or worse, published papers are currency. They're important for students trying to get into medical school, they're important for medical students trying to get into residency. They're important for the resident trying to get a fellowship. They're important for a faculty member who's looking to advance. I get it. We've all benefited from it, including myself. But I also think it's important to look at that angle a little bit because as complicated as the statistics and the underlying things that are happening under the hood may be, these models are not the hardest things to fit. With a little bit of coding and a little bit of knowledge of certain open-source software, which, again, is free, it's easy to read in a CSV. It's easy to run a suite of algorithms and get some results.

It's easy to generate metrics like area under the curve and these different things that we put in machine learning papers, and then it's easy to incorporate that into a paper. By no means am I saying that people are running around recklessly fitting these models, but I think this is a problem that's been going on for decades, that publication is currency, and sometimes there may just be incentive to publish these things. I think if we keep an awareness of that, at least from the perspective of the manuscripts we consider or the papers we read and just try to determine where is it that we're trying to advance the understanding of a clinical problem or automate something that we couldn't automate before or develop some type of decision-making algorithm that could assist with clinical care. Then where are the papers that, again, well-intended that may just be demonstrating here's what machine learning can do? Because there's plenty of those out there, I think that's an area where I don't think we need many more.

Dr. Chris Tucker:

Yeah, I think the breadth of publication has certainly expanded. I think now we're alluding to the need to maybe improve the depth and the significance of the studies that are being done. You've mentioned, and we've touched on a few times already, the validation of these models. When I was reading one of your editorial commentaries, I learned of this new term called distributional shift where you described that as a shift from one set of circumstances to another, i.e., when a machine learning algorithm is trained on one set of data, like you've said, it becomes internally validated when it's then tested against more data from that same institution. But then when an attempt is made to apply that algorithm to another scenario or outside data, that's when you can run into that difficulty with external validation. So I know you said that you don't consider that necessarily a requirement for publishing papers, but just how much do you think it matters, the battle between internal and external validation for these newly-developed machine learning models?

Mark Cote:

Yeah. Well, I think it matters quite a bit. There's certainly situations, and you touched on them in the previous podcast, where maybe it doesn't matter as much, but I think a lot of what's germane to orthopedic research is how does the patient do? With patient reported outcomes or just simply satisfaction? Or it could just be a binary answer like, "I'm better, I'm happy," whatever it may be. Those things, and I think you know from experience, are dependent on so many different variables. So the idea with distributional shift, it's a nice term, but really what it's saying is a mismatch. It's a mismatch between the data I trained on and tested on and made my model on, and how well it performs when it's deployed outside of my arena. In that context, especially when we're talking about something that's patient driven, you can see how a model could perform very well within a certain practice setting, but fail to perform well when it leaves that setting. So I think it's super important.

One of the famous examples shows it's not just patient expectations, it can be other things. A famous, well, I say famous, but a recent example that got some press was a group of researchers at Mount Sinai. I think they were trying to predict maybe pneumonia from a battery of stuff they were feeding the model from images and stuff in the medical record, and it performed extraordinarily well. To their credit, they got a set of data from two different institutions, and it didn't perform as well. What was interesting was that the model was keying in on the fact that the folks that had pneumonia or some type of pulmonary problem tended to be in bed because of the nature of the hospital they were at, and they had to use a portable x-ray machine.

So at the end of the day, that great model was dependent somewhat on the fact that a portable x-ray machine was being used. So that's another example of the mismatch or that distributional shift. Doesn't always have to be patient expectations, it can be something like a portable x-ray machine. But it does underscore the point, especially contextual to orthopedics where most of the time what we're thinking about is how patients do, that how patients do depends on a lot of different stuff. Machine learning and AI is great, and I think it's going to advance our understanding of a lot of these things, but I think if we just plow forward full speed and don't think about these things, we're going to put ourselves at risk for getting some of this stuff wrong.
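[Editor's illustration] The mismatch described above can be reproduced with a deliberately tiny toy. Nothing here reflects the actual study's data or methods: the "model" simply picks the single yes/no variable that best matches the outcome in its training data, and the variable names, counts, and helper functions are hypothetical. At the made-up training site the portable x-ray flag happens to track the diagnosis perfectly; at the made-up external site it does not.

```python
def fit_best_single_feature(rows):
    """Toy 'training': pick the binary feature that most often equals the label."""
    features = [name for name in rows[0] if name != "label"]
    return max(features, key=lambda f: sum(r[f] == r["label"] for r in rows))

def accuracy(feature, rows):
    """Accuracy of predicting the label directly from one binary feature."""
    return sum(r[feature] == r["label"] for r in rows) / len(rows)

def make_rows(n, portable_xray, imaging_finding, label):
    return [
        {"portable_xray": portable_xray, "imaging_finding": imaging_finding, "label": label}
        for _ in range(n)
    ]

# Hypothetical training site: sick patients are always imaged with the portable machine.
internal = (
    make_rows(45, 1, 1, 1) + make_rows(5, 1, 0, 1)
    + make_rows(45, 0, 0, 0) + make_rows(5, 0, 1, 0)
)
internal_train, internal_test = internal[::2], internal[1::2]

# Hypothetical external site: portable machine use is unrelated to the diagnosis.
external = (
    make_rows(25, 1, 1, 1) + make_rows(25, 0, 1, 1)
    + make_rows(25, 1, 0, 0) + make_rows(25, 0, 0, 0)
)

chosen = fit_best_single_feature(internal_train)
print("model keys on:", chosen)
print("internal held-out accuracy:", accuracy(chosen, internal_test))
print("external accuracy:", accuracy(chosen, external))
```

The held-out data from the same site cannot expose the problem, because it shares the same quirk as the training data. Only data from a different setting shows that the model learned the equipment rather than the disease.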

Dr. Chris Tucker:

So just to really dig into the weeds of stats here, because this is your area of expertise, you mentioned it earlier on in our conversation, the concept of Shapley values. I doubt anybody who hasn't read about this topic knows what that is. Can you just explain for us what the concept of a Shapley value is and how they're used and applied?

Mark Cote:

Sure. So Shapley values come from cooperative game theory. So the idea is that you're trying to figure out the relative contribution of, say, in this case, each of your variables. So if we go back to what I was talking about earlier with that Dr. Doshi-Velez and that idea of local explanation, the reason we may have these AI technologies is that it's hard. If we were able to hold all that information in our brains and make these decisions, we wouldn't need this technology or this approach to help us with making predictions or understanding what's happening. One of the things you can do is local explanation. So if you have a patient and there's a given prediction for that patient, what a Shapley value is doing is it's showing you what variables... So let's back up a second.

Say it says there's some risk for infection or there's some particular treatment you should do. What's important there is to understand we can't understand the whole model. We kind of can, but generally speaking, it's going to be really hard to understand all of the statistical mechanics that are happening under the hood. But what we can do is say, "For this given patient, you predicted X, and why was it that X came out of this model?" So what a Shapley value will give you is a list of the data that was incorporated into your model, and then for that particular patient, how much weighting or how much contribution from each of those variables went into this decision. So you may look at one decision or this particular prediction and see that a large portion of this prediction was based off the patient's age.

Then maybe a little bit was based off their BMI and maybe a little bit was based off say, their sex, maybe a little bit was based off their preoperative score. But the biggest thing the model was considering when it made this particular prediction was age. This is where I think the two things merge. As a clinician, you can maybe recognize that yes, I get that maybe age is something that is predictive of having a bad outcome. But I'm the clinician and I understand that even though this person has maybe an age that's indicative of being very old, they're not an old patient. So in that regard, when you're seeing that local explanation, which is what the Shapley value provides you, it can help guide you as to say that I understand why the model is making this recommendation.

The model is a model, and it's less than perfect, but I can now evaluate the fact that because it puts so much emphasis on age, that's not something I'm putting emphasis on because I have the clinical expertise and the experience to understand that in this particular case, I don't think just the raw age of this person is really reflective of what the metric age is typically intended to reflect in a model which is, right, just some type of degradation over time as we get older. So that's really what that Shapley value is giving you. Again, they can be very valuable because you can see... maybe you can't understand exactly all the intricacies of a model, but you can at least understand what exactly is it valuing when it's making a prediction. I think that's where that intersection between using AI as a tool to assist in clinical decision-making, not replace it, really matters because it nicely merges with the clinician's own experience plus what the model's saying.
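[Editor's illustration] The per-patient breakdown described above can be computed exactly for a small toy model. This is a sketch under loud assumptions: the risk formula, its coefficients, the patient, and the "typical" baseline patient are all invented, and standing in a baseline value for a variable that has not yet been "revealed" is only one simplifying choice. Real tools generally approximate these values rather than enumerating every ordering, which becomes infeasible with many variables.

```python
from itertools import permutations

def shapley_values(predict, patient, baseline):
    """Exact Shapley values for one patient.

    Averages each variable's marginal contribution over every possible
    order in which the patient's values replace the baseline values.
    """
    names = list(patient)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start from the baseline patient
        previous = predict(current)
        for name in order:
            current[name] = patient[name]  # reveal one variable at a time
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_risk(x):
    """Hypothetical risk score, not a real clinical model."""
    score = 0.01 * x["age"] + 0.005 * x["bmi"] - 0.002 * x["preop_score"]
    if x["age"] > 70 and x["bmi"] > 30:    # made-up interaction between age and BMI
        score += 0.1
    return score

patient = {"age": 78, "bmi": 32, "preop_score": 40}
baseline = {"age": 55, "bmi": 27, "preop_score": 50}

phi = shapley_values(toy_risk, patient, baseline)
for name, value in sorted(phi.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: {value:+.3f}")
```

The contributions add up to the gap between this patient's prediction and the baseline prediction, and here age carries most of it. That is the cue for the clinician to ask whether chronological age is really telling the right story for the person in front of them.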

Dr. Chris Tucker:

Sure. I always enjoy when the Editorial Commentaries have catchy titles as yours did when you mentioned venturing into the valley of despair, where you were referencing the classic dip in the Dunning-Kruger effect graphical depiction where increased competence correlates to this rapid drop in confidence when somebody realizes just how much they actually don't know about a topic. Can you just tell us about the Dunning-Kruger effect and why you feel like in terms of AI, we as a collective profession are just about to summit Mount Stupid and inevitably descend into this valley of despair?

Mark Cote:

Yeah, thanks. I have to credit Ian Wellington on that one. He co-authored it, and in discussing these issues, he was able to quickly relate it to the infamous Dunning-Kruger effect. So the Dunning-Kruger curve basically plots confidence against knowledge. So the idea is that when you're learning about a new field or subject, a little bit of knowledge can give you a lot of confidence, but that also can be a problem. When you lack knowledge, your confidence is high. So this is what's considered Mount Stupid. The example that's always come to mind for me, for whatever reason, is poker. I'm not a gambler, but I think poker's fascinating, and I think it fits the Dunning-Kruger effect nicely. So I can imagine when you start playing poker, maybe you start learning a little bit to improve your game.

So you learn a little bit about odds and probability, and then you successfully use this information to your advantage when you're casually playing with your friends in some type of friendly poker game. So your confidence is high, yet there's really a lot more to know. Then soon this little bit of knowledge may become a problem, especially when the poker game involves more experienced players, let's say, and then suddenly your losses start stacking up. So the idea is now you're in the valley of despair, you've realized that you actually have limited knowledge. You moved into this arena, you learned some things that gave you a lot of confidence, but now you realize there's so much more to know. Not only that, you're not really succeeding, so you have really no confidence. In that context, I guess in this example, you probably also don't have a lot of money left.

But moving on, if you devote the time to study the mechanics of poker and all the things associated with it like game theory and start to understand patterns of betting and a variety of other information, 'cause there's a lot out there that you need to consider, you slowly but progressively gain knowledge. As you gain that knowledge, you gain confidence. This continues until you hit what they call the peak of enlightenment where now not only do you have a high amount of knowledge, you also have a high amount of confidence. So I thought this was like an apt description of where we may collectively be when it comes to AI in the realm of orthopedic research.

That editorial itself was a commentary on actually Prem's paper on meaningless applications and misguided uses of AI in orthopedic-related research. To me, it was clear that there's some good papers out there, but there's also a lot of bad ones. So it seems like confidence is high, especially with the Shiny apps, the catchy manuscript titles, and oftentimes, the very definitive conclusions. So I don't know if we're at the peak of Mount Stupid, but per the available literature, it doesn't seem like we fully understand machine learning and AI-related research, nor would I say that we really have a clear goal or mission for integrating AI into practice. Rather, back then when I looked at it, I think what we've figured out is how to publish papers on these approaches, and I'm not really sure how that moves us forward.

Dr. Chris Tucker:

What do you think is the end game for AI? What do you see as the ideal future course for its use in the greater field of orthopedics?

Mark Cote:

Well, end game sounds very ominous.

Dr. Chris Tucker:

What do you think is our beacon that we should be heading toward?

Mark Cote:

No, it's a very good question. So I've spent a lot of time pointing out many concerns, but I am optimistic. I want to circle back a little bit to that Doshi-Velez talk, Dr. Doshi-Velez's talk that I mentioned earlier. I think she actually makes a really great point about AI systems towards the end of her talk, and it went something along the lines of to not use our tools to synthesize and manage all of the medical information really is to accept all the current harms due to our human inabilities. What I like about that statement is it speaks to the potential value of AI in orthopedics. What I like about her talk in general is it struck a balance between using AI systems but also holding them accountable. So I would say as far as an ideal course, I agree with a lot of what Prem said on the last podcast, that there are some really useful applications for AI right now. Many of them would be to unburden the administrative side of providing care.

I also think AI applications to improve the patient experience as opposed to directing their care would also be valuable. We don't spend a lot of time on that, but I think that field is well suited to address that. I think it's also worth trying to understand aspects of machine learning and AI that we may not have previously considered, like I was trying to explain before in that example where we were looking at machine learning models for overnight stay, but really, what we were trying to do is see whether a globally-performing model that does very well or performs very well, excuse me, would perform equitably across race and ethnicity.

I think those are important points before plowing forward. I also think we would benefit greatly from working with data scientists and other experts in this realm to assist in developing some of these models and some of these technologies. There's a whole field out there that exists, and I think to try to do it in isolation is a mistake. As much as I sound like I'm putting down all that's come out, I'm actually excited about it and see the potential. But I think to realize the potential and to move that potential forward, we're going to have to engage with people that have been educated and trained to do this for a living.
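
The equity check mentioned above, asking whether a model that performs well overall also performs equally well in each subgroup, can be sketched as follows. This is a minimal illustration, not the analysis from the study discussed: the group labels, outcomes, and predictions are synthetic placeholders made up for the example, and accuracy stands in for whatever performance metric a real evaluation would use.

```python
from collections import defaultdict


def overall_accuracy(records):
    """Fraction of correct predictions across all records."""
    records = list(records)
    return sum(int(y == p) for _, y, p in records) / len(records)


def accuracy_by_group(records):
    """Fraction of correct predictions within each subgroup.

    records: iterable of (group, true_label, predicted_label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}


# Synthetic placeholder data: 1 = overnight stay, 0 = same-day discharge.
# "group_a" and "group_b" are hypothetical subgroup labels.
RECORDS = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 0, 1),
]

if __name__ == "__main__":
    print("overall:", overall_accuracy(RECORDS))
    for group, acc in sorted(accuracy_by_group(RECORDS).items()):
        print(group, acc)
```

The point of splitting the metric this way is that a single overall number can look acceptable while hiding a noticeably weaker result in a smaller subgroup, which is the concern raised above.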

Dr. Chris Tucker:

Yeah, I think all great points. As always, Mark, you've got a knack for explaining these fairly challenging topics in a nice, relatable way. I wanted to thank you for sharing your time and your thoughts with us today. Did you have any other closing remarks before we wrap up?

Mark Cote:

I didn't think I did, but I wanted more guidance. So I trained a model to figure out whether I had more closing remarks, but I knew enough to know that this model I had was just a tool that would assist me. This model is saying it agrees with my own perspective right now, which is I think we've covered as much as we can cover in the podcast. So I have no other closing remarks, but thank you for having me. I'm always happy to talk about this stuff. By no means am I an expert. These are just perspectives from someone looking at the research. But I think it's a great topic. I think it's important that we continue to talk about it 'cause if we don't continue to talk about it, I think we're all going to get confused. Machine learning and AI-driven technologies are coming, and I think that's great. But I think we have the responsibility to think about how we're going to implement these things and the benefits and the risks associated with them.

Dr. Chris Tucker:

Well, maybe one day your algorithm will just talk to my algorithm and we can both sit back and listen to a conversation that happens without either of us even here.

Mark Cote:

I agree.

Dr. Chris Tucker:

Well, thanks again, Mark.

Mark Cote's editorial commentary titled “Machine Learning and Orthopedics: Venturing Into the Valley of Despair” is available in the September 2022 issue of The Arthroscopy Journal, which is available online at www.arthroscopyjournal.org.

This concludes this edition of the Arthroscopy Journal Podcast. The views expressed in this podcast do not necessarily represent the views of the Arthroscopy Association or the Arthroscopy Journal.

Thank you for listening. Please join us again next time.

Medical Disclaimer:

The information and opinions discussed herein, including but not limited to text, graphics, images, and other material contained in this podcast and its referenced paper are for informational and educational purposes only. No material in this podcast or its referenced paper is intended to be a substitute for professional medical advice, diagnosis or treatment. Specifically, all content and information in this podcast and its referenced paper does not constitute medical advice. Always seek the advice of your physician and/or other qualified health care provider with any questions you may have regarding a medical condition or treatment and before undertaking a new health care regimen, and never disregard professional medical advice or delay in seeking it because of something you were exposed to from this podcast or its referenced paper. The information discussed in this podcast and its referenced paper may not apply to every individual and may cause harm.