The Screen Lawyer Podcast

New York Times v. Artificial Intelligence #203

March 13, 2024 | Season 2, Episode 3
Pete Salsich III
In this episode of the Screen Lawyer Podcast, Pete dives into the intricate world of copyright lawsuits surrounding the training of large language models. These lawsuits challenge the boundaries of fair use defenses and introduce new legal arguments, promising to reshape the landscape of AI copyright law. 

Original Theme Song composed by Brent Johnson of Coolfire Studios.
Podcast sponsored by Capes Sokol.

Learn more about THE SCREEN LAWYER™ at TheScreenLawyer.com.

Follow THE SCREEN LAWYER™ on social media:

Facebook: https://www.facebook.com/TheScreenLawyer
YouTube: https://www.youtube.com/@TheScreenLawyer
Twitter: https://twitter.com/TheScreenLawyer
Instagram: https://instagram.com/TheScreenLawyer

The Screen Lawyer’s hair by Shelby Rippy, Idle Hands Grooming Company.

Transcript

On this episode of the Screen Lawyer Podcast, we're going to spend some time revisiting the copyright infringement lawsuits filed last year challenging the large language model training systems used by ChatGPT, Stability AI and others. One of them, filed right at the very end of the year by The New York Times, raises some new issues and interesting legal arguments that may undermine the fair use defenses that have largely been asserted in those other cases. Hope you'll join us as we dig into this interesting case.

Hey there. Welcome to the Screen Lawyer Podcast. I'm Pete Salsich, The Screen Lawyer. On today's episode, I'm going to spend a little time revisiting some of the cases filed last year challenging the large language model training process used by Microsoft-backed OpenAI's ChatGPT, as well as Stability AI and others. Most of these cases were essentially based on copyright infringement claims, and a lot of the discussion so far, including here on the podcast, has been about the likelihood that they will hinge on a fair use argument: the AI providers will argue that their use of copyright-protected materials for training is a transformative use, does not generate infringing works, and is therefore a fair use under the law and not infringement. Those cases are very early in the process, but you're beginning to see some of those legal arguments get made, and I actually do think the fair use argument has some compelling components to it based on existing precedent.

But on today's episode, I want to focus on the lawsuit filed right before the end of the year by The New York Times, again challenging the large language model used by OpenAI for ChatGPT. The Times certainly argues that copyright infringement is part of the problem, that the AI systems are accessing New York Times databases, New York Times articles, and so on. But it is also making some arguments that are a little bit new, and that I think may undercut the fair use defense. Let's break it down.

First, let's jump back to some of the previous cases. We talked last year about the Sarah Silverman lawsuit, in which she and a number of other authors claimed that their works could be easily replicated simply by asking ChatGPT to produce a summary, and there's no way that summary could have been produced without accessing the original book. That was one of the first arguments filed. Getty Images filed a lawsuit against Stability AI over Stable Diffusion's use of its images, and that's proceeding. There have been others filed along these lines, and most of them focused on the use of broad databases. They were typically class action lawsuits on behalf of many, many authors considered together, but it's very hard to prove that any individual set of content was actually accessed. And the fair use defense is not an easy argument to predict one way or the other. We learned that last year with the Supreme Court's ruling in the Andy Warhol case, which came out differently than I thought it would.
That case reminds us that with fair use, it's often largely unknown how it's going to turn out. But here The New York Times is raising some different arguments about its own database, or really its own data set, a better way to describe it, and it's the Times's investment in that data set as something distinct that makes this a different kind of argument.

First of all, the complaint is built on the fact that ChatGPT can produce essentially replicas, almost verbatim responses, of what would be New York Times articles, and it will often attribute those results to the Times. You go into ChatGPT and enter a prompt based on a certain search term, the same term that in Google would have pulled up an article from the New York Times database. The New York Times has spent many millions of dollars investing in its online product. We can argue another day about whether paywalls are good or bad for journalism, but they are a fact. Many large news agencies have invested great sums of money in their online presence; some content is available for free, but a lot of it sits behind the paywall. That is how they make their money, that is where they get subscribers, and that is their revenue source.

So the complaint focuses on the value of the data set; The New York Times refers to its content as the data set in question. This is not a class action case; this is just The New York Times bringing its own case. Is it going to be followed by other news agencies? Absolutely, because the underlying issues will be similar. But the argument rests on two things: the investment of millions of dollars, and the investment of years in building its reputation as a trustworthy source. Leaving aside politics, whether you agree or disagree with one newspaper or another, The New York Times has built up, I don't think there's any doubt, a significant reputation for the trustworthiness of its news. Those two factors, the investment of time and money and the trustworthiness of the New York Times name, are at the core of this case. They are not copyright concepts, but they will perhaps play into the fair use discussion.

Two things are happening here. One: you go to ChatGPT and put in a prompt that produces a result mirroring, almost verbatim, The New York Times result. If that's true, the argument is that simply by using The New York Times data set to produce that information, ChatGPT is depriving the Times of revenue it would otherwise have received. You can probably see I'm hinting toward a commercial argument here, which is where this is going. The other thing is that these tools will often produce hallucinations. It's interesting that the term has worked its way into the AI discussion: you put in a prompt and the model produces something that looks like, and pretends to be, a series of statements of fact, something you think you can trust, and yet it's completely made up. In the very early days last summer, we heard about lawyers filing response briefs citing cases they pulled out of ChatGPT, and it turned out those cases were not real. So the concept of hallucinations being produced by AI has been present from the start. And you'll hear AI defenders say, well, that's going to get better.
And it probably will. If you spend a little time and keep refining your prompts, the results get better and better. That's certainly true. But in the meantime, this training is not happening in the lab; it's happening in real time, out in public, and having a real impact on how people access information. So when these hallucinations are presented, particularly where they purport to be a set of facts "according to The New York Times," and those facts are not true, the argument is that this harms the Times's reputation as a trustworthy news source, because people don't know that isn't actually what the New York Times article said. And if I think, well, now I don't have to pay for the New York Times article, I can just take what ChatGPT told me about it, then not only am I not certain I'm getting the accuracy I expect from the Times, but I also don't have to pay The New York Times what I otherwise would. Those two facts impact The New York Times in a way that makes for a different sort of harm argument than a typical copyright infringement claim.

Then you get into the discussion of how much in damages should be awarded and how the copyright owner has suffered economically. But you don't have to show economic harm to show infringement; it's either infringement or it's not. And if it's not infringement because of a fair use defense, then it doesn't matter whether there was economic harm; it's simply not relevant. If it's fair use, it's fair use. So for the most part you don't really have this economic theory behind typical copyright infringement cases. Here, though, the Times is leading with these two issues, and I think they are, more than anything else, ways to make the fair use argument harder to win.

There's a phrase lawyers use, one that appears in many walks of life: it's better to ask forgiveness than permission. It's usually spoken offhandedly: I'll just go ahead and do it and apologize to the boss later, because if I ask permission now, they'll say no, and I really think it should happen. That concept gets used somewhat regularly in the fair use analysis. An infringer decides: I know I'm going to use their stuff; if I ask, they're probably going to say no or demand a bunch of money; frankly, I think it's fair use, and I can make that argument later. If they catch me or sue me, I'll work out some sort of license agreement. In other words, I'll deal with it later. Well, if The New York Times's arguments are successful, that may become not such good advice, because the economics may push us toward licensing the data set upfront as far better than asking forgiveness later. And here's why: The New York Times's argument digs into the commercial aspects of the harm, but it's more than just the harm. They really try to frame ChatGPT, and AI in general, as a competitor.
And that's where I think this gets interesting, because it frames this as two competitors doing the same thing. The New York Times is a provider of news: factual information, research, stories, features, all sorts of things. Certainly one of the things it does is provide articles it has written and published in response to requests for information of a certain type. Now, if you ask ChatGPT the same question you would ask The New York Times, for example through its search function, and you get something that looks like the answer to your question, "according to The New York Times, this happened, these are the places, here's the best of this," or whatever it might be, then arguably those are exactly the same thing. That's what makes the fair use analysis so interesting here.

The Supreme Court case last summer involving the Andy Warhol and Prince photographs surprised me, and I think surprised a lot of people, by rejecting fair use for the use of those photographs. Leading into that case, I had thought the prior case law made it pretty clear there was enough transformation between the two different uses. The images produced by Andy Warhol were, no question, based on an original photograph of Prince taken many years earlier, but those images were really very different; they had transformed the original visually, in the medium, and in the way they expressed Prince's place and time. A fair number of us thought that under the existing Supreme Court fair use precedent, that would be found to be fair use. But the Supreme Court instead focused on the fact that, when you boil it right down, these were two competing versions of the exact same thing: the use of an image to sell magazines. The original photograph was commissioned to sell a particular magazine issue back when it was taken. The use of the Andy Warhol image that was sued on came right after Prince died, to sell a retrospective magazine issue on Prince's life. The Court focused on the fact that these were the same thing, and both were commercial uses, not artistic uses as such. Certainly there was art in them, but the basic use was commercial. Commercial versus commercial, and in that context, there's no fair use. When you boil it down like that, it's fairly simple to understand.

Whenever the Supreme Court speaks on an issue, it tends to give lawyers new phraseology, new sentences we need to have in our arguments. If you can say commercial use versus commercial use, and you can really boil it down to the same use in both cases, it becomes very hard to argue that the later use is fair use, because you're not transforming it. The issue is not so much the training process itself; it's that the training process is used to generate a competing product. That's the other question underlying all of this, and I don't think you can pick a horse in this argument yet, because it hasn't really been addressed. We're not necessarily putting this particular New York Times article next to this particular ChatGPT output and comparing the two, because that's not typically how it happens.
But there doesn't appear to be any doubt that the New York Times data set has been accessed when ChatGPT's results say things like "according to The New York Times." There's no question. And while a lot of New York Times information is publicly available, you still can't use it to create a competing product.

There are fair use news aggregator sites that have existed for a long time: sources you go to that collect links to articles from different news outlets. In that situation, while a reproduction of the original news source is being made, it's typically just a thumbnail, and it includes a link to the original. You don't get the whole original; you get sent to the original, and if it's behind a paywall, you run into that paywall. That model has been around for a long time. This is different. This is an entirely new free service that, the argument goes, purports to replace The New York Times, and perhaps others who have invested similarly to build data sets based on their reputations for trustworthiness in the news, something that's extraordinarily important these days, and simply replaces them. So the argument the Times is really making is that these are two competing versions of the same thing, and frankly, I think that's a fairly compelling way to use the Supreme Court's recent fair use case as a sword. And again, that's what lawyers do: you have to figure out how to craft your version of the case, your description of the facts, in such a way that the court recognizes something it has already ruled upon and says, yes, this is an easy case. I think that's what's happening here, and it's going to be really interesting to see how it plays out.

The other claim, thrown in toward the end of the complaint, is reputational harm: that this is actually harming the Times's reputation. Now, reputational damages are not a measure of copyright damages, nor is reputational harm one of the four factors considered in fair use. But by referencing The New York Times and then putting something out that turns out to be false or made up, you're getting into more of a trademark or Lanham Act injury: purporting to say something that we said, when it turns out to be false. So the case reaches beyond a pure copyright fair use dispute and brings in these other areas of intellectual property. The trademark world has its own version of fair use defenses, but they typically require you to be truthful about the other trademark you're referencing and to make it very clear that you're not creating any association. If ChatGPT produces something that says "according to The New York Times, the following," you might think it can somehow speak on behalf of the Times, that The New York Times must have said that's okay. Perhaps more importantly, it's essentially saying: you can trust what I'm telling you because The New York Times said it. That's a way of saying you can trust my product because I'm associating myself with a trustworthy brand, and that's a classic Lanham Act violation. So I think this case presents some new arguments. Again, these are very early days; there will be arguments filed both ways, and we'll have to follow it. But it's an interesting development in a category of law we're going to be watching closely.
So this is going to be interesting to follow, and we will continue to do so as things develop. Stick around with The Screen Lawyer Podcast; there's more coming this year in Season 2, with lots of guests lined up. We produce new episodes every other Wednesday. If you're watching on our YouTube channel, be sure to hit the like and subscribe buttons down below so you'll get all of our content regularly. If you're listening to the audio version, you can find us wherever you get your podcasts, and we hope you'll follow us there. For anything else, check us out at TheScreenLawyer.com. Take care.