The Screen Lawyer Podcast

Sarah Silverman v. AI Lawsuit Pt. 1 #108

July 19, 2023 Pete Salsich III Season 1 Episode 8

In this week’s episode of The Screen Lawyer Podcast, host Pete Salsich III dives deep into the latest legal controversy surrounding OpenAI and its affiliates.

Comedian and author Sarah Silverman and other authors recently filed class-action lawsuits against OpenAI, alleging copyright infringement of their books. The lawsuits center on the highly detailed summaries of their works produced by ChatGPT, which the plaintiffs claim demonstrate OpenAI's unauthorized "ingestion," or copying, of their books.

Get ready for an engaging conversation examining these lawsuits and navigating the intricate web of legal matters involved. 

Original Theme Song composed by Brent Johnson of Coolfire Studios.
Podcast sponsored by Capes Sokol.

Learn more about THE SCREEN LAWYER™ at TheScreenLawyer.com.

Follow THE SCREEN LAWYER™ on social media:

Facebook: https://www.facebook.com/TheScreenLawyer
YouTube: https://www.youtube.com/@TheScreenLawyer
Twitter: https://twitter.com/TheScreenLawyer
Instagram: https://instagram.com/TheScreenLawyer

The Screen Lawyer’s hair by Shelby Rippy, Idle Hands Grooming Company.

Hey there. Welcome to The Screen Lawyer Podcast. On this week's episode, we're going to spend some time with some recent lawsuits filed by famous authors against OpenAI, based on the allegation that ChatGPT's large language model learning system infringes great portions of their copyrighted works. There are going to be fair use defenses in this case as well, and we'll dig into those, too. So stick around. Hi there. Welcome to The Screen Lawyer Podcast. I'm Pete Salsich, The Screen Lawyer. Today I want to spend a little time digging into some AI cases that were recently filed. We've spent a lot of time talking about AI; in fact, everybody has, right? It's pretty much everywhere you go, and if you're a lawyer like me, you're constantly seeing legal issues pop up, intellectual property issues. We've talked a little bit about some of those that come up in the copyright ownership context. For example, if there's too much generative-AI-produced content in a particular work, there may not be enough human authorship to get a copyright registration for that work. That's already getting tested and screened. The Writers Guild and others are trying to come up with formulas for measuring the appropriate amount of human authorship alongside the use of generative AI, to make sure there's still copyright protection. That will get worked out. But I want to spend a little time today on another part of this, and that's the learning side. Right? Not so much the output side, although there is an output component we're going to talk about today. It's going to be framed by two lawsuits that were filed within the last ten days or so. Interestingly enough, both were filed by the same set of lawyers in the same jurisdiction, federal district court for the Northern District of California, and both against the same defendants: OpenAI and a variety of OpenAI subsidiaries or affiliated companies.
Essentially, all of the owners, operators, and users of the ChatGPT generative AI product. And the allegations are these. The first suit was brought June 28th by Paul Tremblay and Mona Awad, two published authors, as a class action alleging that their works were infringed and that the same type of infringement is going on for many, many authors. So they're seeking class certification. A class action attorney can dig into whether they have the right numerosity and similarity and things like that, but I suspect the allegations do support class certification. Interestingly enough, about ten days later, last Friday, July 7th, the same lawyers filed a lawsuit on behalf of Sarah Silverman, the famous comedian and author, along with Christopher Golden and Richard Kadrey, two other published authors, the three of them making essentially identical allegations and also seeking class certification. Not sure what that was about; maybe we'll find out. But it's essentially one set of allegations that are the same, and that's what we want to spend some time with. The allegation is really this: each of these authors has copyrighted books that they make available for purchase. They're on Amazon, they're elsewhere. And the allegation is that the large language model training system that OpenAI uses has had access to these works. I don't think there's any real dispute about that. But their point is that what it's doing is actually training the AI to make what are essentially derivative copies, which would be infringement.
My thinking about this so far has centered on the act of scraping the Internet for copyrighted works. We talked about it in terms of graphics originally, when the AI was going out and looking at all sorts of different paintings and images, many of course in the public domain because they're long past any copyright protection, but also other, newer works that are still protected, generally looking at all these different types to learn brush-stroke styling and color composition, so that when you give a prompt to the AI, it can produce something in the same style as a famous artist. I think it's a little bit different in the text world, and that's what we have going on here. They're going out and accessing huge libraries of books, some of which are publicly available, and certainly many contain public domain works. But the lawsuit alleges that the number of books in the datasets OpenAI says it's using is far too large to consist of legally available works, and that, in fact, OpenAI most likely accessed certain sites, torrent sites and so-called shadow libraries, which contain lots of infringing materials, to get these books. Hundreds of thousands of copyright-protected works. And to test it, the plaintiffs asked ChatGPT to create summaries of their particular published works, and the summaries are remarkably similar to the originals. They're not perfect, of course, but they're very, very similar and detailed summaries, and some are attached as exhibits to the lawsuits. When you look at them, it's interesting. You're probably familiar with CliffsNotes, where you could get a summary of a novel for a class. But those summaries are very, very simplified recitations of the main ideas or the main characters.
They didn't reproduce much of the actual content at all, they hover near a line, and they are generally considered fair use. In that sense, they're really meant to be a companion to reading the original. Now, these summaries aren't necessarily marketed by OpenAI as "come get ChatGPT for summaries." But the fact is, when you ask ChatGPT for a summary, it can do so, and that output itself is likely a derivative work. And of course, as we know, if you're not the copyright owner, or don't have a license or permission from the copyright owner, you can't create a derivative work. That's infringement. And I think the reality is, yes, it is infringement, as far as it goes. We'll talk about the fair use defense in a minute. But what's interesting to me in these cases is that they're an example of the legal system, the lawyers involved, and the creators working together to figure out how to test these issues in the courts. As we've talked about, and as you've probably seen in a lot of commentary, the courts typically follow behind the technology. That was certainly true of the Napster litigation that finally sorted out some of the issues with music file sharing. It took that litigation to figure things out, little by little. Meanwhile, other people and other businesses were anticipating where the litigation might go and positioning themselves for a different approach. iTunes, in a lot of ways, was born out of Napster, because Napster didn't teach us that people love to steal. It taught us that people love to acquire and share music song by song, digitally. Apple said, what if we charged 99 cents a song? And guess what? People said, sure, I didn't really want to steal; I just wanted to get my music this way. And iTunes was born.
So there are plenty of examples where a business model, a contract structure, or some other vehicle eventually solves the problems created by a new technology, the courts catch up, and people order their behavior for a period of time. Well, now we're in one of those new areas with AI. AI is rapidly changing everything. Recent guest Mitch Jackson on this podcast led us through so many amazing uses of these AI tools to help lawyers and others do their jobs better. It's not going away, and these are powerful tools. But in this generative process, in what these AIs are going to be able to do, we have to ask these questions about copyright ownership, particularly in this case, because I think that needs to get sorted out. And I believe these cases are going to end up testing the fair use defense on this very issue. Here's how that analysis would go. We start from the assumption that this is, in fact, copyright infringement, meaning there is a derivative work produced as output, that derivative work is a close enough copy of the original to qualify as a derivative by definition, and there was no permission. That's copyright infringement, open and shut. Those are the elements of a copyright infringement claim. Now, the defendant is likely to assert a fair use defense, arguing that while, technically, yes, this is infringement, it's still a permitted use under the fair use doctrine. The underpinning of that doctrine is really the First Amendment: we want to encourage expression in this country, political speech, satire, commentary. These are all things the fair use doctrine protects. There's a set of four factors you go through when you analyze a fair use defense, and no one factor is dispositive; it's not that if you win one factor, you win.
There's one factor that tends to drive a great deal of the analysis, and we'll spend some time with it, but you still have to go through all the factors together. And we have some recent guidance, because just this term the US Supreme Court decided the Andy Warhol fair use case, involving Warhol's Prince paintings and the underlying photograph, in a way that frankly surprised a lot of us, finding that it was not fair use for Andy Warhol to take an original photograph of Prince and paint it differently. The opinion says it's narrowly limited to the facts of the case, but it's going to be the case we use as our framework for making the next set of arguments. That's how we do it. So using that as an underpinning, let's go through the factors. The first is the nature of the original work: is it an expressive, creative work, or just a list of facts or something like that? Here, all of the works the plaintiffs allege as the basis of the claim are published books, which are absolutely going to get a high degree of copyright protection. They are novels, fiction, biographies, memoirs, and nonfiction works as well, all highly protected by copyright. So that factor cuts in the plaintiffs' favor: you're going to have to show more in order to get fair use. The second factor is the nature and purpose of the new use, and this is where it's going to be interesting. What is the new use? I think what the plaintiffs are saying is that the new use in these particular cases is the summaries, the outputs that were generated. Now, OpenAI is not in the business of selling summaries. It doesn't advertise "come here to get summaries of all your favorite books so you don't have to buy the original." It doesn't sell that product.
Nevertheless, its product creates these summaries on prompt, and if these summaries are derivatives, then they are infringing works. So that's one of the issues here. If the summary itself is a derivative work, we have to see whether that derivative work is nevertheless transformative in some way, used for a different purpose. This is where I think it's going to get interesting, because until the Andy Warhol case was decided, we were all operating under the case arising out of 2 Live Crew's "Pretty Woman" years ago, Campbell v. Acuff-Rose Music. In that case, the court talked about transformative uses: whether the new work does something different, speaks differently, portrays something different. There, cultural references were at play. The original was a sad song of lost love, and the new one was really speaking about the objectification of women, among other things. It changed enough of the lyrics to make that new point, and it really made it through on parody grounds. That was our framework for thinking about what a transformative use is. So with that as the background, if you look at the derivative works here, and say, okay, the infringing works are these summaries, then what is the purpose of a summary? If it's detailed enough, arguably the summary is serving the same purpose as the book: it's telling you what the book was about. If you read the book, you hear it from beginning to middle to end; you follow a character arc, a storyline, whatever the subject of the book is. These summaries are detailed enough that they could arguably be a replacement for that. They're doing the same thing. The summaries themselves aren't going toward some other purpose; they don't exist for some other purpose. They exist as freestanding works. Under the Supreme Court's recent analysis in the Andy Warhol case, I think that's dead on.
Not fair use, not transformative. In Warhol, the Court said essentially that in both instances the image was used to promote a magazine article, and therefore the uses had the same purpose and were not transformative, even though the images looked different and created different impressions. The reality was that using an image to sell a magazine article was the same use either way, and so the Supreme Court said that's not fair use. Well, apply that analysis here: a summary, the derivative work, is a recitation of what the story is about, and the book itself is a recitation of what the story is about. I don't think that's transformative under the Supreme Court's most recent ruling. But let's assume maybe there's an argument that it is, and we'll come back to this in a second. The third factor is: did you only use as much of the original as was necessary to make your point? In a parody, you have to use a fair amount of the original work so that it's very clear what you are parodying; that's an important part of it. But you don't use all of it. You use as much as necessary to make the parody point. In other circumstances, you may use only a very little bit. Take, for example, a movie review. A movie review is absolutely fair use: the reviewer talks about the quality of the movie, the storyline, the acting, the directing, whether it's good, bad, or otherwise, and then shows clips from the movie to make the reviewer's point. In those circumstances, you're not showing the entire movie; you're showing snippets here and there to make the point you want to make about the acting or the filming or whatever. Those are absolutely uses of copyrighted material without consent, but you're only using as much of the original movie as necessary to make your point about the clips. So in a movie review, that factor weighs in favor of fair use.
Well, here, if they've got a summary that essentially tells you how the book starts, what happens in the middle, who the characters are, and what happens at the end, you're using the whole book, and it's not a parody. They're not commenting on the original; they're not doing something for a different purpose. I think you're using way more than is necessary to make your point, or maybe using exactly as much as necessary to simply create an infringing work. So I think that factor doesn't weigh in favor of fair use either. And then the last factor is: what is the impact of the new work on the financial market for the original? Is there a market for these original works? Yes. All of these are books; they're available for purchase. That's the point. And conceivably, instead of thinking, oh, I've always meant to read that latest Sandman Slim novel by Richard Kadrey, I can just say, "Hey, ChatGPT, summarize the most recent Sandman Slim novel by Richard Kadrey," and it tells me everything that happens. Maybe then I'm not going to read it. Or maybe the summary isn't enough. The point is, we don't know. But it could absolutely have an impact on the market for the original and be a replacement, and that weighs heavily against fair use. So as you can see, if we take the summaries that were prompted as output as the infringing works, I don't think it's fair use. I don't think you have a fair use defense that says these summaries are somehow permissible, because they used the entire work, they could have a negative impact on the market for the original, they're used for the same purpose, and the original has high copyright protection. That's not fair use. But I don't think it's that simple, and here's where I think it's going to get really interesting.
And frankly, there may be better commentators than me on this; I'm paying attention to much smarter people who are digging into these issues. Because what's really at issue, I think, is the process of the large language model's learning: how it goes in and scrapes, or, in the phrasing the complaint uses, "ingests" copyrighted material. In a way, that's a pretty good word, right? That's what these models are doing: consuming great amounts of copyright-protected materials along with great amounts of public domain materials. Both are true. And then they're learning and teaching themselves and getting better and better at writing and composing. That process, by itself, arguably changes the fair use analysis, because you don't really have output that is an infringing new work. You certainly have access to and use of copyright-protected materials without permission, but the training and learning alone isn't automatically producing infringing works. And in order for there to be infringement, you've got to have a second work. That's the nature of it: you have an original work, you have a second work, and the question is whether the second work infringes the first. If you're just talking about the process of training, it's hard to point to any particular second work to run the fair use analysis on. Which is what I think is very clever about these cases: the lawyers and their clients, working together, essentially created a bunch of arguably infringing works themselves by using the technology. It's really interesting. Is it contributory infringement? Is it inducement? I don't know. I don't think so. But it brushes up against these other areas of the law, because the plaintiff isn't supposed to be able to induce the defendant into doing something wrong and then sue them for doing it.
So simply typing "create a summary of my new book" into the prompt and then, if it does it, saying "aha, I've got you for infringement" seems like some version of digital or virtual entrapment. But these tools are being sold to us, or we are adopting them; whether you consider them sold or not, the whole point is that they're marketed as our ability to put ideas in and get back new content, generated by the AI, not by the human. Yes, of course we give it prompts, ideas, suggestions; we tell it to speak in the voice of such-and-such, or we say, "Hey AI, you are a world-renowned this, you have this level of knowledge, do the following," and it does. So we are putting prompts in, but I'm pretty sure the cases are going to hold that prompts themselves are not copyright-protected, so the human isn't really creating this; that's really already been tested. The output is coming from the AI. Just because the author said, "can you do a summary of my book?" doesn't mean the author did anything wrong, because the point is, anybody can do it. How many people out there have said, "oh, I don't know if I want to read Sarah Silverman's book, but I can ask ChatGPT to learn all about it, and it'll tell me"? So the reality is that the process OpenAI has built is enabling copyright infringement. And in other areas of law, under the DMCA and elsewhere, the argument that the purpose of a platform is to enable infringement is in itself an effective challenge; that goes all the way back to the Napster cases. The concept the DMCA grew out of was this notion of when a platform is neutral, in which case it gets safe harbor protection under the DMCA. So you can't sue YouTube for copyright infringement, because it's just neutral.
It allows people to put stuff up, and it has a notice-and-takedown procedure in place, so that if a copyright owner realizes, hey, my stuff's been copied up there on YouTube, the owner can issue a notice to YouTube and YouTube will take it down. That's what the DMCA does; that's how it evolved into that space. Well, I wonder if we're going to need a new version. Almost certainly we're going to need new legislation to help us sort through some of these things. But it's kind of the same analysis, right? You've got this platform, let's call it a platform, ChatGPT. And I know that's not the only one; in fact, Sarah Silverman's lawyers also sued Meta over its AI on the same set of facts. So it's not just about ChatGPT, but that's the most obvious one to talk through. The argument the plaintiffs are making is: hey, at least in this sense, by scraping all these copyrighted works, you are literally creating something that enables infringement on a wide scale. And the response might be: yeah, but we're just a tool; how people use it is up to them. So let's say I had seen Sarah Silverman's book and thought, oh my gosh, that's a great idea for a book of my own. I say, "Hey, ChatGPT, give me a summary of Sarah Silverman's book with as much detail as you can," and it produces this thing. Then I go in and say, well, change the characters to this, change this one from female to male, do a few things like that. And all of a sudden I think I've written my own original work, and I go out and publish it. Well, on those facts I'm very likely personally liable for copyright infringement, because I created a new work with this tool, and the tool copied the original. But in that situation, I'm the guilty party, not the tool. In these cases, they're suing the tool.
And I think that's what we just don't know yet: how this is going to play out, how those arguments are going to work their way through the courts. Typically the argument goes: look, you've got to sue the actor, not the tool. And there's a certain logic to that, because there could be many examples of ChatGPT using what it learned from Sarah Silverman's and many other people's books to come up with something new that isn't a summary. Maybe the process of seeing all these hundreds of thousands of works enabled it to put language together in a certain way, with a certain style or whatever. That's using a tool to create something new, and the creator of the new work is the potential defendant, or not, in a copyright case. But this is a case against the tool, against the tool makers. And I think that's going to be fundamental to how the challenge goes, because essentially the claim is that the AI went out into this large library of copyrighted works, possibly accessing significant libraries that are not legal and that exist for the purpose of providing access to these works while getting around copyright laws. Maybe that can be proven; maybe that's part of the issue, and it changes the behavior in some way. I'm not sure what the outcome looks like if the plaintiffs win the lawsuit. When we're lawyers, particularly in larger issues or cases like this one, one of the things you have to think about when making an argument to a court, particularly in an area where something is new and uncertain, where it's not an obvious go-this-way-or-go-that-way and the law doesn't exist yet, is that one side is arguing, "Your Honor, this case is just like A, B, and C, which we've decided many times, and those cases mean the sky is blue," and the other side argues:
"No, Your Honor, this is not like A, B, and C; this is D, E, and F, which has also been decided many times, and in those cases the sky is red." So we make these arguments; we try to argue it's a little bit more this way or a little bit more that way by referring back to things that already exist. But sometimes it's not that clear what past thing this is just like, and I think this may be one of those circumstances. In that circumstance, when you tell the court, "Your Honor, you ought to rule this way," one of the things you start making are policy arguments. What should the courts decide? How should intellectual property law be applied in light of this new technology and its impact on the public at large? On one side, we have this vast population of authors and creators of copyrighted works who have enjoyed copyright protection throughout the life of our country, because we constitutionally value creative works. That's a fundamental principle that I don't think is going anywhere. On the other side, we have this new technology that is enabling the world to do many great things in all sorts of ways, advancing productivity, and so on. Leaving aside arguments about whether that's good or bad, those are different debates. The reality is we have a technology that is being widely embraced and widely used, and I don't think there's any doubt that genuine good comes out of it. That's the competing interest. So we're faced with these two competing interests, and we argue to the court: "Your Honor, if you rule this way, then the following will happen," and what follows is either good or bad. The other side says, "Your Honor, if you rule that way, this is going to be horrible; you've got to do this instead." So with that as a framework, what do we argue? What do we say? For example, if the court rules that this is copyright infringement, then what has to happen?
Do the AIs have to erase the things they've learned? I don't think that's possible. I don't think that's the intent. And I don't think a court can reach that far, because the AIs, I think, could be doing everything they're doing in the learning process and make a decent fair use argument. It's just that the fair use argument, I think, falls apart for them when you start talking about actual output. So then the question could go one of these more fundamental ways: you can't sue a tool; you can't sue the makers of a tool for what people do with it. This comes up with gun laws and all sorts of other extraordinarily dangerous activities; there are certain laws about what kinds of risks can be created, but the fact that something is dangerous in and of itself, or can be misused, doesn't make it illegal. That's fairly well tested in our system. So it's going to be very interesting to make these arguments, because on the one hand, if you can sue the tool, then you have to figure out what the order looks like. What is the remedy? Is it payment to the authors every time a summary is spit out? How are we going to know? It's impossible to police. OpenAI doesn't know what I created with it; maybe it does, maybe Big Brother knows. But the point is, we're not tracking all those things down in the moment and finding individual infringements, because now you're talking about a bunch of people who may be infringers depending on how they're using it. So I think the defendants are going to argue that until there is a work put out into the world, there's nothing to sue over: "In this case, we didn't put that work into the world. We're not the right defendants. You have to wait, Sarah Silverman, until some other person, Pete Salsich, puts out a work that used our tool and infringed your work. And then it won't matter whether it used our tool or not; it's just a straightforward infringement analysis."
Like you always had, and the AI is no more than a typewriter, or whatever. I don't know whether that's going to be the outcome, but if it is, then that really tells all of these authors: you don't have any protection, because your works are getting infringed constantly, but how can you find the infringements, and how can you attack each one? Again, you're only going to know about it when someone puts something out into the world and publishes it. But the thing has already been created by the tool, and the individuals are already getting the benefit of this creation, which is an infringement, and yet there's no way for the author to find out about it. That's the conundrum we're in. This is fascinating. These cases were just filed; we haven't seen any answers yet, we haven't seen affirmative defenses. I suspect there will be motion practice, and it might go on a very long time just on class certification. But it does tee up the issues in a very, very interesting way, and not one I had thought about before: this idea that the summaries themselves can be infringing works. So I think it's a really fascinating situation. And these are far from the only lawsuits that I think are going to come. Lawsuits have been filed against OpenAI over other sets of issues related to the work it does; we can talk about those another time. But I just thought this copyright fair use question poses an interesting conundrum. I think this is one of those situations where we're going to have to unpack which set of Supreme Court precedents we use under the fair use analysis. Is the Andy Warhol case limited to its facts, as it purports to be, or does it give us a new way to talk about the fair use factors? I think both of those are probably somewhat true, but this will be a test. The biggest question in my mind is: what is the infringing work, and who wrote it? Who is the infringer?
In this case, I don't think that's an easy question, and I think it's really a challenge to the process, and that's going to take some time to figure out. This is also interesting: a lot of times, cases get filed, and early on we have a sense that before we ever get a definitive ruling on the issue, we're going to end up with a settlement. The parties work something out and settle the case, so you never get a judgment, you never get a published appellate opinion and then maybe a challenge to the Supreme Court; we never find out. But here, I'm not sure there can be a settlement. What would the settlement be, short of OpenAI acknowledging that it's not going to do this anymore and destroying what it's done? I just don't think there's a way to settle this case. There's not a dollar demand as such that can make somebody whole, and the whole point of the class action is to say there are many, many people having this issue. So some resolution is going to have to be reached. Usually those become fairly large class action settlements: certain behavior is changed, certain dollars are paid out to the class, and everybody who's a member gets a little something. You've probably gotten something in the mail somewhere along the line because you were part of a class action, because you bought a particular cell phone at a certain time and something was going on with it. We could end up like that. But I think long before we start talking about settlements, there's the behavior issue. There's the fundamental question: can you sue the tool for infringement when other people are occasionally making infringing works by using the tool? That question has been asked in some circumstances before.
So we have some guidance, but I really think the facts here are unique and going to be worth watching. So we're going to keep at this. Thanks for hanging around today; I hope you found this interesting. There is so much more in this space, and I welcome any comments you have. If you're listening wherever you get your podcasts, please follow and join us. You can check us out on TheScreenLawyer.com; we publish episodes every other Wednesday, so they're always available to you. And if you're watching this on YouTube, thank you; hit that like and subscribe button down below so you'll always get notified whenever we have something new coming out. I hope you're well. Be careful what you put into that AI, and more importantly, be careful what you do with it as an output. Make sure you have enough of your own human authorship so you aren't an infringer, no matter where it came from. Take care. Talk soon.