New AI Lawsuits Relating to the Use of Allegedly Stolen Data

Litigation risks associated with consumer privacy are well-known. Until this year, almost all of the consumer privacy litigation was aimed at companies releasing individuals’ personal information to others. But with the advent of AI, we are seeing a new permutation of privacy claims: liability for companies that receive data scraped from the internet, including de-anonymizing data of website visitors. In this episode, Dorsey Associate Melonie Jordan and Dorsey Partner and Podcast Host Kent Schmidt discuss some recent court filings, including two new California lawsuits, that provide a preview of the road ahead for AI-related litigation relating to consumer privacy.

This podcast is not legal advice and does not establish an attorney-client relationship or create any duty of Dorsey & Whitney LLP or those appearing in this podcast to anyone. Although we try to assure that the content of this podcast is accurate, comprehensive, and reflects current legal developments, we do not warrant or guarantee those things. The opinions expressed in this podcast are the opinions of those appearing in the podcast only and not those of Dorsey & Whitney. This podcast is considered attorney advertising under the applicable rules of certain states.

Voiceover
Welcome to another episode of the SharkCast on Litigation Risks Management, where we explore why businesses are so frequently sued and how to mitigate and navigate the dangers lurking in these risky waters. Join us now as we welcome our host, Kent Schmidt, Litigation Partner at the law firm of Dorsey & Whitney.

Schmidt
Welcome SharkCast listeners. I am thrilled that you’ve joined us today for an episode to talk about an issue that is cutting edge, right off the press, the lawyers’ print machines, with newly filed lawsuits that are discussing some very current topics that I think will be of interest to each of you. Most people know consumer privacy is one of the fastest growing areas of evolving litigation risk for companies. What makes this area of law so challenging is that the technologies are constantly changing. The regulations and legal theories continue to emerge, and it makes it very difficult to understand what companies can and cannot do with the data they receive from their consumers. Of course, over the last couple of months in particular there has been great emphasis in the news on artificial intelligence. Today we’re going to discuss two newly filed lawsuits, both in California, that illustrate some of the emerging legal theories relative to consumer privacy and artificial intelligence. To tackle these challenging topics, I am very pleased to welcome to the podcast studio Melonie Jordan.

Melonie is an attorney in our Labor and Employment group here with me in Southern California. She focuses on all aspects of employment law, litigation, and compliance. She particularly has an emphasis in the area of privacy and all of the privacy regulations that apply to employers, as well as a more generalized privacy practice. Welcome to SharkCast. I’m very pleased that you’ve joined us today for an in-person interview to talk about consumer privacy. How are you doing today?

Jordan
I’m doing great. Thank you so much for having me. I’m very excited about today’s topic.

Schmidt
Well, you’ve had to do a little bit of homework to prepare for today’s topic. We’re going to get into this because these lawsuits that we’re going to talk about are hot off the presses, as I’ve indicated. Both lawsuits involve a concept of data scraping, which of course we’re familiar with in the context of AI. Can you sort of explain to us, before we get into the details of these lawsuits, what is data scraping and what does it have to do with consumer privacy?

Jordan
Sure. So, I think it’s important for us to step back and frame the conversation as it relates to web scraping, because these lawsuits relate to the use of AI and its employment in web scraping. AI is essentially the means by which you’re teaching and using technology to perform cognitive tasks that essentially have been relegated to humans, but now we’re employing different kinds of machines for it. So there are different kinds of AI. AI has been around for several decades, but more recently we’ve seen this shift into generative AI and natural language processing, which is at issue with a lot of these lawsuits, particularly these two, or mainly one that was filed here. The other one kind of mentions it in passing. When we’re talking about data scraping, AI is allegedly being used to scour the entire internet and pretty much scrape information from each webpage that is on there, essentially taking that information, condensing it down, and then using it to train the AI to generate responses, generate content, generate images, generate audio, and I guess there have been a lot of lawsuits surrounding that issue.

Schmidt
So let’s talk about the first big lawsuit, in the Northern District of California. It was filed on June 28, and the case is P.M. v. OpenAI LP, and there are a number of other affiliated entities, including Microsoft Corporation, that are named in this class action complaint. And by the way, it’s filed by the Clarkson law firm. I’ve litigated against them before. They’re very good lawyers. This is quite a complaint. Can you unpack it for us?

Jordan
Sure. So this complaint spans 151 pages, and it reads like a cross between an Orwellian novel and a law review article. But it could be condensed down, I think most succinctly, to two paragraphs within the complaint, paragraphs 146 and 147. It says despite established protocols for the purchase and use of personal information, Defendants, meaning OpenAI and its entities and Microsoft Corporation, took a different approach: theft. They systematically scraped 300 billion words from the internet, books, articles, websites and posts, including personal information obtained without consent. OpenAI did so in secret, and without registering as a data broker as it was required to do under applicable law. Scraping involves the use of bots, or robot applications, deployed for automated tasks, which scan and copy the information on web pages, then store and index the information.

So this lawsuit then focuses on two different kinds of theft as it relates to the World Wide Web, if you will. First would be web scraping, and they’re using that term to refer to non-users, is what they call them: individuals who’ve never used OpenAI products, but who have used the Internet in some shape or form, essentially. So they’re saying you scraped the web of their information. And then you have the second kind of information that was allegedly stolen, and that is what they’re calling user data, which is from individuals who’ve actually used the OpenAI products, and now they’re taking information that they’re inputting into the products and using it to train the models and generate content as well.

Schmidt
So let’s break down the web scraping component of this.

Jordan
Sure.

Schmidt
So if I am in the AI business, I’m one of these defendants, and I’m going out and scraping, to use their word, the entire universe of data that’s out there. Much of that data is presumptively legally obtained. It’s out there because people put it out there; they don’t care that it’s out there. How can there be liability, at least with respect to that?

Jordan
I think that’s, so it raises a ton of different issues. Right? One would be, well, when you as a consumer engage with one website, are you of the understanding that that engagement could then be used for potentially another business relationship, such that another company, OpenAI or Microsoft as this complaint alleges, could use the information? So that’s where this web scraping, I guess, theory is arising, which is giving rise to 15 different causes of action alleged in this complaint. Another cause of action that we’re seeing is whether there’s an invasion of privacy that’s associated with that use. Did you have an expectation of your personal information being used in this way? Another cause of action or liability could be, well, was this information obtained lawfully in the first place? A lot of these statutes that they’re alleging within this complaint focus on unauthorized access of personal information.

Schmidt
So what is the significance of the fact that much of this data is anonymous? It’s data that doesn’t really relate to an individual person, it has no value on an individualized basis, but the aggregate of the data, the statistics as to who is doing what and what the trends are, is the real value. Does this complaint draw a distinction between anonymized data or aggregated data that really doesn’t carry with it a reasonable expectation of privacy?

Jordan
No, and I think that’s an interesting point. It doesn’t seem that the complaint really focuses on whether the data was anonymized or I guess accessed in a way that it couldn’t be traced to an identifiable individual. I think the complaint really focuses on the fact that the information was accessed at all. That’s in contrast to the next lawsuit that we’ll talk about, which does focus on whether the information was de-anonymized or not. In this circumstance it’s really about the fact that this information was used in the first place, both by people who never use OpenAI products, and by people who have used OpenAI as they allege.

Schmidt
Well, let me turn the discussion to what the plaintiffs here are seeking to accomplish. What is their objective? And before we talk about the relief that’s sought, one observation I have is, as you read this entire complaint, there is a good case to be made that there should be some sort of government regulation of AI. The question is, are you gonna obtain that regulation through an injunction from a district court, as opposed to a process by which state and federal lawmakers can start getting a handle on this AI and regulate it in a more thoughtful manner? What are your thoughts on seeking to accomplish AI regulation via an injunction?

Jordan
You know, I don’t know if that’s necessarily effective in the long run. We’ll have to see. Certainly the complaint alleges two different actions taken by the FTC and spends a considerable amount of time focusing on the FTC’s enforcement actions as they relate to other companies. But when seeking an injunction from, I guess, a district court in the Northern District of California, I’m not quite sure if it will be as long lasting as, I guess, the complaint alleges that it seeks, right.

Schmidt
I could just imagine a district court judge saying, I understand there’s a lot that needs to be done here, but how can I accomplish this with an injunction? Maybe that should go to Sacramento or Washington D.C., but not my desk, ‘cause it’s overwhelming to try to figure out and balance all of these interests, and essentially start governing AI and the collection of this data.

Jordan
Sure, and I, I honestly, I think that that might be the response of the court, both in terms of the means or the remedies that are being sought, but also the mechanism that the plaintiffs are using. Is a class action of this size that helpful? Essentially this complaint admits that this lawsuit, or this putative class action, would include virtually everyone who’s used the internet in some shape or form. The complaint goes to great lengths to make a distinction between those who use the products and those who didn’t. And it talks about how OpenAI before was just using information for academic purposes largely, and so it looked at the entire internet essentially before 2023. That’s decades and decades of information, and that’s billions of people. And so how is a court going to manage a class focusing on this amount of people spanning several different states? They’re pulling states, individuals from California, Illinois, Florida; they even alleged a cause of action under New York law as well. This is unwieldy as a mechanism, and I think that it will probably be viewed as unwieldy in terms of the remedy sought.

Schmidt
There’s a component of this complaint, a recurring theme as I read it, where there’s a contention that, as you mentioned, it started out as a non-profit, and now it’s making billions of dollars because AI is such a lucrative emerging area. But what relevance does it really have to these causes of action that the company started off as a non-profit and now is for profit?

Jordan
Honestly, that’s a great question. I think it really just, I guess, goes more to maybe a public policy argument, more of a publicity issue. There is no relevance in terms of the actual 15 causes of action themselves. Interestingly, they make references throughout this complaint to the California Consumer Privacy Act, but they don’t actually bring a cause of action under it.

Schmidt
And why is that? Why would you not, comprehensive data privacy, why would the plaintiffs not bring a CCPA claim here?

Jordan
Well, one, I think as it relates to your earlier question, the CCPA doesn’t apply to non-profits. Right? But they could bring it now because this company is now a for-profit enterprise. That said, I believe that they didn’t allege CCPA because the CCPA has a cure provision within it. Right? So these plaintiffs would have the duty to send a notice to the defendants that there’s been a violation of the CCPA, and then the defendants have 30 days to cure the alleged violation. Should they cure the alleged violation, then that eliminates the statutory damages. And so a great swath of the recovery is then cut out from under them. And it’s interesting too that they, again, they mention the California Consumer Privacy Act right out of the gate in at least one paragraph, and then there’s a couple of footnotes that cite to the data broker registration provision that incorporates and references the CCPA as well. So they’re kind of dancing around it, and I even note in paragraph 141 that, when describing the personal information that they allege was taken, they’re using, I guess, what a lot of privacy practitioners would understand as a sensitive personal information definition. And so they’re kind of dancing around the periphery of it, but they’re not actually invoking a cause of action under it. So it’s a pretty interesting approach that this complaint has taken.

Schmidt
You mentioned the data broker requirements in the CCPA. In general, without getting too far into the weeds, what are those requirements that a company that’s engaged in some activity with data should check into to ensure whether or not they need to be registered as a data broker in California?

Jordan
Yeah, sure. So I wanna say as far back as 2020 actually, under the CCPA, or under the data broker registration law I should say, all data brokers must register with the California Attorney General’s office. And a data broker is defined as a business that knowingly collects and sells to third parties the personal information of a consumer with whom the business does not have a direct relationship, and this law refers back to the CCPA for the definitions of business, sale, and third party. But interestingly it doesn’t define what a direct relationship is, at least in my understanding at this point. So again, dancing around that CCPA provision, but not jumping into the waters.

Schmidt
A lot of interesting claims and interesting concepts and challenges for the plaintiffs here. But if you boil it down to its essence, what’s the takeaway, the word to the wise that a company should consider in reading this lawsuit?

Jordan
Well, think about the risk with AI. It’s a very exciting area; it promises a lot of benefits for companies, for individuals, and then the employment aspect, certainly for the employment relationship. But this lawsuit says to me that if you are incorporating AI into your business, whether it’s through managing the employment relationship or just providing goods to consumers, perhaps think about whether you would fall under the definition of data broker, or the other pitfalls that are involved, such as invasion of privacy. And I even think about it from, again, the employment aspect. Say you type in someone’s name, tell me about Kent Schmidt, and ChatGPT4, if you paid for the subscription, gives you, or gives me, information about you. And then I use that as a basis to determine whether I want to interview you or extend a job offer to you. Does that search constitute a background search such that certain ban the box ordinances would be triggered, like in L.A. or in San Francisco?

So thinking about whether those constitute background searches, whether the information is accurate or correct, it brings up a plethora of issues. So just be cautious about it, and words to the wise: OpenAI and Microsoft, they’re facing a class action that’s alleging $3 billion are at issue here. And so whether that’s obtained or not, time will tell. Right? But again, think about the CCPA and think about these other state laws that are out there that may be applicable in this circumstance.

Schmidt
That’s a good word of advice, Melonie. Let’s now turn to the second of the two lawsuits, Hernandez v. MRI Software, LLC. This is a lawsuit filed by my long-time litigation adversary, Scott Ferrell. We’ve probably had 20 lawsuits against one another, and Scott mentioned this lawsuit to me when we were talking on the phone on one of our cases a couple of weeks ago. He’s jazzed about these lawsuits. If anyone knows Pacific Trial Attorneys and Scott Ferrell, he tends to file the same type of lawsuit again and again and again. He’s a plaintiff’s consumer class action lawyer. So I’ve asked Melonie to take a look at this lawsuit. Thankfully it’s much shorter than the first lawsuit we talked about. And give us a summary of what’s alleged here and what the takeaways are for companies focused on litigation risk.

Jordan
Sure. So this lawsuit was very interesting, or is very interesting. Pretty much the plaintiffs allege that MRI Software has installed software on its own website that allows MRI to de-anonymize and, quote, “dox” every visitor to its site. I think two paragraphs aptly sum up the point of the lawsuit, which is they’re saying that Lead Forensics describes itself as the world’s number one website visitor identification software, Lead Forensics being the software that’s being employed by MRI. The software reveals the identity of your anonymous website visitors, turning them into actionable, sales-ready leads in real time. And then the plaintiff alleges that MRI installed spyware, because the spyware reveals the identity of your anonymous website traffic and turns it into actionable sales leads in real time, and gives you access to that power by revealing the identity of your previously unknown website visitors.

So essentially this website, or this plaintiff’s complaint, is saying that you’re installing this software that can tell you who I am, and you’ve not properly informed me or given me notice that this was occurring, and that my information would be used in this way, such that now I’m personally identifiable.

Schmidt
So let me make sure I’m tracking this. If I go to the defendant’s website, as of right now, if the defendant didn’t have the software, all they would have is my IP address and they couldn’t really do anything with that. But what Lead Forensics does is combine my IP address with an innumerable amount of data that they have, and let that company know, hey, it was Kent Schmidt that went to that website.

Jordan
Kent Schmidt, or that it was likely Kent or someone, you know…

Schmidt
Someone using Kent’s computer.

Jordan
Yeah, right. And so they’re then bringing two causes of action under California state law. And I think it is important to note that this lawsuit was filed in state court as opposed to federal. Obviously because of and whatnot for the Northern District of California case. But this one just alleges violations of California’s Penal Code: the California Unauthorized Access to Computer Data Act, and then, too, sections 630 through 638 as it relates to aiding and abetting, allegedly, the software company. So I think that they’re saying then that this information is able to be used from your IP address, but also information that’s generally out on the web, to then pull together a profile to figure out who you are, and then now we’re able to effectively market to you. That’s what this complaint is alleging.

Schmidt
It’s interesting to me that the plaintiff analogizes this lawsuit to stolen property, a concept that we’re all familiar with. Do you think that’s a fair metaphor for what’s going on here?

Jordan
I don’t, I don’t think it’s fair. I think that receiving stolen property, it evokes like imagery of like a seedy pawn shop somewhere. And I don’t think that’s, I think it’s an oversimplification of what may actually occur, and probably a little inaccurate. I think the real question, for at least these lawsuits, is: is the information that we put onto the web now publicly available information such that anyone can access it and use it to train their software?

Schmidt
Well, for one thing, the stolen property is far from clear because sure, this data is out there and no one knows necessarily how it got there.

Jordan
Right.

Schmidt
And it may not have been stolen.

Jordan
Right. So it, I guess, yeah, we don’t know what stolen means in this context. Right? And to be fair, just because information is on the internet, it doesn’t necessarily mean you willingly and voluntarily put it there. Right? We don’t know if, at least I don’t know if, some of the software is utilizing information from the dark web or other aspects of the World Wide Web here. But I think if it’s filtering out disreputable sources and focusing on reputable web pages at least, is that improper? ‘Cause what would separate that from a general Google search or other search engine result?

Schmidt
There’s a phrase that’s used throughout this lawsuit, de-anonymize. And the concept is if you put all this information out there, then on its own you remain anonymous. But once you put the pieces of the puzzle back together, the picture emerges as to who it is. Is this a concept that’s ever been recognized in the law, a claim for violation of someone’s right to privacy due to de-anonymized data?

Jordan
Oh sure, I think that the different laws that at least are in play at the state level are seeking to address this issue point blank. I mean California was the first here, of course modeling the GDPR in Europe. But last I checked, there’s at least 10 other states who have privacy legislation that’s been enacted now, and all of them are focusing on our ability, or a company’s ability, to understand and identify you, and making sure that you’re aware that they can identify you, and then understanding what they’re doing with the data that they have about you.

Schmidt
Melonie, in addition to the work that you do as a lawyer, you serve a vital function at Dorsey by sitting on Dorsey’s newly formed AI Task Force, discussing how AI is going to impact the practice of law. What things are you learning that you’re at liberty to share with us and with our listeners on some of the ways we’re going to be seeing AI impact the practice of law going forward?

Jordan
It’s a real honor, first of all, to serve on the AI Task Force for Dorsey. The task force represents a slice of practitioners from across our platform that are figuring out how we can employ AI potentially in furtherance of our practice, but then also a subset of us are kind of looking at ways that AI is impacting different areas of law. In particular as it relates to employment issues, I have been building a workplace privacy practice which focuses on the unique point in time that we’re in right now, with the advent of remote work over the last several years combined with the general acceptance now of AI in society and now in the workplace. I’ve been tasked with advising employers about the risks that are presented by employing AI, both as it relates to managing the employment relationship and as it relates to the employees actually performing the work involved. So it’s an exciting time, an exciting area, different changes every day, and I’m really excited about the workplace privacy work that we’re doing.

Schmidt
It must be a real challenge because what you understand about AI in July becomes obsolete by October, or maybe that’s a slight exaggeration, or maybe or maybe not, or maybe it’s an understatement.

Jordan
I think it’s more of an understatement, right? I mean GPT was released to the public in March, I think, of 2023, and we’re in July now and my goodness, it’s changed so much. So, it’s continuously moving. You have those issues. You also have the proliferation of many states enacting privacy legislation as is. And to my understanding now, you have California, who’s kind of standing as the Lone Ranger in applying privacy law to the employment relationship, and so my practice really focuses on that overlap between the two and managing the risk associated with the employment of the technology. But also, you know, facing off any issues such that if litigation were to arise then, you know, defend it and get after it.

Schmidt
Well, that’s very good. I must say that I’m still a nascent user, or early adopter, of a lot of AI, and the only way I was able to get through this 150 page complaint that was filed by the Clarkson Law Firm that we discussed earlier was to use some AI. I used Speechify, which is a great app where you can put a PDF into the program and have it essentially read any text to you through your iPhone. So that’s how I was able to get through that complaint. So we’re all enjoying different aspects of AI, but it’s also important to think about some of the litigation risks that are emerging as well. Well, we’ve come to the part of our episode in which we do what we call the deeper dive, to learn a little bit more about you as a person in addition to your legal practice. Melonie, I understand that you are the mother of a ten year old, a three year old and a nine month old, so congratulations.

Jordan
Thank you.

Schmidt
And hats off to you for all the responsibilities that go with that. What’s your secret to managing your busy and very evolving practice and all the responsibilities that you have at home as well?

Jordan
Well I think it’s just understanding where the priorities lie. Making those first things first and being unapologetic about those priorities and about my time. So I’ve noticed as I’ve had more children, three girls now, I’m a lot more protective of my time, and I seem to accomplish more than I did when I was, you know single and you know with no responsibilities. It’s funny because you think that you won’t. You may feel that children slow you down, but they’ve only, you know empowered me. So I guess my secret is, you know, this is the time that I have to accomplish this task. This is the time that I have to do it, and I do it. And then the time that I have with my children, then I’m very, very protective of that. My husband and I really prioritize spending time with each other and spending time with our three girls.

Schmidt
What are some of the things you do as a family here in Southern California to enjoy time away from the office and time with those three girls?

Jordan
We love going on hikes together. Hiking in California is very different than hiking on the East Coast, where I’m from. We’re not trekking through mountains and other rugged terrain. Really it’s nice hills with lovely ocean views, the Top of the World hike in Laguna Beach. We love doing that with our girls. We love heading down to San Diego and spending time on Coronado Island. That’s our favorite spot. We just really, really enjoy spending time at the beach. I mean, there’s a plethora of options here, right? So it’s good.

Schmidt
What are the challenges of hiking in Southern California? It’s pretty hilly.

Jordan
Yes!

Schmidt
So you’re heading uphill and then you’re downhill and then back uphill again, and so those kids are learning some good fitness skills early on to be able to tackle the hikes that you just mentioned.

Jordan
Sure.

Schmidt
What about the balance of practicing in really two practice areas, which is privacy and labor and employment? Some of which intersect of course with employee privacy, but how do you manage really two practices that are in many ways distinct?

Jordan
Sure. So I really try to focus on that overlap. I think that if I was just a core privacy attorney I wouldn’t be as effective, but my first love has been employment law since coming out of law school. I really enjoy the work quite a bit and I find that a subset of employment issues focus on employee data and privacy, and you know, not just those two areas, but thinking more so about employees in the workplace and protecting their interests there more broadly. So, looking at surveillance issues, looking at, you know, workplace monitoring issues, looking at, I guess, background checks, and drug testing and all those other aspects of what people would term as your private life. And so I balance it by making sure that I stay within that overlap of the diagram, if you will.

Schmidt
I imagine the times that you’re able to be by yourself and pursue some hobby or other leisure activity are few and far between with all these responsibilities at work and at home, but to the extent that you do get those rare occasions, what are some things that you like to do to unwind and engage in a little self-care?

Jordan
Yes. So one is I, I actually am a morning person. Interesting fact, but 80% of the human population are morning people, apparently, so I’m one of them. And I like to get up before my kids get up, before my husband is up, and I like to go to the gym and listen to podcasts, listen to my YouTube videos. It’s just my time to zone out and tend to myself in that way, and then I’m pretty regimented about my weeks in terms of, you know, Thursday night is date night with my husband. Friday night is Friday family fun night, and then Saturday is self-care for myself. So Saturday night I’m enjoying a bubble bath, some sparkling cider and watching some Netflix.

Schmidt
Good for you. Well, you know this is a marathon, not a sprint, the practice of life and law in general. So it’s good to pace yourself and to take care of yourself along the way. That’s very important. So, thank you so much for being here. We appreciate your insights and your thoughts, and we’ll have to have you back some time because we know AI is quickly evolving and things we talk about are going to be developing quickly over the next several months and years. That’s all the time we have for today. Thank you for listening. I’m indebted to the extraordinary team at Dorsey for making this podcast and episode possible. For more resources on this and other litigation risks, go to litigationrisks.com, where more information can be found, including a book on managing litigation risk written by yours truly. Until next time, my friends, this is another reminder that there are a lot of sharks swimming out there in the murky waters, so swim safely.

Voiceover
This podcast is not legal advice and does not establish an attorney-client relationship or create any duty of Dorsey & Whitney LLP or those appearing in this podcast to anyone. Although we try to assure that the content of this podcast is accurate, comprehensive, and reflects current legal developments, we do not warrant or guarantee those things. The opinions expressed in this podcast are the opinions of those appearing in the podcast only, and not those of Dorsey & Whitney. This podcast is considered attorney advertising under the applicable rules of certain states.
