This podcast episode discusses the law applicable to artificial intelligence (AI), which is increasingly being used across industries and applications. ChatGPT, a large language model, is introduced as an example of a neural network that can generate text but sometimes produces false information. Other examples of AI discussed are Dall-E 2, Stable Diffusion, and Midjourney, which create images from textual descriptions. The episode also covers the legal frameworks that apply to AI, including intellectual property law, contract law, tort law, and criminal law, along with the issues surrounding intellectual property protection for AI, liability for harm caused by AI systems, and the current state of AI regulation in different jurisdictions. Given AI's increasing use and potential impact on many aspects of society, the hosts emphasize the importance of understanding the legal landscape surrounding it. May It Please The Internet is a podcast by revisionlegal.com.
Introduction:
This is May It Please The Internet, a podcast brought to you by Revision Legal, lawyers who represent businesses that make money online.
Hey everyone, this is John di Giacomo and you’re listening to the Revision Legal May It Please The Internet podcast. And I’m joined today by my partner Eric Misterovich. Hey, Eric.
Eric:
Hey, John. How are you?
John:
I’m good. And we are talking today about accountability and liability for artificial intelligence, which is the hot topic these days.
Eric:
AI is everywhere. Seems like every Twitter post I read, if it’s not about taking a cold plunge, it’s about how AI is changing someone’s business, from coming up with blog post ideas, to writing the posts, to seemingly taking over the entire legal industry.
John:
Have you seen these debates? Maybe you don’t get fed the same stuff that I do, but have you seen these debates about artificial general intelligence and whether or not we should pause development because it’s going to take over the world? Have you seen this stuff?
Eric:
I’ve seen that. I’ve listened to some podcasts about it and the people working on it, and I think I heard something like one in 10 people that are actively involved in the AI industry think that’s possible, which is absolutely crazy.
John:
It’s pretty interesting. There’s a great Sam Harris podcast where he interviews this guy, and I cannot remember his name right now. It’s a Russian name, but he’s an American citizen, obviously, but he’s kind of at the forefront of this idea that AI or artificial intelligence is extremely dangerous. And I follow him on Twitter and he’s just going crazy because people are giving him a very, very hard time saying his ideas are insane and that he is going to stop progress. And then there’s this whole other crew that agree with him. So it’s very interesting to watch this debate and it all stems from ChatGPT. And Eric, what is ChatGPT?
Eric:
ChatGPT, so it’s a large language model. It’s taken over absolutely everything. And John, you’ve provided a great outline of this about how it is a neural network. I’d like to hear you explain exactly what that means.
John:
You want me to… This explanation will sound extremely intelligent to unintelligent people and extremely stupid to intelligent people. So ChatGPT is an LLM. It’s a large language model. It’s a type of neural network. If you’re interested in this stuff, there’s a great book that I actually brought with me, that I have in front of me, called Numsense! Data Science for the Layman: No Math Added. It’s a really good explanation of the underlying algorithms that go into data science and particularly artificial intelligence. But ChatGPT is a neural network, and basically the way it works is each neuron in the network calculates an output based on input. So whatever the input is, it calculates an output based on that input. And then each neuron is connected to other neurons. Thinking of neurons like pieces of a sentence is a really good way to think about ChatGPT.
And then each neuron is assigned a weight, a predictive value, the likelihood of it being the correct value. And then the output that ChatGPT provides is based on that predictive value. So what it does is it really predicts what word should follow in a sentence based on the previous word. That’s a really, really simplified explanation, but it produces really interesting results. And we’ll discuss more about the input, where the data comes from, but it also has a lot of false output. The false output in AI is called a hallucination. And a hallucination is a sentence that, whether based on semantics or syntax, is plausible, meaning it reads correctly to a human, but it’s objectively false, so it provides false information. And so ChatGPT provides a lot of accurate information, and it provides a lot of false information based on these hallucinations.
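To make that prediction idea concrete, here is a minimal sketch in Python of “predict the next word from the previous word,” using a toy bigram counter rather than a real neural network (the corpus and the function name are invented for illustration):

```python
import random
from collections import Counter, defaultdict

# Count, for each word, which words follow it in a tiny training corpus.
corpus = "the cat sat on the mat and the cat slept on the mat".split()
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Sample the next word in proportion to how often it followed `word`."""
    counts = following[word]
    # The counts play the role of the "weights": the higher the count,
    # the more likely that word is chosen as the continuation.
    return random.choices(list(counts), weights=list(counts.values()))[0]

# Generate a short continuation, one predicted word at a time.
sentence = ["the"]
for _ in range(6):
    sentence.append(predict_next(sentence[-1]))
print(" ".join(sentence))  # e.g. "the cat sat on the mat and"
```

A real LLM replaces these simple counts with billions of learned weights, but the basic move, sampling a likely continuation word by word, is the same.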
A good example is I asked it who I am, and it did a really good job of saying, John di Giacomo is the founder of Revision Legal. He holds a degree, a Juris Doctor from Michigan State University College of Law. And then it goes on to say he also holds an LLM from the University of New Hampshire, which obviously is not correct. So it’ll be interesting to see how some of these things get solved for over time. And then the second category, other than ChatGPT, is these image-based neural networks or artificial intelligences. Those are things like Dall-E 2, Stable Diffusion and Midjourney. And these create images from textual descriptions, text descriptions that are captions of photos. And this is, again, a neural network, and it synthesizes a photo based on the input of the user and its knowledge of those captions and text descriptions.
So it outputs a photo that is really a synthesis of what it knows and the connections in that neural network and the probabilities of whether it’s the right outcome based on the input of the user. So that’s the landscape for AI. All of that was probably wrong, so feel free to send me an email and complain. But Eric, hopefully-
Eric:
That all sounds right.
John:
…that gets us started.
Eric:
That all sounds right to me. I mean, I think the best way to understand this, if you’ve played around with it at all, is that your technical explanation of it makes sense. You can tell how things are coming together, and it’s, on the one hand, incredible that it’s able to pull this information together and produce a response that is accurate, and then it’s also terrible in that it’s sometimes completely wrong. Or the image stuff just ends up looking either amazing or completely ridiculous. And it is just, where is this technology, and what is it going to be in five years? That’s where you start to get these questions of, is this a good thing or a bad thing? And it’s pretty hard to tell right now, I think. I mean, there are obviously so many ways you can look at this as being good, but you can also see this being, I don’t know, weaponized or used the wrong way, or in a way that adds to our country’s existing inability to tell the difference between fact and opinion.
John:
We already see some of this in things like big data, like Cambridge Analytica was an example, where big data is being used for these kinds of arguably nefarious purposes. We see it in things like creating false images of celebrities or creating synthesized voice recordings of celebrities to make them say things that they aren’t actually saying. I’ve actually stopped allowing clients to record consults for this reason. I’m actually a little concerned about it, particularly because I don’t want to get sued for malpractice. This is now a real potential issue. And then we also see it in areas like data privacy, where there’s a question of, should an algorithm or should artificial intelligence be used to make, for example, credit decisions or decisions about health or decisions about judicial outcomes? That’s a new area of concern.
But today we want to talk about really accountability and liability with respect to intellectual property law, contract law, and a little bit of tort law. And I think, Eric, we should start with IP law. And there’s one glaring category of IP law that really stands out when it comes to these two particular applications of AI, ChatGPT and the image-based AIs, and that’s copyright infringement. And why do you think that is?
Eric:
The internet is already one big example of copyright infringement, but this is a whole new level of infringement because there is this kind of non-human actor involved. And copyright, it’s this foundational law of, if you create a work, you own all the rights in it, meaning you have the sole ability to share that work, to reproduce it, to distribute it. And AI has just taken all of that data out in the world, whether it’s text or images, and then can use that data to teach itself to create similar answers to prompts. And the most obvious case of this so far has been Getty Images. Everyone’s probably familiar with Getty Images, or maybe has received a demand letter from Getty Images for using one of their images without permission. Well, now Getty is going after Stability AI, which is behind Stable Diffusion, and alleging that essentially 12 million images have been infringed.
It’s a pretty fascinating case, and it’s in its infancy. We just checked the docket, and an amended complaint was just filed, so this has a long way to go. But I think in the complaint, they allege about 8,000 registered copyrights are subject to the lawsuit, and they are alleging that Stability AI, through some other independent contractors, essentially took their entire dataset of images and is now using those same images to reproduce either identical or substantially similar images in response to prompts.
John:
That’s really interesting. So how did Stability AI get access to Getty’s images?
Eric:
I mean, this will probably be sorted out a bit, but in the complaint, they allege that Stability AI was working with this German contractor and that the German contractor created this dataset that included content from all over the internet, including Getty Images. That dataset then was provided to Stability
AI, who essentially used that dataset to visit those links and scrape all of that data, which of course seems like a huge problem for Stability AI.
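As a rough, hypothetical sketch of the kind of pipeline being described (this is not Stability’s actual code; the CSV file name, columns, and crawler details are all assumptions), a dataset of links and captions might be consumed something like this. Note that even the polite robots.txt check below would not answer the separate terms-of-use question John raises next:

```python
import csv
import urllib.robotparser
import urllib.request

# Hypothetical input: a CSV of (image_url, caption) pairs, the rough
# shape of the third-party dataset the complaint describes.
DATASET = "links_and_captions.csv"

def allowed_by_robots(url, user_agent="example-crawler"):
    """Check the site's robots.txt before fetching. A site's terms of
    use can prohibit scraping even where robots.txt allows it."""
    scheme, _, host, *_ = url.split("/", 3)
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(f"{scheme}//{host}/robots.txt")
    parser.read()
    return parser.can_fetch(user_agent, url)

with open(DATASET, newline="") as f:
    for image_url, caption in csv.reader(f):
        if not allowed_by_robots(image_url):
            continue  # skip URLs the site disallows for crawlers
        with urllib.request.urlopen(image_url) as resp:
            image_bytes = resp.read()
        # A real pipeline would store image_bytes alongside its caption
        # as one (image, text) training pair for the model.
```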
John:
That’s what I was wondering: is there a claim for a breach of the terms of use agreement of Getty Images? And it sounds like there probably is. So for those who are listening, terms of use agreements are important. We’ve discussed this a thousand times on our podcast. In this particular case, Getty Images, which is a sophisticated company with lots of experience in litigation, probably has a pretty good terms of use agreement that prohibits scraping. And whatever script was written to mine this data allegedly scraped in violation of that terms of use agreement. So this is another-
Eric:
It’s not a separate cause of action, but it’s certainly the foundational element of access to the works that came through that scraping, and they certainly allege within the complaint that it violated their terms of use, although they’re not asserting it as a separate cause of action.
John:
Interesting. I wonder why they made that choice. I wonder if they just feel like it’s not strong enough. And I also wonder why they chose a German company to do this data collection. When you were reviewing the pleadings, was there any connection between Stability and the German company? Did they share ownership or anything of that sort?
Eric:
They alleged that the German company is completely funded by Stability AI. So there’s some kind of connection there. We don’t know everything, but it seems like they’re almost one and the same.
John:
I wonder if there’s some piece of German law that provides a better legal environment for scraping those images or collecting that data set. It seems like that would be a smart decision to make.
Eric:
If that’s the case, it certainly seems like there’s some plan to it, you would think, because you would have to anticipate that this was going to happen.
John:
So this is an interesting problem, because you and I were talking on a phone call today. We had read this journal article written by a colleague. I got extremely emotionally upset by the article because I think it’s just a wrong take on the law. And my complaint was that we came into the internet age with this dream that the internet would solve all these problems. Information would be free, it would be a democratic way to distribute knowledge, but we didn’t mean for it to come at the cost of small artists and people who are trying to make a living from their art and their work and their creativity and their ingenuity. And this is yet another example where really the first to market gets to take all of that collective work and knowledge and creativity and monetize it, right?
Eric:
It seems like they’re taking work done by other people and creating this new method of access to that work. Instead of paying an artist a license fee, you insert a prompt into Midjourney and you get that image without ever communicating with anyone. And the artist who is helping produce that underlying content that teaches these AI models, they’re left out.
John:
It’s a lot like the other service providers out there, but it does feel fundamentally different. What do you think the difference is between what OpenAI or Stable Diffusion or Dall-E are doing in comparison to, let’s say, for example, Google as a search engine? I mean, Google scrapes webpages, it takes those results, it frames them within its own system. It monetizes them by placing pay-per-click ads at the top of the page. It shows image results. What is the analytical difference between what Google does and what ChatGPT, for example, does from an IP perspective? Or is there one?
Eric:
I think there are a couple different ways you can think about it. I heard someone smarter than me explain the difference. With Google, you are a research librarian when you are using it. You’re conducting research, you’re getting the ability to find multiple answers to your question. You can read those answers. You can make your own decisions on what you believe is the most accurate. Whereas with AI models, you’re almost like an engineer, where you’re prompting this machine to give you a certain output. And the difference that comes from that is, you don’t really know where that output came from on the AI side. Whereas with Google, you literally could cite to where you got that information. You’re not going to be able to do that, at least right now, typing questions into ChatGPT and trying to get an answer.
You’re not entirely sure where that’s from. So there’s a kind of hidden level of knowledge and sourcing that makes it difficult for a rights holder to wrap their hands around. If someone wants to write a blog post, they insert a prompt into ChatGPT, they get an answer, and they copy and paste that into a blog post. But that underlying content infringes someone else’s original work. There’s this step that’s missing of, well, is the person using ChatGPT really attempting to copy someone? Not the same way as if you went to a Google search result, found a link and copied and pasted the text yourself. There is a difference between the two.
John:
I want to mention something that the Copyright Office did recently because I think it frames one of the problems that I see. The Copyright Office recently said that the output of AI might not be copyrightable, or in some cases is not copyrightable, because works that are created by AI without human intervention fail to meet the authorship requirement. The authorship requirement for creativity under the Copyright Act requires human authorship. And then ultimately, the Copyright Office said that whether an AI-produced or -outputted work will be copyrightable depends on the level of human authorship involved. So how is it used? How does it operate? What level of human intervention was required to produce that result? And it’s interesting because if AI can’t be an author from the perspective of copyrightability, can it also not be an infringer from the perspective of copyrightability? Because infringement requires the creation of a derivative work by an author. So I wonder, how does a judicial system solve for this issue? Because it seems like in the case of Google, it’s not as clear whether the results of a Google search are a derivative work. Because Google takes those results, it frames them on its website. But like you said, you can see where the original work comes from. Same with the images. But in the case of ChatGPT, it’s hiding it, and it really truly is synthesizing it and creating a derivative work. But if we get a ruling that says that isn’t a derivative work because it’s not subject to copyright protection, or Getty loses its lawsuit, it seems like a fundamental problem for rights holders that would have to be solved by some kind of congressional action.
Eric:
It seems like an enormous problem if there’s not a way to use copyright law to prevent the infringement here. I mean, it seems like this is completely ripe for rampant infringement, and there need to be guardrails to help creators, or even just to help the AI industry. There probably do need to be laws about it. This is going to be one of the many areas, I think, that’s going to need regulation. But right now, it does present that strange question of, well, if there’s not a human involved essentially, and it’s not a protectable work, how can it be infringing? But at the same time, it’s clearly infringing.
John:
The other layer of this that’s interesting is that when we do fair use in the US, one of the questions we look at is, is the work transformative? That’s one of the questions that you have to answer to determine whether the use of someone else’s work constitutes fair use. And synthesizing that information to produce a new work seems to be highly transformative. So it’ll be interesting to see how the output of something like ChatGPT is analyzed from that perspective. And then the other side of that is facts are not copyrightable. So what actually is something like ChatGPT using? Is it using the creative work? In the case of Dall-E or Stable Diffusion or any of the other image-based AIs, it’s pretty clear that they’re working from copyrightable work. But you can take facts and then rearrange them with your own expression. You can do this. I mean, you can take facts and rearrange them with your own expression without running afoul of copyright law. So it’ll be interesting to see how courts treat it from that perspective as well.
Eric:
That facts aren’t copyrightable is always something I like to explain to clients, because it’s a little bit difficult to draw the distinction in what that really means. But I always explain it in terms of a cookbook: a recipe for how to make a chocolate chip cookie is not copyrightable. Copyright is not about ideas. It’s about works. So if you’re writing a book, that’s a work that’s subject to copyright. But as for the exact measurements and directions on how to make the cookie, you can’t stop other people from having the same recipe. How you arrange that recipe and how you describe it, though, and especially if that’s in a book with a series of descriptions and explanations and specific recipes, then that whole book becomes subject to copyright. But you can’t just stop someone from writing about the exact same way to make a cookie.
John:
Do you remember the Game Genie for Nintendo?
Eric:
The cheat code?
John:
The cheat code. So this reminds me of the Game Genie. You can tell I’m not that old. There was a case about the Game Genie back in the day where the question was, was there a derivative work being made when the Game Genie changed the code of the underlying Nintendo game? And what the court looked at was whether the code had been fixed in a tangible medium of expression. So was there fixation when the code was placed into RAM? And I might be misremembering this case, but I remember it being somewhat analytically similar to this. I wonder if courts are going to look at ChatGPT and say either the input is fixed in a tangible medium of expression and used in a way, when it’s training AI models, that it is copyright infringement, or say it’s used for such a transitory and short period of time to give it this knowledge that it just doesn’t matter.
It’s not actually copyright infringement. It just seems like there are so many complex issues to unpack with the way that this data gets to a researcher or a company, gets used, and then gets outputted, that we’re going to be litigating these cases for years to come.
Eric:
Absolutely. I mean, the Getty Images one seems like low-hanging fruit in a relatively new technology. I mean, they still are reproducing the watermark in some of the images. The watermark is distorted in some of the images. Some of the images just look absolutely terrible, but the watermark looks good. I mean, there are all kinds of problems. Now we’re getting into trademark issues as well. So that case certainly is not going to solve every problem about AI and copyright, but it’s going to be the start of it, and we’re going to have a long way to go. It’s going to keep attorneys and judges busy for a while.
John:
Well, let’s talk about the final category that I think is worth talking about. We already talked about contract law briefly, about scraping and the importance of a terms of use agreement, but tort law has a little place to play as well. There’s this case that’s similar to Getty in some ways against Prisma Labs, which makes Lensa. Lensa is this app that produces AI-generated profile pictures (that’s the best way to describe them) based on geometric face data that you provide through your cell phone. You may have seen people posting these on social media sites. There are fantasy versions and sci-fi versions, et cetera. And they were sued over their use of facial geometry, which, under statutes like the Illinois Biometric Information Privacy Act, BIPA, requires express written consent. So Eric, what do you think about this area? Do you think we’re going to see more work here as well?
Eric:
A hundred percent. The people pushing this industry forward, I really wonder how much they’re really thinking about these kinds of lawsuits.
John:
My answer to that is zero.
Eric:
I mean, it’s a cost of doing business, and they’re going to go, go, go and just not worry about this. But I think there’s definite risk of liability and facing class action lawsuits that they’re going to have a hard time getting out of.
John:
I downloaded Lensa and I played with it when everybody else was going on Facebook and posting 10 images of themselves. I never paid for it because I just felt weird; my wife is going to ask me why I paid $10 for profile pictures. It’s not explainable in my household. But I did look at the way that they asked for consent, because I was aware at the time, obviously, that BIPA was a potential concern, and I thought the way that they did it was correct. But there’s this kind of open question under Illinois law as to whether or not it needs to be signed written consent and not just a clickwrap or a browsewrap. So I think we’ll see some things there as well. But I was playing MLB The Show 23 this weekend. Got it for free on Xbox Game Pass.
This is what they call it, I think. And it’s got this biometric feature where you upload a scan of your face and it’ll actually put you into the game. It takes your facial geometry, takes the scan of your face, and then puts it on your own baseball player that you can then play through a career with, which, by the way, is not going very well for me. I haven’t made it out of the minors. But the way that they did consent was a lot more onerous than what I think a lot of these other companies do. So there does seem to be some level of knowledge that this is an emerging area of law that they have to take seriously. If I recall correctly, you needed to consent on two devices, the phone and the actual console, which I thought was interesting. And then there was another lawsuit recently over voice data, where TikTok apparently, allegedly, used some woman’s voice for its robot voice. I don’t use TikTok, Eric. I know you do, or at least did. Is that like a feature? How does the voice work on TikTok? [inaudible 00:27:03]
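As a minimal, hypothetical sketch of what gating biometric processing on recorded consent might look like, assuming the two-device flow John describes (every name here is invented, and whether a flow like this satisfies BIPA’s written-consent requirement is exactly the open legal question):

```python
from datetime import datetime, timezone

# Hypothetical in-memory consent log: user_id -> list of (device, timestamp).
consents = {}

def record_consent(user_id, device):
    """Record that express consent was captured on a given device."""
    consents.setdefault(user_id, []).append((device, datetime.now(timezone.utc)))

def may_process_face_scan(user_id, required_devices=("phone", "console")):
    """Allow biometric processing only after consent has been captured
    on every required device, echoing the two-device flow above."""
    seen = {device for device, _ in consents.get(user_id, [])}
    return all(device in seen for device in required_devices)

record_consent("player1", "phone")
assert not may_process_face_scan("player1")  # console consent still missing
record_consent("player1", "console")
assert may_process_face_scan("player1")      # both devices consented
```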
Eric:
You can have it read whatever text you put on, and you can pick what voice you want to have it read in. It’s very surprising; I mean, it seems like a pretty simple thing to actually get the rights to the voice that you’re going to use for this. So that’s pretty dumb that they didn’t do that. But they’re moving fast, I guess.
John:
I just asked that question solely because I wanted you to admit that you actually use TikTok.
Eric:
I do. TikTok is, I’ll say, I mean, I know there’s talk about banning it, and it probably should be banned. I mean, it is unbelievably addicting. I’ll use it for a while and then I just delete it, because I find myself looking at it and wondering why I’m even doing this. I do it before I even realize it, and then I just delete it. And eventually I’ll find my way back because I want to find some recipe or something that I saw on there. But mine is all golf and cooking stuff.
John:
Well, I make fun of you, but I used Facebook Reels for exactly the same reason. I was sitting at the pool yesterday while my daughter was at swim practice, and I realized that I had flipped through like 20 reels and she could be drowning. And I looked up and I’m like, “Oh, she’s still alive. That’s probably good.”
Eric:
So it’s dangerous.
John:
It is very dangerous. That’s really all we had this week. But before we go, I wanted to say that earlier in this episode I mentioned someone at the forefront of warning about the dangers of artificial general intelligence. His name is Eliezer Yudkowsky. I’d forgotten what his name was. So if you get a chance, look him up. He’s an interesting thinker, and there’s a great podcast with him and Sam Harris. Lots of people disagree with him, but I thought the podcast was great.
Eric:
The podcast I listen to is called Plain English, with Derek Thompson. He’s got an episode about AI, which was really good.
John:
That’s the one where the engineer analogy came from.
Eric:
Yeah.
John:
Cool. I’ll have to check that out too. Well, that’s all we have. Thanks, Eric. Anything else you want to add?
Eric:
Nope, that’s it. Looking forward to seeing whether AI solves the world’s problems or completely blows it up.
John:
We’ll see. It’s going to keep us busy, that’s all I know. I don’t think it’s going to replace us anytime soon. Or maybe we won’t let it. Maybe all the lawyers are just going to keep suing AI companies to make sure it never replaces us. Who knows?
Eric:
That sounds right.
John:
It does sound right. Well, thanks everyone. Again, this is the May It Please The Internet podcast. I am John di Giacomo, joined by Eric Misterovich as always, and we appreciate you listening.