AI Copyright Plaintiff Loses Again on DMCA Claims

by John DiGiacomo

Partner

Copyright

As we have discussed several times on this blog, the use of generative artificial intelligence (“AI”) programs like OpenAI’s chatbot ChatGPT raises major issues under copyright law. At least two dozen cases have been filed by various plaintiffs and groups around the United States asserting copyright infringement claims against the companies that train and operate generative AI programs.

At least a couple of the copyright plaintiffs have attempted to bring claims under the Digital Millennium Copyright Act (“DMCA”). In particular, the plaintiffs have asserted that AI content generation programs violated the DMCA by removing what is called “copyright management information” (“CMI”). CMI refers to various details about a copyrighted work that are attached to a digital copy of the work. These details include the title, the author’s name, and the owner of the copyright. The DMCA prohibits intentionally removing or altering CMI without permission.

In general, the copyright plaintiffs have claimed that CMI was removed from the images and text that the AI owners used to train their AI programs and models. The plaintiffs are keen to succeed on DMCA claims because the potential money damages are significantly higher than those available for standard copyright infringement claims.

In two major rulings, the DMCA CMI claims have been dismissed. In the first case, Andersen v. Stability AI, Ltd. (N.D. Cal. 2024), the court dismissed the DMCA CMI claims on a Motion To Dismiss because the plaintiffs had alleged no facts to support the claim (or, if such facts did exist, the plaintiffs did not put them in their Complaint or otherwise provide them to the court). In short, the DMCA CMI claims were dismissed with prejudice because the plaintiffs did not plead facts showing that the AI training programs actually removed the CMI.

In the second case, Raw Story Media, Inc. v. OpenAI, Inc., No. 24 Civ. 01514 (S.D.N.Y. 2024), the plaintiffs also claimed that the CMI was removed and had some factual support for that claim. However, after examining the legal issues, the federal court judge held that the plaintiffs lacked standing to bring their claim because the simple removal of the CMI caused them no damage or injury. In effect, the judge held that, for an injury to occur, removal of the CMI had to be coupled with some sort of dissemination by the AI program or model. To date, none of the plaintiffs in any of the cases have asserted that the generative AI programs are disseminating or have disseminated copies of the copyrighted works. Since no dissemination of the copyrighted works without their associated CMI had been alleged, the plaintiffs had no DMCA claims. Those parts of the case were dismissed (with leave to amend).

The Broader DMCA Framework and What These Rulings Mean for AI Litigation Strategy

The dismissal of CMI claims in both Andersen v. Stability AI and Raw Story Media v. OpenAI illustrates the difficult standing and evidentiary hurdles plaintiffs face in AI copyright litigation. These rulings do not settle the broader questions, but they shape litigation strategy going forward in important ways.

What the DMCA CMI Provisions Actually Prohibit

Section 1202 of the DMCA (17 U.S.C. § 1202) prohibits two related categories of conduct: (1) knowingly providing false copyright management information, and (2) intentionally removing or altering CMI. CMI is defined broadly to include the title of a work, the name of the author, the name of the copyright owner, terms and conditions of use, and identifying information in a digital file — essentially the metadata that allows users and automated systems to identify who owns a work and under what terms it can be used.

The plaintiffs’ theory in the AI cases was straightforward: when AI companies scraped billions of text files, images, and other digital works to build training datasets, the process stripped the accompanying metadata — the author names, copyright notices, and terms of use that were embedded in the files. The resulting training datasets and trained models therefore contained the expressive content of the copyrighted works without the identifying information that would alert users that the content was copyrighted.

Why the Courts Dismissed These Claims

The dismissals in Andersen and Raw Story turned on two different legal defects:

  • Andersen: No evidence of removal. The plaintiffs could not allege — with sufficient factual specificity to survive a motion to dismiss — that Stability AI’s training process actually removed CMI from the ingested files. Without facts supporting that allegation, the claim was speculative. The court dismissed with prejudice, meaning the plaintiffs could not re-plead the DMCA CMI claims. This outcome highlights the challenge plaintiffs face in pleading AI copyright cases: the inner workings of AI training pipelines are proprietary, and plaintiffs typically lack access to technical details about how training data was processed before the lawsuit begins.
  • Raw Story: No injury from removal alone. The Southern District of New York took a different approach. Even accepting the plaintiffs’ factual allegations about CMI removal, the court held that the plaintiffs had not suffered a cognizable “concrete” injury from the removal alone, as required for Article III standing under TransUnion LLC v. Ramirez, 594 U.S. 413 (2021). The injury from CMI removal only materializes, the court reasoned, if the works are subsequently distributed without their CMI — enabling downstream users to engage with the content without knowing it is protected. Since the plaintiffs did not allege that OpenAI’s systems were actively distributing copies of their articles without CMI, there was no concrete injury.

Implications for Future DMCA CMI Claims

The leave to amend granted in Raw Story suggests a pathway for future DMCA CMI plaintiffs: allege that AI outputs reproduce content from training data without the associated CMI, or allege that the AI system generates content that is disseminated in a form that obscures copyright ownership. Whether AI outputs (which are generated text, images, or code rather than copies of the original training data) can constitute “distribution” of the training works without CMI is a question that no court has yet resolved.

Taken together, the Andersen and Raw Story dismissals suggest that standalone DMCA CMI claims, without accompanying allegations of actual output dissemination, will face significant judicial skepticism. Plaintiffs’ counsel in these cases have increasingly pivoted toward direct copyright infringement claims (which do not face the same standing obstacle) and toward cases where the factual record of training data composition is better developed.

Discovery and the Technical Record

For copyright owners considering AI litigation, the most important strategic question is whether they can develop the technical evidence necessary to support both CMI removal claims and direct infringement claims before or during litigation. Some plaintiffs have used technical discovery — including subpoenas to third-party dataset curators — to build the technical record. The growing availability of academic research on AI training dataset composition (including disclosures like those from Anna’s Library noted in the related post) provides a foundation for factual pleading that was not available to early plaintiffs.

If your copyrighted works may have been used without authorization to train commercial AI systems, or if you have questions about your company’s AI copyright exposure, contact the Copyright and Internet Law attorneys at Revision Legal or visit our copyright practice page.

Contact the Copyright and Internet Law Attorneys at Revision Legal

For more information, contact the experienced Copyright and Internet Lawyers at Revision Legal. You can contact us through the form on this page or call (855) 473-8474.
