AI-Copyright Fight: Is It a National Security Threat?

by John DiGiacomo

Partner

Copyright

The holders of copyrights in newspapers, magazines, books, and other publications are involved in numerous legal battles with the owners of AI models over alleged copyright infringement. The plaintiff copyright owners claim that large language models have been trained on huge quantities of copyrighted materials without permission and, most importantly, without payment. The plaintiffs claim that such training is actionable copyright infringement, and they seek to recover the substantial money damages available under the Copyright Act.

Recently, some interesting factual nuances have been disclosed about the source of some of the copyrighted materials and about whether the United States’ restrictive copyright laws create a national security threat. It is probably not well-known that there are a couple of enormous “illegal” or “shadow” online libraries containing tens of millions of books and other materials (including academic papers) which are all covered by copyright protections.

As discussed here in some detail, the operators of these libraries are fully aware of the copyright protections and admit that they are running an “illegal” library. However, they feel they have a moral obligation to ensure that “humanity’s heritage” is not lost. For years, one of the largest such shadow libraries was Z-Library, which dates to about 2009. In 2022, U.S. authorities seized many of Z-Library’s domain names, but not before at least one group had copied the whole library onto new servers under a new name. The current version is called Anna’s Archive. Anna’s Archive claims to have grown the collection to the point that it now contains 140 million items.

What is interesting for our purposes is that many AI companies have, over the last few years, contacted Anna’s Archive for assistance with using its collection as a source of training materials for their AI models. In the article linked above, Anna’s Archive stated:

“Virtually all major companies building LLMs contacted us to train on our data. Most (but not all!) US-based companies reconsidered once they realized the illegal nature of our work. By contrast, Chinese firms have enthusiastically embraced our collection, apparently untroubled by its legality. This is notable given China’s role as a signatory to nearly all major international copyright treaties. We have given high-speed access to about 30 companies.” (emphasis added)

For additional details, see article here.

The foregoing revelations are interesting in and of themselves, adding a factual nuance to the AI-copyright cases.

However, just as interesting is the suggestion that a national security problem exists. As can be seen in the quote above, according to the shadow library’s operators, Chinese AI firms are not particularly concerned about the legality of using pirated books and illegal libraries. For the creation and training of AI models, this puts Chinese AI firms at a significant advantage over U.S. firms that are more leery of copyright complications. For example, in this report (page 7), it is noted that the same shadow library was used, in part, in the pretraining phase of an earlier version of China’s DeepSeek AI model.

So, the concern is that, with fewer constraints, Chinese and other non-Western AI firms will advance their AI models more quickly, leaving U.S. and other Western AI models behind. This raises two questions: whether that constitutes a national security threat, and what can be done about it. Many suggest that the obvious answer to the first question is “yes.” As for the second, one suggestion is to amend copyright laws to create exceptions for AI training.

The Legal Landscape of AI Copyright Litigation

The shadow library revelations are a factual backdrop to what is already a massive and consequential wave of copyright litigation. Understanding the legal theories being advanced — and the defenses being asserted — matters for any business involved in AI development, content creation, or media.

The Core Legal Claims Against AI Companies

The plaintiffs in AI copyright cases — including the New York Times Co. v. Microsoft Corp., No. 23-cv-11195 (S.D.N.Y.), Authors Guild v. OpenAI, and dozens of related cases — generally assert three types of copyright claims under the Copyright Act (17 U.S.C. § 101 et seq.):

  • Direct infringement arising from the copying of copyrighted works during the training process. Each act of copying — even temporarily into a computer’s RAM — constitutes a reproduction under 17 U.S.C. § 106(1), which is an exclusive right of the copyright owner.
  • DMCA copyright management information (CMI) claims under 17 U.S.C. § 1202, alleging that AI training processes strip CMI (metadata including author name, title, and copyright notice) from ingested works.
  • Output infringement — the emerging claim that AI-generated outputs are themselves infringing because they reproduce substantial portions of training data or create works that are “substantially similar” to copyrighted works in the training set.

The Fair Use Defense and Its Limitations

The primary defense asserted by AI companies is “fair use” under 17 U.S.C. § 107. Courts weigh four statutory factors: (1) the purpose and character of the use, including whether it is commercial and whether it is transformative; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used; and (4) the effect of the use on the potential market for the original work.

AI companies argue that training is transformative because it extracts statistical patterns rather than reproducing the expressive content of the original works, analogous to the “intermediate copying” found permissible by the Ninth Circuit in Sega Enterprises Ltd. v. Accolade, Inc., 977 F.2d 1510 (9th Cir. 1992). However, the District of Delaware’s ruling in Thomson Reuters v. Ross Intelligence (discussed separately) rejected the fair use defense where commercial competition with the plaintiff’s product was the direct purpose of the training. That ruling is a district court decision and does not bind other courts, but if widely followed, it would significantly weaken the fair use argument for commercially deployed AI systems.

The National Security Dimension and Policy Responses

The national security argument advanced by some commentators, that strict U.S. copyright enforcement hobbles domestic AI development relative to Chinese competitors, has generated substantive policy debate. Several legislative proposals would create a copyright exception for AI training on lawfully accessed works. The most prominent bill, however, the proposed “Generative AI Copyright Disclosure Act,” takes the opposite approach: it would require AI developers to disclose training datasets to the Copyright Office rather than exempt training from infringement liability.

The European Union offers a different framework. The AI Act requires providers of general-purpose AI models to publish a summary of the content used for training, and the text and data mining exception in the Directive on Copyright in the Digital Single Market allows rights holders to opt out. Whether U.S. law will converge toward a disclosure/opt-out model, a fair use safe harbor, or a compulsory licensing scheme remains an open question that the courts and Congress are actively shaping.

What This Means for Businesses

Businesses that use AI-generated content in their products, marketing, or operations face downstream legal uncertainty. If the training data underlying a commercial AI tool included substantial quantities of copyrighted works without license, the outputs generated by that tool may carry infringement exposure that flows to users as well as to the AI developer. Contractual indemnification provisions in AI platform terms of service — and their limits — should be carefully evaluated by any business deploying AI-generated content at scale.

Copyright owners whose works may have been used without license to train AI models should preserve evidence of their ownership and monitor the litigation landscape closely, as class settlements in some of these cases may provide compensation opportunities.

Contact the Copyright and AI attorneys at Revision Legal or visit our copyright practice page to discuss your rights and obligations in the AI copyright landscape.

Contact the Copyright and AI Attorneys at Revision Legal

For more information, contact the experienced Copyright and AI Lawyers at Revision Legal. You can contact us through the form on this page or call (855) 473-8474.
