As reported here, a U.S. federal judge in San Francisco has ruled that it is “fair use” under U.S. copyright law to use books and other literature to train an AI model, but that storage of the literature in a database is not. The case involves an AI development company called Anthropic. The lawsuit was brought by a group of authors who asserted copyright infringement claims over the use of their writings without permission or payment. See Bartz v. Anthropic PBC, No. C 24-05417 WHA (N.D. Cal. 2025).
Generally, U.S. copyright law protects original works of authorship, including the author’s exclusive right to reproduce and distribute the work. A person or company that violates that right can be sued for copyright infringement, and the civil damages can be extensive. In cases of willful infringement, statutory damages of up to $150,000 per infringed work can be awarded. The authors allege that Anthropic used millions of books, newspapers, magazines, and other types of written works to train its AI model.
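As a rough illustration of how per-work damages scale, the short Python sketch below multiplies a number of infringed works by the statutory damages range in 17 U.S.C. § 504(c): $750 to $30,000 per work, rising to as much as $150,000 per work for willful infringement. The function name and the count of 1,000 works are hypothetical and are not figures from the case. Actual awards are set by the court or jury within that range.

```python
# Statutory damages range per infringed work under 17 U.S.C. § 504(c).
MIN_PER_WORK = 750               # ordinary statutory minimum
MAX_PER_WORK = 30_000            # ordinary statutory maximum
MAX_WILLFUL_PER_WORK = 150_000   # maximum when infringement is willful


def statutory_exposure(num_works: int, willful: bool = False) -> tuple[int, int]:
    """Return the (low, high) statutory damages range for num_works works.

    Hypothetical helper for illustration only; it ignores reductions for
    innocent infringement and any election of actual damages instead.
    """
    if num_works < 0:
        raise ValueError("num_works must be non-negative")
    ceiling = MAX_WILLFUL_PER_WORK if willful else MAX_PER_WORK
    return num_works * MIN_PER_WORK, num_works * ceiling


# Hypothetical example: 1,000 registered works, willful infringement.
low, high = statutory_exposure(1_000, willful=True)
print(f"${low:,} to ${high:,}")  # $750,000 to $150,000,000
```

Statutory damages are counted per infringed work rather than per copy, which is why the number of distinct works at issue drives the total exposure.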
However, there are circumstances in which a person or company can use copyrighted works without permission and without infringing them. These circumstances fall under the legal doctrine of “fair use.” In the Anthropic case, the judge ruled that using the books and other materials to train the model was “fair use,” in the same manner as if an individual obtained a book for personal use and instruction. The idea is that a person might study a copyrighted work, not to copy it, but to learn to create other original works of authorship. In this sense, the use is “transformative” because its aim is to create something new. This has been a key legal argument made by Anthropic and other companies being sued for similar conduct. For the first time, a federal judge agreed with that reasoning and ruled in favor of “fair use.”
The decision is a tremendous “win” for companies that are creating and developing AI models.
However, the “win” was only partial. The judge also ruled that it was NOT “fair use” for Anthropic to store the books and other written materials in a digital database. The parties do not dispute how that database was built. To quote the judge:
“The firm also purchased copyrighted books (some overlapping with those acquired from the pirate sites), tore off the bindings, scanned every page, and stored them in digitized, searchable files. All the foregoing was done to amass a central library of “all the books in the world” to retain ‘forever.’”
The judge ordered Anthropic to face trial later in 2025 on the claims that its digital storage of over 7 million books was copyright infringement. As noted, each infringed work can carry a significant award of money damages.
It should be noted that, a day after the Anthropic decision was released, a different federal judge in San Francisco, U.S. District Judge Vince Chhabria, issued a decision whose reasoning was at odds with the Anthropic decision. Judge Chhabria opined that using copyrighted materials to train AI might be infringement but found that, in the case before him, the plaintiffs had not provided enough evidence for their claims to move forward.
Contact the Internet Law and Copyright Law Attorneys at Revision Legal
For more information, contact the experienced Internet Law and Copyright Law Lawyers at Revision Legal. You can contact us through the form on this page or call (855) 473-8474.
The Four-Factor Fair Use Analysis in Bartz v. Anthropic
The district court in Bartz v. Anthropic applied the four-factor fair use test under 17 U.S.C. § 107 to the AI training question. On the first factor — the purpose and character of the use — the court found the use “transformative” in the sense that using books to train an AI model is analogous to a student reading books to develop their own writing ability, not to reproduce the books. The court analogized the training process to the “intermediate copying” doctrine recognized in software reverse-engineering cases, where copying is permitted if necessary to access unprotected elements.
On the fourth factor (market harm), the court found that AI training does not create a direct market substitute for the original books. The court distinguished between the training process, which it found to be fair use, and the output of the AI, which could still constitute infringement if it reproduced substantial portions of the original books. The court’s analysis on this point is significant because it suggests that the AI industry’s fair use argument may be strongest when the AI’s output is genuinely novel and weakest when the AI reproduces verbatim passages from its training data.
The Storage Ruling: A Critical Limitation on the Fair Use Defense
While the training-as-fair-use ruling was a major win for AI developers, the court drew a sharp line at storage. Anthropic had: (1) obtained books from pirate sites; (2) purchased books from authorized sources and scanned them; and (3) stored all of these works in a persistent digital database. The court held that maintaining a database of copyrighted works — especially when some were obtained from pirate sources — is not protected by fair use.
The storage ruling has significant practical implications for AI companies. A training process that does not retain copies of the training data after training is complete may be more legally defensible than one that maintains a persistent, searchable database of the original works. AI companies that build or operate such databases may be exposed to significant copyright liability unless they obtain licenses from copyright holders.
The Broader AI Copyright Litigation Landscape
Bartz v. Anthropic is the first major judicial ruling on fair use in the training of generative AI models, but it is far from the last. Parallel litigation involving other AI developers is proceeding in courts across the country:
- Authors Guild v. OpenAI — a class action alleging that GPT models were trained on millions of books without permission or compensation
- Getty Images v. Stability AI — alleging mass copying of photographic images to train Stable Diffusion
- The New York Times v. Microsoft/OpenAI — alleging that Copilot and ChatGPT reproduce verbatim passages from NYT articles, threatening the market for licensed journalism
- Music industry litigation — multiple suits alleging that AI music generation tools were trained on copyrighted recordings without licenses
What Copyright Owners Should Know
The Bartz decision is encouraging for copyright holders in one key respect: the court made clear that fair use for AI training is not a blanket defense. Storage of copyrighted works without a license remains infringement, and AI output that reproduces substantial portions of original works may independently constitute infringement regardless of whether training was fair use.
Copyright owners who believe their works have been used to train AI models without authorization should:
- Register their works with the U.S. Copyright Office to preserve eligibility for statutory damages
- Monitor AI output for verbatim or near-verbatim reproduction of their works
- Consider whether their existing licensing agreements need to be updated to explicitly address AI training uses
- Consult with a copyright attorney about potential claims under the DMCA’s anti-circumvention provisions if AI companies bypassed technical access controls to obtain their works