On May 9, 2025, the U.S. Copyright Office released a report on how artificial intelligence (“AI”) modules and training procedures likely violate copyright law.
The Report is considered important because there are currently dozens of pending lawsuits around the country — and internationally — between copyright holders and the tech firms that are creating and training AI programs. The report will certainly be read and referenced by the litigants and judges involved in those cases.
It is acknowledged by everyone involved that the tech companies download millions of copyrighted works — like books, magazines, newspapers, music, etc. — for the purposes of “training” an AI module/program. From this training, an AI program is then supposedly able to mimic the style of an author, a songwriter, and others as consumer/user-requested “output.” Copyright holders are arguing that this process violates their property rights related to reproduction and derivative works. As the Copyright Office’s Report states: “The Copyright Act grants copyright owners a set of exclusive rights: to reproduce, distribute, publicly perform, and publicly display their works, as well as the right to prepare derivative works.” The legal response from the AI tech firms is to argue that no infringement is occurring or that, if there is infringement, the infringement is protected by the fair use doctrine. The Copyright Office’s Report weighs in against those arguments.
First, the Report makes the case that prima facie infringement is occurring. According to the Report, this rests on three basic points:
- The AI module must reproduce (that is, copy) all of the copyrighted materials as part of the training process; the Report emphasizes that the copying often occurs many times as the AI module “… transfer[s] them across storage mediums; convert[s] them to different formats; and creat[es] modified versions or includ[es] them in filtered subsets”; this is a prima facie violation of the bar against reproduction
- Some AI modules retain resident copies as part of their programming — again, a prima facie violation of the reproduction bar
- During the training process, AI modules modify the original works as part of the process of weighting and perfecting — according to the Report, these modifications create derivative works violating copyright holders’ rights with respect to derivative works
With respect to fair use, the Report rejects the arguments being made by the tech firms. As the Report explains, “fair use” is a doctrine, judge-made in origin, that allows certain uses of copyrighted materials that would otherwise be infringement. Typically, a list of four non-exclusive factors is examined:
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; this factor also asks whether the use is “transformative,” which weighs in favor of fair use
(2) the nature of the copyrighted work
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole and
(4) the effect of the use upon the potential market for or value of the copyrighted work
The Report rejects the tech firms’ arguments that the training procedures are “fair use.” One argument made is that everything done by the AI programs is “transformative.” This idea is soundly rejected by the Copyright Office. The Report also focused heavily on the fourth factor, suggesting that allowing AI programs to function would substantially reduce the potential market for and value of true and authentic works of authorship.
Contact the Internet Law and Copyright Attorneys at Revision Legal
For more information, contact the experienced Internet Law and Copyright Lawyers at Revision Legal. You can contact us through the form on this page or call (855) 473-8474.
The Four-Factor Fair Use Analysis Applied to AI Training
The fair use doctrine, codified at 17 U.S.C. § 107, requires courts to evaluate four non-exclusive factors: (1) the purpose and character of the use; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used; and (4) the effect of the use on the potential market for the original work. The Copyright Office’s May 2025 report analyzed all four factors with respect to AI training.
On the first factor — purpose and character — the Office acknowledged the “transformative use” argument made by AI developers: that ingesting millions of works to learn patterns and generate new outputs is analogous to a student reading books to learn to write. However, the Office expressed skepticism about whether AI training is truly transformative when the output directly competes with and substitutes for the original works in the market. The Office also noted that the commercial nature of AI development weighs against fair use.
On the fourth factor — market harm — the Office found this to be the most significant factor weighing against AI companies. The Office noted that AI-generated text, images, and music can directly substitute for licensed content, depressing the market for the original works and undermining licensing markets that copyright holders have a right to develop.
The Storage Problem: Why the Database Is the Key Issue
While courts have begun to grapple with whether training itself constitutes infringement, the storage question is more straightforward as a matter of copyright law. When an AI company downloads millions of books, scans them, and stores them in a digital database, each of those acts independently implicates the copyright holder’s exclusive right of reproduction under 17 U.S.C. § 106(1).
The district court in Bartz v. Anthropic ruled that building and keeping a central library of copyrighted works obtained from pirate sites is not protected by fair use, even though the same ruling treated the training itself, and the digitizing of lawfully purchased print books, as fair use. This is consistent with how courts have treated unauthorized digital copying in other contexts. In A&M Records, Inc. v. Napster, Inc., 239 F.3d 1004 (9th Cir. 2001), the court rejected the fair use defenses raised for users who downloaded and uploaded copyrighted music files without permission.
Implications for Pending AI Copyright Litigation
The Copyright Office’s report and the Bartz v. Anthropic decision are among the first significant administrative and judicial guidance in a wave of litigation that includes:
- Getty Images (US), Inc. v. Stability AI, Ltd. — alleging that Stability AI copied millions of Getty images to train its image-generation model
- Authors Guild v. OpenAI — a class action by authors alleging that OpenAI’s GPT models were trained on their copyrighted books without permission
- The New York Times Co. v. Microsoft Corp. — alleging that Microsoft’s Copilot and OpenAI’s ChatGPT reproduce verbatim passages from NYT articles
- Music industry litigation against AI music-generation services alleging infringement of recorded songs and underlying musical compositions
The outcomes of these cases will define the boundaries of the AI industry’s license to use existing creative works. The Copyright Office’s report signals that federal copyright law, as currently written, provides meaningful protections to copyright holders, and that AI companies may need legislative change, rather than a win in court, if they want a broad fair use shield for training.
What Copyright Holders Should Do Now
If you are a creator, publisher, or business that owns copyrighted works, the developing AI copyright landscape creates both risks and opportunities:
- Register your copyrights with the U.S. Copyright Office; for U.S. works, registration is a prerequisite to suing for infringement in federal court, and timely registration is required to claim statutory damages and attorney’s fees
- Review your existing licensing agreements to determine whether they permit licensees to use your works for AI training purposes
- Consider opt-out mechanisms where available — some AI companies have begun offering tools for copyright holders to request exclusion of their works from training datasets
- Monitor for AI-generated output that reproduces or is substantially similar to your protected works
- Consult with copyright counsel about whether to join existing class actions or pursue individual claims