Anthropic contends that its use and storage of the books fall under fair use as defined by Section 107 of the Copyright Act. The fair use doctrine allows the use of copyrighted works without permission if certain criteria are met. The Copyright Act lists four factors to be considered in determining whether a given use is fair, with no single factor being decisive on its own:
- The purpose and character of the use;
- The nature of the copyrighted work;
- The amount and substantiality of the portion used in relation to the entire work;
- The effect of the use on the potential market for or value of the copyrighted work.
In this case, the Court identified two distinct uses: (i) creating a comprehensive central library of potentially useful content and (ii) training large language models (LLMs) using that library's content.
Training LLMs through copies of books
The Court ruled that using copies to train LLMs qualifies as fair use based on an assessment of those four factors. The first, third, and fourth factors weighed in favor of fair use. The second factor, often seen as the least important, weighed against it because the authors' works are expressive and therefore highly protected.
In reaching his final decision, Judge Alsup highlighted the first fair use factor: the purpose and character of the use. This factor favors fair use when the use is transformative, meaning it serves a new and different purpose from the original work. Judge Alsup described Anthropic's use of the authors' books as "exceedingly transformative," making it permissible under U.S. copyright law. He further noted that Claude's outputs were not alleged to infringe the authors' rights: the authors did not claim that any of their works had been reproduced to the public through Claude.
Creating a comprehensive central library
Regarding the other use, creating a comprehensive central library, the Court distinguished between purchased and pirated copies. The purchased copies, which Anthropic converted from print to digital, were found to be justified as a different kind of fair use (a format change), since the original print copies were destroyed in the process and the digital versions were not shared.
In contrast, the pirated copies used to build the library were not justified as fair use; all four factors weighed against it. Anthropic's employees indicated they would retain these pirated copies indefinitely for general purposes, even after deciding not to use them to train LLMs. The Court emphasized that each use requires its own justification, and Anthropic offered no valid reason for keeping the pirated copies beyond convenience and cost savings.