The Legal Boundaries of Generative AI: Infringement Disputes in the U.S. Patent, Trademark, and Copyright Regimes and the Training of Large Models
- Bill Deng
- Jul 30
- 5 min read
With the widespread application of generative artificial intelligence in fields such as creation, design, and content generation, the works it produces have posed unprecedented challenges to the existing U.S. patent, trademark, and copyright systems. At the same time, AI companies' massive collection and use of materials in the training of large language models have triggered intense legal disputes concerning copyright infringement and fair use. This article reviews relevant policy trends and key judicial cases, analyzes ownership issues for AI-generated works under the current legal framework, and examines the compliance risks and case law implications surrounding the use of AI training data.
I. The Impact of AI-Generated Works on U.S. Patent, Trademark, and Copyright Regimes
The U.S. Copyright Office has made it clear that if the traditional elements of authorship in a work are generated by a machine rather than a human, the work will not be considered to have human authorship and is thus ineligible for copyright protection. In other words, even if a user prompts a generative AI with simple instructions to produce a complex work, this alone does not meet the threshold for copyright eligibility. Only when a human contributes substantial creative input—such as through choices in selection, editing, or arrangement—can a work potentially qualify as AI-assisted creation and be eligible for registration. As of February 2024, the Copyright Office has registered over 100 AI-assisted works and has emphasized that applicants are obligated to disclose AI usage; failure to do so may result in cancellation of registration.
In the patent system, although AI can generate vast quantities of publications that lack practical applicability or genuine technical teaching, those publications may still be cited as prior art, imposing a significant burden on patent applicants. As AI generation technology evolves, its output is increasingly likely to be treated as enabling technical disclosure, presenting a new challenge for the entire patent examination system.
Within the patent domain, the United States Patent and Trademark Office (USPTO) has reiterated that under current law, patent inventors must be natural persons. In Thaler v. Vidal, the U.S. Court of Appeals for the Federal Circuit ruled that AI systems cannot be listed as inventors, a position that remains controlling. The USPTO advises applicants to clearly document the human contribution to AI-assisted inventions in preparation for potentially stricter review. Meanwhile, as AI is widely employed in patent searches and prior art analysis, examination practice is also evolving rapidly.
In the area of trademark protection, if AI-generated images incorporate unauthorized well-known brand elements (e.g., Disney princesses or Minions), they may constitute trademark infringement or dilution under Section 43 of the Lanham Act (15 U.S.C. § 1125). Notably, Disney and Universal have sued the image generator company Midjourney, calling its system a "bottomless pit of plagiarism." Although no verdict has been issued, the case is expected to become a landmark in distinguishing fair use, transformative expression, and trademark infringement. Additionally, when AI outputs closely resemble actual brand identifiers (e.g., logos or slogans), courts may apply the consumer confusion factors established in Polaroid Corp. v. Polarad Elecs. Corp.
Separately, AI voice generation raises issues at the intersection of intellectual property and personality rights. In Lehrman v. Lovo Inc., two voice actors sued the AI voice company Lovo for using their voices in training without permission. Although the court dismissed most copyright and trademark claims, it allowed the right of publicity claims to proceed and granted the plaintiffs an opportunity to amend their copyright infringement allegations. This highlights the legal gaps in current trademark and copyright laws in the context of AI.
II. Infringement Disputes in the Training of Large Language Models
As large language model (LLM) training increasingly depends on massive text corpora, copyright disputes between AI companies and content owners have grown more frequent, focusing in particular on whether the training process constitutes "copying" and whether it qualifies as "fair use." In Bartz v. Anthropic, a California federal judge held that the company's use of millions of books to train the Claude model was a classic example of transformative use and therefore fair use. However, the judge also held that Anthropic had no right to use pirated copies to build its training library, so the piracy claims remained set for trial in December 2025.
Meta faces similar allegations in Kadrey v. Meta, in which a group of authors sued over its use of the LibGen database, a massive repository of pirated books, to train the LLaMA models. While most claims were dismissed, the court pointed out that the plaintiffs failed to demonstrate how the AI outputs harmed the market for their works and offered no substantial evidence. The judge remarked that although market substitution could be "the most promising argument," the plaintiffs barely addressed it in this case.
In Thomson Reuters v. Ross Intelligence, the court analyzed the four fair use factors. Even though Ross only used headnotes from legal cases as indexing data and didn’t output them directly, the court found its product lacked transformative character and was commercial in nature, thus not protected by fair use. The court emphasized the fourth factor—the effect on the market for the original—and stated that allowing Ross’s use would undermine Thomson Reuters' business model of licensing data to AI developers, setting a significant precedent against fair use defenses in AI training.
In Authors Guild et al. v. OpenAI, the plaintiffs argued that OpenAI copied and used full books for training without authorization, constituting both direct and indirect infringement, and alleged that OpenAI removed copyright management information, violating Section 1202 of the DMCA. In March 2024, the court dismissed the indirect infringement and secondary liability claims, but allowed the core direct infringement claim to proceed.
The lawsuit New York Times Company v. Microsoft is seen as the first major case filed by a mainstream media organization against LLM training practices. It accuses Microsoft and partner OpenAI of using New York Times content to train their models and producing outputs that closely resemble original articles when prompted, thereby damaging the company’s subscription and licensing models. The case remains ongoing, but its broad implications may directly shape the copyright boundaries of AI content generation in the future.
In terms of legal principles, Section 106 of the Copyright Act grants copyright owners the exclusive right of reproduction, and courts have held (e.g., MAI Systems Corp. v. Peak Computer, Inc.) that even temporary copies, such as content loaded into RAM, can constitute acts of reproduction. The fair use doctrine under Section 107 weighs four factors:
- the purpose and character of the use, especially whether it is transformative;
- the nature of the original work;
- the amount and substantiality of the portion used; and
- the effect of the use on the market for or value of the original work.
Among these, the fourth factor has recently been emphasized by courts as the most critical.
As for case law:
- In Google LLC v. Oracle America, Inc., the Supreme Court found that Google's copying of the Java API declarations was transformative and served the public interest, constituting fair use.
- In Authors Guild v. Google, Inc., the court ruled that Google's scanning of books for search indexing and snippet display did not substitute for the originals and was protected by fair use.
Supporters of AI development frequently cite these precedents to legitimize LLM training. However, rights holders counter that LLM outputs can functionally replace and replicate human works, thereby exceeding fair use boundaries.
Conclusion
Under current U.S. intellectual property law, AI-generated content remains in a legally ambiguous zone. Human involvement is widely regarded as a prerequisite for protection under copyright, patent, and trademark laws, making purely AI-generated content ineligible for legal recognition. At the same time, AI companies' use of unauthorized materials in training has triggered a growing wave of litigation. In the future, the rulings in cases such as those involving The New York Times, Anthropic, and OpenAI are expected to play a crucial role in balancing copyright infringement, fair use, and public interest—and will directly impact the compliance roadmap and innovation space for the generative AI industry.
(Disclaimer: The information provided is for reference only and should not be regarded as a legal basis or advice on any topic. All rights reserved. Reproduction requires permission from Allbelief Law Firm.)