Nine Newspapers Sue OpenAI and Microsoft, Escalating a High-Stakes Fight Over AI Training and Copyright

Microsoft is a co-defendant in multiple lawsuits alleging that AI systems were trained on copyrighted news content without permission. The company’s deep financial partnership with OpenAI places it at the heart of the growing legal battle over how AI models are built and what sources they can lawfully use. File photo: bluestork, licensed.

NEW YORK, NY – Nine regional news organizations have filed a sweeping copyright lawsuit against OpenAI and Microsoft, marking the latest and one of the most aggressive legal challenges to the fast-growing artificial intelligence industry. The complaint, filed in New York federal court, alleges that hundreds of thousands of copyrighted news articles were copied, scraped, and ingested without permission to build the datasets used to train the large language models powering products such as ChatGPT and Microsoft Copilot.

According to reporting from several outlets, the plaintiffs are seeking more than $10 billion in damages, arguing the alleged use amounts to wholesale intellectual property theft on a scale never before seen in American publishing.

The plaintiffs include The Virginian-Pilot, the Los Angeles Daily News, and the Boston Herald, along with six other regional outlets. Their lawsuit claims the companies replicated their journalism, including paywalled stories, and reused it to train AI systems that now generate summaries, explanations, and conversational responses that compete with the original reporting. According to the filing, the companies “copied and reproduced” the material without consent, bypassing subscription walls and contractual protections in a way the plaintiffs argue undermines the economic foundation of local news.

This latest case arrives amid an expanding wave of copyright actions targeting OpenAI and Microsoft from across the media landscape. In the past two years, novelists, online news sites, and major national publishers have launched similar suits, each alleging that their work was taken without permission to train artificial intelligence models. The most prominent of these is the lawsuit filed by The New York Times in 2023, which accuses the companies of ingesting millions of Times articles, allegedly enabling AI systems to answer questions using the substance of Times reporting while circumventing the newspaper’s subscription business. That case helped set the stage for a broader examination of AI training practices and the legality of using copyrighted materials at scale.

Other media groups have followed. A coalition of eight newspapers filed suit in early 2024, and several digital outlets, including The Intercept, Raw Story, and AlterNet, brought their own claims later that year. Major digital publisher Ziff Davis, owner of IGN, PCMag, and CNET, joined the fray in spring 2025. And in April of this year, the U.S. Judicial Panel on Multidistrict Litigation ordered many of these cases consolidated in Manhattan federal court, determining they share core questions of law and fact about whether large-scale data ingestion constitutes infringement.

Collectively, the lawsuits challenge one of the foundational practices of modern AI development: the widespread use of online text, including news articles, books, and blogs, to train models capable of generating human-like responses. OpenAI and Microsoft have argued that using publicly available material for training is lawful under the doctrine of fair use, which allows limited reproduction of copyrighted content for transformative purposes. They also contend that their systems do not reproduce copyrighted works verbatim unless prompted in highly specific ways, and that the models learn patterns rather than storing entire articles. However, academic studies and court filings in several cases suggest that model “memorization” does occur, raising questions about whether training data can be extracted or reconstructed in ways that infringe copyrights.

The rising legal pressure comes as judges have begun rejecting efforts by the companies to dismiss key copyright claims. In late 2025, a federal judge refused to dismiss major portions of the authors’ lawsuit against OpenAI, allowing core infringement questions to move forward. The Times case and several media lawsuits have similarly survived early challenges, keeping the legal issues alive as discovery expands.

For local and regional news outlets, the latest lawsuit reflects growing concern about the economic impact AI-generated content may have on an already fragile industry. Many regional papers have endured years of shrinking advertising revenue, newsroom cuts, and ownership consolidation. The plaintiffs argue that AI companies are now exploiting the very journalism that local outlets struggle to fund, potentially accelerating the decline of independent reporting if courts determine the practice is unlawful.

The stakes extend beyond journalism. A ruling that mass ingestion of copyrighted content violates the law could force AI companies to remake their training pipelines, secure large-scale licenses, or restrict the data they use. Conversely, a broad fair-use ruling could set a permissive precedent, allowing AI developers to continue using vast amounts of copyrighted material without compensation – an outcome that would reshape the economics of writing, publishing, research, and online content creation.

As the lawsuits advance, courts are being asked to navigate issues that have no clear precedent: whether copying for machine training constitutes infringement; whether generative models are transformative enough to qualify for fair use; and how much risk of market harm exists when AI systems can deliver factual summaries or paraphrased interpretations that compete with the original work. It may take years, multiple appeals, or even intervention from Congress before the boundaries of AI training are fully defined.

For OpenAI and Microsoft, the cumulative legal exposure is enormous. Beyond the billions sought in damages, the companies face potential court-ordered restrictions, mandatory licensing requirements, and intense scrutiny over how their models were built. For the media industry, the legal outcomes could determine whether journalism remains a protected commercial product or becomes raw material for AI systems free of compensation.

Timeline of Major Copyright Lawsuits Against OpenAI & Microsoft (2023–2025)

Sept. 19, 2023: A group of authors files a class-action lawsuit alleging their books were copied without permission to train OpenAI’s models.

Dec. 27, 2023: The New York Times files a landmark lawsuit accusing OpenAI and Microsoft of ingesting millions of articles without authorization.

Feb.–Apr. 2024: Multiple digital news outlets – including The Intercept, Raw Story, and AlterNet – file suits alleging unauthorized use of their journalism in AI training.

Apr. 30, 2024: Eight major U.S. newspapers sue OpenAI and Microsoft, claiming large-scale infringement of protected news articles.

Apr. 3, 2025: The U.S. Judicial Panel on Multidistrict Litigation consolidates numerous copyright lawsuits into one federal docket in Manhattan.

April 2025: Digital publisher Ziff Davis (owner of IGN, PCMag, and CNET) files its own copyright infringement lawsuit.

Late 2025: A federal judge rejects key dismissal motions, allowing major copyright claims – including the authors’ case – to proceed.

Nov. 2025: Nine regional newspapers file a new lawsuit seeking more than $10 billion in damages for alleged unauthorized ingestion of their articles.

Q&A: What a Major Copyright Ruling Could Mean for OpenAI, Microsoft, and the Future of AI

Why are OpenAI and Microsoft facing so many copyright lawsuits at the same time?

The AI industry grew faster than existing copyright law, and many media companies argue that their articles, books, and archives were copied without permission to train large language models. Because OpenAI and Microsoft built some of the world’s most widely used AI systems, they have become central defendants in multiple cases that raise nearly identical legal questions. Courts have since consolidated many of those lawsuits into a single federal docket in Manhattan.

What exactly are the news outlets claiming?

The plaintiffs argue that OpenAI and Microsoft ingested their work – including paywalled articles – into massive training datasets without consent. They claim this copying allowed AI systems like ChatGPT and Copilot to reproduce the substance of their reporting, undermining subscription-based business models and violating copyright law. Some lawsuits also allege that the companies copied the material multiple times as the models were retrained.

How much financial exposure could the companies face if they lose?

The damages sought across all lawsuits are not fully calculated, but they extend into the tens of billions of dollars, with one recent lawsuit alone seeking over $10 billion. The final amount would depend on how many infringements courts recognize and whether damages are assessed per article, per dataset, or per act of copying. Even under conservative scenarios, the total exposure would be one of the largest intellectual-property liabilities ever imposed on a tech company.

Would OpenAI and Microsoft actually have to pay all of that at once?

Almost certainly not. In large-scale litigation involving multiple plaintiffs, courts typically encourage negotiated global settlements rather than immediate lump-sum payments. These settlements often include multi-year or even multi-decade payout structures, similar to how tobacco companies, pharmaceutical firms, and energy corporations resolved major liabilities. Structured agreements help protect ongoing operations and prevent sudden financial shocks.

Could Microsoft end up covering most of the bill?

Very likely. Microsoft is a co-defendant in many cases, has invested heavily in OpenAI, and directly integrates OpenAI technology into its products. As one of the wealthiest companies in the world with tens of billions in cash reserves, it is positioned to absorb even extremely large payments. If the combined damages exceed OpenAI’s financial capacity, Microsoft would almost certainly step in as the primary payer or backstop.

Would OpenAI or Microsoft be at risk of collapsing?

A collapse is highly unlikely. OpenAI, on its own, could face severe pressure depending on the outcome, but Microsoft has enough financial strength to stabilize the situation. If OpenAI ever faced a liability it could not meet independently, Microsoft or another major technology company could acquire more equity, provide financing, or fully absorb OpenAI’s operations. The intellectual property and talent are too valuable to abandon.

How would AI training practices have to change if the plaintiffs win?

A ruling against the companies could force a shift toward high-cost licensing, publisher partnerships, or compulsory rights agreements. Instead of treating the open web as freely accessible training data, developers might be required to pay publishers for datasets or join industry-wide licensing programs. Some experts compare this potential shift to what happened in the music streaming industry, where legal rulings transformed songs from “freely copied” files into paid, regulated assets.

Could AI companies still train on copyrighted content at all?

Yes, but only if they negotiate rights. Courts may decide that existing data-collection methods violated copyright law but leave the door open for licensed training. Large publishers are already signaling interest in collective licensing systems similar to ASCAP or BMI, which issue music licenses. AI companies might ultimately treat journalism the same way the music industry treats songs – an asset that requires permission and payment.

What would the ruling mean for smaller publishers or independent websites?

If courts impose licensing requirements, smaller publishers may benefit from new revenue streams. Even modest fees for dataset inclusion could become meaningful funding sources for local or niche outlets. However, enforcement challenges remain: smaller sites may need industry organizations to negotiate on their behalf because individual licensing would be too complex to manage.

Is it possible that Congress steps in to clarify the law?

Yes. Judges have noted that copyright law did not anticipate machine-learning systems capable of absorbing billions of documents. If rulings across the country diverge – or if liabilities become too large for courts to handle piecemeal – Congress could create new rules governing AI training, similar to how lawmakers codified rules for cable retransmission, music royalties, and digital copying in earlier eras of technology.

Would end users of AI tools notice any immediate changes?

Probably not right away. Even if OpenAI and Microsoft lose, the fallout would unfold gradually as settlement negotiations take place. Over time, users might see changes in how AI systems cite sources, how they summarize news, or what content they refuse to reproduce. They may also see subscription-based enhancements tied to licensed news databases.

What is the long-term impact if the plaintiffs succeed?

A broad ruling against OpenAI and Microsoft would reshape the economics of AI. Training data would shift from a free resource to a paid commodity, and the legal framework for AI development would more closely resemble the regulated models used in music, television, and digital publishing. For the news industry, the outcome could revive or stabilize revenue models. For AI companies, it would make development more expensive – but it would not halt progress altogether.

