Federal Court Orders OpenAI to Produce Millions of Anonymized ChatGPT Conversations in Copyright Litigation

By John Colascione Last updated Jan 10, 2026

A federal judge has ordered OpenAI to turn over millions of anonymized ChatGPT user conversations as part of ongoing copyright litigation brought by major news publishers. The ruling has intensified scrutiny over data use, privacy expectations, and how AI systems are trained and evaluated in court. File photo: Varavin88, licensed.

NEW YORK, NY – A U.S. federal magistrate judge in Manhattan has ruled that OpenAI must hand over approximately 20 million anonymized user chat logs to news publishers pursuing copyright claims, rejecting the company’s arguments that doing so would unduly compromise user privacy and burden its operations.

The discovery order arises from a copyright infringement suit originally filed by The New York Times and later consolidated with similar claims from other media organizations, which allege that OpenAI trained its large language models on copyrighted news content without permission.

The procedural conflict dates back to mid-2025, when plaintiffs first sought access to OpenAI’s internal chat logs as evidence of whether ChatGPT outputs reproduced their copyrighted content.

Magistrate Judge Ona T. Wang issued a production order in early November 2025, directing OpenAI to turn over a sample of 20 million consumer ChatGPT conversations, de-identified to remove direct personal identifiers.

wow. Maduro not only news from Moynihan Courthouse (SDNY) today. OpenAI ordered to finally turn over 20 million logs (note: de-identified sample) to plaintiffs under protective order. OpenAI has very publicly resisted two prior orders and now likely heads towards sanctions. 1/2
— Jason Kint (@jason_kint) January 6, 2026

OpenAI sought to block or stay that order, arguing in court filings that users’ private conversations would still face significant privacy risks and that most of the logs were irrelevant to the plaintiffs’ claims. The company also proposed alternative methods, such as running specific keyword searches to locate relevant material, rather than transferring raw data.

In early December 2025, the judge denied OpenAI’s request for reconsideration, reaffirming the earlier directive and concluding that the production was proportional to the needs of the case and that the data could be protected under existing safeguards and anonymization procedures.

The logs in question are drawn from user interactions spanning approximately December 2022 through November 2024 and represent a random or statistically valid sample of conversations retained by OpenAI’s systems.

While the records exclude enterprise accounts, business subscriptions, and API customer data, each selected log contains a full sequence of prompts and model responses, so the sample could amount to tens of millions of individual entries.

The judge’s ruling treats the protective order and anonymization as sufficient to mitigate privacy concerns, and emphasizes that the logs’ relevance to the case warrants production.

OpenAI has appealed the magistrate’s production order to a district court judge, arguing that it is overly broad and requires disclosing chats unrelated to the dispute. Company representatives have described the requirement as an intrusion that may violate longstanding privacy expectations between users and the platform.

The plaintiffs, led by The New York Times, contend that access to user logs is critical to proving their allegations that ChatGPT outputs text substantially similar to copyrighted material, which they argue undermines OpenAI’s reliance on a “fair use” defense.

Beyond this specific case, the decision has broader implications for how courts balance user privacy against evidentiary needs in litigation involving artificial intelligence and data-driven technologies. Legal analysts say it could influence future discovery disputes in similar copyright and data governance litigation.

Under the current schedule, OpenAI is expected to begin providing the de-identified logs to plaintiffs once anonymization is complete, unless the company secures an emergency stay. The litigation is expected to continue through 2026, with the discovery phase playing a central role in shaping arguments on both sides.

Key Facts and Details

Item	Detail
Case Type	Federal copyright infringement litigation
Primary Defendant	OpenAI
Lead Plaintiff	The New York Times (with other publishers in related actions)
Court	U.S. District Court, Southern District of New York
Presiding Magistrate Judge	Ona T. Wang
Ruling Issued	November 2025 (order reaffirmed December 2025)
Discovery Ordered	Approximately 20 million anonymized ChatGPT user conversations
Time Period Covered	Roughly December 2022 – November 2024
Data Scope	Consumer ChatGPT logs only (no enterprise, API, or business accounts)
Privacy Safeguards	De-identification and protective order
Current Status	OpenAI appealing discovery order

Timeline of Key Developments

Date	Event
Late 2023	Major publishers, led by The New York Times, file copyright claims against OpenAI
Mid-2025	Plaintiffs request ChatGPT logs to evaluate alleged copyrighted output
Nov. 2025	Magistrate judge orders OpenAI to produce 20M anonymized chat logs
Nov.–Dec. 2025	OpenAI seeks reconsideration and stay of the order
Dec. 2025	Court denies reconsideration, reaffirming discovery directive
Jan. 2026	Appeal pending before district judge; production preparations underway