Federal Court Orders OpenAI to Produce Millions of Anonymized ChatGPT Conversations in Copyright Litigation

A federal judge has ordered OpenAI to turn over millions of anonymized ChatGPT user conversations as part of ongoing copyright litigation brought by major news publishers. The ruling has intensified scrutiny over data use, privacy expectations, and how AI systems are trained and evaluated in court. File photo: Varavin88, licensed.

NEW YORK, NY – A U.S. federal magistrate judge in Manhattan has ruled that OpenAI must hand over approximately 20 million anonymized user chat logs to news publishers pursuing copyright claims, rejecting the company’s arguments that doing so would unduly compromise user privacy and burden its operations.

The discovery directive arises from a copyright infringement suit originally filed by The New York Times and later consolidated with similar claims from other media organizations, which assert that OpenAI trained its large language models on copyrighted news content without permission.

The procedural conflict dates back to mid-2025, when plaintiffs first sought access to OpenAI’s internal chat logs as evidence to test whether ChatGPT outputs reproduced proprietary content.

Magistrate Judge Ona T. Wang issued a production order in early November 2025, directing OpenAI to turn over a sample of 20 million consumer ChatGPT conversations, de-identified to strip direct personal identifiers.

OpenAI sought to block or stay that order, arguing in court filings that users’ private conversations would still face significant privacy risks and that most of the logs were irrelevant to the plaintiffs’ claims. The company also proposed alternative methods, such as running specific keyword searches to locate relevant material, rather than transferring raw data.

In early December 2025, the judge denied OpenAI’s request for reconsideration, reaffirming the earlier directive and concluding that the data was proportional to the litigation needs and could be protected under existing safeguards and anonymization procedures.

The logs in question are drawn from user interactions spanning approximately December 2022 through November 2024 and represent a random or statistically valid sample of conversations retained by OpenAI’s systems.

While the records do not include enterprise accounts, subscription business logs, or API customer data, each selected log contains a full sequence of prompts and model responses – potentially tens of millions of individual entries.

In the court’s view, protective orders and anonymization mitigate the privacy concerns, and the logs’ relevance to the case warrants production.

OpenAI has appealed the magistrate’s production order to a district court judge, arguing that it is overly broad and requires disclosing chats unrelated to the dispute. Company representatives have described the requirement as an intrusion that may violate longstanding privacy expectations between users and the platform.

Plaintiffs, including The New York Times and others, contend that access to user logs is critical to proving their allegations that the AI system outputs text that is substantially similar to copyrighted material, which they argue undermines OpenAI’s reliance on a “fair use” defense.

Beyond this specific case, the decision has broader implications for how courts balance user privacy against evidentiary needs in litigation involving artificial intelligence and data-driven technologies. Legal analysts say it could influence future discovery disputes in similar copyright and data governance litigation.

Under the current schedule, OpenAI is expected to begin providing the de-identified logs to plaintiffs once anonymization is complete unless the company secures an emergency stay from an appellate court. The ongoing litigation is expected to proceed through 2026, with the discovery phase playing a central role in shaping arguments on both sides.

Key Facts and Details

Item | Detail
Case Type | Federal copyright infringement litigation
Primary Defendant | OpenAI
Lead Plaintiff | The New York Times (with other publishers in related actions)
Court | U.S. District Court, Southern District of New York
Presiding Magistrate Judge | Ona T. Wang
Ruling Issued | Late 2025 (order reaffirmed December 2025)
Discovery Ordered | Approximately 20 million anonymized ChatGPT user conversations
Time Period Covered | Roughly December 2022 – November 2024
Data Scope | Consumer ChatGPT logs only (no enterprise, API, or business accounts)
Privacy Safeguards | De-identification and protective order
Current Status | OpenAI appealing discovery order

Timeline of Key Developments

Date | Event
Late 2023 | Major publishers, led by The New York Times, file copyright claims against OpenAI
Mid-2025 | Plaintiffs request ChatGPT logs to evaluate alleged copyrighted output
Nov. 2025 | Magistrate judge orders OpenAI to produce 20M anonymized chat logs
Nov.–Dec. 2025 | OpenAI seeks reconsideration and stay of the order
Dec. 2025 | Court denies reconsideration, reaffirming discovery directive
Jan. 2026 | Appeal pending before district judge; production preparations underway
