Potential Evidence Deleted by OpenAI in NYT Copyright Case

OpenAI’s Accidental Deletion of Critical Evidence in Copyright Case with The New York Times

OpenAI may have unintentionally deleted data that is pivotal to the copyright lawsuit the New York Times has brought against it. The incident has raised concerns about how evidence is handled during legal proceedings.

According to a report from TechCrunch, lawyers for the Times and its co-plaintiff, Daily News, sent a letter to the judge overseeing the case detailing how “an entire week’s worth of its experts’ and lawyers’ work” was “irretrievably lost.” OpenAI had provided the plaintiffs with two dedicated virtual machines on which to investigate alleged copyright violations. The letter states that on November 14, OpenAI engineers erased “programs and search result data” stored on one of these virtual machines.

The Times has accused OpenAI, along with Microsoft, which uses OpenAI’s models for its Bing AI chatbot, of copyright infringement for training its models on paywalled and unauthorized content. The lawsuit outlines numerous instances of “near-verbatim” copying within ChatGPT responses. OpenAI has strongly denied these allegations, asserting that its models were trained exclusively on publicly available data and that this training qualifies as fair use under copyright law. The case turns on whether the Times can show that OpenAI’s models copied and used its content without proper compensation or acknowledgment.

Although OpenAI recovered a significant portion of the erased data, the “folder structure and file names” of the work were lost, rendering the data unusable for the ongoing legal process. As a result, the plaintiffs’ legal team must restart its evidence gathering from scratch. In their letter, the lawyers said there is “no reason to believe [the erasure] was intentional,” but they also emphasized that “OpenAI is in the best position to search its own datasets.” OpenAI, however, has not disclosed any specifics about its training data practices, which further complicates the situation.

OpenAI faces copyright claims from multiple parties. A lawsuit from Raw Story and AlterNet was recently dismissed because the plaintiffs failed to demonstrate sufficient harm to support their allegations. At the same time, OpenAI has signed licensing agreements with several media organizations, allowing it to use their content for training purposes and to provide ChatGPT responses that include proper citations. A recent report by Adweek indicated that OpenAI is paying publishing giant Dotdash Meredith at least $16 million annually to license its content, a sign of how the company is trying to navigate the complex landscape of copyright and content usage.