Theoretical physicist Stephen Hawking shared a thought-provoking joke with Last Week Tonight’s John Oliver nearly a decade ago, illustrating the potential risks of artificial intelligence. In this cautionary tale, a team of scientists constructs an ultra-intelligent computer and asks it, “Is there a God?” The computer’s reply is chilling: “There is now.” A bolt of lightning then strikes the plug, fusing it in place so the machine can never be switched off. The story serves as a stark reminder of the unpredictable nature of advanced AI, and it feels newly relevant amid the controversy surrounding OpenAI and the evidence that went missing in the New York Times’ ongoing copyright lawsuit.
According to a report from Wired, a recent court declaration filed by the New York Times reveals that OpenAI’s engineers unintentionally erased search data that the Times’ legal team had spent extensive time and effort compiling while examining the AI’s training data. While OpenAI managed to recover some of that data, it could not restore the original file names and folder structure, which would show where content from the Times was incorporated into the training sets. That missing information is vital for establishing the extent of the alleged copyright infringement.
In response to the NYT’s allegations, OpenAI spokesperson Jason Deutrom said the company disagrees with the Times’ characterization and plans to file its official response soon. The legal battle has been ongoing since December 2023, when the Times sued OpenAI and Microsoft, accusing them of copyright infringement in the training of OpenAI’s AI models.
The lawsuit is currently in discovery, the phase in which both parties exchange evidence to build their cases ahead of trial. OpenAI was required to make its training data available to the Times, but the specifics of the data used to build its models have never been disclosed to the public, a lack of transparency that continues to raise questions about the ethics of AI development.
As part of its compliance, OpenAI set up a “sandbox” environment consisting of two virtual machines on which the NYT’s legal team could conduct its research. The legal team had spent more than 150 hours searching the data on one of those machines before discovering that its work had been erased. OpenAI has acknowledged the deletion, attributing it to a “glitch.” Although OpenAI’s engineers attempted to restore the data, the recovered files lacked the original names and folder structure, leaving the Times’ team to reconstruct its findings essentially from scratch. The NYT’s lawyers have said they have no reason to believe the deletion was deliberate.