In a surprising turn of events, engineers at OpenAI have accidentally deleted data potentially relevant to the copyright lawsuit against the company. The lawsuit, filed by The New York Times and Daily News, alleges that OpenAI scraped their works to train its AI models without permission.
As part of the lawsuit, OpenAI had agreed to provide two virtual machines for the publishers' counsel to search for copyrighted content in its AI training sets. However, on November 14, OpenAI engineers erased all the search data stored on one of the virtual machines, according to a letter filed in the U.S. District Court for the Southern District of New York.
Although OpenAI was able to recover most of the data, the folder structure and file names were lost, rendering the recovered data unusable. The incident has sparked concerns over OpenAI's ability to search its own datasets for potentially infringing content. The plaintiffs' counsel argues that OpenAI is best positioned to do so using its own tools.
The case raises important questions about fair use and AI model training, with OpenAI maintaining that training models using publicly available data is fair use. The company has nonetheless inked licensing deals with several publishers, including The Associated Press and the Financial Times; the terms of these deals remain undisclosed.
The incident highlights the need for clearer guidelines on AI model training and copyright law, as the tech industry continues to grapple with the implications of AI-generated content.