2023 marked the rise of generative AI and 2024 could well be the year its makers reckon with the technology’s fallout of the industry-wide arms race. Currently, OpenAI is aggressively pushing back against recent lawsuits’ claims that its products including ChatGPT are illegally trained on copyrighted texts. What’s more, the company is making some bold legal claims as to why their programs should have access to other people’s work.
[Related: Generative AI could face its biggest legal tests in 2024.]
In a blog post published on January 8, OpenAI accused The New York Times of “not telling the full story” in the media company’s major copyright lawsuit filed late last month. OpenAI argues that their programs scraping of online works falls within the purview of “fair use.” The company further claims that it currently collaborates with various news organizations on dataset partnerships, excluding The Times, and dismisses any “regurgitation” of copyrighted material as a “rare bug” they are working to eliminate. This is attributed to “memorization” issues that can be more common when content appears multiple times within training data, such as if it can be found on “lots of different public websites.”
“The principle that training AI models is permitted as a fair use is supported by a wide range of [people and organizations],” OpenAI representatives wrote in Monday’s post, linking out to recently submitted comments from several academics, startups, and content creators to the US Copyright Office.
In a letter of support filed by Duolingo, for example, the language learning software company wrote that it believes that “Output generated by an AI trained on copyrighted materials should not automatically be considered infringing—just as a work by a human author would not be considered infringing merely because the human author had learned how to write through reading copyrighted works.” On Monday, Duolingo confirmed to Bloomberg it has laid off approximately 10 percent of its contractors, citing its increased reliance on AI.
On December 27, The New York Times sued both OpenAI and Microsoft—which currently utilizes the former’s GPT in products like Bing—for copyright infringement. Court documents filed by The Times claim OpenAI trained its generative technology on millions of the publication’s articles without permission or compensation. Products like ChatGPT are now allegedly used in lieu of their source material at a detriment to the media company. More readers opting for AI news summaries presumably means less readers subscribing to source outlets, argues The Times.
The New York Times lawsuit is only the latest in a string of similar filings claiming copyright infringement, including one on behalf of notable writers, as well as another for visual artists.
Meanwhile, OpenAI is lobbying government regulators over their access to copyrighted material. According to The Telegraph on January 7,

