By The Atlantic
Published on January 11, 2024
A technical problem known as “memorization” is at the heart of recent lawsuits that pose a significant threat to generative-AI companies.
Earlier this week, the Telegraph reported a curious admission from OpenAI, the creator of ChatGPT. In a filing submitted to the U.K. Parliament, the company said that “leading AI models” could not exist without unfettered access to copyrighted books and articles, confirming that the generative-AI industry, worth tens of billions of dollars, depends on creative work owned by other people.
We already know, for example, that pirated-book libraries have been used to train the generative-AI products of companies such as Meta and Bloomberg. But AI companies have long claimed that generative AI “reads” or “learns from” these books and articles, as a human would, rather than copying them. Therefore, this approach supposedly constitutes “fair use,” with no compensation owed to authors or publishers. Since courts have not ruled on this question, the tech industry has made a colossal gamble developing products in this way. And the odds may be turning against them.