Meta is in need of new data sources to train its AI

Samantha Johnson

Apr 7, 2024
Tech giants are racing to find new data sources to fuel their AI systems. Meta, in particular, has explored several options, including potentially buying the publisher Simon & Schuster to harvest its data. The company also discussed accepting the risk of lawsuits rather than negotiating licensing deals.

As AI systems become more powerful, tech companies are aggressively seeking data to train them, raising concerns about potential copyright violations. For example, there have been suspicions that OpenAI used YouTube videos to train its video generator, Sora. However, OpenAI’s CTO denied these accusations.

During Meta’s meetings, attendees discussed buying Simon & Schuster or paying for licensing rights to obtain new titles. Meta had already compiled summaries of books, essays, and other online content, some of which contained copyrighted material. The company considered whether to keep collecting data from potentially copyrighted sources without securing licenses.

Despite ethical concerns, Meta decided to rely on the precedent set by the court case Authors Guild v. Google. That case established that Google could scan and digitize books for Google Books under fair use. Meta’s lawyers argued that the company could train its AI systems on the same basis.

Meta did not immediately respond to a request for comment on these discussions. Ultimately, the company is navigating complex legal and ethical issues as it strives to power its AI systems with data from various sources.

