Meta CEO Mark Zuckerberg appears to have used YouTube’s fight to remove pirated content to defend his own company’s use of a data set containing copyrighted e-books, according to recently released excerpts of a deposition he gave late last year.
The deposition relates to the AI copyright case Kadrey v. Meta and was included in a complaint that the plaintiffs’ lawyers filed with the court. The case is one of many working their way through the American legal system that pit AI companies against authors and other intellectual property holders. The defendants in these lawsuits, the AI companies, typically argue that training on copyrighted content constitutes “fair use.” Many copyright owners disagree.
“For example, YouTube, I think, may end up hosting some stuff that people pirate for some period of time, but YouTube is trying to take that stuff down,” Zuckerberg said during his deposition, according to excerpts of a transcript released Wednesday night. “And I would assume that the great majority of the content on YouTube is of a decent caliber and that they are authorized to do so.”
The excerpts offer some hints about Zuckerberg’s views on copyrighted content and fair use. It should be noted, however, that the complete transcript of the deposition was not made public. TechCrunch has contacted Meta for more context; if the company responds, this piece will be updated.
According to the deposition excerpts, Zuckerberg seems to be defending Meta’s use of LibGen, a collection of e-books, as training data for its Llama family of AI models. Llama competes with flagship models from AI companies such as OpenAI.
Billed as a “links aggregator,” LibGen offers access to copyrighted works from publishers such as Pearson Education, Cengage Learning, Macmillan Learning, and McGraw Hill. LibGen has been sued numerous times, ordered to shut down, and fined tens of millions of dollars for copyright infringement.
According to court documents made public this week, Zuckerberg reportedly approved the use of LibGen to train at least one of the company’s Llama models, despite worries among Meta’s AI executives and researchers about the potential legal ramifications.
According to a court document, counsel for the plaintiffs, who include best-selling authors Sarah Silverman and Ta-Nehisi Coates, quoted Meta employees as calling LibGen a “data set we know to be pirated” and warning that its use “may undermine [Meta’s] negotiating position with regulators.”
Zuckerberg stated in his deposition that he “hadn’t really heard of” LibGen.
“I understand that you’re trying to get me to give an opinion of LibGen, which I haven’t really heard of,” Zuckerberg stated during the deposition. “I simply don’t know anything about that particular thing.”
When questioned by David Boies, one of the plaintiffs’ lawyers, Zuckerberg explained why he believes it would be unreasonable to ban the use of a data set such as LibGen outright.
“Given that some of the content on YouTube might be protected by copyright, would I wish to enforce a policy prohibiting its use? No,” he responded. “In certain situations, it might not be the best course of action to impose such a complete ban.”
That said, Zuckerberg did say that Meta should be “pretty careful about” training on copyrighted content.
“You know, [if] someone is offering a website and purposefully attempting to infringe upon people’s rights … obviously, it’s something that we would want to be cautious about or careful about how we engaged with it or maybe even prevent our teams from engaging with it,” Zuckerberg stated, according to the transcript of his deposition.
New allegations
The plaintiffs’ attorneys in Kadrey v. Meta have amended the complaint several times since it was filed in 2023 in the United States District Court for the Northern District of California, San Francisco Division. The most recent amended complaint, submitted late Wednesday, contains new accusations against Meta, including that the company cross-referenced some pirated books in LibGen with copyrighted books that were available for license. The lawyers claim Meta used this approach to assess whether pursuing a licensing deal with a publisher made sense.
According to the amended complaint, Meta allegedly used LibGen to train its most recent generation of Llama models, Llama 3. The plaintiffs also claim that Meta is using the data set to train its next-generation Llama 4 models.
The amended complaint claims that Meta researchers attempted to conceal the fact that Llama models were trained on pirated content by adding “supervised samples” to Llama’s fine-tuning. It also claims that as recently as April 2024, Meta downloaded pirated e-books for Llama training from another source, Z-Library.
Z-Library, or Z-Lib, has been the subject of a variety of legal actions by publishers, including domain seizures and takedowns. In 2022, the Russian nationals who allegedly maintained it were charged with copyright infringement, wire fraud, and money laundering.
SOURCE: TechCrunch