Meta Secretly Trained Its AI on a Notorious Piracy Database, Newly Unredacted Court Docs Reveal

2025-01-10 08:00:04

Meta just lost a major fight in its ongoing legal battle with a group of authors suing the company for copyright infringement over how it trained its artificial intelligence models. Against the company’s wishes, a court unredacted information alleging that Meta used Library Genesis (LibGen), a notorious so-called shadow library of pirated books that originated in Russia, to help train its generative AI language models.

The case, Kadrey et al. v. Meta Platforms, was one of the earliest copyright lawsuits filed against a tech company over its AI training practices. Its outcome, along with those of dozens of similar cases working their way through courts in the United States, will determine whether technology companies can legally use creative works to train AI going forward, and could either entrench AI’s most powerful players or derail them.

Vince Chhabria, a judge for the United States District Court for the Northern District of California, ordered both Meta and the plaintiffs on Wednesday to file full versions of a batch of documents after calling Meta’s approach to redacting them “preposterous,” adding that, for the most part, “there is not a single thing in those briefs that should be sealed.” Chhabria ruled that Meta was not pushing to redact the materials in order to protect its business interests but instead to “avoid negative publicity.” The documents, originally filed late last year, had remained publicly unavailable in unredacted form until now.