[Glass Wings] A Bunch Of Authors Sue OpenAI Claiming Copyright Infringement, Because They Don’t Understand Copyright

<https://www.techdirt.com/2023/07/11/a-bunch-of-authors-sue-openai-claiming-copyright-infringement-because-they-dont-understand-copyright/>

"You may have seen some headlines recently about some authors filing lawsuits
against OpenAI. The lawsuits (plural, though I’m confused why it’s separate
attempts at filing a class action lawsuit, rather than a single one) began last
week, when authors Paul Tremblay and Mona Awad sued OpenAI and various
subsidiaries, claiming copyright infringement in how OpenAI trained its models.
They got a lot more attention over the weekend when another class action
lawsuit was filed against OpenAI with comedian Sarah Silverman as the lead
plaintiff, along with Christopher Golden and Richard Kadrey. The same day the
same three plaintiffs (though with Kadrey now listed as the top plaintiff) also
sued Meta, though the complaint is basically the same.

All three cases were filed by Joseph Saveri, a plaintiffs' class action lawyer
who specializes in antitrust litigation. As with all too many class action
lawyers, the goal is generally enriching the class action lawyers, rather than
actually stopping any actual wrong. Saveri is not a copyright expert, and the
lawsuits… show that. There are a ton of assumptions about how Saveri seems to
think copyright law works, which is entirely inconsistent with how it actually
works.

The complaints are basically all the same, and what it comes down to is the
argument that AI systems were trained on copyright-covered material (duh) and
that somehow violates their copyrights.

    Much of the material in OpenAI’s training datasets, however, comes from
    copyrighted works—including books written by Plaintiffs—that were copied by
    OpenAI without consent, without credit, and without compensation

But… this is both wrong and not quite how copyright law works. Training an LLM
does not require “copying” the work in question, but rather reading it. To some
extent, this lawsuit is basically arguing that merely reading a
copyright-covered work is, itself, copyright infringement.

Under this definition, all search engines would be copyright infringing,
because effectively they’re doing the same thing: scanning web pages and
learning from what they find to build an index. But we’ve already had courts
say that’s not even remotely true. If the courts have decided that search
engines scanning content on the web to build an index is clearly transformative
fair use, so too would be scanning internet content for training an LLM.
Arguably the latter case is way more transformative.

And this is the way it should be, because otherwise, it would basically be
saying that anyone reading a work by someone else, and then being inspired to
create something new would be infringing on the works they were inspired by. I
recognize that the Blurred Lines case sorta went in the opposite direction
when it came to music, but more recent decisions have really chipped away at
Blurred Lines, and even the recording industry (the recording industry!) is
arguing that the Blurred Lines case extended copyright too far."

Cheers,
       *** Xanni ***
--
mailto:xanni@xanadu.net               Andrew Pam
http://xanadu.com.au/                 Chief Scientist, Xanadu
https://glasswings.com.au/            Partner, Glass Wings
https://sericyb.com.au/               Manager, Serious Cybernetics

Tue, 8 Aug 2023 11:33:14 +1000

Andrew Pam <xanni [at] glasswings.com.au>