5 Matching Annotations
  1. Jan 2024
    1. One user on X pointed to the fact that Japan has allowed AI companies to train on copyright materials. While this observation is true, it is incomplete and oversimplified, as that training is constrained by limitations on unauthorized use drawn directly from relevant international law (including the Berne Convention and TRIPS agreement). In any event, the Japanese stance seems unlikely to be carry any weight in American courts.

      Specifics in Japan for training LLMs on copyrighted material

    2. After a bit of experimentation (and in a discovery that led us to collaborate), Southen found that it was in fact easy to generate many plagiaristic outputs, with brief prompts related to commercial films (prompts are shown).

      Plagiaristic outputs from blockbuster films in Midjourney v6

      Was the LLM trained on copyrighted material?

  2. Jul 2023
    1. First, under a highly permissive view, theuse of training data could be treated as non-infringing because protected works are not directlycopied. Second, the use of training data could be covered by a fair-use exception because atrained AI represents a significant transformation of the training data [63, 64, 65, 66, 67, 68].1Third, the use of training data could require an explicit license agreement with each creatorwhose work appears in the training dataset. A weaker version of this third proposal, is to atleast give artists the ability to opt-out of their data being used for generative AI [69]. Finally,a new statutory compulsory licensing scheme that allows artworks to be used as training databut requires the artist to be remunerated could be introduced to compensate artists and createcontinued incentives for human creation [70].

      For proposals for how copyright affects generative AI training data

      1. Consider training data a non-infringing use
      2. Fair use exception
      3. Require explicit license agreement with each creator (or an opt-out ability)
      4. Create a new "statutory compulsory licensing scheme"
  3. Feb 2023
    1. Certainly it would not be possible if theLLM were doing nothing more than cutting-and-pasting fragments of text from its training setand assembling them into a response. But this isnot what an LLM does. Rather, an LLM mod-els a distribution that is unimaginably complex,and allows users and applications to sample fromthat distribution.

      LLMs are not cut and paste; the matrix of token-following-token probabilities are "unimaginably complex"

      I wonder how this fact will work its way into the LLM copyright cases that have been filed. Is this enough to make a the LLM output a "derivative work"?