‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says

Pressure grows on artificial intelligence firms over the content used to train their products

  • S410@lemmy.ml
    7 months ago

    Every work is protected by copyright, unless stated otherwise by the author.
    If you want to create a capable system, you want real data and you want a wide range of it, including data that is rarely considered to be a protected work, despite being one.
    I can guarantee you that you’re going to have a pretty hard time finding a dataset with diverse data containing things like napkin doodles or bathroom stall writing that’s compiled with permission of every copyright holder involved.

    • Exatron@lemmy.world
      7 months ago

      How hard it is doesn’t matter. If you can’t compensate people for using their work, or exclude work people don’t want used, you just don’t get that data.

      There’s plenty of stuff in the public domain.

    • HelloThere@sh.itjust.works
      7 months ago

      I never said it was going to be easy - and clearly that is why OpenAI didn’t bother.

      If they want to advocate for changes to copyright law then I’m all ears, but let’s not pretend they actually have any interest in that.

    • deweydecibel@lemmy.world
      7 months ago

      “I can guarantee you that you’re going to have a pretty hard time finding a dataset with diverse data containing things like napkin doodles or bathroom stall writing that’s compiled with permission of every copyright holder involved.”

      You make this sound like a bad thing.

    • BURN@lemmy.world
      7 months ago

      And why is that a bad thing?

      Why are you entitled to other people’s work, just because “it’s hard to find data”?