this post was submitted on 10 Jul 2023
420 points (94.7% liked)

[–] [email protected] 34 points 1 year ago* (last edited 1 year ago) (2 children)

I like her and I get why creatives are panicking because of all the AI hype.

However:

In evidence for the suit against OpenAI, the plaintiffs claim ChatGPT violates copyright law by producing a “derivative” version of copyrighted work when prompted to summarize the source.

A summary is not copyright infringement. If anything qualifies as fair use, it's a summary.

The comic's suit questions if AI models can function without training themselves on protected works.

A language model does not need to be trained on the text it is supposed to summarize. She clearly does not know what she is talking about.

IANAL though.

[–] [email protected] 25 points 1 year ago (2 children)

I guess they will get to analyze OpenAI's dataset during discovery. I bet OpenAI didn't have authorization to use even 1% of the content they used.

[–] [email protected] 15 points 1 year ago

That's why they don't feel they can operate in the EU, as the EU will require AI companies to publish which datasets they trained their solutions on.

[–] [email protected] 7 points 1 year ago (1 children)

Things might change, but right now you simply don't need anyone's authorization.

Hopefully it doesn't change: only a handful of companies have the data or the funds to buy it, so requiring authorization would kill any kind of open-source or low-priced endeavour.

[–] [email protected] 4 points 1 year ago

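FWIW, Common Crawl - a free/open-source dataset of crawled internet pages - was used by OpenAI for GPT-3 (GPT-2 was trained on WebText, built from links shared on Reddit), and a Common Crawl subset is part of the Pile, which EleutherAI's GPT-NeoX was trained on. Maybe it was used for GPT-3.5/ChatGPT as well, but they've been hush-hush about that.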