Technology

519

OpenAI strikes Reddit deal to train its AI on your posts (www.theverge.com)

submitted 5 months ago by [email protected] to c/[email protected]

122 comments

[–] [email protected] 104 points 5 months ago (2 children)

They always were.

Only now they've agreed to pay Reddit for it. This is what their third-party lockdown was really all about.

They're helping themselves to your Lemmy comments for free, as that's just how it's designed. If you post anything publicly anywhere, it's getting slurped up by a bot somewhere.

[–] [email protected] 15 points 5 months ago (3 children)

I'm not a lawyer. But isn't the reason they had to go to Reddit for permission that users hand over ownership to Reddit the moment they post? And since there's no such clause on Lemmy, wouldn't they have to ask the actual authors of the comments for permission instead?

Mind you, I understand there's no technical limitation that prevents bots from harvesting the data. I'm talking about the legality. After all, public does not equate to public domain.

[–] [email protected] 14 points 5 months ago (1 children)

users hand over ownership to reddit the moment you post

Not ownership. Just permission to copy and distribute freely. Which basically is necessary to run a service like this, where user-submitted content is displayed.

And since there's no such clause on Lemmy, they'd have to ask the actual authors of the comments for permission instead?

It's more of a fuzzy area, but simply by posting on a federated service you're agreeing to let that service copy and display your comments, and sync with other servers/instances to copy and display your comments to their users. It's baked into the protocol that your content will be copied automatically all over the internet.

Does that imply a license to let software be run on that text? Does it matter what the software does with it, like displaying the content in a third-party mobile app? What about when it engages in text-to-speech or braille conversion for accessibility? Or indexes the page for a search engine? Does AI training make any difference at that point?

The fact is, these services have APIs, and the APIs allow for the efficient copying and ingest of the user-created information, with metadata about it, at scale. From a technical perspective, obviously, scraping is easy. But from a copyright perspective, submitting your content into that technical reality is implicit permission to copy, maybe even for things like AI training.

[–] [email protected] 2 points 5 months ago

Thanks for that clarification. I was afraid it would be that murky.

[–] [email protected] 4 points 5 months ago

Well, the legality seems to be something you can ignore when you have billions of dollars in VC money to fritter away.

It certainly didn't stop them hoovering up music and movies, and the owners of those have a lot more power than any of us do.

Tech is fast, the law is slow, and you can make many times the cost of lawyers and fines by the time anybody gets around to telling you to stop it.

[–] [email protected] 3 points 5 months ago (1 children)

Well, even if it was a legal argument, they wouldn't care. Like Facebook and all the rest. They say they don't share your data, but we all know that's a lie.

[–] [email protected] 0 points 5 months ago (1 children)

They are public communication platforms. How could they not share your data publicly?

[–] [email protected] 1 points 5 months ago

Not all your data should be public.

[–] [email protected] 9 points 5 months ago* (last edited 5 months ago) (2 children)

What if I say the word gasp fuck?

[–] [email protected] 6 points 5 months ago

Well, they've probably got filters that remove all that before it teaches their AI to swear. So you need to be more subtle, for 𝑓uck's sake.

[–] [email protected] 3 points 5 months ago

These fuckers see it as well. Fuckity fuckity fuck.