this post was submitted on 12 Jun 2024
238 points (99.6% liked)

Fediverse

A community to talk about the Fediverse and all its related services using ActivityPub (Mastodon, Lemmy, KBin, etc).

Maven, a new social network backed by OpenAI's Sam Altman, found itself in a controversy today when it imported a huge amount of posts and profiles from the Fediverse, and then ran AI analysis to alter the content.

[–] [email protected] 126 points 6 months ago (4 children)
[–] [email protected] 145 points 6 months ago (4 children)

The wildest part is that he's surprised that Mastodon peeps would react negatively to their posts being scraped without consent or even notification and fed into an AI model. Like, are you for real dude? Have you spent more than 4 seconds on Mastodon and noticed their (our?) general attitude towards AI? Come the hell on...

[–] [email protected] 32 points 6 months ago (5 children)

People can complain, but the Fediverse is built to make consuming users’ data easy. If you don’t want AI using your data, don’t put it on such an easily “scrapable” network.

[–] [email protected] 47 points 6 months ago (5 children)

Yeah, and girls dress for rape. They are just aaasking for it!

I will go off on a tangent.

Just because something is online it does not mean I give a full green light on anything.

Fuck this noise of parasitic social networks hammering "free service, therefore pay with data" into everyone's skull. And everyone posts crap.

It is a billion dollar business. LLMs are extracting millions and will generate more.

You know why? Because worthless shit you post online is not worthless after all.

Yes, you are reading it right. Pay me. Pay us.

Before anyone ridicules this: y'all be defending billion dollar corporations, staffed with millionaires below C-levels.

People should start demanding money from these greedy assholes.

[–] [email protected] 23 points 6 months ago* (last edited 6 months ago)

I don't think they're making a moral argument, but pointing out the reality of the situation as it stands.

This is a problem that can only be fixed through legislation and aggressive enforcement backed by large punitive actions.

Until that happens, it's better to acknowledge and understand the reality of the situation than to believe that a morally righteous condemnation will somehow unmake that reality.

It sucks. I agree with your philosophical stance, except for the payment for personal data, as I'd prefer a complete opt-out. However, none of that changes where we're at right now.

[–] [email protected] 1 points 6 months ago

A mild copyright violation based on a system designed around the constant distribution of copies of things is NOT a parable about sexual violence, people.

I feel like this extremely insensitive rape take is the fediverse's version of Godwin's law.

[–] [email protected] 1 points 6 months ago

ITT people not recognizing that there’s a difference between comparing and equating.

People, it’s possible to make analogies to more serious situations without saying the two things are equal. The statement above is saying there’s a shared mentality, not a shared level of consequence/seriousness.

[–] [email protected] -2 points 6 months ago

You're right but...

It's the same with open source products. Companies just take it, make billions off it, give nothing back, will try the embrace, extend, extinguish tactics, will hide any GPL licensing because of course they would...

It'll happen anyway, and you can't stop it. Like you said, "girls dress for rape" is bullshit. But if a girl goes in a skimpy bikini in a Bombay bus at 9pm, then you're kind of asking for something. Open source is open for everyone, that is kind of the point, it's the reason why it became so big in the first place, but it WILL be abused because there are always abusers out there.

[–] [email protected] -4 points 6 months ago* (last edited 6 months ago)

Are you seriously comparing having your shitty Internet comments scraped by AI to someone actually raping you? Wtf?

[–] [email protected] 15 points 6 months ago

Alternatively, use a closed ecosystem susceptible to data rot and loss.

Want to contribute to our open source project? Join our discord

Would you want art to be unfindable because scraping for AI image generation happens? It's a solution looking for problems.

[–] [email protected] 8 points 6 months ago (3 children)

This is what I've been saying the entire time. It sucks, and it's wrong, but the fediverse is built from the ground up as an open sharing platform, where our data is shared with anyone. It shouldn't be, and it's wrong, but there is nothing to stop anyone from doing it. To change that would alter federation at a core level.

[–] [email protected] 13 points 6 months ago (1 children)

I would rather my content be open to the world for however it wants to use it than owned by a single company that gets to profit off aggregating and selling it.

[–] [email protected] 4 points 6 months ago

Fully agree. The annoyances of free and open are vastly outweighed by the negatives of the closed alternative.

[–] [email protected] 2 points 6 months ago

Yeah, but doesn't Hubzilla (https://hubzilla.org/page/info/discover) apply a privacy layer to how its content is distributed? The issue then also lies in how the social network is implemented for its purpose; in the Hubzilla vs. Lemmy case, for instance, it's a social network vs. a public board.

[–] [email protected] 1 points 6 months ago (2 children)

That doesn’t mean it’s licensed to be used in for-profit software.

[–] [email protected] 2 points 6 months ago

If it ends up being ruled that training an LLM is fair use so long as the LLM doesn’t reproduce the works it is trained on verbatim, then licensing becomes irrelevant.

[–] [email protected] 2 points 6 months ago (1 children)

I've had this argument with other people, but essentially at this point there is no licensing beyond server ownership here, and most servers don't have any licenses defined. Even if they do, then sure they did something wrong... but how would you ever prove it or enforce it? The only way to actually disallow them is to switch from open federation to closed - which goes against what we're trying to build with federation.

[–] [email protected] 0 points 6 months ago* (last edited 6 months ago)

There have been instances before where LLMs gave up clues as to what sources they used. When that happens, they can be sued.

I’m okay with people using our data for whatever, since it’s all open and it should be. But I’d rather put in a little bit of effort to make for-profit use technically illegal. It’s better than nothing.

[–] [email protected] 1 points 6 months ago (1 children)

People can complain, but the Fediverse is built to make consuming users’ data easy

Correction: it is built to make consuming users' data not easy, but more human.

What you are thinking of is AP, not "Fediverse", and even then that's a stretch.

[–] [email protected] 2 points 6 months ago

Correction: it is built to make consuming users' data not easy, but more human.

What does that even mean?

What you are thinking of is AP, not "Fediverse", and even then that's a stretch.

Honestly, I think Fediverse is inseparable from AP (or some similar protocol). You can split hairs if you want, but the thing that makes it different from all other social media services is that it allows the content created by users on one service to be imported into a different service.

You can hope and dream that it is only services like Lemmy consuming user content from services like Mastodon, but this same protocol makes it easy for services like ChatGPT to consume the same data.

[–] [email protected] 1 points 6 months ago (1 children)

Just because our data is accessible doesn’t mean it’s legally licensed to be used by a for-profit company. Free doesn’t mean you can do what you want with it, it just means no cost.

[–] [email protected] 2 points 6 months ago (1 children)

I don’t disagree. I’m just saying that so long as you’re putting content on this platform, you are powerless to stop any service from using the features of the platform in whatever way they want.

It was built for easy and open consumption of user content by other services.

[–] [email protected] 1 points 6 months ago

Oh yeah for sure. Anything I type here is for the whole world to see and I’m okay with that as long as it’s anonymous.

[–] [email protected] 10 points 6 months ago (2 children)

It sounds like they weren't "being fed into an AI model" as in being used as training material, they were just being evaluated by an AI model. However...

Have you spent more than 4 seconds on Mastodon and noticed their (our?) general attitude towards AI?

Yeah, the general attitude of wild witch-hunts and instant zero-to-11 rage at the slightest mention of it. Doesn't matter what you're actually doing with AI, the moment the mob thinks they scent blood the avalanche is rolling.

It sounds like Maven wants to play nice, but if the "general attitude" means that playing nice is impossible why should they even bother to try?

[–] [email protected] 6 points 6 months ago (1 children)

The anti-AI knee-jerk reactions can be extreme, I agree, but at the same time one of the important features of Mastodon is that your feed is not controlled by an algorithm in any way.

So when a company comes, takes those posts and screws with them to create an algorithm to show them, I understand people getting angry - at least some of them joined to be free of that exact thing...

[–] [email protected] 8 points 6 months ago

One of the important features of Mastodon is that you can choose what your feed is. Everyone's feed has an algorithm determining what's in it even if it's just a simple "list the posts of everyone I've subscribed to in chronological order."

If someone else wants to see a feed of content that is curated and sorted in a different way, why get angry at them? They're not forcing you to see that feed.

[–] [email protected] 2 points 6 months ago (2 children)

Yeah, the general attitude of wild witch-hunts and instant zero-to-11 rage at the slightest mention of it. Doesn’t matter what you’re actually doing with AI, the moment the mob thinks they scent blood the avalanche is rolling.

This wasn't always the case. A lot of NLP research in the 2010s used scraped social media posts. People never had a problem with that (at least the outrage wasn't visible back then). The problem now is that our content is being used to create an AI product without any consent from the end user.

Source: My research colleagues used to work on NLP

[–] [email protected] 4 points 6 months ago (1 children)

For me, more specifically, the problem is they took my data and made a tool to sell it back to me without paying me for it.

I have no real issue with current AI stuff, other than you're effectively taking our stuff and want us to pay you for doing so.

If they weren't freeloading on everyone, I suspect you'd have a lot less angry people.

[–] [email protected] 1 points 6 months ago

This. If Maven offered me a stipend for life to have my content used (because they're not going to remove it in 3 or 6 months, right? once ingested it's there forever), then I would be far more open to at least discussing their terms.

[–] [email protected] 1 points 6 months ago

Consent isn't legally required if it's fair use. Whether it's fair use remains to be ruled on by the courts.

[–] [email protected] 6 points 6 months ago

He's not surprised. He's acting surprised because he got caught. It's pretty standard for these jerkass tech bros. "Move fast, break things" is code for "break laws, be unethical". As I think we've all seen, if you do it often and fast enough, you can keep way ahead of any kind of accountability, because everybody else is trying to play catch-up while the last thing has already filtered out of the news cycle.

[–] [email protected] -4 points 6 months ago (2 children)

I'm surprised as well. We put our posts up for anyone to replicate and republish, yet we still get mad when somebody replicates and republishes them. It does not make sense. ActivityPub is an open network with zero privacy expectations.

[–] [email protected] 6 points 6 months ago

And yet we don't want our posts to be fed into AI slop, nor do we want independent hosts to pay for the massive amount of traffic generated by a massive corporate entity trying to consume data en masse.

[–] [email protected] 2 points 6 months ago

What has our copyright got to do with privacy expectations?

[–] [email protected] 6 points 6 months ago

Look at that shit-eating grin, he knows. There's no way someone can be that out of touch, right? Right?!?

[–] [email protected] 2 points 6 months ago

How does someone with a last name that close to secretion choose to go by Jimmy?