this post was submitted on 22 Feb 2024
1019 points (98.8% liked)

Technology

[–] [email protected] 30 points 10 months ago* (last edited 10 months ago) (5 children)

Hey guys, let's be clear.

Google now has a full, complete set of logs, including user IPs (which can be correlated with Gmail accounts), PRIVATE MESSAGES, and Reddit posts.

They pinky promise they will only train AI on the data.

I can pretty much guarantee someone can subpoena Google for your information communicated on Reddit, since they now have this PII (username(s)/IP/Gmail account(s)) combo. Hope you didn't post anything that would make the RIAA upset! And let's be clear... your deleted or changed data is never actually deleted or changed... it's in an audit log chain somewhere, so there's no way to stop it.

"GDPR WILL SAVE ME!" - GDPR was adopted in 2016 and has applied since 2018. Can you ever be truly sure they followed your deletion requests?

[–] [email protected] 26 points 10 months ago (2 children)

"let's be clear"

You're making things up and presenting them as facts, how is any of this "clear"?

[–] [email protected] 6 points 10 months ago

How do you think Reddit is restoring posts that people have been deleting?

Do you think Google’s deal simply allowed them to scrape old.reddit? Hell no, there is probably a live replica of Reddit prod at Google somewhere, including deleted posts and all edits.

You don’t think they paid $60m just to scrape, do you?

[–] [email protected] 4 points 10 months ago* (last edited 10 months ago)

Since an IP address alone is not considered PII, can you prove that they did not provide IP addresses for each post?

Do you think it's more or less likely that IP addresses, account names, private messages, and deleted messages and posts would be included?

Remember that they paid 60 million dollars for this information, while web scrapers have been capable of capturing subreddit post data for over a decade at a $0 price tag from Reddit.

[–] towerful 17 points 10 months ago (1 children)

Where does it say they have access to PII?
I would imagine Reddit would be anonymising the data: hashes of usernames (and of any matches of usernames in content), and post/comment content with upvote/downvote counts. I would hope they are also screening content for PII.
I don't think the deal is for PII, just for training data.
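A minimal sketch of the kind of anonymisation described above, assuming the provider replaces usernames (in the author field and in "u/name" mentions inside comment text) with a keyed hash (HMAC-SHA256) whose secret key never leaves the provider. The key, the `pseudonymise` and `anonymise_comment` helpers, the mention pattern, and the record layout are all hypothetical; nothing public says how Reddit actually prepared the data.

```python
import hashlib
import hmac
import re

# Hypothetical secret held only by the data provider; never shipped with the dataset.
SECRET_KEY = b"provider-only-secret"

def pseudonymise(username: str) -> str:
    """Map a username to a stable, opaque token using HMAC-SHA256 (keyed hash).

    Usernames are lowercased first, assuming they are case-insensitive.
    """
    digest = hmac.new(SECRET_KEY, username.lower().encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:16]

# Matches "u/name" or "/u/name" mentions inside comment text (assumed format).
MENTION = re.compile(r"(?<![\w/])/?u/([A-Za-z0-9_-]{3,20})")

def anonymise_comment(author: str, body: str, score: int) -> dict:
    """Replace the author and any username mentions; keep content and vote count."""
    clean_body = MENTION.sub(lambda m: "u/" + pseudonymise(m.group(1)), body)
    return {"author": pseudonymise(author), "body": clean_body, "score": score}

record = anonymise_comment("towerful", "I agree with u/SomeUser here.", 17)
print(record["author"])  # user_ followed by 16 hex characters
print(record["body"])    # the mention is replaced by the same kind of token
```

The keyed hash keeps tokens stable (the same user always maps to the same token, so threads still make sense as training data) without letting the recipient look names up, as long as the key stays with the provider.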

[–] [email protected] 2 points 10 months ago (1 children)

"Where does it say they have access to PII?"

So technically, they haven't sold any PII if all they do is provide IP addresses. Legally, an IP address is not PII. Google knows all our IP addresses if we have an account with them or interact with them in certain ways. Sure, some people aren't trackable, but I'm just going to call out that, for all intents and purposes, basically everyone is tracked by Google.

Only the most security paranoid individuals would be anonymous.

[–] towerful 4 points 10 months ago (1 children)

Depends where and how it's applied.
Under GDPR, IP addresses are essential to the operation and security of websites, so logging/processing them can be suitably justified without requiring consent (just disclosure).
Under CCPA, it seems like an IP address isn't PII if it can't be linked to a person/household.

However, an IP address isn't needed as part of AI training data, and alongside comment/post data it could potentially identify a person/household. So it seems risky under both GDPR and CCPA.

I think Reddit would be risking huge legal exposure if they included IP addresses in the data set.
And I don't think Google would accept a data set that includes information like that, due to the legal exposure.

[–] [email protected] 2 points 10 months ago (1 children)

ML can be applied in a great number of ways. One such way could be content moderation, especially detecting people who use alternate accounts to reply to their own content or manipulate votes etc.

By including IP addresses with the comments, they could correlate who said what, and where, and better learn to detect similar posting styles despite deliberate attempts to appear to be someone else.

It's a legitimate use case. Not sure about the legality... but I doubt Google or Reddit would ever acknowledge what data is included unless they believed liability was minimal. So far they haven't acknowledged anything beyond the deal existing, afaik.
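A rough sketch of the correlation step described in this comment: group accounts by the IP (or IP token) they posted from, then count how many threads each pair of co-located accounts both posted in. The rows, account names, and `shared_ip_pairs` helper are made up for illustration, and this is a simple grouping heuristic, not a claim about what Google or Reddit actually run. A shared IP is weak evidence on its own (households, NAT, VPNs), which is why a real system would combine it with other signals such as writing style.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows: (account, ip_token, thread_id). Tokens stand in for IP addresses.
rows = [
    ("alice", "tok1", "t1"), ("alice_alt", "tok1", "t1"),
    ("bob", "tok2", "t1"), ("alice", "tok1", "t2"), ("carol", "tok3", "t2"),
]

def shared_ip_pairs(rows):
    """Return account pairs seen on the same IP token, mapped to their shared-thread count."""
    accounts_by_ip = defaultdict(set)
    threads_by_account = defaultdict(set)
    for account, ip, thread in rows:
        accounts_by_ip[ip].add(account)
        threads_by_account[account].add(thread)
    pairs = {}
    for accounts in accounts_by_ip.values():
        # Every pair of distinct accounts that used this IP token.
        for a, b in combinations(sorted(accounts), 2):
            pairs[(a, b)] = len(threads_by_account[a] & threads_by_account[b])
    return pairs

print(shared_ip_pairs(rows))  # {('alice', 'alice_alt'): 1}
```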

[–] towerful 1 points 10 months ago

Yeah, but it's such a grey area.
If it were used for security only, it could potentially pass as "essential" processing.
But considering the scope of content posted on Reddit (users under 18, medical and even criminal details), it becomes significantly harder to justify processing that data alongside PII (or equivalent).
Especially since it's a change to the terms of service (passing data to third-party processors).

If security moderation is what they want in exchange for the data (and money), it's more likely that Reddit would include one-way anonymised PII (i.e. hashed IP addresses), so only Reddit can recover/confirm IP addresses against the model.
Because if they aren't... then they (and Google) are gonna get FUCKED in EU courts.
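A minimal sketch of "one-way anonymised" IP addresses where only the provider can confirm a match, assuming a keyed hash (HMAC-SHA256) with a key the provider keeps to itself. A plain unkeyed hash would not really be one-way here: there are only 2**32 IPv4 addresses, so anyone could hash them all and reverse the table. The key and the `ip_token` and `provider_confirms` helpers are hypothetical.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical key that only the data provider (Reddit, in this scenario) holds.
PROVIDER_KEY = b"kept-inside-the-provider"

def ip_token(ip: str) -> str:
    """One-way token for an IP address using HMAC-SHA256.

    Keyed, because an unkeyed hash of the small IPv4 space can be brute-forced.
    """
    normalised = str(ipaddress.ip_address(ip))  # validates input, canonical text form
    return hmac.new(PROVIDER_KEY, normalised.encode("ascii"), hashlib.sha256).hexdigest()

def provider_confirms(candidate_ip: str, token: str) -> bool:
    """Only a key holder can check whether a token came from a given IP."""
    return hmac.compare_digest(ip_token(candidate_ip), token)

# The recipient sees tokens, not IPs; identical tokens still link posts from one address.
posts = [("comment A", ip_token("203.0.113.7")),
         ("comment B", ip_token("203.0.113.7")),
         ("comment C", ip_token("198.51.100.23"))]
same_source = posts[0][1] == posts[1][1]
print(same_source)                                  # True
print(provider_confirms("203.0.113.7", posts[0][1]))  # True
```

The trade-off is that the recipient can still link posts made from the same address (useful for spotting alt accounts), but cannot turn a token back into an IP without the provider's key.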

[–] [email protected] 6 points 10 months ago (1 children)

"it's in an audit log chain somewhere so there's no way to stop it."

Gut feel based on common tech platform procedures, right? (As opposed to a sourceable certainty.)

I’d bet $100 you’re right. That said, I’d give a caveat if I were you and I were going with my instincts.

[–] [email protected] 3 points 10 months ago

"Gut feel based on common tech platform procedures, right? (As opposed to a sourceable certainty.)"

It would be PR suicide to disclose exactly what data is shared. Cambridge Analytica is a prime example of a PR nightmare with similar data.

I don't even need to look at Reddit's terms and conditions to know that there is practically nothing stopping them from legally handing this kind of data over for anybody who hasn't submitted GDPR deletion requests. I never trust compliance with laws that can't be independently verified, either, because I've seen all kinds of shady shit in my career.

[–] [email protected] 3 points 10 months ago

They definitely won't be selling any of that to scammers /s

[–] [email protected] 3 points 10 months ago

Makes me glad for my VPN and burner emails, but yeah... Privacy nightmare.

Although Google also has your email, location, IP, every website you visit, all your searches...