this post was submitted on 27 Apr 2024

PCGaming

[email protected] 10 points 6 months ago

I’m quite familiar. It works legally: if you can prove that your data actually made it into the training set, you might be able to sue them successfully. That’s extremely unlikely, though, and if a law can’t be litigated, it effectively doesn’t exist.

Besides, a researcher scraping websites isn’t going to take the time to filter out individual pieces of data based on a license link in the post body. If you can show me a research paper or blog post that describes a process for sanitizing input data by license, that would be pretty damn interesting. Maybe one will exist in the future?
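For a sense of what that kind of filtering would involve, here’s a rough, hypothetical sketch: it drops any scraped document whose body links to a Creative Commons NonCommercial license. The regex and the `filter_by_license` name are my own assumptions for illustration, not something any real training pipeline is known to use.

```python
import re

# Hypothetical pattern: links to any Creative Commons license whose slug
# contains "nc" (NonCommercial), e.g. https://creativecommons.org/licenses/by-nc-sa/4.0/
NC_LICENSE = re.compile(r"creativecommons\.org/licenses/[a-z-]*nc[a-z-]*/", re.IGNORECASE)


def filter_by_license(documents):
    """Keep only scraped documents whose body does NOT link to a NonCommercial CC license."""
    return [doc for doc in documents if not NC_LICENSE.search(doc)]


docs = [
    "My comment. License: https://creativecommons.org/licenses/by-nc-sa/4.0/",
    "A plain comment with no license link.",
    "Shared under https://creativecommons.org/licenses/by-sa/4.0/",
]
kept = filter_by_license(docs)
print(len(kept))
```

Even this toy version only catches one link format; real scraped text would need far more work, which is the point about researchers not bothering.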

In any case, the best way to opt out of AI training is to enable site-wide flags (for example, robots.txt rules that block AI crawlers), which mark all of the site’s content as off-limits. That protects not only you but everyone else on the site. Lobbying your Lemmy instance to enable them will get you far more mileage than anything else you could do, because it’s an industry-sanctioned way to accomplish what you want.
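As a sketch of how such a flag works, assuming it takes the form of robots.txt rules: a compliant crawler parses the site’s robots.txt and checks each URL against it before fetching. The example below uses Python’s standard `urllib.robotparser`; `GPTBot` is the user-agent token OpenAI documents for its crawler, and `lemmy.example` is a placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt for a hypothetical instance: block OpenAI's crawler
# site-wide, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://lemmy.example/post/1"  # placeholder URL
ai_allowed = parser.can_fetch("GPTBot", url)          # matches the GPTBot rule
browser_allowed = parser.can_fetch("Mozilla/5.0", url)  # falls through to "*"
print(ai_allowed, browser_allowed)
```

Because the rule sits at the site level, one entry covers every post on the instance, not just yours. Keep in mind that robots.txt is advisory: it only keeps out crawlers that check it, the way this one does.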