this post was submitted on 27 Apr 2024
202 points (99.5% liked)
PCGaming
I’m quite familiar. It works legally: if you can prove that your data actually made it into the training set, you might be able to sue them successfully. That’s extremely unlikely, though. If you can’t litigate a law, then it essentially doesn’t exist.
Besides, a researcher scraping websites isn’t going to take the time to filter out random pieces of data based on a license link contained in the body. If you can show me a research paper or blog post or something that describes a process for sanitizing input data based on license, that would be pretty damn interesting. Maybe it’ll exist in the future?
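To be concrete about what I mean: such a filtering step would be trivial to write, which is part of why it’s telling that nobody seems to publish one. A purely hypothetical sketch (the license patterns and the footer format are assumptions, not anything a real pipeline is known to use):

```python
import re

# Hypothetical license markers a scraper *could* look for.
# Nothing suggests real training pipelines actually do this;
# this sketch just shows how cheap the check would be.
LICENSE_PATTERNS = [
    re.compile(r"CC\s*BY(-NC)?(-SA)?\s*4\.0", re.IGNORECASE),
    re.compile(r"creativecommons\.org/licenses/", re.IGNORECASE),
]

def filter_licensed(documents):
    """Drop any scraped document whose body contains a license marker."""
    return [
        doc for doc in documents
        if not any(p.search(doc) for p in LICENSE_PATTERNS)
    ]

docs = [
    "Just a normal comment about GPUs.",
    "My hot take. Anti Commercial-AI license (CC BY-NC-SA 4.0)",
]
print(filter_licensed(docs))  # only the unlicensed comment survives
```

Whether any lab would bother running something like this over billions of scraped comments is exactly the open question.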
In any case, the best way to opt out of AI training is to enable site-wide flags that mark the content therein as off limits. That has the benefit of protecting not only you, but everyone else on the site. Lobbying your Lemmy instance’s admins to enable that will get a lot more mileage than anything else you could do, because it’s an industry-sanctioned way to accomplish what you want.
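For example, the most common site-wide flag is a `robots.txt` at the instance root that disallows known AI crawlers by their published user-agent strings (GPTBot is OpenAI’s crawler, CCBot is Common Crawl’s). A minimal sketch of what an admin would deploy:

```
# /robots.txt — block published AI-training crawlers site-wide
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```

This only covers crawlers that identify themselves and choose to honor the file, which is the crux of the disagreement below.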
And what makes you think that can’t be done? You make it sound like, because (you believe) it’s so hard to do, you shouldn’t even bother trying. That seems really defeatist.
And like I said multiple times now, it’s a quick copy-and-paste, a “low-hanging fruit” way of licensing/protecting a comment. If it works, great.
I have no control over the Lemmy servers, I only have control over my own comments that I post.
Also, the two options are not mutually exclusive.
Again, both you and I know the history of the robots.txt file and how often and how well it's honored, especially these days with the new frontier of AI modeling.
It would be best to do both, just to make sure you have coverage, so that if the robots.txt is not honored, at least the comment itself is still licensed.
~Anti~ ~Commercial-AI~ ~license~ ~(CC~ ~BY-NC-SA~ ~4.0)~