
Doesn't it follow that AI-generated CSAM can only be generated if the AI has been trained on CSAM?

This article even explicitly says as much.

My question is: why aren't OpenAI, Google, Microsoft, Anthropic... sued for possession of CSAM? It's clearly in their training datasets.

[–] [email protected] 5 points 12 hours ago

I think you misunderstand what's happening.

It isn't that OpenAI (to use one company as an example) is training its models on kiddie porn.

It's that people are taking AI software and then training it on their existing material. The Wired article even specifically says they're using older versions of the software to bypass the safeguards that are in place to prevent it now.

This isn't to say that none of the companies offering generative software have such imagery in the data used to train their models. But they wouldn't have to possess it for it to end up in there. Most of those assholes just grabbed giant datasets and plugged them in; they even used scrapers for some of it. So all it would take is them ingesting some of it unintentionally for their software to end up able to generate new material. They don't need to store anything once the model is trained.

Currently, all of them have some degree of safeguards in their products to prevent them being used for that. How good those protections are, I have zero clue, but they've all made noises about it.

But don't forget, one of the earlier iterations of software designed to identify kiddie porn was trained on seized materials. The point being that there are legal exceptions to possession: the various agencies that investigate sexual abuse of minors keep materials because they need them to track down victims, hold as evidence, etc. It's that body of data that made detection something that could be automated. While I have no idea whether it happened, it wouldn't be surprising if some company or another scraped that data at some point. That's just a tangent rather than part of your question.

So, the reason that they haven't been "sued" is that they likely don't have any materials to be "sued" for in the first place.

Besides, not all generated material is based on existing material. Some of it is made akin to a deepfake, where someone's face is pasted onto a different body. So they can take material of perfectly legal adults who look young, slap real or fictional children's faces onto it, and have new stuff to spread around. That doesn't require any original material at all. You could, as I understand it, train a generative model on that, and it would turn out realistic, fully generated material. All of that is still illegal, but it's created differently.