Well, it can draw an astronaut on a horse, and I doubt it had seen lots of astronauts on horses...
Yeah, but the article suggests that pedos train their local AI on existing CSAM, which would indicate that it's somehow needed to generate AI-generated CSAM. Otherwise, why would they bother? They'd just feed it images of children in innocent settings and images of ordinary porn to get their local AI to generate CSAM.
This is not strictly true in general. Generative AI is able to produce output that is not in the training data, by learning a broad range of concepts and applying them in novel ways. I can generate an image of a rollerskating astronaut even if there are no rollerskating astronauts in the training data.
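For what it's worth, here's roughly what that looks like in practice. This is just a minimal sketch using the Hugging Face diffusers library; the checkpoint name and prompt are only examples, not anything specific from the article:

```python
# Minimal sketch: a text-to-image pipeline composing two concepts
# ("astronaut" and "rollerskating") that need not co-occur in the training data.
# The checkpoint below is just one publicly available example model.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a photo of an astronaut rollerskating on the moon").images[0]
image.save("rollerskating_astronaut.png")
```

The model never needs a photo of that exact scene; it has separately learned what astronauts and rollerskating look like and combines the concepts at generation time.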
It is true that some training sets have included CSAM, at least in the past. Back in 2023, researchers found a few thousand such images in the LAION-5B dataset (roughly one per million images). 404 Media has an excellent article with details: https://www.404media.co/laion-datasets-removed-stanford-csam-child-abuse/
On learning of this, LAION took down their dataset until it could be properly cleaned. Source: https://laion.ai/notes/laion-maintenance/
Those images were collected from the public web. LAION took steps to avoid linking to illicit content (details in the link above), but clearly it's an imperfect system. God only knows what closed companies (OpenAI, Google, etc.) are doing. With open data sets, at least any interested parties can review, verify, and report this stuff. With closed data sets, who knows?
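If you're wondering what "reviewing" an open dataset even means, it's roughly this: the releases are metadata (image URL plus caption), which anyone can stream and inspect or filter. A minimal sketch using the Hugging Face datasets library; the dataset id and split below are placeholders, since the exact LAION releases and their field names have changed over time:

```python
# Sketch only: stream a few records of a public image-text metadata dataset.
# "laion/laion400m" is a placeholder id; actual LAION releases vary.
from datasets import load_dataset

ds = load_dataset("laion/laion400m", split="train", streaming=True)

for i, row in enumerate(ds):
    print(row)  # each record is metadata (image URL, caption, ...), not an image
    if i >= 4:
        break
```

That's the point of the open/closed contrast: with open metadata, outside researchers (like the Stanford team behind the 2023 finding) can run exactly this kind of audit.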
How do they know that? Did the pedos text them to let them know? Sounds very made up.
The article says "remixed" images of old victims have cropped up.
And again, what's the source? The great thing with articles about CSAM is that you don't need sources; everyone just assumes you have them but obviously can't share them.
Did at least one pedo try that? Most likely yes. Is it the best way to get good quality fake CSAM? Not at all.
I don't know, man. But I assume associations concerned with child abuse are all over that shit and checking it out. I'm not a specialist in CSAM, but I assume an article that says old victims show up in previously-unseen images doesn't lie, because why would it? It's not like Wired is a pedo outlet...
Also, it was just a question. I'm not trying to convince you of anything 🙂
I think that article lacks nuance. It's a bit baity and goes through the usual talking points without contextualizing the numbers, what's actually happening out there, the consequences, or the harm. That makes me believe the author just wants to push some point across.
But I've yet to read a good article on this. Most articles are like this one. But yeah, are a few thousand images much in the context of crime that's happening online? Where are these numbers from and what's with the claim that there are more actual pictures out there? I seriously doubt that at this point, if it's so easy to generate images. And what consequences does all of this have? Does it mean an increase or a decrease in abuse? And lots of services have implemented filters... Are the platforms doing their due diligence? Is this a general societal issue or criminals doing crime?
It's certainly technically possible. I suspect these AI models just aren't good at it. So the pedophiles need to train them on actual images.
I can imagine, for example, that the AI doesn't know what puberty is, since it has in fact not seen a lot of naked children. It would try to infer from all the internet porn it's seen and draw any female with big breasts, disregarding age. And that's not how children actually look.
I haven't tried, since it's illegal where I live. But that's my suspicion why pedophiles bother with training models.
(Edit: If that's the case, it would mean the tech companies are more or less innocent. At least of this.
And note that a lot of the CSAM talk is FUD (spreading fear, uncertainty, and doubt). I usually see it in the context of someone pushing for total surveillance of the people. In my experience it's far less pronounced than some people make it out to be. I've been around on the internet, and I haven't seen any real pictures yet. I'm glad I haven't, but that makes me believe you have to actively look for that kind of stuff, or be targeted somehow.
And I think a bit more nuance would help. This article also lumps together fictional drawings and real pictures. I think that's counterproductive, since one is a heinous crime and has real victims. And like, drawing nude anime children or de-aging celebrities isn't acceptable either (depending on legislation), but I think we need to differentiate here. I think real pictures are on an entirely different level and should have far more severe consequences. If we mix everything together, we kind of take away from that.)
That’s not exactly how it works.
It can “understand” different concepts and mix them, without having to see the combination beforehand.
As for the training thing, that would more likely be a LoRA. LoRAs are like add-ons you can put on top of your AI to draw certain things better (a character, a pose, etc.); they're not needed for the base model.
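Conceptually, a LoRA is just a tiny low-rank "correction" added on top of the base model's frozen weight matrices, which is why it's small enough to train and share separately from the base model. A minimal, self-contained sketch in PyTorch (layer sizes and names are made up for illustration):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a small trainable low-rank add-on."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # the base model stays untouched
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # starts out as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        # base output + scaled low-rank correction learned by the LoRA
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(1, 768))             # shape: (1, 768)
```

Only the two small matrices get trained, which is why a LoRA file is a few megabytes instead of a whole model.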
Training an existing model on a specific set of new data is known as "fine-tuning".
A base model has broad world knowledge and the ability to generate outputs of things it hasn't specifically seen, but a tuned model will provide "better" (fucking yuck to even write it) results.
The closer your training data is to your desired result, the better.