this post was submitted on 20 Oct 2023

1350 points (100.0% liked)

196

17478 readers

1264 users here now

Be sure to follow the rule before you head out.

Rule: You must post before you leave.

Other rules

Behavior rules:

No bigotry (transphobia, racism, etc…)
No genocide denial
No support for authoritarian behaviour (incl. Tankies)
No namecalling
Accounts from lemmygrad.ml, threads.net, or hexbear.net are held to higher standards
Other things seen as clearly bad

Posting rules:

No AI generated content (DALL-E etc…)
No advertisements
No gore / violence
Mutual aid posts are not allowed

NSFW: NSFW content is permitted but it must be tagged and have content warnings. Anything that doesn't adhere to this will be removed. Content warnings should be added like: [penis], [explicit description of sex]. Non-sexualized breasts of any gender are not considered inappropriate and therefore do not need to be blurred/tagged.

If you have any questions, feel free to contact us on our matrix channel or email.

Other 196's:

founded 2 years ago

MODERATORS

[email protected]


AI rule (media.infosec.exchange)

submitted 2 years ago by [email protected] to c/[email protected]

184 comments



[–] [email protected] 26 points 2 years ago (5 children)

One thing I've started to think about for some reason is the problem of using AI to detect child porn. In order to create such a model, you need actual child porn to train it on, which raises a lot of ethical questions.

[–] [email protected] 27 points 2 years ago (1 children)

Cloudflare says they trained a model on non-cp first and worked with the government to train on data that no human eyes see.

It's concerning there's just a cache of cp existing on a government server, but it is for identifying and tracking down victims and assailants, so the area could not be more grey. It is the greyest grey that exists. It is more grey than #808080.

[–] [email protected] 3 points 2 years ago

Well, many governments had no issue taking over a CP website and hosting it for months to come, using it as a honeypot. Still, they hosted and distributed CP, possibly to thousands of unknown customers who can redistribute it.

[–] [email protected] 9 points 2 years ago (1 children)

I'm pretty sure those AI models are trained on hashes of the material, not the material directly, so all you need to do is save a hash of the offending material in the database any time that type of material is seized.

[–] revoopy 17 points 2 years ago (2 children)

That wouldn't be AI though? That would just be looking up hashes.

[–] [email protected] 9 points 2 years ago

You're almost there...

[–] [email protected] 4 points 2 years ago (2 children)

Nah, flipping the image would completely bypass a simple hash map
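To illustrate the plain hash lookup being discussed: a minimal sketch, assuming images are already decoded into grayscale pixel grids and using Python's standard `hashlib` SHA-256 as the "simple hash". The 4x4 image and the `known_hashes` set are hypothetical stand-ins for a database of hashes of seized material. Any change to the bytes, including a mirror flip, produces an unrelated digest, so the lookup misses.

```python
import hashlib


def sha256_of_grid(grid: list[list[int]]) -> str:
    """Exact (cryptographic) hash of a grayscale pixel grid, values 0-255."""
    data = bytes(value for row in grid for value in row)
    return hashlib.sha256(data).hexdigest()


def flip_horizontal(grid: list[list[int]]) -> list[list[int]]:
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in grid]


# Hypothetical 4x4 grayscale image.
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]

# Hypothetical database holding the hash of one known image.
known_hashes = {sha256_of_grid(image)}

print(sha256_of_grid(image) in known_hashes)                   # True
print(sha256_of_grid(flip_horizontal(image)) in known_hashes)  # False
```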

From my very limited understanding it's some special hash function that's still irreversible but correlates more closely with the material in question, so an AI trained on those hashes would be able to detect similar images because they'd have similar hashes, I think.

[–] [email protected] 3 points 2 years ago

Perceptual hashes, I think they're called
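A perceptual hash is designed so that similar-looking images get similar hashes. Below is a minimal sketch of one of the simplest, average hash (aHash), assuming the image has already been converted to grayscale and shrunk to a tiny grid (aHash is usually described with an 8x8 grid for a 64-bit hash; 4x4 keeps the example short). Similarity is the Hamming distance between two hashes. This is a toy, not PhotoDNA or any production algorithm, and plain aHash is not designed to survive mirroring; production systems may also hash flipped or rotated copies.

```python
import hashlib


def average_hash(grid: list[list[int]]) -> int:
    """Toy aHash: one bit per pixel, set when the pixel is brighter than the mean."""
    pixels = [value for row in grid for value in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def sha256_of_grid(grid: list[list[int]]) -> str:
    """Exact hash, for comparison."""
    return hashlib.sha256(bytes(v for row in grid for v in row)).hexdigest()


# Hypothetical 4x4 grayscale image, already downscaled.
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
brighter = [[v + 5 for v in row] for row in image]  # uniform brightness change
tweaked = [row[:] for row in image]
tweaked[1][3] = 88                                  # one pixel nudged

print(hamming_distance(average_hash(image), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(image), average_hash(tweaked)))   # 1
print(sha256_of_grid(image) == sha256_of_grid(brighter))              # False
```

The exact hash changes completely on the brightness shift, while the perceptual hash stays identical or within a bit or two, which is what makes "similar hash means similar image" lookups possible.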

[–] [email protected] 2 points 2 years ago (2 children)

Could you provide a source for this? That sounds very counterintuitive and bad for the hash functions, especially as the whole point of AI training in this case is detecting new images. Say a small boy at the beach wearing speedos has a lot of similarity to a naked boy. So looking for some resemblance in the hash function would require the hashes to practically be reversible.
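On the reversibility point: a perceptual hash is a lossy summary, so many very different inputs share one hash value. A minimal sketch using a toy aHash on hypothetical 2x2 grids: the hash keeps only "which pixels are above the mean", so all three grids below collide, and their actual pixel values cannot be read back out of the hash. The coarse above/below pattern itself does survive. This says nothing about how any real detector is trained.

```python
def average_hash(grid: list[list[int]]) -> int:
    """Toy aHash: one bit per pixel, set when the pixel is brighter than the mean."""
    pixels = [v for row in grid for v in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for v in pixels:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


# Three hypothetical grids with very different pixel values.
high_contrast = [[0, 255], [255, 0]]
low_contrast = [[100, 101], [101, 100]]
irregular = [[3, 200], [90, 50]]

hashes = {average_hash(g) for g in (high_contrast, low_contrast, irregular)}
print(hashes)  # {6}: one hash value, three different images
```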

[–] [email protected] 4 points 2 years ago* (last edited 2 years ago)

Not all hashes are for security. They're called perceptual hashes.

Probably a case of definitional drift of the word, because it probably should be just for the security kind.

[–] [email protected] 3 points 2 years ago

I'm no expert, but we use that kind of hash at my company to detect fraudulent software. Here's a Wikipedia link: https://en.m.wikipedia.org/wiki/Locality-sensitive_hashing
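The linked article covers locality-sensitive hashing in general. For bit-string hashes compared by Hamming distance, one simple scheme is to split each 64-bit hash into bands and index every band separately. By the pigeonhole principle, two hashes that differ in at most `BANDS - 1` bits must agree exactly on at least one band, so exact bucket lookups find every near match without scanning the whole database. A minimal sketch; `HashIndex`, `BANDS` and the example hash values are hypothetical, and this is not necessarily what the commenter's company uses.

```python
from collections import defaultdict

HASH_BITS = 64
BANDS = 4                        # hypothetical choice: 4 bands of 16 bits
BAND_BITS = HASH_BITS // BANDS


def split_into_bands(h: int) -> list[tuple[int, int]]:
    """Return (band_index, band_value) pairs for a 64-bit hash."""
    mask = (1 << BAND_BITS) - 1
    return [(i, (h >> (i * BAND_BITS)) & mask) for i in range(BANDS)]


def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


class HashIndex:
    """Finds every stored hash within BANDS - 1 bits of a query hash."""

    def __init__(self) -> None:
        # (band_index, band_value) -> set of full hashes sharing that band
        self.buckets = defaultdict(set)

    def add(self, h: int) -> None:
        for key in split_into_bands(h):
            self.buckets[key].add(h)

    def query(self, h: int, max_distance: int = BANDS - 1) -> set[int]:
        candidates: set[int] = set()
        for key in split_into_bands(h):
            candidates |= self.buckets.get(key, set())
        # Buckets only give candidates; confirm with the real distance.
        return {c for c in candidates if hamming_distance(c, h) <= max_distance}


index = HashIndex()
stored = 0x0123456789ABCDEF
index.add(stored)

near = stored ^ 0b101                # differs in 2 bits
far = stored ^ 0xFFFFFFFFFFFFFFFF    # every bit differs
print(index.query(near) == {stored})  # True
print(index.query(far))               # set()
```

Asking for a `max_distance` above `BANDS - 1` is allowed but loses the guarantee: some matches could then differ in every band and never show up as candidates.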

[–] [email protected] 6 points 2 years ago* (last edited 2 years ago) (1 children)

You absolutely do not need real CSAM in the dataset for an AI to detect it.

It's pretty genius actually: just like you can make the AI create an image with prompts, you can get prompts from an existing image.

An AI detecting CSAM would have to be trained on nudity and on children separately. If an image-to-prompts conversion results in "children" AND "nudity", it is very likely the image was of a naked child.
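The decision rule described above, flagging only when both concepts show up together, could be sketched like this, assuming a hypothetical image-tagging model that returns a confidence score per label. The label names, the `should_flag` function and the 0.5 threshold are illustrative assumptions, not any real product's API.

```python
# Hypothetical label names; a real tagging model would have its own vocabulary.
REQUIRED_LABELS = ("child", "nudity")


def should_flag(label_scores: dict[str, float], threshold: float = 0.5) -> bool:
    """Flag only when every required label is present with enough confidence."""
    return all(label_scores.get(label, 0.0) >= threshold for label in REQUIRED_LABELS)


print(should_flag({"child": 0.92, "nudity": 0.81}))  # True
print(should_flag({"child": 0.92, "beach": 0.88}))   # False
print(should_flag({"nudity": 0.95}))                 # False
```

The threshold is the knob: lowering it flags more images, trading more false positives for fewer misses.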

This has a high false positive rate, because non-sexual nude images of children, which quite a few parents have (like images of their child bathing) would be flagged by this AI. However, the false negative rate is incredibly low.

It therefore suffices for an upload filter for social media but not for reporting to law enforcement.
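Why a high false positive rate is tolerable for an upload filter but not for reporting comes down to base rates. A minimal sketch with made-up confusion-matrix counts (hypothetical, not measurements of any real system): even with a 1% false negative rate and a 10% false positive rate, when genuine matches are rare most flagged images are false alarms. That is acceptable when a flag only holds an upload for review, but not when it triggers a police report.

```python
def error_rates(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix rates."""
    return {
        "false_positive_rate": fp / (fp + tn),  # share of innocent images flagged
        "false_negative_rate": fn / (fn + tp),  # share of real matches missed
        "precision": tp / (tp + fp),            # share of flags that are real
    }


# Hypothetical counts for 10,100 uploads, 100 of them genuine matches.
rates = error_rates(tp=99, fp=1_000, tn=9_000, fn=1)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

With these counts the false positive rate is 0.100, the false negative rate 0.010, and precision about 0.090, i.e. roughly nine of every hundred flags are real matches.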

[–] [email protected] 3 points 2 years ago

This dude isn't even whining about the false positives, they're complaining that it would require a repository of CP to train the model. Which yes, some are certainly being trained with the real deal. But with law enforcement and tech companies already having massive amounts of CP for legal reasons, why the fuck is there even an issue with having an AI do something with it? We already have to train mods on what CP looks like, there is no reason it's more moral to put a human through this than a machine.

[–] [email protected] 2 points 2 years ago* (last edited 2 years ago)

This is a stupid comment trying to pass itself off as philosophical. If your website is based in the US (like 80 percent of the internet is), you are REQUIRED to keep any CSAM uploaded to your website and report it. Otherwise, you're deleting evidence. So all these websites ALREADY HAVE giant databases of child porn. We learned this when Lemmy was getting overrun with CP and DB0 made a tool to find it. This is essentially just using shit any legally operating website would already have around the office, and having a computer handle it instead of a human who could be traumatized or turned on by the material. Are websites better for just keeping a database of CP and doing nothing but reporting it to cops who do nothing? This isn't even getting into how moderators that look for CP STILL HAVE TO BE TRAINED TO DO IT!

Yeah, a real fuckin moral quandary there, I bet this is the question that killed Kant.