this post was submitted on 24 May 2025

1527 points (99.3% liked)

Science Memes

14717 readers

1820 users here now

Welcome to c/science_memes @ Mander.xyz!

A place for majestic STEMLORD peacocking, as well as memes about the realities of working in a lab.

Rules

Don't throw mud. Behave like an intellectual and remember the human.
Keep it rooted (on topic).
No spam.
Infographics welcome, get schooled.

This is a science community. We use the Dawkins definition of meme.

founded 2 years ago

Black Mirror AI (mander.xyz)

submitted 1 week ago by [email protected] to c/[email protected]

209 comments

[email protected] 35 points 1 week ago

Such a stupid title, great software!

mspencer712 35 points 1 week ago

Wait… I just had an idea.

Make a tarpit out of subtly-reprocessed copies of classified material from Wikileaks. (And don’t host it in the US.)

[email protected] 26 points 1 week ago (12 children)

Why are the photos all ugly biological things?

[email protected] 20 points 1 week ago (2 children)

Btw, how about limiting clicks per second/minute to counter distributed scraping? A user who clicks more than 3 links per second is not a person. Neither is one who clicks 50 in a minute. And if they are blocked and switch to the next one, the bandwidth they can occupy is still limited.
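
A minimal sketch of the per-client limit proposed above, assuming each client can be identified by some key (an IP address here, although a later reply points out that scrapers spreading one request per IP defeat exactly this). The class name `SlidingWindowLimiter` is hypothetical; the two rules (3 per second, 50 per minute) are the numbers from the comment.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical per-client limiter enforcing several sliding-window rules at once."""

    def __init__(self, rules):
        # rules: list of (max_requests, window_seconds), e.g. [(3, 1.0), (50, 60.0)]
        self.rules = rules
        self.longest = max(window for _, window in rules)
        self.history = defaultdict(deque)  # client key -> timestamps of accepted requests

    def allow(self, client, now):
        hits = self.history[client]
        # Drop timestamps older than the longest window; no rule can see them anymore.
        while hits and now - hits[0] >= self.longest:
            hits.popleft()
        # Reject if any rule's window already holds its maximum number of requests.
        for max_requests, window in self.rules:
            recent = sum(1 for t in hits if now - t < window)
            if recent >= max_requests:
                return False
        hits.append(now)
        return True
```

With `SlidingWindowLimiter([(3, 1.0), (50, 60.0)])`, five requests 100 ms apart from the same address yield `[True, True, True, False, False]`. Rejected requests are not recorded, so a blocked client regains access as soon as its older requests slide out of the window.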

[email protected] 10 points 1 week ago (3 children)

I click links frequently and I'm not a web crawler. Example: get search results, open several likely-looking possibilities (only takes a few seconds), then look through each one for a reasonable understanding of the subject that isn't limited to one person's bias and/or mistakes. It's not just search results; I do this on Lemmy too, and when I'm shopping.

JadedBlueEyes 9 points 1 week ago (6 children)

They make one request per IP. Rate-limiting per IP does nothing.

[email protected] 14 points 1 week ago

Typical bluesky post

[email protected] 14 points 1 week ago* (last edited 6 days ago) (5 children)

There should be a federated system for blocking IP ranges that other server operators within a chain of trust have already identified as belonging to crawlers. A bit like fediseer.com, but possibly more decentralized.
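
One way the shared blocklist could work, sketched under loud assumptions: this is not fediseer's API or data model, peers are identified by hypothetical name strings, and a range only counts once at least `min_reporters` distinct trusted peers have reported an enclosing network. Reports from outside the chain of trust are ignored.

```python
import ipaddress
from collections import defaultdict


class SharedBlocklist:
    """Hypothetical merge of crawler IP-range reports from peer server operators."""

    def __init__(self, trusted_peers, min_reporters=1):
        self.trusted = set(trusted_peers)
        self.min_reporters = min_reporters
        self.reports = defaultdict(set)  # ip_network -> set of peers that reported it

    def add_report(self, peer, cidr):
        if peer not in self.trusted:
            return False  # ignore reports from outside the chain of trust
        self.reports[ipaddress.ip_network(cidr, strict=False)].add(peer)
        return True

    def is_blocked(self, ip):
        addr = ipaddress.ip_address(ip)
        reporters = set()
        for network, peers in self.reports.items():
            # `in` is simply False when address and network are different IP versions.
            if addr in network:
                reporters |= peers
        return len(reporters) >= self.min_reporters
```

Requiring more than one independent reporter is a design choice that limits the damage a single compromised or overzealous peer can do; `min_reporters=1` reproduces plain "trust every peer" behaviour.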

(Here's another advantage of Markov chain maze generators like Nepenthes: Even when crawlers recognize that they have been served garbage and they delete it, one still has obtained highly reliable evidence that the requesting IPs are crawlers.)
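
A rough sketch of the idea behind a Markov chain maze, not Nepenthes' actual code: a word-level bigram chain produces filler text, every page links to further generated pages, and each page is seeded from its URL path so the maze looks like a stable static site. The function names and the `seen_ips` evidence log are hypothetical.

```python
import hashlib
import random
from collections import defaultdict


def build_chain(corpus):
    """Word-level bigram Markov chain: word -> list of observed next words."""
    words = corpus.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain


def render_page(path, chain, n_words=40, n_links=5):
    # Seed from the path so the same URL always yields the same page.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    starts = sorted(chain)
    word = rng.choice(starts)
    text = [word]
    for _ in range(n_words - 1):
        followers = chain.get(word)  # .get does not insert into the defaultdict
        word = rng.choice(followers) if followers else rng.choice(starts)
        text.append(word)
    # Every page links to more generated pages, so the maze never ends.
    links = [f"{path.rstrip('/')}/{rng.getrandbits(32):08x}" for _ in range(n_links)]
    return " ".join(text), links


seen_ips = set()  # hypothetical evidence log: clients deep in the maze are almost surely crawlers


def handle(path, client_ip, chain):
    seen_ips.add(client_ip)
    return render_page(path, chain)
```

Because no human has a reason to follow dozens of these generated links, the addresses collected in `seen_ips` are exactly the "highly reliable evidence" the comment describes, even if the crawler later discards the text.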

Also, whenever one is only partially confident in a classification of an IP range as a crawler, instead of blocking it outright one can serve proof-of-work tasks (à la Anubis) with a complexity proportional to that confidence. This could also be useful in order to keep crawlers somewhat in the dark about whether they've been put on a blacklist.
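
A hashcash-style sketch of confidence-scaled proof of work. This is not Anubis's actual challenge format; the function names and the 20-bit ceiling are assumptions. Requiring `b` leading zero bits in a SHA-256 digest costs about `2**b` hashes on average, so `difficulty_for` takes a logarithm to make the expected work, rather than the bit count, roughly proportional to the confidence score.

```python
import hashlib
import math
import os


def difficulty_for(confidence, max_bits=20):
    """Map a crawler-confidence score in [0, 1] to required leading zero bits."""
    confidence = min(max(confidence, 0.0), 1.0)
    expected_work = 1 + confidence * (2 ** max_bits - 1)  # hashes the client should need
    return round(math.log2(expected_work))


def leading_zero_bits(digest):
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def issue_challenge(confidence):
    # Every client gets a challenge; only its difficulty reveals the suspicion level.
    return os.urandom(16).hex(), difficulty_for(confidence)


def solve(challenge, bits):
    # Client side: brute-force a nonce whose hash has enough leading zero bits.
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce
        nonce += 1


def verify(challenge, bits, nonce):
    # Server side: a single hash checks the submitted nonce.
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits
```

With these defaults, confidence 0 yields 0 bits (free), 0.5 yields 19 bits and 1.0 yields 20 bits: half the confidence costs about half the work, and since low-confidence clients still see a challenge page, a listed crawler cannot easily tell it has been flagged.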

[email protected] 13 points 1 week ago

--recurse-depth=3 --max-hits=256
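
The flags above read as a crawler configured with a recursion depth of 3 and a budget of 256 requests; the flag names are the commenter's and are not claimed to belong to any particular tool. A minimal breadth-first sketch, with hypothetical names, shows why such limits blunt an endless maze: the crawl stays finite however many links each page offers.

```python
from collections import deque


def bounded_crawl(start, get_links, max_depth=3, max_hits=256):
    """Breadth-first crawl that stops at a depth limit or a total-request budget."""
    seen = {start}
    queue = deque([(start, 0)])
    fetched = []
    while queue and len(fetched) < max_hits:
        url, depth = queue.popleft()
        fetched.append(url)  # stands in for an actual HTTP request
        if depth == max_depth:
            continue  # fetched, but its links are not followed
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return fetched


def endless_maze(url):
    # Stand-in for a tarpit: every page links to five brand-new pages.
    return [f"{url}/{i}" for i in range(5)]
```

Against `endless_maze`, the default limits stop after 156 pages (1 + 5 + 25 + 125), under the 256-request budget; with `max_depth=4` the budget is what ends the crawl.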

[email protected] 11 points 1 week ago (2 children)

I'm imagining a bleak future where, in order to access data from a website, you have to pass a three-tiered system of tests that make 'click here to prove you aren't a robot' and 'select all of the images that have a traffic light' seem like child's play.
