manual and builds are here: https://zadzmo.org/code/nepenthes/
Technology
This is a most excellent place for technology news and articles.
Our Rules
- Follow the lemmy.world rules.
- Only tech related content.
- Be excellent to each other!
- Mod approved content bots can post up to 10 articles per day.
- Threads asking for personal tech support may be deleted.
- Politics threads may be removed.
- No memes allowed as posts, OK to post as comments.
- Only approved bots from the list below, to ask if your bot can be added please contact us.
- Check for duplicates before posting, duplicates may be removed
- Accounts 7 days and younger will have their posts automatically removed.
Approved Bots
They're framing it as "AI haters" instead of what it actually is, which is people who do not like that robots have been programmed to completely ignore the robots.txt files on a website.
No AI system in the world would get stuck in this if it simply obeyed the robots.txt files.
The disingenuous phrasing is like "pro life" instead of what it is, "anti-choice"
It's not that we "hate them" - it's that they can entirely overwhelm a low volume site and cause a DDOS.
I ran a few very low visit websites for local interests on a rural. residential line. It wasn't fast but was cheap and as these sites made no money it was good enough Before AI they'd get the odd badly behaved scraper that ignored robots.txt and specifically the rate limits.
But since? I've had to spend a lot of time trying to filter them out upstream. Like, hours and hours. Claudebot was the first - coming from hundreds of AWS IPs and dozens of countries, thousands of times an hour, repeatedly trying to download the same urls - some that didn't exist. Since then it's happened a lot. Some of these tools are just so ridiculously stupid, far more so than a dumb script that cycles through a list. But because it's AI and they're desperate to satisfy the "need for it", they're quite happy to spend millions on AWS costs for negligable gain and screw up other people.
Eventually I gave up and redesigned the sites to be static and they're now on cloudflare pages. Arguably better, but a chunk of my life I'd rather not have lost.
Good
I am so gonna deploy this. I want the crawlers to index the entire Mandelbrot set.
I'll train with with lyrics from Beck Hansen and Smash Mouth so that none of it makes sense.