[–] snowe 4 points 5 days ago (1 children)

about 50% of traffic to programming.dev is bots who have marked their user-agents as such. I'm pretty confident the actual number is higher, but haven't spent time validating.
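
A minimal sketch of how you'd measure that from a combined-format nginx/Apache access log (the log path and bot keywords here are illustrative assumptions, not programming.dev's actual setup):

```python
# Estimate the share of self-identified bot traffic in an access log.
# Assumes the combined log format, where the user-agent is the last
# quoted field on each line.
import re

BOT_MARKERS = ("bot", "crawler", "spider", "gptbot", "ccbot")

total = bots = 0
with open("access.log") as log:  # hypothetical log file
    for line in log:
        quoted = re.findall(r'"([^"]*)"', line)
        ua = quoted[-1].lower() if quoted else ""
        total += 1
        if any(marker in ua for marker in BOT_MARKERS):
            bots += 1

if total:
    print(f"{bots}/{total} requests ({100 * bots / total:.1f}%) self-identify as bots")
```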

[–] JustFudgnWork@sh.itjust.works 2 points 4 days ago (1 children)
[–] RonSijm 3 points 3 days ago (1 children)

Snowe is sysadmin of programming.dev...

So source: Snowe

Oh thanks lol

[–] sudo 2 points 5 days ago

while others could be executing real-time searches when users ask AI assistants for information.

WTF? Is this even considered AI anymore? Sounds more like a Just-In-Time search engine.

The frequency of these crawls is particularly telling. Schubert observed that AI crawlers "don't just crawl a page once and then move on. Oh, no, they come back every 6 hours because lol why not." This pattern suggests ongoing data collection rather than one-time training exercises, potentially indicating that companies are using these crawls to keep their models' knowledge current.

What's telling is that these scrapers aren't just downloading the git repos and parsing those. These requests aren't targeted in any way. They're probably doing something primitive like following every link they see and getting caught in loops. If the labyrinth solution works, that confirms it.
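
For illustration, here's a minimal sketch of a labyrinth-style trap: every generated page links only to more generated pages, so a crawler that blindly follows links loops forever. The fanout, port, and hashing scheme are made up for the example; this isn't how any real labyrinth deployment is implemented.

```python
# Serve an infinite, deterministic maze of pages. A crawler that just
# follows every link it sees never escapes; a targeted scraper that
# respects its own frontier never enters.
import hashlib
from http.server import BaseHTTPRequestHandler, HTTPServer

FANOUT = 5  # links generated per page (illustrative)

class MazeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Child links are derived by hashing the current path, so the
        # maze is infinite but deterministic: revisits see the same page.
        base = self.path.rstrip("/")
        items = []
        for i in range(FANOUT):
            slug = hashlib.sha1(f"{base}:{i}".encode()).hexdigest()[:8]
            items.append(f'<li><a href="{base}/{slug}">more</a></li>')
        body = f"<html><body><ul>{''.join(items)}</ul></body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("", 8080), MazeHandler).serve_forever()
```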

[–] sudo 1 points 5 days ago (1 children)

Lol the article 403s with my VPN on.

[–] Kissaki 1 points 5 days ago

you evil AI you! /s

[–] onlinepersona -3 points 6 days ago* (last edited 6 days ago) (1 children)

Maybe these open source sites should move off the public internet and use alternative DNS servers with signup and alternative TLDs. Something like OpenNIC, but with signup. Or go straight to darknets like Tor and I2P. Maybe I2P would be better: it's slower, so crawlers would probably time out just trying to access content.

Anti Commercial-AI license

[–] Kissaki 4 points 5 days ago* (last edited 5 days ago) (1 children)

Unless you continuously change your IP, I don't see how locking DNS resolution behind a signup would solve it. You only need to resolve once; then you know the mapping of domain to IP and can use it elsewhere. That mapping doesn't change often for hosted services.
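
As a sketch of what I mean (example.com stands in for the real domain): resolve once, then pin the IP and never touch the resolver again.

```python
# Resolve the domain once, then talk to the raw IP with the hostname
# pinned. After the first lookup, no DNS server (gated or not) is
# ever consulted again.
import socket
import ssl

host = "example.com"
ip = socket.getaddrinfo(host, 443)[0][4][0]  # one-time DNS lookup

ctx = ssl.create_default_context()
with socket.create_connection((ip, 443)) as raw:
    # SNI and certificate checks still use the hostname, but the
    # connection goes straight to the remembered IP.
    with ctx.wrap_socket(raw, server_hostname=host) as tls:
        request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        tls.sendall(request.encode())
        print(tls.recv(200).decode(errors="replace"))
```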

Any wall you build up will also apply to regular users you want to reach.

[–] onlinepersona 1 points 5 days ago

That's a good point. Using alternative DNS servers and alternative TLDs might be useful until they cotton on. It could even stress OpenNIC 🤔

I2P could be better.

Anti Commercial-AI license