this post was submitted on 25 Mar 2025
262 points (98.9% liked)

Announcements

Official announcements from the Lemmy project. Subscribe to this community or add it to your RSS reader in order to be notified about new releases and important updates.

You can also find major news on join-lemmy.org

In recent weeks Lemmy has seen a lot of growth, with thousands of new users. To welcome them we are holding this AMA to answer questions from the community. You can ask about the beginnings of Lemmy, how we see the future of Lemmy, our long-term goals, what makes Lemmy different from Reddit, about the internet and social media in general, as well as personal questions.

We'd also like to hear your overall feedback on Lemmy: What are its greatest strengths and weaknesses? How would you improve it? What's something you wish it had? What can our community do to ensure that we keep pulling users away from US tech companies, and into the fediverse?

Lemmy and Reddit may look similar at first glance, but there is a major difference. While Reddit is a corporation with thousands of employees and billionaire investors, Lemmy is nothing but an open source project run by volunteers. It was started in 2019 by @dessalines and @nutomic, and has been a full-time job for them since 2020. For our income we are dependent on your donations, so please contribute if you can. We'd like to be able to add more full-time contributors to our co-op.

We will start answering questions from tomorrow (Wednesday). Besides @dessalines and @nutomic, other Lemmy contributors may also chime in to answer questions:

Here are our previous AMAs for those interested.

[–] [email protected] 22 points 3 days ago (4 children)

What are your thoughts on blocking AI scraper access? Any attempts to improve that on the Lemmy side? Basic things like letting admins easily customize robots.txt would already help.

I also recently tried this new AI block tool called Anubis with Lemmy, but for some reason it fails with Lemmy-ui. Might be interesting to investigate further.
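For anyone unfamiliar with the idea: as far as I understand it, Anubis makes the browser solve a SHA-256 proof-of-work challenge before a request reaches the backend. The sketch below is a generic hashcash-style illustration of that idea, not Anubis's actual code. The function names, the challenge format, and the "leading zero hex digits" difficulty rule are all assumptions made up for this example.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random challenge string (hypothetical format)."""
    return secrets.token_hex(16)


def digest_for(challenge: str, nonce: int) -> str:
    """Hash the challenge together with a candidate nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the smallest nonce whose digest starts
    with `difficulty` zero hex digits. Cost grows ~16x per extra digit."""
    prefix = "0" * difficulty
    nonce = 0
    while not digest_for(challenge, nonce).startswith(prefix):
        nonce += 1
    return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: checking a solution costs a single hash."""
    return digest_for(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=3)
    print(challenge, nonce, verify(challenge, nonce, 3))
```

The asymmetry is the point: a single visitor pays a small one-off cost, while a crawler hitting every page pays it over and over, and the server only ever does one hash per check.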

[–] [email protected] 14 points 2 days ago

You can load a different robots.txt in your nginx config, something like this:

location = /robots.txt {
    # Serve this file for requests to /robots.txt
    alias /path/to/my/robots.txt;
}
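As for what to put in that file, here is a minimal sketch that generates a robots.txt disallowing a few crawler user agents and then checks the result with Python's standard `urllib.robotparser`. The agent list (`GPTBot`, `CCBot`, `ClaudeBot`) and the helper names are only illustrative assumptions, so adjust them to whatever you want to block. Keep in mind that robots.txt is purely advisory and only helps against crawlers that choose to honor it.

```python
import urllib.robotparser

# Illustrative list of crawler user-agent tokens (an assumption, not exhaustive).
AI_CRAWLERS = ["GPTBot", "CCBot", "ClaudeBot"]


def build_robots_txt(blocked_agents, allow_others=True):
    """Return robots.txt text that disallows everything for the given agents."""
    lines = []
    for agent in blocked_agents:
        lines += [f"User-agent: {agent}", "Disallow: /", ""]
    if allow_others:
        # An empty Disallow means "nothing is disallowed" for everyone else.
        lines += ["User-agent: *", "Disallow:", ""]
    return "\n".join(lines)


def is_allowed(robots_txt, user_agent, path):
    """Check a path against the generated rules using the stdlib parser."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    text = build_robots_txt(AI_CRAWLERS)
    print(text)
    print("GPTBot allowed:", is_allowed(text, "GPTBot", "/post/1"))
```

Write the output to the path your nginx `location` block points at.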

Additionally, 1.0 will change the "private instance" setting to work with federation enabled (see https://github.com/LemmyNet/lemmy/pull/5530). Then only logged-in users will see content, while AI scrapers won't see anything except the login page.

[–] [email protected] 17 points 3 days ago (1 children)

Anyone that wants to scrape Lemmy would have an easier time setting up their own server, federating with everyone, and reading straight from their DB. No web scraping required. Though, web scraping defenses would be useful against general web scrapers/crawlers.

[–] [email protected] 16 points 3 days ago (1 children)

That would require the authors of these AI scrapers to actually give a f*ck. The problem is that they don't, and just scrape whatever they can find repeatedly, almost like a DDoS attack on the open web.

[–] Deebster 6 points 3 days ago

Yup, same as they could clone git repos in one shot, but they instead crawl every single page.

[–] [email protected] 2 points 3 days ago (1 children)

I just set up Anubis today. Specifically I'm only testing it for Lemmy-ui, and it seems to work fine.

It looks like the distributed waves that keep bringing the service down exclusively hit our lemmy-ui subdomain, so maybe non-SSR Photon is also a good defense, heh.

[–] [email protected] 3 points 3 days ago (1 children)

Hmm, that is odd. I guess I need to double check my Nginx config for lemmy-ui then. You have your setup documented somewhere?

[–] [email protected] 2 points 3 days ago

I don't think it should be a problem, but I'm not that sure either. Lemmy.fedi.zutto.fi also runs it and that's just a normal lemmy-ui installation. I think Zutto simply forwarded all traffic to Anubis and then fixed federation. There was some discussion and config shared in sopuli's Finnish Matrix room.

[–] [email protected] 2 points 3 days ago* (last edited 3 days ago)

I've previously worked in anti-scraping. There is a negative 0% chance the Lemmy devs have the resources to effectively do this without tanking the server for everyone else.