this post was submitted on 25 Mar 2025
262 points (98.9% liked)

Announcements


In the last few weeks Lemmy has seen a lot of growth, with thousands of new users. To welcome them we are holding this AMA to answer questions from the community. You can ask about the beginnings of Lemmy, how we see the future of Lemmy, our long-term goals, what makes Lemmy different from Reddit, about the internet and social media in general, as well as personal questions.

We'd also like to hear your overall feedback on Lemmy: What are its greatest strengths and weaknesses? How would you improve it? What's something you wish it had? What can our community do to ensure that we keep pulling users away from US tech companies, and into the fediverse?

Lemmy and Reddit may look similar at first glance, but there is a major difference. While Reddit is a corporation with thousands of employees and billionaire investors, Lemmy is nothing but an open source project run by volunteers. It was started in 2019 by @dessalines and @nutomic, and has been a full-time job for them since 2020. For our income we are dependent on your donations, so please contribute if you can. We'd like to be able to add more full-time contributors to our co-op.

We will start answering questions from tomorrow (Wednesday). Besides @dessalines and @nutomic, other Lemmy contributors may also chime in to answer questions:

Here are our previous AMAs for those interested.

[–] seang96@spgrn.com 10 points 3 days ago (2 children)

From the server perspective, I have a question: what are your thoughts on horizontal scaling for the database? This seems to be the biggest limitation, and it requires higher-spec hardware to scale, especially for the bigger instances.

For my tiny instance, for example, I give over 20 GB of RAM to Postgres alone to make it perform efficiently enough.

[–] dessalines@lemmy.ml 11 points 2 days ago (3 children)

The way to solve the database problems isn't to keep throwing more and more money at powerful servers and scaling. It's to fix it at the root: Lemmy's unoptimized database.

@dullbananas has done invaluable work in making our DB better (and all of these improvements will be in 1.0), but I'm convinced that if we had even 1-2 more PostgreSQL experts do a pass over the DB, and ideally one full-time expert, all of these problems could be solved.

[–] Transform2942@lemmy.ml 5 points 2 days ago (1 children)

Does the project maintain a list of known slow queries? This is my favorite type of work

[–] dessalines@lemmy.ml 6 points 2 days ago (2 children)

The post list query is by far the worst offender. It needs to filter, sort, cursor paginate, and join to many tables, and indexes are hard to follow and keep up with.

What's more, the problems only surface with lots of historical data, meaning we can only really test the query plans with a fully populated DB.

All this requires running Lemmy locally and inspecting the Postgres query durations. We really need proper test suites (lemmy DB perf is one example) that can also stress-test production data.

Here is one historical issue:

[–] Transform2942@lemmy.ml 4 points 2 days ago* (last edited 2 days ago) (2 children)

Good evening Dessalines, I have started looking at the posts query.

The lowest-hanging fruit, I think, would be replacing some of the joins with WHERE EXISTS, which can have a huge impact on query time. It seems this is supported in Diesel: https://stackoverflow.com/a/74300447

This is my first time looking at the codebase so I can't tell yet which joins are purely for filtering (in which case they can be replaced by WHERE EXISTS) and which joins need to be left in because some of their columns end up in the final SELECT
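To make the join-versus-EXISTS idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Postgres, and the `post` and `community_follower` tables are hypothetical, not Lemmy's actual schema. It only shows that a join used purely for filtering can be rewritten as a semi-join that returns the same rows; any speedup would have to be confirmed against real query plans.

```python
import sqlite3

# Hypothetical two-table schema; SQLite stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, community_id INTEGER, title TEXT);
    CREATE TABLE community_follower (community_id INTEGER, person_id INTEGER);
    INSERT INTO post VALUES (1, 10, 'a'), (2, 10, 'b'), (3, 20, 'c');
    INSERT INTO community_follower VALUES (10, 1), (10, 2), (20, 2);
""")

# Join used only for filtering: every matching follower row produces an
# output row, so DISTINCT is needed to collapse the duplicates.
join_sql = """
    SELECT DISTINCT p.id FROM post p
    JOIN community_follower f ON f.community_id = p.community_id
    WHERE f.person_id IN (1, 2)
    ORDER BY p.id
"""

# Same filter as a semi-join: no columns of f are selected, and the
# subquery can stop at the first matching row for each post.
exists_sql = """
    SELECT p.id FROM post p
    WHERE EXISTS (
        SELECT 1 FROM community_follower f
        WHERE f.community_id = p.community_id AND f.person_id IN (1, 2)
    )
    ORDER BY p.id
"""

join_ids = [row[0] for row in conn.execute(join_sql)]
exists_ids = [row[0] for row in conn.execute(exists_sql)]
```

The rewrite only applies to joins whose columns never reach the final SELECT, which is exactly the distinction described above.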

I can't tell for sure yet, but it looks like this might also be using LIMIT...OFFSET pagination? That can be a real drag on performance but isn't as easy to fix.
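For reference, the two pagination styles can be compared in a small sketch. It again uses sqlite3 in place of Postgres with a hypothetical `post(id, score)` table; which style Lemmy's post list query actually uses is not something this sketch assumes. OFFSET makes the database walk past all skipped rows on every request, while a cursor (keyset) query resumes from the last row seen using the sort columns.

```python
import sqlite3

PAGE_SIZE = 5
ORDER = "ORDER BY score DESC, id DESC"

# Hypothetical table: 20 posts with scores 0..4; SQLite stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, score INTEGER)")
conn.executemany("INSERT INTO post VALUES (?, ?)", [(i, i % 5) for i in range(1, 21)])
conn.execute("CREATE INDEX idx_post_score_id ON post (score DESC, id DESC)")

def page_by_offset(page_no):
    """OFFSET pagination: skipped rows are still read and discarded."""
    sql = f"SELECT id, score FROM post {ORDER} LIMIT ? OFFSET ?"
    return conn.execute(sql, (PAGE_SIZE, page_no * PAGE_SIZE)).fetchall()

def page_by_cursor(cursor=None):
    """Keyset pagination: cursor is (score, id) of the last row already seen."""
    if cursor is None:
        sql = f"SELECT id, score FROM post {ORDER} LIMIT ?"
        return conn.execute(sql, (PAGE_SIZE,)).fetchall()
    last_score, last_id = cursor
    sql = (
        "SELECT id, score FROM post "
        "WHERE score < ? OR (score = ? AND id < ?) "
        f"{ORDER} LIMIT ?"
    )
    return conn.execute(sql, (last_score, last_score, last_id, PAGE_SIZE)).fetchall()

def all_rows_by_cursor():
    """Walk every page by feeding each page's last row back in as the cursor."""
    rows, cursor = [], None
    while True:
        page = page_by_cursor(cursor)
        if not page:
            return rows
        rows.extend(page)
        last_id, last_score = page[-1]
        cursor = (last_score, last_id)
```

The `id` tiebreaker matters: without a unique final sort column, rows with equal scores could be skipped or repeated across pages.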

EDIT:

After looking some more and reading some of the linked GitHub discussion, I think getting this out of the performance pits will really require some denormalization, like a materialized view or manual cache tables populated by triggers. I really like the ranking algorithm, but so far I'm finding it difficult to optimize from a query perspective.
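A trigger-populated cache table of the kind mentioned here could look roughly like the following sketch. The table and trigger names are hypothetical, and it uses SQLite trigger syntax via Python's sqlite3; Postgres triggers are written differently (a trigger function plus CREATE TRIGGER), so treat this as an illustration of the idea only, not as Lemmy's schema.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE post_vote (post_id INTEGER, person_id INTEGER, value INTEGER);

    -- Denormalized cache: one precomputed score per post.
    CREATE TABLE post_score_cache (
        post_id INTEGER PRIMARY KEY,
        score INTEGER NOT NULL DEFAULT 0
    );

    -- Keep the cache in sync on every write instead of aggregating on read.
    CREATE TRIGGER post_insert AFTER INSERT ON post BEGIN
        INSERT INTO post_score_cache (post_id, score) VALUES (NEW.id, 0);
    END;
    CREATE TRIGGER vote_insert AFTER INSERT ON post_vote BEGIN
        UPDATE post_score_cache SET score = score + NEW.value
        WHERE post_id = NEW.post_id;
    END;
    CREATE TRIGGER vote_delete AFTER DELETE ON post_vote BEGIN
        UPDATE post_score_cache SET score = score - OLD.value
        WHERE post_id = OLD.post_id;
    END;
""")

conn.executemany("INSERT INTO post VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany(
    "INSERT INTO post_vote VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 1), (2, 1, -1), (1, 3, -1)],
)
conn.execute("DELETE FROM post_vote WHERE post_id = 1 AND person_id = 3")

# Read path: a single-table lookup, no join or aggregation.
cached = conn.execute(
    "SELECT post_id, score FROM post_score_cache ORDER BY score DESC"
).fetchall()

# The same result computed the slow way, for comparison.
recomputed = conn.execute("""
    SELECT p.id, COALESCE(SUM(v.value), 0) AS score
    FROM post p LEFT JOIN post_vote v ON v.post_id = p.id
    GROUP BY p.id ORDER BY score DESC
""").fetchall()
```

The trade-off is that every vote now costs an extra write, in exchange for list queries that can sort on an indexed column of a single table.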

[–] dessalines@lemmy.ml 2 points 1 day ago (1 children)

This is helpful. Could you make a github issue and copy-paste this there? Thx.

[–] Transform2942@lemmy.ml 3 points 1 day ago (1 children)
[–] Blaze@lemmy.dbzer0.com 2 points 1 day ago (1 children)
[–] ademir@lemmy.eco.br 1 points 19 hours ago (1 children)

lol! i love your inputs hahaha

[–] Blaze@lemmy.dbzer0.com 2 points 17 hours ago (1 children)
[–] ademir@lemmy.eco.br 2 points 17 hours ago

I love it, thanks!!

[–] Blaze@lemmy.dbzer0.com 2 points 2 days ago

Sounds promising

[–] Transform2942@lemmy.ml 6 points 2 days ago (1 children)

I should have some time tonight to start looking at this. Thanks for the info!

[–] Blaze@lemmy.dbzer0.com 2 points 2 days ago

Thank you in advance!

[–] seang96@spgrn.com 3 points 2 days ago (1 children)

I 100% agree with this, and there have been great strides since I started using Lemmy around v0.17! That said, at some point optimization will bring diminishing returns for more effort, and once a community grows extensively it might not be enough. So I was curious what you were thinking for that point: something like Citus for sharding Postgres?

[–] dessalines@lemmy.ml 5 points 2 days ago

I'm sure we're nowhere near that level yet. We haven't come close to postgres's limits, and most of our bottlenecks are unoptimized queries.

[–] Blaze@lemmy.dbzer0.com 1 points 2 days ago (1 children)

That's a very interesting point, have you tried asking for support on !lemmy@lemmy.ml or other general communities? There are probably a few Postgres experts on the platform

[–] dessalines@lemmy.ml 4 points 2 days ago

We've asked for help various times, but don't usually get much help. Despite the seemingly large number of "experts" out there, only a tiny number of them contribute to open source. I'd still consider it mostly a wasteland, with a few people doing the work that should be done by 100x their number.

[–] nutomic@lemmy.ml 7 points 2 days ago (1 children)

20 GB RAM for a single user instance sounds like a lot. Did you use pgtune? It may also help to run a reindex or full vacuum.

[–] seang96@spgrn.com 2 points 2 days ago (1 children)

Yeah, I used pgtune as a base and found that more memory needed to be assigned in certain spots, especially to keep up federation with bigger instances. Otherwise timeouts would occur, leaving my instance constantly behind.

That said, I read that Postgres 17 is much more memory efficient, though I have yet to move my Lemmy database to it since it's the largest haha.

[–] nutomic@lemmy.ml 4 points 2 days ago (1 children)

Maybe your disk is too slow, or latency between Lemmy and Postgres is too high?

[–] seang96@spgrn.com 3 points 2 days ago (1 children)

It is a k8s cluster using Ceph for all of my storage, so I bet the latency from that is the biggest reason, and upping the memory offsets the disk writes. I also have another Postgres DB syncing as a fallback for high availability. Fortunately, after tuning the database and giving it enough RAM, my instance has been running pretty stably for over a year without any changes.

I am also using less powerful computers for the entire infrastructure (not server grade), which brings me back to horizontal scaling of the database. I imagine it will be a growing need as instances, communities, and users grow, since it can be cheaper to run multiple lower-spec servers than a single large one, with the added benefit of high availability.

[–] nutomic@lemmy.ml 6 points 2 days ago (1 children)

Postgres supports sharding which should work without any changes in Lemmy. But so far not even lemmy.world needs that. There are also read replicas which would require support directly in Lemmy afaik. Such a feature will surely be added as instances grow bigger over time and need more resources.
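As a rough illustration of why read replicas need application support, here is a minimal routing sketch. The `ConnectionRouter` class and the connection names are hypothetical and not part of Lemmy; real code would hold database connections or pools instead of strings, and would also have to deal with replication lag.

```python
from itertools import cycle

class ConnectionRouter:
    """Hypothetical router: reads go to replicas, everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; fall back to the primary if there are none.
        self._replicas = cycle(replicas) if replicas else None

    def for_statement(self, sql, in_transaction=False):
        # Simplified heuristic: treat only plain SELECT statements as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        # Reads inside a transaction stay on the primary so they can see
        # that transaction's own uncommitted writes.
        if is_read and not in_transaction and self._replicas is not None:
            return next(self._replicas)
        return self.primary

# Strings stand in for real connections in this sketch.
router = ConnectionRouter("primary", ["replica-1", "replica-2"])
```

With this setup, `router.for_statement("SELECT * FROM post")` alternates between the replicas, while an INSERT or UPDATE is always sent to the primary.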

[–] seang96@spgrn.com 3 points 2 days ago

I didn't think of using read-only replicas. That would probably be a very good way to go, since probably 80%+ of actions are reads. Thanks for answering, I am excited to see how Lemmy grows, and thanks for all the devs' hard work!