this post was submitted on 25 Mar 2025
265 points (98.9% liked)

Announcements

Official announcements from the Lemmy project. Subscribe to this community or add it to your RSS reader in order to be notified about new releases and important updates.

You can also find major news on join-lemmy.org

founded 5 years ago

In the last few weeks Lemmy has seen a lot of growth, with thousands of new users. To welcome them we are holding this AMA to answer questions from the community. You can ask about the beginnings of Lemmy, how we see its future, our long-term goals, what makes Lemmy different from Reddit, the internet and social media in general, as well as personal questions.

We'd also like to hear your overall feedback on Lemmy: What are its greatest strengths and weaknesses? How would you improve it? What's something you wish it had? What can our community do to ensure that we keep pulling users away from US tech companies, and into the fediverse?

Lemmy and Reddit may look similar at first glance, but there is a major difference. While Reddit is a corporation with thousands of employees and billionaire investors, Lemmy is nothing but an open source project run by volunteers. It was started in 2019 by @dessalines and @nutomic, and has been a full-time job for them since 2020. For our income we are dependent on your donations, so please contribute if you can. We'd like to be able to add more full-time contributors to our co-op.

We will start answering questions from tomorrow (Wednesday). Besides @dessalines and @nutomic, other Lemmy contributors may also chime in to answer questions:

Here are our previous AMAs for those interested.

[–] [email protected] 11 points 5 days ago (3 children)

The way to solve the database problems isn't to keep throwing more and more money at powerful servers and scaling. It's to fix it at the root: Lemmy's unoptimized database.

@dullbananas has done invaluable work in making our DB better (and all of these improvements will be in 1.0), but I'm convinced that if we had even 1-2 more PostgreSQL experts do a pass over the DB, and ideally one full-time expert, all of these problems could be solved.

[–] [email protected] 5 points 5 days ago (1 children)

Does the project maintain a list of known slow queries? This is my favorite type of work

[–] [email protected] 6 points 5 days ago (2 children)

The post list query is by far the worst offender. It needs to filter, sort, cursor paginate, and join to many tables, and indexes are hard to follow and keep up with.

What's more is that the problems only surface with lots of historical data, meaning we can only really test the query plans with a fully populated DB.

All this requires running lemmy locally, and inspecting the postgres query durations. We really need proper test suites (lemmy DB perf is one example) that can stress-test production data also.
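To make the "test the plan against a populated DB" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 and SQLite's EXPLAIN QUERY PLAN as a stand-in for Postgres and EXPLAIN ANALYZE. The `post` table, its columns, the index name and the row counts are all made up for illustration and are not Lemmy's actual schema.

```python
import random
import sqlite3

# Hypothetical, heavily simplified table; not Lemmy's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post ("
    " id INTEGER PRIMARY KEY,"
    " community_id INTEGER NOT NULL,"
    " published INTEGER NOT NULL)"
)

# Fill the table with synthetic rows so the planner has real data to work on.
rng = random.Random(0)
conn.executemany(
    "INSERT INTO post (community_id, published) VALUES (?, ?)",
    [(rng.randrange(100), rng.randrange(1_000_000)) for _ in range(20_000)],
)

QUERY = (
    "SELECT id FROM post WHERE community_id = ? "
    "ORDER BY published DESC LIMIT 20"
)


def plan(sql, params):
    """Return the human-readable detail column of each plan step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


plan_before = plan(QUERY, (42,))

# Add an index matching the filter and sort order, then look at the plan again.
conn.execute(
    "CREATE INDEX idx_post_community_published "
    "ON post (community_id, published)"
)
plan_after = plan(QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

On Postgres the same before/after comparison would be done with EXPLAIN ANALYZE against a copy of real data, since the planner's choices depend on table size and statistics.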

Here is one historical issue:

[–] [email protected] 6 points 5 days ago (1 children)

I should have some time tonight to start looking at this. Thanks for the info!

[–] [email protected] 2 points 5 days ago

Thank you in advance!

[–] [email protected] 4 points 4 days ago* (last edited 4 days ago) (2 children)

Good evening Dessalines, I have started looking at the posts query.

The lowest hanging fruit I think would be if we could replace some of the joins with WHERE EXISTS which can have a huge impact on the query time. It seems this is supported in Diesel: https://stackoverflow.com/a/74300447

This is my first time looking at the codebase so I can't tell yet which joins are purely for filtering (in which case they can be replaced by WHERE EXISTS) and which joins need to be left in because some of their columns end up in the final SELECT
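A minimal sketch of the join vs. WHERE EXISTS difference, using Python's built-in sqlite3 as a stand-in for Postgres. The `post` and `post_tag` tables are hypothetical and only there to show the shape of the rewrite; they are not taken from Lemmy's schema.

```python
import sqlite3

# Hypothetical tables for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE post_tag (
        post_id INTEGER NOT NULL REFERENCES post(id),
        tag TEXT NOT NULL
    );
    INSERT INTO post (id, title) VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO post_tag (post_id, tag) VALUES (1, 'rust'), (1, 'sql'), (3, 'sql');
""")

# Join used purely as a filter: post 1 matches two tag rows, so it is
# returned twice and would need DISTINCT or GROUP BY to clean up.
join_rows = conn.execute("""
    SELECT p.id
    FROM post p
    JOIN post_tag t ON t.post_id = p.id
    WHERE t.tag IN ('rust', 'sql')
    ORDER BY p.id
""").fetchall()

# Same filter as a semi-join: each post is returned at most once, and the
# database can stop probing post_tag after the first match per post.
exists_rows = conn.execute("""
    SELECT p.id
    FROM post p
    WHERE EXISTS (
        SELECT 1 FROM post_tag t
        WHERE t.post_id = p.id AND t.tag IN ('rust', 'sql')
    )
    ORDER BY p.id
""").fetchall()

print(join_rows)    # post 1 appears twice
print(exists_rows)  # each matching post appears once
```

This only applies to joins whose columns never reach the final SELECT; a join that contributes output columns has to stay a join.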

I can't tell for sure yet, but it looks like this might also be using LIMIT...OFFSET pagination? That can be a real drag on performance but isn't as easy to fix.
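For anyone unfamiliar with the difference, here is a minimal sketch comparing LIMIT...OFFSET with cursor (keyset) pagination, again using Python's sqlite3 in place of Postgres. The `post` table, the `score` column and the page size are assumptions made up for the example.

```python
import sqlite3

# Hypothetical table: 50 posts with a score that has many ties.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, score INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO post (id, score) VALUES (?, ?)",
    [(i, i % 7) for i in range(1, 51)],
)
conn.execute("CREATE INDEX idx_post_score_id ON post (score DESC, id DESC)")

PAGE_SIZE = 10


def page_by_offset(page_number):
    """OFFSET pagination: the database still walks past every skipped row."""
    return conn.execute(
        "SELECT id, score FROM post "
        "ORDER BY score DESC, id DESC LIMIT ? OFFSET ?",
        (PAGE_SIZE, page_number * PAGE_SIZE),
    ).fetchall()


def page_by_cursor(cursor=None):
    """Keyset pagination: resume right after the last (score, id) seen."""
    if cursor is None:
        return conn.execute(
            "SELECT id, score FROM post "
            "ORDER BY score DESC, id DESC LIMIT ?",
            (PAGE_SIZE,),
        ).fetchall()
    last_score, last_id = cursor
    return conn.execute(
        "SELECT id, score FROM post "
        "WHERE score < ? OR (score = ? AND id < ?) "
        "ORDER BY score DESC, id DESC LIMIT ?",
        (last_score, last_score, last_id, PAGE_SIZE),
    ).fetchall()


# Walk the whole table both ways and collect the rows in order.
offset_rows = [row for n in range(5) for row in page_by_offset(n)]

cursor_rows = []
cursor = None
while True:
    rows = page_by_cursor(cursor)
    if not rows:
        break
    cursor_rows.extend(rows)
    last_id, last_score = rows[-1]
    cursor = (last_score, last_id)
```

Both walks return the same rows in the same order. The difference is that the cursor version can seek straight into the index for each page, while the cost of OFFSET grows with how deep the page is. The tie-breaker on `id` matters: without a unique last column, rows with equal scores could be skipped or repeated across pages.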

EDIT:

Looking some more, and reading some linked github discussion - I think to really get this out of the performance pits will require some denormalization like a materialized view or manual cache tables populated by triggers. I really like the ranking algorithm but so far I'm finding it difficult to optimize from a query perspective
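A minimal sketch of the "manual cache table populated by triggers" idea, using Python's sqlite3 as a stand-in for Postgres. The table names, trigger names and the plain vote-sum score are hypothetical; Lemmy's actual ranking and schema are more involved.

```python
import sqlite3

# Hypothetical schema: raw votes plus a denormalized per-post score cache.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE post_vote (
        post_id INTEGER NOT NULL REFERENCES post(id),
        value INTEGER NOT NULL
    );
    CREATE TABLE post_score_cache (
        post_id INTEGER PRIMARY KEY REFERENCES post(id),
        score INTEGER NOT NULL DEFAULT 0
    );

    -- Every new post gets a cache row with a zero score.
    CREATE TRIGGER post_insert_cache AFTER INSERT ON post
    BEGIN
        INSERT INTO post_score_cache (post_id, score) VALUES (NEW.id, 0);
    END;

    -- Keep the cached score in step with vote inserts and deletes.
    CREATE TRIGGER vote_insert_cache AFTER INSERT ON post_vote
    BEGIN
        UPDATE post_score_cache SET score = score + NEW.value
        WHERE post_id = NEW.post_id;
    END;

    CREATE TRIGGER vote_delete_cache AFTER DELETE ON post_vote
    BEGIN
        UPDATE post_score_cache SET score = score - OLD.value
        WHERE post_id = OLD.post_id;
    END;
""")

conn.executemany("INSERT INTO post (id, title) VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany(
    "INSERT INTO post_vote (post_id, value) VALUES (?, ?)",
    [(1, 1), (1, 1), (2, -1), (1, -1)],
)
conn.execute("DELETE FROM post_vote WHERE post_id = 2")

# Cheap read path: the listing query only touches the cache table.
cached = conn.execute(
    "SELECT post_id, score FROM post_score_cache ORDER BY post_id"
).fetchall()

# Expensive path the cache replaces: aggregate the votes on every read.
recomputed = conn.execute("""
    SELECT p.id, COALESCE(SUM(v.value), 0)
    FROM post p
    LEFT JOIN post_vote v ON v.post_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()
```

The trade-off is that writes get slower and the triggers have to cover every way the source rows can change (vote updates are left out here), in exchange for a listing query that no longer has to join and aggregate.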

[–] [email protected] 2 points 4 days ago (1 children)

This is helpful. Could you make a github issue and copy-paste this there? Thx.

[–] [email protected] 3 points 4 days ago (1 children)
[–] [email protected] 2 points 3 days ago (1 children)
[–] [email protected] 1 points 3 days ago (1 children)

lol! i love your inputs hahaha

[–] [email protected] 2 points 3 days ago (1 children)
[–] [email protected] 2 points 3 days ago

I love it, thanks!!

[–] [email protected] 2 points 4 days ago

Sounds promising

[–] [email protected] 3 points 5 days ago (1 children)

I 100% agree with this, and there have been great strides since I started using Lemmy around v0.17! That said, at some point optimization will bring diminishing returns for more effort, and once a community grows large enough it might not be enough on its own. So I was curious what you guys were thinking for that point: something like Citus for sharding Postgres?

[–] [email protected] 5 points 5 days ago

I'm sure we're nowhere near that level yet. We haven't come close to postgres's limits, and most of our bottlenecks are unoptimized queries.

[–] [email protected] 1 points 5 days ago (1 children)

That's a very interesting point. Have you tried asking for support on [email protected] or other general communities? There are probably a few Postgres experts on the platform.

[–] [email protected] 4 points 5 days ago

We've asked for help various times, but don't usually get much help. Despite the seemingly large number of "experts" out there, only a tiny number of them contribute to open source. I'd still consider it mostly a wasteland, with a few people doing the work that should be done by 100x their number.