In recent weeks Lemmy has seen a lot of growth, with thousands of new users. To welcome them we are holding this AMA to answer questions from the community. You can ask about the beginnings of Lemmy, how we see the future of Lemmy, our long-term goals, what makes Lemmy different from Reddit, about the internet and social media in general, as well as personal questions.
We'd also like to hear your overall feedback on Lemmy: What are its greatest strengths and weaknesses? How would you improve it? What's something you wish it had? What can our community do to ensure that we keep pulling users away from US tech companies, and into the fediverse?
Lemmy and Reddit may look similar at first glance, but there is a major difference. While Reddit is a corporation with thousands of employees and billionaire investors, Lemmy is nothing but an open source project run by volunteers. It was started in 2019 by @dessalines and @nutomic, and has been a full-time job since 2020. For our income we are dependent on your donations, so please contribute if you can. We'd like to be able to add more full-time contributors to our co-op.
We will start answering questions from tomorrow (Wednesday). Besides @dessalines and @nutomic, other Lemmy contributors may also chime in to answer questions:
Here are our previous AMAs for those interested.
Does the project maintain a list of known slow queries? This is my favorite type of work
The post list query is by far the worst offender. It needs to filter, sort, cursor paginate, and join to many tables, and indexes are hard to follow and keep up with.
What's more, the problems only surface with lots of historical data, meaning we can only really test the query plans with a fully populated DB.
All this requires running Lemmy locally and inspecting the Postgres query durations. We really need proper test suites (lemmy DB perf is one example) that can also stress-test production data.
Here is one historical issue:
I should have some time tonight to start looking at this. Thanks for the info!
Thank you in advance!
Good evening Dessalines, I have started looking at the posts query.
The lowest hanging fruit I think would be if we could replace some of the joins with WHERE EXISTS, which can have a huge impact on the query time. It seems this is supported in Diesel: https://stackoverflow.com/a/74300447
This is my first time looking at the codebase so I can't tell yet which joins are purely for filtering (in which case they can be replaced by WHERE EXISTS) and which joins need to be left in because some of their columns end up in the final SELECT.
I can't tell for sure yet but it also looks like this might be using LIMIT...OFFSET pagination? That can be a real drag on performance but isn't as easy to fix.
EDIT:
Looking some more, and reading some linked github discussion - I think to really get this out of the performance pits will require some denormalization like a materialized view or manual cache tables populated by triggers. I really like the ranking algorithm but so far I'm finding it difficult to optimize from a query perspective
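For anyone following along, here is a rough, self-contained sketch of the two ideas above: a filter-only JOIN versus WHERE EXISTS, and LIMIT...OFFSET versus cursor (keyset) pagination. It runs against an in-memory SQLite database with a made-up two-table schema (post, post_tag). None of these names or columns come from Lemmy's actual schema, and Postgres will plan these queries differently, so treat it as an illustration of the technique only.

```python
import sqlite3

# Hypothetical toy schema, NOT Lemmy's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, score INTEGER);
    CREATE TABLE post_tag (post_id INTEGER, tag TEXT);
    INSERT INTO post VALUES (1, 'a', 30), (2, 'b', 20), (3, 'c', 10), (4, 'd', 5);
    INSERT INTO post_tag VALUES (1, 'rust'), (1, 'sql'), (2, 'sql'), (4, 'rust');
""")

# 1) JOIN used purely as a filter: post 1 matches two tag rows,
#    so it comes back twice and would need DISTINCT or GROUP BY.
joined = conn.execute("""
    SELECT p.id FROM post p
    JOIN post_tag t ON t.post_id = p.id
    WHERE t.tag IN ('rust', 'sql')
    ORDER BY p.score DESC, p.id DESC
""").fetchall()

# 2) Same filter as WHERE EXISTS: each post appears at most once, and the
#    database only has to find one matching tag row per post.
exists = conn.execute("""
    SELECT p.id FROM post p
    WHERE EXISTS (
        SELECT 1 FROM post_tag t
        WHERE t.post_id = p.id AND t.tag IN ('rust', 'sql')
    )
    ORDER BY p.score DESC, p.id DESC
""").fetchall()

# 3) LIMIT...OFFSET pagination: the database still walks past the skipped rows.
offset_page2 = conn.execute("""
    SELECT id, score FROM post
    ORDER BY score DESC, id DESC
    LIMIT 2 OFFSET 2
""").fetchall()

# 4) Cursor (keyset) pagination: remember the last row of page 1 and
#    ask only for rows that sort strictly after it.
page1 = conn.execute("""
    SELECT id, score FROM post
    ORDER BY score DESC, id DESC
    LIMIT 2
""").fetchall()
last_id, last_score = page1[-1]
keyset_page2 = conn.execute("""
    SELECT id, score FROM post
    WHERE score < ? OR (score = ? AND id < ?)
    ORDER BY score DESC, id DESC
    LIMIT 2
""", (last_score, last_score, last_id)).fetchall()

print("join:  ", joined)
print("exists:", exists)
print("offset page 2:", offset_page2)
print("keyset page 2:", keyset_page2)
```

The EXISTS form only works for joins whose columns are not needed in the final SELECT, which is exactly the part that still has to be checked against the real query.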
This is helpful. Could you make a github issue and copy-paste this there? Thx.
Done: https://github.com/LemmyNet/lemmy/issues/5555
Nice number
lol! i love your inputs hahaha
Trying to keep it fun ha ha
I love it, thanks!!
Sounds promising