this post was submitted on 05 Jun 2023

Lemmy


With forewarning about a huge influx of users, you know Lemmy.ml will go down. Even if people go to https://join-lemmy.org/instances and disperse among the great instances there, the servers will go down.

Ruqqus had this issue too. Every time there was a mass exodus from Reddit, Ruqqus would go down, and hardly reap the rewards.

Even if it's not sustainable, I'd like to see Lemmy.ml drastically boost its server capacity, just for one month. If we can raise money as a community, what kind of server could we get for $100? $500? $1,000?

[email protected] · 15 points · 1 year ago (last edited 1 year ago)

Having looked at the code, and given the relatively small size of the data, I think there may be fundamental scaling issues with the site architecture. At this point, software development may be far more critical than hardware.

[email protected] · 5 points · 1 year ago

What are you seeing in the code that makes it hard to scale horizontally? I've never looked at Lemmy before, but I've gone through the steps of monolithic app -> Docker -> stateless app -> Kubernetes before, and as a user, I don't necessarily see the complexity (not saying it's not there, but I'm wondering what specifically in the site architecture prevents this transition).

[email protected] · 16 points · 1 year ago (last edited 1 year ago)

Right now it looks to me like Lemmy is built entirely around live, real-time queries of the SQL database. That may work when there are 100 postings a day and an active posting gets 80 comments, but it likely doesn't scale very well. You tend to have to evolve to a queue system where things like comments and votes are merged into the main database in more of a batch process (Reddit does this; you will see on its status page that comments and votes have their uptime tracked separately from the main website).
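As a rough illustration of the batching idea, here is a minimal Python sketch of a write-behind buffer (a simpler stand-in for a real message queue). It assumes a hypothetical `post_score` table in an in-memory SQLite database; it is not how Lemmy or Reddit actually implement this. Votes are accumulated as per-post deltas on the request path, and a background job would call `flush()` periodically to apply them in one transaction.

```python
import sqlite3
import threading
from collections import defaultdict


class VoteBuffer:
    """Collect vote deltas in memory and write them to the database in batches.

    Hypothetical sketch: the `post_score` table and its columns are
    assumptions for illustration, not Lemmy's actual schema.
    """

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.lock = threading.Lock()
        self.pending = defaultdict(int)  # post_id -> net vote delta

    def record_vote(self, post_id: int, delta: int) -> None:
        # Cheap in-memory update on the request path; no SQL round trip.
        with self.lock:
            self.pending[post_id] += delta

    def flush(self) -> int:
        """Apply all pending deltas in one transaction; return rows touched."""
        with self.lock:
            batch = list(self.pending.items())
            self.pending.clear()
        if not batch:
            return 0
        with self.conn:  # commits on success, rolls back on exception
            self.conn.executemany(
                "INSERT INTO post_score (post_id, score) VALUES (?, ?) "
                "ON CONFLICT(post_id) DO UPDATE SET score = score + excluded.score",
                batch,
            )
        return len(batch)


# Usage: 1,001 votes are applied in a single transaction touching 2 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post_score (post_id INTEGER PRIMARY KEY, score INTEGER NOT NULL)")
buf = VoteBuffer(conn)
for _ in range(1000):
    buf.record_vote(42, +1)
buf.record_vote(7, -1)
flushed = buf.flush()
scores = dict(conn.execute("SELECT post_id, score FROM post_score"))
print(flushed, scores)
```

The trade-off is that vote counts lag by up to one flush interval, and unflushed votes are lost if the process crashes; a durable queue avoids the latter at the cost of more moving parts.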

On the output side, it seems ideal to have all data live and up to the very instant, but that approach can fall over under load surges (which may be caused by a popular topic, not just an influx from the decline of Twitter or Reddit). To scale, you tend to have to make some compromises and reuse output: for example, an intermediate layer that regenerates a page at most once every 10 seconds, and only if there has been a new write (a vote or comment change) since the last render.
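A minimal Python sketch of such an intermediate layer might look like the following. `render` is a hypothetical stand-in for whatever expensive query and templating produces a page, and the clock is injectable so the behaviour can be shown without waiting. It is single-threaded and only illustrates the idea; it is not Lemmy's design.

```python
import time
from typing import Callable, Optional


class ThrottledPageCache:
    """Serve a cached page; regenerate it at most once per `min_interval`
    seconds, and only if a write has happened since the last render.

    Single-threaded sketch: a real server would need locking or an
    external cache shared by all app instances.
    """

    def __init__(self, render: Callable[[], str], min_interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.render = render  # hypothetical expensive page builder
        self.min_interval = min_interval
        self.clock = clock
        self.cached: Optional[str] = None
        self.rendered_at = float("-inf")
        self.dirty = False

    def mark_dirty(self) -> None:
        # Call on every write (vote, new comment, edit) that affects this page.
        self.dirty = True

    def get(self) -> str:
        now = self.clock()
        stale = self.dirty and now - self.rendered_at >= self.min_interval
        if self.cached is None or stale:
            # Clear before rendering so a write that lands mid-render keeps the page dirty.
            self.dirty = False
            self.cached = self.render()
            self.rendered_at = now
        return self.cached


# Usage with a fake clock so the timeline is deterministic.
fake_now = 0.0
renders = []


def render_page() -> str:
    renders.append(fake_now)
    return f"page v{len(renders)}"


cache = ThrottledPageCache(render_page, min_interval=10.0, clock=lambda: fake_now)
pages = [cache.get()]        # first request renders v1
cache.mark_dirty()           # a new comment arrives
fake_now = 3.0
pages.append(cache.get())    # still v1: under 10 s since the last render
fake_now = 12.0
pages.append(cache.get())    # v2: dirty and the interval has elapsed
fake_now = 30.0
pages.append(cache.get())    # still v2: no writes since the last render
print(pages)
```

However many requests arrive during a surge, the expensive render runs at most once per interval per page, and a quiet page is never regenerated at all.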

> don't necessarily see the complexity (not saying it's not there

It's the lack of complexity that's kind of the problem. Doing direct SQL queries gets you the latest data, but it becomes a big bottleneck. Again, what seemed to work fine with only 5,000 postings and 100,000 total comments in the database can start to seriously fall over once the database has accumulated 1,000 times that.
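One common way to stop read cost from growing with the data is to maintain aggregates at write time instead of recomputing them on every read. The Python sketch below assumes hypothetical `post` and `comment` tables in an in-memory SQLite database (not Lemmy's schema) and keeps a per-post `comment_count` current with triggers, so a page view reads one row rather than counting every stored comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, comment_count INTEGER NOT NULL DEFAULT 0);
CREATE TABLE comment (id INTEGER PRIMARY KEY, post_id INTEGER NOT NULL REFERENCES post(id), body TEXT);
-- Keep the aggregate current at write time so reads never have to count.
CREATE TRIGGER comment_insert AFTER INSERT ON comment
BEGIN
    UPDATE post SET comment_count = comment_count + 1 WHERE id = NEW.post_id;
END;
CREATE TRIGGER comment_delete AFTER DELETE ON comment
BEGIN
    UPDATE post SET comment_count = comment_count - 1 WHERE id = OLD.post_id;
END;
""")

conn.execute("INSERT INTO post (id) VALUES (1)")
conn.executemany("INSERT INTO comment (post_id, body) VALUES (?, ?)",
                 [(1, f"comment {i}") for i in range(5000)])
conn.execute("DELETE FROM comment WHERE id <= 10")  # remove 10 comments
conn.commit()


def live_count(post_id: int) -> int:
    # Read cost grows with the number of stored comments.
    return conn.execute("SELECT COUNT(*) FROM comment WHERE post_id = ?",
                        (post_id,)).fetchone()[0]


def cached_count(post_id: int) -> int:
    # A single primary-key lookup, however many comments exist.
    return conn.execute("SELECT comment_count FROM post WHERE id = ?",
                        (post_id,)).fetchone()[0]


print(live_count(1), cached_count(1))
```

This trades a small extra cost on every write for a constant-cost read; the two counts agree as long as every write goes through the triggered tables.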

[email protected] · 3 points · 1 year ago

Do you know of any resources about this, and/or how to implement it?
