this post was submitted on 12 Jun 2023
22 points (100.0% liked)

Technology

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 2 years ago

What would happen if instead of users swarming existing servers when a fediverse service was put in the spotlight, each user spun up their own micro-instance and tried to federate with existing servers?

There's always the odd person who decides to host a personal fediverse service in their homelab for themselves, but would the fediverse work if that was actually the primary mode of interaction? Or would it fail in a similar way to now where the servers which receive the most federation requests need to scale up?

Presumably the failure modes for federation are easier to scale than browser requests since it's an async process.

top 26 comments
[–] [email protected] 9 points 1 year ago (3 children)

Possibly failure, because setup isn't just a simple out-of-box plop. And I can't see how pings from 5000 microservers are better than 5000 users looking to register? But that's more of a question than an informed opinion.

[–] [email protected] 2 points 1 year ago

You also have to account for another type of "ping" if a user lives in a cave 300 meters below sea level.

[–] [email protected] 2 points 1 year ago (3 children)

Maybe I should clarify with "each user successfully spun up..." I'm mostly curious if the 5000 microservers trying to federate is a more sustainable access pattern than 5000 users hitting the website.

Since federation is an async process, it can be optimized on both ends in a way that user browser requests cannot.

At the same time, federation would overall result in more bandwidth being used because not every user wants to view every post in the frontend.
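One way to picture the "async can be optimized" point: outgoing activities can sit in a queue and be drained by a small pool of workers at whatever pace suits the sender and receiver. This is only a sketch under assumptions: the worker count, the `activity-N` strings, and the `asyncio.sleep` stand-in for an HTTP delivery are all made up for illustration and are not how Lemmy implements federation.

```python
import asyncio


async def delivery_worker(queue, delivered):
    """Drain the queue forever; the sleep stands in for a real HTTP POST."""
    while True:
        activity = await queue.get()
        await asyncio.sleep(0)  # hypothetical network call
        delivered.append(activity)
        queue.task_done()


async def main():
    queue = asyncio.Queue()
    delivered = []
    # Three workers is an arbitrary choice for the sketch.
    workers = [asyncio.create_task(delivery_worker(queue, delivered)) for _ in range(3)]
    for i in range(10):
        queue.put_nowait(f"activity-{i}")
    await queue.join()  # wait until every queued activity has been handled
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return delivered


delivered = asyncio.run(main())
print(len(delivered))
```

A browser request has to be answered while the user waits; a queued delivery like this can be delayed, retried, or batched without anyone noticing.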

[–] [email protected] 5 points 1 year ago

Maybe I should clarify with “each user successfully spun up…” I’m mostly curious if the 5000 microservers trying to federate is a more sustainable access pattern than 5000 users hitting the website.

Sustainable in what sense?

It's way more sustainable in the sense of "one website is not controlling the entirety of the experience of a given type of service for 5000 users", for example. I think it's important to talk about specific kinds of sustainability, and specific threats to it.

Things to consider (apart from bandwidth-related considerations):

  • technical knowledge necessary to safely and securely run and maintain a service
  • space, time, and resources (including financial) to do so
  • ability, willingness, and energy to moderate a service (this is where Big Tech platforms are falling flat on their faces, for example, and where smaller fedi communities work pretty damn well)
[–] [email protected] 3 points 1 year ago

But instance federation is an async process that is happening constantly. A user on your instance may be a realtime load, but it's only sporadic (on a per-user basis). Basically, me spinning up an instance is a constant burden on the network, but me browsing is just a temporary load on a single server.

My understanding is that the best situation is a good number of powerful machines running instances, with users evenly distributed amongst them.

[–] [email protected] 0 points 1 year ago (1 children)

That Ansible playbook works great; it's just a bash script away from regular user DIY.

I've watched people who never used a computer install blockchain nodes and miners (including the networks). If someone wants to do it, they WILL figure it out.

[–] [email protected] 0 points 1 year ago (1 children)

Sure, I'm not saying they won't. I'm saying there aren't that many people who 'want' to go beyond the effort of clicking install.

[–] [email protected] 1 points 1 year ago (1 children)

My point is mainly that we are that close already. The Ansible setup already boils it down to the bare minimum. It's down to platform testing and building an installer.

[–] [email protected] 1 points 1 year ago

Next step is to spin up a cloud service which does all that for you, leaving you to just input a credit card and configure DNS correctly.

[–] [email protected] 5 points 1 year ago (1 children)

I don't think so. Take the [email protected] community as an example. It can have, say, 1000 subscribers from lemmy.ml but only needs to send content to lemmy.ml once as it comes in. All 1000 subscribers see the cached copy from lemmy.ml, and a message is only sent back to beehaw.org for comments, votes, etc. With everyone having their own instance, beehaw.org would have to send updates to each one instead of sending an update to one instance and 1000 users seeing it. A good level to strive for is many small communities of, say, a few thousand users (1-5 thousand or so). That way no single server gets too massive, but federation requests aren't overwhelming instances either.
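To put numbers on that comparison, here is a rough sketch that counts how many outbound deliveries a community's home server makes for one activity, assuming exactly one delivery per distinct subscriber instance. The handles and host names are invented for illustration.

```python
def deliveries_for(subscriber_handles):
    """Count distinct instances among 'user@instance' handles.

    Assumption: the home server sends one copy of each activity per
    subscribed instance, no matter how many subscribers live there.
    """
    instances = {handle.rsplit("@", 1)[1] for handle in subscriber_handles}
    return len(instances)


# 1000 subscribers sharing one instance vs. 1000 self-hosted subscribers.
shared = [f"user{i}@lemmy.ml" for i in range(1000)]
solo = [f"user{i}@host{i}.example" for i in range(1000)]

print(deliveries_for(shared))  # 1
print(deliveries_for(solo))    # 1000
```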

[–] [email protected] 0 points 1 year ago (1 children)

So let's say we want to scale up to several million users - what would that look like?

[–] [email protected] 2 points 1 year ago

Well, if we wanted, say, 50 million users at 5000 users per instance, we would have 10,000 instances. If we wanted 1 billion users, we would have 200 thousand instances.
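The arithmetic is easy to re-run for other targets. A minimal helper, assuming instances are filled evenly (the function name is just for illustration):

```python
def instances_needed(total_users, users_per_instance):
    """Instances required if every instance holds at most users_per_instance."""
    # Ceiling division, so a partly filled instance still counts as one.
    return -(-total_users // users_per_instance)


print(instances_needed(50_000_000, 5000))     # 10000
print(instances_needed(1_000_000_000, 5000))  # 200000
```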

[–] [email protected] 4 points 1 year ago (2 children)

What you're describing is no longer federation but full P2P. From a purely technical point of view, it may work, but the biggest problem will be abuse (spam, excessive resource use, illegal content). When a new instance shows up, how do you know if it's a spammer or not? And if an instance is blocked by another instance, whose side should you be on?

[–] [email protected] 1 points 1 year ago

I hadn't even thought of the moderating yet.

[–] [email protected] 1 points 1 year ago

It wouldn't really be full P2P: I'd expect moderated communities to act as a funnel through which everyone interacts with each other. I wasn't really considering the hypothetical micro-instances to be like a normal server, since even when federated it's unlikely that they would consume as much federation bandwidth as a large instance. Most people wouldn't run a community, simply because they don't want to moderate it.

Realistically, the abuse problems you mention can already currently happen if someone wants to. It's easier to make an account on an existing server with a fresh email, spam a bit, and get banned than it is to register a new domain ($) and federate before doing the same. I think social networks would have a lot less spam if every time you wanted to send an abusive message, you had to spend $10 to burn a domain name.

Most of the content would still live on larger servers, so you end up moderating in the same place. Not much difference between banning an abusive user on your instance and banning an abusive single-user instance.

[–] [email protected] 2 points 1 year ago (1 children)

The way ActivityPub works is that each community has a list of every server that has at least one subscriber to that community.

Every time someone does something in that community, the community sends all those servers a message that tells them what just happened.

So instead of informing a few hundred servers of your one upvote on a post, it would basically have to inform every user (that is, every user's server).

It would be bad, it's not designed to do that.
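The mechanism described in this comment can be sketched as a tiny data structure: a set of subscribed servers plus a fan-out step. This is an illustration of the description only, not Lemmy's actual code; the `Community` class, its methods, and the handles are hypothetical.

```python
class Community:
    """Toy model: a community remembers which servers have subscribers."""

    def __init__(self, name):
        self.name = name
        self.subscribed_servers = set()

    def subscribe(self, handle):
        # Only the server part matters; a second subscriber from the
        # same server does not add a new entry.
        self.subscribed_servers.add(handle.rsplit("@", 1)[1])

    def fan_out(self, activity):
        # One outgoing message per subscribed server.
        return [(server, activity) for server in sorted(self.subscribed_servers)]


tech = Community("technology")
for handle in ["alice@lemmy.ml", "bob@lemmy.ml", "carol@solo.example"]:
    tech.subscribe(handle)

for server, activity in tech.fan_out("upvote on post 1"):
    print(server, activity)
```

Three subscribers produce two messages here because two of them share a server; with one server per user, the list grows with every subscriber.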

[–] [email protected] 1 points 1 year ago (1 children)

So you're saying that there's a sweet spot between the number of servers being federated and the number of users per server. I wonder what the optimal network distribution would look like.

[–] [email protected] 1 points 1 year ago

Not a great range, but I'm going to guess between 1,000 and 10,000 users per node.

This is usually the point where midrange servers can be used successfully and the operation is manageable by normal people. It also groups people enough that they aren't spamming the network with more requests than necessary to sync with their friends.

[–] [email protected] 1 points 1 year ago (1 children)

It’s a similar concept to email, so I would imagine there will always be big players who will have a reputation of trustworthiness/reliability.

The whole concept here seems to favor spinning up your own “cache” instance between you and the content you want (similar to how old email clients worked, downloading emails from the mail server and never live-fetching them), which is fabulous for distributing the load. Discovery takes a back seat when doing that, but it’s still pretty doable.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

I think the main difference between fediverse and email WRT cache instances is that if you create a cache instance for email, you're only caching your personal emails. If you create a cache instance for a lemmy community, you're caching every event on the community.

My intuition says there's probably a breakpoint in community size where the cost of federating all events to the users who subscribe to them becomes greater than the cost of individually serving API requests to them on demand. Primarily because you'll be caching a far greater amount of content than you actually consume, unlike with email.

Edit: That said, scaling out async work queues is a heck of a lot easier than scaling out web servers and databases. That fact alone might skew the breakpoint far enough that only communities with millions of subscribers see a flip in the cost equation...
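A toy cost model makes the "breakpoint" intuition concrete. Everything here is an assumption made for illustration: every subscriber runs their own instance, activity grows linearly with community size, and the per-delivery and per-request costs are arbitrary units rather than measurements.

```python
def push_cost(subscribers, events_per_subscriber, cost_per_delivery):
    """Federate every event to every subscriber's own instance."""
    events = subscribers * events_per_subscriber
    return events * subscribers * cost_per_delivery


def pull_cost(subscribers, requests_per_subscriber, cost_per_request):
    """Serve each subscriber's API requests on demand instead."""
    return subscribers * requests_per_subscriber * cost_per_request


def breakpoint_size(events_per_subscriber, cost_per_delivery,
                    requests_per_subscriber, cost_per_request):
    """Community size at which the two costs are equal in this model."""
    return (requests_per_subscriber * cost_per_request) / (
        events_per_subscriber * cost_per_delivery
    )


# Made-up inputs: a queued delivery costs 1 unit, a web request costs 20.
size = breakpoint_size(0.5, 1, 50, 20)
print(size)
print(push_cost(1000, 0.5, 1) < pull_cost(1000, 50, 20))
print(push_cost(4000, 0.5, 1) > pull_cost(4000, 50, 20))
```

Making deliveries cheaper relative to web requests (the point of the edit above) pushes the breakpoint toward larger communities in this model.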

[–] [email protected] 2 points 1 year ago

Maybe I'm wrong (I've been on Lemmy since yesterday morning), but if you host your own instance you're only caching the communities you are interested in. If you never cared about a community or interacted with an instance, then that data will never reach your instance. Federated doesn't imply full redundancy.

[–] [email protected] 1 points 1 year ago

I would be shocked if it worked well, seeing as it wasn't designed for that.

Even if it did though, where would we be having this conversation? It would work more like a texting app than any kind of community.

[–] [email protected] 1 points 1 year ago (2 children)

I'm running my personal instance, I haven't had any issue interacting.
AFAIK it would help spread the load since my instance just asks/receives the activity once from other instances and then aggregates everything locally.

So each time I access a post I need to ask: How many upvotes does it have? How many comments? Which ones are new? How many upvotes does each of those comments have? Which ones are replies to others? Also, get me the pfp of each user.
If I just changed the sorting, either in the main feed or in the comments of a post, I need to ask in what order they should be displayed.

All of these queries are done only in my own instance with my instance's DB.

In this case beehaw.org just sends "Hey this post got an upvote", and my instance figures out how it would affect the rest of the posts in my feed.
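A rough sketch of that pattern: the remote server pushes a small event, and everything else (scores, ordering) is answered from local state. The dict-shaped activities are simplified stand-ins rather than real ActivityPub payloads, and the `LocalInstance` class and post names are hypothetical.

```python
from collections import defaultdict


class LocalInstance:
    """Toy personal instance that aggregates incoming vote events locally."""

    def __init__(self):
        self.scores = defaultdict(int)

    def receive(self, activity):
        # "Hey, this post got an upvote" arrives once from the remote server.
        if activity["type"] == "Like":
            self.scores[activity["object"]] += 1
        elif activity["type"] == "Dislike":
            self.scores[activity["object"]] -= 1

    def top_posts(self):
        # Sorting is a purely local query; no remote server is contacted.
        return sorted(self.scores, key=lambda post: self.scores[post], reverse=True)


mine = LocalInstance()
for event in [
    {"type": "Like", "object": "post-a"},
    {"type": "Like", "object": "post-b"},
    {"type": "Like", "object": "post-b"},
    {"type": "Dislike", "object": "post-a"},
]:
    mine.receive(event)

print(mine.top_posts())
```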

Also, right now lemmy.ml is taking a toll with all the new users, it takes a while to refresh the page and get any update, but with my instance I can keep scrolling and reading the data my instance already got from lemmy.ml or any other instance.

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

I'm also running my own instance, very few users and everything is really fast. Because I'm not on the same instance as all those users.

[–] [email protected] 1 points 1 year ago

I think there is a tipping point somewhere.

I think the connection calc is n * (n - 1) / 2 (at least, that's what it is for mesh networks), so 1000 servers would mean ~500k connections across the whole mesh, with each server holding 999 of them.
That would be for 1000 users, one per server.
Those connections might be more lightweight, but there are significantly more of them (might even run into OS issues with that many open connections).

If each server was handling 50 users, there would be 20 servers and the mesh connections would then be 190.
50 users should be a blip wrt server load, and 190 mesh connections is far more manageable.
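For anyone who wants to plug in their own numbers, here is the full-mesh formula as a small sketch. It assumes every server keeps a link to every other server, which is the mesh-network model from this comment and not necessarily how federation traffic behaves in practice.

```python
def mesh_links(servers):
    """Total links in a full mesh: n * (n - 1) / 2."""
    return servers * (servers - 1) // 2


def links_per_server(servers):
    """Each server links to every other server."""
    return servers - 1


for n in (1000, 50, 20):
    print(n, mesh_links(n), links_per_server(n))
```

For 1000, 50, and 20 servers the totals come out to 499500, 1225, and 190 links respectively.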
