this post was submitted on 07 Dec 2024

63 points (98.5% liked)

Seems .world and .ee federation are broken (lemm.ee)

submitted 2 weeks ago* (last edited 2 weeks ago) by [email protected] to c/[email protected]

Is it a .world or a .ee issue?

Seems .world to .ee is broken, so I won't be seeing anyone's comments from .world, and the rest of the fediverse probably won't see this.

Here's ya issue: https://grafana.lem.rocks/d/bdid38k9p0t1cf/federation-health-single-instance-overview?orgId=1&var-instance=lemmy.world&var-remote_instance=lemm.ee

Edit: found more info

top 31 comments

[–] Nothing4You 23 points 2 weeks ago (1 children)

it's not just lemmy.world.

of the larger instances, the following have trouble sending activities to lemm.ee currently:

lemmynsfw.com -> lemm.ee: 2.81d behind
sh.itjust.works -> lemm.ee: 1.04d behind
lemmy.world -> lemm.ee: 22.5h behind

i pinged @[email protected] on matrix about 30h ago already about the issues with federation from lemmynsfw.com, as it was the first one i noticed, but I haven't heard back yet.

[–] [email protected] 1 points 2 weeks ago

Seems like there are some widespread DNS problems again today

[–] [email protected] 20 points 2 weeks ago (1 children)

https://grafana.lem.rocks/d/bdid38k9p0t1cf/federation-health-single-instance-overview?orgId=1&var-instance=lemmy.world&var-remote_instance=lemm.ee

[–] [email protected] 2 points 2 weeks ago (1 children)

Wait, how do they get that data remotely? I was looking at my instance vs world and I saw there's like the +1 hour from a week or so ago when I upgraded to latest mbin lol.

I guess they're looking at common activities and when they appear on each?

[–] Nothing4You 9 points 2 weeks ago (2 children)

lemmy has a public api that shows the federation queue state for all linked instances.

it provides the internal numeric id of the last activity that was successfully sent to an instance, as well as the timestamp of the activity that was sent, and also when it was sent. it also includes data like how many times sending was unsuccessful since the last successful send. each instance only knows about its own outbound federation, but you can just collect this information from both sides to get the full picture.

there is also https://phiresky.github.io/lemmy-federation-state/site to look at the details provided by a specific instance.
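A rough sketch of how the lag could be derived from that kind of data. The field names below (`federation_state`, `last_successful_id`, `last_successful_published_time`, `fail_count`) and the sample values are assumptions for illustration, not something confirmed in this thread, and the data is hardcoded instead of fetched from a real instance. Measuring "now minus the timestamp of the last delivered activity" is only an approximation: an idle queue would also look behind.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical sample shaped like a list of linked instances.
# Field names and values are assumptions made for this sketch.
SAMPLE_LINKED = [
    {
        "domain": "lemm.ee",
        "federation_state": {
            "last_successful_id": 1000,
            "last_successful_published_time": "2024-12-06T10:00:00Z",
            "fail_count": 3,
        },
    },
    {
        "domain": "example.social",
        "federation_state": {
            "last_successful_id": 2000,
            "last_successful_published_time": "2024-12-07T08:29:00Z",
            "fail_count": 0,
        },
    },
]


def parse_time(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def lag_report(linked, now):
    """Return {domain: (lag, fail_count)} for entries that expose queue state."""
    report = {}
    for entry in linked:
        state = entry.get("federation_state")
        if not state or not state.get("last_successful_published_time"):
            continue  # no queue state exposed for this instance
        published = parse_time(state["last_successful_published_time"])
        report[entry["domain"]] = (now - published, state.get("fail_count", 0))
    return report


now = datetime(2024, 12, 7, 8, 30, tzinfo=timezone.utc)
report = lag_report(SAMPLE_LINKED, now)
for domain, (lag, fails) in sorted(report.items()):
    hours = lag.total_seconds() / 3600
    print(f"{domain}: {hours:.1f}h behind, {fails} failed attempts")
```

Running the same calculation against the state reported by both sides of a link is what would give the full picture described above.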

[–] [email protected] 2 points 2 weeks ago (1 children)

This is a nice tool, though I think using it got my IP flagged by Cloudflare while I was trying to fix an issue between my instance and lemmy.ml.

[–] Nothing4You 2 points 2 weeks ago (1 children)

lemmy.ml doesn't use cloudflare, that's strange.

i've also never had issues with this when looking at instances that do use cloudflare.

[–] [email protected] 1 points 2 weeks ago

After commenting I had a theory and it may be right. I have dual WAN for redundancy and set up a routing policy so that traffic to ml and world goes through the WAN that is not behind CGNAT, on the assumption that a shared CGNAT public IP sometimes gets blocked. The main symptom was that images would break when federating, and after this change it seems to be working better.

That being said, it all started happening after I used the Lemmy state checker, and I assume that since it queries the endpoint for the selected site on an interval, I got flagged by something.

[–] [email protected] 1 points 2 weeks ago (1 children)

That makes sense. So it's showing me world's federation with me and not the other way (since I'm not sure such info is available on mbin)

[–] Nothing4You 2 points 2 weeks ago (1 children)

pretty much, yeah. lemmy has a persistent federation queue instead of fire-and-forget requests when activities get generated. this means activities can be retried if they fail, which allows for (theoretically) lossless federation even if an instance is down for maintenance or other reasons. if mbin has a similar system maybe they could expose that as well, but unless it represents this data in a fairly similar way, it will be challenging to integrate it in a view like this without creating a dedicated mbin dashboard.
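To make the difference from fire-and-forget concrete, here is a toy sketch of a persistent, ordered outbound queue for a single remote instance. It is not Lemmy's actual implementation; the class, its method names, and the simulated `flaky_send` function are all made up for illustration. The point is that a failed activity stays at the head of the queue and is retried, so nothing is lost, but nothing behind it can be delivered either.

```python
class PersistentQueue:
    """Toy outbound queue for one remote instance: ordered, retried until delivered."""

    def __init__(self, send):
        self.send = send              # callable(activity_id) -> bool, True on success
        self.activities = []          # pending ids in creation order
        self.last_successful_id = 0
        self.fail_count = 0

    def enqueue(self, activity_id):
        self.activities.append(activity_id)

    def run_once(self):
        """Try to deliver pending activities in order; stop at the first failure."""
        while self.activities:
            activity_id = self.activities[0]
            if not self.send(activity_id):
                self.fail_count += 1  # keep the activity for the next attempt
                return False
            self.activities.pop(0)
            self.last_successful_id = activity_id
            self.fail_count = 0
        return True


# Simulated remote that rejects the first two delivery attempts.
attempts = {"count": 0}


def flaky_send(activity_id):
    attempts["count"] += 1
    return attempts["count"] > 2


queue = PersistentQueue(flaky_send)
for activity_id in (1, 2, 3):
    queue.enqueue(activity_id)

first = queue.run_once()   # remote down: nothing delivered, activity 1 kept
second = queue.run_once()  # still down
third = queue.run_once()   # remote back: all three delivered in order
```

In this model a single activity that never succeeds would hold up everything queued after it, which is one way a backlog of hours or days could build up.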

[–] [email protected] 2 points 2 weeks ago

We can see it ourselves. We use rabbitmq for incoming (and maybe outgoing, it's been a while since I looked at how it is) federation. So, you can see the queues there. For incoming (from rabbitmq) and outgoing there are also queues (symfony messenger) and these handle failures and can be configured and can be queried.

After the upgrade I just took the default configuration again (because it seems queue names changed). But I used to have various rules set up in rabbitmq for retries, and it took a fair few tries before the messages ended up in the proper "failed" queue (which needs manual action to retry). Some items you eventually need to clear (instances that just shut down, or instances that lost their domain, for example). They will never complete.

But it's not exposed in any way to my knowledge. Well unless people have their rabbitmq web interface open and without login of course.

[–] [email protected] 9 points 2 weeks ago* (last edited 2 weeks ago) (1 children)

Huh.

I can see the post from my .world account

[–] [email protected] 6 points 2 weeks ago

Yeah, but you can't see this comment from my .ee account

[–] [email protected] 7 points 2 weeks ago (1 children)

I hope it gets fixed soon. Currently using this alt.

[–] [email protected] 11 points 2 weeks ago (2 children)

Mind you, Lemmy without .world is an interesting experience

[–] [email protected] 10 points 2 weeks ago

it's about 25% of my instance traffic. it's noticeable, but there's definitely a shift towards spreading content out.

[–] [email protected] 6 points 2 weeks ago

It is.

[–] [email protected] 7 points 2 weeks ago (1 children)

I know there was a problem where federation was lagging by a couple of days, but I thought it was fixed? I'll hit up the Admins.

[–] [email protected] 7 points 2 weeks ago

Nah, it's lagging now. PS: can only see this cos I'm on a .world alt

[–] [email protected] 5 points 2 weeks ago

and the rest of the fediverse probably won't see this.

As a The Lemmy Club user, I can properly see the post. Federation seems to be working okay.

[–] [email protected] 5 points 2 weeks ago (1 children)

I'm seeing this 12 hours later lol

[–] [email protected] 4 points 2 weeks ago

It's starting to catch back up

[–] [email protected] 3 points 2 weeks ago* (last edited 2 weeks ago)

I see it? Maybe I just haven’t noticed but I think .world is fine for me?

Edit: nvm you might be right. I just checked my posts in [email protected]. They normally get 60-100 upvotes but the last one only got 3.

[–] [email protected] 3 points 2 weeks ago (1 children)

Admins have been talking about it, it looks like it has something to do with banning/unbanning users. Sometimes a ban/unban doesn't want to propagate to .ee and that causes a logjam somehow.

Smarter people than me are looking at it!

[–] [email protected] 3 points 2 weeks ago (2 children)

If it's that bloody bot that bans you if your comment karma is too low, that's gonna be funny. I got banned/unbanned like 10 times in an hour cos my comment karma was fluctuating near the bot's threshold.

[–] [email protected] 2 points 2 weeks ago

That thing is such a nuisance. Makes the modlog nearly unusable.

[–] [email protected] 2 points 2 weeks ago

TIL there is such a thing

[–] [email protected] 3 points 2 weeks ago (1 children)

Looking at incoming requests, .world is working OK for me. They seem to be batching stuff: I'll get nothing for 30 seconds, then 50+ requests over 3 seconds.

Of course I don't know if their queue is backed up and I'm getting delayed stuff. I'd need to stop processing and look into the incoming queue to see what they're sending.

Bit of an edit. Looking at incoming again I can see under newest items, an entry from world that was 11 minutes old. Oh I have an idea. I'll see if this edit gets there in a timely manner.

Spoiler alert, it was instant.

Oh ignore me. It's specifically between those two instances I guess.

[–] [email protected] 4 points 2 weeks ago (1 children)

Apparently it's between big instances and lemm.ee only. All of said instances use Cloudflare. I suspect that Cloudflare has blocked/rate limited the larger instances from reaching lemm.ee.

[–] [email protected] 3 points 2 weeks ago (1 children)

I think it must be hit/miss. Because I think those edits I made would have gone from my instance to world and then from world to .ee, and it was happening within seconds.

So, presumably random stuff is being dropped or delayed?

[–] [email protected] 1 points 2 weeks ago

Yeah, you're right. Everything going on here must go through .world as it hosts the community. Which makes it even weirder that we don't have .world users federating out.

My understanding of federation is: your events get sent to your instance (kbin.life), your instance sends those events to the instance of the relevant community (lemmy.world), the community's instance sends that event to all instances subscribed to that community (lemm.ee, kbin.life, etc.), and my instance receives that event and then notifies me.
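That flow can be sketched as a tiny in-memory model. This only mirrors the steps as described in this comment; the `Instance` class, its methods, and the `fediverse` community name are hypothetical, and real federation involves signed HTTP deliveries and queues that are left out here.

```python
class Instance:
    """Toy model of one server; 'network' is a shared dict of domain -> Instance."""

    def __init__(self, domain, network):
        self.domain = domain
        self.network = network
        self.received = []    # (community, origin_domain, text) tuples delivered here
        self.followers = {}   # hosted community -> set of subscribed domains
        network[domain] = self

    def host(self, community):
        self.followers[community] = set()

    def subscribe(self, community, host_domain):
        self.network[host_domain].followers[community].add(self.domain)

    def comment(self, community, host_domain, text):
        # The user's own instance forwards the event to the community's host.
        self.network[host_domain].announce(community, self.domain, text)

    def announce(self, community, origin, text):
        # The host relays the event to every subscribed instance.
        for domain in sorted(self.followers[community]):
            self.network[domain].received.append((community, origin, text))


network = {}
world = Instance("lemmy.world", network)
ee = Instance("lemm.ee", network)
kbin = Instance("kbin.life", network)

world.host("fediverse")
ee.subscribe("fediverse", "lemmy.world")
kbin.subscribe("fediverse", "lemmy.world")

kbin.comment("fediverse", "lemmy.world", "hello from kbin.life")
print(ee.received)
```

In this model every comment in the community reaches lemm.ee only via lemmy.world, so a stalled lemmy.world to lemm.ee link would delay everything in the thread, not just posts by .world users.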

So this conversation is going through .world, it's just that .world users' events seem not to be getting sent to .ee.

I think we need someone smarter than both of us to fix this