FYI, this is due to a confluence of issues.
- We are the largest instance, with the highest active user count by a good margin.
- We are dealing with an immature codebase that was not designed to handle this load. For example, the way the ActivityPub federation queues are handled is not conducive to high request volumes. A failed message stays in the queue for 60 seconds before it is retried once, and if that attempt fails it sits in the queue for one hour before the next retry. These queued messages sit in memory the whole time. It's not great, and there isn't much we can currently do to change this, other than manually defederating from 'dead' servers to reduce the number of items stuck in the queue that are never going to get a response. It's not an elegant solution by any means, and one we will go back and address when future tools are in place, but we have seen significant improvement because of it.
- We have tried contacting the Lemmy devs for some insight/assistance with this, but have not heard back yet. Much of this is in their hands.
- We were able to confirm with the admins of two other instances, lemm.ee and discuss.as200950.com, that they are receiving our federation messages (from lemmy.world). So we know that federation is working at least to some degree, but it obviously still needs work. As mentioned above, we have reached out to the Lemmy devs, who are the instance owners of Lemmy.ml, to collaborate. I cannot confirm whether they are receiving our federation messages at this time. Hopefully in coming Lemmy releases this becomes easier to analyze without needing direct server access to both instances' servers.
As you can see, we are juggling several different parameters here to provide the best experience we can with the tools we have at our disposal. You may want to raise an issue about this on the Lemmy GitHub, so the devs get more visibility into the problem from affected users.
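
For anyone who wants to picture the queue behavior described above, here is a rough Python sketch. It is only a toy model, not Lemmy's actual code (Lemmy is written in Rust), and every name in it (`FederationQueue`, `tick`, `defederate`, the `.example` hosts) is made up for illustration. The 60-second and one-hour delays come from the description above; dropping a message after the hourly retry fails is my assumption, since I can't say for certain what happens at that point.

```python
from dataclasses import dataclass, field

# Delays from the description above: one retry after 60 seconds,
# then another after one hour.
RETRY_DELAYS = (60, 3600)


@dataclass
class QueuedMessage:
    target: str          # hostname of the receiving instance
    next_attempt: float  # simulated time (seconds) of the next attempt
    failures: int = 0    # failed delivery attempts so far


@dataclass
class FederationQueue:
    dead: set = field(default_factory=set)       # hosts that never respond
    blocked: set = field(default_factory=set)    # defederated hosts
    pending: list = field(default_factory=list)  # messages held in memory

    def enqueue(self, target, now=0.0):
        """Queue a message unless the target has been defederated."""
        if target not in self.blocked:
            self.pending.append(QueuedMessage(target, now))

    def defederate(self, target):
        """Block a host and drop everything still queued for it."""
        self.blocked.add(target)
        self.pending = [m for m in self.pending if m.target != target]

    def tick(self, now):
        """Attempt delivery of every message that is due at time `now`."""
        still_pending = []
        for msg in self.pending:
            if msg.next_attempt > now:
                still_pending.append(msg)  # not due yet, stays in memory
            elif msg.target not in self.dead:
                pass                       # delivered, leaves the queue
            elif msg.failures < len(RETRY_DELAYS):
                # Failed: schedule the next retry and keep holding it.
                msg.next_attempt = now + RETRY_DELAYS[msg.failures]
                msg.failures += 1
                still_pending.append(msg)
            # Otherwise the hourly retry also failed; dropping the message
            # here is an ASSUMPTION, not confirmed Lemmy behavior.
        self.pending = still_pending
```

In this model, a message to a dead host occupies memory for a bit over an hour before it goes away, so every new post or vote adds another long-lived entry per dead host. Calling `defederate("dead.example")` clears those entries immediately and stops new ones from being queued, which is the effect we are after with the manual defederation.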