Ask Lemmy
A Fediverse community for open-ended, thought provoking questions
There are multiple reasons depending on who you ask and the specific instance:
The only information that actually gets federated to other servers is public information that is globally visible anyway. Fediverse servers don't (or at least SHOULDN'T) trust each other.
It's not actually that hard to index and store the information, especially if you just want textual post data - Mastodon at least can serve you an easy-to-parse version of a user's posts if you request it. Sure, you need to poll for the information rather than it just being sent to you, but I think if they were motivated enough they could do it.
It's easy to spin up a scraper server that isn't Threads to collect data.
Yeah, I was gonna say...as things stand, the privacy situation on the Threadiverse is in many respects weaker than on, say, Reddit. Yeah, you get to choose the third-party app that may live on a phone, or the Web client, and your instance only directly pushes some data out via federation.
However, if you're on the Threadiverse, then you have no idea what a given Threadiverse instance out there pulling in federated data is storing. You don't know how secure your instance is, even if your instance admin has the best intentions. Unless your instance is whitelisting a very limited set of trusted instances or isn't federating at all and is private, treating anything you put out there as basically accessible to every organization and company is probably a good idea.
Your own instance may not retain deleted (including by mods or admins) or edited comments, but it's a good bet that if nobody else's instance is retaining them yet, someone's eventually will, and it'll permit recovering them. There were people doing this on Reddit via pushshift.io.
It's probably possible to have people analyzing comment activity to detect roughly where someone is, based on time-of-day activity, holidays, and so forth; people had several sites doing this for Reddit.
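To make the time-of-day idea concrete, here's a rough sketch of how little it takes. It finds the quietest block of hours in someone's comment timestamps and assumes that block is when they sleep. The function names, the six-hour window, and the "sleep starts around 01:00 local" guess are all my own assumptions for illustration, not anything those Reddit sites are known to have used.

```python
from collections import Counter
from datetime import datetime, timezone


def quietest_window(timestamps, window=6):
    """Return the UTC hour at which the least-active `window`-hour block starts.

    `timestamps` must be timezone-aware datetimes (e.g. comment creation times).
    """
    counts = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)

    def activity(start):
        # Total comments in the block of hours beginning at `start`, wrapping past midnight.
        return sum(counts[(start + i) % 24] for i in range(window))

    return min(range(24), key=activity)


def guess_utc_offset(timestamps, assumed_sleep_start=1, window=6):
    """Guess a UTC offset in hours, assuming the quiet block is sleep.

    Assumption (hypothetical): the user goes quiet at about 01:00 local time.
    """
    quiet_start = quietest_window(timestamps, window)
    offset = (assumed_sleep_start - quiet_start) % 24
    # Fold into the range -11..12 so offsets west of UTC come out negative.
    return offset - 24 if offset > 12 else offset
```

If an account's comments dry up between 06:00 and 12:00 UTC, this guesses an offset of -5, i.e. somewhere around US Eastern time. It's crude and a night-shift worker would fool it, but combined with holiday gaps it narrows things down quickly.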
And it's probably not that hard to obtain a user's IP address, so either you want to be okay with what you're posting maybe being linked to your IP or avoid having a persistent IP, like, via use of a VPN or something. Probably possible for someone to at least roughly geolocate an IP. Might be possible to correlate it with other logs; if someone, for example, has access to someone's Steam login history and can link that to an identity and can link both to an IP address at different times, they can probably deanonymize a user.
There are also text classifiers that can run on comments and extract things like someone's likely gender, or anything else for which you've trained a statistical text classifier on a large-enough corpus. You can probably get at least an approximate age, and I've seen classifiers that aim at identifying roughly where someone lives. Some famous examples of deanonymization via text:
Robert Hanssen, a very serious mole in the FBI, was caught after he used the phrase "the purple-pissing Japanese", which was a quote from General George Patton, in an anonymous context, and someone had heard him use it once before (not a computer, just humans managed to pull this off). It's probably possible to cross-correlate unusual phrases across identities; it doesn't take many to form a unique signature.
The Federalist Papers were an important set of documents written under the pseudonym "Publius" by several major Founding Fathers in the US -- Alexander Hamilton, James Madison, and John Jay. They argued for the ratification of the US Constitution. Nearly two centuries later, computer-based Bayesian statistical analysis became practical, and it became possible to attribute the articles whose authorship was still disputed -- train a classifier on the authors' known works, then run it on the anonymous works, and get an estimate with confidence level as to the identity of the author. That was pretty nifty from a historian's standpoint, but it's worth considering that the same technique is also viable today to deanonymize people.
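The "train on known works, run on anonymous works" approach can be sketched in a few lines. This is a toy multinomial naive Bayes model with add-one smoothing and a uniform prior over authors. It is not the method the historians actually used, and the class name, tokenizer, and smoothing constant are my own assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class AuthorshipModel:
    """Toy naive Bayes authorship classifier (hypothetical, for illustration only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha       # additive smoothing constant
        self.word_counts = {}    # author -> Counter of word frequencies
        self.vocab = set()       # every word seen during training

    def train(self, author, text):
        """Add a text of known authorship to that author's word counts."""
        tokens = tokenize(text)
        self.word_counts.setdefault(author, Counter()).update(tokens)
        self.vocab.update(tokens)

    def log_likelihood(self, author, text):
        """Log-probability of the text's known words under one author's model."""
        counts = self.word_counts[author]
        total = sum(counts.values()) + self.alpha * len(self.vocab)
        return sum(
            math.log((counts[token] + self.alpha) / total)
            for token in tokenize(text)
            if token in self.vocab  # words never seen in training are skipped
        )

    def posterior(self, text):
        """Return {author: probability}, assuming a uniform prior over authors."""
        scores = {a: self.log_likelihood(a, text) for a in self.word_counts}
        top = max(scores.values())
        # Subtract the max before exponentiating to avoid underflow on long texts.
        weights = {a: math.exp(s - top) for a, s in scores.items()}
        norm = sum(weights.values())
        return {a: w / norm for a, w in weights.items()}
```

Feed it a pile of comments from a known account with `train()`, then call `posterior()` on comments from an anonymous one. With real data you'd want far more text per author and would probably restrict the vocabulary to common function words, since topic words mostly tell you what someone is talking about rather than who they are.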
With Reddit or similar, Reddit's probably gonna data-mine what they can and may sell it to some parties, but they also probably won't be directly feeding it to random unsavory people, though it may wind up in those people's hands anyway.
There are probably a couple of good ways that lemmy/kbin could legitimately improve privacy.
I don't know what the logging situation is today, but having the option for an admin to bound log retention time might be a good idea; retaining enough for abuse and debugging, but not leaving a lot of data around in case someone breaks in and swipes 'em. You still need to trust your instance admin, and the lemmy/kbin software, but at least it's possible for an admin to bound what gets swiped if someone breaks in.
Not allowing remote images in comments, which is presently permitted; as I point out above, that's going to let user IP addresses be extracted by parties other than their instance. At least give the user the option to block them, and have home instances maybe have an option to cache them and serve them locally...that'll create its own storage and bandwidth concerns, but one can at least imagine heuristics to deal with that.
Having some form of public/private key authentication -- like, I can upload a pubkey to an account -- to permit someone to prove that they are who they say they are in the event of later instance compromise.
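For the remote-image suggestion above, the "cache them and serve them locally" part could start with something as simple as rewriting image links when a comment is rendered. This is only a sketch: the `/image_proxy` path and its `url` query parameter are made-up names, not an existing lemmy/kbin endpoint, and the regex only handles plain `![alt](url)` Markdown images without titles.

```python
import re
from urllib.parse import quote, urlsplit

# Matches simple Markdown images of the form ![alt text](url)
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\((\S+?)\)")


def proxy_remote_images(markdown, home_host, proxy_path="/image_proxy"):
    """Rewrite remote image URLs so the browser only ever contacts the home instance.

    `proxy_path` is a hypothetical endpoint that would fetch and cache the image.
    """

    def rewrite(match):
        alt, url = match.group(1), match.group(2)
        host = urlsplit(url).hostname
        # Relative URLs and images already on the home instance are left alone.
        if host is None or host == home_host:
            return match.group(0)
        return f"![{alt}](https://{home_host}{proxy_path}?url={quote(url, safe='')})"

    return IMAGE_PATTERN.sub(rewrite, markdown)
```

With this in place, the third-party host sees the instance's IP when the image is fetched, not each reader's. The storage and bandwidth cost lands on whatever actually implements the proxy endpoint, which is where the caching heuristics would have to live.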
Thank you for answering my question