this post was submitted on 30 May 2024
166 points (97.7% liked)

Videos

[–] [email protected] 13 points 5 months ago (1 children)

While forensic linguistics is pretty cool, the Unabomber was caught because his manifesto was published and his brother and his brother’s wife recognized the unusual phrasing, such as ‘Eat your cake and have it too’.

If an author has a large body of known work, then it’s not too difficult to identify other writings by that same author. But if the author does not have a large body of writing that is known to come from that individual, then the best we can do is estimate an approximate age and the geographic location where the individual grew up, and that’s only when the unidentified writing is large enough, like in the case of the Unabomber, whose manifesto was 30k words.
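To make the "large body of known work" point concrete, here is a minimal sketch of one common stylometric idea: compare how often each text uses a handful of function words. The word list, the function names (`profile`, `cosine`, `best_match`) and the sample texts are all made up for illustration. Real forensic work uses far more features and far more text, and this is not a claim about how the Unabomber case was actually analysed.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny feature set; real stylometry uses hundreds of features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that",
                  "it", "is", "but", "not", "you", "too"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_match(unknown, known_by_author):
    """Return the author whose known text is most similar, plus all scores."""
    target = profile(unknown)
    scores = {author: cosine(target, profile(text))
              for author, text in known_by_author.items()}
    return max(scores, key=scores.get), scores

known = {
    "a": "the cat and the dog and the bird",
    "b": "you is not it but you is not it",
}
author, scores = best_match("the fish and the frog and the newt", known)
print(author, scores)
```

With only a few words per author the profiles are mostly noise, which is the reason a short anonymous note is so much harder to attribute than a 30k-word manifesto.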

[–] [email protected] 1 points 5 months ago (1 children)

I did simplify the whole thing, as you noticed, but note that his SIL and brother identifying him is another example of the same process. David knew that expressions Ted used, like "cool-headed logicians", were highly unusual, which is not unlike what the socio- and forensic linguists did there.

> But if the author does not have a large body of writing that is known to come from that individual

Such as a Lemmy or Facebook account? Or any other online account associated with your writing, really; we produce far more text on the internet than we ourselves realise.

And while, a priori, your accounts across different websites might look completely disconnected, once you connect two of them as coming from the same person, connecting a third one is easier. And a fourth. And so on.

A small caveat is that while the corpus is bigger, so is the noise introduced by people from the other side of the world who happen to use the same patterns as the person you want to identify. Even then, I believe that the ability to bulk-process text to find authorship has grown considerably faster than the number of potential matches.
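A rough way to put numbers on that caveat, as a sketch only: assume exactly one true author among the candidates and a matcher with some false-positive rate per wrong candidate. The function names and every figure here are invented for illustration, not measured.

```python
def expected_false_matches(candidates, false_positive_rate):
    """Expected number of wrong candidates flagged (one candidate is the true author)."""
    return (candidates - 1) * false_positive_rate

def share_of_true_matches(candidates, false_positive_rate, true_positive_rate=1.0):
    """Expected true hits divided by expected total hits."""
    false_hits = expected_false_matches(candidates, false_positive_rate)
    return true_positive_rate / (true_positive_rate + false_hits)

# Made-up scenarios: (candidate pool, false-positive rate per wrong candidate)
for pool, fpr in [(1_001, 1e-3), (1_000_001, 1e-3), (1_000_001, 1e-7)]:
    print(pool, fpr, share_of_true_matches(pool, fpr))
```

Under these assumptions, a thousand times more candidates needs a roughly thousand times lower false-positive rate just to break even, so the real question is whether the matching improves faster than the pool grows.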

[–] [email protected] 0 points 5 months ago

Additional signal is not noise