this post was submitted on 04 Aug 2023
89 points (98.9% liked)

Fediverse

27910 readers
5 users here now

A community to talk about the Fediverse and all its related services using ActivityPub (Mastodon, Lemmy, KBin, etc).

If you want help with moderating your own community, head over to [email protected]!

Learn more at these websites: Join The Fediverse Wiki, Fediverse.info, Wikipedia Page, The Federation Info (Stats), FediDB (Stats), Sub Rehab (Reddit Migration), Search Lemmy

founded 1 year ago
[–] [email protected] 6 points 1 year ago* (last edited 1 year ago) (4 children)

Would it be better if your backend federated with popular instances so that they push new posts and comments to your search engine? That way, you wouldn't need to scrape those instances to index their contents (because they'd voluntarily send the contents to you via ActivityPub).
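
A minimal sketch of the receiving side of this idea, assuming the search backend exposes an ActivityPub inbox and gets each pushed activity as parsed JSON. The function name `extract_document` and the shape of the returned dict are hypothetical. The field names (`type`, `object`, `name`, `content`, `published`) come from the ActivityStreams vocabulary. Treating `Page` as a post and `Note` as a comment, and unwrapping `Announce`, reflects how Lemmy is generally understood to federate, so verify it against real traffic. HTTP signature verification, which a real inbox needs, is left out.

```python
INDEXABLE_TYPES = {"Page", "Note"}  # assumption: Page = post, Note = comment


def extract_document(activity):
    """Return a dict worth indexing, or None if the activity is not new content."""
    # Communities may relay activities wrapped in an Announce; unwrap one level.
    if activity.get("type") == "Announce" and isinstance(activity.get("object"), dict):
        activity = activity["object"]

    if activity.get("type") != "Create":
        return None  # ignore likes, follows, deletes, and so on

    obj = activity.get("object")
    if not isinstance(obj, dict) or obj.get("type") not in INDEXABLE_TYPES:
        return None

    return {
        "id": obj.get("id"),
        "kind": "post" if obj["type"] == "Page" else "comment",
        "title": obj.get("name", ""),
        "body": obj.get("content", ""),
        "published": obj.get("published"),
    }
```

Anything this returns can be handed straight to the indexer, so nothing has to be scraped for content created after federation starts.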

[–] [email protected] 8 points 1 year ago (3 children)

Yep, that's the new idea. The sad part is that with this method there's no way to get historical data, only new posts. So if a server goes down, gets DDoSed, etc., I'll lose posts forever.

Also building an ActivityPub implementation from scratch isn't trivial either. So that'll take some time.

I've got a few other ideas I'm playing with as well, like just assuming that internal post IDs are all sequential and literally fetching them one by one. Or maybe some combination of both?
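
The sequential-ID idea could look roughly like the sketch below. It rests on the same unverified assumption that post IDs are sequential integers. The `fetch` callable is hypothetical: it stands in for whatever lookup an instance allows, and returns the post or `None` when the ID does not exist. The crawl stops after a run of consecutive misses, since deleted posts leave gaps, and it sleeps between requests so it does not hammer the instance.

```python
import time


def crawl_sequential(fetch, start_id=1, max_consecutive_misses=50, delay_seconds=1.0):
    """Fetch posts by increasing ID until too many IDs in a row are missing."""
    found = []
    misses = 0
    post_id = start_id
    while misses < max_consecutive_misses:
        post = fetch(post_id)
        if post is None:
            misses += 1  # gap: deleted post, or past the newest ID
        else:
            misses = 0  # reset on every hit so isolated gaps do not stop the crawl
            found.append(post)
        post_id += 1
        time.sleep(delay_seconds)  # be polite to the remote instance
    return found
```

Combining both approaches would mean running a backfill like this once per instance, then relying on federation for everything newer.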

[–] Die4Ever 5 points 1 year ago* (last edited 1 year ago) (1 children)

Instead of building a new ActivityPub implementation, you could just run a regular instance of Lemmy and pull data from its database directly? Or use its API for searches?

[–] [email protected] 6 points 1 year ago

I was using its APIs, but new restrictions have been put in place that effectively prevent me from using them for what I need. Similar API calls were causing DDoS attacks on lemmy.world.

As for running a Lemmy instance itself, that's a thought, but I need the data in a different format to do efficient searches. It's a tricky problem.
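
The comment does not say which format is meant. One common choice for text search is an inverted index, which maps each word to the set of documents containing it. The sketch below is only an illustration of that general technique, not the author's actual design, and `InvertedIndex` and its methods are hypothetical names.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Maps each lowercase token to the set of document IDs that contain it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokens(text):
        # Very naive tokenizer: runs of ASCII letters and digits only.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for token in self._tokens(text):
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return IDs of documents containing every token in the query."""
        tokens = self._tokens(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result
```

A lookup touches only the posting sets for the query words instead of scanning every row, which is why a relational table of posts is usually transformed into something like this before searching.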
