this post was submitted on 28 Jun 2023
12 points (64.3% liked)

submitted 1 year ago* (last edited 1 year ago) by doostee to c/programming
 

From a technical and legal standpoint, ignoring ethics and dignity, is there anything preventing us from scripting a scraper that recreates Reddit posts in a Lemmy instance? For example, the top 50 posts of the top 20 subreddits, without comments. I think it would help convince people to join, since the major argument for sticking with Reddit is that it has more content. Thoughts?

top 16 comments
[–] pixelpop3 43 points 1 year ago* (last edited 1 year ago)

I don't like the idea. It seems like those fake websites that scrape Stack Overflow and use SEO to ruin Google search. Avoiding those sites is among the reasons people type "reddit" into searches. People want authentic interactions, and I think mirroring Reddit into the Fediverse lacks authenticity and undermines the Fediverse's own. Content here should be from people who are here.

If someone wants to assimilate content from Reddit into something new and post it here, that's good. That means the person is here and can be interacted with.

If someone wants to repost their own content here, that's also fine. They are here to interact with.

I just really think it's a bad idea to deliberately build a ghost town and think people will move in.

[–] [email protected] 35 points 1 year ago (1 children)

That already exists. I personally think we should just continue where most of us left off on Reddit and not bother about it.

[–] [email protected] 11 points 1 year ago

This is a good point. Also the good stuff will (probably) be manually posted so these tools only duplicate the bad stuff.

[–] [email protected] 23 points 1 year ago* (last edited 1 year ago)

lemmit.online already has this and I blocked the communities of that instance because I don’t want Reddit content…I want Lemmy content.

If you do decide to do it, use a bot-specific instance with bot-specific communities. Don’t flood existing communities with bot content.

[–] auv_guy 14 points 1 year ago (1 children)

Why should someone switch when the content is the same but you cannot reach the OP? In fact you would need to go to reddit to reach the OP! I think this would drive people back to reddit.

[–] [email protected] 4 points 1 year ago (1 children)

One of Reddit's main pressure points for forcing reopening was that it was "unfair for the users" to keep data hidden and inaccessible. Mirroring all that data takes away some of the leverage. So I can understand the value... we can move and we don't have to worry about Reddit taking its ball and going home or claiming it deserves to be hostile because it is the steward of so much information.

[–] pixelpop3 2 points 1 year ago

Leverage for what purpose? To fix reddit? Let reddit die or not die.

Reddit has always come after mirrors and they will easily get courts to take down the instances. Don't forget that prior to the API change they came after pushshift.

Additionally, anyone mirroring Reddit on the moral basis that the content is owned by the creators and Reddit is an exploitative rent-seeker has an obligation to not become a rent-seeker themselves. This means things like ensuring that content that users voluntarily delete is also deleted in the mirrors. Reddit in fact had a large battle with pushshift about this years ago, such that pushshift supposedly now only keeps history of moderator and admin edits. I agree with that ethically.

And in many cases you may be legally required to do this. To be clear, Reddit made pushshift change to respecting user delete requests because of legal exposure and compliance risks.

Not to mention that you don't really know that anyone intends their content to be mirrored on sites they do not use. Particularly now that Reddit seems to be forcing private subreddits to be open. There's no moral high ground for doing this.

[–] [email protected] 13 points 1 year ago

Not directly/automatically, no. But I personally don't see anything wrong with the same article being reported on HackerNews, Slashdot, Reddit and Lemmy; that's just similar sites doing similar sitey things.

[–] lowleveldata 13 points 1 year ago

IMO it's better to do that manually with hand-selected posts so that it's less spammy.

[–] [email protected] 11 points 1 year ago (1 children)

There is value in real people selecting what to post on a link aggregator like Lemmy or Reddit.
I don't want to lose that human feeling, both in posts and comments.
Of course the voting mechanism can do a lot of the heavy lifting, but having a flood of robot posts with a score of one might have a negative effect on good posts getting discovered.

~Hopefully~ ~the~ ~community~ ~will~ ~grow~ ~naturally~ ~to~ ~a~ ~point~ ~where~ ~it~ ~can~ ~satisfy~ ~my~ ~doom-scrolling~ ~addiction.~

[–] [email protected] 9 points 1 year ago (1 children)

Seems like the subscript markdown doesn't work on Jerboa yet.
Apologies to anyone bothered by the tildes. 🙇

[–] RandomDevOpsDude 2 points 1 year ago* (last edited 1 year ago)

~~test~~

Like this ~~test~~

Edit: You said subscript not strikethrough 🤦

[–] [email protected] 7 points 1 year ago

At such a scale, a scraper wouldn't be necessary; that's easily doable by humans involved in these communities, with a human touch as well.

[–] [email protected] 5 points 1 year ago

I personally find those scraped posts annoying in other communities. First, the formatting looks ugly in the list view. Second, it just feels awkward to even interact with such a post, especially if it's a question where the real content is in the comment section on Reddit.

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

It would normalize bot submissions, which is bad for a lot of reasons. Not least, disproportionate bot activity is one of the criteria used for defederation by instances like lemm.ee.
