this post was submitted on 10 Jun 2023
24 points (96.2% liked)


Hello! I wrote a simple bot that periodically checks for new Reddit posts and reposts them to Lemmy, so that people migrating from Reddit can still see their favourite posts while getting familiar with Lemmy.
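For anyone curious what "periodically checks for new posts" can boil down to, here is a minimal sketch. It is not the bot's actual code. The names `sync_once`, `fetch_new` and `publish` are hypothetical, and the post shape (a dict with an `id` key) is an assumption. The idea is just to remember which post IDs were already mirrored, so repeated runs don't repost them.

```python
from typing import Callable, Iterable

# Assumed post shape for this sketch: {"id": str, "title": str, "url": str}
Post = dict


def sync_once(
    fetch_new: Callable[[], Iterable[Post]],
    publish: Callable[[Post], None],
    seen_ids: set[str],
) -> int:
    """Publish every fetched post that has not been seen yet.

    fetch_new and publish are placeholders for "read from Reddit" and
    "post to Lemmy"; they are injected so the sketch stays independent
    of any real API. Returns the number of posts published in this run.
    """
    published = 0
    for post in fetch_new():
        if post["id"] in seen_ids:
            continue  # already mirrored in an earlier run
        publish(post)
        seen_ids.add(post["id"])
        published += 1
    return published
```

Calling `sync_once` on a timer (and persisting `seen_ids` between runs) would give the periodic behaviour described above.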

Currently the comments are not synced, but this may change in the future (perhaps).

Yes, it uses the Reddit API, so it will stop working on the 1st of July, but I think I can then implement a sort of web scraper to access Reddit posts without the official API, so the bot may keep working for a while.

This script currently runs on my laptop, so it will be offline most of the time, but if I get approval I may host it somewhere to keep it running 24/7.

Now the question... Is this allowed? Having this bot running 24/7 on large subreddits will mean a very high volume of posts. Will this cause any problems for Lemmy?

If you want a preview, check out https://enterprise.lemmy.ml/c/reddit_memes, where I've started syncing a few posts from r/memes.

let me know your opinion on this!

==== EDIT

here's the bot source code

The bot is now running in https://sh.itjust.works/c/reddit_memes, let's see if it works (I hope that shit just works)

I'm a bit concerned about the legality of this, if anyone has any info please tell me!

top 33 comments
[–] [email protected] 11 points 1 year ago (3 children)

I would create a separate instance for this, I don't think anybody wants their instance to be flooded with automated posts.

[–] [email protected] 3 points 1 year ago

yes of course!

[–] [email protected] 3 points 1 year ago

100% create a separate instance, and allow anyone to add subreddits to your import list. I think this would be hugely helpful!

OP, you'll probably want to block NSFW content, though.

[–] [email protected] 1 points 1 year ago

Or at least create specific communities, e.g. r/pics to c/reddit-pics

[–] [email protected] 5 points 1 year ago (1 children)

This may be against Reddit's ToS and may have legal consequences. But I guess mostly for the person operating the bot, and not the Lemmy instance, as long as they block the bot on request.

[–] [email protected] 2 points 1 year ago (2 children)

Do you think this may be "dangerous" for me?

[–] [email protected] 4 points 1 year ago

Given their recent posture and actions, I would think yes it could be an issue for you, for sure. You’d want to check their terms of service as it may violate them. If you’re doing this for fun, add a step in the middle and get ChatGPT to rephrase every post to obfuscate their source :-)

[–] [email protected] 4 points 1 year ago (1 children)

I have no idea, but at least in EU you may be violating copyright by copying someone else's database (i.e. collection of data). I am not a lawyer though...

[–] [email protected] 2 points 1 year ago (1 children)

I read the terms of service, and what I understood is that the intellectual property of the content belongs to the users, not Reddit, but I asked on r/legaladvices to be safe (there is no equivalent community on Lemmy yet)

[–] [email protected] 4 points 1 year ago (1 children)

Yeah, database protection applies even to content that the database owner doesn't own. What is protected is the effort spent compiling the database, which in this case is the collection of people's posts. But maybe ask a lawyer whether that applies in this case.

[–] [email protected] 4 points 1 year ago

Thanks! I'll share any news here on lemmy

[–] [email protected] 5 points 1 year ago (2 children)

Not an opinion, but a question - would syncing content from Reddit potentially overwhelm both Lemmy's servers and its organic content? If posts made on Lemmy are mixed in with the tidal wave of Reddit material, might that put people off posting on Lemmy at all? Possibly not, just wondering.

[–] [email protected] 5 points 1 year ago (1 children)

I wondered this a bit too and share your concerns. On one hand I like the idea of populating Lemmy with more content, particularly for niche communities, but there's also a good amount of unique posts here that might get drowned out.

[–] [email protected] 2 points 1 year ago

Maybe create a specific community that aggregates all the syncs so it doesn’t drown out the original content from Lemmy. Dunno, not the best idea from me, but I am slightly concerned it could end up as a Reddit clone, post for post.

[–] [email protected] 1 points 1 year ago (1 children)

This is my question as well, I have no idea

[–] [email protected] 1 points 1 year ago

Something something too much time wondering if they could and not if they should... ;-)

[–] [email protected] 4 points 1 year ago (1 children)

Just don't use the API or you might be 2 million dollars out of pocket in a month's time.

[–] [email protected] 2 points 1 year ago

Not giving a cent to reddit XD

[–] [email protected] 3 points 1 year ago (1 children)

Ha! I have been working on the same thing this weekend, except it uses the rss feed for posts and scrapes old.reddit.com for the details. It's written in python, but not quite finished - scraping works, automation not yet.
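A rough sketch of the RSS half, using only the standard library. This assumes the subreddit feed is served as Atom (entries with `id`, `title` and `link` elements), which should be verified against the real feed. The function name `parse_feed`, the sample XML and its IDs are made up for illustration, and the old.reddit.com scraping step is not shown.

```python
import xml.etree.ElementTree as ET

# Atom namespace; assumes the feed is Atom rather than RSS 2.0.
ATOM = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_text: str) -> list[dict]:
    """Extract id, title and link from each <entry> of an Atom feed."""
    root = ET.fromstring(xml_text)
    posts = []
    for entry in root.iter(f"{ATOM}entry"):
        link = entry.find(f"{ATOM}link")
        posts.append({
            "id": entry.findtext(f"{ATOM}id", default=""),
            "title": entry.findtext(f"{ATOM}title", default=""),
            "url": link.get("href", "") if link is not None else "",
        })
    return posts


# Hand-written sample standing in for a downloaded feed.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>t3_example1</id>
    <title>First post</title>
    <link href="https://www.reddit.com/r/memes/comments/example1/"/>
  </entry>
  <entry>
    <id>t3_example2</id>
    <title>Second post</title>
    <link href="https://www.reddit.com/r/memes/comments/example2/"/>
  </entry>
</feed>"""

for post in parse_feed(SAMPLE):
    print(post["id"], post["title"], post["url"])
```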

My plan was to have a separate Lemmy instance for this, where people can also request for new subs to be included. This would reduce the spam in bigger communities, and allow instances to block it all together if they wanted to.

Besides that, I'd pre- or postfix each post with a message saying it's a copy, plus a link to the original, for copyright reasons. Moderation would be a separate story - not particularly looking forward to that. Could make it so that if a post were flagged, it would re-sync with the original. Let Reddit do the moderation :D
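The pre-/postfix idea could be as small as the helper below. This is only a sketch: `with_attribution` and the notice wording are invented here, and whether such a notice is enough for copyright purposes is exactly the open question in this thread.

```python
def with_attribution(body: str, original_url: str, prefix: bool = True) -> str:
    """Return the post body with a "this is a copy" notice and source link.

    prefix=True puts the notice before the body, prefix=False after it.
    The notice text is a placeholder, not an agreed-upon wording.
    """
    notice = f"This post is an automated copy. Original: {original_url}"
    body = body.strip()
    if not body:
        # Link-only or image-only posts: the notice is the whole body.
        return notice
    return f"{notice}\n\n{body}" if prefix else f"{body}\n\n{notice}"


print(with_attribution("Some post text", "https://example.com/original"))
```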

[–] [email protected] 1 points 1 year ago

Waiting for this - though I think any 'mirror' community recreated from Reddit should just get the reddit feed anyway... it'll be a while before parity, followed by superiority...

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago) (1 children)

I was thinking of something like that too, but probably never would've gotten around to implementing it myself, so thanks! Could you put your script on GitHub or something maybe?

Also it would be great if it could copy at least top-level comments by OP, because that's often used for linking to a source.

[–] [email protected] 3 points 1 year ago

Yes I'll upload the code very soon!

> Also it would be great if it could copy at least top-level comments by OP, because that's often used for linking to a source.

Great idea! Thanks!

[–] Die4Ever 2 points 1 year ago (1 children)

maybe you should open source the bot?

it could also have filtering rules like only import posts with a minimum score or only the front page of the sub, could be configurable for each subreddit separately
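Something like this, maybe. It's just a sketch of per-subreddit rules, not a proposal for the real bot's config format. `SubredditRule`, `RULES` and `should_import` are hypothetical names, and the `score` and `nsfw` keys on a post are assumed. The NSFW switch is there because blocking NSFW content came up elsewhere in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubredditRule:
    min_score: int = 0        # skip posts below this score
    allow_nsfw: bool = False  # NSFW posts are blocked unless enabled


# Example configuration; the subreddit names and thresholds are made up.
RULES = {
    "memes": SubredditRule(min_score=100),
    "pics": SubredditRule(min_score=500),
}
DEFAULT_RULE = SubredditRule()


def should_import(post: dict, subreddit: str) -> bool:
    """Apply the subreddit's rule (or the default) to a single post."""
    rule = RULES.get(subreddit, DEFAULT_RULE)
    if post.get("nsfw", False) and not rule.allow_nsfw:
        return False
    return post.get("score", 0) >= rule.min_score
```

A "front page only" rule would need the post's rank in the listing, which this sketch doesn't model.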

[–] [email protected] 2 points 1 year ago (1 children)

Yeah I'm literally creating the github repo right now lmao

[–] Die4Ever 2 points 1 year ago (1 children)

another tool that might be good, idk if it already exists, but an RSS feed to Lemmy importer

actually Reddit supports RSS output, so you might already be using that?

[–] [email protected] 2 points 1 year ago (1 children)

will reddit keep RSS support after the 1st of july?

[–] Die4Ever 2 points 1 year ago (1 children)
[–] [email protected] 2 points 1 year ago

we'll see

I edited the original post with the link to the github repo and the community it's running on now

[–] [email protected] 2 points 1 year ago

IANAL, but I am pretty sure it is illegal: copyright prevents copying, and IIRC Reddit users retain the copyright to what they write, and you don't have their approval (in the form of terms of service) to copy it.

Maybe you could automate that approval (e.g. users could have a pinned post on their profile with a copied message).

There are some open-source non-profits that might have full-time lawyers; maybe you can contact them.

[–] Die4Ever 1 points 1 year ago* (last edited 1 year ago)
[–] [email protected] 1 points 1 year ago (1 children)

I thought about this yesterday but definitely don't have the skills to make it happen. Tbh I would go ahead with it, but know you will be required to moderate the posts. I wouldn't set it up to post on a community that isn't yours.

[–] [email protected] 2 points 1 year ago (1 children)

No, of course I would create a specific community for each subreddit I want to sync

Are you referring to moderating the comments? I was thinking of restricting the post privileges to only the bot

[–] [email protected] 2 points 1 year ago

Naw just the posts, I guess it depends on if the Bot is scraping all posts or just the top ones.