this post was submitted on 24 Jan 2024
8 points (66.7% liked)

Python

6413 readers
3 users here now

Welcome to the Python community on the programming.dev Lemmy instance!

๐Ÿ“… Events

PastNovember 2023

October 2023

July 2023

August 2023

September 2023

๐Ÿ Python project:
๐Ÿ’“ Python Community:
โœจ Python Ecosystem:
๐ŸŒŒ Fediverse
Communities
Projects
Feeds

founded 1 year ago
MODERATORS
 

Hello,

I made a simple script to scraper threads.net using python and selenium. the script is just few lines long and it's easy to understand.

So what this script does?

first it will open edge browser(which you can change it to firefox or chrome). now you have to enter credentials to log into it. your browsing data and credentials will be stored in user_data which you can move around.

It scroll through threads's feed/hashtag/explore and It will store the src of every image it encounters so at the end we will have a links.txt file containing all the links to the images we have encountered.

now we have links.txt and we can use the following command to download all the images from the links.txt

wget -i links.txt

the script:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.edge.options import Options
import time

options = Options()
options.add_argument("--user-data-dir=user_data")

driver = webdriver.Edge(options=options)

driver.get('https://threads.net')

s = set()

input("Press any key to continue...")
for i in range(30):
    try:
        elements = driver.find_elements(By.XPATH, "//img")
        for e in elements:
            s.add(e.get_attribute("src"))
        driver.execute_script("window.scrollBy(0, 1000);")
        time.sleep(0.2)
    except:
        print("oopsie")

with open("links.txt", 'w') as f:
    links = list(s)
    for l in links:
        f.write(l+"\n")

driver.quit()

I hope it was usefull :D

Edit: here is a link to links.txt https://0x0.st/HGjx.txt

you are viewing a single comment's thread
view the rest of the comments
[โ€“] [email protected] 4 points 10 months ago (1 children)

Ok so why do you want all images from threads.net?

[โ€“] [email protected] 2 points 10 months ago

Because this way I can download a lot of wallpaper and anime pictures :D

There are tons of anime and wallpaper on instagram.com.

you can use this script to scrape instagram too! just change the url in driver.get().