Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ

Website ripper? (lemmy.ml)

submitted 1 year ago by [email protected] to c/[email protected]

I want to rip the contents of a pay website, but I have to log in through a web page to get access.

Does anyone have any good tools for Windows for that?

I'm guessing that any such tool must either have a built-in browser or be a browser plugin for it to work.

[–] [email protected] 4 points 1 year ago (1 children)

I have an account, so that's not a problem. The problem is how to automate going into every little content page and downloading the content, including the hi-res files.

[–] [email protected] 3 points 1 year ago (1 children)

I'm on a Mac and use SiteSucker, so I know this isn't super helpful, but on Windows you could try Wget or WebCopy: https://www.cyotek.com/cyotek-webcopy / https://gnuwin32.sourceforge.net/packages/wget.htm

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

WebCopy looks promising if I can get its crawler to work with this site's authentication...

Edit: I couldn't get WebCopy's spider to authenticate correctly.

WebCopy uses the deprecated Internet Explorer in Windows 10 as its embedded browser module. I can log into the website using the Capture Forms browser dialog, but the cookies (or whatever else holds the session) don't carry over to the spider.