this post was submitted on 23 Jan 2024

it sure beats having to buy it, but seriously come on... (i.imgur.com)

submitted 9 months ago by [email protected] to c/[email protected]


not being able to ctrl-F a textbook or have click-to-chapter links sure makes studying harder these days... and any scanning software worth its salt will at least do the bare minimum OCR automatically...


[–] [email protected] 120 points 9 months ago* (last edited 9 months ago)

I'd much rather do the OCR myself, if it's really needed, than get a half-assed "book" full of typos and broken tables just because someone ran an automated OCR but didn't have the 5-6 hours required to manually edit it into something decent.

Just be thankful that someone took the time to flip through it page by page on their scanner and upload it somewhere.

[–] [email protected] 87 points 9 months ago

I hope this sentiment never stops someone from uploading a textbook without OCR. Once it's scanned it can always be OCRed at a later time.

[–] [email protected] 78 points 9 months ago (1 children)

Look, it's all about authorial intent - if the author had wanted their book to be easy to reference or accessible to people who use screen readers, they would have published a DRM free PDF in the first place. Gotta respect the artist's vision.

[–] [email protected] 10 points 9 months ago (1 children)

...and sometimes the artist turns out to be an idiot :D

[–] [email protected] 2 points 9 months ago

Or the professor who’s profiting off requiring the latest edition of their own book each year.

[–] [email protected] 70 points 9 months ago* (last edited 9 months ago) (1 children)

You can do it yourself:

https://ocrmypdf.readthedocs.io/
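For anyone who wants to script it, here is a minimal sketch of calling the `ocrmypdf` command-line tool from Python. It assumes `ocrmypdf` (and the Tesseract language pack you ask for) is already installed and on your PATH. The helper names `build_ocr_command` and `ocr_pdf` and the file names are made up for illustration; `-l`, `--skip-text` and `--deskew` are regular ocrmypdf options.

```python
import shutil
import subprocess


def build_ocr_command(src, dst, language="eng"):
    """Return the ocrmypdf command line as a list of arguments."""
    return [
        "ocrmypdf",
        "-l", language,   # Tesseract language pack to use
        "--skip-text",    # leave pages that already have a text layer alone
        "--deskew",       # straighten crooked scans before recognition
        str(src),
        str(dst),
    ]


def ocr_pdf(src, dst, language="eng"):
    """Run ocrmypdf if it is installed; return True on success."""
    if shutil.which("ocrmypdf") is None:
        print("ocrmypdf not found on PATH; install it first")
        return False
    result = subprocess.run(build_ocr_command(src, dst, language))
    return result.returncode == 0
```

Something like `ocr_pdf("textbook_scan.pdf", "textbook_ocr.pdf")` should then write a searchable copy next to the original, leaving the scan itself untouched.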

[–] [email protected] 3 points 9 months ago

There are so many:

https://stirlingtools.com/

https://docs.paperless-ngx.com/

https://www.openpaper.work/en/

[–] [email protected] 68 points 9 months ago (2 children)

Bitch you can't ctrl-F or click to chapter in an actual book either.

[–] JackbyDev 61 points 9 months ago

Wait until OP hears about the Index at the back of the book.

[–] [email protected] 18 points 9 months ago

I know, that's my point! PDFs are inherently superior BECAUSE you can usually CTRL-F them.

[–] [email protected] 56 points 9 months ago (2 children)

Simple: pirate Adobe Acrobat and OCR them yourself.

[–] [email protected] 42 points 9 months ago

Or OCRMyPDF https://github.com/ocrmypdf/OCRmyPDF

[–] [email protected] 16 points 9 months ago

I might just do that and reupload the OCR'd copy. I already have 3 or 4 books that I've set aside to cut the bindings off and scan in, so I'm gonna need OCR for those too.

In my free time, of course. University waits for no student...

[–] [email protected] 41 points 9 months ago

OCR'ing a book before uploading saves so many hours on the user end of things. I wish it were done more so I don't have to leave my computer running overnight to batch OCR stuff.
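For the overnight batch case, a rough sketch of what that can look like in Python, assuming the `ocrmypdf` command-line tool is installed and on your PATH. The function names and the `_ocr` output suffix are just illustrative choices, not anything the tool requires. It skips files that already have an OCR'd copy, so the job can be stopped and restarted without redoing finished books.

```python
from pathlib import Path
import shutil
import subprocess


def pending_jobs(folder, suffix="_ocr"):
    """List (source, destination) pairs for PDFs that still need OCR."""
    jobs = []
    for src in sorted(Path(folder).glob("*.pdf")):
        if src.stem.endswith(suffix):
            continue  # this file is itself an OCR output
        dst = src.with_name(src.stem + suffix + ".pdf")
        if dst.exists():
            continue  # already processed on an earlier run
        jobs.append((src, dst))
    return jobs


def run_batch(folder):
    """OCR every pending PDF in folder; return how many succeeded."""
    if shutil.which("ocrmypdf") is None:
        print("ocrmypdf not found on PATH; install it first")
        return 0
    done = 0
    for src, dst in pending_jobs(folder):
        # --skip-text leaves pages that already contain text untouched
        result = subprocess.run(["ocrmypdf", "--skip-text", str(src), str(dst)])
        if result.returncode == 0:
            done += 1
    return done
```

Running `run_batch("scans")` processes one file at a time, which is slow but keeps memory use predictable on big textbooks.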

[–] [email protected] 41 points 9 months ago (1 children)

By the time you finished making this snarky meme, you could've set up a program to OCR a book yourself.

[–] [email protected] 2 points 9 months ago

'A' book, yes, but the more scanned pics you get, the more annoyed you get

[–] [email protected] 36 points 9 months ago

Sites like Anna's library should let users flag books without OCR and submit OCR'd versions of them.

[–] [email protected] 34 points 9 months ago* (last edited 9 months ago)

Be the change you wish to see in the world.

https://library.bz/main/upload/ (anonymous login: username genesis, password upload)

[–] [email protected] 17 points 9 months ago (1 children)

https://projectnaptha.com/

[–] [email protected] 11 points 9 months ago (1 children)

Very impressive! That's a bummer that you need Chrome to make it happen though :/

[–] [email protected] 1 points 9 months ago

It will still work on PDFs loaded in Chrome to be fair though.

[–] [email protected] 16 points 9 months ago (2 children)

There are a bunch of online tools that are free and let you upload a PDF to have it go through OCR.

Just Google "Free PDF OCR" and click through all the ads to upload, then give them a temporary email address to get a download link to the finished product.

Hot tip: There are free temporary email address sites too, if you need one to avoid getting on their ad lists.

[–] [email protected] 11 points 9 months ago* (last edited 9 months ago)

List of free temporary email solutions.

https://www.guerrillamail.com/

https://10minutemail.com/

https://addy.io/ - this one is slightly different

and about a billion similar ones.

[–] [email protected] 2 points 9 months ago

If you have a JPG or PNG file, you can upload it to Google Drive, then right-click and open it in Google Docs, and it will OCR the text for you.

[–] [email protected] 5 points 9 months ago

Use the index

[–] [email protected] 4 points 9 months ago (1 children)

OCR?

[–] [email protected] 14 points 9 months ago

Optical Character Recognition. Essentially, software that "reads" an image and pulls text out of it.