datahoarder


Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread

founded 4 years ago
76
77
 
 

I have scraped a lot of links from Instagram and Threads using Selenium with Python. It was a good learning experience. I will keep the script running for a few more days and see how many more media links I can collect from Instagram and Threads.

However, the problem is that the media isn't tagged, so we don't know what type of media each link points to. I wonder if there is an AI or some other tool that can sort these random media links into an organized list.

If you want to download all the media from the links, you can run the following commands:

# This command downloads a file containing all the links
wget -O links.txt https://gist.githubusercontent.com/Ghodawalaaman/f331d95550f64afac67a6b2a68903bf7/raw/7cc4cc57cdf5ab8aef6471c9407585315ca9d628/gistfile1.txt
# This command actually downloads the media from the links file we got above
wget -i links.txt
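
As for sorting the untagged links by type, a rough shell loop like this might be enough (untested sketch; it assumes the links point straight at the media files and that the servers return a sensible Content-Type header):

# Ask each server for the Content-Type of the URL and sort the links accordingly.
while IFS= read -r url; do
  type=$(curl -sIL "$url" | grep -i '^content-type:' | tail -n1 | awk '{print $2}')
  case "$type" in
    image/*) echo "$url" >> images.txt ;;
    video/*) echo "$url" >> videos.txt ;;
    *)       echo "$url" >> other.txt ;;
  esac
done < links.txt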

I was also thinking about how to store all of this. There are two ways: the first is to just keep the links.txt file and download the content when needed; the second is to download the content from the links now and save it to a hard drive. The second method will consume much more space, so the first one seems better to me.

I hope this is something you like :)

78
79
 
 

I've got a fairly new 14tb Seagate Expansion. It works fine, and I've been using it for a month and a bit.

I don't know how long it's been doing this, but the power supply is making a very faint alarm sound. The power supply is plugged into a Belkin surge protector that is powered on with the "protected" status light lit, and the surge protector is plugged into an outlet. The HDD is currently not plugged into a computer.

It's not a beep or an electrical hum. It's a distinct weewooweewoo. I couldn't even determine the source until I pressed my ear against it.

Googling just points me towards the typical "my HDD is making a sound, how long do I have until it dies" threads, but nothing about an alarm sound coming from the power supply.

I'll check again whether it makes the alarm under other conditions, but in the meantime I was hoping someone here might know something.

Thanks in advance!

EDIT: The sound only happens when...

  • Power adapter is plugged into the HDD, AND the outlet
  • HDD is NOT plugged into the computer.

Plugging it into the computer stops the noise from the power adapter.

80
 
 

It seems like 6 or 7 years ago there was research into new forms of storage, using crystals or DNA, that promised ultra-high-density storage. I know the read/write speeds were not very fast, but I thought by now there would be more progress in the area. Apparently in 2021 there was a team that got a 16 GB file stored in DNA. In the last month there's a company (Biomemory) that lets you store 1 KB of data in DNA for $1,000, but if you want to read it, you have to send it back to them. I don't understand why you would use that today.

I wonder if it will ever be viable for us to have DNA readers/writers... but I also wonder if there are other new types of data storage coming up that might be just as good.

If you know anything about the DNA research or other new storage forms, what do you think is the most promising one?

81
15
submitted 10 months ago* (last edited 10 months ago) by [email protected] to c/[email protected]
 
 

Sorry for not doing much research beforehand and asking a newbie question. I am looking for some entry-point info on the question:

How would one go about datahoarding lemmy?

It seems to be a grade above what I've been doing so far (downloading video/audio from streaming platforms and backing up web articles and blog posts as PDFs) due to its distributed nature and the ActivityPub protocol.


Relevant stuff that I've found so far but haven't studied extensively:

  1. This does not seem to store most of the data: https://github.com/tgxn/lemmy-explorer
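
One low-tech angle I've been looking at, though I haven't verified the details: each Lemmy instance exposes an HTTP JSON API, so a community can be paged through with plain requests. Something roughly like this (instance, endpoint, and parameter names are my best reading of the API docs, so double-check them):

# Hypothetical example: pull one page of posts from a community as JSON.
# "lemmy.ml" is just a placeholder instance.
curl -s "https://lemmy.ml/api/v3/post/list?community_name=datahoarder&sort=New&limit=50&page=1" \
  -o datahoarder_page1.json
# Comments for each post would presumably come from a similar endpoint
# (something like /api/v3/comment/list?post_id=...), again unverified.
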
82
 
 

So I have a nearly full 4 TB hard drive in my server that I want to make an offline backup of. However, the only spare hard drives I have are a few 500 GB and 1 TB ones, so the entire contents will not fit all at once, but I do have enough total space for it. I also only have one USB hard drive dock so I can only plug in one hard drive at a time, and in any case I don't want to do any sort of RAID 0 or striping because the hard drives are old and I don't want a single one of them failing to make the entire backup unrecoverable.

I could play digital Tetris and manually copy individual directories to each smaller drive until they fill up, while mentally keeping track of which directories still need to be copied when I change drives, but I'm hoping for a more automatic and less error-prone way. Ideally, I'd want something that can start copying the entire contents of a given drive or directory to a destination that isn't big enough to hold everything, automatically copy as much as will fit without splitting any file across drives, and then wait for me to unplug the first drive, plug in another one, and specify a new mount point before continuing with the remaining files, using as many drives as necessary to copy everything.

Does anyone know of something that can accomplish all of this on a Linux system?
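
The closest I've come up with on my own is a rough loop like the one below (untested sketch with made-up paths: it copies files one at a time, skips anything that won't fit, and writes the skipped paths to a list so the next run can be pointed at the next drive). I'd still prefer an existing tool that handles this properly.

#!/bin/bash
# Untested sketch: copy whatever fits from SRC to DEST without splitting files.
# Files that don't fit are recorded in leftover.txt for the next drive.
SRC=/srv/data        # made-up source path
DEST=/mnt/backup     # made-up mount point of the current backup drive
: > leftover.txt
find "$SRC" -type f | while IFS= read -r f; do
  size=$(stat -c %s "$f")
  free=$(df -B1 --output=avail "$DEST" | tail -n1)
  if [ "$size" -lt "$free" ]; then
    rsync -a --relative "$f" "$DEST"/   # --relative keeps the directory structure
  else
    echo "$f" >> leftover.txt
  fi
done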

83
 
 

Hello c/datahoarder! I need your help. Not sure whether this has been asked before—I've tried searching the web, but the only advice I can find is how to download episodes for podcasts whose feeds are still active.

The problem I'm trying to solve is that one of my favorite podcasts, Endless Boundaries Jam Radio, went offline during the pandemic. The usual feed aggregators (e.g. Podbay, TuneIn) still show up in internet searches, but since they are just feed aggregators, not file hosts, all the episodes are now dead links.

Thing is, I had already downloaded several episodes using the Playapod app on my iPhone. It's usable for now, but I'm very concerned about when I need to upgrade to a new phone.

Is there a trick for accessing the individual files on my iPhone that were downloaded through a third-party app such as Playapod? TIA

EDIT: I figured out how to do what I wanted. Once I had installed ifuse and related dependencies (e.g. libimobiledevice) on my Linux PC, I could connect my iPhone to my PC via USB and browse the files on my iPhone in my distro's default file browser. Many folders are named as GUIDs, making it harder to tell what's what by just looking at their names, but I narrowed down the right folder by opening up the Disk Usage Analyzer app in Linux. In my case, the Playapod app is one of very few apps with more than a gigabyte of data. I still have to go through and figure out which episode each mp3 file is, but that's still better than having nothing at all.
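
For anyone wanting to reproduce this, the rough sequence was something like the commands below (package names and mount point will vary, and the exact folder holding the app's downloads is something you have to hunt for yourself, so treat this as a sketch):

# Pair the phone and mount it over USB (needs libimobiledevice and ifuse).
idevicepair pair
mkdir -p ~/iphone
ifuse ~/iphone

# Find the few directories large enough to be a podcast library, then list the mp3s.
du -h --max-depth=2 ~/iphone | sort -h | tail -n 20
find ~/iphone -iname '*.mp3'

# If nothing shows up, ifuse can also mount a single app's documents
# with its --documents option (you need the app's bundle ID for that).

# Unmount when done.
fusermount -u ~/iphone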

Thanks to everyone who responded. I hope this info helps anyone else in a similar predicament!

84
 
 

Hey guys, I'm setting up my NAS (openmediavault) and very much enjoying it! It now runs my Nextcloud and a couple of other services. I've got a mirrored ZFS setup of two 8 TB drives.

I've got another two 8 TB drives and am debating whether to add them as an extra mirror vdev or to create a new pool as an extra backup. I'm not sure that extra backup is necessary though, since I already have a daily cloud backup. My current drives are only 14% used, so I'm not even sure I should put the new ones in the pool yet. What do you guys think?
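
For reference, the two options I'm weighing would look roughly like this (pool name and device paths are made up):

# Option 1: grow the existing pool by adding a second mirror vdev
# (data then gets striped across the two mirrors).
zpool add tank mirror /dev/disk/by-id/NEW-DISK-1 /dev/disk/by-id/NEW-DISK-2

# Option 2: keep the new drives as a completely separate backup pool.
zpool create backup mirror /dev/disk/by-id/NEW-DISK-1 /dev/disk/by-id/NEW-DISK-2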

85
27
submitted 11 months ago* (last edited 9 months ago) by [email protected] to c/[email protected]
86
4
submitted 11 months ago* (last edited 11 months ago) by [email protected] to c/[email protected]
 
 

Just wondering if anyone knows which SAS connectors on the SAS826A backplane control which ports?

On my current setup only ports 8-11 are working, so I've got some troubleshooting ahead of me.

The online manuals show the connectors but unhelpfully don't indicate which ports each one serves.

Also, does anyone know what the ribbon cable beside the SAS wires is used for on Supermicro cables? I don't recall seeing it on other SAS cables.

87
 
 

I've recently acquired the hardware to build a home server/NAS. I'd love some community-guided advice on tools I should consider and what the best practices are.

For instance, how does redundancy work? What about automated backups? What OS should be running on a NAS? What utilities can I use to monitor the safety of my data? Perhaps even a guide on how to safely share that data outside my home network for personal use, or even open it to the internet, without compromising my network?

Thanks for the discussion

88
 
 
89
32
submitted 11 months ago* (last edited 11 months ago) by [email protected] to c/[email protected]
 
 

Well, I'm just about fed up with streaming bullshit. I currently have a home server that's just a Raspberry Pi 4 with a bunch of Docker containers, and it has served my light usage well.

But with transcoding on Jellyfin I'll be needing some more power, and a bunch of storage. So I'm thinking of building a new little server.

CPU requirements aren't high at all: I need to transcode maybe 2 concurrent 4K streams, which a cheap discrete GPU or a CPU with a decent enough iGPU could handle. Other applications are basically negligible: Vaultwarden, Pi-hole, a torrent client, and general file storage.

I also recently acquired a mini PC which is plenty powerful, but it doesn't have any way of adding a bunch of drives. So another option is setting up a pure NAS and just using the mini PC as the server. It's got an i7-10700T and a UHD 630 iGPU.

I've been using Linux and self-hosting basic things for years, but I'm pretty new to this level of hardware and have little experience with RAID.

Budget: ~$500ish - storage goal: 12+ TB

90
 
 

I have about 100 GB, and growing, that is critical for my business. File size growth is slow, so it will be years and years before it even gets to 200 GB.

I have multiple local copies and a copy in google drive, but I want to leave a hard drive at my mother-in-law’s house.

I only want a 2.5" form factor or smaller, as my mother-in-law will be carrying it here when she comes to visit us in the city.

I'm not sure what the recommendation is. I'm not a millionaire, I'm just a freelancer, so I'd like to minimize cost.

91
 
 

XeNTaX forums and wiki have shut down and archival attempts have been suppressed. But for good reason!

92
 
 

I admit they were way too cheap for what they are (like 15% cheaper than same-size IronWolf), so I gambled on it, haha. There were no indications that these drives were OEM or similar.

Back to the issue at hand: since I can't personally claim the five-year warranty on these (only the original purchaser can, and I have no way to know who they are or when they bought them), I should just return them, right? And maybe buy the next ones only from authorized sellers?

edit: also, now that I think about it, and before I make the same mistake twice: there's no way I can get enterprise drives as a normal consumer, is there, at least not brand new? I expect any enterprise drives I find will have the same issue, i.e. bought by someone else for servers or similar and then resold, correct?

edit 2: actually WD sells enterprise drives on their website, so my previous assumption about it was wrong

93
 
 

What do you think of dual actuator hard drives? I never knew these even existed...

Here's a quick summary of the vid for those who want a TL;DW:

  • Dual actuator drives are a single drive with two actuator arms inside
  • Each arm has its own platters, giving it access to half of the drive's capacity
  • The SAS version shows up as two separate drives: one for each actuator
  • The SATA version shows up as a single drive, but it can be partitioned at a specific LBA near the middle to use both actuators independently
  • The Linux kernel has been updated to better support these drives when queuing commands
  • Capable of saturating a 5 Gbit SATA link

Personally, my concern is RAID setups, particularly in a SAS config: will filesystems like ZFS and Btrfs know that two storage devices are actually the same physical drive? Aside from that, and the concern about more mechanical parts, this looks exciting, especially for sequential throughput!
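
As far as I can tell, ZFS won't detect that on its own, so you would have to lay the pool out yourself so that the two halves of one physical drive never sit in the same redundancy group. A hypothetical example with two SAS dual-actuator drives (device names made up), where each physical drive shows up as two block devices:

# Physical drive A exposes sdb and sdc; physical drive B exposes sdd and sde.
# Each mirror pairs halves from *different* physical drives, so losing one
# physical drive (two block devices at once) still leaves both mirrors intact.
zpool create tank \
  mirror /dev/sdb /dev/sdd \
  mirror /dev/sdc /dev/sde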

EDIT: fix typos

94
 
 

cross-posted from: https://feddit.uk/post/4478496

Veteran film collector John Franklin believes the answer is for the BBC to announce an immediate general amnesty on missing film footage.

This would reassure British amateur collectors that their private archives will not be confiscated if they come forward and that they will be safe from prosecution for having stored stolen BBC property, something several fear.

“Some of these collectors are terrified,” said Franklin, who knows the location of the two missing Doctor Who episodes, along with several other newly discovered TV treasures, including an episode of The Basil Brush Show, the second to be unearthed this autumn. “We now need to catalogue and save the significant television shows that are out there. If we are not careful they will eventually be dumped again in house clearances, because a lot of the owners of these important collections are now in their 80s and are very wary,” he added.

Discarded TV film was secretly salvaged from bins and skips by staff and contractors who worked at the BBC between 1967 and 1978, when the corporation had a policy of throwing out old reels. And Hartnell’s Doctor Who episodes were far from the only ones to go. Many popular shows were lost and other Doctor Who adventures starring Patrick Troughton and Jon Pertwee were either jettisoned or erased. A missing early episode of the long-running sitcom Sykes, starring Eric Sykes and Hattie Jacques, has also been rediscovered in private hands in the last few weeks.

...

The BBC said it was ready to talk to anyone with lost episodes. “We welcome members of the public contacting us regarding programmes they believe are lost archive recordings, and are happy to work with them to restore lost or missing programmes to the BBC archives,” it said.

Whether this will be enough to prompt nervous collectors to come forward is doubtful. While collectors are in no real danger, the infamous arrest of comedian Bob Monkhouse in 1978 has not been forgotten, Franklin suspects: “Monkhouse was a private collector and was accused of pirating videos. He even had some of his archive seized. Sadly people still believe they could have their films confiscated.”

95
 
 

Apologies if this isn't the right community to ask in, I figured folks who use them would know, but if there is a better place to ask, please let me know!

So - I need to buy some external storage, and looking at prices with the long run in mind, I was leaning towards an 18 TB WD Elements or a 16 TB Seagate Expansion. But after reading the reviews I am concerned about noise levels, since I have sensory processing disorder, which makes me really sensitive to noise, and this drive is going right by my bed and will hopefully function as an always-on active backup.

Apparently there's a thing called Preventative Wear Levelling which will cause the drive to rev up every few seconds, and all drives (HDD anyway) do it, but it's the size of the drive that affects how much sound it makes, is that right?

If that's the case, is there a drive size I should stay under before the noise becomes noticeable, or is it a case of trial and error and "my mileage may vary", where any drive could end up being noisy?

Alternatively, is there a quiet but affordable (in terms of ££ per TB, I wouldn't buy a significantly smaller SSD for the same price as those I mentioned, for example) alternative?

TIA

96
 
 

It would be great if the downloader isn't command-line based and can list all the VODs of a channel at once after typing in the channel username/link (but that's not strictly necessary). TYSM <3

P.S. - I don't want to watch the VOD in the browser, I want to download it directly (I am aware of the TwitchNoSub extension).
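
If a command-line tool turns out to be acceptable after all, yt-dlp might be worth a try; as far as I can tell it can read a channel's videos page and grab individual VODs, roughly like this (the URL patterns and the sub-only behaviour are assumptions I haven't verified):

# List what yt-dlp can see on a channel's videos page, without downloading anything.
yt-dlp --flat-playlist --print title "https://www.twitch.tv/CHANNELNAME/videos"

# Download a single VOD; --cookies-from-browser passes your logged-in session,
# which sub-only VODs would presumably need. The VOD ID here is a placeholder.
yt-dlp --cookies-from-browser firefox "https://www.twitch.tv/videos/1234567890"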

97
 
 

I've been seeding many FOSS things for years, but for some reason people keep downloading Ubuntu versions that are more than 3 years old.

Any ideas why there is always someone downloading the ancient stuff, especially Ubuntu?

98
 
 

I bought a 15.36 TB Samsung PM1633a SAS SSD (MZ-ILS15TA / Dell EMC MZ1LS15THMLS-000D4).

I am trying to figure out what to buy in order to connect it to my desktop PC via PCIe. Is this a viable or recommended solution?

SFF-8643 to SFF-8639 cable

Dell LSI 9311-8i 8-port Internal 12G SAS PCIe x8 Host Bus RAID Adapter 3YDX4

99
 
 

I have an old computer that I use for storing and streaming my media. It has an attached external drive. I would like to increase my storage and build something that could be extended to at least 100 TB. I am not worried about backups.

I looked around and I think I need an HDD rack or enclosure. Some people gave me links to good deals on eBay and other sellers, but they are based in the US and the shipping fees are high. I saw this HDD enclosure and it seems to be what I'm searching for, but I don't know if they are any good.

Do you have any advice for me?

100
 
 

I'm trying to archive all the images from a website, this one specifically: https://stevegallacci.com/archive/edf
However, when I use a tool like DownThemAll, it just pulls the thumbnails that link to the full images, not the images themselves. I don't know if that's because I'm using the software wrong or if that's just a limitation of DownThemAll. Is there any way to bulk download the full images without having to do it manually?
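
One thing that might be worth trying before anything fancier: wget's recursive mode can follow the thumbnail links one level down and keep only the image files. A rough sketch (whether it actually works depends entirely on how the site links to the full-size images):

# Follow links up to 2 levels deep, stay below the archive path, keep only
# common image types, and drop everything into a single folder.
wget -r -l 2 -np -nd -A 'jpg,jpeg,png,gif' -P edf_archive "https://stevegallacci.com/archive/edf"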
