this post was submitted on 10 Feb 2025
16 points (100.0% liked)

datahoarder


Who are we?

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

We are one. We are legion. And we're trying really hard not to forget.

-- 5-4-3-2-1-bang from this thread


cross-posted from: https://lemmy.dbzer0.com/post/37424352

I have been lurking on this community for a while now and have really enjoyed the informational and instructional posts, but one topic I don't see come up very often is scaling a hoard. Currently I have a 20TB server that I am rapidly filling, and most posts about expanding recommend simply buying larger drives and slotting them into a single machine. That is definitely the easiest way to expand, but it seems like it would only get you to about 100TB before you can't reasonably do that anymore. So how do you set up 100TB+ systems with multiple servers?

My main concern is that currently all my services are dockerized on a single machine running Ubuntu, which works extremely well. It is space-efficient thanks to hardlinking, and I can still seed back everything. From different posts I've read, it seems like as people scale they either give up on hardlinks and eat up a lot of their storage with copied files, or they eventually delete their seeds and just keep the content. Do the Arr suite and Qbit allow dynamically selecting servers based on available space? Or are there other ways to solve these issues with additional tools? How do you all set up large systems, and what recommendations would you make? Any advice is appreciated, from hardware to software!
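
To illustrate what I mean about the hardlink issue: hardlinks only work within a single filesystem, so as soon as the download directory and the library end up on different machines or mounts, you're back to keeping full copies. Here is a rough sketch of that trade-off (the paths are just examples, not my actual layout, and normally the Arr suite does this step itself):

```python
import os
import shutil

# Example paths only -- not my actual layout.
DOWNLOADS = "/data/torrents/complete"
LIBRARY = "/data/media/movies"

def link_or_copy(src: str, dst: str) -> str:
    """Hardlink src into the library if both sit on the same filesystem,
    otherwise fall back to a full copy (which is where the wasted space comes from)."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    if os.stat(src).st_dev == os.stat(os.path.dirname(dst)).st_dev:
        os.link(src, dst)   # no extra space used, and the original keeps seeding
        return "hardlinked"
    shutil.copy2(src, dst)  # different filesystem or machine: a second full copy
    return "copied"

print(link_or_copy(os.path.join(DOWNLOADS, "example.mkv"),
                   os.path.join(LIBRARY, "Example (2025)", "example.mkv")))
```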

Also, a huge shout-out to Saik0 in this thread: https://lemmy.dbzer0.com/post/24219297 (I learned a ton from his post, but it seemed like only the tip of the iceberg!)

1 comment
m0unt4ine3r 2 points 1 week ago

I don't use the Arr suite or Qbit (nor do I really torrent much), so I can't speak to the second part, but for scaling I use Ceph. I currently have about 95 TiB across 3 machines, and in my experience scaling it up further (i.e., adding more storage to a machine or adding new machines) is fairly straightforward. That said, I have my cluster set up to keep one copy of the data on each machine and have a few TiB reserved for metadata, so I only have about 29 TiB for unique object storage. That kind of setup isn't strictly necessary, though: you could set up your own cluster with no redundancy and use most of your available storage for unique objects (a relatively small portion would still need to go to metadata if you want a Ceph filesystem, but that isn't required either).
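
To put rough numbers on that, here's the back-of-the-envelope math; the metadata figure is just a ballpark for the "few TiB" I mentioned, the rest comes from the numbers above:

```python
RAW_TIB = 95         # total raw capacity across the 3 machines
REPLICAS = 3         # one copy of each object per machine (pool size = 3)
METADATA_TIB = 2.7   # ballpark share taken by the metadata pool (assumption)

replicated = RAW_TIB / REPLICAS      # ~31.7 TiB if everything went to objects
usable = replicated - METADATA_TIB   # ~29 TiB left for unique object storage

print(f"{replicated:.1f} TiB with {REPLICAS}x replication, "
      f"~{usable:.0f} TiB for unique objects")
```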