This will happen in any case unless you have enough capacity for redundancy.
What is on this 4TB drive? A Linux installation? A bunch of user data? Both? What kind of data?
The first step is to separate your concerns. If you had, say, a 20GiB Linux install, 10GiB of loose home files, 1TiB of movies, 500GiB of photos, 1TiB of games, and 500GiB of music, you could back each of those up separately onto separate drives.
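If you go the simple one-category-per-drive route, plain rsync already covers it. A minimal sketch, assuming hypothetical source directories and mount points:

    # Hypothetical paths -- substitute your own source directories and mount points.
    rsync -a ~/Photos/ /mnt/photo-drive/Photos/
    rsync -a ~/Music/  /mnt/music-drive/Music/
    rsync -a ~/Movies/ /mnt/movie-drive/Movies/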
Now, it's likely that you'd still have more data in one category than fits on your largest external drive (movies are a likely candidate).
For this purpose, I use https://git-annex.branchable.com/. It's a beast to get into and set up properly, with plenty of footguns attached, but it was designed to solve exactly this kind of problem elegantly.
One of the most important things it does is separate file content from file metadata, making the metadata available in all locations ("repos") while the content can be present in only a subset of them, thereby achieving distributed storage. In other words, you could have 4TiB of file contents spread over a bunch of 500GiB drives, yet every one of those repos would still have the full file tree available (the metadata of all files plus the content of whatever files are present locally), letting you manage your files from any location without having all of the contents there (or even any of them). It's quite magical.
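To give a feel for how that decoupling looks in practice (the file name here is hypothetical), the metadata tells you where content lives, and you can pull or drop content per repo:

    # Hypothetical file name; run inside any clone of the annex.
    git annex whereis Movies/some-film.mkv   # lists which repos currently hold the content
    git annex get Movies/some-film.mkv       # fetch the content into this repo
    git annex drop Movies/some-film.mkv      # remove the local copy; the metadata and other copies remain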
Once configured properly, you can simply attach a drive, clone the git repo onto it, and then run

    git annex sync --content

and it'll fill that drive up with as much content as it can, or until each file's numcopies setting or other configured constraints are reached.
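As a rough end-to-end sketch of that workflow (the repo path, drive mount point, and numcopies value are hypothetical examples, not a prescription):

    # In the main repo: require at least 2 copies of every file.
    cd ~/annex
    git annex numcopies 2

    # On the freshly attached drive: clone, initialize, and let it pull content.
    git clone ~/annex /mnt/external-1/annex
    cd /mnt/external-1/annex
    git annex init "external-1"
    git annex sync --content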