Data Hoarder

We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time (tm) ). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

Hi,

reading here and there, I'm getting more and more worried about bit rot and similar problems (firmware errors? degraded SSD NAND?): the disks seem to work fine, but the data might be silently corrupted, so my simple backup strategy - two rsynced copies kept in different places and made at different times, one on an SSD and one on an HDD - may not be enough. Note that all my personal data sit on ext4 filesystems and amount to less than 1 TB (ok, not a datahoarding size, but this is a sub where the relevant experts are). Maybe the probability is low, and the probability that a critical file is affected is even lower, but you know Murphy? I do.
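For instance, I was thinking I could at least detect silent corruption on ext4 with a checksum manifest. A rough sketch, not tested, with placeholder paths:

    # build a SHA-256 manifest of everything, stored on a different disk than the data
    cd /data/personal
    find . -type f -print0 | sort -z | xargs -0 sha256sum > /mnt/other-disk/manifest.sha256

    # later, run from the root of the SSD copy and again from the HDD copy;
    # --quiet prints only the files whose checksum no longer matches
    sha256sum --check --quiet /mnt/other-disk/manifest.sha256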

Now, the ideal solution would be to replace all of my physical servers with ones that support ECC RAM, and then buy at least three CMR disks to build a ZFS raidz (or a similar btrfs array). But that solution isn't sustainable in terms of time, space and cost, so I have to accept some risk and settle for a second-best solution... but which one? I'd also like to avoid resorting to other media types (optical, in particular).
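For reference, my understanding is that the ZFS route would look roughly like this (pool name and device paths are made up, and I have not actually tried it):

    # three CMR disks in a single-parity raidz
    zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc

    # a periodic scrub re-reads every block and repairs from parity
    # whenever a checksum does not match
    zpool scrub tank
    zpool status -v tank    # lists any files with unrepairable errors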

For example, would using a backup tool - restic/kopia or Proxmox Backup Server - reduce the risk? I ask because an incremental approach would let me restore data from a chosen point in the past. Of course, I have no easy way to find the right point in time and, moreover, I would lose all data produced after it. Maybe I could apply this strategy only to a subset of very critical and immutable data (official documents)? Or, for those documents, could I just use rsync with the checksum option?
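To make it concrete, with restic I imagine something like the following (the repository path and source directory are placeholders, and I have not tested this):

    # sketch only: initialise the repository once, then back up periodically
    restic -r /mnt/hdd/restic-repo init
    restic -r /mnt/hdd/restic-repo backup ~/documents

    # verify the repository and restore from a chosen point in time
    restic -r /mnt/hdd/restic-repo check --read-data
    restic -r /mnt/hdd/restic-repo snapshots
    restic -r /mnt/hdd/restic-repo restore latest --target /tmp/restore

As far as I understand, rsync's checksum option only changes how rsync decides whether source and destination differ: it would still overwrite the destination from a corrupted source, so it can spot a difference (e.g. in a dry run) but not tell me which side is the good one.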

As usual, thanks for any suggestion!

[–] [email protected] 0 points 1 year ago (2 children)

Backups with versioning should solve the issue, as long as you are able to identify which data got corrupted.
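For data that is supposed to be immutable, a sketch of how that identification could work, assuming a restic repository like the one mentioned above (snapshot IDs are placeholders): diff two snapshots and treat any modification of a file you never edited as suspicious.

    # list snapshots, then compare two of them;
    # an "M" next to a file you never touched points at corruption
    restic -r /mnt/hdd/restic-repo snapshots
    restic -r /mnt/hdd/restic-repo diff 3f1a2b4c 9d8e7f6a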

[–] [email protected] 1 points 1 year ago (1 children)

And is there a best practice for doing so ... ?

[–] [email protected] 1 points 1 year ago

Best practices for making backups?