this post was submitted on 30 Jan 2024
Ouch, that must have been a pain to recover from...
Funnily enough, I've had almost the opposite experience to yours. Several years ago my HDDs would drop out at random during heavy write loads. After a while I narrowed the cause down to some dodgy SATA power cables, which sadly I could not replace at the time. Due to the hardware issue I could not scrub the filesystem successfully either. However, I managed to recover all my data to a separate BTRFS filesystem using the "restore" utility mentioned in the docs, and to the best of my knowledge all the recovered data was intact.
While that past error required a separate filesystem to perform the recovery, my most recent hardware issue with drives dropping out didn't need any recovery at all - after resolving the hardware issue (a loose power connection) BTRFS pretty much fixed itself during a scheduled scrub and spat out all the repairs in dmesg.
I would suggest enabling some kind of monitoring on BTRFS's error counters if you haven't already, because the fs will do whatever it can to prevent interruption to operations, so hardware problems can easily go unnoticed. In my previous two cases, performance was pretty much unaffected, and I only noticed the hardware problems because the scheduled scrub & balance took longer or failed.
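For the counter monitoring, a minimal sketch of what I mean, assuming the usual `[device].counter  value` layout that `btrfs device stats` prints. The function names and the idea of running this from a scheduled job are my own illustration, not anything official, and reading the stats may need root on your system:

```python
import re
import subprocess

# Matches lines such as "[/dev/sda1].write_io_errs    0"
_STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Parse `btrfs device stats` output into {device: {counter: value}}."""
    stats = {}
    for line in text.splitlines():
        m = _STAT_LINE.match(line.strip())
        if m:
            stats.setdefault(m["dev"], {})[m["counter"]] = int(m["value"])
    return stats


def nonzero_counters(stats):
    """Return (device, counter, value) for every counter above zero."""
    return [
        (dev, name, value)
        for dev, counters in stats.items()
        for name, value in counters.items()
        if value > 0
    ]


def check_mountpoint(mountpoint):
    """Run `btrfs device stats` on a mounted filesystem and report non-zero counters."""
    out = subprocess.run(
        ["btrfs", "device", "stats", mountpoint],
        capture_output=True, text=True, check=True,
    ).stdout
    return nonzero_counters(parse_device_stats(out))
```

Calling `check_mountpoint()` on your mount point from a cron job or systemd timer and sending yourself a mail whenever the list is non-empty would be enough. If you only need a pass/fail signal, `btrfs device stats --check` returns a non-zero exit status when any counter is non-zero, which avoids parsing entirely.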
Don't run a fsck - BTRFS essentially does this to itself during filesystem operations, such as a scrub or a file read. The provided btrfs check tool (fsck) is for the internal B-tree structure specifically AFAIK, and its repair mode irreversibly modifies the filesystem internally in a way that can cause unrecoverable data loss if the user does not know what they are doing. Instead of running fsck, run a scrub - it's an online operation that can be done while the filesystem is still mounted.
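A rough sketch of scripting a scrub, assuming the hardware itself is healthy. The helper names are made up for illustration; `-B` keeps `btrfs scrub start` in the foreground until it finishes, and the commands generally need root:

```python
import subprocess


def scrub_commands(mountpoint):
    """Build the argv lists for a foreground scrub and a status query."""
    start = ["btrfs", "scrub", "start", "-B", mountpoint]
    status = ["btrfs", "scrub", "status", mountpoint]
    return start, status


def run_scrub(mountpoint):
    """Run a scrub on a mounted filesystem and return (exit code, status report)."""
    start, status = scrub_commands(mountpoint)
    # A non-zero exit code means the scrub did not complete cleanly;
    # check the report and dmesg for details.
    result = subprocess.run(start, capture_output=True, text=True)
    report = subprocess.run(status, capture_output=True, text=True).stdout
    return result.returncode, report
```

The filesystem stays mounted and usable throughout; the report from `btrfs scrub status` summarises what was checked and any errors found.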
DO NOT RUN A SCRUB IF YOU SUSPECT HARDWARE FAILURE.
No, seriously. If you are having hardware issues, a scrub could make the corruption much worse. You should first make a complete copy of your data and then run btrfs check. Sorry for shouting, but it is really important that you don't scrub a bad disk.