I'm not going to go into the details - they are long, torturous and (of course) occurred very early in the morning, as they always did.
But rsync.net used to run - exclusively - on UFS2 and, circa 2009/2010, we were pushing UFS2 to its very limits in terms of disk size, number of inodes per filesystem, amount of memory needed to fsck, etc.[1]
Several times we came very, very close to destroying a filesystem and losing customer data.
What made all the difference, every time, was:
- don't panic
- postulate what is happening
- do something non-destructive that will prove or disprove the theory
- possibly repeat
Go slow and don't take any steps until you have proven that you understand the problem.
ALSO, when working with filesystems remember that you can run a filesystem, non-destructively, in read-only mode. There are a lot of investigations that can be done on a filesystem in either explicitly read-only mode or with tools (like fsck) set to run no-write, or non-destructive.
Your first steps should always be read-only ...
[1] I asked Kirk M., in 2010, what the long term solution to our problems with UFS was and he said "migrate to ZFS".
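To make the "first steps should be read-only" advice concrete, here is a minimal sketch in Python. It assumes a Linux-style mount table (the `/proc/mounts` format: device, mount point, type, options) and the common `mount -o remount,ro` and `fsck -n` invocations; exact flags and behaviour vary by platform and filesystem, so check the man pages before relying on them. The function names are hypothetical, and nothing here executes a command: it only inspects text and builds a plan you can review.

```python
def mount_options(mounts_text, mount_point):
    """Return the set of mount options for mount_point, or None if not mounted.

    mounts_text is assumed to be in /proc/mounts format:
    <device> <mount point> <fstype> <options> <dump> <pass>
    Note: /proc/mounts escapes spaces in paths (e.g. as \\040); this sketch
    does not decode those escapes.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones, so the last match wins.
            found = set(fields[3].split(","))
    return found


def is_read_only(mounts_text, mount_point):
    """True only if mount_point is mounted and carries the 'ro' option."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "ro" in opts


def read_only_plan(device, mount_point):
    """Build (but do not run) the commands for a read-only first look."""
    return [
        ["mount", "-o", "remount,ro", mount_point],  # stop further writes
        ["fsck", "-n", device],                      # report only, answer "no" to repairs
    ]


# Illustrative sample data, not taken from a real system.
sample = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /data ext4 ro,relatime 0 0\n"
)

for command in read_only_plan("/dev/sdb1", "/data"):
    print(" ".join(command))
print("/data read-only:", is_read_only(sample, "/data"))
```

On a real Linux machine you would pass the contents of `/proc/mounts` instead of the sample string, and confirm the filesystem really is `ro` before starting any investigation.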
> There are a lot of investigations that can be done on a filesystem in either explicitly read-only mode or with tools (like fsck) set to run no-write, or non-destructive.
Some years ago (but not so long ago that I shouldn't have known better...), I mistakenly wiped out my home directory, where I had a few days of unpushed changes (yes, second mistake) to an important project I was working on.
When I realized what I had done, I immediately remounted read-only and I managed to preserve the data; a few hours later, ext4-undelete saved the day.
I did something like that recently, deleting a project directory with a mistyped git command that was, ironically, intended to preserve the contents for safekeeping. (That's an interesting lesson in itself.)
It was in a VM, so rather than remount read-only, I snapshotted the VM drive, then rebooted to perform recovery, confident I could try out things non-destructively by reverting to the snapshot.
Used ext4magic to undelete, and that tool did a really great job. Some files ended up with contents that should have been in other files, so it isn't perfect. But it was a project with only a few files, so I was able to reconstruct it easily enough because all the contents were at least recovered.
Did something similar for a customer a few years ago who had wiped a significant directory on a live system. That was on a RAID-1 (mirrored).
Rather than remounting read-only, I detached one of the mirrored drives to preserve it for safekeeping, which allowed production to continue on the other drive.
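The VM snapshot and the detached mirror follow the same pattern: set aside an untouched copy, then experiment only on something you can throw away. As a rough file-level analogue (not a VM snapshot or a RAID detach), the sketch below copies a raw disk image to a working copy and verifies the copy by SHA-256 before any recovery tool touches it. The image path, the `.work` suffix, and the function names are all hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_working_copy(image, workdir):
    """Copy image into workdir and verify the copy matches the original.

    Returns the path of the working copy. The original image is only read,
    never written, so recovery attempts can always restart from it.
    """
    image = Path(image)
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    copy = workdir / (image.name + ".work")
    shutil.copyfile(image, copy)
    if sha256_of(image) != sha256_of(copy):
        raise RuntimeError("working copy does not match the original image")
    return copy
```

Recovery tools would then be pointed at the returned working copy; if an attempt makes things worse, delete it and call `make_working_copy` again.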
ntfsresize also destroyed my partition back then. When I searched for it, I found the responsible bug in the bugtracker: it had been closed several months earlier, but the fix never made it to the stable Ubuntu release. Fortunately, a manual fix guide was available. I stopped trusting Ubuntu and switched to SystemRescueCD (which is Gentoo-based, with newer packages) after that incident...
My trust in Ubuntu was greatly reduced when (about five years ago) the installer would put GRUB on the primary drive regardless of where the Linux system was installed, and even when you explicitly told it not to. This was fixed within a few months, but not before it trashed several Windows installations (using whole-disk-encryption so they were not recoverable).