I’ve built a RAID 1 array of two disks, A and B.
That means that every bit on A is equal to the corresponding bit on B. If one disk fails, I can safely retrieve my data from the other disk. But then I started wondering: how true is this?
Let’s say a bit reads 0 on A, but 1 on B. How would the RAID controller be able to tell which one is corrupted and which one is not? Is this based on what the so-called “S.M.A.R.T.” technology reports, and is that really worth anything, or would I be just as well off with a non-RAID solution?
I can see why this is not a problem on RAID 5, so I’m planning to upgrade.
How to solve:
Neither RAID 1 nor RAID 5 protects against the sort of problem you are describing. They are mainly meant to protect against the hardware failure of a single drive (and, therefore, to reduce system downtime). With RAID 5, the parity information is typically not consulted on normal reads; it is only used once the failure of a drive has been detected.
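To make the parity idea concrete, here is a minimal Python sketch assuming simple byte-wise XOR parity over equal-size blocks. The helper names `xor_blocks`, `rebuild_missing` and `parity_consistent` are hypothetical, and a real RAID 5 rotates parity across the disks and works on whole stripes. It shows that XOR parity can rebuild one lost block, and that a scrub-style check can notice a silently flipped bit but cannot say which disk holds it.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte (RAID 5-style parity)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover a single lost data block: XOR of the surviving blocks and the parity."""
    return xor_blocks(surviving + [parity])

def parity_consistent(data: list[bytes], parity: bytes) -> bool:
    """Scrub-style check: does the stored parity still match the data blocks?"""
    return xor_blocks(data) == parity

data = [b"\x0f\x00", b"\xf0\x01", b"\x33\x10"]  # one stripe: three data blocks
parity = xor_blocks(data)                        # parity block for that stripe
print(rebuild_missing([data[0], data[2]], parity) == data[1])  # True: lost block rebuilt
corrupted = [b"\x0e\x00", data[1], data[2]]      # one bit silently flipped in block 0
print(parity_consistent(corrupted, parity))      # False, yet nothing says which block is bad
```

With a single parity block, a mismatch only tells you that something in the stripe is wrong, not where it is, which is part of why the options below rely on extra redundancy or checksums.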
Although it is quite rare, bits can change state seemingly at random for a variety of reasons; this is called bit rot. To protect against bit rot, you can:
- Add further redundancy, e.g., by using RAID 6, combined with regular data integrity checks.
- Use a file system that actively verifies data integrity, such as ZFS. With ZFS on RAID-Z1 (single-drive redundancy), a bit that has randomly “flipped” is detected when it is read, because the checksum calculated from the data no longer matches the stored checksum. Where possible, ZFS then corrects the error automatically using the parity information.
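As a rough illustration of how a stored checksum breaks the tie, here is a Python sketch of a checksum-verified read on a two-way mirror, i.e. the RAID 1 case from the question rather than RAID-Z1. It is loosely modeled on the idea described above, not on ZFS's actual code: the function names `checksum` and `self_healing_read` are hypothetical, and SHA-256 is an assumed stand-in for whichever checksum algorithm the file system uses. Because the checksum is kept apart from the data, the reader can tell which copy is still correct and rewrite the bad one.

```python
import hashlib

def checksum(block: bytes) -> str:
    """Checksum stored separately from the data (SHA-256 here, as an assumption)."""
    return hashlib.sha256(block).hexdigest()

def self_healing_read(copies: list[bytearray], stored_checksum: str) -> bytes:
    """Return a copy that matches the stored checksum and repair any copy that does not."""
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == stored_checksum), None)
    if good is None:
        # Every copy is damaged: the error is detected but cannot be corrected.
        raise OSError("no copy matches the stored checksum")
    for copy in copies:
        if bytes(copy) != good:
            copy[:] = good  # overwrite the corrupted copy with the verified data
    return good

original = b"important data"
stored = checksum(original)          # recorded when the block was first written
disk_a = bytearray(original)
disk_b = bytearray(original)
disk_a[0] ^= 0x01                    # flip one bit on disk A
print(self_healing_read([disk_a, disk_b], stored) == original)  # True
print(bytes(disk_a) == original)     # True: disk A was repaired
```

Running a verify-and-repair pass like this over every block on a schedule is roughly what a scrub does, and it is one way to carry out the “regular data integrity checks” mentioned in the first option.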
RAID 1 is not a backup solution at all. What RAID 1 does is protect you from a single-drive failure. That’s all. Well, okay, it can also speed up reads a little. But it’s not a backup solution. If you delete a file, it’s deleted from both drives. If you format your RAID 1, both drives are formatted. If your files are infected with a virus, both copies are infected, so the mirror gives you nothing to recover from. That’s why RAID 1 is not a backup solution.
To answer your other question, if the data is mismatched on the drives, there’s no way to tell which is correct. However, the odds of this are perhaps not as high as you may think. See, for example, Wikipedia’s section on error handling on modern hard drives.
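A minimal Python sketch of what a plain mirror without checksums can do on its own, assuming neither drive reported a read error (the names `first_mismatch` and `read_plain_mirror` are hypothetical): it can find where the two copies differ, but the copies themselves carry no information about which one is right.

```python
from typing import Optional

def first_mismatch(a: bytes, b: bytes) -> Optional[int]:
    """Return the first byte offset where two mirror copies differ, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    return None if len(a) == len(b) else min(len(a), len(b))

def read_plain_mirror(a: bytes, b: bytes) -> bytes:
    """Read from a two-way mirror that keeps no checksums."""
    offset = first_mismatch(a, b)
    if offset is None:
        return a
    # The copies disagree, and nothing in a or b says which one holds the original data.
    raise ValueError(f"mirror copies differ at byte {offset}; cannot tell which is correct")

print(read_plain_mirror(b"\x01\x02", b"\x01\x02"))  # b'\x01\x02'
try:
    read_plain_mirror(b"\x01\x02", b"\x01\x03")
except ValueError as err:
    print(err)  # mirror copies differ at byte 1; cannot tell which is correct
```

In practice a RAID 1 read is typically served from only one of the disks, so a mismatch like this tends to go unnoticed until a consistency check compares both copies.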
It’s not impossible to add extra error detection and error correction, but that is typically not done at the level of the RAID controller. Some file systems, such as ZFS, add extra protection for your data integrity.