Hard disks (rotating magnetic media in general, really) are pretty terrible. Capacity is great (note that I'm writing this in 2011, when 3 TB disks are available), but their built-in reliability mechanisms are poor. Most drives keep a checksum covering each block on the disk, but the media itself can silently corrupt bits as it ages. If enough bits in a block flip, the failure scenarios are rather bad.
To further complicate matters, data corruption can be silent. Unless all data on the disk is read periodically, there's no guarantee that the data is still accessible. Worse, since silent corruption isn't guaranteed to generate an IO error on read, there's no guarantee a corrupted file won't be read into your next full backup (you're doing something for backups, right?). You could discover the problem months, if not years, after the corruption occurred, far too late to do anything about it.
Really, the best solution to this is to use ZFS on all your disks. All data
blocks are checksummed on all read and write operations, and the filesystem can
be "scrubbed" on demand. Unfortunately, ZFS isn't available on all platforms
(yet). To combat this, I wrote fiv.
fiv is designed to recursively scan through all files on a
partition (excluding those matching optional patterns) and compute an MD5
checksum of each file. Checksums are saved in an external database, which
is itself checksummed. The first pass collects checksums; the second pass
will either update the checksum (if the file has been modified), or verify
the existing checksum. If any checksum doesn't match (including the
database checksum), an error is reported.
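The two-pass behavior described above can be sketched roughly as follows. This is not fiv's actual code: the names md5_of_file and scan_tree are made up for illustration, the "database" is a plain dict, and I'm assuming a file counts as "modified" when its mtime has changed. The checksum over the database itself is left out to keep the sketch short.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_tree(root, db):
    """Record or verify a checksum for every file under root.

    db maps path -> (mtime_ns, md5 hex digest) and is updated in place.
    Returns the paths whose contents changed although their mtime did
    not, which is the case treated here as silent corruption.
    """
    errors = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime_ns
            checksum = md5_of_file(path)
            known = db.get(path)
            if known is None or known[0] != mtime:
                # New or modified file (assumption: mtime changed): record it.
                db[path] = (mtime, checksum)
            elif known[1] != checksum:
                # Same mtime, different contents: report an error.
                errors.append(path)
    return errors
```

The first call on an empty dict just collects checksums; later calls with the same dict either refresh entries for modified files or verify the stored value.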
Obviously, running fiv across all your storage nightly would
become untenable at any real scale. To combat this, fiv can also
be run with an optional maximum run size. When this pre-set amount of data
has been scanned (rounded up to the end of the file currently being processed),
fiv will save its position and exit. fiv resumes
where it left off on the next run - which is basically the behavior I wanted
for something I'd drop in a cron job.
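As a rough illustration of the size-limited, resumable scan, here is a minimal sketch. Again, this is not fiv's implementation: scan_batch, resume_after, max_bytes, and check are hypothetical names, files are simply processed in sorted path order, and the caller is assumed to persist the returned position between runs (fiv itself saves its position for you).

```python
import os


def scan_batch(paths, resume_after=None, max_bytes=None, check=lambda path: None):
    """Process files in sorted order until max_bytes have been scanned.

    The limit is only tested after a file has been fully processed, so a
    run is rounded up to the end of the current file. Returns the last
    path processed if the run stopped early, or None if it reached the
    end of the list (so the next run starts over from the beginning).
    """
    ordered = sorted(paths)
    if resume_after is not None:
        # Skip everything up to and including the saved position.
        ordered = [p for p in ordered if p > resume_after]
    scanned = 0
    for index, path in enumerate(ordered):
        check(path)  # e.g. compute and verify the file's checksum
        scanned += os.path.getsize(path)
        if max_bytes is not None and scanned >= max_bytes:
            is_last = index == len(ordered) - 1
            return None if is_last else path
    return None
```

A cron job would call this once per run, store the return value somewhere durable, and pass it back in as resume_after the next time.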
If this sounds useful to you, feel free to give it a try. fiv
requires Python, so I hope you have that installed. fiv --help,
or the man page, should provide all the operational details you need.
All code is copyright Mike Shuey, and licensed under GPL version 2.
2012/01/02: Updated with fiv 2.1.3 release (minor update for newer python release, and added a --version option).
| Package type | Files | |
|---|---|---|
| Debian packages | fiv_2.1.3-1_all.deb | |
| Debian source packages | fiv_2.1.3-1.dsc | fiv_2.1.3-1.tar.gz |