fiv

Checking your file integrity


Hard disks (rotating magnetic media in general, really) are pretty terrible. Capacity is great (note that I'm writing this in 2011, when 3 TB disks are available), but they have poor built-in reliability mechanisms. Most drives keep a checksum for each block on the disk, but the media itself can (as it ages) silently corrupt bits. If enough bits in a block flip, the failure scenarios are rather bad:

If that block happened to contain a file, that's one problem, but if it contained part of the filesystem metadata, you may have just lost a whole directory.

To further complicate matters, data corruption can be silent. Unless all data on the disk is read periodically, there's no guarantee that data is still accessible. Worse, since silent corruption isn't guaranteed to generate an IO error on read, there's no guarantee a corrupted file won't be read into your next full backup (you're doing something for backups, right?). You could discover the problem months, if not years, after the corruption occurred, far too late to do anything about it.

Really, the best solution to this is to use ZFS on all your disks. All data blocks are checksummed on all read and write operations, and the filesystem can be "scrubbed" on demand. Unfortunately, ZFS isn't available on all platforms (yet). To combat this, I wrote fiv.

fiv is designed to recursively scan through all files on a partition (excluding those matching optional patterns) and compute an MD5 checksum of each file. Checksums are saved in an external database, which is itself checksummed. The first pass collects checksums; each later pass either updates the stored checksum (if the file has been modified) or verifies the file against it. If any checksum doesn't match (including the database checksum), an error is reported.
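
To make the collect/update/verify cycle concrete, here is a minimal sketch of the idea in Python. It is not fiv's actual code: the function names (md5_of_file, scan, save_db, load_db), the JSON database layout, and the use of the modification time to decide whether a file "has been modified" are all assumptions made for illustration, and pattern exclusion is left out.

```python
import hashlib
import json
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root, db):
    """Walk root, recording new checksums and verifying known ones.

    db maps path -> {"mtime": float, "md5": str} and is updated in place.
    Returns the paths whose contents changed while their mtime did not,
    which is the signature of silent corruption (assumed heuristic).
    """
    errors = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit directories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and special files
            mtime = os.stat(path).st_mtime
            checksum = md5_of_file(path)
            entry = db.get(path)
            if entry is None or entry["mtime"] != mtime:
                # New or legitimately modified file: (re)record its checksum.
                db[path] = {"mtime": mtime, "md5": checksum}
            elif entry["md5"] != checksum:
                # Same mtime, different contents: report it.
                errors.append(path)
    return errors


def save_db(db, db_path):
    """Write the database together with a checksum of its own contents."""
    payload = json.dumps(db, sort_keys=True)
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    with open(db_path, "w") as handle:
        json.dump({"md5": digest, "payload": payload}, handle)


def load_db(db_path):
    """Read the database, refusing to use it if its own checksum fails."""
    with open(db_path) as handle:
        wrapper = json.load(handle)
    payload = wrapper["payload"]
    if hashlib.md5(payload.encode("utf-8")).hexdigest() != wrapper["md5"]:
        raise ValueError("checksum database is corrupt: " + db_path)
    return json.loads(payload)
```

In this sketch the database should live outside the tree being scanned; otherwise it would be checksummed as an ordinary file and rewritten on every run.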

Obviously, running fiv across all your storage nightly would become untenable at any real scale. To keep each run manageable, fiv can also be run with an optional maximum run size. When this pre-set amount of data has been scanned (rounded up to the end of the file currently being processed), fiv will save its position and exit. fiv resumes where it left off on the next run, which is basically the behavior I wanted for something I'd drop in a cron job.
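
The run-size limit can be sketched the same way. This is only an illustration of the behavior described above, not fiv's implementation: bounded_scan, its parameters, the flat list of paths, and the wrap-around to the start once the last file is reached are all hypothetical.

```python
import os


def bounded_scan(paths, start_index, max_bytes, process):
    """Process files from paths[start_index:] until max_bytes have been scanned.

    The byte budget is only checked between files, so the file in progress
    when the budget runs out is always finished ("rounded up to the end of
    the file"). Returns the index to resume from on the next run, wrapping
    to 0 once the end of the list is reached (assumed behavior).
    """
    if start_index >= len(paths):
        start_index = 0  # the file list shrank since the last run
    scanned = 0
    index = start_index
    while index < len(paths) and scanned < max_bytes:
        path = paths[index]
        process(path)  # e.g. checksum and verify this file
        scanned += os.path.getsize(path)
        index += 1
    return 0 if index >= len(paths) else index
```

A caller would store the returned index somewhere persistent (alongside the checksum database, say) and pass it back in as start_index on the next run.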

If this sounds useful to you, feel free to give it a try. fiv requires Python, so I hope you have that installed. fiv --help, or the man page, should provide all the operational details you need.

All code is copyright Mike Shuey, and licensed under GPL version 2.


2012/01/02: Updated with fiv 2.1.3 release (minor update for newer python release, and added a --version option).


Random bits of code

Debian packages fiv_2.1.3-1_all.deb
Debian source packages fiv_2.1.3-1.dsc fiv_2.1.3-1.tar.gz

$Date: 2012/01/02 13:14:37$