Detecting File Corruption with Checksums
Protect your data from silent corruption using cryptographic checksums for backup verification and integrity monitoring.
What is File Corruption?
File corruption occurs when data is unintentionally modified, making files unusable or incorrect. Common causes:
- **Bit rot:** Magnetic decay on hard drives causes random bit flips over time
- **Hardware failures:** Failing RAM, disk controllers, or cables introduce errors
- **Software bugs:** Application crashes during writes leave partial or corrupted files
- **Power loss:** Sudden shutdowns interrupt write operations
- **Network errors:** Packet loss or transmission errors during file transfers
The most dangerous corruption is silent: files appear normal but contain incorrect data. A 2013 study found that 8% of disks develop silent corruption within 4 years. Without checksums, you won't know until it's too late.
How Checksums Detect Corruption
A checksum is a short, effectively unique fingerprint of a file's contents. If the file changes by even a single bit, the checksum changes completely:

```text
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
```
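This "avalanche effect" is easy to see for yourself; a minimal Python sketch hashing two strings that differ by a single character:

```python
import hashlib

# SHA-256 digests of two inputs that differ in exactly one byte.
original = hashlib.sha256(b"This is my important document").hexdigest()
altered = hashlib.sha256(b"This is my important documant").hexdigest()

print(original)
print(altered)

# The two digests bear no resemblance to each other, which is why
# comparing checksums reliably exposes even a one-bit change.
```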
Backup Verification Workflow
Protect your backups with this three-step process:
Step 1: Generate Checksums Before Backup
Create a checksum file for all important files:
Linux/macOS:

```bash
find /path/to/data -type f -exec sha256sum {} \; > checksums.txt
```

Windows (PowerShell):

```powershell
Get-ChildItem -Recurse -File | ForEach-Object {
    Get-FileHash $_.FullName -Algorithm SHA256
} | Export-Csv checksums.csv
```
Save checksums.txt in a different location than your backup.
If the backup drive fails, you still have the checksums to verify a restore from another source.
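Checksum generation can also be scripted in Python; a minimal sketch (the directory and manifest paths are placeholders) that streams each file so even very large files never need to fit in memory:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files are never read whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(data_dir: Path, manifest: Path) -> int:
    """Write 'HASH  RELATIVE/PATH' lines, the format `sha256sum -c` expects.

    Returns the number of files recorded.
    """
    count = 0
    with manifest.open("w") as out:
        for p in sorted(data_dir.rglob("*")):
            if p.is_file():
                out.write(f"{sha256_of(p)}  {p.relative_to(data_dir)}\n")
                count += 1
    return count
```

Writing the manifest outside the data directory keeps the checksum file itself out of the set of files being fingerprinted.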
Step 2: Perform the Backup
Copy files to your backup destination using your preferred method:
Linux/macOS:

```bash
rsync -av --checksum /source/ /backup/
```

The --checksum flag makes rsync compare file checksums instead of timestamps when deciding which files to skip.

Windows:

```
robocopy C:\source D:\backup /MIR /R:3 /W:10
```

Most cloud backup services (Backblaze, CrashPlan, etc.) automatically verify checksums during upload.
Step 3: Verify After Backup
Immediately verify the backup completed successfully:
```bash
cd /backup
sha256sum -c /path/to/checksums.txt
```

If every file reports OK, the backup is complete and verified; it is safe to delete the originals (if that's your plan).

If any file reports FAILED, corruption occurred during the backup. Re-copy the failed files and verify again.
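Re-copying only the failed files can be scripted too; a minimal Python sketch, assuming the manifest uses sha256sum's `HASH  PATH` format and the original source is still available (the function name and directory arguments are illustrative):

```python
import hashlib
import shutil
from pathlib import Path

def verify_and_repair(manifest: Path, backup: Path, source: Path) -> list[str]:
    """Re-check every manifest entry in the backup; re-copy mismatches.

    Returns the relative paths that had to be copied again.
    """
    repaired = []
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, rel = line.split(maxsplit=1)
        target = backup / rel
        # Whole-file read keeps the sketch short; chunked reads suit big files.
        actual = (hashlib.sha256(target.read_bytes()).hexdigest()
                  if target.exists() else "")
        if actual != expected:
            shutil.copy2(source / rel, target)  # replace the bad copy
            repaired.append(rel)
    return repaired
```

A second pass over the same manifest should then return an empty list, confirming the repair.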
Periodic Integrity Checks
Don't wait for a restore to discover corruption. Check your backups regularly:
Monthly Verification
Run checksum verification on your entire backup once per month.
```bash
sha256sum -c checksums.txt | grep -v ': OK$'
```

This shows only failed files (empty output means every file verified).
Automated Monitoring
Set up a cron job (Linux/macOS) or Task Scheduler (Windows) to verify checksums automatically.
```bash
0 2 1 * * cd /backup && sha256sum -c checksums.txt | mail -s "Backup Check" you@example.com
```

This runs at 2 AM on the 1st of each month and emails the results.
ZFS/Btrfs Scrubbing
Modern filesystems have built-in checksum verification. Enable periodic scrubs:
```bash
zpool scrub tank               # ZFS
btrfs scrub start /mnt/backup  # Btrfs
```

A scrub automatically detects corruption and, when the pool or filesystem has redundancy, repairs it.
Real-World Scenarios
Scenario 1: Photo Archive
You have 10 years of family photos (500GB). Generate checksums, back up to external drive, verify immediately.
Result: 3 files failed verification. Re-copied those files. All photos safe.
Scenario 2: Research Data
Lab generates 2TB of experimental data. Checksums created daily, verified before analysis.
Result: Detected corruption in 1 file after 6 months. Re-ran that experiment instead of publishing bad data.
Scenario 3: Software Distribution
Company distributes software updates. Checksums published on website, users verify downloads.
Result: Users detected corrupted downloads from a CDN node. CDN provider fixed the issue.
Scenario 4: Long-Term Archive
Legal documents stored on tape for 7 years. Checksums verified annually.
Result: Year 5 verification found 12 corrupted files. Restored from redundant backup.
Choosing the Right Algorithm
SHA-256: Best balance of speed and security. Use it for all new systems. Collision probability: about 1 in 2^256, a number roughly comparable to the count of atoms in the observable universe.
SHA-512: Slower, but with a higher security margin. Use it for critical data or long-term archives (10+ years).
MD5: Fast but cryptographically broken. Acceptable for detecting accidental corruption, NOT for security. Use it only if you must maintain compatibility with old systems.
BLAKE3: A modern algorithm, much faster than SHA-256 on modern CPUs. Great for large files, but less widely supported by tools.
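In Python's hashlib, switching algorithms is a one-line change (BLAKE3 requires a third-party package, but its standard-library relative BLAKE2 is available as blake2b); a quick comparison of digest sizes with an arbitrary sample input:

```python
import hashlib

data = b"backup payload"  # arbitrary sample input
for name in ("md5", "sha256", "sha512", "blake2b"):
    digest = hashlib.new(name, data).hexdigest()
    # Each hex character encodes 4 bits of the digest.
    print(f"{name:>8}: {len(digest) * 4:4} bits  {digest[:16]}...")
```

Longer digests cost more CPU time and storage per entry, but shrink the odds of an undetected collision.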
Tools and Scripts
Bash verification script:

```bash
#!/bin/bash
BACKUP_DIR="/mnt/backup"
CHECKSUM_FILE="$BACKUP_DIR/checksums.txt"
LOG_FILE="$BACKUP_DIR/verify.log"

echo "Starting verification: $(date)" >> "$LOG_FILE"
cd "$BACKUP_DIR"

if sha256sum -c "$CHECKSUM_FILE" >> "$LOG_FILE" 2>&1; then
    echo "✓ All files verified successfully" | mail -s "Backup OK" admin@example.com
else
    echo "✗ Corruption detected!" | mail -s "BACKUP FAILURE" admin@example.com
fi
```
Python verification script:

```python
import hashlib

def verify_file(filepath, expected_hash):
    sha256 = hashlib.sha256()
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            sha256.update(chunk)
    return sha256.hexdigest() == expected_hash

# Load checksums and verify each file
with open('checksums.txt') as f:
    for line in f:
        hash_val, filepath = line.strip().split(maxsplit=1)
        if verify_file(filepath, hash_val):
            print(f"✓ {filepath}")
        else:
            print(f"✗ {filepath} CORRUPTED!")
```

Try It Yourself
Practice with our interactive Hash Calculator:
1. Go to the Hash Calculator
2. Enter some text: `This is my important document`
3. Copy the SHA-256 hash (this is your "checksum")
4. Change one character: `This is my important documant`
5. Notice the hash is completely different → corruption would be detected
6. Change it back to the original → hash matches again
Official Resources
Data Integrity Best Practices
File System Documentation
- OpenZFS Documentation
- Btrfs Documentation