Online backup service Backblaze frequently provides interesting storage analysis based on hard drive statistics gathered from its data center. We’ve seen Backblaze figure out the most reliable hard drives based on tests in 2014 and then again last May. Now the company’s talking about how it determines if a hard drive is likely to die, another return to a topic broached in 2014.
Every day Backblaze retrieves a SMART error report for each of the more than 67,000 hard drives in its Sacramento data center. Backblaze then tracks five specific SMART errors that it says are the most helpful to determine whether a hard drive is about to fail.
In the company’s experience, 76.7 percent of its failed hard drives reported at least one of these five SMART errors before kicking the bucket. That’s a large share, though it still means that 23.3 percent of Backblaze’s dead drives gave no warning at all before failing. Meanwhile, only 4.2 percent of still-operational drives have reported at least one of these five SMART errors.
If you’re not familiar with SMART, it stands for Self-Monitoring, Analysis, and Reporting Technology. It’s a self-analysis feature built into modern hard drives. The catch is you usually need third-party software to retrieve your hard drive’s SMART report, though you can also fetch the report via the command line.
The five key SMART errors Backblaze tracks are the following:
SMART 5: Reallocated sectors count
SMART 187: Reported uncorrectable errors
SMART 188: Command timeout
SMART 197: Current pending sector count
SMART 198: Uncorrectable sector count
The last two errors are similar, but Backblaze includes them because some hard drive makers include one error in their reports but not the other.
Each SMART error reports a “raw value” when it happens. Unfortunately, what that number means can vary by vendor. Regardless, all Backblaze needs to track is whether that raw value is more than zero. If it is, then the company takes a harder look at what’s going on with that drive.
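For the curious, here is a rough Python sketch of that "raw value above zero" check. It assumes the report arrives as an attribute table in the layout printed by smartmontools' `smartctl -A`; the article doesn't name a tool, so that choice is ours. The five attribute IDs come from Backblaze's list, while the function names and the sample text are made up for illustration and are not Backblaze's actual tooling.

```python
# Sketch only. The five IDs are from the article; the function names and
# SAMPLE text below are hypothetical.
WATCHED = {
    5: "Reallocated sectors count",
    187: "Reported uncorrectable errors",
    188: "Command timeout",
    197: "Current pending sector count",
    198: "Uncorrectable sector count",
}


def parse_raw_values(report):
    """Return {attribute id: raw value} for lines that look like attribute rows."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; the raw value is assumed
        # to be the 10th whitespace-separated column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            values[int(fields[0])] = int(fields[9])
    return values


def flagged(report):
    """Return the watched attributes whose raw value is greater than zero."""
    raw = parse_raw_values(report)
    return {attr_id: raw[attr_id] for attr_id in WATCHED if raw.get(attr_id, 0) > 0}


# Illustrative input, not data from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   098   098   000    Old_age   Always       -       2
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

for attr_id, raw_value in flagged(SAMPLE).items():
    print(f"SMART {attr_id} ({WATCHED[attr_id]}): raw value {raw_value}")
```

Attributes missing from a report are treated as unflagged here, which matches the point that not every drive maker reports both SMART 197 and SMART 198.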
So how can a regular user employ these error reports to know what’s up with their hard drive? First, if you want to use these SMART errors you need to keep a record to see how many of these errors are reported over time. That gives you an idea of how serious the problem is.
As Backblaze’s Andy Klein, director of product marketing, pointed out in his blog post, a hard drive is more likely to fail if it “jumps from zero to 20 reported uncorrectable errors (SMART 187) in one day” as opposed to a hard drive that reports one SMART 187 error every month for five years.
If you do come across a hard drive that’s reporting these errors, and the problems grow worse over time, should you replace it? You may have to at some point, but it’s hard to know exactly when that hard drive will kick the bucket. What you definitely want to do is keep an eye on the drive, and make sure you’re backing up your data in case disaster strikes.
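To make the "sudden jump versus slow trickle" idea concrete, here is a small Python sketch. It assumes you have logged one raw value per day for a single attribute and that the value is a running count. The threshold of 10 is a made-up number for illustration; neither Backblaze nor this article specifies a cutoff.

```python
# Sketch only. The function names and the threshold are hypothetical.
def largest_daily_jump(readings):
    """Largest day-over-day increase in a list of daily raw values (oldest first)."""
    jumps = [later - earlier for earlier, later in zip(readings, readings[1:])]
    return max(jumps, default=0)


def needs_attention(readings, threshold=10):
    """True if the raw value ever rose by at least `threshold` in a single day."""
    return largest_daily_jump(readings) >= threshold


# Illustrative logs, not real drive data.
steady = [0, 0, 1, 1, 1, 2]   # errors trickle in slowly
sudden = [0, 0, 0, 20, 20]    # zero to 20 in one day, as in Klein's example

print(needs_attention(steady))
print(needs_attention(sudden))
```

With these sample logs, only the second drive is flagged. A slow trickle still deserves a look, so treat a check like this as a prompt to back up and investigate, not as a verdict on the drive.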
Another approach is to use any of the reported SMART errors as an excuse to upgrade to an SSD, which will dramatically boost your PC’s performance. Of course, just like hard drives, SSDs can fail too. Check out our own tutorial on how to figure out if your SSD is dying for more information.