AVS Forum banner
7K views 128 replies 14 participants last post by  ajhieb 
#1 ·
HomeArticlesIT/DatacenterEditorials

Dispelling Backblaze's HDD Reliability Myth - The Real Story Covered



Read more at http://www.tweaktown.com/articles/6028/dispelling-backblaze-s-hdd-reliability-myth-the-real-story-covered/index.html#UX5OD0IAgPphmv71.99
Quote:
Introduction


Companies have a funny tendency to do just about anything for publicity. Being a storage writer requires staying up to date on the news, and in the normal course of my week, I watch companies clamor for attention daily. Attempts range from the mundane to the hilarious, and sometimes even questionable. Some will do anything to separate themselves from the thunderous cacophony of the thousand other companies screaming for attention.



Personally, I'm happy I'm not the one tasked with promoting companies or their products. I can see that there must be somewhat of an addictive property to nailing it though, as we observed from the splash of Backblaze headlines in the last few weeks.


Backblaze has had their name emblazoned on the front page of every tech website this week with the results of their HDD reliability blog post. Backblaze's blog is a running anthology of their company, and their efforts to provide users with unlimited online backup for a scant $5 a month. The very fact that you read that sentence makes this all worthwhile for Backblaze - they just got their message out.


This latest post is a result of their initial blog entries covering how long HDDs last, and the follow-up that enterprise HDD's are less reliable than consumer drives. Needless to say, there are holes in the methodology big enough to drive a truck through. However, the headlines led to questions from the public about more detailed drive failure rates, and Backblaze complied with the latest post, "Which Hard Drive Should I Buy?"


While Backblaze is somewhat clear about the results, they don't explain the test environment, and they don't do a good job of explaining why much of their data is worthless to the typical consumer. Further digging unearths questionable practices, at least if the goal is to gather data on drive reliability. Backblaze follows the open source mentality of sharing data on their enclosures, and even share schematics so readers can build their own Backblaze servers. This provides us some insight into their activities.


Reading their blog posts during the HDD crisis in 2011 is enthralling, they really do go to any length to continue operation in a cost-efficient manner. Unfortunately, that doesn't fall in line with determining the winners or losers of HDD reliability. To add insult to injury, most tech websites have picked this story up and proclaimed the results as a definitive guide to drive reliability.


The Backblaze results have been tallied and placed in nice charts, and winners proclaimed. Hitachi and Western Digital appear to be the winners, and Seagate comes in a distant third. However, in this case, even the winners are losers. Let's fire up the truck and drive through the holes in this test, and explain exactly why the results shouldn't affect anyone's purchasing decision.


Backblaze unabashedly sources the cheapest consumer drives they can find for storing their customers' data. Their noble goal is to provide storage as cheaply as possible, and then pass those savings along to their customers. They do not use drives indiscriminately, each model goes through a short test phase to assure the drive suits their purposes. Pairing these cheap drives with various RAID and replication schemes provides enough integrity to safely store data.



During the Thailand flooding in October 2011, Backblaze had to go to extreme lengths to procure enough drives to sustain operations. With the prices of HDD's rising exponentially, and supply in short stock, they literally took to the streets as a drastic form of damage control. They named the practice 'drive farming'.


First, they found that buying external drives was very cost effective. The whims of the market dictate that external drives are often cheaper than internal desktop drives. These drives come in small enclosures and are usually connected via USB or eSATA. Backblaze buys these drives, and strips the drives from the enclosures in a process they lovingly refer to as "shucking". Much like shucking a head of corn, they rip the enclosure apart and out pops a useable HDD.


During the Thailand flooding in October 2011, Backblaze had to go to extreme lengths to procure enough drives to sustain operations. With the prices of HDD's rising exponentially, and supply in short stock, they literally took to the streets as a drastic form of damage control. They named the practice 'drive farming'.


First, they found that buying external drives was very cost effective. The whims of the market dictate that external drives are often cheaper than internal desktop drives. These drives come in small enclosures and are usually connected via USB or eSATA. Backblaze buys these drives, and strips the drives from the enclosures in a process they lovingly refer to as "shucking". Much like shucking a head of corn, they rip the enclosure apart and out pops a useable HDD.




At first they pillaged area Fry's and Costco's until some employees were eventually banned from purchasing more drives. They fanned out further and enlisted the help of friends and family to continue purchasing drives, even detailing complex drive purchasing schedules to skirt drive purchasing limitations. When this approach began to fizzle in the local area, they even pondered going cross-country in Ryder truck. This eventually led to crowdsourcing. Backblaze offered their readers a $5 bounty per drive for drives purchased and sent to them during a Costco sale.


This practice exhibits amazing ingenuity and a laudable effort to deliver the best bang for the buck to their customers. Unfortunately, it doesn't mesh well with creating a stable sample pool to determine drive reliability. Backblaze also acknowledges using drives that were known RMA's, and refurbished, in the sample pool.





The majority of Backblaze's failures occur in the first few weeks of service, which is understandable considering their drive purchasing methods. The typical 'bathtub curve' of failures is expected with many storage devices, with the highest chance of failure in the beginning and ending stages of the product's life. However, it is feasible to conclude that their drive sourcing methods tainted their results.


Interestingly enough, Backblaze won $5 million in venture funding. Soon after, Costco was offering a sale on external HDD's with a limit of five per person. Even with the massive funding, they chose to take advantage of the sale to get the best bang for their buck. As they stated, old habits die hard.


Unfortunately, these drives are included in their failure ratings.


Backblaze also extends their ingenuity to the server rack. They have designed purpose-built Storage Pod enclosures, and share the schematic freely online. This commendable commitment to information sharing also sheds light on their 'failure rate' data.


The Storage Pod is currently on revision 3.0, with two prior incarnations requiring upgrades to deal with a number of design problems, most notably vibration.




Vibration is every hard drive's enemy, and creates an exponential amount of wear on components. Vibration even has performance implications. A typical desktop HDD experiences a relatively vibration-free existence in a stable environment, and is designed accordingly. One of the major differences in enterprise HDD design is vibration resistance technology. This allows the drive to function well and stand up to the wear and tear of the server chassis and rack.


More HDDs installed in an enclosure raises the amount of vibration. Backblaze packs 45 HDDs per enclosure for maximum storage density. While the drives are initially exposed to vibration from their neighbors inside the server, once placed into the rack, they are exposed to even more vibration from other servers. This creates the 'perfect storm' of vibration, and the use of consumer drives results in horrendous failure rates, as evidenced by the data from Backblaze.





It is no wonder that Backblaze has continued to refine their chassis to provide more resistance to vibration: the early models merely had nylon spacers to dampen vibration. Taking a closer look at their data, we can see that the drives in use the longest suffer the highest failure rates. One likely reason is simple: these older drives are in revision 1.0 of their storage enclosures, which suffer from significant vibration issues that merited a redesign.


Unfortunately for Seagate, these drives are predominantly from their product lines. This paints them in a very unforgiving light due to obvious chassis issues, with a misleading annual failure rate of 25.4% that would surely put Seagate out of business, if it were realistic.


Backblaze has left a significant amount of information out of the disclosed failure rate data. Segmenting these drives to different chassis revisions would be the responsible approach to information dissemination. We can rest assured that the older drives aren't in the best chassis available, revision 3.0 wasn't released until February 2013.


Another issue that affects the lifespan of an HDD is drive temperature. Variations in temperature and humidity have an impact upon drive life. Reading through the Backblaze blog, one comment that drew my eye was on server rack temperature;

... we've observed in the past three years that: 1) hard drives in pods in the top of racks run three degrees warmer on average than pods in the lower shelves; 2) drives in the center of the pod run five degrees warmer than those on the perimeter; 3) pods do not need all six fans-the drives maintain the recommended operating temperature with as few as two fans; and 4) heat doesn't correlate with drive failure (at least in the ranges seen in storage pods).







Backblaze claims that drive temperature doesn't affect drive life. That is counter to the observations of many others, including drive manufacturers. There is a reason for specified temperature ranges for HDD's. Though they are likely within these ranges, the Backblaze drives cannot be directly compared to each other with varying temperature ranges, let alone other drives. Once again, the lack of pertinent information makes any real conclusions impossible, and the uneven nature of the test environment spoils the data.



Workloads


Each drive is designed meticulously to provide a tightly-defined service level in its intended environment. These guidelines determine not only the design, but also the type of components used. The most cost-effective drives are designed to deliver exactly the correct performance and longevity in their intended environment, and nothing more. Utilizing robust components above the workload requirements of the drive is wasteful, and adds cost unnecessarily. This design efficiency also means the drives are more likely to fail in untoward conditions.



Backblaze procures the cheapest possible HDD on the market at all times, regardless of its workload rating, and then subjects them to a harsh environment that is virtually guaranteed to destroy the drive. This leads to higher failure rates than observed in the wild. This reflects just how precisely these drives are engineered to fulfil their stated purpose, and nothing more.


Another concern is the direct comparisons between drives, even though they endure varying workloads. Comparing drives of the same make and model is impossible if the same workload isn't applied. Expanding that out to compare different models and manufacturers is even more ridiculous. There is no way to know how many times the drives have spun up and down, and how many times the drives were subjected to varying types of data requests.


Random data requires more movement, and thus creates more wear and tear on delicate HDD heads. Spinning up and down, and also entering and recovering from various sleep states, also wears the drives differently over the course of time. With no real rhyme or reason to the workload distribution, let alone the environment, direct comparisons are impossible.


Only one thing is certain; the drives were subjected to workloads well beyond their design limits.


Final Thoughts


The data from Backblaze should not influence a purchasing decision by any consumer, regardless of what type of drive they are purchasing. The innumerable variables, and lack of documentation, ensures the results are unreliable. Even for the winners, the results aren't good; the failure rates are exponentially higher than those observed in the real-world. One should question whether these companies could survive financially with the massive warranty return rates in real-world scenarios.


We covered some of the most obvious holes in the methodology behind the Backblaze comparisons, but there are many more, such as sample size. With varying numbers of drives for each model, it is possible that some bad batches may have made their way into the sample pool, thus further skewing the numbers.



There is no clearer example of this than in their blog post titled "Enterprise Drives: Facts or Fiction?". In this blog post, Backblaze compared 368 enterprise HDDs, presumably purchased as a batch, to 14,719 consumer drives. Along with the fact that a bad batch may skew the numbers, Backblaze admits they subjected the drives to various chassis, temperatures and workloads. This creates data that is essentially worthless for comparative purposes, but when paired with a catchy title, it serves the purpose of attracting attention.


The enthusiast in me loves the Backblaze story. They are determined to deliver great value to their customers, and will go to any length to do so. Reading the blog posts about the extreme measures they took was engrossing, and I'm sure they enjoyed rising to the challenge. Their Storage Pod is a compelling design that has been field-tested extensively, and refined to provide a compelling price point per GB of storage.


It is the release of the data, in handy charts and graphs that encourage misrepresentation, which brings out the data-storage stickler in me. HDD manufacturers spend billions of dollars in R&D, and their labs are designed to characterize and measure the reliability and endurance of their storage solutions.


The Backblaze environment is the exact opposite. I do not believe I could dream up worse conditions to study and compare drive reliability. It's hard to believe they plotted this out and convened a meeting to outline a process to buy the cheapest drives imaginable, from all manner of ridiculous sources, install them into varying (and sometimes flawed) chassis, then stack them up and subject them to entirely different workloads and environmental conditions... all with the purpose of determining drive reliability.


Of course that wasn't the intention, but that is how some will interpret the data. In my opinion, the intoxicating allure of media coverage overwhelmed common sense, and Backblaze released these numbers with a catchy title that would attract attention. The tech media is to blame as well, with many posting the information with little or no research. Unfortunately, the Backblaze blog post will be copy/pasted innumerable times for years to come as an authoritative source of data, when it is the furthest thing from a comprehensive study imaginable.


Read more at http://www.tweaktown.com/articles/6028/dispelling-backblaze-s-hdd-reliability-myth-the-real-story-covered/index5.html#iLFiOVkFF8G0dFIe.99


Thoughts ???
 
#102 ·

Quote:
Originally Posted by EricN  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/100_100#post_24335132


That's how statistics operate.

So then you are saying if all 4 of the Hitachi models had good reliability (both 7200rpm and 5400rpm) then statistically all Hitachi is likely good ?
 
#103 ·

Quote:
Originally Posted by ajhieb  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/0_100#post_24331439


Well, clearly that's the problem... I must have missed the post where you explained the difference between certainty, and probability. I would have expected it to actually contain the word "certainty" but that post doesn't seem to exist, at least not in this thread.

Try searching for "certain".
 
#104 ·
If one makes an effort to take a sample that represents the whole, yes that's how statistics operate. Backblaze has shown no such effort, and in fact, state the contrary. They weren't selecting hard drives to represent a cross section of various manufacturers. They were purchasing the cheapest hard drives they could get their hands on, then looked at the results.


This is no different that saying based on the performance of 500,000 Crown Vics used by the Yellow Cab company over the last few years, that Ford makes reliable/unreliable cars. Just like with the hard drives, the cars are used differently, serviced differently, and not necessarily representative of cars that people are generally looking to purchase. (no one under 70 anyway)
 
#105 ·

Quote:
Originally Posted by jbcain  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/0_100#post_24331590


jim2100, you're assuming statistics of a greater whole with a lesser group. this data was clearly only indicative of the models used/chosen. I would never assume that chevy made an inferior product solely based on 2 car product lines, as we all know they make 20+.

Like ajhieb, you are confusing certainty and probability. As I already explained a few posts back. But I am not going to give a probability and statistics lesson here, so you will have to correct your misunderstandings in that area yourself.
 
#107 ·

Quote:
Originally Posted by jim2100  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/60#post_24330728


This is not a murder trial. We do not need to be certain, beyond a reasonable doubt, that someone is guilty.

Pardon me. Bang up explanation you gave there. That cleared it all up.
 
#108 ·

Quote:
Originally Posted by ajhieb  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/100_100#post_24335401



Pardon me. Bang up explanation you gave there. That cleared it all up.

I'm not sure why you seem to feel entitled to have someone else take their time to teach you something that you seem intelligent enough to learn on your own. You have a fundamental misunderstanding of basic probability and statistics. Even if it were possible, I am not going to take the time to teach you basic probability and statistics in this forum. I've already told you several times that you are confusing probability with certainty. I do not know how to say it more clearly. Yet you continue to make statements involving absolutes when what is called for is probabilistic statements. Maybe someone else is willing to explain it to you. Otherwise, you probably need to spend some time studying applied statistics, and then come back and think about this study after you have corrected your misunderstandings.
 
#109 ·

Quote:
Originally Posted by jim2100  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/90#post_24335389


Like ajhieb, you are confusing certainty and probability. As I already explained a few posts back. But I am not going to give a probability and statistics lesson here, so you will have to correct your misunderstandings in that area yourself.

So if you don't intend on explaining yourself, do you expect to be taken seriously when you just keep repeating "Wrong again."


Had Backblaze made an effort to get a representative sample of the drives available from the manufacturers, then you could say with a fair amount of confidence, that Manufacturer A is better than Manufacturer B. But they didn't. They didn't do a random sampling. They didn't do a representative sampling. They bought a bunch of cheap drives. Period. That isn't a good sampling to base claims about the brand, no matter how many of those drives they had.


If you could show some sort of correlation between the models represented, and the ones not represented, then you certainly could state that based on the results that all the models from Man A are generally more reliable than all the models from Man B. But no one, not Backblaze, not you, not anyone has made that correlation. The untested drives aren't connected statistically in any way to the ones tested without that correlation. This has nothing to do with absolutes, other than there is absolutely no connection between the drives studied and the drives not included.


It's not a matter of Brand A is more reliable than Brand B with x% of certainty. I'm well aware of what certainty is in that context. The thing that you keep ignoring is that the study wasn't actually looking at brands, it was looking at models. Brand is wholly irrelevant until you show some statistical connection . And you haven't. In the context of the study, "brand" is a wholly abstract concept. You might as well have named some of the drives "Jack" and the others "Jill" and based the study on that. Then we could determine that people named Jack are more reliable than people named Jill. It sounds silly, but it's no more ridiculous than the leap you're trying to make, without first showing the connection between what was studied and what wasn't.


You can keep telling me I'm wrong until you're blue in the face, but until you bring something to the discussion besides that, you may as well be going "blah blah blah"
 
#110 ·

Quote:
Originally Posted by jim2100  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/90#post_24335430


I'm not sure why you seem to feel entitled to have someone else take their time to teach you something that you seem intelligent enough to learn on your own. You have a fundamental misunderstanding of basic probability and statistics. Even if it were possible, I am not going to take the time to teach you basic probability and statistics in this forum. I've already told you several times that you are confusing probability with certainty. I do not know how to say it more clearly. Yet you continue to make statements involving absolutes when what is called for is probabilistic statements. Maybe someone else is willing to explain it to you. Otherwise, you probably need to spend some time studying applied statistics, and then come back and think about this study after you have corrected your misunderstandings.

I don't feel entitled at all to have you teach me anything. I'm simply asking for you to explain your claims. If you don't want to give an explanation, that's your prerogative.
 
#111 ·

Quote:
Originally Posted by ajhieb  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/100_100#post_24335461


You can keep telling me I'm wrong until you're blue in the face, but until you bring something to the discussion besides that, you may as well be going "blah blah blah"

There is something we can agree on. That is why I already said that me repeating myself is not going to help correct your confusion. You need to put in some effort yourself.
 
#112 ·

Quote:
Originally Posted by jim2100  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/90#post_24335486


There is something we can agree on. That is why I already said that me repeating myself is not going to help correct your confusion. You need to put in some effort yourself.

Nope. You're wrong. Don't make me repeat myself.
 
#113 ·

Quote:
Originally Posted by ajhieb  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/100_100#post_24335477


I'm simply asking for you to explain your claims. If you don't want to give an explanation, that's your prerogative.

What "claims"? Do you mean my assertion that one can draw probabilistic information from statistical analysis of samples taken from populations? That is basic probability and statistics.
 
#115 ·

Quote:
Originally Posted by jim2100  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/90#post_24335506


What "claims"? Do you mean my assertion that one can draw probabilistic information from statistical analysis of samples taken from populations? That is basic probability and statistics.

No one is doubting the viability of probability and statistics in general. Nice attempt at ducking the issue at hand. No, the claim that you can say Hitachis are more reliable than Seagates. or Western Digital, based on this study is the claim that you've yet to defend.


Why do you keep making me repeat myself?
 
#116 ·
"But Statistics!" is not a valid defense of that claim. Neither is "No, you're wrong."


Perhaps you should have skipped a prob and stat class and watch the debate team once or twice.
 
#117 ·

Quote:
Originally Posted by ajhieb  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/100_100#post_24335517


No, the claim that you can say Hitachis are more reliable than Seagates. or Western Digital, based on this study is the claim that you've yet to defend.

I made no such claim. Again, you are confusing certainty with probability. But with your mental block, you can just replace that with "blah blah blah", as it is clear that you are unwilling to correct your misunderstanding basic probability and statistics, and insist on speaking in absolutes.
 
#118 ·

Quote:
Originally Posted by jim2100  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered#post_24297641


Not really. I skimmed quickly through the article and kept thinking how poor the logic was and how sorely lacking it was with any supporting evidence to its claims.

That's kinda how I feel about your contribution to this thread.
 
  • Like
Reactions: Andy_Steb
#119 ·

Quote:
Originally Posted by jim2100  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/90#post_24335528


I made no such claim. Again, you are confusing certainty with probability. But with your mental block, you can just replace that with "blah blah blah", as it is clear that you are unwilling to correct your misunderstanding basic probability and statistics, and insist on speaking in absolutes.

When I'm claiming that you can't based on the backblaze study claim Hitachis are more reliable than Seagates. or Western Digital, and you tell me I'm wrong, then yes, you are claiming that very thing.
 
#121 ·

Quote:
Originally Posted by ajhieb  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/100_100#post_24335571


When I'm claiming that you can't based on the backblaze study claim Hitachis are more reliable than Seagates. or Western Digital, and you tell me I'm wrong, then yes, you are claiming that very thing.

Wrong yet again. I made no such claim.
 
#122 ·

Quote:
Originally Posted by jim2100  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/120#post_24335577


Wrong yet again. I made no such claim.

So we agree that you can't based on the Backblaze study claim that Hitachi drives in general are any more reliable than Seagate or WD drives.
 
#123 ·

Quote:
Originally Posted by ajhieb  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/100_100#post_24335602


So we agree that you can't based on the Backblaze study claim that Hitachi drives in general are any more reliable than Seagate or WD drives.

You really need to read more carefully (or possibly think more clearly, or both).
 
#124 ·

Quote:
Originally Posted by jim2100  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/120#post_24335618


You really need to read more carefully.

I'll tell you what...


I'm just going to keep talking about the Backblaze study. That real, actual thing, that happened, and was recorded, and published. You can keep ranting about probability and statistics (and keep complaining because when you steer the conversation away from the study and towards the abstract, people want expect you to back up what you say) To be very clear... you're the one who doesn't seem to want to talk about the backblaze study. You're the one who wants to keep this limited to a discussion of general probability and statistics, and you're the one who is complaining because people are asking you about the very topic you insist on steering the discussion towards.


If want to actually make a claim about this study then I'm all ears. If you're just going to keep trying to derail the conversation into a lecture about prob and stat, and then refuse to actually give said lecture I have no more interest in trying to have a discussion with you.


I have no issue at all with someone claiming I'm wrong if they'll explain what exactly I'm wrong about, and why exactly I'm wrong. I love learning that I was wrong about things... it means I learned something new. You however haven't brought anything to the discussion except trolling and nitpicking. You've refused to actually engage in any meaningful dialogue, instead choosing to talk in circles. Welcome to my ignore list. Good night.
 
#126 ·

Quote:
Originally Posted by ajhieb  /t/1515424/dispelling-backblazes-hdd-reliability-myth-the-real-story-covered/100_100#post_24335661


I have no issue at all with someone claiming I'm wrong if they'll explain what exactly I'm wrong about, and why exactly I'm wrong.

So despite your previous denial, you do feel entitled to having someone teach you basic probability and statistics. That is kind of pathetic.
 
This is an older thread, you may not receive a response, and could be reviving an old thread. Please consider creating a new thread.
Top