is it stupid to build a HTPC with a SSD drive only? - Page 3 - AVS Forum
post #61 of 92 Old 07-14-2014, 09:20 AM
ilovejedd (AVS Special Member)
Quote:
Originally Posted by rcohen View Post
Yes, in terms of throughput, but depending on buffering, seeks can put you over. I've known people who have run into this with WMC, but I've had no problems with my RAID. They may be able to get one drive to work by tuning OS or WMC buffering parameters.
One thing I always do for my DVR/media drives is to format with 64K clusters (allocation units). I've found that helps greatly in avoiding stutters when you've got multiple streams recording.
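
If you want to replicate that, here's a minimal sketch (assuming Windows, and that E: is a dedicated recording volume you can wipe; the drive letter and "DVR" label are placeholders, not from the post):
Code:
import subprocess

# Hedged sketch: format E: as NTFS with a 64K allocation unit (cluster)
# size, the setting described above. E: and the "DVR" label are
# placeholders; run from an elevated prompt, and note this erases the
# volume. format may still prompt before proceeding.
subprocess.run("format E: /FS:NTFS /A:64K /Q /V:DVR", shell=True, check=True)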
post #62 of 92 Old 07-14-2014, 09:51 AM
dcard (Senior Member)
Quote:
Originally Posted by ajhieb View Post
A 7200rpm laptop drive is more than fast enough to handle several concurrent video streams.
Yep. I have (oops, had) often recorded exactly seven concurrent HD streams to a single 5400rpm WD Green. The decision to use an SSD as a recording drive should not be made based on a perception of better recording performance. Maybe for playback of multiple streams over a network, but I often had three streams playing simultaneously from the same drive with no problems.
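
For scale, a quick back-of-the-envelope check (my numbers, not dcard's: an ATSC HD stream tops out around 19.4 Mbit/s, and a 5400rpm WD Green sustains very roughly 100 MB/s sequential):
Code:
# Sanity check of the seven-stream claim. Assumed numbers, not from the
# thread: ATSC HD maxes out around 19.4 Mbit/s per stream, and a 5400rpm
# WD Green sustains on the order of 100 MB/s sequential.
STREAM_MBIT = 19.4
DRIVE_MB_S = 100.0

streams = 7
total_mb_s = streams * STREAM_MBIT / 8  # Mbit/s -> MB/s
print(f"{streams} streams = {total_mb_s:.1f} MB/s "
      f"({100 * total_mb_s / DRIVE_MB_S:.0f}% of ~{DRIVE_MB_S:.0f} MB/s)")
# -> 7 streams = 17.0 MB/s (17% of ~100 MB/s). Raw throughput isn't the
#    problem; seeking between streams is, which is what buffering and
#    large clusters help with.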

Personally, I have lost a far higher percentage of the SSDs I've bought than of the hard drives. Granted, I have bought well over 50 hard drives over the past six years and only 8-10 SSDs, making this, statistically, more bad coincidence than hard fact. I think the bulk were early-generation SSDs, but I lost my $1100 Intel 320 600GB laptop SSD last month (very light use), so the sour taste continues. When a hard drive fails on me, I usually have warning signs that it is headed south. My SSD failures have been immediate and complete.

However, with all this said, if convenience calls for it (like room for only one drive in a small case, and no need for terabytes of recording storage), I wouldn't hesitate to use the same SSD for the OS and recordings. As always, partition your SSD, and image that OS!
post #63 of 92 Old 07-14-2014, 10:12 AM
rcohen (AVS Special Member)
Quote:
Originally Posted by ilovejedd View Post
One thing I always do for my DVR/media drives is to format with 64K clusters (allocation units). I've found that helps greatly in avoiding stutters when you've got multiple streams recording.
That's good to know.
post #64 of 92 Old 07-14-2014, 10:19 AM
rcohen (AVS Special Member)
Quote:
Originally Posted by dcard View Post
Yep. I have (oops, had) often recorded exactly seven concurrent HD streams to a single 5400rpm WD Green. The decision to use an SSD as a recording drive should not be made based on a perception of better recording performance. Maybe for playback of multiple streams over a network, but I often had three streams playing simultaneously from the same drive with no problems.

Personally, I have lost a far higher percentage of the SSDs I've bought than of the hard drives. Granted, I have bought well over 50 hard drives over the past six years and only 8-10 SSDs, making this, statistically, more bad coincidence than hard fact. I think the bulk were early-generation SSDs, but I lost my $1100 Intel 320 600GB laptop SSD last month (very light use), so the sour taste continues. When a hard drive fails on me, I usually have warning signs that it is headed south. My SSD failures have been immediate and complete.

However, with all this said, if convenience calls for it (like room for only one drive in a small case, and no need for terabytes of recording storage), I wouldn't hesitate to use the same SSD for the OS and recordings. As always, partition your SSD, and image that OS!
As long as you have write caching enabled, concurrent asynchronous writes should be a non-issue until you run out of total throughput. I was referring to concurrent synchronous reads as a possible problem with a single spindle. I'm talking about actual experience with WMC, not just theory. Anything that reduces seeks could address it, though: larger clusters, larger buffers, or more aggressive read-ahead settings in the application or OS.
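
To put rough numbers on the seek argument, here's a toy model (my assumptions, not rcohen's: ~15 ms per seek including rotational latency, ~100 MB/s sequential, and each stream refilling a fixed read-ahead buffer with one seek per refill):
Code:
# Toy model of concurrent reads from one spindle. Assumed numbers:
# ~15 ms per seek, ~100 MB/s sequential, 19.4 Mbit/s per HD stream.
SEEK_S = 0.015
SEQ_MB_S = 100.0
STREAM_MB_S = 19.4 / 8  # ~2.4 MB/s per stream

def disk_utilization(streams: int, buffer_mb: float) -> float:
    """Fraction of disk time spent servicing `streams` readers that each
    refill a `buffer_mb` read-ahead buffer, paying one seek per refill."""
    refills_per_s = streams * STREAM_MB_S / buffer_mb
    seeking = refills_per_s * SEEK_S
    transferring = streams * STREAM_MB_S / SEQ_MB_S
    return seeking + transferring

for buf_mb in (0.0625, 1.0, 8.0):  # 64 KB vs 1 MB vs 8 MB buffers
    print(f"{buf_mb * 1024:5.0f} KB buffers, 3 streams: "
          f"{disk_utilization(3, buf_mb):6.1%} busy")
# 64 KB buffers push one disk past 100% busy with just 3 streams (stutter);
# megabyte-scale read-ahead makes the same workload trivial.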
post #65 of 92 Old 07-14-2014, 10:21 AM
Charles R (AVS Addicted Member, Indianapolis)
Quote:
Originally Posted by rcohen View Post
Yes, in terms of throughput, but depending on buffering, seeks can put you over. I've known people who have run into this with WMC, but I've had no problems with my RAID.
I have used several USB 2.0 drives with WMC and I noticed very rare pixel breakup when recording three or more shows concurrently. It happened during the write process. Right now I'm using one of the WD Green drives internally, which stops spinning after a period of inactivity, and I'll see pixel breakup on a regular basis as the drive spins up. It falls within the one-minute padding, so it doesn't really hurt anything.
post #66 of 92 Old 07-14-2014, 10:29 AM
ilovejedd (AVS Special Member)
Quote:
Originally Posted by mcturkey View Post
The type of write-heavy, long-term storage that could wear out an SSD within the warranty period is not something you will ever achieve unless you are using it in a server where you see daily writes measured in TB (and in scenarios like that, you have to use flash storage anyway, because no array of mechanical drives can keep up). There is nothing remotely impractical about using an SSD for "extremely write-heavy long-term storage". The only factor your SSD-vs-HDD decision should depend on is capacity vs. price.
The cost per GB is what makes SSDs impractical as a DVR drive for long-term storage.

As for speed, it depends on the type of workload. Modern HDDs with 1TB platters average around 150 MB/s sequential, which works out to roughly 1TB of writes in two hours.
post #67 of 92 Old 07-14-2014, 10:32 AM
rcohen (AVS Special Member)
Quote:
Originally Posted by Charles R View Post
I have used several USB 2.0 drives with WMC and I noticed very rare pixel breakup when recording three or more shows concurrently. It happened during the write process. Right now I'm using one of the WD Green drives internally, which stops spinning after a period of inactivity, and I'll see pixel breakup on a regular basis as the drive spins up. It falls within the one-minute padding, so it doesn't really hurt anything.
I have had bad luck with USB drives for long-term storage.

I strongly recommend the Thecus N5550 via iSCSI if you can't do internal storage: fast, reliable, and relatively cheap. I also like that I can get the noisy drives out of the room.
post #68 of 92 Old 07-14-2014, 10:46 AM
rcohen (AVS Special Member)
Quote:
Originally Posted by dcard View Post
Yep. I have (oops, had) often recorded exactly seven concurrent HD streams to a single 5400rpm WD Green. The decision to use an SSD as a recording drive should not be made based on a perception of better recording performance. Maybe for playback of multiple streams over a network, but I often had three streams playing simultaneously from the same drive with no problems.

Personally, I have lost a far higher percentage of the SSDs I've bought than of the hard drives. Granted, I have bought well over 50 hard drives over the past six years and only 8-10 SSDs, making this, statistically, more bad coincidence than hard fact. I think the bulk were early-generation SSDs, but I lost my $1100 Intel 320 600GB laptop SSD last month (very light use), so the sour taste continues. When a hard drive fails on me, I usually have warning signs that it is headed south. My SSD failures have been immediate and complete.

However, with all this said, if convenience calls for it (like room for only one drive in a small case, and no need for terabytes of recording storage), I wouldn't hesitate to use the same SSD for the OS and recordings. As always, partition your SSD, and image that OS!
I've also had lots of reliability issues with consumer-grade SSDs. Intel has been the most reliable, but they fail too. For Intel and Samsung, the failure rate seems similar to HDDs; other brands have been significantly worse. As you said, when failures do happen, they tend to be worse than with HDDs: total failures without warning, corrupt data without I/O errors, failures that RAIDs don't detect, etc. ZFS and ReFS are designed to handle these kinds of problems gracefully.

There can be large performance benefits, though - just not a good match for a DVR.
post #69 of 92 Old 07-14-2014, 11:14 AM
nvmarino (Member)
I have to say that in spite of my *ahem* disagreement earlier, this is turning out to be quite a useful thread. (My apologies to the OP for that distraction, BTW...) There are a lot of good data points. Here's my attempt to summarize them, along with a bit of my own personal experience thrown in for good measure...

Pros:
  • SSDs make less noise
  • SSDs use less power (probably not significant enough to make a difference IRL though...)
  • Reboots will be faster (but, as your box is probably on all the time, probably not a real consideration...)
  • Other apps may load faster and feel more responsive (I'm thinking games specifically...)
  • Waking from hibernate will probably be faster (in my opinion, probably not a real-world benefit though as sleep is probably a better option vs. hibernate for a box that's plugged in all the time. Not to mention I gave up years ago trying to get my box to reliably wake from sleep for recordings...)
Cons:
  • Much less recording space for the money (I bolded this because, in my opinion, this is the most significant data point of all)
  • The failure rate of SSDs *may* be statistically higher than that of HDDs (independent of the whole endurance debate)? Also, there are plenty of examples of HDDs dying early as well, so I have no idea if this is really relevant... In the case of SSDs, it may be safest to stick with Intel or Samsung
Things that probably shouldn't be considered:
  • Performance (most likely, at least as it relates to the ability to record and play back media). See PROs for a few examples where SSDs may provide performance benefits in an HTPC environment...
    • In the case of an HDD, be sure it's set up properly (formatted with a large cluster size, buffering enabled, etc...)
  • SSD _write endurance_
    • Several respected tech media outlets have verified that drives will typically far exceed the manufacturer-stated endurance, which is probably far beyond the amount of data that any typical HTPC user will write to the drive in its expected lifetime (see the back-of-the-envelope sketch after this list). However, the exchange on write amplification between @rcohen and @ajhieb (posts #47, #49, and #50) provides the only compelling argument I've personally seen for why the testing methodology typically used might not be relevant for HTPC use.
    • There's also the potential to just flat out get a dud drive...
Other considerations:
  • If you have a NAS, SSD-only may make perfect sense: either move recordings from the SSD to the NAS regularly, or consider using iSCSI to record directly to the NAS
    • One significant consideration here - make sure you have a reliable network and sufficient network throughput for iSCSI. That pretty much means Ethernet.
What did I miss/get wrong?
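
One back-of-the-envelope way to frame the endurance point (all assumed numbers, not from the thread: two ~19.4 Mbit/s HD streams recorded 8 hours a day, a 2014-era consumer SSD rated around 70TB written, and no write amplification):
Code:
# Hedged endurance estimate with assumed numbers (see lead-in above).
STREAM_MB_S = 19.4 / 8           # ~2.4 MB/s per ATSC HD stream
streams, hours_per_day = 2, 8
RATED_TBW_GB = 70_000            # assumed endurance rating, in GB

gb_per_day = streams * STREAM_MB_S * hours_per_day * 3600 / 1000
print(f"~{gb_per_day:.0f} GB/day; rated endurance lasts "
      f"~{RATED_TBW_GB / gb_per_day / 365:.1f} years")
# -> ~140 GB/day; rated endurance lasts ~1.4 years. The endurance tests
#    mentioned above pushed similar drives to hundreds of TB or more,
#    which stretches that to a decade-plus - unless write amplification
#    from small synchronous DVR writes eats the margin, which is exactly
#    the @rcohen/@ajhieb question.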
post #70 of 92 Old 07-14-2014, 11:37 AM
rcohen (AVS Special Member)
SSD read speed is a HUGE benefit for laptops waking from hibernation.

iSCSI over unreliable networks like Wi-Fi or Ethernet-over-power/coax/whatever is a very bad idea if you value your data. In that case, stick with a file share.

Over reliable networks, I have gotten better results with iSCSI, plus it works with internet backup services like CrashPlan (with an ini tweak to keep it from crashing on big data sets).
post #71 of 92 Old 07-14-2014, 12:05 PM
nvmarino (Member)
And, on the whole topic of reliability (a trending topic in the thread), here's a recent personal experience:

In my main HTPC, I run my OS and apps off a 120GB OCZ Agility (don't worry - it's imaged regularly...). All of my media and other data reside on a HighPoint RocketRAID 3520 with six 2TB drives in a RAID-6 configuration. This is all housed in an Origen AE X11 case (yes, I managed to shove six HDDs, one SSD, and an optical drive into that case - don't ask how and I won't have to admit it involved double-sided tape...). I also back up my most important stuff (personal photos/videos and data) in a variety of ways: offsite and to a directly attached USB drive (for a hopefully faster restore) via CrashPlan, and offsite via Pogoplug (my wife and I sync our photos and videos from our iPhones to the HTPC via Pogoplug). We also sync our phones independently to a Slingbox 500 with attached USB storage, and to a LyveHome device. I will add that while a loss of our personal data would be pretty devastating, I'm not really that paranoid - most of my setup is for testing and evaluating, both for my own personal use and for my job. I will eventually pare down to a much simpler setup...

Here's the interesting/relevant part: about two weeks ago, my very high-energy two-year-old son ran pretty much full speed into the entertainment center where my HTPC sits. The TV wobbled, and my brain immediately assumed I was going to be replacing some (probably all) of the HDDs. Fortunately, I am in the process of building a new system with 12TB of storage available (software RAID-0 for now, but don't worry - it's just for testing; I'm still working out my new storage plan...), so I immediately synced everything from the HTPC to the new storage. Sure enough, the array has rebuilt itself twice since then, with two drives failing both times. (What's weird is that neither drive shows any SMART errors, the array managed to rebuild itself both times on the same drives, and it has been running merrily for about two days now. Not exactly sure what failed, but the drives are getting replaced anyway.) My saving grace is that I had JUST upgraded from RAID-5 to RAID-6 three weeks earlier, or else I would have been re-ripping a whole bunch of stuff and would have lost a bunch of TV recordings... In any case, after all this happened, I had pretty much zero concern about the SSD surviving the impact.

The tl;dr version of the story: if your HTPC sits somewhere it could be bashed by a two-year-old and you only have room for one storage device, an SSD may be the better choice. Oh yeah, I probably should've mentioned my son is fine... He really was my first concern after the crash. Really. I swear.

post #72 of 92 Old 07-14-2014, 01:25 PM
dcard (Senior Member)
Quote:
Originally Posted by nvmarino View Post
I have to say that in spite of my *ahem* disagreement earlier, this is turning out to be quite a useful thread. (My apologies to the OP for that distraction, BTW...) There are a lot of good data points. Here's my attempt to summarize them, along with a bit of my own personal experience thrown in for good measure...


...
Cons:
  • Much less recording space for the money (I bolded this because, in my opinion, this is the most significant data point of all)
...
What did I miss/get wrong?
I would think that an SSD alone for recording (no NAS or attached auxiliary storage) is nearly worthless. After you reserve your 40-50GB for the OS and apps, you can record about ten 1-hour TV shows on a 120GB drive. When I got my first DVR cable box 12 years ago, that was amazing... for about the first week.

So I'll assume that this thread is really about using the OS SSD for temporary buffering of recordings for eventual (daily?) transfer to the NAS. From that perspective, recording to the SSD makes a lot of sense.

However, even in this "temporary buffer" situation, where a batch file automatically transfers recordings daily (a sketch of the idea follows below), I found that having a 500GB+ temp buffer was very useful when the NAS was down for maintenance, sometimes for an extended period. There's a lot of flexibility in that setup. My seven-stream recordings to a single WD Green 2TB were done in the same chassis as a 40TB hardware RAID-6 setup. Yes, I could have recorded directly to the RAID array, but I would have lost a lot of flexibility that way.
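
A minimal sketch of the kind of daily sweep dcard describes (not his actual batch file; the paths, the .wtv extension, and the one-hour settle window are placeholder assumptions):
Code:
import shutil, time
from pathlib import Path

# Sweep finished recordings from the local buffer drive to the NAS.
# SRC, DEST, the .wtv extension, and the settle window are placeholders.
SRC = Path(r"D:\Recorded TV")
DEST = Path(r"\\nas\recordings")
SETTLE_S = 3600  # skip files touched in the last hour (may still be recording)

def sweep() -> None:
    now = time.time()
    for f in SRC.glob("*.wtv"):
        if now - f.stat().st_mtime > SETTLE_S:
            shutil.move(str(f), str(DEST / f.name))  # copies across volumes, then deletes
            print("moved", f.name)

if __name__ == "__main__":
    sweep()  # schedule daily via Task Scheduler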

...just some additional thoughts.....
post #73 of 92 Old 07-14-2014, 01:35 PM
rcohen (AVS Special Member)
Glad that story had a happy ending. On the topic of backups, CrashPlan with this ini setting is the best and cheapest option I have found for HTPCs with lots of data. (I don't use it for constantly changing DVR storage, though.)

http://crashplan.probackup.nl/remote...arting.en.html
post #74 of 92 Old 07-14-2014, 01:56 PM
nvmarino (Member)
Quote:
Originally Posted by rcohen View Post
On the topic of backups, CrashPlan with this ini setting is the best and cheapest option I have found for HTPCs with lots of data.
We should probably start another thread for this, but... Agreed, CrashPlan is pretty good. Pogoplug is also pretty competitive for cloud-based storage ($50 annually for unlimited storage), but it's not an end-to-end cloud-based backup solution like CrashPlan. You'd have to combine it with some third-party backup software for a true cloud-based backup, but they do offer some nice features that you won't get from a cloud solution dedicated solely to backup (at least none that I'm aware of). Come to think of it, combining the CrashPlan software with Pogoplug cloud storage might be an ideal solution. You may have just saved me $50 a year...

Quote:
Originally Posted by rcohen View Post
(I don't use it for constantly changing DVR storage, though.)
Yeah, me neither. I also don't back up any of my ripped media - that just feels abusive to me. Not to mention that, for the DVR content, I probably couldn't upload fast enough for the backup to ever finish anyway...

post #75 of 92 Old 07-14-2014, 02:06 PM
rcohen (AVS Special Member)
Quote:
Originally Posted by nvmarino View Post
also don't back up any of my ripped media - that just feels abusive to me.
I don't see it that way. That represents a huge time investment for me, and a backup service with de-dup should be able to very efficiently upload and store that stuff.
post #76 of 92 Old 07-14-2014, 02:16 PM
rcohen (AVS Special Member)
Quote:
Originally Posted by nvmarino View Post
You'd have to combine it with some third-party backup software for a true cloud-based backup, but they do offer some nice features that you won't get from a cloud solution dedicated solely to backup (at least none that I'm aware of).
I'm not sure what you're referring to by CrashPlan not being cloud-based or a true backup.

I like CrashPlan for the low price, fast upload speed, unlimited storage, and block-level de-dup.

It falls short in that it does files only (no disk images) and that it doesn't do anything special for mobile.
post #77 of 92 Old 07-14-2014, 02:32 PM
nvmarino (Member)
Quote:
Originally Posted by rcohen View Post
I don't see it that way. That represents a huge time investment for me, and a backup service with de-dup should be able to very efficiently upload and store that stuff.
Fair enough - there's definitely nothing wrong with it, as that's how they advertise their service. I suppose if I did, I'd also be paying for their ship-me-a-hard-drive seeding service, and for shipping both ways if I ever had to restore. But I've also got what I consider a pretty reliable setup as it is, so I don't really feel the need, I guess.
post #78 of 92 Old 07-14-2014, 02:35 PM
nvmarino (Member)
Quote:
Originally Posted by rcohen View Post
I'm not sure what you're referring to by CrashPlan not being cloud-based or a true backup.

I like CrashPlan for the low price, fast upload speed, unlimited storage, and block-level de-dup.

It falls short in that it does files only (no disk images) and that it doesn't do anything special for mobile.
Bad wording - I edited the post to hopefully clarify. I was referring to Pogoplug not being a true backup solution. Pogoplug's cloud service is basically like Dropbox, except it's $50 annually for unlimited storage.
post #79 of 92 Old 07-14-2014, 02:56 PM
ajhieb (AVS Special Member)
Quote:
Originally Posted by rcohen View Post
I don't see it that way. That represents a huge time investment for me, and a backup service with de-dup should be able to very efficiently upload and store that stuff.
De-dup could be effective presuming your rips are 1:1 and someone else has already backed up the same rip, but if you've compressed the streams, I don't think de-dup is going to be very effective. If they're using something like Single Instance Storage, I'm under the impression the rips would have to be exact (since it's based on the file hash), and even if they were using a more advanced block-based de-dup, it still might not do much good for rips... I'm assuming MKV and other common formats are interleaved, so even if the video is an exact bit-for-bit match but the audio streams are different (and it seems like most people tinker with, drop, or re-encode the audio streams), even block-level de-dup won't get you very far, since the non-matching data is interleaved with the matching data.

So I suspect it might work in a handful of cases, but not many. It's an interesting thought, though, so I'm wondering if anyone with practical experience could weigh in. (I'm half tempted to throw a trial of Server 2012 R2 on a computer and see what it does with various "similar" rips.)
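
In the meantime, here's a rough way to test the interleaving argument on any two rips without a server install (a sketch using fixed-offset 128 KB blocks; a real de-dup engine may chunk differently, which would change the numbers):
Code:
import hashlib

# Hash two rips in fixed 128 KB blocks and report how many blocks they
# share. Fixed-offset blocking is an assumption; content-defined chunking
# would usually find more overlap.
BLOCK = 128 * 1024

def block_hashes(path: str) -> set:
    hashes = set()
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK):
            hashes.add(hashlib.sha256(chunk).digest())
    return hashes

def dedup_overlap(a: str, b: str) -> float:
    """Share of b's unique blocks that already exist in a."""
    ha, hb = block_hashes(a), block_hashes(b)
    return len(ha & hb) / max(len(hb), 1)

if __name__ == "__main__":
    # Placeholder filenames - point these at two versions of the same rip.
    print(f"{dedup_overlap('movie_v1.mkv', 'movie_v2.mkv'):.1%} overlap")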

post #80 of 92 Old 07-14-2014, 03:26 PM
nvmarino (Member)
Quote:
Originally Posted by rcohen View Post
I don't see it that way. That represents a huge time investment for me, and a backup service with de-dup should be able to very efficiently upload and store that stuff.
I just caught the word "upload" in this reply. Are you saying that de-dupe happens before the data is uploaded? If so, I never realized that - I was always under the impression de-dupe happened on the server side.
post #81 of 92 Old 07-14-2014, 03:37 PM
ajhieb (AVS Special Member)
Quote:
Originally Posted by nvmarino View Post
I just caught the word "upload" in this reply. Are you saying that de-dupe happens before the data is uploaded? If so, I never realized that - I was always under the impression de-dupe happened on the server side.
I don't know how any particular service does it, but it would make sense for file-based de-duping to happen on the client side. I can't think of a reason why you wouldn't have the client generate all the file hashes and transmit those to the server to determine what needs to be uploaded. Block de-duping could theoretically be done on the client side too, but I could see that becoming a little more complex and time-consuming.

Quick back-of-the-envelope calculations tell me that 40TB of user data would be about 5 billion blocks, and that would be 5 billion hashes that need to be uploaded and then compared against however many quadrillion hashes for their existing data. Something tells me that might take a while (especially when they have hundreds or thousands of people doing the same thing at the same time).
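
How heavy that gets depends entirely on the block size, which the thread doesn't pin down; a quick calc under two assumed sizes (the 5 billion figure corresponds to roughly 8 KB blocks):
Code:
# Back-of-the-envelope hash counts for 40 TB of user data. Both block
# sizes are assumptions; the thread doesn't say what CrashPlan uses.
DATA_BYTES = 40e12
for block_kb in (8, 128):
    blocks = DATA_BYTES / (block_kb * 1024)
    index_gb = blocks * 32 / 1e9  # ~32-byte hash per block, ignoring overhead
    print(f"{block_kb:4d} KB blocks: {blocks / 1e9:5.2f} billion hashes, "
          f"~{index_gb:.0f} GB of raw hash data")
# ->    8 KB blocks:  4.88 billion hashes, ~156 GB of raw hash data
# ->  128 KB blocks:  0.31 billion hashes, ~10 GB of raw hash data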

Also, if I recall, there was an earlier thread about the CrashPlan software crashing due to excessive memory usage; I think it was due to the de-duping process, but I'll see if I can find that thread and verify.


Edit: Found it!

Here is the thread: Crash Plan glitch! Server users beware

and here is the relevant article: http://support.code42.com/CrashPlan/...ry_And_Crashes

post #82 of 92 Old 07-14-2014, 04:01 PM
nvmarino (Member)
Quote:
Originally Posted by ajhieb View Post
I don't know how any particular service does it, but it would make sense for file-based de-duping to happen on the client side. I can't think of a reason why you wouldn't have the client generate all the file hashes and transmit those to the server to determine what needs to be uploaded. Block de-duping could theoretically be done on the client side too, but I could see that becoming a little more complex and time-consuming.

Quick back-of-the-envelope calculations tell me that 40TB of user data would be about 5 billion blocks, and that would be 5 billion hashes that need to be uploaded and then compared against however many quadrillion hashes for their existing data. Something tells me that might take a while (especially when they have hundreds or thousands of people doing the same thing at the same time).

Also, if I recall, there was an earlier thread about the CrashPlan software crashing due to excessive memory usage; I think it was due to the de-duping process, but I'll see if I can find that thread and verify.


Edit: Found it!

Here is the thread: Crash Plan glitch! Server users beware

and here is the relevant article: http://support.code42.com/CrashPlan/...ry_And_Crashes
Very interesting - thanks @ajhieb and @rcohen. Makes perfect sense to do de-dup on the client side now that I think about it.

So... when did you say you were getting around to that Server 2K12 test? One other way I can think of to validate this would be to actually back up my BD collection and watch the stats - most of my stuff is ripped with AnyDVD to a folder structure with the original menu maintained and extras removed, a pretty common method, I think? I wonder if CrashPlan reports somewhere how much data needs to be sent to the server? Obviously we know how much data is in the backup, but how about the actual amount of data that was (or needs to be) uploaded? I'll have to poke around and see...

@rcohen (or anyone else who backs up their discs to CrashPlan), have you ever tried to validate how much was actually de-duped?

post #83 of 92 Old 07-14-2014, 04:36 PM
ajhieb (AVS Special Member)
Quote:
Originally Posted by nvmarino View Post
Very interesting - thanks @ajhieb and @rcohen. Makes perfect sense to do de-dup on the client side now that I think about it.

So... when did you say you were getting around to that Server 2K12 test?
Eh... I got nothing better to do this evening. I'll throw 2012 R2 on something lying around, put a few MKV rips on it, and see what I find out.

post #84 of 92 Old 07-14-2014, 04:49 PM
rcohen (AVS Special Member)
Quote:
Originally Posted by nvmarino View Post
Very interesting - thanks @ajhieb and @rcohen. Makes perfect sense to do de-dup on the client side now that I think about it.

So... when did you say you were getting around to that Server 2K12 test? One other way I can think of to validate this would be to actually back up my BD collection and watch the stats - most of my stuff is ripped with AnyDVD to a folder structure with the original menu maintained and extras removed, a pretty common method, I think? I wonder if CrashPlan reports somewhere how much data needs to be sent to the server? Obviously we know how much data is in the backup, but how about the actual amount of data that was (or needs to be) uploaded? I'll have to poke around and see...

@rcohen (or anyone else who backs up their discs to CrashPlan), have you ever tried to validate how much was actually de-duped?
I don't know whether it dedups across users, but when I make a small modification to a large file (like changing tag data on a movie), CrashPlan will do a fast, incremental upload.

Usually, when I see the term Single Instance Storage, it refers to file-level dedup, but CrashPlan appears to have block-level dedup, which is nice.

It would be interesting to see what happens if two users upload the same file. Cross-user dedup would certainly have benefits for both the user and the cost of running the service.

Incidentally, I have found that dedup in ZFS is typically not great in practice, despite a lot of potential and fascinating technology.
post #85 of 92 Old 07-14-2014, 05:04 PM
ajhieb (AVS Special Member)
Quote:
Originally Posted by rcohen View Post
I don't know whether it dedups across users, but when I make a small modification to a large file (like changing tag data on a movie), CrashPlan will do a fast, incremental upload.

Usually, when I see the term Single Instance Storage, it refers to file-level dedup, but CrashPlan appears to have block-level dedup, which is nice.

It would be interesting to see what happens if two users upload the same file. Cross-user dedup would certainly have benefits for both the user and the cost of running the service.

Incidentally, I have found that dedup in ZFS is typically not great in practice, despite a lot of potential and fascinating technology.
Hmmm... that's interesting. Makes sense, though. Earlier I was focused on inter-user de-dup, which is why I didn't think block-level de-dup prior to upload was feasible, but it makes perfect sense to check what has already been uploaded at the block level for a given user. That also explains the CrashPlan memory usage.

It's all coming into focus now

post #86 of 92 Old 07-14-2014, 05:08 PM
rcohen (AVS Special Member)
Quote:
Originally Posted by ajhieb View Post
Hmmm... that's interesting. Makes sense, though. Earlier I was focused on inter-user de-dup, which is why I didn't think block-level de-dup prior to upload was feasible, but it makes perfect sense to check what has already been uploaded at the block level for a given user. That also explains the CrashPlan memory usage.

It's all coming into focus now
It would just need to upload the hashes to the server prior to uploading the data, so that the server could determine which blocks need to be uploaded, rather than the client.
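
A toy version of that exchange (illustrative only - this is not CrashPlan's actual wire protocol):
Code:
import hashlib, os

BLOCK = 128 * 1024       # assumed block size
server_store: dict = {}  # hash -> block; stands in for the cloud

def backup(data: bytes) -> int:
    """Upload only blocks whose hashes the server hasn't seen; return count."""
    uploaded = 0
    for i in range(0, len(data), BLOCK):
        block = data[i:i + BLOCK]
        h = hashlib.sha256(block).digest()  # client sends h; server checks it
        if h not in server_store:
            server_store[h] = block         # only now does the block upload
            uploaded += 1
    return uploaded

data = os.urandom(8 * BLOCK)                      # ~1 MB of sample data
print(backup(data), "blocks uploaded")            # -> 8 (first full backup)
print(backup(data + b"tail"), "blocks uploaded")  # -> 1 (just the new tail)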
post #87 of 92 Old 07-14-2014, 05:19 PM
ajhieb (AVS Special Member)
Quote:
Originally Posted by rcohen View Post
It would just need to upload the hashes to the server prior to uploading the data, so that the server could determine which blocks need to be uploaded, rather than the client.
Well, conceivably the hashes could be stored locally, so the server wouldn't need to do anything from the standpoint of single-user de-dup.

Cross-user de-dup would need the hashes uploaded for comparison, but I still don't think that would be feasible, considering the amount of data they would have to compare. I wouldn't be surprised to find out they actually have terabytes of hash data (if they're even doing cross-user de-dup).

post #88 of 92 Old 07-14-2014, 05:23 PM
nvmarino (Member)
Quote:
Originally Posted by rcohen View Post
It would just need to upload the hashes to the server prior to uploading the data, so that the server could determine which blocks need to be uploaded, rather than the client.
That's exactly how I was thinking it worked when you first mentioned it, too. Looks like we're all on the same page now...

I did know it was smart enough to only upload the changes at a block level for a single user - that was also one of my key criteria for selecting it. Even for my personal photos and videos (of which I have about 146GB), the metadata changes frequently, so it was absolutely critical that everything didn't get re-uploaded when a change happened at the file level.

So now the question is whether they do this across users. I thought I read somewhere that Amazon's cloud service did this, which is where my original logic came from. If CrashPlan doesn't work this way, anyone want to start a cloud backup service?
post #89 of 92 Old 07-14-2014, 05:43 PM
nvmarino (Member)
I am also just now realizing that, in CrashPlan's case, cross-user de-dup could ONLY work if the decision were made at the client before the data was uploaded, since every backup is encrypted with a unique key. The way I originally thought of de-dup was as a server-side space saver: everyone uploads all their data the first time, and the server keeps a master DB of all unique blocks and removes the duplicates.

Edit:

And it just dawned on me that even if it were done on the server side, how the hell do you restore if everyone's backup is encrypted individually? Based on that, it doesn't seem possible to do cross-user de-dup, even on the client side.

Sorry for all the thinking out loud...

post #90 of 92 Old 07-14-2014, 05:53 PM
rcohen (AVS Special Member)
Quote:
Originally Posted by nvmarino View Post
Edit:

And it just dawned on me that even if it were done on the server side, how the hell do you restore if everyone's backup is encrypted individually? Based on that, it doesn't seem possible to do cross-user de-dup, even on the client side.
That might be one of the killers for cross-user dedup. Unique encryption per block would create other problems.

The size of the hash table, its random distribution, and the performance requirements are also a tough problem.

Maybe they got backed into a corner on that and punted. I wonder if that happened before or after they got their funding. 8O

I think you may be right that per-user storage encryption would need to be cut in order to provide cross-user de-dup. The hash table issue seems solvable... just complicated. Maybe they could charge different prices for different levels of data security.
