data recovery for RAID - AVS Forum
07-28-2012, 05:42 AM - Thread Starter
hehe299792458 (Senior Member, joined Dec 2006, 459 posts)
In the middle of a rebuild, one of the drives in my RAID 6 array failed and took the entire array down with it. The drive is reporting multiple SMART errors. Please see pictures. Any ideas on how to remedy it? The drive is a Western Digital RE4-GP 2TB 2002FYPS drive with firmware 04.05G05.


07-31-2012, 03:48 AM - Thread Starter
hehe299792458 (Senior Member, joined Dec 2006, 459 posts)
A few more details.... Actually, I still don't have all the information, but I have a much better picture.

Here's my setup:

OS: Windows Server 2008 R2

HDD: Western Digital 2002FYPS

Raid Controller: 3ware 9650SE 24ML

Array: RAID 6 (14 HDDs)

Here are the error logs from my RAID controller.
https://dl.dropbox.com/u/10737837/lsi.Win2k8R2.HOMESERVER.072412.10704.zip
https://dl.dropbox.com/u/10737837/errorlog_0.dat


A few nights ago, the array reported that there was a problem with two of the drives (drives 0 and 5). I'm not sure what the exact error was. I was in the middle of a relatively large transfer (~50GB). All of a sudden, my system froze for about 2 hours, and after that, I managed to do a normal restart. The controller does this sometimes - kicks out two drives randomly and then proceeds to rebuild them.

When the system started, I checked 3DM2 (the raid controller software). It said that the array was degraded and proceeded to automatically rebuild the array. Everything was fine until the rebuild process hit 14%. Then I received several errors concerning drive 3 (which is strange, because the two drives that were dropped are 0 and 5). While the rebuilding process is still listed as "active" under 3DM2, it hasn't progressed at all overnight.
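For context, here is a rough sketch of why errors on a third drive are so serious at this point. RAID 6 stores two parity blocks per stripe, so a set survives at most two missing members. The function names are made up for illustration, and treating a drive with read errors as fully unavailable is a pessimistic simplification; none of this is read from 3DM2 or the controller.

```python
def raid6_status(total_drives: int, unavailable: set) -> str:
    """Classify a RAID 6 set by how many members are missing or erroring.

    Assumption: any drive listed in `unavailable` counts as completely lost.
    """
    if total_drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    missing = len(unavailable)
    if missing == 0:
        return "optimal"
    if missing <= 2:
        return "degraded"  # still readable, no margin left at 2
    return "failed"        # more losses than two parities can cover


def raid6_usable_tb(total_drives: int, drive_tb: float) -> float:
    """Usable capacity: two drives' worth of space goes to parity."""
    return (total_drives - 2) * drive_tb


print(raid6_status(14, {0, 5}))     # drives 0 and 5 dropped
print(raid6_status(14, {0, 5, 3}))  # drive 3 errors during the rebuild
print(raid6_usable_tb(14, 2))       # 14 x 2TB drives
```

In practice a drive with a few unreadable sectors only breaks the stripes that touch those sectors, so the real picture may be less bleak than the "failed" label suggests.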

I took drive 3 out of the array and ran the WD diagnostics software on it:
[Screenshot: WD diagnostics results, original.png]



I also tried reading the SMART data via HD Tune:
[Screenshot: SMART data from HD Tune, original-1.png]
07-31-2012, 05:57 AM
arnyk (AVS Addicted Member, Grosse Pointe Woods, MI, joined Oct 2002, 14,387 posts)
IME the general answer is that you take the array down, remove the existing drives, add the replacement hardware (2 new drives), bring the array back up, and mark the replacement drives as hot spares. Then take the array down again, add the complete old array back in, reboot, and let the controller do its stuff.
07-31-2012, 07:54 PM
leewcraft (Senior Member, Nashville, TN, joined Oct 2002, 446 posts)
If your controller is flagging random drives as failed, I think you probably have a bad RAID controller (or backplane if they attach to a backplane board rather than being cabled directly to the controller). I think you should try either replacing that part or maybe seeing if there's a firmware upgrade available before doing any RAID array reconfiguration. Your array may be fine if the controller is the issue, but you can definitely mess up the array and lose your data if you do the wrong things with adding/removing drives and reconfiguring the array.
08-01-2012, 08:55 AM
BizarroTerl (AVS Special Member, San Jose, CA USA, joined May 2001, 1,825 posts)
This is the main reason I went with Unraid. No matter how bad it gets, you can only lose the data on the drives that are bad.

Leewcraft's advice should be followed. You're in a precarious situation and one wrong step can have severe consequences.
08-01-2012, 11:09 AM
csmith (Senior Member, Charlotte, NC, joined Apr 2001, 292 posts)
As Bill Clinton used to say, "I feel your pain."

Note leewcraft's post. If your third drive is still available when you boot the system again, you might be able to finish the rebuild before it fails completely. I've never had two drives fail at the same time on my SATA RAID 6 system (Areca card), so I don't know whether the rebuild takes longer when you replace both rather than just one. The problem with these large HDDs is that a rebuild takes much longer once you have 10TB+ in one RAID set, which leaves you vulnerable to another failure and to losing the RAID 6 if a third drive fails.

In hindsight, the old SCSI RAID 5 systems with 250GB drives seem like a blessing; they also had a lower hard drive failure rate. I won't be going above 12TB in a RAID 6 now that I've found rebuilding takes 2 days.
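As a back-of-envelope illustration of the rebuild-time concern, the sketch below works out the average throughput implied by "2 days" and scales it to a larger set. It assumes the 2-day figure refers to a set of about 12TB, uses decimal units (1TB = 1,000,000MB), and treats the rebuild as a constant-rate pass over the whole capacity. Real controllers rebuild per member and throttle under load, so the function names and numbers here are illustrative only.

```python
def implied_rate_mb_s(capacity_tb: float, hours: float) -> float:
    """Average rate (decimal MB/s) needed to process capacity_tb in `hours`."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return capacity_tb * 1_000_000 / (hours * 3600)


def rebuild_hours(capacity_tb: float, rate_mb_s: float) -> float:
    """Hours to process capacity_tb at a constant rate_mb_s."""
    if rate_mb_s <= 0:
        raise ValueError("rate_mb_s must be positive")
    return capacity_tb * 1_000_000 / rate_mb_s / 3600


# Assumed reading of the post: roughly 12TB rebuilt in 48 hours.
rate = implied_rate_mb_s(12, 48)
print(f"implied rate: {rate:.1f} MB/s")

# Same rate applied to 28TB raw (14 x 2TB, the thread starter's set).
print(f"28TB at that rate: {rebuild_hours(28, rate):.0f} hours")
```

The point is only that time grows linearly with capacity at a fixed rate, so every added terabyte widens the window in which another drive can fail.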