Monday, November 22, 2010

A Data Recovery Story, Part I

or: Why I'm Still Married. :)

As my last post alluded to, my home server died in mid-October as the result of a power supply failure. This turned out to be a much bigger recovery project than I had anticipated. In my prior experience, a failed power supply unit might take out the motherboard and peripheral cards, but never hard drives. Maybe I had just been lucky, but this time my luck ran out.

I built a new system with a new power supply, motherboard, processor, and RAM. (In fact, I had to pilfer RAM from my main computer, as these modern motherboards no longer accept PC-3200...) I connected everything up and powered on. It turned out that two of the five drives in the machine failed to show up in the BIOS.

And one of those two drives was an LVM physical volume that was a critical component of our file server's entire storage area. Without all three physical volumes present and accounted for, its logical storage volume could not be activated, and Linux subsequently failed to boot. I knew I was in big trouble.

Inspecting the unrecognized drive made it immediately apparent why it wasn't working: the drive's logic board had suffered damage from the PSU failure.

Damaged hard drive. Note blown diode (lower left) and motor driver controller (upper right).
A voltage-limiting protection diode had served its purpose and fried itself. Unfortunately, it didn't do so fast enough to prevent the motor driver chip from frying as well. The drive was completely out of commission.

Dread began to set in. This storage volume contained photos and documents which were not backed up anywhere else. Lots of personal documents, archival stuff from college, etc. Should it have been backed up? Yes. The problem is that my hobbyist server had slowly transitioned into a mission-critical system with a large networked hard drive... and now it was holding captive data that, if lost, would be a tremendous personal loss to my wife and me.

Not only was the situation now out of my league, it was going to be expensive to attempt recovery. Data recovery firms I polled started around $1,000 to evaluate the drive, with no guarantee of success.

I knew that there was one last trick in the data recovery bag that was likely to work: a logic-board swap. The trick is to find a drive of identical make / model / firmware revision to the damaged one and use its controller board on the damaged drive. The difficulty is finding that exact match.

In my case, the drive was several years old and manufactured by Seagate. As I learned, Seagate has a wide variety of firmware versions in use, which makes it extremely difficult to purchase an exact match. To some extent, drives are also individually calibrated at the factory. (This is becoming increasingly common.)

Things didn't look good, but I had my task. And in the end, there was success. I'll describe how that worked out in a future post -- this one's clearly getting too long as it is!