Monday, December 27, 2010

A Data Recovery Story, Part II

Where we last left off, I had a dead hard drive on my hands. It was one of a three-drive set which together contained a treasure trove of irreplaceable photos, documents, code, and work. Electrical damage had fried a protection diode on the drive's logic board and destroyed a servo control chip.

The plan: obtain a replacement logic board for the failed hard drive.

The first place I looked was eBay. There are several people out there selling logic boards or hard drives for exactly this purpose. I had no luck, however. After a couple of false leads fell through (companies that claimed to have the drive, down to the model number and firmware revision), I was at my wits' end and began soliciting quotes from data recovery firms, bracing myself for the $1,000+ numbers that trickled in.

Along the way, I found a company which specializes in hard drive logic boards: PCBSolution. Over the course of a few emails, they offered a very interesting alternative -- utilizing a physically identical controller board, and performing a firmware transfer from the damaged board to retain any/all factory calibrations, etc. used to encode the data in the disabled drive!

For $49 plus shipping (< 5% of data recovery service quotes), it was a steal to give it a shot. I shipped my board off to Canada, and within two days of receipt I received notice that the firmware transfer was successful and a replacement board was coming back to my home.

In the meantime, I constructed a replacement Linux server using new large disk drives. With approximately 3 TB of disk space accessible, I would be ready to image the drives and subsequently rescue their data. The usual rule of thumb is that if you can get the disk image, then there's some way to make Linux play nice with it. :)

Usually, the way I do this is with the trusty dd utility. For example,

# dd if=/dev/sda of=/path/to/image/file.img

would make a faithful reproduction of the contents of the disk addressed at /dev/sda into an image file named file.img.
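Here is a minimal sketch of the same recipe that is safe to try anywhere. It images a scratch file instead of a real device; the file names and the 64k block size are my own assumptions for illustration, not part of the original procedure.

```shell
#!/bin/sh
# Scratch-file stand-in for /dev/sda, so nothing real gets touched.
workdir=$(mktemp -d)
src="$workdir/fake_disk.bin"   # hypothetical "disk"
img="$workdir/file.img"        # image to be produced

# Build a 1 MiB "disk" full of random bytes.
dd if=/dev/urandom of="$src" bs=1024 count=1024 2>/dev/null

# Image it. A block size larger than dd's 512-byte default
# usually speeds things up on real hardware.
dd if="$src" of="$img" bs=64k 2>/dev/null

# Confirm the image is byte-for-byte identical to the source.
if cmp -s "$src" "$img"; then
    echo "image matches source"
fi
```

On a real disk the same cmp check works against the device node, though it means reading the whole disk a second time.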

However, in my case this trusty recipe failed. After several hours, the read would abort with an I/O error, most likely because the disks I was imaging had been damaged by the same power fault that took out the host system, motherboard, and ancillary components. What's galling is that most of the disk array was empty space. dd was giving up the read on a 750 GB disk because it could not read a single 512-byte sector!
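To put that in perspective, here is a quick back-of-the-envelope calculation. It assumes the conventional 512-byte sector size and a decimal 750 GB capacity (the way drive vendors count); both are assumptions on my part.

```shell
#!/bin/sh
# How many sectors are on a 750 GB disk?
disk_bytes=$((750 * 1000 * 1000 * 1000))   # 750 GB, decimal gigabytes
sector_bytes=512                            # assumed sector size
sectors=$((disk_bytes / sector_bytes))
echo "$sectors sectors on the disk"
```

That works out to roughly 1.46 billion sectors, and a single unreadable one was enough to stop the whole copy.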

It turns out that a specialized tool exists for exactly this situation: ddrescue, from the GNU project. It is designed to recover as much data as possible from a (possibly failing) device, skipping over pesky regions and revisiting them later. Example:

# ddrescue /dev/sda /path/to/image/file.img rescue_log

This produces the same file.img from the failing /dev/sda, but keeps track of its progress in the human-readable rescue_log in the working directory, so an interrupted run can be resumed. After the first pass, optional direct access and retry attempts can be made by adding the -d and -r flags, respectively (-r takes the number of retry passes, e.g. -r 3).
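The multi-pass idea can be sketched as a small wrapper. The function name rescue_passes, the retry count of 3, and the use of -n on the first pass (which in current GNU ddrescue releases skips the slow scraping phase) are my own choices for illustration, not necessarily the exact invocations used here.

```shell
#!/bin/sh
# Hypothetical helper: image SRC into IMG in two passes,
# recording progress in MAP so either pass can be resumed.
rescue_passes() {
    src=$1
    img=$2
    map=$3

    # Pass 1: grab everything that reads cleanly and move on
    # quickly from trouble spots (-n).
    ddrescue -n "$src" "$img" "$map" || return 1

    # Pass 2: go back with direct disc access (-d) and retry
    # the remaining bad areas up to 3 times (-r3).
    ddrescue -d -r3 "$src" "$img" "$map"
}

# Usage (as root, against the failing disk):
#   rescue_passes /dev/sda /path/to/image/file.img rescue_log
```

Because both passes share the same log file, the second pass only spends time on the regions the first pass could not read.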

After my primary imaging passes on the two available disks, I ran SpinRite on them to work out the problem sectors. (I had the time, and it took the better part of a week!)

Ultimately, by using ddrescue in combination with SpinRite I was able to recover ~1.5 TB of raw disk image from the first two disks of the damaged LVM set. In total, less than 100 kB was lost to unrecoverable bad sectors and the like resulting from physical damage to the devices.

When I next revisit this topic, the story will pick up with the arrival of the replacement logic board in the mail. :)
