Friday, December 31, 2010

A Data Recovery Story, Part III

Backstory: Part I, Part II

At this point, a replacement logic board, programmed with the firmware and factory calibrations rescued from my damaged drive's original board, was on its way back to my home.

Meanwhile, I built a new Linux box with enough storage capacity, used ddrescue to image the two other damaged drives in the LVM volume group, and awaited delivery.

When the time came, I carefully mounted the replacement logic board onto the failed disk drive, a simple job with a standard Torx screwdriver set. I hooked up the drive to the recovery server, crossed my fingers, and powered on.

The next few seconds would determine whether the gamble worked, or whether I'd have to resort to an expensive data recovery service.

To my great relief, the drive spun up. It registered with the BIOS and was accessible to the recovery system!

Recovery imaging with ddrescue began immediately. Once it finished, I had (in principle) full access to the contents of all three physical volumes as binary image files.

Emergency imaging of the 400 GB drive (left), fitted with the replica logic board that PCBSolution programmed with my original board's firmware, to a fresh SATA drive.

After the imaging was complete, I shut down the server and set the repaired drive aside while the rest of the recovery took place. As it turned out, that was the last time the drive ever needed to be powered on.

Normally, given a dd disk image of a partition containing a file system, one can simply mount the image using a loopback device and recover its contents.
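For that simple case, the loopback mount is a one-liner. The helper below is a minimal sketch, assuming root privileges and an image of a single partition (not a whole disk with a partition table); the function name and example paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: mount a single-partition dd image read-only.
# Assumes root and that IMG holds exactly one file system.
mount_image_ro() {
    local img=$1 mnt=$2
    mkdir -p "$mnt"
    # "loop" makes mount attach the file to a free loop device first;
    # "ro" keeps the recovered image from being modified.
    mount -o loop,ro "$img" "$mnt"
}

# Example: mount_image_ro /path/to/partition.img /mnt/recovered
```

Mounting read-only is a deliberate choice: the image is the only good copy of the data, so nothing should write to it until the files have been copied elsewhere.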

Recovery of a partition is simple when the file system (blue) resides on a single partition of a single drive (orange). A dd image of the partition (green) may be saved and directly mounted for file system access.

This was not possible in my scenario, because the file system of interest resided within an LVM logical volume, which in turn belonged to a volume group made up of dedicated LVM partitions spanning three physical volumes.

Recovery scenario. The file system of interest lies within an LVM logical volume (grey) spanning three physical volumes. ddrescue images of the drives containing the physical volume partitions exist.

Given this topology, my data recovery approach needed a bit more work. I eventually found helpful hints on the Fedora forums, which I summarize here:

1. Create separate block-level loopback devices mapped to the ddrescue full disk image files:

#losetup -f /path/to/disk1.img
#losetup -f /path/to/disk2.img
#losetup -f /path/to/disk3.img

In my case, this created loopback devices loop0, loop1, and loop2. (You can find out the mappings via losetup -a.)

2. Scan the partition tables from the three loopback-mapped devices and create partition-level device mappings:

#kpartx -a /dev/loop0
#kpartx -a /dev/loop1
#kpartx -a /dev/loop2

This adds partition-level device nodes under /dev/mapper (for example, /dev/mapper/loop0p1).

3. Instruct LVM to re-scan for physical volumes:

#pvscan

This will detect the presence of the LVM volume group contained within the disk images.

4. Activate the volume group contained within the new physical volumes:

#vgchange -a y

After following these steps, the logical volumes were fully accessible at their usual location under /dev/mapper. After mounting the file system read-only, I was ready to recover with rsync!
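The steps above, plus the final read-only mount and rsync copy, can be sketched as a few shell functions. This is a minimal sketch, assuming root privileges and the util-linux losetup, kpartx, and LVM2 tools; the function names and the example volume and directory names are hypothetical. One small difference from the recipe as written: it uses `losetup --show` to print the loop device it picked, instead of looking the mapping up afterwards with `losetup -a`.

```shell
#!/bin/sh
# Sketch of the LVM-on-disk-images recovery steps; names are hypothetical.

# Steps 1-2: attach each whole-disk image to the next free loop device,
# then create /dev/mapper entries for the partitions inside it.
attach_images() {
    local img dev
    for img in "$@"; do
        # -f picks the first unused loop device; --show prints its name
        dev=$(losetup -f --show "$img") || return 1
        echo "attached $img -> $dev"
        kpartx -a "$dev"
    done
}

# Steps 3-4: rescan for LVM physical volumes, then activate volume
# groups (all of them, or only those named as arguments).
activate_lvm() {
    pvscan
    vgchange -a y "$@"
}

# Final step: mount a logical volume read-only and copy its contents out.
copy_out() {
    local lv=$1 mnt=$2 dest=$3
    mkdir -p "$mnt" "$dest"
    mount -o ro "$lv" "$mnt" || return 1
    # Trailing slash on the source copies the contents, not the directory
    rsync -a "$mnt/" "$dest/"
}

# Example (hypothetical volume group "vg", logical volume "home"):
#   attach_images disk1.img disk2.img disk3.img
#   activate_lvm
#   copy_out /dev/mapper/vg-home /mnt/lv /srv/rescued
```

Keeping each stage in its own function makes it easy to stop and inspect the state (for example with `losetup -a` or `lvs`) before moving on to the next step.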

Several hours later, the data was entirely rescued. A great weight lifted from my shoulders -- and it was time to come back out of the doghouse! :)

I consider myself very lucky, and I regret having landed in this scenario at all. So, if there is a moral to this story, it's to keep good, automated backups of your critical data, preferably with an offsite copy too: the mystery power event that destroyed my server (which was on a UPS!) might just as easily have damaged a local backup device on the same subcircuit.

I'll be describing my backup system sometime in the future, but for now, it's time to sign off.

Tuesday, December 28, 2010

AirPlay Fun

Those who know me know that I am not an Apple fan by nature. (I have my grudges against OS X due to my preference for Linux.)

However, since getting an iPhone in May for my graduation birthday, I've been enjoying some of the features that it offers.

In particular, for Christmas this year, I received an AirPort Express. It lets me stream music from my home computer or iPhone directly to my home stereo system. (Thank you, Remote App!)

Now that it's working, I'm very happy. It's a great implementation, and my hat's off to Apple for doing it.

I did have some problems with my wireless connectivity, though, which I thought were worth mentioning to those who may be searching for solutions.

In particular, while configuring Remote and attempting to pair it to my home iTunes library, there comes a step where the remote shows up as a 'Device' in the iTunes application. Clicking on it brings you to a validation screen where one enters a four-digit passcode shown on the iPhone running Remote. My machines were all hanging at 'Verifying remote passcode...' and failing to properly pair.

The problem was not machine-specific; rather, it was due to my home wireless router. That router does not manage my home Internet connection, but provides wireless access to my LAN. I reconfigured it to act solely as an access point, which immediately allowed Remote to pair with iTunes on my home computers.

While that solution may not work for everyone, I'm pretty glad that I found something that worked!

Monday, December 27, 2010

A Data Recovery Story, Part II

Where we last left off, I had a dead hard drive on my hands. It was part of a three-drive set that together held a treasure trove of irreplaceable photos, documents, code, and work. Physical electrical damage had fried a protection diode on the drive's logic board and destroyed a servo control chip.

The plan: obtain a replacement logic board for the failed hard drive.

The first place I looked was eBay. There are several people out there selling logic boards or hard drives for exactly this purpose. I had no luck, however. After a couple of leads from companies claiming to have the drive (down to the model number and firmware revision) fell through, I was at my wits' end and began soliciting quotes from data recovery firms, bracing myself for the $1,000+ numbers that trickled in.

Along the way, I found a company that specializes in hard drive logic boards: PCBSolution. Over the course of a few emails, they offered a very interesting alternative -- using a physically identical controller board and transferring the firmware from my damaged board, so that all of the factory calibrations used to encode the data on the disabled drive would be retained!

For $49 plus shipping (< 5% of the data recovery service quotes), it was a steal to give it a shot. I shipped my board off to Canada, and within two days of its arrival I was notified that the firmware transfer had succeeded and a replacement board was on its way back to my home.

In the meantime, I constructed a replacement Linux server using new large disk drives. With approximately 3 TB of disk space accessible, I would be ready to image the drives and subsequently rescue their data. The usual rule of thumb is that if you can get the disk image, then there's some way to make Linux play nice with it. :)

Usually, the way I do this is with the trusty dd utility. For example,

#dd if=/dev/sda of=/path/to/image/file.img

would make a faithful reproduction of the contents of the disk addressed at /dev/sda into an image file named file.img.

However, in my case this trusty recipe failed. After several hours, the read would abort with an I/O error, most likely because the disks I was imaging had been damaged by the same power fault that took out the host system, motherboard, and ancillary components. What's galling is that most of the space on these disks was empty. dd was giving up on a 750 GB disk because it could not read a single 512-byte sector!

It turns out that a specialized tool exists for exactly this situation: ddrescue, from the FSF. It is designed to recover as much data as possible from a (possibly failing) device and revisit pesky regions later. Example:

#ddrescue /dev/sda /path/to/image/file.img rescue_log

This creates the same file.img from the failing /dev/sda, but records its progress in the human-readable log file rescue_log (here, in the working directory). After the first pass, further passes with direct disc access and retries can be made by adding the -d and -r options, respectively (-r takes a retry count, e.g. -r3).
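A two-pass run along those lines might look like the sketch below. It assumes GNU ddrescue; the function name and device path are hypothetical, and option details have shifted between ddrescue releases, so check `ddrescue --help` for your version before relying on it.

```shell
#!/bin/sh
# Sketch of a two-pass GNU ddrescue run; function name is hypothetical.
rescue_disk() {
    local src=$1 img=$2 map=$3
    # Pass 1: copy everything that reads easily; -n skips the slow
    # scraping phase so the healthy areas are saved first.
    ddrescue -n "$src" "$img" "$map"
    # Pass 2: same image and log file, so only the missing areas are
    # revisited. -d uses direct disc access (bypassing the kernel cache);
    # -r3 asks for three retry passes over the remaining bad areas.
    ddrescue -d -r3 "$src" "$img" "$map"
}

# Example: rescue_disk /dev/sda /path/to/image/file.img rescue_log
```

Because both passes share the same log file, the second run skips everything already recovered and spends its time only on the problem regions.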

After my primary imaging passes on the two available disks, I ran SpinRite on them to work out the problem sectors. (I had the time, and it took the better part of a week!)

Ultimately, by using ddrescue in combination with SpinRite, I was able to recover ~1.5 TB of raw disk image from the first two disks of the damaged LVM set. Less than 100 kB was lost to unrecoverable bad sectors caused by physical damage to the devices.
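One way to tally how much data was lost is to read the ddrescue log itself. The helper below is a hypothetical sketch, assuming the GNU ddrescue log format: comment lines start with `#`, the first data line holds the current position and status, and every later line is `pos size status` in hex, where `+` marks finished blocks. It counts every block not marked `+` as unrecovered. GNU ddrescue also ships a companion `ddrescuelog` tool for inspecting these files.

```shell
#!/bin/sh
# Hypothetical helper: total unrecovered bytes in a GNU ddrescue log.
# Assumes lines of "pos size status" with hex pos/size after one
# current-position/status line; '+' means the block was rescued.
unrecovered_bytes() {
    local total=0 seen_status=0 pos size status rest
    while read -r pos size status rest; do
        case $pos in
            ''|'#'*) continue ;;   # skip blank and comment lines
        esac
        if [ "$seen_status" -eq 0 ]; then
            # First data line is the current position/status line
            seen_status=1
            continue
        fi
        if [ "$status" != "+" ]; then
            # $size expands to a hex literal such as 0x00000200
            total=$((total + $size))
        fi
    done < "$1"
    echo "$total"
}

# Example: unrecovered_bytes rescue_log
```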

When I next revisit this topic, the story will pick up with the arrival of the replacement logic board in the mail. :)