
Recover from disk failure in Amahi

Brief overview: this post is about Amahi, a Linux home server, which I'm evaluating to figure out whether it suits my needs and whether I can migrate to it from Windows Home Server. Currently, the server is installed in VirtualBox.

The task: simulate an unrecoverable hard disk failure of one of the non-system disks, e.g. one of the disks that contain your data and participate in mirroring, managed by Greyhole.

Environment:

VirtualBox with Fedora 12 and Amahi on top.
3 hard disks attached to the virtual machine. One is the system disk; the second and third contain user data, managed by Greyhole.
MediaWiki installed – just to check what happens to applications.
Scenario:

Perform a hard reset of the system, simulating a power failure
Remove the third disk
Recover the system
Add replacement disk
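Since the whole setup runs in VirtualBox, the "failure" can also be staged from the host side instead of through the GUI. A minimal sketch with VBoxManage, assuming the VM is named "amahi" and the third disk hangs off port 2 of a SATA controller called "SATA" (all three names are my assumptions; check yours first):

```shell
# List the VM's storage layout to find the right controller and port.
VBoxManage showvminfo amahi | grep -i "SATA"

# Detach the disk: "--medium none" unplugs it without deleting the VDI file,
# so it can be re-attached later if the experiment goes wrong.
VBoxManage storageattach amahi --storagectl "SATA" --port 2 --device 0 --medium none
```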
So now the story. After I shut the system down and removed the third hard disk, I started the system and got a not-so-nice error during boot: the system couldn't find the device and could not continue to load. I was asked to fix the problems or reboot. Since rebooting didn't help, I understood that recovery would not be easy, or at least not automatic. I had actually hoped the system would recognize the missing disk, warn me, and continue. Remember, I'm not talking about the disk where the system is installed; it is just one of the data disks. Because I'm not a Linux pro, more like a newbie (the only Linux command I always remember is "dir", since it exists in MS-DOS as well), I had to dig around and find the steps to recover the system and let it continue. It turns out there is a special file which lists all devices to mount on startup, and all I need to do is edit it and remove the line with the missing drive. Here are the steps:

1. After the system starts, you will be notified about the missing device and the boot sequence will stop at a command prompt, asking you to fix the problems:

Type your root password to get to the console.

2. The root file system is most likely mounted as read-only, so we need to remount it, as we are going to change one of the system files. Do this by typing the following command:
mount -n -o remount /

3. Open the /etc/fstab file for editing with the following command:
nano -Bw /etc/fstab

In my case, the missing drive is shown in the last line.
The -B switch tells nano to create a backup copy when you save the file (just in case), and -w disables line wrapping, which matters in configuration files where each entry must stay on one line.

4. Find the line with your missing drive and remove it completely. Hit Ctrl+O to save your changes and Ctrl+X to exit the editor.
Note: if you get an error saying something like "Cannot write file, the file system is read-only", it means the remount command from step 2 didn't work; exit the editor and try it again. Don't miss the trailing slash in that command, as it is the root file system path.
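If you prefer a one-liner to nano, the same edit can be done with sed. This is just a sketch: "sdc1" is my assumption for what the missing drive's fstab entry contains, so match the pattern against whatever your fstab actually shows:

```shell
# Delete every fstab line mentioning the missing device (sdc1 is an assumption).
# -i.bak edits the file in place and keeps the original as /etc/fstab.bak.
sed -i.bak '/sdc1/d' /etc/fstab
```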

5. Hit Ctrl+D to exit the recovery shell and restart the system. It should boot properly now. At least we have a running system again, so let's continue fixing it and add a disk replacement.

6. Start the LVM (Logical Volume Management) administration tool, found under System->Administration. You should see your failed disk as "unknown device" in the tree:

This is not good and we need to repair it.

CAUTION! BE VERY CAUTIOUS IN THIS STEP! YOU MAY CAUSE LOSS OF DATA IF YOU REMOVE THE WRONG VOLUME! You've been warned.
First, we should remove the logical volume. In my case, the volume that points to the failed physical device is "lv_data1". In your case it may be something else; figure it out and delete it by selecting it in the tree and clicking "Remove Logical Volume".

7. Now we need to remove the physical drive. Start a console and su to root (i.e. type "su" in the command line, without quotes, then your root password). Type the following command, which will remove missing devices from the volume group:
vgreduce --removemissing vg_hda
Change "vg_hda" to the name of the volume group that contains the missing device.
Reload LVM (View->Reload) and you should not see any "unknown device" entries anymore. Our system is fully repaired:
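For reference, the GUI repair above has console equivalents. A hedged sketch, using my names (volume group "vg_hda", dead volume "lv_data1"; yours may differ, so double-check with pvs/lvs before deleting anything):

```shell
# Show physical and logical volumes -- the failed disk appears as "unknown device".
pvs
lvs vg_hda

# Remove the logical volume that lived on the failed disk.
lvremove /dev/vg_hda/lv_data1

# Drop the missing physical volume from the group (the same vgreduce as in step 7).
vgreduce --removemissing vg_hda
```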
8. The next step is to install a replacement disk. If you don't have one, just stop here, as there is nothing more to do at this point.
Shut down the system, insert your new disk and start again as usual. You can follow the guide posted on Amahi's wiki here, but it takes the command-line path, which I don't really like, so if you want to do everything through the UI, continue reading.

9. You will see your new drive in “Uninitialized Entities” group. Go ahead, select it and hit “Initialize Entity”:

The drive will be moved to Unallocated Volumes group:

Hit "Add to Existing Volume Group", select your group and add it. Now our group has grown by the new, unused space:
Select "Logical View" and hit "Create New Logical Volume". A dialog for adding a new volume will appear. Fill in the details and remember to mount your new volume somewhere under "/var/hda/files". In my case, I mounted it at "/var/hda/files/drives/sdc1":

Select the file system (ext4) and check both "Mount" and "Mount when rebooted", then click OK. This may be a lengthy operation, so be patient.
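For completeness, this is roughly what that dialog does under the hood. A sketch only, with assumed names (new disk /dev/sdc, group vg_hda, new volume lv_data2, mount point as in my setup):

```shell
# Turn the new disk into an LVM physical volume and grow the group with it.
pvcreate /dev/sdc
vgextend vg_hda /dev/sdc

# Create a logical volume over all the free space and format it as ext4.
lvcreate -n lv_data2 -l 100%FREE vg_hda
mkfs.ext4 /dev/vg_hda/lv_data2

# Mount it under /var/hda/files and make the mount survive reboots.
mkdir -p /var/hda/files/drives/sdc1
mount /dev/vg_hda/lv_data2 /var/hda/files/drives/sdc1
echo '/dev/vg_hda/lv_data2 /var/hda/files/drives/sdc1 ext4 defaults 1 2' >> /etc/fstab
```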

10. Go to your HDA by navigating to http://hda and add your new volume to the storage pool as usual. Check the pool configuration of each share, and we are done.

11. Optionally, you may want to force Greyhole to resynchronize all data and copy it wherever needed by executing the following command in a console:
greyhole --fsck

That's it, we are done! And please remember, I'm a total noob in Linux, so if you find any issues in what I wrote above, feel free to post about them in the comments.
