Recover from disk failure in Amahi

Brief overview: this post is about a Linux home server called Amahi, which I'm evaluating to figure out whether it suits my needs and whether I can migrate to it from Windows Home Server. Currently, the server is installed in VirtualBox.

The task: simulate an unrecoverable hard disk failure of one of the non-system disks, i.e. one of the disks that contain your data and participate in mirroring, managed by Greyhole.

Environment:

VirtualBox with Fedora 12 and Amahi on top.
3 hard disks attached to the virtual machine: one is the system disk; the second and third contain user data, managed by Greyhole.
MediaWiki installed, just to check what happens to applications.
Scenario:

Perform a hard reset of the system, simulating a power failure
Remove the third disk
Recover the system
Add replacement disk
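As an aside, pulling the disk can also be scripted instead of done through the VirtualBox GUI. A minimal sketch, assuming the VM is named "Amahi", the controller is called "SATA" and the third disk sits on port 2 (all three are assumptions; adjust to your setup):

```shell
# Detach the third virtual disk from the VM's SATA controller.
# "Amahi", "SATA" and port 2 are example names from my setup.
# The command is echoed here for safety; drop the leading 'echo' to actually
# detach the medium.
echo VBoxManage storageattach Amahi --storagectl SATA --port 2 --device 0 --medium none
```

Run it while the VM is powered off, then boot the machine to reproduce the failure.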
So now the story. After I shut the system down and removed the third hard disk, I started the system and got a not-so-nice error during boot: the system couldn't find the device and could not continue to load. I was asked to fix the problems or reboot. Since rebooting doesn't help, I understood that recovery would not be easy, or at least not automatic. I had actually hoped the system would recognize the missing disk, warn me, but continue. Remember, I'm not talking about the disk where the system is installed; it is just one of the data disks. Because I'm not a Linux pro, more like a newbie (the only Linux command I always remember is "dir", since it appears in MS-DOS as well), I had to dig around and find the steps to recover the system and let it continue. It turns out there is a special file which lists all the devices to mount on startup, and all I needed was to edit it and remove the line with the missing drive. Here are the steps:

1. After the system starts, you will be notified about the missing device and the boot sequence will stop at a command prompt, asking you to fix the problems:

Type your root password to get to the console.

2. The root file system is most likely mounted read-only, and we are going to change one of the system files, so we need to remount it read-write. Do this by typing the following command:
mount -n -o remount,rw /
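If you want to confirm the root filesystem really is read-only before (and read-write after) the remount, you can look at its mount options in /proc/mounts. A minimal sketch of that check, run here against a sample /proc/mounts-style file with a made-up device name:

```shell
# Print the mount options of the root filesystem from a /proc/mounts-style file.
# On the live system you would run:  awk '$2 == "/" {print $4}' /proc/mounts
root_mount_opts() {
  awk '$2 == "/" {print $4}' "$1"
}

# Sample data standing in for /proc/mounts (device name is hypothetical):
printf '%s\n' '/dev/mapper/vg_hda-lv_root / ext4 ro,relatime 0 0' > /tmp/mounts.sample
root_mount_opts /tmp/mounts.sample   # prints: ro,relatime
```

If the options still start with "ro" after the remount, the command didn't take effect.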

3. Open the "/etc/fstab" file for editing, with the following command:
nano -Bw /etc/fstab

In my case, the missing drive is shown in the last line.
The -Bw switch tells nano to create a backup copy when you save the file. Just in case.

4. Find the line with your missing drive and remove it completely. Hit Ctrl+O to save your changes and Ctrl+X to exit the editor.
Note: if you get an error saying something like "Cannot write file, the file system is read-only", it means the command from step 2 didn't work; exit the editor and try it again. Don't miss the trailing slash in that command, as it is the root file system path.
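For orientation, here is what a hypothetical /etc/fstab might look like at this point. The UUIDs are invented and the mount points are taken from my setup, but the entry for the missing disk will look something like the last line:

```
UUID=aaaa-1111  /                           ext4  defaults  1 1
UUID=bbbb-2222  swap                        swap  defaults  0 0
UUID=cccc-3333  /var/hda/files/drives/sdb1  ext4  defaults  1 2
UUID=dddd-4444  /var/hda/files/drives/sdc1  ext4  defaults  1 2
```

If the missing disk was the one mounted at /var/hda/files/drives/sdc1, the last line is the one to delete.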

5. Hit Ctrl+D to restart the system. It should boot properly now. At least we have a running system again, so let's continue fixing it and add a disk replacement.

6. Start LVM (Logical Volume Management); it can be found in System->Administration. You should see your failed disk as an "unknown device" in the tree:

This is not good, and we need to repair it.

7. CAUTION! BE VERY CAREFUL IN THIS STEP! YOU MAY CAUSE DATA LOSS IF YOU REMOVE THE WRONG VOLUME! You've been warned.
First, we should remove the logical volume. In my case, the volume pointing to the failed physical device is "lv_data1". In your case it may be something else; figure out which one it is and delete it by selecting it in the tree and clicking "Remove Logical Volume".

8. Now we need to remove the physical drive. Start a console and su to root (type "su" in the command line, without quotes, then your root password). Type the following command, which will remove missing devices from the volume group:
vgreduce --removemissing vg_hda
Change "vg_hda" to the name of your volume group, the one that contains the missing device.
Reload LVM (View->Reload) and you should not see any "unknown device" entries anymore. Our system is fully repaired:
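If you prefer to double-check from the console instead of the LVM GUI, the pvs command lists the physical volumes, and a stale disk shows up there as "unknown device". A small sketch of that check, run here against simulated pvs output (real output will differ):

```shell
# Report whether a pvs listing still contains a stale "unknown device" entry.
has_missing_pv() {
  if grep -q 'unknown device' "$1"; then
    echo "missing PV present"
  else
    echo "all PVs healthy"
  fi
}

# Simulated 'pvs' output after a successful vgreduce (names are examples):
printf '%s\n' '  PV         VG     Fmt  Attr PSize PFree' \
              '  /dev/sdb1  vg_hda lvm2 a-   8.00g    0' > /tmp/pvs.sample
has_missing_pv /tmp/pvs.sample   # prints: all PVs healthy
```

On the live system the equivalent would be something like `pvs | grep 'unknown device'`.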
9. The next step is to install a replacement disk. If you don't have one, just stop here, as there is nothing more to do at this point.
Shut down the system, insert your new disk and start again as usual. You can follow the guide posted on Amahi's wiki here, but it takes the command-line path, which I don't really like, so if you want to do everything through the UI, continue reading.

10. You will see your new drive in the "Uninitialized Entities" group. Go ahead, select it and hit "Initialize Entity":

The drive will be moved to the "Unallocated Volumes" group:

Hit "Add to Existing Volume Group", select your group and add the drive. Now our group has expanded with new unused space:
Select "Logical View" and hit "Create New Logical Volume". A dialog for adding the new volume will appear. Fill in the details and remember to mount your new volume somewhere under "/var/hda/files". In my case, I mount it at "/var/hda/files/drives/sdc1":

Select the file system (Ext4), check both "Mount" and "Mount when rebooted" and click OK. This may be a lengthy operation, so be patient.
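For reference, the "Mount when rebooted" checkbox effectively adds an fstab entry for the new volume. A sketch of what that line would look like; the logical volume name lv_data2 is an assumption, and the mount point is the one from my setup:

```shell
# Build the fstab line the LVM GUI would add for the new volume.
# /dev/vg_hda/lv_data2 and the mount point are hypothetical; adjust to yours.
device=/dev/vg_hda/lv_data2
mountpoint=/var/hda/files/drives/sdc1
printf '%s %s ext4 defaults 1 2\n' "$device" "$mountpoint"
```

Knowing what the entry looks like also helps if you ever need to remove it by hand, as we did in step 4.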

11. Go to your HDA by navigating to http://hda and add your new volume into the storage pool as usual. Check the pool configuration of each folder and we are done.

12. Optionally, you may want to force Greyhole to resynchronize all data and copy it wherever needed, by executing the following command in a console:
greyhole --fsck

That's it, we are done! And please remember, I'm a total noob in Linux, so if you find any issue in what I wrote above, feel free to post about it in the comments.
