Almost a year ago I had a MediaWiki appliance on a server which got relocated to my company's corporate headquarters. The host died. Those who had physical access to the box concluded that the RAID1 array had completely died, or maybe the RAID controller card, or who knows what. It took a while, but I finally got the server, and after a few months of being busy with other stuff, I got one disk mounted and found the VMDK file. No clue what happened to the rest of the VM. I grabbed a VMX from a TurnKey Redmine appliance, changed the references in the VMX, added the VM to inventory, removed the VMDK and re-added it. And now my VM is resurrected.
So, I load it up and it stops at the error "/bin/sh: can't access tty: job control turned off" and dumps me off at the terminal prompt. The top of the console window says:
Target filesystem doesn't have requested /sbin/init.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... mount: mounting /dev on /root/dev failed: No such file or directory
done.
No init found. Try passing init= bootarg.
Then several lines starting with [ 52.######] and then BusyBox v1.22.1 (Debian 1:1.22.0-9+eb8u1) built-in shell (ash)
So my question: what are my options now? The webui doesn't load up. Maybe there's a way I could back up the database and copy it over the LAN to extract on a newly deployed appliance? Maybe I need to deploy an older version and copy my VMDK in? I think I'm close, just don't know enough about the filesystem to know how to restore the data.
Sounds like data corruption
First up, the easiest way to recover this would be to restore from backup... So if you'd been using TKLBAM (or some other automated remote backup tool), you could start a new server of the same version and simply restore your most recent backup. Bam, done! If you have that set up, then skip this post and just do that! :)
If you don't have a backup, read on, but please make sure you do set up automated backups once it's up and running. Remember, data loss due to hardware (or other) failure is a matter of when, not if! Data that isn't backed up should be considered living on borrowed time!
If you don't have a backup to restore from, what I would do first is take a copy of the VMDK and work with the copy from here on. Then it won't matter what you do as worst case, you can trash it and start again with a new copy (and you still have the untouched original).
Then after that, there are a ton of factors to consider and the process is a little hard to describe exactly from memory. Especially without being able to look over your shoulder. But I'll do my best... :)
Next step would be to boot the machine with a live Linux ISO of some description. Any relatively recent distro would do the job, even a TurnKey ISO would do. Although if you need to dig deeper, then perhaps something built for purpose may be better? E.g. SystemRescueCD (download page is here). Keep in mind though that if not using TurnKey, then you likely won't be root, so many/most of these commands may need to be prefixed with sudo.
So boot from the ISO and run a live session. If it asks if you want to mount drives/partitions (TurnKey won't but others might), select no. Then you'll need to find what the drive is. The disk should be named something like /dev/sdX; where X is a letter (i.e. a-z; but most likely a, b or c). And partitions are numbered 1-n (e.g. 1, 2, 3, etc). You can look at all available drives using this:
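Something along these lines should do it. This is a sketch from memory: `fdisk -l` is the real command, while `show_all_disks` is just a throwaway helper name I've made up so the step can be pasted and reused.

```shell
# List the partition table of every disk the live system can see.
# Needs root; on a non-TurnKey live ISO prefix fdisk with sudo.
# (show_all_disks is a made-up helper name, not part of any tool.)
show_all_disks() {
    fdisk -l
}
```

Paste the function into the live shell and run `show_all_disks`; look for the disk whose size matches your VMDK.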
If that shows too much info, you can look at a specific disk by appending its name at the end. E.g.:
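A minimal sketch, assuming the disk turns out to be /dev/sda (it may well be /dev/sdb or similar on your system). `show_disk` is a made-up helper name; the actual work is done by `fdisk -l <disk>`.

```shell
# Show the partition table of one disk only.
# The default of /dev/sda is an assumption; pass your own disk as $1.
show_disk() {
    disk="${1:-/dev/sda}"
    fdisk -l "$disk"
}
```

For example, `show_disk /dev/sdb` limits the output to the second disk.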
IIRC our VM builds are generally installed with LVM. Older ones have a separate boot partition (of type 'extN' - where 'N' is a number between 2 & 4), whereas newer ones have the boot directory inside the LVM. So assuming you have an old one, you'll want to check the filesystem of both the /boot partition (/dev/sdX1 e.g. /dev/sda1) and the LVM (where the root filesystem is). If you have a newer one, then you can skip the first bit regarding running fsck on "/dev/sdX1" (where X is a letter) and go straight to the LVM related commands further down.
Note that the system will not let you run fsck on a mounted volume, so if the live ISO automounts the volumes (and fsck complains it's mounted) you will need to unmount them (with the 'umount /dev/sdXn' command). If it complains that it's busy, then it's likely the wrong disk.
Then run fsck. E.g. to run fsck on /dev/sda1
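Roughly like this. It's a sketch only: `check_fs` is a hypothetical helper name, /dev/sda1 is just the example partition, and the /proc/mounts lookup is my own addition so that fsck never gets pointed at a mounted volume.

```shell
# Run a plain fsck on one partition, refusing if it is currently mounted.
# /dev/sda1 is only an example default; pass the real partition as $1.
check_fs() {
    part="${1:-/dev/sda1}"
    # fsck on a mounted filesystem can make damage worse, so bail out early.
    if grep -q "^$part " /proc/mounts 2>/dev/null; then
        echo "$part is mounted; run 'umount $part' first" >&2
        return 1
    fi
    fsck "$part"
}
```

Run it as `check_fs /dev/sda1` (with sudo if you aren't root).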
If that reports errors (or that the disk is clean) then re-run with these options (to force a check and to fix any errors found). Remember that /dev/sda1 is just an example, it may be /dev/sdb1, etc.:
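As a sketch (again, `force_fix_fs` is a name I've made up and /dev/sda1 is only the example partition), the forced run looks something like this:

```shell
# Force a full check and repair of one (unmounted!) partition.
#   -f  check even if the filesystem is marked clean
#   -y  answer "yes" to every repair prompt
# /dev/sda1 is only an example default; pass the real partition as $1.
force_fix_fs() {
    part="${1:-/dev/sda1}"
    fsck -f -y "$part"
}
```

Note that `-y` lets fsck make every repair it wants without asking, which is one more reason to work on a copy of the VMDK.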
Then assuming that completes successfully, and my memory serves me correctly regarding the root filesystem being on LVM, you'll need to do some more work to get that ready to check. The following commands should allow you to find the right one (the bits after the # are just comments, you don't need to include them, but can safely copy/paste them if it's easier).
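From memory, the sequence is roughly the following. Treat it as a sketch: I'm assuming the volume group is called "turnkey" and the root logical volume "root" (use whatever the scan commands actually report), and `check_lvm_root` is just a made-up wrapper name.

```shell
# Find, activate and check the LVM root volume.
# Assumptions: volume group "turnkey", logical volume "root".
# Override them as: check_lvm_root <vg> <lv>
check_lvm_root() {
    vg="${1:-turnkey}"
    lv="${2:-root}"
    lvm pvscan                          # find the physical volume (likely /dev/sdX2)
    lvm vgscan                          # find the volume group that lives on it
    lvm vgchange -ay "$vg" || return 1  # activate it so the volumes show up in /dev
    lvm lvscan                          # list the logical volumes (root, swap, ...)
    fsck -f -y "/dev/$vg/$lv"           # force a check of the root filesystem, fixing errors
}
```

If `lvm lvscan` shows different names, pass them in, e.g. `check_lvm_root myvg mylv`.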
If you get stuck or find conflicting info, please feel free to post back for clarification.
Once you've done that, you'll likely need to reinstall grub (the bootloader). That may be a little trickier to set up and do, but should be something along the lines of this:
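This is a sketch under a few assumptions: the disk is /dev/sda, /boot is its first partition (older builds only), the root volume is /dev/turnkey/root, and /mnt2 is used as the mount point. `reinstall_grub_chroot` is a made-up helper name. The boot, dev, proc and sys directories should already exist inside a healthy root filesystem, so only /mnt2 itself gets created.

```shell
# Mount the broken system under /mnt2, chroot in and reinstall grub.
# Assumed defaults: disk /dev/sda, root LV /dev/turnkey/root, mount point /mnt2.
reinstall_grub_chroot() {
    disk="${1:-/dev/sda}"
    root_lv="${2:-/dev/turnkey/root}"
    mnt="${3:-/mnt2}"
    mkdir -p "$mnt" || return 1
    mount "$root_lv" "$mnt" || return 1        # root filesystem
    mount "${disk}1" "$mnt/boot" || return 1   # separate /boot (skip on newer builds)
    for fs in dev proc sys; do                 # grub needs these inside the chroot
        mount --bind "/$fs" "$mnt/$fs" || return 1
    done
    # Use the grub that is installed inside the TurnKey system itself.
    chroot "$mnt" /bin/bash -c "grub-install $disk && update-grub"
}
```

If any mount fails the function stops there, which is usually a hint that the device name or the root filesystem isn't what we assumed.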
Assuming that all goes to plan (fingers crossed), your system will now boot... If not, then you'll need to do your own research, but it's likely corrupted beyond repair. You may be able to recover stuff if it's not too damaged, but that's well beyond the scope of what I can provide here...
Hopefully the above does the trick. FWIW, you should find plenty of info about these commands online (they're pretty generic Linux commands). If you keep in mind that TurnKey is based on Debian (which Ubuntu is also based on) then you should find plenty of relevant info.
And last, but certainly not least: set up automated daily backups!!!. Ideally they should be stored offsite. TKLBAM is an ideal tool for the job, but there are plenty of alternate options...
Well, plug noted, and quite
Well, plug noted, and quite apropos. There were no backups of this VM, which I would've turned to if I had the option. I use ghettoVCB on my systems to back everything up nightly, locally to a NAS running RAID1, and one drive gets pulled, replaced and taken off-site somewhat regularly. Unfortunately, this particular host was taken out of my hands, and it didn't get returned until it was broke, sans any backups, and so I could throw some choice words around at the bar later if you'd like a rant. I did make a backup of the VMDK in question once I figured out how to mount the controller card and try each drive until I found one that worked. (I have many of the same mantras, too... Redundancy is not a backup, snapshots are not backups, application redundancy is not a replacement for virtual layer redundancy, etc etc.) So yep, same page thus far. I have considered TKLBAM, but haven't gone through more than just reading the process, and also I'm a cheapskate who loathes subscription fees and who has the extra hardware already in place, so...
I used the latest Ubuntu live boot from a USB in my attempts to get at the data on the disk (Ubuntu doesn't appear to mount VMFS tho). Ubuntu being another Debian distro just felt like a good choice here anyway. But then I couldn't find the ISO, guessing maybe I deleted it or maybe it's on the other laptop, and redownloading it is taking forever. So, SystemRescue it is. Uploaded ISO to datastore, attached to the VM, and booted it up. So far so good.
fdisk -l tells me that /dev/sda is a 20GiB disk, which matches the VMDK. /dev/sda1 is marked bootable, 428M, Linux. /dev/sda2 is 18.2G, Linux LVM. fsck says partition 1 is clean, and for good measure I ran it with -fy and everything checked with no reports of errors or wrongdoing. Already my VM is overqualified for politics. 14.2% non-contiguous (possible disk fragmentation?).
lvm pvscan: "PV /dev/sda2 VG turnkey lvm2 [18.14 GiB 660.00 MiB free]"
lvm vgscan: "Found volume group "turnkey" using metadata type lvm2"
lvm lvscan: "ACTIVE '/dev/turnkey/root' [17.00 GiB] inherit" (also a swap_1, but I feel like I'm being oververbose already).
This seemed like a good place for a snapshot, so I took one.
All went well, until I hit the spot where you didn't say to "mkdir /mnt2/boot" before attempting to mount to it. Linux was mad, but I gave it a quick pat on the head and some chocolate, and created the dir before mounting it, and we went on happily. I also had to create /mnt2/proc and /mnt2/sys, although I didn't have to create /mnt2/dev. I made it as far as "chroot /mnt2" and it apparently did not comprendo my commando:
"chroot: failed to run command [/bin/bash/]: no such file or directory" (it actually displayed a square ASCII character, not the brackets; doubt that matters too much tho). Incidentally, update-grub failed with "bash: update-grub not found" and grub-install also didn't work, although it at least found the command: "grub-install: error: failed to get canonical path of 'airootfs'." Canonical... I'm not writing fanfic here.
So, up to the grub portion I appear to be okay. Not sure if the missing commands are a sign that these have been lost in the OS due to corruption, or maybe they just aren't where we expected in the live ISO? Either way, I'll post this before the browser crashes, and try a reboot of the VM to see if I should buy a lottery ticket. Thanks for the very detailed directions. Some of it was a refresher for me, but the steps might help others in the future.
Some progress
Yes I am certainly a shameless plugger for TKLBAM! :) It sounds like you are all over it re backups, so not necessarily any need to use TKLBAM. Although OTOH, because it's all contained within the VM and communicates with the TurnKey Hub (and AWS) remotely, so long as your server had (outgoing) internet access, it would have continued to back up, even without your involvement (although obviously things still could have gone wrong).
Anyway, I'll stop plugging and get back to the issue at hand! ;)
First up, apologies for this somewhat vague and rambling post. I'm pretty handy with this stuff, but when I do it, I rely on muscle memory and/or Google to get the required info. And without having the immediate feedback from the system, I get a little lost in it all sometimes...
As you note, it sounds like your VM was one of the earlier ones that included a separate boot partition (as /dev/sda1) outside the LVM (on /dev/sda2).
So it appears that you have repaired the root filesystem, which is a good start. Although the next bits don't sound so good!
TBH it's been quite a while since I last used SystemRescueCD and I had forgotten that it has the ability to boot directly into the broken system (as opposed to running as a live environment). Apologies on this gross oversight on my behalf. FWIW, I was thinking along the lines of using a TurnKey ISO in live mode (hence why I got you to create a /mnt2 - strictly speaking use of /mnt would have sufficed, but under some circumstance, TurnKey may already have stuff mounted there). Things may well have gone a little off track there?! Although, re-reading your post, I'm not really sure?! The first bit (including running fsck on the LVM) sounds like you were running as a live system (fsck should always baulk at checking a mounted filesystem as it can exacerbate fs damage). But the bit where you are trying to fix grub, sounds more like you have TurnKey itself running (I assume via the SystemRescueCD?!).
If I understand correctly and you are actually running the TurnKey system (but booted via SystemRescueCD), then it looks like you want "solution #1" from their "Repairing a damaged Grub" doc page. I was essentially suggesting "solution #2" on that same page.
Regardless, the chroot command should certainly exist either way (both on the rescue ISO and within the TurnKey VM). I'm not 100% sure, but if you are actually already running the TurnKey system, then that could be a sign of quite serious corruption where some important utilities have been corrupted.
If you are actually running TurnKey at this stage, then unfortunately, chroot appearing to not exist as a command, the "square ASCII character" in the error message, and the "grub-install: error: failed to get canonical path of 'airootfs'." all make me quite concerned that the degree of damage may be critical.
You may be able to reboot into a live system and try again and may get lucky enough to get the core of the system working again. Although rather than chrooting in (and using the version of grub installed within TurnKey), it may be preferable to try using SystemRescueCD's grub?
There is a risk that the versions of grub differ between the 2 systems enough to cause problems later (hence why I didn't suggest it sooner), but I suspect that we're well past worrying about minor details such as that... So to do it that way requires slightly different steps, but essentially remains the same. You'll still need to mount the root LVM under /mnt2 (or /mnt if you choose - as per my note above) and then also mount the boot partition. IIRC fixing grub externally doesn't require mounting /dev, /sys & /proc, but it won't hurt (also, you mentioned that you didn't need to mount /dev?! That seems really weird to me!). The only significant difference is that you will need to tell grub where to find the relevant /boot directory and files. I.e.:
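A sketch of what I mean, assuming the root volume is already mounted on /mnt2, the boot partition on /mnt2/boot, and the disk is /dev/sda. `reinstall_grub_external` is a made-up helper name; `--boot-directory` is the grub-install option that points it at a /boot outside the running system.

```shell
# Reinstall grub using the rescue system's own grub-install, pointing it
# at the /boot directory of the mounted (not chrooted) TurnKey system.
# Assumed defaults: disk /dev/sda, TurnKey root mounted on /mnt2.
reinstall_grub_external() {
    disk="${1:-/dev/sda}"
    mnt="${2:-/mnt2}"
    grub-install --boot-directory="$mnt/boot" "$disk"
}
```

If you mounted under /mnt instead, call it as `reinstall_grub_external /dev/sda /mnt`.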
Hopefully that should run ok. If it appears to, then you can try rebooting now and see how things go. Fingers crossed for you!
If it still won't boot, if you get a busybox shell, or a grub prompt, then you could have a go at manually booting in to see how bad things might be (I forget the details of those now, but you should find plenty of info via google).
If things are still badly broken and you're not over it already, then it's possibly worth changing tack to see if you can get the data out of the server, rather than get this one running. That may be easier said than done though, especially in the case of MySQL data. TBH, I've never tried to do that with a broken system. AFAIK the data files are some funky binary format which only MySQL understands. My guess is that they'll be buried somewhere in /var/lib/mysql or perhaps /usr/share/mysql (both guesses). But even if you find them, you'll also need to know the exact version of MySQL that was installed. If you can read the file /etc/turnkey_version, please post the contents of it and that will give me a little more info and I reckon I could be a little more precise on what might work there (plus paths etc)...
IIRC the MediaWiki files themselves should be in /var/lib/mediawiki. I'm pretty sure that when we were using a separate /boot partition, we were also installing MediaWiki from the Debian package. But we've since moved to an upstream source install. So if not, then have a peek in /var/www/mediawiki. I forget which version it was we moved from Debian package to upstream source install, but if I know the exact TurnKey version I can check.
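To save some poking around, here's a rough sketch that prints the version file and reports which of those candidate directories actually exist. It assumes the root filesystem is mounted on /mnt2 from a live session; `survey_root` is a made-up helper name, and the paths are the guesses from above rather than confirmed locations.

```shell
# Print the TurnKey version and check the guessed data locations
# under a mounted root filesystem (default mount point: /mnt2).
survey_root() {
    root="${1:-/mnt2}"
    if [ -r "$root/etc/turnkey_version" ]; then
        echo "version: $(cat "$root/etc/turnkey_version")"
    else
        echo "version: unreadable"
    fi
    # Candidate paths only; which ones exist depends on the TurnKey version.
    for dir in /var/lib/mysql /var/lib/mediawiki /var/www/mediawiki; do
        if [ -d "$root$dir" ]; then
            echo "found: $dir"
        else
            echo "missing: $dir"
        fi
    done
}
```

Run `survey_root` (or `survey_root /mnt`) and post the output; anything reported as found is worth copying off to another disk before trying further repairs.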