Bob_Bennett - Tue, 2015/11/24 - 00:43
I'm not sure how much information is needed here. Feel free to ask any questions.
I have a live LAMP server running version 13 and backing up nightly. I am trying to test disaster recovery. In the Hub, on the backups tab, I click on the server in question and click on 'restore to new cloud server'. It asks what region I want to use (and I've tried several) and proceeds to create a new LAMP server, presumably from the backup. After it's up and running, I can't log into Webmin or SSH as root with the password that currently works on the live server. I'm not a Linux guru, so I'm probably doing something wrong, but I don't know what.
Any ideas?
It should "just work"...
Thanks for providing good info on how you reproduce the issue. I'll see if I can reproduce it and get back to you.
I can't reproduce this issue...
Here is what I did:
So I wonder what might be happening for you? I'm sure you've probably already tried the only things I can suggest, but I'll mention them anyway:
The only other thing that I can think of is to use SSH keys to log in. If you add SSH keys to either your Hub account (easiest) or to Amazon EC2 (you need to make sure that you add the key to the region where you intend to launch your test server) then the keys should be auto included into any new server you launch.
Jeremy,
Thanks for the response.
We are trying two different ways, if I'm understanding what you did. You clicked on the blue "launch a new server" button, bottom left?
What I tried was the white "restore to new cloud server" button in the details of the backup right below the time since last backup.
However, I did try your method: creating a new server, then running tklbam-restore. Once the server was done restoring and operational, something hadn't come up right, as none of my Apache sites are there and a2ensite only lists the default LAMP sites. My concern was, even if I fix that, how can I be sure everything else is OK?
Apologies if I wasn't clear
So yes I used the 'white "restore to new cloud server" button in the details of the backup right below the time since last backup'
It should be possible the way that you thought I did it. I.e. launch a new server and restore your backup to that. However I'm almost certain that the issue is something to do with your backup; rather than the restore process. Although TBH I have no idea why it isn't working as it should. It should just work; like it did for me...
FWIW, even though I think it's irrelevant: if you did want to restore to a new server, you need to restore the correct backup set. What I'm pretty sure you did was restore an empty backup to the new server. You would need to run 'tklbam-list' first to get the right backup set ID (or look in the Hub for the ID), then run 'tklbam-restore <BACKUP_ID>'. But as I said, I'm almost certain that isn't the issue...
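As a rough sketch of that two-step flow (list, then restore by ID): the wrapper name restore_backup_id is made up for illustration, and the numeric-ID check is an assumption based on how IDs appear in the list output. Only tklbam-list and tklbam-restore are real commands here.

```shell
#!/bin/sh
# Hypothetical helper: refuse to start a restore unless a numeric
# backup ID was given (assumption: backup IDs are plain integers).
restore_backup_id() {
    id="$1"
    case "$id" in
        ''|*[!0-9]*)
            echo "usage: restore_backup_id <BACKUP_ID>  (see tklbam-list)" >&2
            return 2
            ;;
    esac
    tklbam-restore "$id"
}

# Typical use on the freshly launched server:
#   tklbam-list            # note the ID of the production backup
#   restore_backup_id 3    # 3 is a placeholder, not a real ID
```

The point of the guard is only to make the chosen backup set explicit, so an empty or mistyped ID never reaches tklbam-restore.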
TBH I'm not really sure of the best way to troubleshoot this...
Perhaps try that and see?
I'll ask Alon and Liraz and see if they have any bright ideas on other stuff we can try.
I have spoken with Alon
Firstly perhaps the system log (viewable from the Hub) might contain some clues.
He also suggested it actually might be useful to test doing a manual restore in a new machine (make sure you specify the correct backup ID). You could then watch it interactively and check the restore log afterwards. Again the full sys log may provide some insight too?
I think we're getting somewhere. When I try to execute tklbam-backup --simulate, I get the following errors at the end:
----------------------
sh: 0: getcwd() failed: No such file or directory
UNCOMPRESSED BACKUP SIZE: 10.39 GB in 59732 files
Traceback (most recent call last):
  File "/usr/bin/tklbam-backup", line 510, in <module>
    main()
  File "/usr/bin/tklbam-backup", line 445, in main
    hooks.backup.inspect(b.extras_paths.path)
  File "/usr/lib/tklbam/hooks.py", line 82, in inspect
    orig_cwd = os.getcwd()
OSError: [Errno 2] No such file or directory
----------------------
Any thoughts?
I think that might be pointing us toward the issue
So there is some file or directory that it's trying to access that doesn't exist. Unfortunately it's not telling us which directory is causing the issue. That makes it really hard to know if this is an issue with your backup, an issue with your original host, or an issue with TKLBAM itself.
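One possibility worth ruling out before digging further: on Linux, this pair of messages (the sh "getcwd() failed" line and the OSError from os.getcwd()) is what you typically see when the directory the command was started from has since been deleted. I can't tell from the traceback whether that is what happened here, so treat the following only as a sketch for reproducing the symptom. The scratch directory comes from mktemp and is purely illustrative.

```shell
#!/bin/sh
# Reproduce a missing working directory by deleting the directory we
# are standing in. Runs in a subshell so the caller's cwd is untouched.
demo_deleted_cwd() (
    scratch=$(mktemp -d) || exit 1
    cd "$scratch" || exit 1
    rmdir "$scratch"              # the working directory no longer exists
    if /bin/pwd >/dev/null 2>&1; then
        echo "cwd still resolves"
    else
        echo "cwd is gone"
    fi
)

demo_deleted_cwd
```

If that does turn out to be the cause, changing to a directory that exists (for example "cd /root") before running tklbam-backup again would be the first thing to try.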
I would check the TKLBAM log (/var/log/tklbam-backup) on your main server (the one where the backup is coming from) and see if there is anything that looks relevant there. It may be worth manually running a (full) backup there too, to see if any errors occur during the backup process itself (they should be in the log, but it might still be worth checking). FWIW tklbam-backup also accepts the --simulate switch (see the docs).
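A minimal sketch for skimming that log, assuming it is a plain text file: scan_log is a made-up helper name, and the keyword list is a guess at what a failure might look like, not something TKLBAM documents.

```shell
#!/bin/sh
# Hypothetical helper: print log lines that look like failures,
# prefixed with their line numbers. The keywords are a guess.
scan_log() {
    log="${1:-/var/log/tklbam-backup}"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
    grep -n -i -E 'error|traceback|failed|no such file' "$log"
}

# Usage: scan_log || echo "nothing suspicious found (or log unreadable)"
```

The function exits non-zero when nothing matches, so it can be chained as in the usage line.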
My suspicion is that this is being caused by something that you have installed or added to your main server that isn't being included in the backups. Then when your backup is trying to restore, it fails because a path doesn't exist in a clean TKL server.
It's probably good practice to make sure that both servers have the latest version of TKLBAM (AFAIK they should do):
If that reports (at the end): Then you have the latest version...
I checked for tklbam updates and we have the latest version.
Last night's backup appears to have worked, so I ran tklbam-backup again, but it did an incremental backup, not a full one like last night. The documentation says it should execute a full backup. What's the command to force a full backup?
I checked the log /var/log/tklbam-backup and there is nothing unusual there, and the incremental backup reports no errors.
I tried creating a fresh "small" server (listed as a previous generation) in case that would make a difference.
It does claim to successfully restore without having to issue these extra commands like a medium current generation server does.
mkdir /temp
chmod 1777 /temp/
export TMPDIR=/temp/
mount --bind /temp/ /tmp/
mount --bind /temp /tmp
But a2ensite still doesn't list the actual production sites so something isn't restoring. I looked at the log /var/log/tklbam-restore but nothing sticks out, no errors I can find.
Assuming that it has been
I think that perhaps it might be worth doing a simulated backup ('tklbam-backup --simulate') initially (rather than uploading it too). Then you can have a dig around inside the backup (/TKLBAM) and see what it's backing up and what it's not. You can then tweak the settings to make sure that it is including everything that it should.
Having said that, unless it has been tweaked to exclude the Apache configs, it should be automatically including them (and all the rest of /etc for that matter). To double check that, have a look at the TKLBAM overrides conf: '/etc/tklbam/overrides'. Anything starting with a '-' (dash/minus sign) will be excluded; anything (and its contents) explicitly mentioned there will be included. What should be backed up by default is listed within /var/lib/tklbam/profile/
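To make the include/exclude rule concrete, here is a small sketch that summarises an overrides file line by line. show_overrides is a hypothetical helper, it only deals with plain filesystem paths, and it assumes blank lines and lines starting with '#' have no effect.

```shell
#!/bin/sh
# Hypothetical helper: summarise an overrides file.
# Blank lines and '#' comments are skipped; a leading '-' means exclude,
# anything else is treated as an explicitly included path.
show_overrides() {
    file="${1:-/etc/tklbam/overrides}"
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            ''|'#'*) ;;                       # comment or blank: no effect
            -*) echo "exclude ${line#-}" ;;
            *)  echo "include $line" ;;
        esac
    done < "$file"
}

# Usage: show_overrides /etc/tklbam/overrides
```

If this prints nothing, the file contains only comments and blank lines, so only the defaults from the profile apply.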
Ok, every line in the overrides files is commented out, every line except a blank one in the middle.
The backup --simulate is not capturing all the files. It's missing quite a few. How do we fix that? Specifically a number of the sites in /etc/apache2/.
Are you using non-standard locations?
Regardless, you can ensure that your site files are included by specifying the paths that you want included in your overrides file (/etc/tklbam/overrides).
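As a sketch of what adding a path to the overrides file could look like: add_override is a made-up helper that appends a path only if it is not already listed, and /etc/apache2/sites-available is used purely as an example path.

```shell
#!/bin/sh
# Hypothetical helper: append a path to an overrides file unless an
# identical line is already present.
add_override() {
    path="$1"
    file="${2:-/etc/tklbam/overrides}"
    [ -n "$path" ] || { echo "usage: add_override <path> [file]" >&2; return 2; }
    # -x: whole-line match, -F: fixed string, -e: allow paths starting with '-'
    grep -q -x -F -e "$path" "$file" 2>/dev/null || printf '%s\n' "$path" >> "$file"
}

# Usage: add_override /etc/apache2/sites-available
```

Running a simulated backup afterwards is a cheap way to check that the path is now being picked up.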
We aren't using any non-standard locations. Our sites are in /etc/apache2/sites-available.
Is there a way we can just specify that we want to back up everything, so that we could restore a fully functional server in the event of a disaster? If that's not possible, I don't think we have a use for this service at all.
TBH I don't understand why it's not working
I just double checked it myself to make 100% sure.
This is what I did to test:
FWIW during the backup I note that tklbam reported the following (I'm using "..." to indicate lines that I omitted):
As you can see, it removed the symlink enabling the default site and automatically recognised and included the files that I had added. I did not manually tweak anything to make it do that...
So obviously there is something wrong with your tklbam config. You said that no one has adjusted or changed anything with the TKLBAM config so I have no idea how it could have gone wrong or what might have gone wrong...
As a last ditch effort I suggest that you move the existing profile directory. That will force TKLBAM to re-download it when you launch TKLBAM. To do that, try this:
The first line it responds with should be:
Check the output for your Apache conf files (and other stuff). If there's too much output and/or it's too much of a pain, then try this:
On my server it reports this:
It has those entries 3 times because I've run TKLBAM 3 times. To reduce the output, you can pipe it through tail:
By default that will give you the last 10 lines of output; you can use a switch to explicitly set how many lines to output (e.g. for 20 lines: "tail -20"). Also, if you want to search for other things, replace the contents of the double quotes in the above command. It can be (partial or full) paths or file name(s) etc.
One more thing...
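The grep-and-tail filtering described above can be sketched roughly like this. last_matches is a hypothetical helper, and the file to search is left as an argument rather than assumed; the pattern is matched as a fixed string, so partial paths work as-is.

```shell
#!/bin/sh
# Hypothetical helper: show the last N lines of a file that contain a
# fixed string (N defaults to 10, the same as tail's own default).
last_matches() {
    pattern="$1"
    file="$2"
    count="${3:-10}"
    grep -F -e "$pattern" "$file" | tail -n "$count"
}

# Usage: last_matches "apache2/sites-available" <file-to-search> 20
```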