Hi guys,
I didn't want to flag this as a bug in the issue tracker yet. First I wanted to ask whether anyone has had TKLBAM hang on the initial backup after a fresh appliance install and configuration, and how you fixed it. I'm not looking for support or grunty debugging guidance, that's on my shoulders. I'm really looking to benefit from any existing wisdom on the subject, based on knowledge of TKLBAM or on having experienced this problem.
The Problem:
I have a new Wordpress multi-site built on a turnkey-lamp-13.0-wheezy-amd64 running on a VMWare VCloud host. I prefer to build my Wordpress from scratch so I didn't use the Wordpress appliance.
NOTE: I have a turnkey-lamp-11.3-lucid-x86 running a custom Wordpress multisite on VCloud, a turnkey-otrs-13.0-wheezy-amd64 appliance running on VCloud and a turnkey-wordpress-12.1-squeeze-amd64 running on AWS. No problems with any of them and no TKLBAM problems.
The Issue:
The Wordpress multi-site I built on a turnkey-lamp-13.0-wheezy-amd64 runs perfectly on a VMWare VCloud host. Operates like a dream. When I went to do my initial backup, though, TKLBAM got through processing the files and the databases and then just sat there. I thought it might simply need time, so I ran it overnight, but it was still sitting there this morning. I have poked around and tried killing and restarting the backup process, but I suspect something is off.
As you can see from the attached TOP listing there is plenty of RAM, CPU, and swap space and the disk is empty (fresh install). So why do you think TKLBAM might be stuck? My firewall is ok because when I go into the TKLX Hub it shows that this backup is in progress.
Any ideas would be most appreciated. I'll post updates here if I trip over anything interesting as I continue my diagnosis. Never had a problem with TKLBAM before. Always worked like a dream.
Is there anywhere I can look to see if there is any activity of any substance going on? Can the TKLX boys take a look on the Hub end and see if any data is being uploaded at all? Much appreciated.
I seem to have solved the problem...
I had been killing the processes and restarting the tklbam process via the Webmin interface (what can I say I like a GUI).
I thought I would do a simulation to see what the size of the backup would be and got this error in Webmin:
tklbam-backup --simulate
error: --simulate will destroy your aborted backup session. To force use --disable-resume
So I went to the command line via Putty and ran:
tklbam-backup --disable-resume
Much to my pleasure it completed the backup happily.
I'm going to keep a watch on this...could be some weird edge condition involving the Webmin TKLBAM plugin.
Cheers,
Tim (Managing Director - OnePressTech)
I've had some similar issues - generally worked through
One time a Backup was started but not completed. Running TKLBAM took care of it.
Other times I have had to either run tklbam-init OR kill a backup in hub.turnkey.org. I think there is a fair number of posts I have put out here on various situations (search arnold tklbam, but don't read them all)
I generally like Webmin for running a manual backup because it will normally finish the session without timing out. I am finding that the Wheezy Webshell will finish without timing out, but the earlier Squeeze and Lucid Webshells would often time out.
I did discover Bitvise SSH Client recently, which runs both SFTP and shell concurrently. The only problem is that it won't give a wider page view in the shell, but I do think it would be a functional way to run a TKLBAM session. Also I really like having the SFTP client right there alongside the session, no Java required.
My TKLBAM hanging problem is still there...sigh!
I know I should be old enough and wise enough not to test the infinite capacity of the gods to muck up my life for their own amusement by posting yesterday's "it's fixed" message but, what can I say, I'm an eternal optimist.
Thanks L. A. for letting me know I'm not the only one experiencing this. I guess I was enjoying the warm cocoon of my previous VMs that experienced no issues at all.
This morning I woke up to the infamous Wordpress "Error establishing a database connection". Last night's daily backup hung while processing the database again so the database is offline until I kill TKLBAM.
So although I'm not out of the woods yet, at least there is a path out of the woods. The problem appears to be related to the backup of the database. I'll do some further analysis and post it back here. It's not a server capacity issue so it is likely a MySQL database capacity issue. Not sure why TKLBAM would hang though...I would expect it to time out and log a failed incremental backup! I'll have to look into that too.
Anyone who has had a TKLBAM lockup during the database backup in the past who diagnosed the problem and / or has a work-around feel free to jump in any time :-)
Thanks again L.A. for the suggestions on your TKLBAM retry techniques. Unfortunately tklbam-init reports "error: already initialized", and I can't kill the backup on the Hub because there is no option to kill an incremental backup in progress through the Hub's Backups console. So it's into Webmin or Putty to issue the Kill command (TKLBAM seems to shrug off the Terminate command).
Stay tuned...
Cheers,
Tim (Managing Director - OnePressTech)
FWIW
You can use the --force switch on tklbam-init to force reinitialisation. Also AFAIK you can still manually remove your Hub API key like this. TBH I haven't tested that lately but I assume it would still work...
Thanks J - Good tips
Thanks J. My comment was related to an inability to kill a backup in progress from the hub.
Thanks for the tip on forcing a tklbam-init. I just completed the reset but no luck.
Cheers,
Tim (Managing Director - OnePressTech)
Ok, this is weird, a manual kill & resume worked perfectly
So I restarted the TKLBAM incremental backup via Putty:
> pkill -f tklbam
> tklbam-backup --disable-resume
Incremental backup completed in 4 seconds! I'm wondering now if there is a credential problem with the Webmin and automated part of TKLBAM that is not experienced when manually issued from the commandline logged in as root!
Stay tuned...
Cheers,
Tim (Managing Director - OnePressTech)
Hmmm...
AFAIK behind the scenes, the Webmin module just runs the same commands (as root) as what you'd do yourself (on the commandline). Also the automated TKLBAM backups run from cron, which again AFAIK also run as root.
I guess my info probably doesn't provide a lot of help or guidance though... I'd be very interested to hear if you find anything that you think might be causing your issues...
Thanks J - I had assumed Webmin and cmd line same but...
Here's the thing...
I now have a 100% reproducible situation:
1) Backup initiated from Webmin hangs after database has been parsed. It stops in the tklbam log after displaying all the database tables and before the "TKLBAM/newpkgs" line gets displayed. This is consistent whether it is a live backup or simulation
2) Backup initiated from the commandline via Putty terminal works perfectly.
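For anyone wanting to see where a run stalls, something like the following might help. It is only a rough sketch: the log path /var/log/tklbam-backup is an assumption (adjust it to wherever your TKLBAM log lives), and the function names are made up for illustration. It simply checks whether the "TKLBAM/newpkgs" line mentioned above shows up in the most recent log output.

```shell
#!/bin/sh
# Assumption: TKLBAM writes its backup log here; override TKLBAM_LOG if yours differs.
TKLBAM_LOG="${TKLBAM_LOG:-/var/log/tklbam-backup}"

# Return 0 if the marker text appears in the last N lines of a log file.
# $1 = log file, $2 = marker text, $3 = number of trailing lines (default 200)
marker_in_tail() {
    tail -n "${3:-200}" "$1" | grep -qF -- "$2"
}

# Report whether the recent log output got past the database stage.
report_stall_point() {
    if marker_in_tail "$TKLBAM_LOG" "TKLBAM/newpkgs"; then
        echo "recent log output reached TKLBAM/newpkgs"
    else
        echo "no TKLBAM/newpkgs in recent output; last lines were:"
        tail -n 5 "$TKLBAM_LOG"
    fi
}
```

After sourcing the file, run report_stall_point while the backup appears stuck. Note that a marker left over from an earlier successful run within the last 200 lines would give a false positive, so treat the result as a hint only.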
Whether related or not, I have difficulty killing MySQL. I can't stop it from Webmin or from the command line. On the command line it reports:
[FAIL] Stopping MySQL database server: mysqld failed!
Cheers,
Tim (Managing Director - OnePressTech)
Problem Solved, Red Face, Thanks all :-)
Ok, Ok...I left something out. I just didn't know it was significant enough to report since I made the change long before the TKLBAM problem arose.
This is a new build and my first Debian VM. So what do you do as part of a new build...look around, get familiar, and of course touch things you shouldn't. Oh look...there is a default user with full access to my MySQL called debian-sys-maint. Sounds like a security risk to leave that default user with full access to the database. Let's just drop him :-)
Huh...everything works fine as expected. Flash forward a few days later...hey, TKLBAM is hanging!!!
For those experienced Sysadmins out there...you get a free laugh at my expense :-)
For those less experienced, like me: Webmin uses the debian-sys-maint user to issue its commands. Another symptom was the inability to stop / start the MySQL server from Webmin.
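For anyone who wants to rule this out on their own box, here is a rough sketch of a check. It assumes the stock Debian layout, where the maintenance account's credentials sit in /etc/mysql/debian.cnf, and it uses the standard mysqladmin ping command to test the login. The function names are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Assumption: stock Debian credentials file; override DEBIAN_CNF if yours lives elsewhere.
DEBIAN_CNF="${DEBIAN_CNF:-/etc/mysql/debian.cnf}"

# Print the user name stored in the [client] section of a credentials file.
maint_user() {
    awk -F= '
        /^\[/ { in_client = ($0 == "[client]") }
        in_client && $1 ~ /^[ \t]*user[ \t]*$/ { gsub(/[ \t]/, "", $2); print $2; exit }
    ' "$1"
}

# Return 0 if the account in the credentials file can reach the MySQL server.
maint_login_ok() {
    mysqladmin --defaults-file="$1" ping >/dev/null 2>&1
}

# Print a one-line verdict for the maintenance account.
check_maint_account() {
    user=$(maint_user "$DEBIAN_CNF")
    if maint_login_ok "$DEBIAN_CNF"; then
        echo "OK: $user can connect"
    else
        echo "FAIL: $user cannot connect (account dropped or password changed?)"
    fi
}
```

After sourcing the file as root, run check_maint_account. A FAIL here would be consistent with the symptoms in this thread, though a stopped MySQL server produces the same result, so check that the server is actually running first.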
End of the story...TKLBAM from Webmin now works fine live and simulation.
A happy ending... yay!
Thanks to all those who contributed tips and moral support on my journey of discovery. Nice to know the community is there to pitch in when needed.
Thanks L.A. Thanks J. Much appreciated.
Cheers,
Tim (Managing Director - OnePressTech)
Great detective work Tim
I know it can be a little embarrassing when this sort of thing happens, but good on you for finding it and letting everyone know. It's a great reminder of how the seemingly 'incidental' things we do to our systems can sometimes be the cause of such issues!
Glad you got it all sorted! :)
Glad that worked and you taught us about the Webmin user
I never knew about debian-sys-maint.
It is a great system. There are occasional gotchas, but so far I haven't been "unable" to get past them. Sometimes it takes a while.
Thanks for the update on everything and glad it worked out.