Bjorn Struebing - Mon, 2014/07/14 - 21:11
Have a weird issue and I don't know where to start on it. I am running the TurnKey WordPress VM and it's been solid for almost a year now. However, in the last few months the website will go down and the only fix is to reboot the server. I can get to the Webmin page no problem; it's just the WordPress site that will not come up.
My question is: where do I start in troubleshooting this?
Thanks for the help in advance.
Bjorn
Have a look at logs
All the log files will be in /var/log, with the Apache-specific ones in /var/log/apache2. Also, are there any errors showing up when you browse to the WP site? They might give you clues (e.g. if it is a 404, something is probably up with Apache; if it is complaining about a DB connection, then maybe it's MySQL, etc.).
One other thing that might be worth a look is to see if phpMyAdmin works (https://domain-or-ip:12322), as that runs under Apache too and requires a MySQL connection. It might help you pin it down to an Apache/MySQL issue or a strictly WP issue.
Can you qualify and reproduce "website goes down"?
Hi Bjorn,
As Jeremy indicated, you can have a look in the Apache log files, but if nothing jumps out I would suggest the following:
1) Clarify "goes down"...do you get 500s? Do you get no response at all? How long does this last if you wait? How often are you rebooting (daily, weekly, monthly)? Are you on your own hardware appliance (CPU fan failure is a typical culprit) or on a cloud service provider (AWS Micro instance auto-throttle)? Is your disk full? Does the site "go down" for users, for WordPress admin, or both?
2) Enable WordPress debug (http://codex.wordpress.org/Debugging_in_WordPress)
3) Set up a Pingdom account and monitor its responsiveness ahead of it "going down"...does performance degrade or does it just stop?
4) Clone the appliance and see if the problem goes away on the clone appliance.
5) Turn off all plugins on the clone appliance and see if the problem goes away.
NOTE: If you are using an AWS Micro instance, AWS will shut it off completely for 10+ minutes if you breach peak usage. See http://www.turnkeylinux.org/forum/support/20120626/cpu-usage-spikes-100-... for details.
NOTE: If you are using an AWS Micro instance, increase it to a Small instance and see if the problem goes away.
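For the "Is your disk full?" question in step 1, a quick check could look something like the sketch below. It uses Python's standard `shutil.disk_usage`; the paths `/` and `/var/log` are only assumed to be the interesting ones on this appliance, so adjust them to your layout. The figure is used/total, so it can differ slightly from what `df` reports.

```python
import os
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # "/" and "/var/log" are assumed locations; change them to suit your setup.
    for mount in ("/", "/var/log"):
        if os.path.exists(mount):
            print(f"{mount}: {disk_usage_percent(mount):.1f}% used")
```

If either figure is close to 100%, a full disk becomes a strong suspect and is worth ruling out before digging into plugins.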
Cheers,
Tim (Managing Director - OnePressTech)
Thanks for the info. Let me run through this and see what I can find.
So it finally went down again
It took about 10 days, but it finally went down. When I go to the page I get either "This page can't be displayed" in IE or "No data received" in Chrome.
Here is what the Apache log shows from the time Pingdom tells me it went down:
Any idea what this means:
I also tried to hit :12322 and that will not load either. So I guess I'm looking at either Apache or MySQL?
Any other log I should look at?
Thanks again for the help.
Have a look at your Pingdom and AWS logs
If it takes 10 days for a failure to occur, it could be a memory leak in one of the plug-ins, the disk getting full due to excessively verbose logging, or an external DOS attack. Regarding answers to the previous questions I asked...
1) What ISP are you using?
2) If on AWS, what size service are you subscribed to (Micro)?
3) What did the Pingdom response log show...reducing responsiveness ahead of the lockup, good responsiveness up to the time of lockup, or a quick spike in load ahead of the lockup? The first could indicate a memory leak / zombie process, the second could be Apache resource load buildup, and the third could indicate a DOS attack.
Cheers,
Tim (Managing Director - OnePressTech)
1) So I'm self-hosting in my own VMware cluster. ISP is AIS.
2) Not using AWS.
3) Pingdom shows two spikes of 1000 to 3000 ms; normal is 850-900 ms. Could I be looking at a DOS?
IMO DOS attack is quite possible
Especially considering the "server reached MaxClients setting" error.
But like Tim said, could be other stuff...
Segfaults in Apache can be caused by a myriad of different things; running out of RAM is one possibility. A flaky PHP module, WP module or other PHP issue is a common cause too. Have you installed any new PHP modules or WP modules, or adjusted any PHP settings, prior to this issue starting?
This blog post looks like it gives a pretty clear explanation of how to narrow down segfaults, which might be useful!?
Also, did you check any of the things that Tim mentioned? E.g. HDD space?
As for the MaxClients setting, I don't know a lot about it, but a quick Google suggests that it relates to the number of concurrent connections that Apache can handle (as you would probably expect from the name). It may be related to the segfaults (e.g. system running out of RAM? DDOS attack? etc...) or it may be coincidental. This answer on StackOverflow explains it quite well (although note that they are discussing a CentOS server, so whilst the theory is the same, the way the config is tweaked is different in Debian). Also FWIW here are the official Apache docs on it.
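To see how often the two symptoms mentioned here actually occur, something like the following Python sketch could tally them from the Apache error log. The default path `/var/log/apache2/error.log` is an assumption based on the usual Debian layout, and the segfault wording ("exit signal Segmentation fault") is assumed from typical Apache error logs rather than taken from the log posted in this thread; the function names are just illustrative.

```python
import re
from collections import Counter

# Substrings of the two messages discussed in this thread.
# The segfault wording is an assumption about how Apache logs it.
PATTERNS = {
    "maxclients": re.compile(r"server reached MaxClients setting"),
    "segfault": re.compile(r"exit signal Segmentation fault"),
}


def count_error_events(lines):
    """Count how many lines match each known symptom."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_error_log(path="/var/log/apache2/error.log"):
    """Tally symptoms in an error log file (the path is an assumed default)."""
    with open(path, errors="replace") as handle:
        return count_error_events(handle)
```

Calling `scan_error_log()` as a user who can read the log returns a `Counter`. Lots of segfault lines with few MaxClients lines would point more towards a flaky module than towards a traffic problem, and vice versa.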
OK, let me look over those posts. Thanks for pointing me in the right direction.
Make sure you consider Tim's input
Whilst I am an 'official' TurnKey guy, he is a long term trusted TurnKey user and always has valuable input. He has much more 'real world' experience than I do. My knowledge comes from a lot less experience, a lot more googling and a fair bit of 'best guessing'! :)
Good luck! And keep us posted.
I was about to suggest the same for your advice...
Thanks for the vote of confidence J ...let's hope our advice helps out in this case :-)
Cheers,
Tim (Managing Director - OnePressTech)
Ok...now it gets tricky...
Up to you how much diagnosis you want to do further.
A 1-3 second response would not be abnormal if not sustained...i.e. not likely a DOS attack. WordPress admin is a bit of a pig. Enough logged-in users at the same time on a low-CPU server would show a temporary high load.
Regarding MaxClients...if you get enough hits in a short period, MaxClients can peak on an Apache server because the default socket timeout is 300 seconds. A traffic burst, even a legitimate one, would suck up all your sockets, and it would take 5 minutes before you could access the server again. A quick load test should verify whether this is the problem: http://performance-test.compuware.com/instant-load-test
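As a rough illustration of the MaxClients arithmetic, here is a back-of-the-envelope Python sketch. It treats the 300 second timeout as a worst-case hold time for a stalled connection (normal requests finish far sooner), and the MaxClients value of 150 is only an assumed example, not necessarily what this appliance is configured with.

```python
def saturating_rate(max_clients, hold_seconds):
    """New connections per second that keep every slot busy
    if each connection holds its slot for `hold_seconds`."""
    return max_clients / hold_seconds


def seconds_to_fill(max_clients, arrivals_per_second):
    """Time for a burst to occupy every slot, assuming no slot
    is released while the burst is arriving."""
    return max_clients / arrivals_per_second


if __name__ == "__main__":
    # Assumed example values: MaxClients 150, worst-case hold time 300 s.
    print(saturating_rate(150, 300))  # stalled connections/second that pin the server
    print(seconds_to_fill(150, 10))   # seconds for a 10 connections/second burst to fill it
```

With those assumed numbers, about one stalled connection every two seconds is enough to keep all slots occupied, which is why even a modest burst can lock the site out until the timeouts expire.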
My suggestion...
1) Check the Apache access logs to see what the traffic looks like.
2) After the server locks up, wait for 10 minutes and see if it clears once the traffic is reduced.
3) If you are self-hosting, see what external network traffic logs are available so you can see when the traffic peaks and clears.
4) Put in a cron job that resets the server every night at 2 a.m. and see if the server stays up over the long term. There are times in the technology world when solving the problem is more important than knowing definitively what the problem is.
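For step 1, one way to get a feel for the traffic is to count requests per minute and per client from the access log. The Python sketch below assumes the common/combined log format (client address first, timestamp in square brackets); the function name and the idea of grouping by minute are just illustrative choices.

```python
import re
from collections import Counter

# Matches the client address and the timestamp down to the minute, e.g.
# 1.2.3.4 - - [14/Jul/2014:21:11:05 +0000] "GET / HTTP/1.1" 200 ...
LINE_RE = re.compile(
    r"^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [+-]\d{4}\]"
)


def summarize_access_log(lines):
    """Return (requests per minute, requests per client) as two Counters."""
    per_minute = Counter()
    per_client = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        client, minute = match.groups()
        per_minute[minute] += 1
        per_client[client] += 1
    return per_minute, per_client
```

Feeding it an open file handle for the access log and then looking at `per_minute.most_common(10)` and `per_client.most_common(10)` should show whether the lockups line up with a burst, and whether that burst comes from a handful of addresses.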
Up to you.
Cheers,
Tim (Managing Director - OnePressTech)