Managed Dedicated Server BLOG | FastServers.Net

The Red Button: To push or not to push

posted on March 7, 2007 4:38 PM. by Andrew Howard

The red button I speak of in this context is the metaphorical reboot switch. Granted, since your servers are all contained in a state-of-the-art data center in locked cages behind bomb-proof walls and high-tech security, you never actually get to 'push' the button, and for that matter most of the buttons aren't even red, but you're reading this now, which means the title got your attention. Beyond that, the title introduces the topic: I'll be bringing to light a few reasons why you may or may not want to reboot your server(s).

I'm sure everyone has received advice at some point during their lives to "Just reboot it." I know I have heard it, and I'll admit that when I'm truly stumped I'll sometimes do just that. However, rebooting is not always the answer. Working as a tech here at FastServers, I've seen any number of server problems, and I'll stand right up and say that a reboot is almost never the best solution. In fact, rebooting is often not a solution at all. If you're noticing an error somewhere, or some service isn't working properly, there is always a specific cause. Servers do not randomly develop a glitch in the flux capacitor. Since there is a particular source causing the problem, it stands to reason that this source can be found and corrected. However, rebooting a server re-initializes all the services, and often clears away most of the evidence you, or we, could use to find the source of the problem.

If you are seeing a problem with the server, start by investigating any error messages it gives you. If you aren't seeing messages, head straight for the log files, where you will usually find one. From that point you can use the error message to find the source of the problem, correct it, and then test to ensure the problem is in fact resolved.
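To make the "head straight for the log files" step concrete, here is a minimal Python sketch that pulls the error-looking lines out of a log before anything gets restarted. The keyword list and the function name are assumptions chosen for illustration, and real logs vary enough that the pattern will likely need adjusting.

```python
import re

# Assumed set of keywords that tend to mark trouble in server logs.
ERROR_PATTERN = re.compile(r"\b(error|fatal|crit|panic|segfault)\b", re.IGNORECASE)


def find_error_lines(lines, pattern=ERROR_PATTERN):
    """Return (line_number, text) pairs for lines that match the pattern."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]
```

Since an open file object iterates line by line, it can be passed in directly, and saving the matches somewhere safe before touching the service keeps the evidence around even if a restart follows.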

Hopefully I just made a point, though it may have been a bit convoluted. Basically, rebooting may make the problem go away, but if so then you don't have any idea what caused the problem in the first place. This is more or less the same idea as with restarting services. I had a problem with my own server a while back in which Apache (the web service) started dishing out an internal server error every time I accessed a page that ran a CGI script. I couldn't restart Apache directly, but if I killed off the processes and then started the service fresh, it worked fine. The problem was, it started dishing out the 500s again after a day or so. As it turns out, I had inadvertently configured ulimit to restrict the number of processes running as the Apache user. Once this limit was reached, Apache snapped and could no longer run CGI scripts, at least until it was restarted. I was only able to solve this by leaving the web server in the broken state long enough to test the issue and find all the error messages. The point I'm trying to make through all of this is that if you're trying to fix a problem, don't just attack the symptoms. Rebooting may make the problem go away, but your job isn't done if the problem comes back. If you investigate the symptoms, you should be able to find the source.
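As an illustration (not the commands used in the story above), here is a minimal Python sketch for inspecting a per-user process limit of the kind ulimit sets. It assumes a Linux or other Unix host where Python's standard `resource` module exposes `RLIMIT_NPROC`, and the helper names are made up for this example.

```python
import resource


def format_limit(value):
    """Render an rlimit value, which may be the 'no limit' sentinel."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def nproc_limits():
    """Return the (soft, hard) process-count limits for the current user."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = nproc_limits()
    print(f"max user processes: soft={soft} hard={hard}")
```

The limits reported are those of whoever runs the script, so to see what the web server is subject to, it would have to run as the web server's user and in the same environment the service starts from.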

This philosophy applies to opening tickets with our tech department as well. If your mail server freezes up after six hours and then needs to be restarted/rebooted, we really need to see it while the problem is occurring. I cannot even begin to explain how difficult it is to troubleshoot a problem I can't see. I know it's tough, but if at all possible we need to leave the service in a broken state until we're able to find the cause.

So far I've been detailing all kinds of reasons why you should avoid rebooting your server. I'm really just trying to convince everyone that a reboot often doesn't solve the problem, but only hides it for a while. On top of that, the solution is often a much smaller change to the server, so a reboot is like using a sledgehammer to drive a finishing nail (my roommate is a construction worker).

In addition to persuading all the webhosts out there to avoid reboots, I'd also like to encourage you to reboot. No, don't get out the straitjacket yet; that sentence isn't completely insane. Reboots have their time and place. People have a tendency to use reboots as the technology cure-all, but others have a deep-seated fear of rebooting. This fear isn't completely unjustified. I've seen a number of servers go belly-up during a reboot, but having said that, I've also seen many, many more reboots go off without a hitch.

For Linux servers, kernel updates are dreadfully important (as are all updates), and require a reboot. A major advantage of Linux servers is the ability to have multiple kernels installed at once. If the server fails to come online with a new kernel, we may be able to simply reboot to an old one. There are even ways of configuring the boot loader to try a new kernel once, so if it fails then a simple reboot will default to the old kernel. If you're interested in how this works, do a Google search for "grub savedefault once" or open a ticket with our tech department and ask about it.
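One way to see whether a kernel update is still waiting on a reboot is to compare the running kernel against what is installed on disk. The Python sketch below is a rough heuristic, not a definitive check: it assumes kernels are installed as `/boot/vmlinuz-<release>`, which is common but not universal, and it compares only the leading numeric part of each release string. The function names are hypothetical.

```python
import os
import re


def version_key(release):
    """Turn the leading numeric part of a kernel release into a sortable tuple."""
    numeric_prefix = re.match(r"[\d.\-]*", release).group(0)
    return tuple(int(n) for n in re.findall(r"\d+", numeric_prefix))


def installed_kernels(boot_dir="/boot"):
    """List releases found as vmlinuz-<release> files (assumed layout)."""
    prefix = "vmlinuz-"
    if not os.path.isdir(boot_dir):
        return []
    return [name[len(prefix):] for name in os.listdir(boot_dir)
            if name.startswith(prefix)]


def reboot_pending(running, installed):
    """True if any installed kernel looks newer than the running one."""
    return any(version_key(v) > version_key(running) for v in installed)


if __name__ == "__main__":
    running = os.uname().release
    print("running kernel:", running)
    print("newer kernel installed:", reboot_pending(running, installed_kernels()))
```

Older kernels left in place for fallback do not trigger the check; only a release that sorts higher than the running one does.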

For Windows servers, reboots are more common, as they are usually required with every round of Windows updates. In this case, DEFCON 1, 2, and 3 customers have an advantage. Our update-deployment software is really powerful in that it will automatically roll back any updates that cause problems and raise an alert so we can look closer at why the update failed. DEFCON 4 and 5 customers don't have this advantage provided through FastServers, but that doesn't make updates any less important, and if Microsoft says a reboot is required to complete the installation of updates, they mean it. Even though updates may have been installed, that has only changed the information on the hard drive. The old version is still loaded into memory, and it is still the one running until a reboot occurs.

Reboots can be scary, but they are necessary, particularly for updates. If updates are not completed with necessary reboots, the updates may as well have never been applied at all. Without updates, servers will be hacked/infected/compromised/rooted/etc. Once that happens, the only thing we can do is reinstall. It's much safer to risk a reboot before that happens. Of course, reboots don't have to be dangerous. Truth be told, servers almost always come back online successfully after a reboot, so it's not all that dangerous to begin with. Even so, steps can be taken to reduce the risk that comes with reboots. If you are afraid of rebooting a server, feel free to open a ticket with our tech department. If you have any specific concerns, be sure to let us know. The more information we have, the better we can provide support.

Last but not least, what should you do if you've already rebooted the server and it's not coming back up? Obviously, you'll have to contact us. If you can't reach the server remotely, you can't do much for it from where you are. If we can't reach it remotely, we'll probably have to reboot it manually to get keyboard and mouse support. At this point, the time it takes us to bring the server back online is based on how much information we have. We don't necessarily *need* to know that the server was rebooted to a freshly-compiled kernel and never came back online, but that information could prove to be the difference between a 10-minute turnaround and a 30-40 minute turnaround. The more information you can give us about what the server was doing before/while it went offline, the faster we'll be able to have you back in business.

In summary, I'm long-winded. Having said that, just keep a few things in mind:
-Reboots don't solve problems; they hide them.
-Sometimes reboots are necessary. Don't fear them.
-If anything goes wrong, increased information translates directly into faster turnaround times.
