Thursday, September 29, 2005

On the difference between unlikely and impossible

Last night at 1:48am a large number of our servers either shut down or rebooted. The obvious cause? Power outage, right?

Except I knew better than that. Our servers are housed in a facility that doesn't have power problems - ever. They have multiple backup systems. Heck, they even have a tanker truck of gas nearby, ready to be used for the generators.

Most of the reason we pay the company to house the servers is for power (that and cooling and physical security).

So I started coming up with theories of what could bring down a whole bunch of machines (but not all...) at once. I was thinking we had some very clever hacker on our hands.

Finally, on a whim, I asked someone at the data center - "I know this sounds crazy, but did you guys lose power at 1:45am this morning?" "Oh yeah," he responded, "we did."

And just like that, something I thought impossible (or at least very unlikely) had happened.

And how did they notify us of the problem? Via e-mail of course. Which was never delivered because the server didn't come back up after losing power. ARGH.

I guess the real takeaway is that you should have a catchall protocol for when the unthinkable happens. Simply saying the unthinkable won't happen is, well, a bad idea.

