There were a few questions about my recent outage on asktom.oracle.com - such as:
- aren't you clustered
- don't you have redundant hardware
- aren't there redundant network links
- do you use an ISP - wouldn't you provide your own connectivity
Well, truth be told, asktom is sort of "skunk works" in nature. It isn't in an official data center - the availability is "pretty darn good". It runs on a bit of hardware that cost about the same as my laptop (just a little more, but truthfully - not a lot more).
There is no redundant hardware - unless you count the RAID array. A single computer. Single network.
It is not clustered. It has no real availability requirements beyond "pretty darn available". It is done without a budget. It just takes care of itself. If it isn't available - the world doesn't stop, people still work, life goes on.
We use an ISP - at the end of the day, everyone does (except for perhaps the ISPs themselves).
Given that asktom has been running for seven years now - and this was the first major 'incident', the tag I use of "pretty darn available" fits quite well. It takes about 1% of a DBA, 1% of a System Administrator to run. It is very low maintenance, by design. APEX based - as few moving pieces as possible. Single purpose machine. Low budget.
All of those things drive me to "pretty darn available" - and it is.
So, no major changes in the works to "harden" it. The world doesn't stop when it is unavailable (truth be told - I was in class on Monday and Tuesday - not having a large backlog of 'reviews' every night afterwards was sort of nice). We might move it into a data center - whereby the availability of replacement bits and some level of network redundancy would be present - but that is a "maybe".
Pretty darn available...