Wednesday, June 20, 2007

The outage...

There were a few questions about my recent outage on asktom.oracle.com - such as:
  • aren't you clustered?
  • don't you have redundant hardware?
  • aren't there redundant network links?
  • do you use an ISP - wouldn't you provide your own connectivity?
Well, truth be told, asktom is sort of "skunk works" in nature. It isn't in an official data center - the availability is "pretty darn good". It runs on a bit of hardware that cost about the same as my laptop (just a little more, but truthfully - not a lot more).

There is no redundant hardware - unless you count the RAID array. A single computer. Single network.

It is not clustered. It has no real availability requirements beyond "pretty darn available". It is done without a budget. It just takes care of itself. If it isn't available - the world doesn't stop, people still work, life goes on.

We use an ISP - at the end of the day, everyone does (except for perhaps the ISPs themselves).

Given that asktom has been running for seven years now - and this was the first major 'incident', the tag I use of "pretty darn available" fits quite well. It takes about 1% of a DBA, 1% of a System Administrator to run. It is very low maintenance, by design. APEX based - as few moving pieces as possible. Single purpose machine. Low budget.

All of those things drive me to "pretty darn available" - and it is.
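
A back-of-the-envelope sketch of what "pretty darn available" works out to in numbers. It assumes the outage lasted about two days (the figure commenters below use; the post only mentions Monday and Tuesday) over seven years of 365.25 days each. The function names are illustrative only.

```python
import math


def availability(total_days: float, down_days: float) -> float:
    """Fraction of the period during which the service was up."""
    if total_days <= 0 or not 0 <= down_days <= total_days:
        raise ValueError("need total_days > 0 and 0 <= down_days <= total_days")
    return (total_days - down_days) / total_days


def nines(avail: float) -> int:
    """Number of leading nines in an availability figure (0.999 -> 3)."""
    if not 0 <= avail < 1:
        raise ValueError("availability must be in [0, 1)")
    # Round before flooring so that values such as 0.999 are not
    # pushed down a level by floating-point error.
    return math.floor(round(-math.log10(1 - avail), 9))


def downtime_budget_days(total_days: float, n: int) -> float:
    """Downtime allowed over the period at n nines of availability."""
    return total_days * 10 ** (-n)


if __name__ == "__main__":
    period = 7 * 365.25   # assumed: seven years of service
    outage = 2.0          # assumed: roughly two days down
    a = availability(period, outage)
    print(f"availability: {a:.4%}")
    print(f"nines: {nines(a)}")
    print(f"budget at that level: {downtime_budget_days(period, nines(a)):.2f} days")
```

Under those assumptions a single outage of that length uses up most of the downtime budget for its availability level over the whole period, so adding another nine would mean cutting the tolerated downtime by a factor of ten.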

So, no major changes in the works to "harden" it. The world doesn't stop when it is unavailable (truth be told - I was in class on Monday and Tuesday - not having a large backlog of 'reviews' every night afterwards was sort of nice). We might move it into a data center - whereby the availability of replacement bits and some level of network redundancy would be present - but that is a "maybe".

Pretty darn available...

33 Comments:

Anonymous Anonymous said....

Pretty darn good post!

Wed Jun 20, 08:45:00 AM EDT  

Anonymous Anonymous said....

AskTom at "pretty darned available" still beats the heck out of a consultant at $300/hour who probably doesn't know anywhere near as much as what's in that database.

Here is one developer who is "pretty darned appreciative" of what has obviously been a labour of love for the past few years.

Wed Jun 20, 09:28:00 AM EDT  

Anonymous Anonymous said....

Availability has been great over the years.

But, if it helps convince the corporate folks to provide you more resources...

Tell them that this customer with an oracle enterprise-wide unlimited site license thinks that asktom is a more valuable resource than metalink and otn.

Wed Jun 20, 10:03:00 AM EDT  

Anonymous RobH said....

> Tell them that this customer with an oracle enterprise-wide unlimited site license thinks that asktom is a more valuable resource than metalink and otn.

Couldn't agree more. Until now, I assumed Oracle supported asktom, provided the hardware, etc. They should. Hands down, you are more likely the reason for product growth than the database itself.

Wed Jun 20, 10:05:00 AM EDT  

Blogger Joel Garry said....

> The world doesn't stop when it is unavailable

Well, that's one way to define an SLA. :-)

asktom has demonstrated how an excellent application can grow organically from a small group of experienced people with clear ideas on "how things should work." This is a fundamental of the Agile or RAD development methodologies, when they are done right. Even when they are done right, they can fall into implementation traps, and then we hear things like "we don't need backups, we have vendor-guaranteed high uptime," or "the world doesn't stop when it is unavailable," or "we have budget limitations."

Nothing wrong with all that, until it becomes out of step with reality, having expanded beyond a development environment or a small group. Once it becomes available to the general public, it really needs a reevaluation of the necessary service level. It's no longer skunkworks, it becomes more like a utility, people expect it to just work. Remember when telephones were like that? :-O

Of course, the vast majority of the web, software and infrastructure, is just plain flaky. Should developers and implementors use that as an excuse? I say no.

A large financial institution notified me not long ago that I needed to use their convenient 24 hour online site, or else pay fees for that college savings account, or else give them $10K. Their web site was available, but the system was down. I took the fourth choice and pulled out all the money. Did the world stop when their site didn't work?

asktom is a truly great app, it would be well worth it to add some 9's.

word: ruktojgh

Wed Jun 20, 10:13:00 AM EDT  

Anonymous Rob said....

ummm, 4 years, one major outage. That is an anomaly. asktom has exceeded PDA.

-Rob

Wed Jun 20, 10:22:00 AM EDT  

Anonymous marlenej said....

You imply that AskTom is only useful when you are replying to reviews, but you have solved dozens of problems for me from old threads. The first thing I do when I have an Oracle puzzlement is to search AskTom. I seldom fail to find the key to a solution. I have posted only one question on AskTom and never told you how many times you have saved my skin and made me look brilliant. AskTom is THE MOST IMPORTANT source of Oracle information for me.

Wed Jun 20, 12:17:00 PM EDT  

Anonymous Anonymous said....

Tom,

what class were you taking?

Wed Jun 20, 02:09:00 PM EDT  

Blogger Thomas Kyte said....

> You imply that AskTom is only useful when you are replying to reviews

that was not my intention - what I was saying was "hey, you know what, not only did the world not stop - but having a couple of nights off for me was not horrible :)"


> what class were you taking?

11g...

Wed Jun 20, 02:14:00 PM EDT  

Anonymous Anonymous said....

Tom;
It's indeed very nice to read all your blogs and the questions you have answered on your website.

Could you write something on "How to be a Successful Oracle DBA."
I am a beginner and your words would be a huge morale booster and a guideline to all beginners.
Hope you will take some time amidst your busy schedule and pen a few words.
Thanks

Wed Jun 20, 03:18:00 PM EDT  

Anonymous Anonymous said....

(alias Stew)

Tom, great you enjoyed your two evenings off. For those who wanted to consult old threads, it would have been nice to be able to consult the Google cache. All you need to do is take the question marks out of the URLs...

Wed Jun 20, 04:58:00 PM EDT  

Blogger Thomas Kyte said....

it would be pretty much impossible to take the question marks out of the URLs and - I don't want it in the google cache necessarily.

Wed Jun 20, 05:03:00 PM EDT  

Blogger Alberto Dell'Era said....

> The first thing I do when I have an Oracle puzzlement is to search AskTom.

Same here - and almost always I find the answer, and I know many, many people that do exactly the same.

Would be interesting to measure, financially, the impact on the "Oracle community" of two days of asktom downtime ;)

Wed Jun 20, 06:16:00 PM EDT  

Anonymous JL said....

Tom:

For 3 years now, I have visited asktom at least 11 days a month.
For 3 years now, I haven't spent my time at metalink.
...I have made just 2 follow-ups.
...believe it. Without asktom the world doesn't stop, but with it, I can go home early.
JL

Wed Jun 20, 10:54:00 PM EDT  

Anonymous Anonymous said....

"Low budget."

Since you don't have to pay for Oracle licences :-$

Thu Jun 21, 07:13:00 AM EDT  

Blogger Thomas Kyte said....

> Since you don't have to pay for Oracle licences

well, I've written before that asktom would be able to run on XE easily sizewise...

SE1 for sure.

It would be, could be low budget - all things considered.

Thu Jun 21, 07:29:00 AM EDT  

Anonymous Anonymous said....

I really don't mean to complain, as AskTom has always been a great, highly available, and very approachable resource. I cannot possibly thank Tom (and everyone else who contributes) enough for putting their time/effort/passion into this.

That said, I'm a bit surprised at how Oracle (the organization) regards this tool. The latest version of SQL Developer has a built-in search linking to AskTom, yet Oracle doesn't regard it as important enough to have a back-up internet connection? It just seems like some people in the organization recognize how awesome AskTom is while others simply do not. I guess most places are like that, Oracle included.

Thu Jun 21, 10:36:00 AM EDT  

Anonymous Stew said....

In large organizations, industrial-strength solutions tend to require industrial-strength approval processes, meaning "slow by slow". Once you go that route, there's no turning back.

If it works "pretty darned" well, don't fix it.

Thu Jun 21, 12:11:00 PM EDT  

Blogger Tom said....

> it would be pretty much impossible to take the question marks out of the URLs and - I don't want it in the google cache necessarily.

If APEX would use HTTP POST instead of HTTP GET, it wouldn't be so hard. ;)

Thu Jun 21, 02:04:00 PM EDT  

Blogger Glenn said....

AskTom means SO much to the education of the development and DBA community, you would think Oracle would recognize that and ante up the $10 or so a month to put the machine in a datacenter.

Of course, they are not making any money on it (actually probably losing Oracle University attendees) so I guess it won't happen. Maybe you need to watch out for SABOTAGE!

Thu Jun 21, 06:39:00 PM EDT  

Blogger 3360 said....

I don't know why it is being suggested that Oracle should officially maintain Ask Tom. They do that for the OTN forums and Metalink and they don't achieve the uptime of one outage in four years.

Thu Jun 21, 08:22:00 PM EDT  

Blogger Thomas Kyte said....

Just to clarify...

It is by my choice that the hardware hosting asktom is in Reston, Virginia, in a pretty darn available setting.

I put it there, I've never been 'denied' data center access - in fact quite the opposite.

I've just never had the need to really do so - it is a little more flexible for me personally this way (I can touch the machine if I want to :) )

Fri Jun 22, 09:33:00 AM EDT  

Blogger Jon, Sarah & Harriet said....

The world may not have stopped, but I really missed your posts and AskTom is a massive source of information, inspiration and motivation. I would like you to know how much this DBA appreciates your efforts, which make my home/work balance easier to achieve - thanks Tom (and your family for sharing you so much).

Fri Jun 22, 04:19:00 PM EDT  

Anonymous Anonymous said....

Seems to be down again.

The world doesn't stop turning but it does reflect badly on Oracle as a company.

It is not obvious that the site is not run by Oracle as a company, but is your private project. I would imagine it gets more hits than Oracle.com?

From
http://www.oracle.com/oramag/oracle/01-jul/o41asktom.html

I see ...

"Oracle Managing Technologist Tom Kyte answers your most difficult Oracle technology questions in Oracle Publishing Online's Ask Tom forum, at asktom.oracle.com. Highlights from that forum appear in this column."

Which makes it look like it is run by Oracle Publishing - not yourself.

Sun Jun 24, 02:37:00 AM EDT  

Blogger Thomas Kyte said....

Yes, this weekend (the 23rd/24th) they are upgrading the network on the floors my machine resides on

Sun Jun 24, 11:06:00 AM EDT  

Blogger Gary Myers said....

It may be the weekend at your end. Down here in Oz, it is Monday afternoon now ;)
PS. Don't let them move it to a data centre. Once they have that information, they'll add a licence fee to it.
PPS. Any book progress ?

Mon Jun 25, 12:18:00 AM EDT  

Anonymous Anonymous said....

[quote]
The world doesn't stop turning but it does reflect badly on Oracle as a company.
[/quote]

Well no one pays for it mate, so no need for blamestorm..

I guess 2 days of downtime is justifiable with 7 years of uptime.

And did I mention you don't pay for it??

Mon Jun 25, 01:00:00 AM EDT  

Anonymous Shouvik said....

The world does not stop but it might think you have taken a job at Microsoft and might even look for asktom.microsoft.com.

Mon Jun 25, 06:26:00 AM EDT  

Anonymous Gabe said....

Would a read-only mirror of asktom (say, refreshed every day) situated on a different network be too much of a hassle?

Mon Jun 25, 09:47:00 AM EDT  

Anonymous Anonymous said....

Getting a DNS error still.

Mon Jun 25, 10:13:00 AM EDT  

Anonymous Berny said....

AskTom is back up again and the world continues turning... ;-)

Mon Jun 25, 12:50:00 PM EDT  

Anonymous Anonymous said....

'Pretty darn available' works for me.

Unless someone (Oracle) is going to pay up to make it more than 'pretty darn available', with redundancy etc., don't bother.

Wed Jun 27, 03:08:00 PM EDT  

Blogger 3360 said....

Entire oracle.com is inaccessible at the moment including OTN and forums.

> I don't know why it is being
> suggested that Oracle should
> officially maintain Ask Tom. They
> do that for the OTN forums and
> Metalink and they don't achieve the
> uptime of one outage in four years.

Ask Tom is still up though.

Thu Oct 18, 10:01:00 PM EDT  
