Thursday, June 30, 2005

Instrumentation

Talking about instrumentation of code and introducing two more blogs I've been reading on a regular basis. One of them describes a really nice utopia today.

Code instrumentation, what is it? To me, it is the fine art of making approximately every other line of your developed code “debug” of some sort.  Trace information.  With timestamps.  And meaningful information.

Code instrumentation, why is it? Because poor performance is not always a “database problem”.  It is many times an application issue and when the application is spread over 14 tiers of complexity, tracking down the bottleneck is grievously hard.  If you just whip together an application and throw it out there without any thought to monitoring it over time, be prepared to have poor performance and no clue as to why or where.

Commonly requested information – Tom, can you tell me what my average transaction response time is?  Answer: nope, no clue, not a single clue.  The only thing I can tell you from the database is on average how long individual bits of SQL might have taken.  I don’t know what a transaction is to you and besides, I would tell you only about the database component – no network, no application time, just the database.

And you know what, the end users are the ones that care and they need to have all of the time accounted for.  You know what the only thing is that can really give you good transaction response times over time?  The application.  Why?  Because it is the thing that knows what a transaction is, what a transaction does.

I’ve met many a developer that refuses to put this into their code.  “It’ll make it run slower”, “This is extra stuff I don’t need”, “It takes me longer to write”.  I have yet in 18 years to hear a valid reason why instrumentation should not be done.  I have only heard extremely compelling reasons why it must be done.

End users want to know, is the system going slower over time, if so, by how much.  Management wants to know, what are my transaction response times, how many transactions do we do, when is the busiest time, and so on.  People responsible to administering the system need to be able to identify where bottlenecks are, who needs to be brought in to look at something, who is responsible.

Without code instrumentation, you cannot answer any of those questions – not a single one.  Not accurately anyway.  (Well, maybe you can if you live in utopia!)

To the developers that say “this is extra code that will just make my code run slower” I respond “well fine, we will take away V$ views, there will be no SQL_TRACE, no 10046 level 12 traces, in fact – that entire events subsystem in Oracle, it is gone”.  Would Oracle run faster without this stuff?  Undoubtedly not.  It would run many times slower, perhaps hundreds of times slower.  Why?  Because you would have no clue where to look to find performance related issues.  You would have nothing to go on.  Without this “overhead” (air quotes intentionally used to denote sarcasm there), Oracle would not have a chance of performing as well as it does.  Because you would not have a change to make it perform well.  Because you would not know where even to begin.

So, a plea to all developers, get on the instrumentation bandwagon.  You’ll find your code easier to debug (note how Oracle doesn’t fly a developer to your site to debug the kernel, there is enough instrumentation to do it remotely).  You’ll find your code easier to tune.  You’ll find your code easier to maintain over time.  Also, make this instrumentation part of the production code, don’t leave it out!  Why?  Because, funny thing about production – you are not allowed to drop in “debug” code at the drop of a hat, but you are allowed to update a row in a configuration table, or in a configuration file!  Your trace code, like Oracle’s should always be there, just waiting to be enabled.

POST A COMMENT

42 Comments:

Anonymous Raj Jamadagni said....

Tom,

I like the instrumentation part, I do it in my code, but I haven't (yet) found an instrumentation technique that would work in an enterprise environment. I give second preference to utl_file based approaches since for every line that you output, you have to go through open-write-close cycle. In high frequency output mode, this could be a problem.

Maybe AQ is an option to capture everything and write it out once on process completion, successful or otherwise.

Are there any other examples or techniques about instrumentation?

I however hacked your debug.f (didn't work cleanly for me in the 10g environment related to call stack), referred to Cary's presentation (from Hotsos 2005) about generating standard trace file and have something in alpha version that works in my environment.

But I'd like to hear from others who have used this or similar approach towards code instrumentation.

Cheers
Raj

Thu Jun 30, 10:56:00 AM EDT  

Blogger Thomas Kyte said....

Raj Jamadagni said...
but I haven't (yet) found an instrumentation technique that would work in an enterprise environment.


why don't files work? It is what I've always done, it is what Oracle does?

Thu Jun 30, 11:03:00 AM EDT  

Anonymous Raj Jamadagni said....

Tom,

correct me if i am wrong, but except for alert.log, don't think oracle does open-write-close for trace files. They do open-write-write-...-close at the end of the session (or until asked to stop). This is a slightly different behavior when utl_file is used.

This open/close overhead is what i am talking about in high write environment.

Feel free to correct me.
Raj

Thu Jun 30, 11:09:00 AM EDT  

Blogger Thomas Kyte said....

Raj Jamadagni said...

Tom,

correct me if i am wrong, but except for alert.log,


but why do you need open/write/close for trace

and even if you do, running with trace ON is supposed to have some performance implications (sql_trace=true is slower than sql_trace=false)

but I don't think you need open/write/close. Oracle trace files don't anyway.

Thu Jun 30, 11:13:00 AM EDT  

Anonymous Doug said....

Thanks Tom, this is *such* an important subject. It's incredibly frustrating to regularly face the situation as a DBA where I have to say 'I can tell you what the database is doing, but ...'. Sometimes you can help people, sometimes not and it's soooo frustrating trying to guess what's going on inside a "black box"

I think another crucial part of the equation is the knowledge that such tracing exists. For example, I've come across systems with network issues or application problems where the information is available, but the people I'm depending on don't know how to use it.

I suppose if *everything* was instrumented it would become standard practice to look at the results.

Thu Jun 30, 11:21:00 AM EDT  

Blogger Tim... said....

Raj Jamadagni:
The instrumentation isn't running all the time (unless you want it to). You only use it to diagnose a problem. With this in mind I don't think the performance overhead is that much of a issue.

Think of it like level 12 SQL trace, you'd be mad to switch it on all the time, but you wouldn't want to remove the feature in production.

I played around with DBMS_PROFILER and DBMS_TRACE hoping to find a way to replace my instrumentation. Great tools but not a replacement to instrumentation.

Tom:
Thanks for the link. Got to think of something impressive to write. The pressure is on.

Thu Jun 30, 11:27:00 AM EDT  

Blogger Alberto Dell'Era said....

I always log in tables, using an autonomous transaction to get the message logged even after rollbacks.

Why using files - it's so convenient to use SQL to investigate ... where msg like '%ORA-%', where msg like 'start%my_procedure%', up to group bys, analytics to find elapsed times between start/end of procedures, etc etc

Thu Jun 30, 11:44:00 AM EDT  

Anonymous Verghese said....

Google for log4plsql

Thu Jun 30, 11:54:00 AM EDT  

Blogger Tim... said....

alberto dell'era said....

I always log in tables, using an autonomous transaction to get the message logged even after rollbacks.


Don't mention the "autonomous transaction" thing within earshot of Tom. See the last couple of posts:

http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:1155066278457

;)

Thu Jun 30, 11:58:00 AM EDT  

Blogger Thomas Kyte said....

tim said...

alberto dell'era said...


Tim -- still bitter about the when others :)

Personally, for tracing - I typically prefer files (and will load them if I need to SQL them).

For things like average transaction response times (auditing stuff) -- I'm all about tables (but no need for an autonomous transaction on that, just make a commit be:

insert into ... (the info);
commit;

Thu Jun 30, 12:04:00 PM EDT  

Blogger Tim... said....

Thomas Kyte said...

Tim -- still bitter about the when others :)


Don't start or I'll get all emotional again :)

Thu Jun 30, 12:23:00 PM EDT  

Anonymous Jeff DBA said....

" If you just whip together and application and through it out there without any thought to monitoring it over time " Tom, sometimes you type too fast, and you don't look back :)

Thu Jun 30, 12:34:00 PM EDT  

Blogger Alberto Dell'Era said....

Tim said:
Don't mention the "autonomous transaction" thing within earshot of Tom.


I know, but I know also the rationale behind Tom's adversion to atrans, namely that often inexperienced people abuse of them. "no feature is totally evil", and in the case of logging, I've found no problems using them (perhaps performance, undo & redo, but it is just a matter of measuring whether it is acceptable or not - a function of how much info you log).

Thu Jun 30, 12:34:00 PM EDT  

Blogger Thomas Kyte said....

Jeff DBA said...

he he he, I fixed that as you were typing... It was fixed before you hit submit :)

Thu Jun 30, 12:46:00 PM EDT  

Anonymous Anonymous said....

There is no silver bullet, but instrumentation at least qualifies as a bronze bullet.

Thu Jun 30, 01:22:00 PM EDT  

Blogger Bill S. said....

Thomas Kyte said....

Jeff DBA said...

he he he, I fixed that as you were typing... It was fixed before you hit submit :)

Not quite (just a second ago):
"If you just whip together and application"

:-D

Thu Jun 30, 01:25:00 PM EDT  

Blogger Thomas Kyte said....

Bill S. said...

Uncle, I give. Fixed.....

Thu Jun 30, 01:37:00 PM EDT  

Anonymous Anonymous said....

Personally, for tracing - I typically prefer files (and will load them if I need to SQL them).

For things like average transaction response times (auditing stuff) -- I'm all about tables (but no need for an autonomous transaction on that, just make a commit be:

insert into ... (the info);
commit;

I know this is not asktom, but
I think I am confused. Won't
the commit also commit
the parent transaction? Which
is why I thought autonomous
transasctions were used in
logging?
I agree with files being better
- especially since you can
do real time "tail" to
see additional output instead
of repeating a select.

rmenon (forgot my password)

Thu Jun 30, 01:47:00 PM EDT  

Blogger Thomas Kyte said....

Anonymous said...

....
I know this is not asktom, but
I think I am confused. Won't
the commit also commit
the parent transaction?



What I meant was

Instead of the application just calling COMMIT, I insert the audit record and COMMIT -- meaning, when the parent transaction is ready to be "done", we audit the timing and commit it.

Thu Jun 30, 02:07:00 PM EDT  

Anonymous Scot said....

Safe to assume you'll write a whole chapter about this in volume 2 (if not volume 1) of your new book(s)? I started thinking seriously about this issue when I read your sections in Effective Oracle By Design.

Perhaps some additional examples of how to instrument, showing many different methods (app_info, atrans, no atrans, files, some other way, etc).

Thu Jun 30, 02:43:00 PM EDT  

Blogger Jasper Scholten said....

L.S,

It took me a year to find a good way to tell a development team that I want instrumentation of code and in what way.

To make things measurable, easy to debug and easy fixable as you get all information needed when needed from your own code.

Well, it is certainly worth while.

By the way, nothing wrong with files, nothing wrong with keeping stuff in the database. As long as it is thought over thorougly.

Just my 50 cents...

Cheers,

Jasper

Thu Jun 30, 03:47:00 PM EDT  

Blogger Jeff Hunter said....

Developers get on the bandwagon real quick when you tell them "I know your process took 3 hours to complete, but it only spent 49 seconds on the database".

Thu Jun 30, 04:20:00 PM EDT  

Blogger Tim... said....

Jasper Scholten said....

It took me a year to find a good way to tell a development team that I want instrumentation of code and in what way.


Did you have to learn to say it in a different language or something? It took me one second to think of, "Do it or I'll ..." :)

Cheers

Tim...

Thu Jun 30, 06:53:00 PM EDT  

Anonymous chet said....

I like the idea of instrumentation. It is something I would like to incorporate.

I can't seem to grasp the concept of how to turn it on and off though. Could you give me a hint?

Thanks.

chet

Thu Jun 30, 10:30:00 PM EDT  

Blogger Kalita said....

I think most people instrument their code when they are debugging rather than doind it as part of coding itself. I used to do that myself. Now I am big fan of instrumentation. Write trace message while coding and when something is not working just go thru the trace logs and find out what could have gone wrong. You feel good when you guess something based on the messages, go back to the code and find that your guess was correct :-) And if instrumentation is proper, sometimes you don't have to guess, you know what is wrong.
Tom,
May be you should make instrumentation part of your standard slides like bind variables and question authority..

Fri Jul 01, 01:25:00 AM EDT  

Blogger Thomas Kyte said....

chet said...
... I can't seem to grasp the concept of how to turn it on and off though. Could you give me a hint?


there are three main ways I do it

a) parameters. goto asktom.oracle.com, click on a page. Note the word:

NO

in the url, change it to

YES

and hit enter (there are other parameters as well to turn on sql_trace for each page)

b) database table as a configuration tool. Update a row in a table and all of a sudden trace is on for a module/user/session/whatever. The DEBUG.F package I use lots has this.


c) configuration file, much like a database table but outside the database. I used this with owarepl -- something that was downloaded and used by a lot of people -- and something I had to telnet into a system exactly once to figure out what was going wrong (and that was a compilation issue!)

so, parameters, database configuration settings, configuration files are the main techniques.

and remember, some instrumentation might always be "on" (like v$ tables are always "on'). Every page on my site generates a row at least - so I know what you did, and how long it took to do it. These are my transaction metrics.


Kalita said...

May be you should make instrumentation part of your standard slides like bind variables and question authority..


It is part of it :) In the "tools I use" presentation I have I close up with this and go through the various techniques.

Fri Jul 01, 08:44:00 AM EDT  

Anonymous chet said....

Thanks Tom.

I am using HTMLDB now. I like it a lot. I even got praise for a question I asked the other day on the HTMLDB forum...I had just read "How to ask a Question."

Sergio was happy I followed your guidelines.

Fri Jul 01, 10:22:00 AM EDT  

Blogger Jasper Scholten said....

Tim said: Did you have to learn to say it in a different language or something? It took me one second to think of, "Do it or I'll ..." :)

Cheers

Tim


No, not in an other language. You need to try not be the DBA telling what to do, but keep friends with the developers and be "one of the guys".

Of course it was used before, but it costed me a year to bring it in such a way, that they feel like you are giving them something that does what they already were looking for, for years.

Cheers,

Jasper

Sat Jul 02, 04:12:00 AM EDT  

Blogger Lou said....

Tom,

In your book "expert" in the latter part you talked about "creating a report that you can show to upper management". Can you direct me or show me how to produce such report. Do you have any report like that?


Lou

Thu Jul 07, 01:55:00 PM EDT  

Blogger Thomas Kyte said....

Lou

first you tell me what is important to your business?

Management might want to know uptime of system, average transaction response times by business transaction and how that is trending over the last week, month, six months, year. Number of transactions processed, how many failed.

What would be useful to YOUR management. On asktom, I don't care how many physical IO's I did (well, "I" do, but "I the manager" do not). I care how many page views, avg/min/max response times, number of unique visitors, return visitors, month over month comparisions, day over day (by week - how is this monday compared to last monday and the monday before and so on). when do people come most, least.

That is a short list of what is important to *me*. My manager however just cares "how many people come, wow that's alot"

Thu Jul 07, 02:06:00 PM EDT  

Blogger Lou said....

Tom,

I do agree. But no where near either in your website or any other place. I have seen such report. A genetic report which us(dba) can start developing even with the metrics you just mentioned.

Lou

Thu Jul 07, 02:12:00 PM EDT  

Blogger Thomas Kyte said....

Lou

there is no such standard report, I was throwing out to you what is IMPORTANT TO ME. It isn't what is relevant to you (maybe)

but do this, just list them out. Write down on paper somewhere "this is what is relevant".

The report naturally comes from that. A report is just a report, the contents, what you WANT in there is what is important.

My reports -- they are sqlplus scripts -- seriously. They mean something to me. They would not be relevant to you - nor would they help in understanding what YOU need to report to your management.


Just write down that list of what you need to report, then start finding it. Then start printing it.

Thu Jul 07, 02:24:00 PM EDT  

Blogger Lou said....

Tom,

Understood. You were heard loud and clear!

Thu Jul 07, 02:57:00 PM EDT  

Anonymous Anonymous said....

One other comment on instrumentation from a tester. If you properly instrument code it can assist functional testers in researching potential issues and help lead them towards a more thorough understanding of the system. The understanding usually seems to lead to less frustration in communication between testers and developers.

Concretely, I've seen instrumentation lead to less "by design" and "duplicate" bug reports. Once the tester understands the instrumentation and how to read it, they can isolate the real issue and also attach the logs to the bug for all to review.

Tue Jan 03, 02:23:00 PM EST  

Anonymous Anonymous said....

Hi Tom,

Hate to be the one to ask the newbie-sounding question, but could you give me a short definition of what instrumentation is? It's easy to get an idea from your post but a definition will be useful (for me, at least).

Thanks,

David.

Mon Sep 10, 10:35:00 AM EDT  

Blogger Thomas Kyte said....

instrumentation is the fine art of making every other line of your code some sort of debug/diagnostic output.

like using the "verbose" mode of many products - so you can see what is actually happening under the covers.

sql_trace=true - the trace output, that is a good "for example"

Mon Sep 10, 11:42:00 AM EDT  

Anonymous Anonymous said....

Thank you very much, Tom. I'd come to this entry via a link on another site (Confessions of an IT Hitman). I've only just noticed that it's really old! I'd assumed it was from this past week or something. Thanks for taking the time to reply tho.

I'll try using instrumentation on any new code I write and I might write it in whenever I'm debugging existing stuff. I guess it comes down to that eternal developer battle between "quick" and "complete": your common sense tells you one thing, but your tired mind and fingers tell you something else!

David.

Mon Sep 10, 11:56:00 AM EDT  

Anonymous Anonymous said....

"With timestamps" This is what i like the most in your posting. I believe timestamping your log is very important. But unfortunately some guys feel that is nasty to have lot of files in there log folder. Its not easy to search for the latest log.

Tom & others have you come across such comments if so can you share your experience or some real time mess up that was caused because of this.

Tue Dec 11, 08:55:00 AM EST  

Anonymous Mukesh said....

Hi Tom. Thanks for such a nice article on the importance of code instrumentation.

However, nowhere mentioned a good strategy on how to achieve this. I downloaded debug.f but I'm a bit confused on how to use it effectively. I would appreciate if you throw some light on this.

Thanks.

Thu Nov 20, 12:21:00 AM EST  

Blogger Dave Edwards said....

Responses came to that in turn, from Cary (On the Usefulness of Software Instrumentation) and Tom Kyte (Instrumentation).

Log Buffer #134

Fri Feb 06, 03:52:00 PM EST  

Anonymous Harig31 said....

Tom,

I completely agree with you, I had so much times to debug, tests and justify the execution time of my programs that I created my own instrumentation. I just released it on sourceforge : http://dbuglibraryora.sourceforge.net/
I tried to balance information, speed and easy of use.

Tue Jun 09, 08:56:00 AM EDT  

OpenID raminorujov said....

Nice utopia link is broken, the correct on is
http://www.oracle-base.com/blog/2005/06/30/my-utopian-development-environment/

Wed Mar 03, 03:57:00 PM EST  

POST A COMMENT

<< Home