Thursday, April 15, 2010

to_char(dt,'yyyy') is almost as good as trunc(dt,'y')...

I'm in the middle of updating Expert Oracle Database Architecture Edition 1 to Edition 2. It'll cover up through Oracle Database 11g Release 2. While doing it, I've been discovering that "things change over time". Well, Ok, that is not a new discovery - I've said that before...

If you are interested in the new edition, you can read the unedited (and probably not yet correct) updates by buying the "alpha book" in PDF, which also entitles you to a full PDF of the final copy. See http://apress.com/book/view/1430229462 for details.


Anyway, while updating the book - I was re-running all of my 'test' scripts - the scripts I use to demonstrate points - and of course since things change - the results of some of the scripts have changed as well. Things like private redo strands affected my measurements in the Redo/Undo chapter (rendering some of the old examples useless...). And other small changes pop up and make themselves apparent.

For example, to_char(dt,fmt) has changed, for the better. It is faster...

In the first edition, I wrote:

For example, they will use
Where to_char(date_column,'yyyy') = '2005'
instead of
Where trunc(date_column,'y') = to_date('01-jan-2005','dd-mon-yyyy')
The latter is a far more performant and less resource-intensive approach.

Note: in the following - you would not use literals in the query, you would use bind variables to represent the dates! Just an example...
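For instance, the range query at the end of this post could be parameterized in SQL*Plus like this (the bind variable names are made up for the example):

variable start_date varchar2(20)
variable end_date   varchar2(20)
exec :start_date := '01-jan-2005'
exec :end_date   := '01-jan-2006'

select count(*)
  from t
 where created between to_date(:start_date,'dd-mon-yyyy')
                   and to_date(:end_date,'dd-mon-yyyy')-1/24/60/60;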


While I would still *prefer* you to use trunc(date_column,'y') - it says so much more semantically than the to_char does - it can no longer be for the reason I wrote. In the first edition, I showed - by executing a query against a copy of ALL_OBJECTS - that it was more efficient:

select count(*)
from
t where to_char(created,'yyyy') = '2005'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        4      0.01       0.05          0          0          0          0
Execute      4      0.00       0.00          0          0          0          0
Fetch        8      0.41       0.59          0        372          0          4
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total       16      0.42       0.64          0        372          0          4

select count(*)
from
t where trunc(created,'y') = to_date('01-jan-2005','dd-mon-yyyy')

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        4      0.00       0.00          0          0          0          0
Execute      4      0.00       0.00          0          0          0          0
Fetch        8      0.04       0.16          0        372          0          4
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total       16      0.04       0.16          0        372          0          4

but when I re-ran that in 11g Release 2 - I discovered:


select count(*)
from
t where to_char(created,'yyyy') = '2005'

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        4      0.00       0.00          0          0          0          0
Execute      4      0.00       0.00          0          0          0          0
Fetch        8      0.09       0.10          0        364          0          4
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total       16      0.09       0.10          0        364          0          4

select count(*)
from
t where trunc(created,'y') = to_date('01-jan-2005','dd-mon-yyyy')


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        4      0.00       0.00          0          0          0          0
Execute      4      0.00       0.00          0          0          0          0
Fetch        8      0.07       0.07          0        364          0          4
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total       16      0.07       0.07          0        364          0          4

*note - I used the same number of rows in 11g as in 10g in the above

Well, that isn't so different anymore, is it? to_char has been made a bit more efficient than it used to be - to_char and trunc are now about the same. In fact, that change happened back in 10g Release 2! (Edition 1 of Expert Oracle Database Architecture covered up through 10g Release 1.)

Fortunately though, my ultimate conclusion - that:


select count(*)
from
t where created between to_date('01-jan-2005','dd-mon-yyyy') and
to_date('01-jan-2006', 'dd-mon-yyyy')-1/24/60/60


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        4      0.00       0.00          0          0          0          0
Execute      4      0.00       0.00          0          0          0          0
Fetch        8      0.00       0.00          0        364          0          4
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total       16      0.00       0.00          0        364          0          4


is the right way to code that query still stands - that run is from 11g Release 2 - it is significantly faster than either to_char or trunc. And, it'll be infinitely more "index friendly" in general than either of them.
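To see the "index friendly" part, consider a conventional index on the date column. A minimal sketch (the index names here are made up for illustration):

create index t_created_idx on t(created);

-- the range predicate can use that index directly, via an index range scan:
select count(*)
  from t
 where created between to_date('01-jan-2005','dd-mon-yyyy')
                   and to_date('01-jan-2006','dd-mon-yyyy')-1/24/60/60;

-- the to_char/trunc predicates cannot use t_created_idx at all - each would
-- need its own function-based index, for example:
create index t_created_yyyy_idx on t(to_char(created,'yyyy'));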


So, things change - the tuning advice of yesterday might well be meaningless or - at least less meaningful - today...

Glad I have my test scripts :) It would be much harder to update the book without them - I wouldn't have a big red flag staring me in the face.

That is how today's myths get started - most of the myths surrounding the database have their origins in some nugget of truth from the past. But unless you have some method of seeing when things change - it'll be hard to figure out what's changed.

13 Comments:

Blogger Jeff Hunter said....

So maybe by the end of your research you'll be advocating rebuilding all indexes every week?

eh...maybe not.

Thu Apr 15, 11:57:00 AM EDT  

Blogger Finn Ellebaek Nielsen said....

Very interesting and well documented as always Tom :-).

I prefer to use date and timestamp literals instead of those long TO_DATE expressions. They are compact and not affected by NLS settings.

So instead of

to_date('01-feb-2005','dd-mon-yyyy')

you could write

date'2005-02-01'

and instead of

to_date('01-feb-2005 15:16:17','dd-mon-yyyy hh24:mi:ss')

you could write

timestamp'2005-02-01 15:16:17'

In my view this is much more compact and easy on the eye.

It would be interesting to know whether using DATE or TIMESTAMP literals has an impact on the performance in your examples (I would assume not).

Thu Apr 15, 01:45:00 PM EDT  

Blogger Thomas Kyte said....

@Finn

the reason you should stick with to_date/to_timestamp in general (and NOT literals) is that to_date/to_timestamp permits......


bind variable usage

whereas literals do not. So in real life - there would not be constants in there - they would be bind variables...

I've actually updated the article to reflect that :)

Thu Apr 15, 01:57:00 PM EDT  

Blogger Finn Ellebaek Nielsen said....

Sure and I agree -- I would only use them with constant values.

My bind parameters would normally have the correct data type, so I wouldn't need to convert to a date but could simply bind directly, like in

Where trunc(date_column,'y') = :first_jan_2005

instead of

Where trunc(date_column,'y') = to_date(:first_jan_2005, 'dd-mon-yy')
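In PL/SQL, for example, that might look like this (a sketch - the names are just for illustration; a PL/SQL variable referenced in the query is bound automatically):

declare
   first_jan_2005 date := date '2005-01-01';
   cnt            number;
begin
   select count(*)
     into cnt
     from t
    where trunc(created,'y') = first_jan_2005;
end;
/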

Wouldn't that give the same result and performance if first_jan_2005 is a DATE?

Cheers

Thu Apr 15, 02:05:00 PM EDT  

Blogger Stew Ashton said....

Tom, I bought the alpha book last week. Do you know if we will be notified when chapters are added or revised? If not, could you tell us if more alpha chapters are planned? I am guessing you have been hard at work since your activity on asktom has moderated.

Thu Apr 15, 03:42:00 PM EDT  

Blogger Thomas Kyte said....

@Stew,

I'll ask my editor to comment - I have no idea/control over it :)

Thu Apr 15, 03:44:00 PM EDT  

Blogger Jonathan said....

@Stew.

I'm Tom's editor. Yes, we do plan to post more alpha chapters. My practice is to post each chapter after making my edit pass. Look for another chapter sometime next week.

And we do notify customers of each new chapter. You'll get an email each time I post one.

Thu Apr 15, 09:17:00 PM EDT  

Blogger Stew Ashton said....

Thank you, Tom and Jonathan!

Fri Apr 16, 06:23:00 AM EDT  

Blogger Sidhu said....

Hi Tom,

Great to know about the 2nd edition.

BTW any news about the second part of the book? :)

Thanks !

Wed Apr 21, 01:30:00 AM EDT  

Blogger Kevin said....

Pedantry, to be sure, but this:

select count(*)
  from t
 where created between to_date('01-jan-2005','dd-mon-yyyy')
                   and to_date('01-jan-2006','dd-mon-yyyy')-1/24/60/60

reeks of "special knowledge of Oracle internals", specifically, that Oracle date fields are stored to the nearest second.

The following seems a touch more elegant:

select count(*)
  from t
 where created >= to_date('01-jan-2005','dd-mon-yyyy')
   and created <  to_date('01-jan-2006','dd-mon-yyyy')

Wed Apr 21, 12:52:00 PM EDT  

Blogger Thomas Kyte said....

@Kevin,

I'm not sure that knowing that dates are down to the second would be internal knowledge.

Rather I would have to assume that not only is it common knowledge - it is a fundamental mandatory piece of knowledge.

If someone is using the ANSI DATE datatype, and they do not know it is only down to the second - they are not able to use the database safely.


some people like the between, some people like the >=, <

some people really dislike the >=, < and some people really dislike the between.

I use both...

Wed Apr 21, 12:57:00 PM EDT  

Anonymous Anonymous said....

Tom, you probably know a lot more about the ANSI SQL standard than I do, but the Oracle date datatype is not an ANSI date, so using an Oracle date is not using an ANSI date. AFAIK, an ANSI DATE is required to have one day resolution, not one second. Subtracting a number from an ANSI date should subtract that number of days with any fractional days ignored. ANSI dates are allowed to include extra information including time, but as I understand it they are expressly prohibited from using any extra info including time in any comparisons. This would make the -1/24/60/60 idiom not compliant with ANSI requirements.

It is the ANSI TIMESTAMP that has one second or better resolution, with the number of decimal places for seconds implementation defined. Since ANSI timestamps can have fractional second resolution, I don't think your characterization of the ANSI date type is accurate, even allowing for possible confusion of terminology. (The ANSI standard is a nightmare of confusing terminology.) Subtracting a number from a timestamp subtracts that number of days, applying any fractional days as seconds, but since the number of decimal digits for the seconds is implementation defined, subtracting 1/24/60/60 would be wrong if the timestamp might be for example

TIMESTAMP '2005-12-31 23:59:59.99993'.
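A quick sketch of that pitfall (the literal values are purely illustrative - in Oracle, subtracting a number from a timestamp implicitly converts it to a DATE, dropping the fractional seconds):

select case
         when timestamp '2005-12-31 23:59:59.5'
              between timestamp '2005-01-01 00:00:00'
                  and timestamp '2006-01-01 00:00:00' - 1/24/60/60
         then 'included'
         else 'missed'
       end as result
  from dual;
-- returns 'missed': the subtraction yields a DATE of 23:59:59 with no
-- fractional part, so a value of 23:59:59.5 falls outside the BETWEEN range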

Although an ANSI TIMESTAMP has implementation-defined fractional second precision, it must allow at least 6 decimal digits. The Oracle date data type is like the ANSI TIMESTAMP(0). (This should work in Oracle too.)

The ultimate problem is really that ANSI chose poor semantics for BETWEEN. Intervals should by default be half open, including the lower bound but not the upper. This makes most programming much more natural. But this choice was made long ago and we have to live with it.

Tue Apr 27, 03:37:00 AM EDT  

Blogger Thomas Kyte said....

@Anonymous

ok, let me put it this way:

knowing that the Oracle DATE datatype has a granularity of 1 second is not internal information - it is a prerequisite to using that type.


I'll still say - it is not a trick, not magic, not anything that someone using Oracle would not know (if they do not know, they need to learn it - else they cannot use that datatype safely at all)

Tue Apr 27, 06:54:00 AM EDT  
