Tuesday, September 15, 2009

Every day I learn something new...

Two things

One, here is something new I learned today. I wasn't aware of that; it is a nice "stealth" feature :)

The other thing - I have a podcast on 11g Release 2 "things" posted on oracle.com if you are interested....

29 Comments:

Blogger Surachart said....

Thank you for your podcast.

It's good if we look for & learn something new every day.

;)

Tue Sep 15, 10:57:00 AM EDT  

Anonymous Darryl said....

How granular is the detection? Is it at the table level or at the actual result set level?

Either way I hope there is a way to override it. Take the following situation for example:

1) Function A caches "certain values" from table B
2) Function A is set to rely on table C
3) Whenever the "certain values" in table B change, a trigger does DML on table C to invalidate the cached results of function A.

It saves a fair amount of CPU and elapsed time doing it this way.

What is the alternative in 11.2?

Tue Sep 15, 11:07:00 AM EDT  

Blogger Thomas Kyte said....

@Darryl

result set caching - all three types:

a) server results cache
b) function results cache
c) client side cache

are all at the table level.
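
To make "table level" concrete, here is a minimal sketch (table and function names are invented for illustration; in 11.2 the dependency on the table is tracked automatically, with no RELIES_ON clause needed):

create table t ( x number primary key, y number );

create or replace function f( p_x number ) return number
RESULT_CACHE
as
    l_y t.y%type;
begin
    select y into l_y from t where x = p_x;
    return l_y;
end;
/

-- ANY committed DML against T invalidates ALL cached results for F,
-- even for rows that no cached call ever read:
update t set y = y where x = 42;
commit;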


I don't see the real-world use case you stated - can you make it more "real" for us? It sounds very sketchy as you stated it - overly complex and fraught with potential "I didn't know it worked that way" issues down the road (eg: you are trying to be too tricky)

Tue Sep 15, 11:13:00 AM EDT  

Anonymous Darryl said....

One option I was looking at was for column values that get queried a lot but rarely change while
other columns in the table change frequently. Prior to 11g I would just hold these in PL/SQL package
memory but that only makes them available within a session. The result cache makes them available
to all sessions.

Consider this simple table:
SQL> desc abc
 Name                  Null?    Type
 --------------------- -------- ------------
 COL1                  NOT NULL NUMBER
 COL2                           VARCHAR2(30)
 COL_THAT_CHANGES_ALOT          NUMBER

And this simple function:
CREATE OR REPLACE FUNCTION get_col2 ( p_col1 NUMBER )
RETURN VARCHAR2
RESULT_CACHE RELIES_ON (abc)
AS
  CURSOR get_col2 ( cp_col1 NUMBER ) IS
    SELECT col2
      FROM abc
     WHERE col1 = cp_col1;

  v_ret_val VARCHAR2(30);
BEGIN
  OPEN get_col2(p_col1);
  FETCH get_col2 INTO v_ret_val;
  CLOSE get_col2;

  RETURN (v_ret_val);
END;
/


And this simple query:
SELECT col2
FROM abc
WHERE col1 = 33;

With this plan (find index entry and go to table for value):
-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |     1 |    34 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ABC         |     1 |    34 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | SYS_C006005 |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("COL1"=33)

Statistics
----------------------------------------------------------
3 consistent gets
362 bytes sent via SQL*Net to client
409 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 rows processed


And now this query, which uses the cache:
SELECT get_col2(col1)
FROM abc
WHERE col1 = 33;

With this plan (find index entry and go to cache for value):
---------------------------------------------------------------------------------
| Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |             |     1 |     4 |     1   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| SYS_C006005 |     1 |     4 |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("COL1"=33)

Statistics
----------------------------------------------------------
2 consistent gets
470 bytes sent via SQL*Net to client
420 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
1 rows processed

The cache version is 1/3 faster and handles 70% fewer bytes. This makes my application
faster. But ABC.COL_THAT_CHANGES_ALOT changes a lot, which makes my application slower
because the result cache is repeatedly invalidated.

To avoid the invalidations I do the following:
1) create a table named BOB with 1 record in it
2) put a trigger on ABC.COL2 to update BOB
3) make the function rely on BOB

This makes my application faster.
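
In code, the workaround looks roughly like this (a sketch only; the trigger name is made up, and this uses the 11.1 RELIES_ON syntax):

CREATE TABLE bob ( dummy NUMBER );
INSERT INTO bob VALUES ( 0 );

-- touch BOB only when COL2 (the cached column) actually changes
CREATE OR REPLACE TRIGGER abc_col2_trg
AFTER INSERT OR DELETE OR UPDATE OF col2 ON abc
BEGIN
  UPDATE bob SET dummy = dummy + 1;
END;
/

-- ...and GET_COL2 is declared RESULT_CACHE RELIES_ON (bob) instead of
-- RELIES_ON (abc), so updates to COL_THAT_CHANGES_ALOT no longer
-- invalidate the cached results.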

Tue Sep 15, 01:00:00 PM EDT  

Blogger Thomas Kyte said....

@darryl

vertically partition your data would be my answer then.


this code, by the way, has a huge bug:

OPEN get_col2(p_col1);
FETCH get_col2 INTO v_ret_val;
CLOSE get_col2;


you do not

a) check to make sure you got a value
b) check to make sure there is not another value to get


you should use select INTO - select into is not only faster and more efficient than your open/fetch/close - it is infinitely safer too.
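
For the record, a sketch of the SELECT INTO version (the 11.2 form, without the RELIES_ON clause) - shorter and self-checking:

CREATE OR REPLACE FUNCTION get_col2 ( p_col1 NUMBER )
RETURN VARCHAR2
RESULT_CACHE
AS
  v_ret_val abc.col2%TYPE;
BEGIN
  SELECT col2
    INTO v_ret_val        -- raises NO_DATA_FOUND if no row exists
    FROM abc              -- and TOO_MANY_ROWS if more than one does
   WHERE col1 = p_col1;

  RETURN v_ret_val;
END;
/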



your example is a bit whacked - I don't know why you have SQL in there at all - if you have get_col2, you would NEVER select it like this:

..
SELECT get_col2(col1)
FROM abc
WHERE col1 = 33;
.....


never, never never. That would be beyond confusing - and it would consume a ton more resources than just

x := get_col2(col1);

does.

... The cache version is 1/3 faster ...

no, it isn't. It has the overhead of calling plsql from SQL (even if the function is cached). Runtime wise it probably takes LONGER than just letting sql do sql.

... This makes my application faster. ...

I think you might have tricked yourself into thinking that.

I do not believe it to be true.

To remove a single logical IO this way is not an appropriate use of

a) this feature
b) our time

Given that you introduce the overhead of switching from SQL to PLSQL (even when cached - this is a big hit)

yup, I thought so....


ops$tkyte%ORA11GR1> exec runStats_pkg.rs_start;

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1> declare
2 l_col2 abc.col2%type;
3 begin
4 for i in 1 .. 5000
5 loop
6 SELECT col2 into l_col2 FROM abc WHERE col1 = 33;
7 end loop;
8 end;
9 /

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1> exec runStats_pkg.rs_middle;

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1> declare
2 l_col2 abc.col2%type;
3 begin
4 for i in 1 .. 5000
5 loop
6 SELECT get_col2(col1) into l_col2 FROM abc WHERE col1 = 33;
7 end loop;
8 end;
9 /

PL/SQL procedure successfully completed.

ops$tkyte%ORA11GR1> exec runStats_pkg.rs_stop(1000);
Run1 ran in 21 cpu hsecs
Run2 ran in 44 cpu hsecs
run 1 ran in 47.73% of the time

Name                                  Run1        Run2        Diff
LATCH.cache buffers chains          10,170       5,182      -4,988
STAT...session logical reads        10,052       5,050      -5,002
STAT...table fetch by rowid          5,002           0      -5,002
STAT...rows fetched via callba       5,002           0      -5,002
STAT...consistent gets              10,022       5,017      -5,005
STAT...consistent gets from ca      10,022       5,017      -5,005
STAT...consistent gets - exami      10,014       5,008      -5,006
LATCH.Result Cache: Latch                0      10,000      10,000
STAT...buffer is not pinned co      15,004       5,000     -10,004
STAT...session pga memory         -196,608     196,608     393,216

Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
15,459      20,493      5,034     75.44%

PL/SQL procedure successfully completed.

Tue Sep 15, 01:16:00 PM EDT  

Blogger Thomas Kyte said....

@Darryl

getter/setter functions called from SQL like that - I despise them. They are performance NIGHTMARES.

And as shown above, huge performance *HITS* even when cached.

Tue Sep 15, 01:17:00 PM EDT  

Blogger Tim... said....

Hi.

It's these "stealth" features, as you put it, that I really like about new releases. It proves people are actually using the technology and thinking of ways to make life easier, rather than looking for the next big sales pitch.

Cheers

Tim...

Tue Sep 15, 01:38:00 PM EDT  

Anonymous Anonymous said....

Tom,

Totally unrelated to your blog topic. Can you please perform CPR on asktom.oracle.com? Looks like it fainted!

Thanks,

Tue Sep 15, 01:53:00 PM EDT  

Blogger Thomas Kyte said....

@anonymous

Can you be a tad more "precise"? I've been using asktom all morning.

What are the symptoms? Before a medical procedure is performed, we should have a diagnosis first.

Tue Sep 15, 02:00:00 PM EDT  

Anonymous Darryl said....

I left error handling and the like out of the code for the sake of brevity.

You make a valid point about not using SQL when a function call will suffice.

So I'll compare the performance of the following blocks:

DECLARE
x VARCHAR2(30);
BEGIN
SELECT col2
INTO x
FROM abc
WHERE col1 = 33;
END;

DECLARE
x VARCHAR2(30);
BEGIN
x := get_col2(33);
END;

SQL*Trace shows 0.03 cpu and elapsed time for the straight query and 0.00 for the
cached version (after the first execution of course).

Tue Sep 15, 02:05:00 PM EDT  

Blogger Thomas Kyte said....

@Darryl

vertically partition if you think this is a "good idea"(tm)

I think this is a "bad idea"(tm) - just let sql do the work of sql.

Use the server results cache if you want - that way you don't write hundreds of getter functions, just use SQL where SQL is the right approach (and sql is the right approach for getting data out of the database)
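
That is, instead of a getter function per column, the same plain SQL with a hint does it (a minimal sketch of the hint usage):

SELECT /*+ RESULT_CACHE */ col2
  FROM abc
 WHERE col1 = 33;

The result set is cached in the SGA, shared by every session, and invalidated at the same table level - with no PL/SQL layer to call through.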


And do not leave error handling out of examples!

But also - it looks like you do not use select into (which would have made the code much more brief and concise) - you should. Open/fetch/close is an extremely bad practice.

Tue Sep 15, 02:10:00 PM EDT  

Anonymous Darryl said....

Partitioning means extra money so it's a non-starter.

I'll submit an "un-enhancement"(tm) request for the relies_on clause to see how it goes.

We'll just have to agree to disagree on everything else.

Tue Sep 15, 06:58:00 PM EDT  

Anonymous Rajesh said....

Tom,

The podcast was great, especially the part where you listed the new features from various releases.

I am looking forward to reading the new features guide and experimenting with the new features for developers.

Thanks,
Rajesh.

Tue Sep 15, 09:35:00 PM EDT  

Blogger Thomas Kyte said....

@Darryl

Partitioning means extra money so it's a non-starter.

I said - each time - VERTICALLY PARTITION. Eg: two tables - actively_updated_table + not_actively_updated_table

Use a view to join them back together. Use b*tree clusters or hash clusters to physically colocate the data (store data from both tables in a single block, so that when you do the IO to bring one into the cache, the other is already there). Use an INSTEAD OF trigger on the view if you want, to make it appear to the application that there is one table.

In short - vertically partition - which is "free"
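
A minimal sketch of that layout, reusing Darryl's column names (the table, view and trigger names are invented, and the cluster variant is omitted for brevity):

CREATE TABLE abc_static
( col1 NUMBER PRIMARY KEY,
  col2 VARCHAR2(30)
);

CREATE TABLE abc_volatile
( col1 NUMBER PRIMARY KEY REFERENCES abc_static,
  col_that_changes_alot NUMBER
);

-- the view makes the pair look like the original single table
CREATE OR REPLACE VIEW abc_v AS
SELECT s.col1, s.col2, v.col_that_changes_alot
  FROM abc_static s, abc_volatile v
 WHERE s.col1 = v.col1;

-- an INSTEAD OF trigger keeps the application's single-table DML working
CREATE OR REPLACE TRIGGER abc_v_ioi
INSTEAD OF INSERT ON abc_v
FOR EACH ROW
BEGIN
  INSERT INTO abc_static ( col1, col2 )
  VALUES ( :new.col1, :new.col2 );

  INSERT INTO abc_volatile ( col1, col_that_changes_alot )
  VALUES ( :new.col1, :new.col_that_changes_alot );
END;
/

A result-cached query (or function) against ABC_STATIC is then invalidated only when the rarely-changing columns actually change.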


We'll just have to agree to disagree on everything else.

If you are talking about using SELECT INTO (or not, in your case), you would be wrong - there is no chance for disagreement.

open/fetch/close is

a) buggier (you don't need error handling code; you didn't leave out error handling code, you left out at least 50% of the necessary, minimum code to do it right).

b) less maintainable

c) less clear

d) less performant



If you meant "getter setter" functions they are again

a) less performant
b) less maintainable
c) a NIGHTMARE

using SQL is the correct approach. Using SQL with result caching is infinitely superior to writing dozens or hundreds of getter functions.

Just use SQL, it isn't so scary. You are trying to be oh so tricky, but I truly believe you have negatively impacted your performance in every way, and you have made your code much more error-prone.

Wed Sep 16, 06:28:00 AM EDT  

Blogger Brian Tkatch said....

@Tom

I think some people prefer CURSORs because they feel more in control. If someone does not fully understand set manipulation, he will fear it. This happens to many procedural coders who jump to SQL, thinking it's just another language, and not understanding the database paradigm.

I tend to re-write everything as "one big query", using the mantra "let the database do its job", which you quoted earlier.

Many people simply don't believe that large PROCEDUREs that take minutes to complete are slow because of CURSORs, temp TABLEs, and parameters, and can be rewritten as a VIEW to run with a WHERE clause in under a second. On my last project I did similar things a few times.

I think Darryl needs to test your suggestion until he becomes comfortable with it. Then he will realize it is the obvious solution.

Wed Sep 16, 08:29:00 AM EDT  

OpenID carpenteri said....

Hi Tom,

Where you say

"If you meant "getter setter" functions they are again

a) less performant
b) less maintainable
c) a NIGHTMARE"

Not having had enough coffee yet - can you give a brief example of a "getter setter" function being used in SQL?

Regards

Ian

Wed Sep 16, 09:28:00 AM EDT  

Blogger Thomas Kyte said....

@Ian

Darryl did in his example


select get_something(primary_key)
from table
where primary_key = ?


I've seen code like this:

select get_name(pk),
get_address(pk),
get_phone(pk),
get_this(pk),
get_that(pk),
get_the_other_thing(pk)
from table
where some_condition;


And guess what get_XXXX does?

select XXXX into l_return
from table
where pk = p_pk;

they hit the table over and over and over again.


It is procedural coding, "object oriented coding" I've even heard it called - taken to an illogical extreme


There is nothing wrong with SQL, use it.

Use procedural code only when you CANNOT use SQL.
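
To be concrete, the set-based rewrite of the getter example above is a single query (the table names here are hypothetical stand-ins):

-- if the getters read the same table, just select the columns:
SELECT name, address, phone, this, that, the_other_thing
  FROM the_table
 WHERE some_condition;

-- if they read a lookup table, join it once instead of once per column:
SELECT t.pk, l.name, l.address, l.phone
  FROM the_table t, lookup_table l
 WHERE l.pk = t.pk
   AND some_condition;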

Wed Sep 16, 09:35:00 AM EDT  

OpenID carpenteri said....

Blimey!

Thanks for the clarification

Regards

Ian

Wed Sep 16, 09:58:00 AM EDT  

Anonymous Darryl said....

OK, I have done some testing with a slightly more real-world situation. I've added a new table
that joins to the existing one that the function queries. I guess this makes it more of
a lookup.

Here is the first piece of test code that does not use the function.

DECLARE
  v_col2 abc.col2%TYPE;
BEGIN
  FOR counter IN 1..1000 LOOP
    SELECT col2
      INTO v_col2
      FROM def, abc
     WHERE def.pk_col = counter
       AND abc.col1 = def.col1;
  END LOOP;
END;
/

And here is the second that does use the function as well as its underlying result cache.

DECLARE
  v_col2 abc.col2%TYPE;
BEGIN
  FOR counter IN 1..1000 LOOP
    SELECT get_col2(col1)
      INTO v_col2
      FROM def
     WHERE def.pk_col = counter;
  END LOOP;
END;
/

I ran each one several times and:

60% of the time they executed in an equal amount of time.
30% of the time the non-lookup(cached) one was 25% faster
10% of the time the cached one was 15% faster

The results were very similar when I replaced the cursor in the function with a straight SELECT.

So this test in this environment shows that the PLSQL Result Cache is not quite enough to offset
the use of PLSQL and SQL instead of just SQL. I plan to try more complex queries to see if this
holds true.

~~~~~~~~~~~~~~~~~~

I traced one run of each and got the following:

The non-lookup version showed this:
call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.03          0          0          0          0
Execute      1      0.15       0.15          0          0          0          1
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        2      0.15       0.18          0          0          0          1

The lookup/cached version showed this:
call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute      1      0.07       0.06          0          0          0          1
Fetch        0      0.00       0.00          0          0          0          0
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total        2      0.07       0.06          0          0          0          1

While the underlying SQL showed this:

Non-lookup version:
SELECT COL2
FROM
DEF, ABC WHERE DEF.PK_COL = :B1 AND ABC.COL1 = DEF.COL1


call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute   1000      0.00       0.00          0          0          0          0
Fetch     1000      0.00       0.00          0       6000          0       1000
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     2001      0.00       0.00          0       6000          0       1000

Lookup/cached version:
SELECT GET_COL2(COL1)
FROM
DEF WHERE DEF.PK_COL = :B1


call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute   1000      0.00       0.00          0          0          0          0
Fetch     1000      0.07       0.09          0       3000          0       1000
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     2001      0.07       0.09          0       3000          0       1000

Each table has 1,000 records in it, so I'm not sure why they fetched 6,000 and 3,000...

Wed Sep 16, 02:25:00 PM EDT  

Anonymous Pierre said....

Hi Tom,

I've listened to your podcast: you recommend that everybody re-read the 11gR2 Concepts Guide
(of which you are now one of the primary authors) because it has changed a lot. Is there any other 11gR2 doc that has changed a lot with respect to 10gR2 (except the New Features Guide)?

Thanks

Fri Sep 18, 04:04:00 AM EDT  

Anonymous Log Buffer said....

"To which Tom Kyte responded, every day I learn something new, which also links to Tom’s podcast on 11g Release 2 'things'. [...]"

Log Buffer #162

Fri Sep 18, 03:22:00 PM EDT  

Anonymous Anonymous said....

Not exactly related, but still:
Tom, what's your take on column-based relational databases?
In particular, the advantages of those over Oracle DB. Another (sub)question - Sybase IQ vs OBIEE.
thank you.
andrew

Wed Oct 07, 09:49:00 AM EDT  

Blogger Thomas Kyte said....

@andrew

see this...

Wed Oct 07, 10:00:00 AM EDT  

Anonymous Anonymous said....

@Tom
Thank you, Tom. I see that there is an "exadata" - reading while posting %).
I wonder, is there really such a big difference in how the data is stored? OSes read "big" amounts of data in a single IO, and drives have various "read-ahead" algorithms...
I'm not quite sure that there is SUCH a big gain in HOW the data is stored in the DB... Mind shedding more light/experience?
Andrew

Wed Oct 07, 10:18:00 AM EDT  

Blogger Thomas Kyte said....

@andrew

with a column data store, think of it this way.

normally we store rows, on a block we would have

(c1, c2, c3), (c1, c2, c3) .....


With columnar storage - we store all of the c1 values AND THEN the c2 values AND THEN the c3 values and then have a row lookup that lets us reference these column values (like pointers)

if that were all, it would not compress anything - but....

when we build the C1 array of data - we deduplicate it (pretend we are loading an audit trail, lots of IP addresses would be repeated in C1).

and after de-dupping it, we compress it (lots of repetition again in IP addresses - the leading edges might all be the same, in fact).

so, we store hundreds of thousands of IP addresses in a few bytes.

And then we do that to column c2, and c3. Then we point to this data.

That allows us to spend (a lot of) CPU time during the load to rearrange the data, de-dup it, compress it and store it - so at retrieval time we might have 1/50th of the data to scan through. Decompressing is FAST; compressing is very slow (that is why we'll offload lots of work to Exadata).
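
A toy PL/SQL illustration of the de-dup-then-point idea (emphatically not how the storage layer is really implemented - it only shows the shape of the data structure):

DECLARE
  TYPE ip_tab  IS TABLE OF VARCHAR2(15);
  TYPE ptr_tab IS TABLE OF PLS_INTEGER;

  l_col  ip_tab  := ip_tab( '10.0.0.1','10.0.0.1','10.0.0.2',
                            '10.0.0.1','10.0.0.2','10.0.0.1' );
  l_dict ip_tab  := ip_tab();   -- de-duplicated distinct values
  l_ptrs ptr_tab := ptr_tab();  -- one small pointer per row
  l_hit  PLS_INTEGER;
BEGIN
  FOR i IN 1 .. l_col.COUNT LOOP
    l_hit := NULL;
    FOR j IN 1 .. l_dict.COUNT LOOP      -- already in the dictionary?
      IF l_dict(j) = l_col(i) THEN l_hit := j; END IF;
    END LOOP;

    IF l_hit IS NULL THEN                -- no: add the distinct value once
      l_dict.EXTEND;
      l_dict(l_dict.COUNT) := l_col(i);
      l_hit := l_dict.COUNT;
    END IF;

    l_ptrs.EXTEND;                       -- the row keeps only a pointer
    l_ptrs(l_ptrs.COUNT) := l_hit;
  END LOOP;

  dbms_output.put_line( l_col.COUNT || ' column values stored as ' ||
                        l_dict.COUNT || ' distinct values plus pointers' );
END;
/

Compressing each de-dupped array is then the additional step done on top of this.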

Wed Oct 07, 10:36:00 AM EDT  

Anonymous Anonymous said....

@Tom
Thank you, Tom. It makes more sense to me now.
Do you have any stats comparing Sybase IQ vs Oracle OBIEE?
Andrew

Wed Oct 07, 12:13:00 PM EDT  

Blogger santu said....

Dear Tom,
This is Santosh from Bangalore, India. I am really new to this wonderful domain.
Can you please help me with tuning PL/SQL code?
I mean, how should PL/SQL code be tuned so that it improves performance? Can you suggest the complete steps for PL/SQL tuning?
Thanks in advance...

Mon Jan 25, 09:27:00 AM EST  

Blogger santu said....

Dear Tom,
Can you please suggest a book or material in which I will find the tracing and tuning concepts of PL/SQL covered elaborately, or give some links to your blog where this was previously discussed?
I have your book Expert Oracle, but it could not give me the proper idea.
Please help me in this regard...

Tue Jan 26, 10:02:00 AM EST  
