Wednesday, September 19, 2007

Tuning with sql_trace=true...

My all time favorite seminar topic is my talk on "All about Binds".  It takes about 3 hours to do the entire talk from start to finish and covers performance, memory utilization, scalability, security, bind mismatches, bind variable peeking and cursor sharing.

Today, I got a question on asktom that I answer - no, I actually demonstrate the answer to - in that presentation.  During the presentation - I say

"I'm going to show you how to tune by setting sql_trace=true.  You are probably all thinking that I'm going to run a query, observe it running slow, trace it and then tune it.  Well, I'm not going to do that - all I'm going to do is set sql_trace=true and we'll observe the query performance and resource utilization to be dramatically affected - all for the better!"

The question on asktom is identical to what I demonstrate - they wrote:

"I have a query that runs slow (2 minutes) So what do I do? Turn on tracing (both sql_trace=true and event 10046) before running the query. But consistently, 100% of the time, without changing anything else, when I enable tracing, the query comes back instantly!!"

I know exactly what is happening...  It is a combination of two things

  • The fact that when you set sql_trace=true, you set up a new 'parse' environment (made up term).  You have changed your session in such a way that you will not share any existing SQL that was not parsed with sql_trace enabled.  So, it is highly likely that you will either hard parse a new version of the query - or use some existing child cursor that is different from the one you would use with sql_trace disabled.
  • Bind variable peeking happens at hard parse time and may affect the plan chosen by the optimizer.

So, first for the setup script to demonstrate this phenomena - I'll set up a very skewed set of data for emphasis:

ops$tkyte%ORA10GR2> create table t
2 as
3 select case when rownum = 1 then 1 else 99 end id, a.*
4 from all_objects a
5 /
Table created.

ops$tkyte%ORA10GR2> alter table t modify object_id null;
Table altered.

ops$tkyte%ORA10GR2> create index t_idx on t(id);
Index created.

ops$tkyte%ORA10GR2> begin
2 dbms_stats.gather_table_stats
3 ( user, 'T',
4 method_opt=> 'for all indexed columns size 254' );
5 end;
6 /
PL/SQL procedure successfully completed.


So, it is quite clear that WHERE ID=1 will return one record and WHERE ID=99 will return all of the rest (about 50,000 records). Also, the optimizer is very aware of this fact due to the histograms in place and if we parse using literals - we can definitely see different plans for different inputs:


ops$tkyte%ORA10GR2> set autotrace traceonly explain
ops$tkyte%ORA10GR2> select * from t where id = 1;

Execution Plan
----------------------------------------------------------
Plan hash value: 470836197

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 96 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 96 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_IDX | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=1)

ops$tkyte%ORA10GR2> select * from t where id = 99;

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 50244 | 4710K| 235 (4)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| T | 50244 | 4710K| 235 (4)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ID"=99)

ops$tkyte%ORA10GR2> set autotrace off


But, what happens when we issue "select * from t where id = :id" - when we bind the ID input?  Well, on the first hard parse - Oracle will wait till the cursor is opened to do the actual optimization of the query - it will wait for the bind variable value to be supplied by the application before figuring out the right way to optimize the query.  That is called "bind variable peeking", when the optimizer looks at the bind values and then optimizes the query.  In this case however, depending on what inputs are used to first run the query - we'll either get a full scan or an index range scan plus table access by index rowid - and that is the plan that will be used to execute the query "select * from t where id = :id" regardless of the bind values from now on.


What is happening to the poster on asktom is that



  1. Someone hard parsed the query in question and when they did so, the inputs they used resulted in "plan A"
  2. Plan A was the best plan for that person given their inputs - it was the most efficient plan
  3. Later, someone else executed the same query, using shared sql they reused the plan generated by the other person - but their bind variable inputs where different and the plan that was generated for #1 was not the best plan for these inputs, in fact the plan was miserable for these inputs
  4. They turned on sql_trace and executed the query with inputs from #3 again - but because sql-trace was on - they did not share the sql statement - they hard parsed - with their inputs - and this resulted in a very different plan

We can observe this easily, consider:


ops$tkyte%ORA10GR2> variable id number
ops$tkyte%ORA10GR2> set autotrace traceonly statistics
ops$tkyte%ORA10GR2> exec :id := 99
PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select * from t where id = :id;
50254 rows selected.

Statistics
----------------------------------------------------------
....
4031 consistent gets
....
50254 rows processed

 


So, we started off with id=99 as the bind, and a full scan was chosen (you can prove that via TKPROF by looking at the row source operation if you wish; I did...).  That means regardless of the bind value from now on - we will full scan - for example:


ops$tkyte%ORA10GR2> exec :id := 1
PL/SQL procedure successfully completed.

ops$tkyte%ORA10GR2> select * from t where id = :id;

Statistics
----------------------------------------------------------
...
720 consistent gets
...
1 rows processed
It is unlikely that was the index range scan/table access by index rowid - we'd expect a lot less logical IO, 3 or 4 IO's against an index and 1 against the table - we did a full scan for id=1.  That represents our "poorly performing query".  We turn on SQL_TRACE to find the performance characteristics of that and observe:
ops$tkyte%ORA10GR2> alter session set sql_trace=true;
Session altered.

ops$tkyte%ORA10GR2> select * from t where id = :id;

Statistics
----------------------------------------------------------
...
4 consistent gets
...
1 rows processed

 


That apparently just setting SQL_TRACE=TRUE is our tuning mechanism!  It isn't really - it is bind peeking and hard parsing that is causing this, but unless you know that a) sql_trace sets up a child cursor and b) bind peeking happens - this looks very mysterious, especially if you turn it off:

ops$tkyte%ORA10GR2> alter session set sql_trace=false;
Session altered.

ops$tkyte%ORA10GR2> select * from t where id = :id;

Statistics
----------------------------------------------------------
...
720 consistent gets
...
1 rows processed

 


See it run slow and turn it on again:

ops$tkyte%ORA10GR2> alter session set sql_trace=true;
Session altered.

ops$tkyte%ORA10GR2> select * from t where id = :id;

Statistics
----------------------------------------------------------
...
4 consistent gets
...
1 rows processed


And see it go fast...


It is not magic, it is bind peeking.  To see the real plans - you can query v$sql_plan / v$sql_plan_statistics or use DBMS_XPLAN to dump the plans for each child cursor:


ops$tkyte%ORA10GR2> select sql_id 
from v$sql
where sql_text = 'select * from t where id = :id';

SQL_ID
-------------
8s40hfjcbmxzk
8s40hfjcbmxzk

ops$tkyte%ORA10GR2> select *
from table( dbms_xplan.display_cursor( '8s40hfjcbmxzk', 0 ) );

PLAN_TABLE_OUTPUT
---------------------
SQL_ID 8s40hfjcbmxzk, child number 0
-------------------------------------
select * from t where id = :id

Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 235 (100)| |
|* 1 | TABLE ACCESS FULL| T | 50250 | 4710K| 235 (4)| 00:00:02 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ID"=:ID)


18 rows selected.

ops$tkyte%ORA10GR2> select *
from table( dbms_xplan.display_cursor( '8s40hfjcbmxzk', 1 ) );

PLAN_TABLE_OUTPUT
---------------------------
SQL_ID 8s40hfjcbmxzk, child number 1
-------------------------------------
select * from t where id = :id

Plan hash value: 470836197

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 1 | 96 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | T_IDX | 1 | | 1 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=:ID)


19 rows selected.



You can use v$sql_shared_cursor to see why you have multiple child cursors as well - for example:


ops$tkyte%ORA10GR2> select child_number, stats_row_mismatch
2 from v$sql_shared_cursor
3 where sql_id = '8s40hfjcbmxzk'
4 /

CHILD_NUMBER S
------------ -
0 N
1 Y


That is what you would look for to verify you were having a "must be sql_trace creating a new child cursor and a hard parse with different binds" problem.

POST A COMMENT

14 Comments:

Blogger Gary Myers said....

The first step, before tracing, should be to look at the plan from v$sql_plan. If you know the database, that would often show the problem.
11g will be a different kettle of fish, with its adaptive cursor sharing.
It should correct itself more often.
http://www.oracle.com/technology/pub/articles/oracle-database-11g-top-features/11g-sqlplanmanagement.html

Wed Sep 19, 04:30:00 PM EDT  

Anonymous Clemens said....

Hi Tom,
I'd appreciate an enhancement of the optimizer: Before applying the Bind Variable Peeking,it could look at the statistics of the column (id in your example) and depending on the skewness of the distribution of values decide whether it is good to fix the execution plan or not.
In my opinion, it is sometimes difficult for the developer to forecast the statistics and to decide whether or not to use the bind variable.

Wed Sep 19, 04:35:00 PM EDT  

Blogger Thomas Kyte said....

clemens said...

as pointed out above - 11g is doing that

and if you use cursor_sharing=similar to parse a query with literals, we will decide what to bind and what to not bind based on the statistics available.

Wed Sep 19, 04:43:00 PM EDT  

Anonymous Robert Lockard said....

Tom,

Interesting, you just explained something I have been trying to tune for the past hour. I'm glad I took a break.

BTW: I just gave away another copy of your book "Effective Oracle by Design" to a new developer. I should ask you for the volume discount.

Later,
-Rob

Wed Sep 19, 05:00:00 PM EDT  

Blogger Noons said....

this problem of execution plans not matching what explain plan shows is quite a trap, IMHO. It's been here for a while and it is extremelly dangerous.

Let's consider this scenario: you're runnning an application such as Peoplesoft where programs are written in Cobol and you don't have access to the source code - therefore any suggestions along the line of "change the code" are useless.

Normal case for anyone running pre-canned apps, even Oracle's own.

Now, you're faced with an instance of this problem, where the existing plan in memory is bad for later executions although it was perfect for earlier ones. Given that this might happen any number of times in a day, depending on what and how folks use the system, it is not feasible either to ask the DBA to reanalyze. The question of course being: reanalyze what and when?

So, what is the solution? We turn off cursor sharing and go back to no bind variables?

Wed Sep 19, 08:38:00 PM EDT  

Blogger Aman Sharma said....

Hi sir,
I am always get into this doubt that why this feature of BVP is actualy there from 9i?If it serves the purpose for the first time and subsequently kills things that its more trouble than help.Or is it some thing more better that it is intended to do?

Thu Sep 20, 12:00:00 AM EDT  

Blogger Gareth said....

Hi Tom,

Something I'm not clear on is how do you 'fix' this behaviour? I can see situations where you want both plans, indeed as you've proved both are valid plans, but the thing that isn't clear to me is how do you 'make it choose the plan you want' if that is the correct phrase.

Regards,

G.

Thu Sep 20, 03:49:00 AM EDT  

Anonymous Duke Ganote said....

Ah! The "observer effect"
http://en.wikipedia.org/wiki/Observer_effect
, which is sometimes mischaracterized as a Heisenberg effect.

Thu Sep 20, 08:35:00 AM EDT  

Anonymous Anonymous said....

Thanks for the detailed blog entry about my question on Asktom.

I am not entirely sure that is what is happening in my case.

I am on 9iR2. Your diagnostic queries to confirm the issue with child cursors are only for 10GR2. The V$ views in 9iR2 seem to be missing the sql_id column.

Do you have 9iR2 versions of the v$ queries you show?

Thanks again.

Thu Sep 20, 09:08:00 AM EDT  

Blogger Thomas Kyte said....

For answers to questions asked in these comments... please see
sql_trace=true, part two

Hey, that rhymes :)

Thu Sep 20, 10:29:00 AM EDT  

Anonymous Anonymous said....

Hi Tom,

Is this sql_trace behavior (Non shared environment) same when we trace a session via Extended SQL trace (setting events 10046, 10053 etc.)?

Cheers
Nik

Fri Jul 30, 01:21:00 AM EDT  

Blogger Thomas Kyte said....

@Nik

sql_trace=true is like 10046 level 0. tracing without binds or wait events enabled.

you should be using dbms_monitor.session_trace_enable in 10g and above - it lets you do the equivalent of a 10046 trace with or without binds and waits easily (without having to have alter session either)

Fri Jul 30, 05:38:00 AM EDT  

Blogger fauxing said....

Hi Tom;

Thanks for the insight - how do I incorporate the "alter session set sql_trace=true" privilege to users since this is highly restrictive?

Wed Sep 05, 09:53:00 AM EDT  

Blogger Thomas Kyte said....

@fauxing

use dbms_monitor in 10g and above!!

it can enable sql trace without letting them do anything else.

Wed Sep 05, 12:05:00 PM EDT  

POST A COMMENT

<< Home