Friday, July 08, 2005

How cool is this part IIa

Alberto Dell'era said previously Is perhaps big_table ordered on disk by "id", ie by the PK of t1,t2,t3 ? If yes, I wonder what would be the results if it got randomly "disordered" (re-created using a CTAS with "order by dbms_random").

What a difference.  Instead of adding an order by dbms_random, I updated the primary key of BIG_TABLE to be random and re-ran the test with the constraints on (the check and primary key constraint).

Very promising, I had a “fake example” with the sorted data for the index, it was:

  • insert, no error logging, 10.1 seconds elapsed, 7.36 cpu seconds
  • insert, error logging, 47.6 seconds elapsed, 42.55 cpu seconds
  • procedural code, 21 seconds elapsed, 17.99 cpu seconds

It is (after randomizing the sorted order of the primary key)

  • insert, no error logging, 30 seconds elapsed, 26.56 cpu seconds
  • insert, error logging, 68 seconds elapsed, 47.02 cpu seconds,
  • procedural code, 54 seconds elapsed, 40.56 cpu seconds

Obviously, there is more to test!  The LOG ERRORS still generated more redo — but less than 2x the redo (not 5 times).  It generated only a little more undo and the latch gets were very comparable this time.  Alberto – thanks for picking up on that one.

What I’d like to do is re-run the stuff with sorted/not sorted data.

Add a test to have 5% of the data fail (randomly), 10%, 15%, 20% and 25%.  I believe anything above 25% indicates a serious failure in the ETL process and so wouldn’t be real world.

Check out update, delete and merge times similarly.  We’ll work that out over the course of next week. 

Hats off to Alberto!



Blogger Alberto Dell'Era said....

Hey, you want to kill me out of blushing this night ;)

I would add to the tests an insert into a naked table (no PK, no indexes, etc). Besides that being an interesting experiment in simplified conditions, it's also very real-life, since very often in a DWH indexes are UNUSABLE during the load, and constraints are not validated - LOG ERRORS would be interesting in such an environment to eg avoid failure due to column max size violations of a rogue row (eg a charset conversion problematic row trying to fit a two-byte char in a VARCHAR2(1) column - eg in an AL32UTF8 database, perhaps in Italy ;)

Fri Jul 08, 06:54:00 PM EDT  

Anonymous Anonymous said....

Not sure I understand what your sorted/unsorted, random sort proves? You gave 2 sets of numbers, what do they mean? Can you or Alberto please provide an interpretation for the rest of us poor souls out here?! Thanks

Fri Jul 08, 07:17:00 PM EDT  

Blogger Alberto Dell'Era said....

please provide an interpretation

Here is mine:

Data is stored sorted in an index - in our case, for the index that supports the PK, it is sorted in ascending order by "id".
In general, if you insert rows into the index following the same order, you are making the job easier for Oracle (that's intuitively sound, think about adding 1,000 "name/phone number" pair to your paper phone directory, which is sorted by name - if the new pairs are not sorted by name already, you have to turn pages after each number has been added).

Especially, there are a number of optimizations in place in the kernel for adding "above the head" of the index (in the phone directory metaphor, that would be adding after the last name contained - the first name is the "tail", the last is the "head") when using insert-selects.

In our test case, we are adding into an empty table, so definitely "above the head".

In the original test case, rows in big_table were sorted on disk by id, and so inserted in the index in the same order - so the optimizations kicked in, but not for the LOG ERRORS case, probably because they have not been re-implemented yet for this new feature (my guess). In fact, performances for LOG ERRORS were comparable to the procedural code one.

By randomizing the order in big_table, we (Tom actually ;) prevented Oracle from performing these optimizations; the remaining differences are so due to other reasons (to be investigated, probably other optimizations have yet to be re-implemented).

Sat Jul 09, 07:42:00 AM EDT  

Blogger Thomas Kyte said....

Anonymous said...

Not sure I understand what your sorted/unsorted, random sort proves?

to follow onto Alberto's explanation of what happened, here is an example. I have a script big_table I use to create a "big table" of as many rows as I like. I choose 1,000,000. An artifact of the way I create this table is that all of the column values are distributed all over the table with the exception of the ID column which is stored "sorted" in the table generally. So, an index on ID on big_table would have a very very good clustering factor since the table is stored "sorted" by that key.

So, I created my big_table (sorted) and renamed it big_table_sorted. I then created table big_table_random as select * from big_table_sorted order by dbms_random.random.

Then, setup empty tables t1 and t2 - looking just like the big_tables, with a primary key on ID. Did the inserts:

SQL> exec runStats_pkg.rs_start;
PL/SQL procedure successfully completed.

SQL> insert into t1
2 select *
3 from big_table_ordered;
1000000 rows created.

SQL> exec runStats_pkg.rs_middle;
PL/SQL procedure successfully completed.

SQL> insert into t2
2 select *
3 from big_table_random;
1000000 rows created.

ops$tkyte@ORA10G> exec runStats_pkg.rs_stop;

Now, some numbers:

CPU used by each was:
6.75 cpu seconds sorted
23.87 cpu seconds random

90/10 leaf splits
1,874 sorted
0 random

session logical reads
195K sorted
2,903K random

28 meg sorted
105 meg random

178 meg sorted
388 meg random

1M sorted
11M random

meaning -- the sorted order of the data gave the bulk insert an "advantage" out of the box. When the LOG ERRORS flipped us into psuedo row by row mode, it could not take advantage of those optimizations. When we un-bias the example, randomize the data, LOG ERRORS is not nearly as bad as it at first appeared in a conventional path load

Which is why I'll spend the next week looking at it bit by bit on the blog :) Look for more tests/examples soon

Sat Jul 09, 08:23:00 AM EDT  

Blogger Alberto Dell'Era said....

To finish it up - the "90/10 leaf splits" are the "adding 'above the head' optimizations" kicking in,
and the "session logical reads" and "latches" increase is due to Oracle "turning pages" in the "paper phone directory" metaphor (actually, hopping from one leaf block to another while inserting).

Sat Jul 09, 11:01:00 AM EDT  

Blogger Alberto Dell'Era said....

Looks like that your blog is turning (partially) into the "TomSays" of "askTom", where you push Oracle topics instead of pulling them, for public discussion.

If that's going to be, perhaps you may consider opening a separate section in asktom (perhaps blue-themed instead of red;) to make us leverage the superior editing abilities of the asktom engine (and having a search engine, backups, etc).

Sat Jul 09, 11:15:00 AM EDT  

Blogger Thomas Kyte said....

alberto dell'era said...

Looks like that your blog is turning (partially) into the "TomSays" of "askTom"

Heh, yeah, just what I need, another thing to distract me :)

this does have a pretty good search though -- up top. Google search. I use it to find my old blog entries myself.

Sat Jul 09, 11:29:00 AM EDT  

Blogger Alberto Dell'Era said....

Heh, yeah, just what I need, another thing to distract me :)

It wouldn't be "another", you may just leave a pointer here to "TomSays" when you want to publicly discuss an idea/new feature.

Ok for the search engine, but asktom editing capabilities, and presentation look&feel for eg drag&dropped code, are far superior (try putting an explain plan here - or the output of a sqlplus script ;).

Sat Jul 09, 12:47:00 PM EDT  

Anonymous Rob ZIjlstra said....

Hello Tom,
I would rather read about the things you see when you're sitting in your garden, doing nothing connected with Oracle. Discussions like this I expect to see on your own Ask Tom site, not here.
This weekend we had two days on a row bbq, with two days lovely weather. It seems to me this is a much better topic to discuss on your blog. What do you eat / drink when you bbq? How many people are normallly there?

Greetings from a sunny Holland,

Rob Zijlstra.

PS Tomorrow at work we will be 'serious' again...

Sun Jul 10, 03:28:00 PM EDT  

Blogger Alberto Dell'Era said....

What's a BBQ - Break in-Between Questions perhaps ? ;)

Just to avoid misunderstandings - I wasn't saying you shouldn't talk about Oracle here. IMHO in a blog one has to talk about what it feels like talking about at the moment - one day it may be squirrels, another a latch - that's the spirit of blogs.

By the way, IMHO = In My Humble Opinion

Sun Jul 10, 05:33:00 PM EDT  

Anonymous ShoeLace said....


BBQ is a contraction (short form) of Barbecue.

do a google search for define:BBQ and you get some good definitions.. (i use define: a lot on google)

in my case it gave a link to

but "Barbecue, is a method of cooking food with the radiant heat and/or hot gasses of a fire"

In australia it is often just refered to as a "Barbie". (like the doll. from which many jokes have been formed) and are most often social events where people bring their own meat to cook and eat and and salad to share.

commons BBQ foods are steak, sausage, chop, prawns, corn cobs, burgers (minced meat patties), various chinken things etc.. and often fish aswell.

I hope that clears that up.

PS: i know it not about the orginal post.. but the question was never answered (and i hate unaswered questions)

Fri Dec 09, 01:47:00 AM EST  

Anonymous Nabil Hachem said....

This case of inserting in order versus using random keys is typical of building indexed organized tables. It is far better to bulk insert data in order of the primary key in the case of indexed organized tables; assuming the table does not have additional indexes.

Tue Mar 21, 10:09:00 AM EST  

Blogger RAJESH said....


I did a Test in and It seems to be Direct path loading is good.

Is, there is any flaws in my Benchmarks?

Tue Sep 21, 08:18:00 AM EDT  

Blogger Thomas Kyte said....


I'm guessing that you did not test what happens when a unique constraint is actually violated - nor in fact when any constraint is violated.

That would be a rather big deal to test - especially since dml error logging does not work with direct path loads and unique constraints - if you violate a unique constraint with direct path - you rollback the entire statement at the end.

Tue Sep 21, 12:41:00 PM EDT  


<< Home