Sunday, January 29, 2006

Constraints have a bad name...

Constraints have a bad name. The thoughts on constraints might be (as applied to a person):

  • No one likes to be constrained. Just ask them, they’ll tell you so.

  • Constraints are impediments, they get in the way.

  • They are rules – things we have to “respect and obey”. No one likes that.

  • They get in the way – they are an annoyance, they slow things down.

This week, I’ll just put up some snippets showing that constraints in the database are “awesome”, things to be respected, something you’ll want to add to your schema as often as possible. Besides the obvious “we get better data”, you get things like “better plans, better performance”.

We’ve already seen one example above – if you specify NOT NULL when the column is in fact NOT NULL (as most columns are), then perhaps we can use certain indexes more often. Oh, how I wish that NOT NULL was the default state for all columns.

Well, how about a simple check constraint? The CBO (cost based optimizer) is pretty good at using check constraints to figure out when it might not need to do something and getting better all of the time. Funny thing is, this fact comes into play most frequently in a large data warehouse, the place where constraints are seemingly most often considered “evil” (things that slow us down). In fact, it is exactly the large data warehouse is where constraints are most vital! Consider:

create table t1
( num number check (num>0),
data char(20)
);
insert into t1
select rownum, rownum
from all_objects;
exec dbms_stats.gather_table_stats( user, 'T1' );

create table t2
( num number check (num<0),
data char(20)
);
insert into t2
select -num, data
from t1;
exec dbms_stats.gather_table_stats( user, 'T2' );

create or replace view vw
as
select * from t1
union all
select * from t2;

alter session set sql_trace=true;
set termout off
select * from vw where num > 0;
select * from vw where num < 0;
set termout on

tkprof will show something like this:

select * from vw where num > 0

Rows Row Source Operation
------- ---------------------------------------------------
49958 VIEW VW (cr=3577 pr=0 pw=0 time=2348976 us)
49958 UNION-ALL PARTITION (cr=3577 pr=0 pw=0 time=1799199 us)
49958 TABLE ACCESS FULL T1 (cr=3577 pr=0 pw=0 time=699638 us)
0 FILTER (cr=0 pr=0 pw=0 time=0 us)
0 TABLE ACCESS FULL T2 (cr=0 pr=0 pw=0 time=0 us)

select * from vw where num < 0

Rows Row Source Operation
------- ---------------------------------------------------
49958 VIEW VW (cr=3569 pr=0 pw=0 time=1349734 us)
49958 UNION-ALL PARTITION (cr=3569 pr=0 pw=0 time=1099623 us)
0 FILTER (cr=0 pr=0 pw=0 time=0 us)
0 TABLE ACCESS FULL T1 (cr=0 pr=0 pw=0 time=0 us)
49958 TABLE ACCESS FULL T2 (cr=3569 pr=0 pw=0 time=249879 us)

Note how the CBO was able to effectively just “skip” one of the tables in each case? This sort of “elimination” has been around since version 7.3 (was called partitioned views way back when) – assuming you have constraints in place. If you use autotrace in 10gr2 (or dbms_xplan before that) you can see what is happening here:

ops$tkyte@ORA10GR2> set autotrace traceonly explain
ops$tkyte@ORA10GR2> select * from vw where num > 0;

Execution Plan
----------------------------------------------------------
Plan hash value: 2831881304

------------------------------------------------------------...
| Id | Operation | Name | Rows | Bytes | Cost (...
------------------------------------------------------------...
| 0 | SELECT STATEMENT | | 49961 | 1317K| 111 ...
| 1 | VIEW | VW | 49961 | 1317K| 111 ...
| 2 | UNION-ALL PARTITION| | | | ...
|* 3 | TABLE ACCESS FULL | T1 | 49962 | 1219K| 57 ...
|* 4 | FILTER | | | | ...
|* 5 | TABLE ACCESS FULL| T2 | 1 | 26 | 57 ...
------------------------------------------------------------...

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("NUM">0)
4 - filter(NULL IS NOT NULL)
5 - filter("NUM">0)

See the addition of a new filter? NULL IS NOT NULL at step 4. The CBO basically removed T2 from consideration!
POST A COMMENT

16 Comments:

Anonymous Karl said....

Hi,
Constraints helps to keep SQL's simpler; a lot of constraints is contained in SQL intead on the Table/Columns. The following SQL must elimates NULL for consistent retrieving of data:

SELECT Act.*
FROM Activity Act
WHERE Act.Isactivityrepeat = 1
AND Act.Activitydate <
Nvl(Act.Repetitionenddate,
Trunc(SYSDATE) + Cn_90days)
AND Nvl(Act.Repetitionenddate,
Trunc(SYSDATE) + Cn_90days) < g_Target_Date


With the check constraint defined :

ALTER TABLE Activity ADD
CONSTRAINT CHECK_ACTIVITY_REPEAT CHECK ( ( ISACTIVITYREPEAT = 0
AND FREQUENCY = 1
AND REPETITIONENDDATE IS NULL )
OR
( ISACTIVITYREPEAT = 1
AND FREQUENCY IN (2,3,4,5)
AND REPETITIONENDDATE IS NOT NULL
AND NEXTACTIVITY IS NULL ) )
;

now the SQL is much simpler :

SELECT Act.*
FROM Activity Act
WHERE Act.Isactivityrepeat = 1
AND Act.Activitydate < Act.Repetitionenddate
AND Act.Repetitionenddate < :g_Target_Date

Greetings
Karl

Mon Jan 30, 01:38:00 AM EST  

Blogger Thomas Kyte said....

I don't think constraints keep SQL "simplier"

constraints only permit valid data in the database.

You have two very different "implementations" there - ones that I cannot even compare. The first one says "repetitionEndDate is allowed to be null and when it is, we dynamically consider it to be some 90 days in the future - from the point in time the query is executed"


But if repetitionEndDate is never to be null when you have a repeating activity - then, it is not to be null and accounting for NULLS in the query that way would just be "wrong" wouldn't it.

These are two entirely different models if you ask me - it is not about making SQL simplier, it is about having correct data.

Mon Jan 30, 01:50:00 AM EST  

Anonymous Sokrates said....

constraints rock

Mon Jan 30, 02:34:00 AM EST  

Anonymous Karl said....

Hi,
sorry for the less information i provided; the query should retrieve repeatable Activites; A repeatable Activity should have the repetitiondate set. The 'should have' was ensured by the application. But nevertheless without constraints the repetiotion date could be null;
To prevent that a date in future ( 90 day )was set.
but the logic behind the two SQL is not identical - You are right :-
Sorry for too fast posting
Karl

But last statement i want to repeat; If your data is less constrained the work of constraining to get valid data is shipped to the SQL;

Greetings
Karl

Mon Jan 30, 02:59:00 AM EST  

Anonymous Anonymous said....

Constraints are like having permanent unit test in the app.
This greatly improves the reliability of code.

Greetings from Frankfurt

Peter

Mon Jan 30, 03:12:00 AM EST  

Blogger Connor McDonald said....

Even more so the very first kind of "constraint", that of the right datatype...Man, am I getting sick of:

select *
from T
where column_that_was_supposed_to_numeric = 123

gives me:

ORA-01722: invalid number

Why is everyone so obsessed with "varchar2(4000)" as a column datatype nowadays...Is there some prize out in the world somewhere that I don't know about, where you get money for the most varchar2(4000) columns in a database ?

Agggghhhhhh!

Mon Jan 30, 04:46:00 AM EST  

Blogger William Robertson said....

> Even more so the very first kind of "constraint", that of the right datatype

You'll enjoy this thread on DBForums:

Q: Just curious, why would you want to store a unix timestamp in an oracle database?

A: I am using PHP as the language. So I can format using the date() function, much easier.

Mon Jan 30, 05:27:00 AM EST  

Anonymous Markku Uttula said....

Why is everyone so obsessed with "varchar2(4000)" as a column datatype nowadays...Is there some prize out in the world somewhere that I don't know about, where you get money for the most varchar2(4000) columns in a database ?

Just to amuse myself, i ran

select * from user_tab_cols where data_type = 'VARCHAR2' and data_length = 4000

on the latest production database. To my amazement, there was only 1 column of that type in a table, though it was propagated onto couple of views in addition to the base table. The column is ADDRESS_FREETEXT whose intention is to store those addresses that can't be described useing STREET, CITY, ZIP, AREA, COUNTRY...

When i broadened the search into data_length > 1000, there were a couple of more fields, all of which had name ending with *_DESCRIPTION ... I just hate it when people use unsuitable column types. But that's just me :)

Mon Jan 30, 05:32:00 AM EST  

Anonymous Anonymous said....

[off topic]

Tom, not sure if you have seen this or not. I know it isn't directly your area, but some of the anti-oracle guys I work with give me massive grief when they see things like this.

http://www2.csoonline.com/blog_view.html?CID=17132

Mon Jan 30, 09:59:00 AM EST  

Anonymous Anonymous said....

I wonder if your anti-oracle guys run windows or even UNIX for that matter.

If you rely on your vendor for security then you are shooting yourself in the foot, regardless of platform.

The Oracle RDBMS is probably the most robust, stable and securable platforms I've ever worked on.

That said, those were definately some nasties in the Jan release.

And just so I'm not accused of abandoning the topic. I'm a DW guy and I LOVE CONSTRAINTS but strangely enough I don't think of them as CONSTRAINTS but rather as, hmm
"dynamic documentation",
"validated metadata",
"quality assurance",
"data guarantees"
...

Mon Jan 30, 12:58:00 PM EST  

Anonymous Anonymous said....

I like the "validated metadata".

Mon Jan 30, 03:02:00 PM EST  

Anonymous 9iGuy said....

Hi Tom,

I saw your "elimination" optimization using a check constraint - and found it interesting that in 9.2.0.6 - that the same optimization doesn't work with the DELETE or UPDATE operations... Is that possibly some low hanging fruit (that in fairness may have been corrected in 10g)?

Thanks...

Wed Feb 01, 05:47:00 AM EST  

Blogger Darren said....

I would just like to reinforce that if you have an existing table, it's important to add your constraints with 'validate' (the default).

Adding with 'novalidate' because "you know the data is correct" may indeed be faster when adding the constraint (especially for very large tables), but if the database hasn't validated, the CBO cannot use the constraint.

-d

Tue Mar 07, 12:40:00 PM EST  

Blogger Thomas Kyte said....

Novalidate PLUS rely lets the optimizer use the constraint - the trick is "rely"

Tue Mar 07, 03:15:00 PM EST  

Anonymous Anonymous said....

Hi Tom,

Not sure why but my "all_objects" table seems to have different columns in it;

seems to be missing;
NAMESPACE and EDITION_NAME

sys@XE> desc all_objects;
Name Null? Type
----------------------------------------------------- -------- --------------
OWNER NOT NULL VARCHAR2(30)
OBJECT_NAME NOT NULL VARCHAR2(30)
SUBOBJECT_NAME VARCHAR2(30)
OBJECT_ID NOT NULL NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(19)
CREATED NOT NULL DATE
LAST_DDL_TIME NOT NULL DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)



sys@XE> declare
2 l_cnt number;
3 l_rows number := &1;
4 begin
5 insert /*+ append */
6 into big_table
7 select rownum, a.*
8 from all_objects a
9 where rownum <= &1;
10
11 l_cnt := sql%rowcount;
12
13 commit;
14
15 while (l_cnt < l_rows)
16 loop
17 insert /*+ APPEND */ into big_table
18 select rownum+l_cnt,
19 OWNER, OBJECT_NAME, SUBOBJECT_NAME, OBJECT_ID, DATA_OBJECT_ID,
20 OBJECT_TYPE, CREATED, LAST_DDL_TIME, TIMESTAMP, STATUS,
21 TEMPORARY, GENERATED, SECONDARY, NAMESPACE, EDITION_NAME
22 from big_table
23 where rownum <= l_rows-l_cnt;
24 l_cnt := l_cnt + sql%rowcount;
25 commit;
26 end loop;
27 end;
28 /
TEMPORARY, GENERATED, SECONDARY, NAMESPACE, EDITION_NAME
*
ERROR at line 21:
ORA-06550: line 21, column 60:
PL/SQL: ORA-00904: "EDITION_NAME": invalid identifier
ORA-06550: line 17, column 9:
PL/SQL: SQL Statement ignored

sys@XE> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Express Edition Release 10.2.0.1.0 - Product
PL/SQL Release 10.2.0.1.0 - Production
CORE 10.2.0.1.0 Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

5 rows selected.

cheers

Tue Dec 06, 08:40:00 AM EST  

Blogger Thomas Kyte said....

@Anonymous

you have 10g, I was on 11g - just edit the script to get rid of the columns you do not have to generate data.

Tue Dec 06, 09:24:00 AM EST  

POST A COMMENT

<< Home