Tuesday, 19 November 2013

Oracle: ROW_NUMBER vs ROWNUM


I have an SQL query that looks something like this:
1.SELECT  *
2.FROM    (
3.SELECT  t.*, row_number() OVER (ORDER BY ID) rn
4.FROM    mytable t
5.)
6.WHERE   rn BETWEEN :start and :end
Essentially, it’s the ORDER BY part that’s slowing things down. If I were to remove it, the EXPLAINcost goes down by an order of magnitude (over 1,000 times).
I’ve tried this:
1.SELECT  t.*, row_number() OVER (ORDER BY ID) rn
2.FROM    mytable t
3.WHERE   rownum BETWEEN :start and :end
, but this doesn’t give correct results.
Is there any easy way to speed this up? Or will I have to spend some more time with the EXPLAINtool?
First, just a quick reminder on how ROWNUM works. From Oracle‘s documentation:
Conditions testing for ROWNUM values greater than a positive integer are always false. For example, this query returns no rows:
1.SELECT  *
2.FROM    employees
3.WHERE   ROWNUM > 1
The first row fetched is assigned a ROWNUM of 1 and makes the condition false. The second row to be fetched is now the first row and is also assigned a ROWNUM of 1 and makes the condition false. All rows subsequently fail to satisfy the condition, so no rows are returned.
That’s why the second query should look like this:
1.SELECT  *
2.FROM    (
3.SELECT  t.*, ROWNUM AS rn
4.FROM    mytable t
5.ORDER BY
6.paginator, id
7.)
8.WHERE   rn BETWEEN :start and :end
Now, let’s see the performance. To do this, we’ll create a sample table:

01.CREATE TABLE mytable (
02.id NUMBER(10) NOT NULL,
03.paginator NUMBER(10) NOT NULL,
04.value VARCHAR2(50)
05.)
06./
07.ALTER TABLE mytable
08.ADD CONSTRAINT pk_mytable_id PRIMARY KEY (id)
09./
10. 
11.CREATE INDEX ix_mytable_paginator_id ON mytable(paginator, id)
12./
13. 
14.INSERT
15.INTO    mytable(id, paginator, value)
16.SELECT  levellevel / 10000, 'Value ' || level
17.FROM    dual
18.CONNECT BY
19.level <= 1000000
20./
21. 
22.COMMIT
23./
24. 
25.BEGIN
26.DBMS_STATS.gather_schema_stats('"20090506_rownum"');
27.END;
28./
This query has 1,000,000 records and an index on (paginator, id).
I deliberately made the paginator non-UNIQUE to demonstrate that the paging query should always include a unique column set into the sort. Otherwise, a record can be selected twice on two different pages, or not selected at all.
Let’s query the table with both queries, returning 10 values from 900,001 to 900,010, and see which one performs better.
First, the ROW_NUMBER():
1.SELECT  *
2.FROM    (
3.SELECT  t.*, ROW_NUMBER() OVER (ORDER BY paginator, id) AS rn
4.FROM    mytable t
5.)
6.WHERE   rn BETWEEN 900001 AND 900010
ID
PAGINATOR
VALUE
RN
900001
90
Value 900001
900001
900002
90
Value 900002
900002
900003
90
Value 900003
900003
900004
90
Value 900004
900004
900005
90
Value 900005
900005
900006
90
Value 900006
900006
900007
90
Value 900007
900007
900008
90
Value 900008
900008
900009
90
Value 900009
900009
900010
90
Value 900010
900010
10 rows fetched in 0.0005s (0.8594s)
SELECT STATEMENT
 VIEW
  WINDOW NOSORT STOPKEY
   TABLE ACCESS BY INDEX ROWID, 20090506_rownum.MYTABLE
    INDEX FULL SCAN, 20090506_rownum.IX_MYTABLE_PAGINATOR_ID
And the ROWNUM:
01.SELECT  *
02.FROM    (
03.SELECT  t.*, ROWNUM AS rn
04.FROM    (
05.SELECT  *
06.FROM    mytable
07.ORDER BY
08.paginator, id
09.) t
10.)
11.WHERE   rn BETWEEN 900001 AND 900010
ID
PAGINATOR
VALUE
RN
900001
90
Value 900001
900001
900002
90
Value 900002
900002
900003
90
Value 900003
900003
900004
90
Value 900004
900004
900005
90
Value 900005
900005
900006
90
Value 900006
900006
900007
90
Value 900007
900007
900008
90
Value 900008
900008
900009
90
Value 900009
900009
900010
90
Value 900010
900010
10 rows fetched in 0.0005s (0.7058)
SELECT STATEMENT
 VIEW
  COUNT
   VIEW
    TABLE ACCESS BY INDEX ROWID, 20090506_rownum.MYTABLE
     INDEX FULL SCAN, 20090506_rownum.IX_MYTABLE_PAGINATOR_ID
We can see that ROW_NUMBER() works for 850 ms, while the ROWNUM one for only 700 ms. ROWNUM is slightly more efficient.
But let’s look closer into the plan for ROW_NUMBER(). We see that Oracle is smart enough, first, to avoid sorting on (paginator, id) (since an index is available), and, second, to use a STOPKEY condition which ceases scanning as soon as it finds enough values.
The ROWNUM query uses the index too, but it does not employ STOPKEY condition, it just counts.
How can we enable STOPKEY for a ROWNUM query?
Unfortunately, Oracle’s optimizer cannot understand in this case that RN is an alias for ROWNUM in the inner subquery. That’s why we’ll need to rewrite the query a little so that we will have a limiting condition on ROWNUMthe outer subquery and STOPKEY will became usable:
01.SELECT  *
02.FROM    (
03.SELECT  t.*, ROWNUM AS rn
04.FROM    (
05.SELECT  *
06.FROM    mytable
07.ORDER BY
08.paginator, id
09.) t
10.)
11.WHERE   rn >= 900001
12.AND rownum <= 10
ID
PAGINATOR
VALUE
RN
900001
90
Value 900001
900001
900002
90
Value 900002
900002
900003
90
Value 900003
900003
900004
90
Value 900004
900004
900005
90
Value 900005
900005
900006
90
Value 900006
900006
900007
90
Value 900007
900007
900008
90
Value 900008
900008
900009
90
Value 900009
900009
900010
90
Value 900010
900010
10 rows fetched in 0.0005s (0.4714s)
SELECT STATEMENT
 COUNT STOPKEY
  VIEW
   COUNT
    VIEW
     TABLE ACCESS BY INDEX ROWID, 20090506_rownum.MYTABLE
      INDEX FULL SCAN, 20090506_rownum.IX_MYTABLE_PAGINATOR_ID
We now make lower filtering on RN (which is an alias for ROWNUM from the inner subquery), and upper filtering on ROWNUM in the outer subquery, which counts not the number of total rows returned, but the number of rows satisfying the first condition.
In this case Oracle will use the STOPKEY, and the query now runs for only 471 ms, twice as fast as the original one.
But if ROW_NUMBER and ROWNUM use essentially the same plan, why the latter one is so much faster?
This is because Oracle is very, very old.
ROWNUM was introduced in Oracle 6 that was released in 1988. You remember 1988? You needed your PC to beIBM compatible to run Oracle.
And by that time it already had ROWNUM, so says its ARJ-compressed manual (you remember ARJ?).
Large HTML online manuals for PostgreSQL and MySQL, by the way, still don’t mention any ROWNUM, it needs to be emulated (read my yesterday’s article PostgreSQL: row numbers to see how to do it).
OK. So ROWNUM is 1988, that means it’s both simple and efficient. Just a pure counter.
Analytic functions, on the other hand, were introduced in Oracle 9i in 2001. Y2K was already a history by that time, there was nothing to be afraid of, and ROW_NUMBER was implemented in not so efficient way.
It is of course powerful and flexible, but slow for simple tasks like pagination.

No comments:

Post a Comment