I have an SQL query
that looks something like this:
1.SELECT *
2.FROM (
3.SELECT t.*, row_number()
OVER (ORDER BY ID) rn
4.FROM mytable t
5.)
6.WHERE rn BETWEEN :start and :end
Essentially, it’s the ORDER BY part that’s slowing
things down. If I were to remove it, the EXPLAINcost goes down by an
order of magnitude (over 1,000 times).
I’ve tried this:
1.SELECT t.*, row_number()
OVER (ORDER BY ID) rn
2.FROM mytable t
3.WHERE rownum BETWEEN :start and :end
, but this doesn’t
give correct results.
Is there any easy way
to speed this up? Or will I have to spend some more time with the EXPLAINtool?
Conditions testing for ROWNUM values greater than a
positive integer are always false. For example, this query returns no rows:
1.SELECT *
2.FROM employees
3.WHERE ROWNUM > 1
The first row fetched
is assigned a ROWNUM of 1 and makes the
condition false. The second row to be fetched is now the first row and is also
assigned a ROWNUM of 1 and makes the
condition false. All rows subsequently fail to satisfy the condition, so no rows
are returned.
That’s why the second
query should look like this:
1.SELECT *
2.FROM (
3.SELECT t.*, ROWNUM AS rn
4.FROM mytable t
5.ORDER BY
6.paginator,
id
7.)
8.WHERE rn BETWEEN :start and :end
Now, let’s see the
performance. To do this, we’ll create a sample table:
01.CREATE TABLE mytable (
02.id
NUMBER(10) NOT NULL,
03.paginator
NUMBER(10) NOT NULL,
04.value
VARCHAR2(50)
05.)
06./
07.ALTER TABLE mytable
08.ADD CONSTRAINT pk_mytable_id PRIMARY KEY (id)
09./
10.
11.CREATE INDEX ix_mytable_paginator_id ON mytable(paginator,
id)
12./
13.
14.INSERT
15.INTO mytable(id,
paginator, value)
16.SELECT level, level / 10000, 'Value ' || level
17.FROM dual
18.CONNECT BY
19.level <= 1000000
20./
21.
22.COMMIT
23./
24.
25.BEGIN
26.DBMS_STATS.gather_schema_stats('"20090506_rownum"');
27.END;
28./
This query has 1,000,000 records and an index
on (paginator, id).
I deliberately made
the paginator non-UNIQUE to demonstrate that the paging query should always include a
unique column set into the sort. Otherwise, a record can be selected twice on
two different pages, or not selected at all.
Let’s query the table
with both queries, returning 10 values from 900,001 to 900,010, and see which one
performs better.
First, the ROW_NUMBER():
1.SELECT *
2.FROM (
3.SELECT t.*, ROW_NUMBER()
OVER (ORDER BY paginator, id) AS rn
4.FROM mytable t
5.)
6.WHERE rn BETWEEN 900001 AND 900010
ID
|
PAGINATOR
|
VALUE
|
RN
|
900001
|
90
|
Value
900001
|
900001
|
900002
|
90
|
Value
900002
|
900002
|
900003
|
90
|
Value
900003
|
900003
|
900004
|
90
|
Value
900004
|
900004
|
900005
|
90
|
Value
900005
|
900005
|
900006
|
90
|
Value
900006
|
900006
|
900007
|
90
|
Value
900007
|
900007
|
900008
|
90
|
Value
900008
|
900008
|
900009
|
90
|
Value
900009
|
900009
|
900010
|
90
|
Value
900010
|
900010
|
10 rows
fetched in 0.0005s (0.8594s)
|
SELECT STATEMENT
VIEW
WINDOW NOSORT STOPKEY
TABLE ACCESS BY INDEX
ROWID, 20090506_rownum.MYTABLE
INDEX FULL SCAN,
20090506_rownum.IX_MYTABLE_PAGINATOR_ID
And the ROWNUM:
01.SELECT *
02.FROM (
03.SELECT t.*, ROWNUM AS rn
04.FROM (
05.SELECT *
06.FROM mytable
07.ORDER BY
08.paginator,
id
09.) t
10.)
11.WHERE rn BETWEEN 900001 AND 900010
ID
|
PAGINATOR
|
VALUE
|
RN
|
900001
|
90
|
Value
900001
|
900001
|
900002
|
90
|
Value
900002
|
900002
|
900003
|
90
|
Value
900003
|
900003
|
900004
|
90
|
Value
900004
|
900004
|
900005
|
90
|
Value
900005
|
900005
|
900006
|
90
|
Value
900006
|
900006
|
900007
|
90
|
Value
900007
|
900007
|
900008
|
90
|
Value
900008
|
900008
|
900009
|
90
|
Value
900009
|
900009
|
900010
|
90
|
Value
900010
|
900010
|
10 rows
fetched in 0.0005s (0.7058)
|
SELECT STATEMENT
VIEW
COUNT
VIEW
TABLE ACCESS BY INDEX
ROWID, 20090506_rownum.MYTABLE
INDEX FULL SCAN,
20090506_rownum.IX_MYTABLE_PAGINATOR_ID
We can see that ROW_NUMBER() works for 850 ms, while the ROWNUM one for only 700 ms. ROWNUM is slightly more
efficient.
But let’s look closer
into the plan for ROW_NUMBER(). We see that Oracle is smart enough, first, to avoid sorting on (paginator, id) (since an index is
available), and, second, to use a STOPKEY condition which ceases
scanning as soon as it finds enough values.
The ROWNUM query uses the index
too, but it does not employ STOPKEY condition, it just
counts.
How can we enable STOPKEY for a ROWNUM query?
Unfortunately,
Oracle’s optimizer cannot understand in this case that RN is an alias for ROWNUM in the inner subquery.
That’s why we’ll need to rewrite the query a little so that we will have a
limiting condition on ROWNUMthe outer subquery and STOPKEY will became usable:
01.SELECT *
02.FROM (
03.SELECT t.*, ROWNUM AS rn
04.FROM (
05.SELECT *
06.FROM mytable
07.ORDER BY
08.paginator,
id
09.) t
10.)
11.WHERE rn >= 900001
12.AND rownum <= 10
ID
|
PAGINATOR
|
VALUE
|
RN
|
900001
|
90
|
Value
900001
|
900001
|
900002
|
90
|
Value
900002
|
900002
|
900003
|
90
|
Value
900003
|
900003
|
900004
|
90
|
Value
900004
|
900004
|
900005
|
90
|
Value
900005
|
900005
|
900006
|
90
|
Value
900006
|
900006
|
900007
|
90
|
Value
900007
|
900007
|
900008
|
90
|
Value
900008
|
900008
|
900009
|
90
|
Value
900009
|
900009
|
900010
|
90
|
Value
900010
|
900010
|
10 rows
fetched in 0.0005s (0.4714s)
|
SELECT STATEMENT
COUNT STOPKEY
VIEW
COUNT
VIEW
TABLE ACCESS BY INDEX
ROWID, 20090506_rownum.MYTABLE
INDEX FULL SCAN,
20090506_rownum.IX_MYTABLE_PAGINATOR_ID
We now make lower
filtering on RN (which is an alias for ROWNUM from the inner
subquery), and upper filtering on ROWNUM in the outer subquery,
which counts not the number of total rows returned, but the number of rows
satisfying the first condition.
In this case Oracle will use the STOPKEY, and the query now
runs for only 471 ms, twice as fast as
the original one.
But if ROW_NUMBER and ROWNUM use essentially the
same plan, why the latter one is so much faster?
This is because Oracle is very, very old.
ROWNUM was introduced in Oracle 6 that was released in
1988. You remember 1988? You needed your PC to beIBM compatible to run Oracle.
And by that time it
already had ROWNUM, so says its ARJ-compressed manual
(you remember ARJ?).
Large HTML online manuals for PostgreSQL and MySQL, by the way, still
don’t mention any ROWNUM, it needs to be
emulated (read my yesterday’s article PostgreSQL: row numbers to see how to do it).
OK. So ROWNUM is 1988, that means
it’s both simple and efficient. Just a pure counter.
Analytic functions, on
the other hand, were introduced in Oracle 9i in 2001. Y2K was already a history
by that time, there was nothing to be afraid of, and ROW_NUMBER was implemented in not
so efficient way.
It is of course
powerful and flexible, but slow for simple tasks like pagination.
No comments:
Post a Comment