Systems Engineering and RDBMS

Conditional Sorting and Performance Issue

Posted by decipherinfosys on December 16, 2008

We have written about FBI (Function Based Indexes) in Oracle and Computed Columns (SQL Server) many times before in our blog posts – you can just search for “FBI” or “Computed Column” in order to browse through those posts. In this post, we are going to cover a SQL performance tuning scenario where we had to use conditional sorting in the ORDER BY clause and then had to use a FBI in order to further speed up the query.

The question raised by one of the readers was this: “I have a table in which if the value in the rec_type column is x or y, then I need to get the last record created in the table and if there is no value for x/y, then I need to get the last record created out of all the records. I have done this via sub-selects in the SQL Select statement. Is there a better way of doing it?”

This can be done in many different ways and kind of matches up with the first in a group type of scenario which we had described here.

Common approaches might include using the MAX() aggregate function or making use of analytic function ROW_NUMBER() or making use of conditional sorting. Let’s look at those 3 approaches for this issue:

Table Script:

USE DECIPHERTEST
GO
CREATE TABLE BIG_TABLE_TEST (ID INT IDENTITY PRIMARY KEY, QTY INT, TYPE_VAL CHAR(1), CREATE_DATE_TIME DATETIME DEFAULT GETDATE());
GO
SET NOCOUNT ON
GO
DECLARE @I INT
SET @I = 1
WHILE (@I <= 1000000)
BEGIN
INSERT INTO dbo.BIG_TABLE_TEST (QTY, TYPE_VAL)
SELECT 1000000 * RAND(@I), CASE WHEN @I/2 = 0 THEN ‘a’ ELSE ‘b’ END

SELECT @I = @I + 1
END

GO
Approach #1:

Using the MAX() function:

SELECT COALESCE(MAX(CASE WHEN TYPE_VAL IN (‘A’, ‘B’) THEN CREATE_DATE_TIME ELSE NULL END), MAX(CREATE_DATE_TIME))
FROM dbo.BIG_TABLE_TEST

Approach #2:

Using the ROW_NUMBER() function:

SELECT *
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY CASE WHEN TYPE_VAL IN (‘A’, ‘B’) THEN 1 ELSE 2 END, CREATE_DATE_TIME DESC) AS ROWNUM, CREATE_DATE_TIME
FROM dbo.BIG_TABLE_TEST
) AS X WHERE X.ROWNUM = 1

Approach #3:

Using TOP and Conditional sorting which is the topic of this post as well. The SQL in that case looks like this:

SELECT TOP 1 CREATE_DATE_TIME
FROM dbo.BIG_TABLE_TEST
ORDER BY CASE WHEN TYPE_VAL IN (‘A’, ‘B’) THEN 1 ELSE 2 END, CREATE_DATE_TIME DESC;

In the case of Oracle, it will look like this:

SELECT * FROM
(SELECT CREATE_DATE_TIME FROM BIG_TABLE_TEST
ORDER BY CASE WHEN TYPE_VAL IN (‘A’, ‘B’) THEN 1 ELSE 2 END, CREATE_DATE_TIME DESC)
WHERE ROWNUM = 1
/

Now, sort costs are typically very expensive so if we are to have the third approach work in a performant way, we will need to create a FBI in the case of Oracle and a computed column with an index on it in the case of SQL Server. Before we delve into that, let’s look at the performance characteristics of the three approaches that we have listed here:

Approach #1 will do a full clustered index scan which is the same as a table scan. The second approach will be even more expensive since it will involve sorting operations and then only the selection of the very first record. And the same is true for the third approach as well since it also will involve sorts and then selecting the TOP 1 record.

However, let’s now create the computed column and the index in the case of SQL Server:

ALTER TABLE BIG_TABLE_TEST ADD COL_SORT AS CASE WHEN TYPE_VAL IN (‘A’, ‘B’) THEN 1 ELSE 2 END;
CREATE INDEX BIG_TABLE_TEST_IND_1 ON BIG_TABLE_TEST (COL_SORT ASC, CREATE_DATE_TIME DESC);

And in the case of Oracle, the FBI will look like this:

CREATE INDEX BIG_TABLE_TEST_IND_1 ON BIG_TABLE_TEST (CASE WHEN TYPE_VAL IN (‘A’,’B’) THEN 1 ELSE 2 END, CREATE_DATE_TIME DESC);

Now, if we check for the performance for approach #2, it will look like this:

|–Filter(WHERE:([Expr1003]=(1)))
|–Top(TOP EXPRESSION:(CASE WHEN (1) IS NULL OR (1)<(0) THEN (0) ELSE (1) END))
|–Sequence Project(DEFINE:([Expr1003]=row_number))
|–Compute Scalar(DEFINE:([Expr1009]=(1)))
|–Segment
|–Compute Scalar(DEFINE:([DECIPHERTEST].[dbo].[BIG_TABLE_TEST].[COL_SORT]=[DECIPHERTEST].[dbo].[BIG_TABLE_TEST].[COL_SORT]))
|–Index Scan(OBJECT:([DECIPHERTEST].[dbo].[BIG_TABLE_TEST].[BIG_TABLE_TEST_IND_1]), ORDERED FORWARD)

On my system, it gives a cost of 0.0033 for the execution plan. It does an index scan and picks up the first row based on the filter criteria.

For Approach #3, the cost is 0.00985 and the execution plan looks like this:
|–Compute Scalar(DEFINE:([DECIPHERTEST].[dbo].[BIG_TABLE_TEST].[COL_SORT]=[DECIPHERTEST].[dbo].[BIG_TABLE_TEST].[COL_SORT]))
|–Top(TOP EXPRESSION:((1)))
|–Nested Loops(Inner Join, OUTER REFERENCES:([DECIPHERTEST].[dbo].[BIG_TABLE_TEST].[ID], [Expr1004]) WITH ORDERED PREFETCH)
|–Index Scan(OBJECT:([DECIPHERTEST].[dbo].[BIG_TABLE_TEST].[BIG_TABLE_TEST_IND_1]), ORDERED FORWARD)
|–Clustered Index Seek(OBJECT:([DECIPHERTEST].[dbo].[BIG_TABLE_TEST].[PK__BIG_TABLE_TEST__0BC6C43E]), SEEK:([DECIPHERTEST].[dbo].[BIG_TABLE_TEST].[ID]=[DECIPHERTEST].[dbo].[BIG_TABLE_TEST].[ID]) LOOKUP ORDERED FORWARD)

It does an index scan and a key lookup to get the first record and gives it back to us. The cost for approach #1 is pretty high – 4.18.

In Oracle also, it gets to the row without looking at much data by using the rownum filter. So, if you decide to use approach #3, you should create the FBI (Oracle) or a computed column and then a covered index (SQL Server).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: