Systems Engineering and RDBMS

Archive for June 18th, 2008

Common Problem – getting the first in a Group

Posted by decipherinfosys on June 18, 2008

This is one of the very common problems in the SQL world and also one that has many different solutions available. The problem description is this:

You have a table with say this structure (using SQL Server Syntax):

set nocount on
go
declare @test table (store_nbr int, product_name nvarchar(20), product_date datetime, aisle_nbr int)
insert into @test values (1, ‘A’, ’8-12-2004′, 6)
insert into @test values (1, ‘B’, ’7-10-2005′, 2)
insert into @test values (1, ‘C’, ’6-11-2006′, 3)
insert into @test values (1, ‘D’, ’11-10-2007′, 4)
insert into @test values (2, ‘E’, ’10-12-2004′, 2)
insert into @test values (2, ‘F’, ’5-9-2005′, 4)

And what you want to find out is that record in each store which was introduced the last so essentially in each group (i.e. each store), you want to find out the product that was introduced most recently. There are a lot of ways to solve this problem and the solutions vary among RDBMS based on syntax only. One can use analytic functions to resolve this, use a derived table or use a co-related sub-query. Here is one way to solve this issue:

select * from @test A
where A.product_date = (select top 1 B.product_date
from @test as B
where A.store_nbr = B.store_nbr
order by product_date desc)

store_nbr   product_name         product_date            aisle_nbr
----------- -------------------- ----------------------- -----------
1           D                    2007-11-10 00:00:00.000 4
2           F                    2005-05-09 00:00:00.000 4

In this code, we are sorting based on the product date in a descending order in the co-related sub-query and getting the equality match based on the TOP 1 record. We could have also used the ROW_NUMBER() function as well, example:

select store_nbr, product_name, product_date, aisle_nbr
from
(select ROW_NUMBER() over (partition by store_nbr order by product_date desc) as rn, *
from @test A
) as IV
where IV.rn = 1

One can also choose to do a MAX operation in the co-related sub-query, example:

select * from @test as A
where A.product_date = (select MAX(B.product_date)
from @test as B
where A.store_nbr = B.store_nbr)

And in SQL Server, one can also use the CROSS APPLY function in SQL Server 2005.  We will look at the execution plans and the performance characteristics of these different options in an upcoming post.

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »

 
Follow

Get every new post delivered to your Inbox.

Join 74 other followers