Systems Engineering and RDBMS

A reporting query

Posted by decipherinfosys on July 24, 2008

This was one of the questions that we got from a reader:

I am working on a HR application and need to prepare a report in which I need to show the data in this format:

DEPT_NBR MAX_HIRE_DATE    MIN(DT)    HIRE_DATE
-------- -------------    ---------  --------
0312     01-JUN-07        12-DEC-02  12-DEC-02
                                     31-MAR-04
                                     30-APR-05
                                     31-MAY-06
                                     01-JUN-07
0313     15-MAY-08        01-NOV-06  12-NOV-06
                                     31-MAR-07
                                     01-NOV-06
                                     31-MAY-07
                                     15-MAY-08

i.e. I want to show per Department, the max hire date, the min hire date and then the hire dates and other employee related information. Problem is that when I use aggregate functions, I cannot have other pieces of information that I need and if I use sub-queries (sub-selects), then the performance is very bad. Is there any way to do this in SQL? I know that this should probably be done on the reporting side rather than the database side since in reporting (SSRS, Crystal or others), one can just put a function to do the max/min over a group but that would also have an overhead so if I can do this in SQL itself, that will be good. Also, I need to be able to have the same SQL for both Oracle and SQL Server since we are a vendor company and our product is used by clients who could be running on either of these platforms.

The answer to this question is – Use Analytics and you can easily do this in SQL – and it can be done for both Oracle and SQL Server (version 2005 and above though). Here is the solution – let’s create the data first (Note to the readers – whenever possible, please do provide the scripts to re-create your scenarios – it helps us save time and get back to you faster with a solution):

SET NOCOUNT ON;
GO
CREATE TABLE dbo.EMP_MASTER (DEPT_NBR NVARCHAR(10), HIRE_DATE DATETIME, FIRST_NAME NVARCHAR(30), LAST_NAME NVARCHAR(30));
INSERT INTO dbo.EMP_MASTER VALUES (‘0312′, ’12-DEC-02’, ‘Joe’, ‘Snyder’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0312′, ’31-MAR-04’, ‘David’, ‘Gilo’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0312′, ’30-APR-05’, ‘Smitha’, ‘Reilly’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0312′, ’31-MAY-06’, ‘Naveen’, ‘Andrews’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0312′, ’01-JUN-07’, ‘Matthew’, ‘Winkle’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0313′, ’12-NOV-06’, ‘Kathy’, ‘Konnor’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0313′, ’31-MAR-07’, ‘Vikesh’, ‘Gupta’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0313′, ’01-NOV-06’, ‘Martha’, ‘Stewart’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0313′, ’31-MAY-07’, ‘Jim’, ‘Diego’);
INSERT INTO dbo.EMP_MASTER VALUES (‘0313′, ’15-MAY-08’, ‘Arthur’, ‘Doyle’);

And here is the code to get the data in the required format:

select
row_number() over (partition by e.dept_nbr order by e.hire_date) as RN,
case when row_number() over (partition by e.dept_nbr order by e.hire_date) = 1 then e.dept_nbr else ” end as dept_nbr,
case when row_number() over (partition by e.dept_nbr order by e.hire_date) = 1 then convert(nvarchar(10), min(e.hire_date) over (partition by e.dept_nbr), 101)
else ”
end as min_hire_date,
case when row_number() over (partition by e.dept_nbr order by e.hire_date) = 1 then convert(nvarchar(10), max(e.hire_date) over (partition by e.dept_nbr), 101)
else ”
end as max_hire_date,
e.hire_date
from dbo.emp_master as e
order by e.dept_nbr, e.hire_date;

RN                   dept_nbr   min_hire_date max_hire_date hire_date
-------------------- ---------- ------------- ------------- -----------------------
1                    0312       12/12/2002    06/01/2007    2002-12-12 00:00:00.000
2                                                           2004-03-31 00:00:00.000
3                                                           2005-04-30 00:00:00.000
4                                                           2006-05-31 00:00:00.000
5                                                           2007-06-01 00:00:00.000
1                    0313       11/01/2006    05/15/2008    2006-11-01 00:00:00.000
2                                                           2006-11-12 00:00:00.000
3                                                           2007-03-31 00:00:00.000
4                                                           2007-05-31 00:00:00.000
5                                                           2008-05-15 00:00:00.000

If you look at the code above, you will see that we are using analytic functions and we partition the data by department first and then order it by the hiring date within that department – we just look for the very first record and then do a min or a max in order to retrieve our values from that group. There are a couple of other ways to achieve the same as well. Above just demonstrates the usage of the analytic functions in making this pretty simple.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: