Systems Engineering and RDBMS

Analytic Functions – I

Posted by decipherinfosys on July 20, 2007

Oracle as well as DB2 LUW have had a very rich collection of Analytic functions in their language arsenal for quite some time now. SQL Server 2005 also took a step towards that direction and introduced a couple of them. We will start covering some of the important ones in our blog posts with examples. We would also cover specific features per RDBMS like the MODEL clause in Oracle.

To start with consider this example of the employees and the different departments that they belong to. We are using SQL Server as an example here though the same applies to other RDBMS as well. We will use the table variables as examples so assume that the foreign key relationships exist between the EMP_MASTER table and the DEPT_MASTER table with the EMP_MASTER being the child table for the DEPT_MASTER. One department can have multiple employees but one employee can belong to one department only (though in some organizations, this can be a M:N relationship, we will assume a 1:N relationship for the sake of this example).

declare @dept_master table (dept_id int not null primary key, dept_name varchar(30) not null)
declare @emp_master table (emp_id int not null primary key, dept_id int not null, emp_fname varchar(30) not null, emp_lname varchar(30) not null, date_of_hire datetime not null, salary numeric(8,2) not null default 0)

set nocount on

insert into @dept_master values (1, ‘Accounting’)
insert into @dept_master values (2, ‘Development’)
insert into @dept_master values (3, ‘Services’)
insert into @dept_master values (4, ‘Operations’)

insert into @emp_master values (1, 1, ‘John’, ‘Doe’, getdate() – 365, 100000.50)
insert into @emp_master values (2, 1, ‘Jack’, ‘Dwight’, getdate() – 324, 75000.00)
insert into @emp_master values (3, 1, ‘Monty’, ‘Morin’, getdate() – 275, 90000.00)
insert into @emp_master values (4, 2, ‘Angel’, ‘Chang’, getdate() – 720, 95000.75)
insert into @emp_master values (5, 2, ‘Hari’, ‘Kishore’, getdate() – 430, 83000.63)
insert into @emp_master values (6, 2, ‘Chris’, ‘Smith’, getdate() – 365, 78000.12)
insert into @emp_master values (7, 3, ‘Susie’, ‘Taylor’, getdate() – 456, 57500.00)
insert into @emp_master values (8, 3, ‘Mat’, ‘Bowser’, getdate() – 210, 76000.35)
insert into @emp_master values (9, 3, ‘Mathew’, ‘Weiner’, getdate() – 25, 55000.00)
insert into @emp_master values (10, 4, ‘Nathan’, ‘Twist’, getdate() – 105, 77000.00)
insert into @emp_master values (11, 4, ‘Nick’, ‘Patten’, getdate() – 188, 75000.00)
insert into @emp_master values (12, 4, ‘Chris’, ‘Heibert’, getdate() – 199, 45000.50)
insert into @emp_master values (13, 4, ‘Dennis’, ‘Menace’, getdate() – 200, 96000.50)
insert into @emp_master values (14, 4, ‘Amy’, ‘Norton’, getdate() – 300, 125000.00)

select * from @dept_master

dept_id dept_name
———– ——————————
1 Accounting
2 Development
3 Services
4 Operations

select * from @emp_master

emp_id      dept_id     emp_fname                      emp_lname                      date_of_hire            salary
----------- ----------- ------------------------------ ------------------------------ ----------------------- ---------------------------------------
1           1           John                           Doe                            2006-07-19 13:45:48.763 100000.50
2           1           Jack                           Dwight                         2006-08-29 13:45:48.763 75000.00
3           1           Monty                          Morin                          2006-10-17 13:45:48.763 900000.00
4           2           Angel                          Chang                          2005-07-29 13:45:48.763 95000.75
5           2           Hari                           Kishore                        2006-05-15 13:45:48.763 83000.63
6           2           Chris                          Smith                          2006-07-19 13:45:48.763 78000.12
7           3           Susie                          Taylor                         2006-04-19 13:45:48.857 57500.00
8           3           Mat                            Bowser                         2006-12-21 13:45:48.857 76000.35
9           3           Mathew                         Weiner                         2007-06-24 13:45:48.857 55000.00
10          4           Nathan                         Twist                          2007-04-05 13:45:48.857 77000.00
11          4           Nick                           Patten                         2007-01-12 13:45:48.857 75000.00
12          4           Chris                          Heibert                        2007-01-01 13:45:48.857 45000.50
13          4           Dennis                         Menace                         2006-12-31 13:45:48.857 96000.50
14          4           Amy                            Norton                         2006-09-22 13:45:48.857 125000.00

Now, if your HR director asks you to: “Give the top 2 highest paid employees per department. Provide the employee first and last name, their hiring dates and their salary in addition to their department”

In order to get this data without making use of functions or too many inline views (derived tables in SQL Server lingo), we can use dense_rank() and the windowing functions to get this information relatively easily:

select *
FROM (select dept.dept_name,
emp.emp_fname,
emp.emp_lname,
emp.date_of_hire,
emp.salary,
dense_rank() over (partition by dept.dept_name order by emp.salary desc) rank_order
from @emp_master as emp
inner join @dept_master as dept
on emp.dept_id = dept.dept_id
) as DT
where DT.rank_order < 3
order by DT.dept_name, DT.salary desc

Output is:

dept_name                      emp_fname                      emp_lname                      date_of_hire            salary                                  rank_order
------------------------------ ------------------------------ ------------------------------ ----------------------- --------------------------------------- --------------------
Accounting                     John                           Doe                            2006-07-19 13:54:56.493 100000.50                               1
Accounting                     Monty                          Morin                          2006-10-17 13:54:56.493 90000.00                                2
Development                    Angel                          Chang                          2005-07-29 13:54:56.493 95000.75                                1
Development                    Hari                           Kishore                        2006-05-15 13:54:56.493 83000.63                                2
Operations                     Amy                            Norton                         2006-09-22 13:54:56.493 125000.00                               1
Operations                     Dennis                         Menace                         2006-12-31 13:54:56.493 96000.50                                2
Services                       Mat                            Bowser                         2006-12-21 13:54:56.493 76000.35                                1
Services                       Susie                          Taylor                         2006-04-19 13:54:56.493 57500.00                                2

The query joins the two tables and then does partitioning of the data by department name and then orders by the salary of the employees withing that department in a descending fashion and then assigns a rank to each of those records. The query then filters off the records and returns back only the top 2 records. This is such a simple implementation – this can be done in other ways as well without making use of any analytical functions but those will require a lot more logic.

A current limitation in SQL Server (this is allowed in Oracle) is that the “ORDER BY CLAUSE” cannot be used with the aggregate window functions. For example:

SUM(SALARY) OVER (ORDER BY EMP_ID) will give an error and of course the same for:
SUM(SALARY) OVER (PARTITION BY DEPT_NAME ORDER BY EMP_ID)

This is allowed in Oracle. Maybe in SQL Server 2008, MSFT will enhance the analytic functions as well as introduce new ones.

4 Responses to “Analytic Functions – I”

  1. […] If you want to dig into the Row_Number() and the partition by logic, you can search this web-site for more information on those functions or look up Oracle docs. Oracle has a very rich set of analytic functions which can be very useful for development as well as troubleshooting purposes. We had also covered one of those in a post few days ago – you can access it here. […]

  2. […] are coming our way in the next release (like the much awaited MERGE command, better support for the analytic functions – better support for the OVER clause for ranking functions and aggregates etc.) that we will be […]

  3. […] note that as we had indicated in one of our previous blog post as well, in SQL Server 2005, the ORDER BY clause cannot be used within the OVER() clause when using […]

  4. […] Posted by decipherinfosys on December 22, 2009 In many of our posts, we have discussed different kinds of Analytic Functions and their usage to solve real world problems.  However, we have seen at many client sites that many developers or DBAs still do not use many of these analytic functions in their code.  These are very well documented by Microsoft (SQL Server) and Oracle.  In this post, we will take a look at some of them.  We will pick up this post from the first post that we had done about this topic – here. […]

Sorry, the comment form is closed at this time.

 
%d bloggers like this: