Analytic Functions – I
Posted by decipherinfosys on July 20, 2007
Oracle as well as DB2 LUW have had a very rich collection of Analytic functions in their language arsenal for quite some time now. SQL Server 2005 also took a step towards that direction and introduced a couple of them. We will start covering some of the important ones in our blog posts with examples. We would also cover specific features per RDBMS like the MODEL clause in Oracle.
To start with consider this example of the employees and the different departments that they belong to. We are using SQL Server as an example here though the same applies to other RDBMS as well. We will use the table variables as examples so assume that the foreign key relationships exist between the EMP_MASTER table and the DEPT_MASTER table with the EMP_MASTER being the child table for the DEPT_MASTER. One department can have multiple employees but one employee can belong to one department only (though in some organizations, this can be a M:N relationship, we will assume a 1:N relationship for the sake of this example).
declare @dept_master table (dept_id int not null primary key, dept_name varchar(30) not null)
declare @emp_master table (emp_id int not null primary key, dept_id int not null, emp_fname varchar(30) not null, emp_lname varchar(30) not null, date_of_hire datetime not null, salary numeric(8,2) not null default 0)
set nocount on
insert into @dept_master values (1, ‘Accounting’)
insert into @dept_master values (2, ‘Development’)
insert into @dept_master values (3, ‘Services’)
insert into @dept_master values (4, ‘Operations’)
insert into @emp_master values (1, 1, ‘John’, ‘Doe’, getdate() – 365, 100000.50)
insert into @emp_master values (2, 1, ‘Jack’, ‘Dwight’, getdate() – 324, 75000.00)
insert into @emp_master values (3, 1, ‘Monty’, ‘Morin’, getdate() – 275, 90000.00)
insert into @emp_master values (4, 2, ‘Angel’, ‘Chang’, getdate() – 720, 95000.75)
insert into @emp_master values (5, 2, ‘Hari’, ‘Kishore’, getdate() – 430, 83000.63)
insert into @emp_master values (6, 2, ‘Chris’, ‘Smith’, getdate() – 365, 78000.12)
insert into @emp_master values (7, 3, ‘Susie’, ‘Taylor’, getdate() – 456, 57500.00)
insert into @emp_master values (8, 3, ‘Mat’, ‘Bowser’, getdate() – 210, 76000.35)
insert into @emp_master values (9, 3, ‘Mathew’, ‘Weiner’, getdate() – 25, 55000.00)
insert into @emp_master values (10, 4, ‘Nathan’, ‘Twist’, getdate() – 105, 77000.00)
insert into @emp_master values (11, 4, ‘Nick’, ‘Patten’, getdate() – 188, 75000.00)
insert into @emp_master values (12, 4, ‘Chris’, ‘Heibert’, getdate() – 199, 45000.50)
insert into @emp_master values (13, 4, ‘Dennis’, ‘Menace’, getdate() – 200, 96000.50)
insert into @emp_master values (14, 4, ‘Amy’, ‘Norton’, getdate() – 300, 125000.00)
select * from @dept_master
dept_id dept_name
———– ——————————
1 Accounting
2 Development
3 Services
4 Operations
select * from @emp_master
emp_id dept_id emp_fname emp_lname date_of_hire salary ----------- ----------- ------------------------------ ------------------------------ ----------------------- --------------------------------------- 1 1 John Doe 2006-07-19 13:45:48.763 100000.50 2 1 Jack Dwight 2006-08-29 13:45:48.763 75000.00 3 1 Monty Morin 2006-10-17 13:45:48.763 900000.00 4 2 Angel Chang 2005-07-29 13:45:48.763 95000.75 5 2 Hari Kishore 2006-05-15 13:45:48.763 83000.63 6 2 Chris Smith 2006-07-19 13:45:48.763 78000.12 7 3 Susie Taylor 2006-04-19 13:45:48.857 57500.00 8 3 Mat Bowser 2006-12-21 13:45:48.857 76000.35 9 3 Mathew Weiner 2007-06-24 13:45:48.857 55000.00 10 4 Nathan Twist 2007-04-05 13:45:48.857 77000.00 11 4 Nick Patten 2007-01-12 13:45:48.857 75000.00 12 4 Chris Heibert 2007-01-01 13:45:48.857 45000.50 13 4 Dennis Menace 2006-12-31 13:45:48.857 96000.50 14 4 Amy Norton 2006-09-22 13:45:48.857 125000.00
Now, if your HR director asks you to: “Give the top 2 highest paid employees per department. Provide the employee first and last name, their hiring dates and their salary in addition to their department”
In order to get this data without making use of functions or too many inline views (derived tables in SQL Server lingo), we can use dense_rank() and the windowing functions to get this information relatively easily:
select *
FROM (select dept.dept_name,
emp.emp_fname,
emp.emp_lname,
emp.date_of_hire,
emp.salary,
dense_rank() over (partition by dept.dept_name order by emp.salary desc) rank_order
from @emp_master as emp
inner join @dept_master as dept
on emp.dept_id = dept.dept_id
) as DT
where DT.rank_order < 3
order by DT.dept_name, DT.salary desc
Output is:
dept_name emp_fname emp_lname date_of_hire salary rank_order ------------------------------ ------------------------------ ------------------------------ ----------------------- --------------------------------------- -------------------- Accounting John Doe 2006-07-19 13:54:56.493 100000.50 1 Accounting Monty Morin 2006-10-17 13:54:56.493 90000.00 2 Development Angel Chang 2005-07-29 13:54:56.493 95000.75 1 Development Hari Kishore 2006-05-15 13:54:56.493 83000.63 2 Operations Amy Norton 2006-09-22 13:54:56.493 125000.00 1 Operations Dennis Menace 2006-12-31 13:54:56.493 96000.50 2 Services Mat Bowser 2006-12-21 13:54:56.493 76000.35 1 Services Susie Taylor 2006-04-19 13:54:56.493 57500.00 2
The query joins the two tables and then does partitioning of the data by department name and then orders by the salary of the employees withing that department in a descending fashion and then assigns a rank to each of those records. The query then filters off the records and returns back only the top 2 records. This is such a simple implementation – this can be done in other ways as well without making use of any analytical functions but those will require a lot more logic.
A current limitation in SQL Server (this is allowed in Oracle) is that the “ORDER BY CLAUSE” cannot be used with the aggregate window functions. For example:
SUM(SALARY) OVER (ORDER BY EMP_ID) will give an error and of course the same for:
SUM(SALARY) OVER (PARTITION BY DEPT_NAME ORDER BY EMP_ID)
This is allowed in Oracle. Maybe in SQL Server 2008, MSFT will enhance the analytic functions as well as introduce new ones.
4 Responses to “Analytic Functions – I”
Sorry, the comment form is closed at this time.


Re-sequencing a number column in Oracle « Systems Engineering and RDBMS said
[...] If you want to dig into the Row_Number() and the partition by logic, you can search this web-site for more information on those functions or look up Oracle docs. Oracle has a very rich set of analytic functions which can be very useful for development as well as troubleshooting purposes. We had also covered one of those in a post few days ago – you can access it here. [...]
Row Value Constructor Support in SQL Server 2008 « Systems Engineering and RDBMS said
[...] are coming our way in the next release (like the much awaited MERGE command, better support for the analytic functions – better support for the OVER clause for ranking functions and aggregates etc.) that we will be [...]
Calculating Running Totals « Systems Engineering and RDBMS said
[...] note that as we had indicated in one of our previous blog post as well, in SQL Server 2005, the ORDER BY clause cannot be used within the OVER() clause when using [...]
Analytic Functions – II « Systems Engineering and RDBMS said
[...] Posted by decipherinfosys on December 22, 2009 In many of our posts, we have discussed different kinds of Analytic Functions and their usage to solve real world problems. However, we have seen at many client sites that many developers or DBAs still do not use many of these analytic functions in their code. These are very well documented by Microsoft (SQL Server) and Oracle. In this post, we will take a look at some of them. We will pick up this post from the first post that we had done about this topic – here. [...]