A Grouping and Ordinal Ranking problem
Posted by decipherinfosys on April 15, 2009
Got a question from a reader yesterday:
“Hi! I have an employee labor activity table and the table structure and some sample data for my problem is like this:
CREATE TABLE dbo.EMP_ACTIVITY
(
EMP_ACTIVITY_ID INT NOT NULL IDENTITY(1,1)
,ACTV_DATE DATETIME NOT NULL
,ACTV_TYPE NVARCHAR(10) NOT NULL
,EMP_MASTER_ID INT NOT NULL
)
GO
SET NOCOUNT ON
GO
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 10:00:00′, ‘JOB1′, 10);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 10:30:00′, ‘JOB1′, 10);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 11:30:00′, ‘JOB1′, 10);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 11:35:00′, ‘JOB2′, 10);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 12:30:00′, ‘JOB2′, 10);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 14:30:00′, ‘JOB3′, 10);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 15:30:00′, ‘JOB3′, 10);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 16:30:00′, ‘JOB1′, 10);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 10:00:00′, ‘JOB1′, 20);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 10:30:00′, ‘JOB1′, 20);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 11:30:00′, ‘JOB2′, 20);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 11:35:00′, ‘JOB3′, 20);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 12:30:00′, ‘JOB2′, 30);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 14:30:00′, ‘JOB3′, 30);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 15:30:00′, ‘JOB3′, 30);
INSERT INTO dbo.EMP_ACTIVITY (ACTV_DATE, ACTV_TYPE, EMP_MASTER_ID) VALUES (’04/13/2009 16:30:00′, ‘JOB1′, 30);
What I need to do in a SQL statement is that I need to sequence these records such that the sequence number should change on a change in the Activity Type and it should re-set to 1 again on a change in the employee ID value…the activity type could be the same as one of the previous activities done by the user. So, using the data set from above as an example, what I need the output from above to be is like this:
| actv_date | actv_type | emp_master_id | Final_Activity_Seq |
| 2009-04-13 10:00:00.000 | JOB1 | 10 | 1 |
| 2009-04-13 10:30:00.000 | JOB1 | 10 | 1 |
| 2009-04-13 11:30:00.000 | JOB1 | 10 | 1 |
| 2009-04-13 11:35:00.000 | JOB2 | 10 | 2 |
| 2009-04-13 12:30:00.000 | JOB2 | 10 | 2 |
| 2009-04-13 14:30:00.000 | JOB3 | 10 | 3 |
| 2009-04-13 15:30:00.000 | JOB3 | 10 | 3 |
| 2009-04-13 16:30:00.000 | JOB1 | 10 | 4 |
| 2009-04-13 10:00:00.000 | JOB1 | 20 | 1 |
| 2009-04-13 10:30:00.000 | JOB1 | 20 | 1 |
| 2009-04-13 11:30:00.000 | JOB2 | 20 | 2 |
| 2009-04-13 11:35:00.000 | JOB3 | 20 | 3 |
| 2009-04-13 12:30:00.000 | JOB2 | 30 | 1 |
| 2009-04-13 14:30:00.000 | JOB3 | 30 | 2 |
| 2009-04-13 15:30:00.000 | JOB3 | 30 | 2 |
| 2009-04-13 16:30:00.000 | JOB1 | 30 | 3 |
How can I go about it?“
There are a couple of ways to do that … the key thing to understand is that first we need to mark each record at which the job changes and we need to ensure that it is done in the right order since a user can switch back to an activity called JOB1 sometime during the later part of the day. So, if somehow we can mark the record at which the change occurs, that is how we can proceed then and provide a rank. So, keeping that in mind, let’s use a CTE in SQL Server and simulate a lag() function since SQL Server does not have an inbuilt lag() function like Oracle does. Later on, we will show how it can be done in Oracle.
with actv_change as (
select
EMP_ACTIVITY_ID
, ACTV_DATE
, ACTV_TYPE
, EMP_MASTER_ID
, ROW_NUMBER() over (partition by EMP_MASTER_ID ORDER by ACTV_DATE) as RN
from dbo.EMP_ACTIVITY
)
select
AC1.EMP_ACTIVITY_ID
, AC1.ACTV_DATE
, AC1.ACTV_TYPE
, AC1.EMP_MASTER_ID
, (CASE COALESCE(AC1.ACTV_TYPE, AC2.ACTV_TYPE)
WHEN AC2.ACTV_TYPE THEN 0
ELSE 1
END) as SEQ
from actv_change as AC1
left outer join actv_change as AC2
on AC1.EMP_MASTER_ID = AC2.EMP_MASTER_ID
AND AC1.RN = AC2.RN + 1
The output is like this:
EMP_ACTIVITY_ID ACTV_DATE ACTV_TYPE EMP_MASTER_ID SEQ 17 2009-04-13 10:00:00.000 JOB1 10 1 18 2009-04-13 10:30:00.000 JOB1 10 0 19 2009-04-13 11:30:00.000 JOB1 10 0 20 2009-04-13 11:35:00.000 JOB2 10 1 21 2009-04-13 12:30:00.000 JOB2 10 0 22 2009-04-13 14:30:00.000 JOB3 10 1 23 2009-04-13 15:30:00.000 JOB3 10 0 24 2009-04-13 16:30:00.000 JOB1 10 1 25 2009-04-13 10:00:00.000 JOB1 20 1 26 2009-04-13 10:30:00.000 JOB1 20 0 27 2009-04-13 11:30:00.000 JOB2 20 1 28 2009-04-13 11:35:00.000 JOB3 20 1 29 2009-04-13 12:30:00.000 JOB2 30 1 30 2009-04-13 14:30:00.000 JOB3 30 1 31 2009-04-13 15:30:00.000 JOB3 30 0 32 2009-04-13 16:30:00.000 JOB1 30 1
The value of 1 in the Sequence column represents when the change occurs. We first got the row numbers by partitioning the data based on the employee ID value and ordering the data based on the activity date. And then we used a left join with the CTE data set to lag the record and compute the 0 or 1 value for the Sequence. So, now if we take a running total over the sequence column, we will be able to get the record set that the reader has asked.
It would have been nice if SQL Server allowed the usage of aggregate functions in the OVER clause and use the ORDER BY along with it. Because if it did, we could have simply done: SUM(SEQ) over (partition by emp_master_id order by actv_date) to get our running total. However, this can easily be done bydoing this:
declare @temp table (actv_date datetime, actv_type nvarchar(10), emp_master_id int, seq int);
with actv_change as (
select
EMP_ACTIVITY_ID
, ACTV_DATE
, ACTV_TYPE
, EMP_MASTER_ID
, ROW_NUMBER() over (partition by EMP_MASTER_ID ORDER by ACTV_DATE) as RN
from dbo.EMP_ACTIVITY
)
insert into @temp
select
AC1.ACTV_DATE
, AC1.ACTV_TYPE
, AC1.EMP_MASTER_ID
, (CASE COALESCE(AC1.ACTV_TYPE, AC2.ACTV_TYPE)
WHEN AC2.ACTV_TYPE THEN 0
ELSE 1
END) as SEQ
from actv_change as AC1
left outer join actv_change as AC2
on AC1.EMP_MASTER_ID = AC2.EMP_MASTER_ID
AND AC1.RN = AC2.RN + 1
select
t1.actv_date
,t1.actv_type
,t1.emp_master_id
,t1.seq
,(select sum(t2.seq) from @temp as t2 where t2.emp_master_id = t1.emp_master_id and t2.actv_date <= t1.actv_date) as Final_Activity_SEQ
from @temp as t1
So, all we have done here is to get the data into a table variable and then get a running total using a sub-query. The output is shown below:
| actv_date | actv_type | emp_master_id | seq | Final_Activity_Seq |
| 2009-04-13 10:00:00.000 | JOB1 | 10 | 1 | 1 |
| 2009-04-13 10:30:00.000 | JOB1 | 10 | 0 | 1 |
| 2009-04-13 11:30:00.000 | JOB1 | 10 | 0 | 1 |
| 2009-04-13 11:35:00.000 | JOB2 | 10 | 1 | 2 |
| 2009-04-13 12:30:00.000 | JOB2 | 10 | 0 | 2 |
| 2009-04-13 14:30:00.000 | JOB3 | 10 | 1 | 3 |
| 2009-04-13 15:30:00.000 | JOB3 | 10 | 0 | 3 |
| 2009-04-13 16:30:00.000 | JOB1 | 10 | 1 | 4 |
| 2009-04-13 10:00:00.000 | JOB1 | 20 | 1 | 1 |
| 2009-04-13 10:30:00.000 | JOB1 | 20 | 0 | 1 |
| 2009-04-13 11:30:00.000 | JOB2 | 20 | 1 | 2 |
| 2009-04-13 11:35:00.000 | JOB3 | 20 | 1 | 3 |
| 2009-04-13 12:30:00.000 | JOB2 | 30 | 1 | 1 |
| 2009-04-13 14:30:00.000 | JOB3 | 30 | 1 | 2 |
| 2009-04-13 15:30:00.000 | JOB3 | 30 | 0 | 2 |
| 2009-04-13 16:30:00.000 | JOB1 | 30 | 1 | 3 |
In the case of Oracle, this query simply becomes:
with actv_change as (
select
EMP_ACTIVITY_ID
, ACTV_DATE
, ACTV_TYPE
, EMP_MASTER_ID
, case when lag(ACTV_TYPE,1, ACTV_TYPE) over (partition by EMP_MASTER_ID order by ACTV_DATE) = ACTV_TYPE
then 0
else 1
end as RN
from EMP_ACTIVITY
)
select
AC1.ACTV_DATE
, AC1.ACTV_TYPE
, AC1.EMP_MASTER_ID
, sum(RN) over(partition by EMP_MASTER_ID order by ACTV_DATE) + 1 as SEQ
from actv_change AC1
So, the solution is essentially the same – the only difference is that in the case of Oracle, we do not need to use the table variable and then get the running total…we can use the analytic function to get that output.
Resources:


Another question solved by Analytics « Systems Engineering and RDBMS said
[...] of Analytics…the question was from the same reader who had asked us the question on the employee labor activity tracking. This time, the question was around the grouping and ranking of the receiving time and the [...]
Ratio_to_Report Analytic Function « Systems Engineering and RDBMS said
[...] before in order to resolve grouping and ranking problems in couple of previous blog posts – here and here. Today, we are going to cover another analytic function ratio_to_report to calculate the [...]