Systems Engineering and RDBMS

Another question solved by Analytics

Posted by decipherinfosys on April 22, 2009

We have written quite a few posts on Analytics before, covering both Oracle and SQL Server.  Yesterday, we received a question from a reader that falls into the same category of Analytics.  It came from the same reader who had asked us the question on employee labor activity tracking.  This time, the question was about grouping and ranking the receiving times and the quantities received at the dock door of a warehouse.  Here is the scenario:

On the receiving dock, when the activities are recorded, the data distribution looks like this:

declare @rec_dock table (recvd_date_time datetime, qty int);
insert into @rec_dock values ('04/10/2009 11:10:01', 100);
insert into @rec_dock values ('04/10/2009 11:10:03', 150);
insert into @rec_dock values ('04/10/2009 11:10:04', 20);
insert into @rec_dock values ('04/10/2009 11:10:40', 10);
insert into @rec_dock values ('04/10/2009 11:10:41', 200);
insert into @rec_dock values ('04/10/2009 11:15:15', 300);
insert into @rec_dock values ('04/10/2009 11:15:16', 450);

Now, the requirement is that I need to sum the quantity where the receiving times are within 5 seconds of each other.  So, essentially we are defining groups in such a way that within a group, no two consecutive records are more than 5 seconds apart.  So, if I have records like:

insert into @rec_dock values ('04/10/2009 15:10:01', 100);
insert into @rec_dock values ('04/10/2009 15:10:03', 150);
insert into @rec_dock values ('04/10/2009 15:10:04', 20);
insert into @rec_dock values ('04/10/2009 15:10:05', 10);
insert into @rec_dock values ('04/10/2009 15:10:06', 200);
insert into @rec_dock values ('04/10/2009 15:10:07', 90);
insert into @rec_dock values ('04/10/2009 15:10:08', 50);
insert into @rec_dock values ('04/10/2009 15:10:10', 10);

Then, I want one single record for the above since each record is within 5 seconds of the previous one.  I could write complex procedural logic, application-side processing, or row-by-row processing to achieve this.  Is there any way it can be done using analytic functions, like you had shown before for my last question?

So, let’s break out the logic here:

a) Walk the rows in time order: if a row is within 5 seconds of the prior row, it falls into the same group; otherwise
b) create a new group, place the record in it, and continue the process for all remaining records.
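Before getting to the SQL, the flagging step above can be sketched in a few lines of plain Python (the `change_flags` helper is just for illustration; the timestamps come from the first sample set in the post, and the 5-second threshold is the one from the question):

```python
from datetime import datetime, timedelta

def change_flags(timestamps, gap=timedelta(seconds=5)):
    """Return 1 where a row starts a new group (i.e. it is more than
    `gap` after the prior row), else 0; the first row always gets 0."""
    flags = []
    prev = None
    for ts in timestamps:
        flags.append(1 if prev is not None and ts - prev > gap else 0)
        prev = ts
    return flags

# first five sample rows: 11:10:01, :03, :04, :40, :41
times = [datetime(2009, 4, 10, 11, 10, s) for s in (1, 3, 4, 40, 41)]
print(change_flags(times))  # [0, 0, 0, 1, 0]
```

The 36-second jump from 11:10:04 to 11:10:40 is the only place a new group starts in that slice, which is exactly what the SQL below computes with a simulated lag.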

So, since the end user was using SQL Server, we will use the same logic that we had used in the previous post and simulate the lag() function in SQL Server:

declare @rec_dock table (recvd_date_time datetime, qty int);
insert into @rec_dock values ('04/10/2009 11:10:01', 100);
insert into @rec_dock values ('04/10/2009 11:10:03', 150);
insert into @rec_dock values ('04/10/2009 11:10:04', 20);
insert into @rec_dock values ('04/10/2009 11:10:40', 10);
insert into @rec_dock values ('04/10/2009 11:10:41', 200);
insert into @rec_dock values ('04/10/2009 11:15:15', 300);
insert into @rec_dock values ('04/10/2009 11:15:16', 450);

insert into @rec_dock values ('04/10/2009 15:10:01', 100);
insert into @rec_dock values ('04/10/2009 15:10:03', 150);
insert into @rec_dock values ('04/10/2009 15:10:04', 20);
insert into @rec_dock values ('04/10/2009 15:10:05', 10);
insert into @rec_dock values ('04/10/2009 15:10:06', 200);
insert into @rec_dock values ('04/10/2009 15:10:07', 90);
insert into @rec_dock values ('04/10/2009 15:10:08', 50);
insert into @rec_dock values ('04/10/2009 15:10:10', 10);

declare @temp table (recvd_date_time datetime, qty int, seq int);

with recv_time as (
select
recvd_date_time
, qty
, ROW_NUMBER() over (ORDER by recvd_date_time) as RN
from @rec_dock
)
insert into @temp
select
  AC1.recvd_date_time
, AC1.qty
, case when datediff(ss, AC2.recvd_date_time, AC1.recvd_date_time) > 5
       then 1 else 0 end as Change_In_Grp
from recv_time as AC1
left outer join recv_time as AC2
  on AC1.RN = AC2.RN + 1

select * from @temp

recvd_date_time         qty         seq
----------------------- ----------- -----------
2009-04-10 11:10:01.000 100         0
2009-04-10 11:10:03.000 150         0
2009-04-10 11:10:04.000 20          0
2009-04-10 11:10:40.000 10          1
2009-04-10 11:10:41.000 200         0
2009-04-10 11:15:15.000 300         1
2009-04-10 11:15:16.000 450         0
2009-04-10 15:10:01.000 100         1
2009-04-10 15:10:03.000 150         0
2009-04-10 15:10:04.000 20          0
2009-04-10 15:10:05.000 10          0
2009-04-10 15:10:06.000 200         0
2009-04-10 15:10:07.000 90          0
2009-04-10 15:10:08.000 50          0
2009-04-10 15:10:10.000 10          0

And now, let's apply the running-total logic again: a running sum of the seq flags assigns each record to its group:

select
recvd_date_time
, qty
, (select sum(seq) + 1 from @temp as t1 where t1.recvd_date_time <= t2.recvd_date_time) as GRP
from @temp as t2

The output is:

recvd_date_time         qty         GRP
----------------------- ----------- -----------
2009-04-10 11:10:01.000 100         1
2009-04-10 11:10:03.000 150         1
2009-04-10 11:10:04.000 20          1
2009-04-10 11:10:40.000 10          2
2009-04-10 11:10:41.000 200         2
2009-04-10 11:15:15.000 300         3
2009-04-10 11:15:16.000 450         3
2009-04-10 15:10:01.000 100         4
2009-04-10 15:10:03.000 150         4
2009-04-10 15:10:04.000 20          4
2009-04-10 15:10:05.000 10          4
2009-04-10 15:10:06.000 200         4
2009-04-10 15:10:07.000 90          4
2009-04-10 15:10:08.000 50          4
2009-04-10 15:10:10.000 10          4
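The correlated subquery here is simply a running sum of the 0/1 flags, shifted by 1 so groups start at 1.  As a quick sanity check, here is the same step sketched in Python with itertools.accumulate (the seq values are the flags from the @temp table above):

```python
from itertools import accumulate

# seq flags from @temp: 1 marks the start of a new group
seq = [0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# running sum of the flags, plus 1 so group numbers start at 1
grp = [s + 1 for s in accumulate(seq)]
print(grp)  # [1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4]
```

This matches the GRP column in the output above.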

And now, we can simply group this data and get the ranges:

select
	 min(recvd_date_time) as START_TIME
	,max(recvd_date_time) as END_TIME
	, GRP
	, SUM(QTY) as TOTAL_QTY_IN_GRP
FROM
(
select
  recvd_date_time
, qty
, (select sum(seq) + 1 from @temp as t1 where t1.recvd_date_time <= t2.recvd_date_time) as GRP
from @temp as t2) as IV
group by GRP

START_TIME              END_TIME                GRP         TOTAL_QTY_IN_GRP
----------------------- ----------------------- ----------- ----------------
2009-04-10 11:10:01.000 2009-04-10 11:10:04.000 1           270
2009-04-10 11:10:40.000 2009-04-10 11:10:41.000 2           210
2009-04-10 11:15:15.000 2009-04-10 11:15:16.000 3           750
2009-04-10 15:10:01.000 2009-04-10 15:10:10.000 4           630

So, now we have the data set the way we need it.  And in Oracle, all of this can be done in a single SQL statement using the lag() function:

select min(recvd_date_time), max(recvd_date_time), sum(qty)
from (
select recvd_date_time, qty,
max(rn) over (order by recvd_date_time) GRP
from (
select recvd_date_time, qty,
lag(recvd_date_time) over (order by recvd_date_time),
case
when abs(lag(recvd_date_time) over (order by recvd_date_time) - recvd_date_time) > 5/24/60/60
then row_number() over (order by recvd_date_time)
end RN
from tableT
)
)
group by GRP
/

There are other ways of doing this as well, but the above method of using a CTE and simulating the lag() function in SQL Server is one way to achieve it.
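For readers who want to experiment without a SQL Server or Oracle instance handy, here is a sketch of the same three-step pipeline (flag, running sum, aggregate) using SQLite's built-in window functions from Python.  Note the assumptions: lag() and windowed sum() require SQLite 3.25 or later, and the table name rec_dock mirrors the table variable used above:

```python
import sqlite3

# sample rows from the post, in ISO format so strftime('%s', ...) works
rows = [
    ("2009-04-10 11:10:01", 100), ("2009-04-10 11:10:03", 150),
    ("2009-04-10 11:10:04", 20),  ("2009-04-10 11:10:40", 10),
    ("2009-04-10 11:10:41", 200), ("2009-04-10 11:15:15", 300),
    ("2009-04-10 11:15:16", 450), ("2009-04-10 15:10:01", 100),
    ("2009-04-10 15:10:03", 150), ("2009-04-10 15:10:04", 20),
    ("2009-04-10 15:10:05", 10),  ("2009-04-10 15:10:06", 200),
    ("2009-04-10 15:10:07", 90),  ("2009-04-10 15:10:08", 50),
    ("2009-04-10 15:10:10", 10),
]

con = sqlite3.connect(":memory:")
con.execute("create table rec_dock (recvd_date_time text, qty int)")
con.executemany("insert into rec_dock values (?, ?)", rows)

# flag rows more than 5 seconds after the prior row, running-sum the
# flags into a group id, then aggregate per group -- the same three
# steps as the T-SQL above, but with a real lag() window function
sql = """
with flagged as (
    select recvd_date_time, qty,
           case when strftime('%s', recvd_date_time)
                     - strftime('%s', lag(recvd_date_time)
                                over (order by recvd_date_time)) > 5
                then 1 else 0 end as chg
    from rec_dock
), grouped as (
    select recvd_date_time, qty,
           sum(chg) over (order by recvd_date_time) + 1 as grp
    from flagged
)
select min(recvd_date_time), max(recvd_date_time), grp, sum(qty)
from grouped
group by grp
order by grp
"""
result = list(con.execute(sql))
for start, end, grp, total in result:
    print(start, end, grp, total)
```

This returns the same four groups, with the same totals of 270, 210, 750 and 630, as the T-SQL output above.  SQL Server 2012 and later also ship a native LAG() function, which removes the need for the self-join simulation entirely.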

Resources:

  • Our previous post on a similar topic – here.  This has references to other posts/articles related to calculation of the running totals etc.
  • Ranking functions in SQL Server – here.
  • Analytic functions in Oracle – here and here.
