Systems Engineering and RDBMS

Calculating Running Totals

Posted by decipherinfosys on November 25, 2007

One frequent requirement that you will face would be a need to generate running totals. There are a couple of ways to do this in different RDBMS. Let us cover some of those options using an example (script is written with Oracle’s SQL syntax):

CREATE TABLE DEC_TEST_ACCOUNT_TXN
(
DEC_TEST_ACCOUNT_TXN_ID NUMBER(10) NOT NULL,
ACCOUNT_MASTER_ID NUMBER(10) NOT NULL,
TXN_DATE_TIME DATE NOT NULL,
TXN_AMOUNT NUMBER(9,2) NOT NULL,
TXN_TYPE_ID NUMBER(1) NOT NULL,
BRANCH_MASTER_ID NUMBER(10) NULL,
CONSTRAINT PK_DEC_TEST_TXN PRIMARY KEY (DEC_TEST_ACCOUNT_TXN_ID)
)
/

where:
a) ACCOUNT_MASTER_ID is the foreign key to the ACCOUNT_MASTER table,
b) TXN_DATE_TIME is the date and time the transaction took place,
c) TXN_AMOUNT is the total amount of the transaction,
d) TXN_TYPE_ID is the type of the transaction (1 for Deposit, 2 for Withdrawal and 3 for Transfer), and
e) BRANCH_MASTER_ID is the FK to the Branch in case the transaction was done in person.

And using a before trigger and a sequence, the DEC_TEST_ACCOUNT_TXN_ID is an automatically DB generated key and is also the PK of the table.

After populating it with data, here is how the data looks:

SELECT * FROM DEC_TEST_ACCOUNT_TXN;

DEC_TEST_ACCOUNT_TXN_ID ACCOUNT_MASTER_ID TXN_DATE_TIME           TXN_AMOUNT                              TXN_TYPE_ID BRANCH_MASTER_ID
----------------------- ----------------- ----------------------- --------------------------------------- ----------- ----------------
1                       1                 2007-11-23 07:41:43     10000.00                                1           NULL
2                       1                 2007-11-24 07:41:43     1000.00                                 1           NULL
3                       1                 2007-11-25 07:41:43     7000.00                                 2           NULL
4                       1                 2007-11-26 07:41:43     3000.00                                 1           NULL
5                       1                 2007-11-27 07:41:43     5000.00                                 2           NULL
6                       1                 2007-11-28 07:41:43     100000.00                               1           NULL

There are a couple of ways of generating the running totals:

1) Using a sub-select,
2) Using a self join,
3) Using a cursor,
4) Using analytic functions

We will pick the #1 and the #4 options as examples here:

/**************************************
Using a sub-select
***************************************/
SELECT
ACCOUNT_MASTER_ID,
TXN_DATE_TIME,
TXN_AMOUNT,
(SELECT SUM(CASE WHEN TXN_TYPE_ID IN (2,3) THEN -1 ELSE 1 END * TXN_AMOUNT)
FROM DEC_TEST_ACCOUNT_TXN A2
WHERE A2.TXN_DATE_TIME <= A1.TXN_DATE_TIME) RUNNING_BALANCE
FROM dbo.DEC_TEST_ACCOUNT_TXN A1
WHERE A1.ACCOUNT_MASTER_ID = 1
/

ACCOUNT_MASTER_ID TXN_DATE_TIME           TXN_AMOUNT                              RUNNING_BALANCE
----------------- ----------------------- --------------------------------------- ---------------------------------------
1                 2007-11-23 07:41:43     10000.00                                10000.00
1                 2007-11-24 07:41:43     1000.00                                 11000.00
1                 2007-11-25 07:41:43     7000.00                                 4000.00
1                 2007-11-26 07:41:43     3000.00                                 7000.00
1                 2007-11-27 07:41:43     5000.00                                 2000.00
1                 2007-11-28 07:41:43     100000.00                               102000.00

/**************************************
Using Analytic Function
***************************************/
SELECT
ACCOUNT_MASTER_ID,
TXN_DATE_TIME,
TXN_AMOUNT,
SUM(CASE WHEN TXN_TYPE_ID IN (2,3) THEN -1 ELSE 1 END * TXN_AMOUNT) OVER (ORDER BY TXN_DATE_TIME) RUNNING_BALANCE
FROM DEC_TEST_ACCOUNT_TXN A1
WHERE A1.ACCOUNT_MASTER_ID = 1
/

When the data set is large, using a sub-select or a cursor is typically not a good option. It also depends upon the
filter criteria that you have available in your WHERE clause – if you have an index seek operation and have narrowed down your results to a few records, then either of the options would be fine. Usage of the analytic functions is by far the best choice in terms of performance especially when dealing with large record sets.

Please note that as we had indicated in one of our previous blog post as well, in SQL Server 2005, the ORDER BY clause cannot be used within the OVER() clause when using aggregate functions. Hopefully, this restriction will be lifted in SQL Server 2008. We haven’t tested it yet on CTP5.

Sorry, the comment form is closed at this time.

 
%d bloggers like this: