Systems Engineering and RDBMS

Analytic Functions – II

Posted by decipherinfosys on December 22, 2009

In many of our posts, we have discussed different kinds of Analytic Functions and their usage to solve real world problems.  However, we have seen at many client sites that many developers or DBAs still do not use many of these analytic functions in their code.  These are very well documented by Microsoft (SQL Server) and Oracle.  In this post, we will take a look at some of them.  We will pick up this post from the first post that we had done about this topic – here.

Let’s take an example first which will distinguish an aggregate function from an analytic function since that is one of differences that need to be cleared up.   So, let’s compare GROUP BY with the COUNT(*) using the PARTITION BY clause.  Here are two examples (we are using SQL Server syntax in this post – the coverage of analytic functions in Oracle is much more robust than SQL Server and has been around since a very long time … we will cover the differences as well as the functions that are only available in Oracle in this series):

SELECT [TYPE], COUNT(*) AS CNT
FROM SYS.OBJECTS
GROUP BY [TYPE]
ORDER BY [TYPE]
GO

In this above SQL, it will give us the counts grouped by the type of the objects in the schema.  Another thing to note is that if we have a column in the “Select” list, then it has to be in the GROUP BY clause as well.  Here is the output from the schema that we are using:

TYPE    CNT
-------------------
C     23
D     778
F     442
FN    11
IF    5
IT    3
P     146
PK    203
S     41
SQ    3
TF    12
TR    1
U     282
UQ    1
V     68

Now, if we use an analytic function instead, we can get the data without grouping the result set.  Example:

SELECT [NAME], [TYPE], COUNT(*) OVER (PARTITION BY [TYPE]) AS CNT
FROM SYS.OBJECTS
ORDER BY [TYPE]
GO

And here is the abridged output:

NAME                              TYPE  CNT
------------------------------------------------------------
CK__INPT_MUTU__COMPA__149FFA14    C     23
CKC_RDT                           C     23
CKC_reportActivity                C     23
CKC_MFRPT_ACCOUNT_POSITION_MASTER C     23
CHK_SHARE_HOLDER_ID_LEN1          C     23
.
.
.
.
. <<Repeated 23 times - once per Check constraint object>>
DF__USER_MAST__AUTO___6E565CE8    D     778
DF__ACCOUNT_S__LKUP___056ECC6A    D     778

As you can see from above, the analytic function was computed as part of the select and we have got the value repeated per object name of that type.  The syntax as per BOL for ranking and window functions is:

Ranking Window Functions 
<OVER_CLAUSE> :: =
    OVER ( [ PARTITION BY value_expression , ... [ n ] ]
           <ORDER BY_Clause> )

Aggregate Window Functions 
<OVER_CLAUSE> :: = 
    OVER ( [ PARTITION BY value_expression , ... [ n ] ] )

In the above example, if we take away the partition section and just do: COUNT(*) OVER () – in that case,  the aggregate function COUNT(*) will apply to the entire result set returned by the query i.e. a total count of the number of records since we are not asking it to partition the result set based on anything.   Example:

SELECT [NAME], [TYPE], COUNT(*) OVER () AS CNT
FROM SYS.OBJECTS
ORDER BY [TYPE]
GO

And abridged output is:

NAME                              TYPE  CNT
-------------------------------------------------------
CK__INPT_MUTU__COMPA__149FFA14    C     2019
CKC_RDT                           C     2019
CKC_reportActivity                C     2019
.
.

Where 2019 is the entire result set record count.

So, bottom line is that if say we have 1000 total records returned by a query which can be partitioned into say 20 different groups based on the type, we can partition the data based on those “groups” and then apply the aggregate at the top of it and get all those results back in a single SQL statement.  This allows us to do a lot of very complicated computations in very simple SQL statement(s).  What we used as an aggregator at the top of those partitions in the above example was the COUNT(*).  We can also use other aggregation functions like SUM(), AVG(), MIN(), MAX() etc. or we can make use of Ranking functions like RANK(), DESNSE_RANK(), NTILE(), ROW_NUMBER() etc.. We will look at those in a second.  Another thing that we would like to point out is that as you can see from the BOL syntax above, the ranking functions also have the facility to use an ORDER BY clause in addition to the partition clause.  So, what does that do?  We have already seen that we can partition the result set into groups and then apply the Ranking/Aggregate function(s) at the top of them.  The “ORDER BY” clause after the PARTITION clause helps us to order the data set in a particular way, example:

SELECT [NAME], [TYPE], RANK() OVER (PARTITION BY [TYPE] ORDER BY [OBJECT_ID]) AS RNK
FROM SYS.OBJECTS
GO

NAME                              TYPE  RNK
------------------------------------------------
CK__INPT_MUTU__COMPA__149FFA14    C     1
CKC_RDT                           C     2
CKC_reportActivity                C     3
CKC_MFRPT_ACCOUNT_POSITION_MASTER C     4
.
.

The rank above is the rank of the object within a particular partition.  So, for the check constraints that are shown above, the total count was 23 i.e. the partition of check constraints had 23 records and the rank above is obtained for those 23 records by doing an ordering of those records within that partition.  Another thing to note is that when you are using the ranking functions, you can also choose not to use a partition and just order the result:

SELECT
[NAME]
,        [TYPE]
,        DENSE_RANK() OVER (ORDER BY [TYPE]) AS RNK
FROM SYS.OBJECTS

GO
NAME                              TYPE  RNK
-------------------------------------------
CK__INPT_MUTU__COMPA__149FFA14    C     1
CKC_RDT                           C     1
CKC_reportActivity                C     1
.
.
DF__USER_MAST__AUTO___6E565CE8    D     2
DF__ACCOUNT_S__LKUP___056ECC6A    D     2
DF__OUTPT_WOR__CREAT__6E6D022F    D     2

What this has done is that it has assigned a dense rank number to all the records that have the same [TYPE] value.

And here is an example demonstrating all the windowing function usage:

SELECT
[NAME]
,        [TYPE]
/*************************************
Ranking Windowing Functions
**************************************/
,        RANK() OVER (PARTITION BY [TYPE] ORDER BY [OBJECT_ID])    AS RNK
,        DENSE_RANK() OVER (ORDER BY [TYPE])                        AS DENSE_RNK
,        ROW_NUMBER() OVER (PARTITION BY [TYPE] ORDER BY [NAME])    AS ROWNUMBER
,        NTILE(10) OVER (ORDER BY [TYPE])                        AS NTILE_VAL
/*************************************
Aggregate Windowing Functions
— Using random columns just to illustrate usage
**************************************/
,        SUM([SCHEMA_ID]) OVER (PARTITION BY [TYPE])                AS SUM_AGGR
,        AVG([SCHEMA_ID]) OVER (PARTITION BY [TYPE])                AS AVG_AGGR
,        COUNT(*) OVER (PARTITION BY [TYPE])                AS COUNT_AGGR
,        MIN([OBJECT_ID]) OVER (PARTITION BY [TYPE])                AS MIN_AGGR
,        MAX([OBJECT_ID]) OVER (PARTITION BY [TYPE])                AS MAX_AGGR
FROM SYS.OBJECTS
GO

We have used random columns (Object_ID or Schema_ID) above just to illustrate the usage of the ranking and aggregate windowing functions.  BOL has some very good examples using Sales tables in the AdventureWorks database – here.

In the next post in this series, we will look into the rich functionality of analytics in Oracle.  We will cover functions like LEAD(), LAG(), FIRST_VALUE(), LAST_VALUE() and a couple of other very useful ones and will also look at ROW Type and RANGE Type Windows clauses.

Resources:

  • SQL Server BOL – here.
  • Oracle Documentation on Analytics – here.
  • In future releases of SQL Server, there will hopefully be a lot of additions in the space of Analytics – some of those features have been requested by many SQL Server MVPs at MSFT Connect – you can read more on it here.  It also has a link to MVP Itzik’s article.
  • Oracle Guru Tom Kyte has written extensively about analytics in his books and his portal – you will benefit from going through those discussions – here.
  • Good information on SQL Standards and also about the differences between different RDBMS (includes analytics as well) – here.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: