Systems Engineering and RDBMS

Back to the basics: Using DISTINCT with CASE expression

Posted by decipherinfosys on May 5, 2009

Today at a client site, a junior developer mentioned that he is getting an error when he tries to use ‘distinct’ clause with a ‘case’ statement.  In one of our previous blog post, we had blogged why it is not a good idea to use ‘distinct’ clause just in case as oppose to whenever it is absolutely essential. In this blog post, we will cover the scenario in which we need to use ‘distinct’ clause with ‘case’ statement. This combination is required when more than one table having 1:N relationships are involved in query and we have to aggregate data at the header level. Let us start   with an example. Connect to SQL*Plus and create following tables. Little added complexity in this case was to pivot the data based on the status and hence we have to use ‘case’ statement as well. We wanted to create a query, which indicates for a given customer how many orders have been placed, fulfilled and cancelled based on the status code along with the distinct number of items, total units and total orders.

CREATE TABLE CUST_ORDER
(
ORDER_NO  VARCHAR2(10),
CUST_ID VARCHAR2(20),
STAT_CODE NUMBER(2),
CONSTRAINT PK_CUST_ORDER PRIMARY KEY(ORDER_NO)
)
/

CREATE TABLE CUST_ORDER_DTL
(
ORDER_NO  VARCHAR2(10),
ORDER_SEQ NUMBER(5),
ITEM_ID   VARCHAR2(10),
ORD_QTY   NUMBER(5,2) DEFAULT 0 NOT NULL,
CONSTRAINT PK_CUS_ORDER_DTL PRIMARY KEY (ORDER_NO,ORDER_SEQ)
)
/

ALTER TABLE CUST_ORDER_DTL ADD CONSTRAINT FK_CUST_ORDER_DTL
FOREIGN KEY(ORDER_NO) REFERENCES CUST_ORDER(ORDER_NO)
/

Let’s populate it with appropriate data.

--CUST_ORDER
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8765795', 'A05040013', 10);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8780045', 'A05040013', 10);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8782351', 'A05040013', 10);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8791268', 'A05040013', 20);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8794358', 'A05040013', 20);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8795343', 'A05040013', 20);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8804469', 'A05040013', 20);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8806861', 'A05040013', 20);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8808468', 'A05040013', 99);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8811897', 'A05040013', 99);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8820842', 'B08150003', 10);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8866542', 'B08150003', 10);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8887538', 'B08150003', 10);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8901983', 'B08150003', 99);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8907800', 'B08150003', 20);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8932068', 'B08150003', 20);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8938061', 'B08150003', 20);
Insert into CUST_ORDER(ORDER_NO, CUST_ID, STAT_CODE) Values('ORD8954227', 'B08150003', 20);

–CUST_ORDER_DTL
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8804469’, 2, ‘783869’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8804469’, 3, ‘277603’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8808468’, 1, ‘777555’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8782351’, 1, ‘781466’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8791268’, 1, ‘143158’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8794358’, 3, ‘145851’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8795343’, 1, ‘903218’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8780045’, 1, ‘156462’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8811897’, 1, ‘779778’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8806861’, 2, ‘783901’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8765795’, 1, ‘91485’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8866542’, 3, ‘206539’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8820842’, 1, ‘206539’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8954227’, 1, ‘206539’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8901983’, 1, ‘206539’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8907800’, 5, ‘143136’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8932068’, 1, ‘206539’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8887538’, 1, ‘814600’, 1);
Insert into CUST_ORDER_DTL(ORDER_NO, ORDER_SEQ, ITEM_ID, ORD_QTY)
Values(‘ORD8938061’, 1, ‘814599’, 1);

Here is the query to achieve required result along with the output. Note the usage of ‘distinct’ clause outside of case statement.

SELECT oh.cust_id,
COUNT(DISTINCT oh.order_no) Total_Orders,
COUNT(DISTINCT (CASE WHEN oh.stat_code = 10 THEN oh.order_no
ELSE NULL
END)
) Placed,
COUNT(DISTINCT (CASE WHEN oh.stat_code = 20 THEN oh.order_no
ELSE NULL
END)
) Fulfilled,
COUNT(DISTINCT (CASE WHEN oh.stat_code = 99 THEN oh.order_no
ELSE NULL
END)
) Cancelled,
COUNT(DISTINCT od.item_id) Distinct_Items,
SUM(od.ord_qty) Total_Units
FROM cust_order oh
INNER JOIN cust_order_dtl od ON od.order_no = oh.order_no
GROUP BY oh.cust_id

CUST_ID     TOTAL     PLACED  FULFILLED  CANCELLED UNIQUE_ITEMS TOTAL_UNITS
———- —— ———- ———- ———- ———— ———–
A05040013      10          3          5          2           11          11
B08150003       8          3          4          1            4           8

Developer was getting an error because he was putting the ‘distinct’ clause inside of the case statement. In general whenever we have to use ‘distinct’ clause along with case statement, it always resides outside the case statement unless we are using a sub-query in the case statement. If we put ‘distinct’ clause inside case statement, we will get ora-00936, missing expression error.

Resources:

  • Oracle 11g PL/SQL Reference Manual – here.

4 Responses to “Back to the basics: Using DISTINCT with CASE expression”

  1. holy carp. I was sitting here for a few days trying to figure that one out. I was summing and counting all wrong. Thank you 4 year old internet posts.
    doing this in MS SQL Server 2005 and it was universal syntax.

    I was using a common table expression (MS SQL thing) to simplify my thought process for the query but, regardless, the part I was then stuck on was how I was going to get past this.
    since I had many several rendering sites who could’ve seen a person more than once yet that person could’ve gone to any one of the sites anything I did was aggregating their visits to other sites even though I was grouping by a particular site.

    this is all I needed:

    select
    CTE.rendering_site
    ,count(distinct(CTE.person_id)) as seen
    ,count(distinct(case when CTE.rendering_site = CTE.pcp_site then CTE.person_id else null end)) as seen_empanelled

    from CTE
    group by CTE.rendering_site
    order by 1

    • … there’s probably a more elegant way of doing this out there but I’m using SQL Server Reporting Services and doing the sums and calculations within the report post processing was taking a while as it would pull about 100,000 entries into my local machine so I wanted to do the aggregating in my SQL statement.
      Brilliant and thank you.

      • … and I hate DISTINCT statements as well. I feel like I haven’t filtered my data smartly but this was a scenario where I don’t think I could get around it as I was already having to filter out data to get to this point (hence the CTE).

  2. Thanks this worked great for me. I too dislike using distinct but it was the most efficient way to do a report I was working on.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: