Systems Engineering and RDBMS

Archive for November 21st, 2008

Using DISTINCT … just because

Posted by decipherinfosys on November 21, 2008

We have blogged about the DISTINCT clause in the past (common mis-conceptions about the DISTINCT clause) and what it does and how it gets mis-used at times. This is one of the common issues that we have seen in some of the queries written by junior folks or even at times by senior folks not well versed with SQL.  They add the DISTINCT clause to the Select list just to make sure that “in case” there are duplicates, it will all be taken care of by this wonderful clause.  Most of the time, the reason why they get duplicates is because of join conditions or because they did not write the query in different ways like for example: If you have a 3 table join and say tableA : tableB :: 1:N (1 to many relationship) and tableB:tableC::1:N (again a 1 to many relationship) and the 3 joins are together but one is only selecting the data from tableA and tableB and that set of columns provides unique set of data, however the developer also adds tableC in the join because there is a filter condition (where clause) on it.  That would result into duplicates when there are many records being returned from tableC for a record in tableB because of the 1:N relationship.  In that case, the developer ends up adding a DISTINCT clause in the select list to get the distinct set of data.  The developer instead could have chosen to just use an EXISTS clause like this:

select …<columns from tableA and tableB>…

from tableA

inner join table B

on tableA.col1 = tableB.col2

where …<criteria on tableA and tableB – if being used based on the business logic>…

and exists (select 1 from tableC where col3 = tableB.col4)

and this will negate the usage of the DISTINCT clause.  There are many other such scenarios as well.  The problem with using DISTINCT un-necessarily is that the code will then incur sorting costs un-necessarily since the DISTINCT clause requires that the resultset is sorted and the duplicates are eliminated from the result set.

We have seen even bigger issues when people start using distinct aggregates even when they are not needed or when there are better ways to write the query…by distinct aggregates, we mean, clauses like:

count(distinct invoice_nbr) or sum(distinct sales_qty) etc.

The problem exacerbates when there are a lot of mixing of such distinct aggregates with non-distinct aggregates, example:

select
count(distinct invoice_nbr),
sum(distinct sales_qty),
count(ship_date),
Avg(sales_price),

…etc.

Mixing of these distinct aggregates and non distinct aggregates in the same SQL statement’s select list leads to spooling and involves several re-reads of the intermediate results which is very expensive – even more expensive than computing those separately in separate queries.

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

SQL Server 2008 Compliance Guide

Posted by decipherinfosys on November 21, 2008

For those in the IT teams that are responsible for the compliance related activities for their SQL Server environments, here is an excellent whitepaper from MSFT on this topic.

Posted in SQL Server | Leave a Comment »

TechNet Whitepapers on SQL Server 2008

Posted by decipherinfosys on November 21, 2008

Yesterday night while researching an issue for a client of ours, we found the solution to the problem in one of the whitepapers on TechNet. There is a wealth of information TechNet and knowledge is power – so, when you do get some time from your busy schedules or even when you are troubleshooting issues, we would recommend that you go over the whitepapers at TechNet – all the SQL Server 2008 ones you can access over here. And if you are more of a video person (most of us are – seeing makes it easier to follow), TechNet also has some videos over here and am sure that the content will only grow with the passage of time.

Posted in SQL Server | Leave a Comment »

 
Follow

Get every new post delivered to your Inbox.

Join 74 other followers