Systems Engineering and RDBMS

Having vs Where Clause

Posted by decipherinfosys on July 29, 2008

Got another question from one of the readers asking about: “whether it is ok to substitute the WHERE clause with the HAVING clause since in the absence of the WHERE clause and a grouping condition, HAVING essentially acts as a WHERE clause onlY? What is the difference in using these two clauses?”

One thing to understand first is the basic function of the WHERE and the HAVING clauses – WHERE is used to apply filter conditions on the table columns and HAVING is used to apply the filters typically after the aggregations are done.  It is also important to understand the progression of the evaluation of the execution plan.  In the presence of both the WHERE and the HAVING clause, the WHERE condition will get evaluated prior to the HAVING condition.  Let’s take an example (using AdventureWorks sample user database in SQL Server 2005):

select Color, SUM(StandardCost) as SUM_STD_COST

from Production.Product

where Color IN (‘Blue’, ‘Black’)

group by Color

And one can also re-write this to use HAVING instead of the WHERE clause:

select Color, SUM(StandardCost) as SUM_STD_COST

from Production.Product

group by Color

having Color IN (‘Blue’, ‘Black’)

The second one evaluates the filter after the grouping and the first one applies the filter and then does the grouping and is more effective.  For this simple SQL statement, if you look at the execution plan, you will see that the optimizer is smart to change the second SQL’s execution plan to do a filter before the grouping since in the absence of the where clause,

From the first SQL:

————————————————————————-
|–Stream Aggregate(GROUP BY:([AdventureWorks].[Production].[Product].[Color]) DEFINE:([Expr1003]=SUM([AdventureWorks].[Production].[Product].[StandardCost])))
|–Sort(ORDER BY:([AdventureWorks].[Production].[Product].[Color] ASC))
|–Clustered Index Scan(OBJECT:([AdventureWorks].[Production].[Product].[PK_Product_ProductID]), WHERE:([AdventureWorks].[Production].[Product].[Color]=N’Black’ OR [AdventureWorks].[Production].[Product].[Color]=N’Blue’))

And for the second SQL (after flushing the cache):

————————————————————————————-
|–Stream Aggregate(GROUP BY:([AdventureWorks].[Production].[Product].[Color]) DEFINE:([Expr1003]=SUM([AdventureWorks].[Production].[Product].[StandardCost])))
|–Sort(ORDER BY:([AdventureWorks].[Production].[Product].[Color] ASC))
|–Clustered Index Scan(OBJECT:([AdventureWorks].[Production].[Product].[PK_Product_ProductID]), WHERE:([AdventureWorks].[Production].[Product].[Color]=N’Black’ OR [AdventureWorks].[Production].[Product].[Color]=N’Blue’))

So, the optimizer does the right thing in this case…however, in the case of complex queries, it would be an issue.  So, use the WHERE clause to filter off the records using indexed columns for good performance and use Having only for applying filters at the top of the aggregations.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: