Systems Engineering and RDBMS


Posted by decipherinfosys on January 30, 2007

Functionally, they are the same (as compared to NOT IN vs NOT EXISTS which are functionally different in one scenario – read this post for the differences between NOT IN and NOT EXISTS clauses : ). However, there are performance implications of using one over the other that one needs to be aware of. Assume that we have two tables : TABLE_A and TABLE_B and the match is being done on TABLE_A.col1 = TABLE_B.col2. In that scenario, an in statement like:

select <select column list> from TABLE_A where col1 in (Select col2 from TABLE_B)

will get processes in this way:

1) The sub-query gets evaluated first and the results are distinct’ed and indexed,

2) The output from it is then joined with TABLE_A.

Re-writing the above query using the EXISTS clause will give:

Select <select column list> from TABLE_A

where exists (select 1 from Table_B where Table_B.col2 = Table_A.col1)

This gets evaluated in this order:

1) For every value of Table_A.col1, loop through and match the values in Table_B.col2.

2) If we get a match, select that value and move on to the next one. If there is no match, discard that value.

So, where should one use an IN vs the EXISTS clause? If the result of the sub-query “Select col2 from TABLE_B” is huge and the TABLE_A is a relatively small set and executing “select 1 from Table_B where Table_B.col2 = Table_A.col1” is very fast because of proper index on Table_B.col2, then an exists clause will be better since the optimizer can do a FTS on Table_A and then use the index to do the probe/seek operations for Table_B.

If the result of the sub-query is small, then the IN clause is much faster. If the results of the both the sub-query as well as the outer query is large, then either IN or EXISTS would work the same – it depends upon your indexing scheme.

Please do note that the example used above is a very simplistic one in order to illustrate the point – in real world, you would have queries that have additional filter criteria on those tables that narrows down the result sets. As a generic rule, if the result of the outer query is small and the result set of the inner sub-query is large, then use EXISTS – if it is the other way around, then use the IN clause.

One Response to “IN vs EXISTS”

  1. […] this query can be re-written to remove the FROM clause and use an EXISTS condition – read up here on some general guidelines for making use of the EXISTS condition – it is not a panacea for […]

Sorry, the comment form is closed at this time.

%d bloggers like this: