More on Index design
Posted by decipherinfosys on July 14, 2008
We have blogged a couple of times over the factors that should be considered when designing indexes. In one of such posts, we had also covered indexes over multiple columns. In all of those posts, you will see one common thing that we try to emphasize – do not try to come up with a “rules of thumb“. Here is a simple example (from one of the e-mails to a question to a reader);
Question: I am trying to see how to tune/optimize a query and I saw that an index on the where condition columns is missing. Should I just go ahead and create an index and add all the columns that are used in the where clause into that index?
The answer is of course: NO. First, we requested the reader to go ahead and read the post from above where we have discussed the column order in covered indexes and then also gave a simple example to help illustrate the point:
Say, you have this query:
select col1, col2, col3
where col1 >= @x and col1 <= @y
and col3 = @z
For this, we might want to create a covered index (in this order): col3, col2, col1. This is so that we can first of all process the data based on the equality operator on col3, then scan through and narrow down the data based on the >= and <= conditions on col1 and then get the value for col2 from the index itself rather than going through the table. Now, as we have always stated, understanding your data and it’s characteristics is very very important. Suppose that col3 is unique in this table! In that case, we would just create a single index on col3 and be done with it. Why? It will qualify for a single record and thus we will just do the row ID lookup on the table for that one single record.
So, bottom line is that please spend some time to understand the data characteristics – understand how indexes work, understand how the query is accessing the data and the data distribution – density and selectivity of the column data values as well as how often that particular index is going to be used – if an index is being created to support a report that runs only once every 6 months, is it worth to have it in the schema all year round or should we just create it prior to generating that report? Once you have that understanding, then only decide how you want to approach the design.