Systems Engineering and RDBMS

Logical IOs and Query Plans

Posted by decipherinfosys on October 9, 2008

One of the ways to look for and tune queries is to reduce the logical IOs. You can tune the LIOs by re-writing your SQL, tuning your SQL using indexes or indexed views (materialized views in Oracle lingo) – if appropriate, setting parameter variables like in the case of Oracle: sort_xxx parameters, optimize_index_xxx parameters, db_file_multi_block_read_count parameter. Logical IO or LIO means the buffer cache … LIO operation might have incurred a physical IO in order to get the data into the cache to begin with. One can reduce the physical IO in a number of ways, one of them being increasing the buffer cache but does that really fix the problem? No – it can infact create issues. We will discuss those details on PIO and LIO in another post. In this post, let’s take a sample and go through the LIO piece of tuning. We will use SQL Server as our RDBMS for the sake of this example:

In SQL Server, if you set statistics IO on, you will get an IO count for each object that is accessed in the query. Let’s create a dummy table and insert a million records into it:

create table DIS_LIO (COL1 int, COL2 nvarchar(20));
create index DIS_LIO_IND_1 on DIS_LIO (col1, col2);

declare @i int
set @i=0

set nocount on
while (@i <= 1000000)
begin
set @i=@i+1;
insert into DIS_LIO values (@i,’Dummy_Value:’ + cast(@i as nvarchar(10)))
end

We can query the meta-data to see how many pages are being used by this table and it’s index:

select
indid,
dpages
from sys.sysindexes
where id=OBJECT_ID(‘DIS_LIO’)

indid dpages
—— ———–
0 8664
2 8903

Indid = 0 represents the table (heap) and a value of 2 represents the non-clustered index that we created.

Now, let’s try different permutations to query this data and see what we get as far as the LIO is concerned. The point being that depending upon how the query optimizer forms the plan and accesses the data, the LIO can vary.

First Query: We won’t use any index hint and will directly fire off this query:

select * from DIS_LIO where col1 <= 100;

Table ‘DIS_LIO’. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

|–Index Seek(OBJECT:([DIS].[dbo].[DIS_LIO].[DIS_LIO_IND_1]), SEEK:([DIS].[dbo].[DIS_LIO].[COL1] <= CONVERT_IMPLICIT(int,[@1],0)) ORDERED FORWARD)

Second Query: Let’s force it to use an index now:

select * from DIS_LIO with (index = 0) where col1 <= 100;

Table ‘DIS_LIO’. Scan count 3, logical reads 8664, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

|–Parallelism(Gather Streams)
|–Table Scan(OBJECT:([DIS].[dbo].[DIS_LIO]), WHERE:([DIS].[dbo].[DIS_LIO].[COL1]<=(100))) |–Parallelism(Gather Streams)
|–Table Scan(OBJECT:([DIS].[dbo].[DIS_LIO]), WHERE:([DIS].[dbo].[DIS_LIO].[COL1]<=(100)))

Third Query: Let’s use the non-clustered index now:

select * from DIS_LIO with (index = 2) where col1 <= 100;

Table ‘DIS_LIO’. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

|–Index Seek(OBJECT:([DIS].[dbo].[DIS_LIO].[DIS_LIO_IND_1]), SEEK:([DIS].[dbo].[DIS_LIO].[COL1] <= (100)) ORDERED FORWARD)

So, what is the difference between the three? The plan generated by the first and third query is the same (I executed the first query later on so you would also see auto parameterization in place there). Both the queries are doing an index seek operation as compared to query 2 which is doing a table scan operation.
As you can see for Query #2, the logical reads are 8664 (the same number as we got from the sysindexes query) – so, in the case of the table scan, it counted each page as a single logical IO. In the case of the index seek operations, we see 4 logical IOs and 1 scan count (the single index seek). So, it did 4 LIOs per index seek – SQL Server has to first go to the root page of the index to be able to locate the right data page.

This was to help understand the concept – one can easily see that tuning for the LIOs in the above case will help in resolving the performance problem. But, and there always is a but, let’s consider this scenario:

drop table DIS_LIO;

create table DIS_LIO (COL1 int, COL2 nvarchar(20));
create index DIS_LIO_IND_1 on DIS_LIO (col1, col2);

declare @i int
set @i=0

set nocount on
while (@i <= 5)
begin
set @i=@i+1;
insert into DIS_LIO values (@i,’Dummy_Value:’ + cast(@i as nvarchar(10)))
end

And now, this query:

select
indid,
dpages
from sys.sysindexes
where id=OBJECT_ID(‘DIS_LIO’)

indid dpages
—— ———–
0 1
2 1

So, we have everything on a single page now. Let’s run the queries again:

First Query:

select * from DIS_LIO where col1 <= 5;

Table ‘DIS_LIO’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Second Query:

select * from DIS_LIO with (index = 0) where col1 <= 5;

Table ‘DIS_LIO’. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Third Query:

select * from DIS_LIO with (index = 2) where col1 <= 5;

Table ‘DIS_LIO’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

And now, if you see above, when doing a table scan, we are paying less LIOs – only 1 yet while we do the index seek operation, the LIOs are 2.  So, depending upon the data access method, the LIOs can be different.  If I change the above query to use the IN list iterator:

select * from DIS_LIO where col1 IN (1, 2, 3, 4, 5);

You will see:

Table ‘DIS_LIO’. Scan count 5, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So, now, 5 different seek operations happen because of the query plan that got generated (internally, it wil get implemented as multiple OR clauses).  And since it was 2 logical reads per seek, the total logical reads come out to be 10 in this case as compared to 1 if we had used a table scan.  So, be aware of these differences – depending upon the data access/optimzer path, the LIOs can be different.

Having said that, I still use LIO for tuning purposes all the time but do keep these subtle points in mind.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: