Data Compression in SQL Server 2008 – I
Posted by decipherinfosys on October 30, 2008
In one of our previous blog post, we had talked about backup compression feature in SQL Server 2008. In today’s post we are going to cover the data compression in SQL Server 2008. We can use data compression feature for these objects:
- Tables without any indexes.
- Tables with clustered and/or non-clustered indexes (includes the ones with partitioning).
- Non clustered indexes.
- Indexed views.
And the modes of compression are either row level or page level compression. The compression is handled completely on the database engine side so there is no changes to the application code. So, after enabling compression for an object, anytime that the data has to pass to and from the storage engine, it has to be compressed and uncompressed respectively. So, there is an extra CPU overhead involved but the amount of disk I/O saved by compression makes up for the CPU costs. So, let’s walk through an update statement and see what happens behind the scenes (assume that data compression has been done at the page level for the object):
update tableA set col1 = 2 where colx = 1; (colx is the Primary Key column)
In any update statement, a read is done before the write in order to identify the record that is being updated. So, the relational engine will make a request to the storage engine to retrieve that data record and the storage engine gets the record from the disk (assuming it is not in the buffer cache already) and puts it into the buffer cache – still in a compressed format. When the handover from the Storage Engine to the Relational Engine happens, that is when the uncompression happens. Now, having gotten the row that needs to be updated, the update is fired off and passed back to the storage engine and it again compresses it and keeps it in the buffer cache till the row is flushed to disk in the compressed format.
The obvious reasons for doing data compression are:
a) Disk space utilization improves.
b) Lesser disk I/O for read and write operations.
c) More data gets stored in the buffer cache since even in the buffer cache, it is compressed data.
Disadvantage is of course higher CPU load. Still have to do a real benchmark to see what the impact is.
In tomorrow’s post, we will cover the row level and page level data compressions and will also go through the data compression wizard.