Data Compression in SQL Server 2008 – II
Posted by decipherinfosys on October 31, 2008
Yesterday, we had given a brief summary of the data compression feature available in SQL Server 2008. Today, we will cover the row level and page level data compression and also the wizard to set it up for an object.
Row Level Data Compression:
As you know, the page size in SQL Server is fixed (Oracle, by contrast, lets you choose from a range of block sizes), so larger rows mean fewer records per page. If a query needs to do a range scan, it has to read a larger number of pages, which means more I/O. A classic example used to explain this is the cookie jar: with large cookies you might fit, say, 10 in the jar, whereas with smaller cookies you could fit, say, 100 in the same jar. So what does row level data compression do? It stores fixed-length data types in a variable-length format. For example, an INT normally takes 4 bytes of storage; if we store a value of 1, only 1 byte is used instead of 4, and the empty space is freed up. Similarly, a CHAR(10) would normally take 10 bytes; if we store only 3 characters, it takes only 3 bytes and the trailing blanks are not stored. Row compression also avoids storing NULL values and reduces the amount of metadata required to store a row. All of this lets us fit more rows per page.
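As a sketch, row level compression can be requested through T-SQL as well as through the wizard covered below (dbo.some_table here is just a placeholder name, not a table from this post):

```sql
-- Rebuild an existing table (and its clustered index, if any) with row compression.
-- dbo.some_table is a placeholder; substitute your own table name.
ALTER TABLE dbo.some_table
REBUILD WITH (DATA_COMPRESSION = ROW);
```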
Page Level Data Compression:
Page level compression does everything that row level compression does, plus prefix compression and dictionary compression. Row level compression is applied first, followed by prefix compression and then dictionary compression. Prefix compression removes repeating patterns at the start of the values in a column and substitutes an abbreviated reference, which is stored in a CI (Compression Information) structure placed just after the page header. Once prefix compression is done, dictionary compression kicks in: it also searches for repeated values, but not just within a single column's values – it looks for them anywhere on the page and likewise stores them in the CI structure. So, if the data is mostly random, page level compression won't help much; if the data is repetitive, it is much more useful.
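Page compression is requested the same way in T-SQL, and the resulting setting can be confirmed from the sys.partitions catalog view. A sketch (dbo.some_table is again a placeholder):

```sql
-- Rebuild a placeholder table with page level compression
ALTER TABLE dbo.some_table
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Verify the compression setting of each partition of the table
SELECT p.partition_number, p.data_compression_desc
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.some_table');
```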
So, enough of theory – let's see it in action now. We will use SSMS to set up the compression; it can also be done via T-SQL. Let's create a table:
create table dbo.big_table (col1 int, col2 char(100));
create clustered index big_table_ind_1 on dbo.big_table (col1);
Now, let’s populate it with some sample data:
set nocount on;
declare @i int;
set @i = 1;
while (@i <= 100000)
begin
    insert into dbo.big_table values (@i, 'value:' + cast(@i as varchar(10)));
    set @i = @i + 1;
end
In the SSMS Object Explorer, right click on the table, select Storage and then Manage Compression:
It will bring up the data compression wizard as shown below:
You can then click on next and choose to use the same compression type (row/page or none) for all the partitions:
It also allows us to see how much space we will save under the different compression schemes. So, let's try that out first – select row level compression and click Calculate to see what we will be able to save:
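The wizard's Calculate button can also be approximated in T-SQL with the sp_estimate_data_compression_savings stored procedure, which samples the object and reports current versus estimated compressed size without changing anything:

```sql
-- Estimate the savings for ROW and then PAGE compression on dbo.big_table
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'big_table',
    @index_id         = NULL,   -- NULL = all indexes on the table
    @partition_number = NULL,   -- NULL = all partitions
    @data_compression = 'ROW';

EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'big_table',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';
```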
As you can see, the estimated size drops from 11.070 MB to 2.422 MB. Now, let's try it with page level compression:
It has now dropped down to 1.25 MB. Let’s go with this one and click next. We will get a screen that looks like the one below:
We have the option of running the compression immediately, saving it as a script, or scheduling it for a later time. If you select to run it immediately, the wizard presents a summary; clicking Finish then compresses the object.
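If you pick the "save as script" option instead, the resulting script is along these lines (a sketch; the exact script the wizard emits may differ in detail):

```sql
-- Rebuild all partitions of dbo.big_table with page level compression
ALTER TABLE dbo.big_table
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);
```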
Now, one thing to remember, as we mentioned yesterday, is that compression is set at the object level. Since a table and its clustered index are treated as one and the same thing, if you have a nonclustered index on big_table, you need to compress it separately. That may seem a bit cumbersome at first, but it gives us a lot of flexibility – especially if you benefit more by compressing only certain objects in your environment and leaving the rest as is, either because there is not much benefit or because the extra CPU load is not acceptable.
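To illustrate compressing a nonclustered index separately, here is a sketch using a hypothetical index big_table_ind_2 (not created earlier in this post):

```sql
-- Create a hypothetical nonclustered index on col2
CREATE NONCLUSTERED INDEX big_table_ind_2 ON dbo.big_table (col2);

-- The index is NOT compressed by the table-level rebuild;
-- it must be rebuilt with its own compression setting
ALTER INDEX big_table_ind_2 ON dbo.big_table
REBUILD WITH (DATA_COMPRESSION = PAGE);
```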