Systems Engineering and RDBMS

Data Compression in SQL Server 2008 – II

Posted by decipherinfosys on October 31, 2008

Yesterday, we had given a brief summary of the data compression feature available in SQL Server 2008. Today, we will cover the row level and page level data compression and also the wizard to set it up for an object.

Row Level Data Compression:

As you know, the page size in the case of SQL Server is fixed (Oracle does allow us options to select from a range of block sizes) so if you have larger sized rows, it means that you will have fewer records per page which means that if there is a query that needs to do a range search, it has to go over a large number of pages – which means more I/O. A classic example typically given to help explain this is that of a cookie jar. If you have a cookie jar and have larger sized cookies, you can fit in say 10 of them in that jar whereas if you had smaller sized cookies, you can fill in say 100 of them in the same jar. So what does row level data compression do? It turns the fixed length data types for the columns (example int which normally takes 4 bytes for storage) into variable length data types (say we store a value of say 1, then it will use only 1 byte instead of 4) and thus frees up the empty space. It can also ignore the null values as well. This in turn allows us to save more rows per page. It reduces the amount of meta-data required to store a row, does the same thing for other fixed data types like say CHAR(10) would take 10 bytes normally – in this case if we are storing only 3 characters, it will take only 3 bytes and the blank characters are not stored.

Page Level Data Compression:

Page level compression does what row level data compression does plus does dictionary compression and prefix compression. So, the row level compression is done first followed by prefix and then the dictionary compression – what prefix compression does is that the repeating patterns of the data at the start of a column values are removed and are substituted with an abbreviated reference which is stored in a CI (Compression Information) Structure which is after the page header in the page. Once the prefix compression is done, the dictionary compression kicks in and it also searches for the repeated values but not just in a column values – it searches for those anywhere on the page and same way stores them in the CI. So, if the data is more random, then the page level compression won’t help much – if the data is repetitive then the page level compression is more useful.

So, enough of theory – let’s see it in action now: We will use SSMS for displaying the compression. It can also be done via T-SQL. Let’s create a table:

use dis_test
go
create table dbo.big_table (col1 int, col2 char(100));
create clustered index big_table_ind_1 on dbo.big_table (col1);

Now, let’s populate it with some sample data:

set nocount on
go
declare @i int
set @i = 1
while (@i <= 100000)
begin
insert into dbo.big_table values (@i, ‘value:’ + cast(@i as varchar(10)))
set @i = @i + 1
end

In SSMS, Object Explorer, right click on the table, select storage and then Manage Compression:

It will bring up the data compression wizard as shown below:

You can then click on next and choose to use the same compression type (row/page or none) for all the partitions:

It also allows us to see how much space we will save based on different compression schemes. So, let’s try that out first – let’s select row level compression and hit on calculate to see what we will be able to save:

As you can see, we saved from 11.070 MB to 2.422 MB. And now, let’s try it with the page level compression:

It has now dropped down to 1.25 MB. Let’s go with this one and click next. We will get a screen that looks like the one below:

data_c_11

We have the option of running the compression at that time or saving it as a script or scheduling it for a later time. If you select to run immediately, it will present you the summary and then clicking on Finish, you will be able to compress it.

Now, one thing to remember is that as we mentioned yesterday, the compression is at the object level. Since a table with clustered index is treated as one and the same thing, if you have a non clustered index on this big_table, you would need to compress it separately. A bit cumbersome, you might think at first but this allows us a lot of flexibility – especially if you benefit more by compressing only certain objects in your environment and leaving the rest as is because either there is not much benefit to it or the CPU load is not acceptable.

3 Responses to “Data Compression in SQL Server 2008 – II”

  1. kobi barak said

    Hello .
    My name is KOBI Barak, I am an electronics engineer and a pilot
    A year and a half ago I developed an electronic unit that requires a unique compression..
    I decided to try to develop compression algorithm and solve the problem by myself. About a 2 month ago I finish to develop a compression algorithm that able to compress in rate of thousands percent,.
    It can be used in audio and video lossless compression applications, archiving and more.
    My algorithm is able to shrink any file to the size of 4K (in price of time) (1 GB takes 15 mints).
    It sound a little imaginary, but it is not it is real End working.
    I will be happy to represent what I doing in the subject and I know that it can use for many applications.
    Sincerely yours.
    kobi Barak.
    Jbarak@actcom.co.il.

  2. […] issue similar to what we ran into last week.  At one of our client sites, they were using CDC and data compression features in SQL Server 2008.  When they tried to restore the database in their development […]

  3. […] compression feature in SQL Server 2008 in the first two parts of this series – Part 1 and Part 2.  We had also covered backup compression in one of the posts – here.  We have received […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: