Systems Engineering and RDBMS

Another use of the checksum functions

Posted by decipherinfosys on February 11, 2009

We have discussed the checksum functions before in our blog posts – you can access those posts here and here.  Yet another use of checksum functions is when you want to quickly compare two records to see whether anything has changed – this is a typical requirement at the time of data loading from a source to the destination system.  One has to match up the records based on the primary key or alternate key(s) and then ascertain which records have changed so that one can then update them.  If you are lucky, the source data that you are using might already have a flag (or status code value) to indicate whether a record changed or not and then proceed to fire off the update for that entire record.  However, at times you will not be getting such a flag option in which case you need to determine which records are for update and which are there for an insert.  Instead of checking one column at a time for the changed records in order to ascertain which attributes have changed, folks typically fire off the update (for all the matching records) to all the non-primary, non-alternate key attributes which is not only a waste of resources, it is not optimal.  The alternative is not pretty either – checking each column to see whether it has changed or not and then only fire off the update for those subset of records.

Beginning SQL Server 2008, one can make use of the MERGE command to do this work.  And prior to SQL Server 2008, one can make use of the checksum functions.  One can improve this process by just doing a checksum on the source and the destination rows.  So, join the source and the destination rows based on the primary key/alternate key(s) and then compare the checksum values – the rows where the checksum has changed are the ones that need to be updated.  Do remember that the regular checksum function is case insensitive and Binary_Checksum function is case sensitive.  There is an outside chance of the checksum functions returning the same value even when the source and the destination rows do not match but the chances of that happening for the matching primary key records is negligible.

Here is an example:

create table tblSource (col1 int primary key, col2 int, col3 nvarchar(100));
create table tblDestination (col4 int primary key, col5 int, col6 nvarchar(100));

insert into tblSource values (1, 10, ‘Row 1 is same’);
insert into tblDestination values (1, 10, ‘Row 1 is same’);

insert into tblSource values (2, 1, ‘Row 2 is different’);
insert into tblDestination values (2, 10, ‘Row 2’);

And here is the SQL to do the comparison:

select col1, col2, col3, col4, col5, col6, checksum(col1, col2, col3) as Source_Checksum, checksum(col4,col5, col6) as Destination_Checksum
from dbo.tblSource as Source
inner join dbo.tblDestination as Destination
on Source.col1 = Destination.col4
where checksum(col1, col2, col3) <> checksum(col4,col5, col6);

And you will get this output:

col1    col2    col3    col4    col5    col6    Source_Checksum    Destination_Checksum
-------------------------------------------------------------------------------------------------------
2    1    Row 2 is different    2    10    Row 2    1950188260    -1661133429

As you can see from above, the source and the destination checksums are different for the second row.  You can use this technique in order to quickly make an analysis of whether there has been a difference between the source and the destination row.  In addition, you can do this in SSIS as well – you can download the checksum transformation from sqlis.com.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: