Systems Engineering and RDBMS

Archive for November 19th, 2007

Change Data Capture Feature in SQL Server 2008

Posted by decipherinfosys on November 19, 2007

We have been blogging for quite some time now on the new feature sets of the upcoming release of SQL Server – SQL Server 2008. You can search for SQL Server 2008 on this blog and get to those posts. In this post, we wil cover another new feature called “Change Data Capture”. Using this feature, one can capture the DML activities (insert, update and delete) that take place against the tables. There are two steps that need to happen in order for this to take place: A login that has sysadmin rights first needs to enabled the database for change data capture and then a member of the db_owner database role needs to enable the tables for which the DML activities need to be captured.

Let’s first enable the database and then a table for demonstrating this and then we will go over how the capture process works internally.

On our instance, currently none of the databases is configured for CDC (Change Data Capture). You can see that from the IS_CDC_ENABLED column of SYS.DATABASES:


 -------------- -------------------
 0              master
 0              tempdb
 0              model
 0              msdb
 0              MDW
 0              ReportServer
 0              ReportServerTempDB
 0              AdventureWorks
 0              DECIPHER_TEST

Now, in order to enable it for the database DECIPHER_TEST, we will have to run the stored procedure: “sys.sp_cdc_enable_db_change_data_capture” in the context of DECIPHER_TEST database:

EXEC sys.sp_cdc_enable_db_change_data_capture

Now, if we execute the first query, we will see that DECIPHER_TEST has been enabled for CDC:

 -------------- -------------------
 0              master
 0              tempdb
 0              model
 0              msdb
 0              MDW
 0              ReportServer
 0              ReportServerTempDB
 0              AdventureWorks
 1              DECIPHER_TEST

So, what exactly did SQL Server do when we enabled the CDC option? It went ahead and created the cdc schema, the cdc user, the meta-data tables and other system objects that will be used for this process. You can look at the image below to see some of those objects that were created.


Now that we have enabled the database, we will enable the table TEST_TBL in the dbo schema for the CDC process. In order to do so, a member of the db_owner fixed database role needs to create a capture instance for the TEST_TBL table. We need to first ensure that the SQL Server Agent is running and then use the “sys.sp_cdc_enable_table_change_data_capture” procedure:

EXECUTE sys.sp_cdc_enable_table_change_data_capture
@source_schema = N’dbo’
, @source_name = N’TEST_TBL’
, @role_name = N’cdc_Admin’

This execution will then create two jobs that you will see in the informational messages as part of this execution:

Job ‘cdc.DECIPHER_TEST_capture’ started successfully.
Job ‘cdc.DECIPHER_TEST_cleanup’ started successfully.

In addition, if you want to track which tables have been marked for the CDC process, you can use this SQL:

select is_tracked_by_cdc, name, type from sys.tables where type = ‘u’

The ones marked with 1 are the ones that are tracked.

is_tracked_by_cdc name                                                                                                                             type
 ----------------- -------------------------------------------------------------------------------------------------------------------------------- ----
 0                 ddl_history                                                                                                                      U
 0                 lsn_time_mapping                                                                                                                 U
 0                 captured_columns                                                                                                                 U
 0                 index_columns                                                                                                                    U
 0                 dbo_TEST_TBL_CT                                                                                                                  U
 1                 TEST_TBL                                                                                                                         U
 0                 systranschemas                                                                                                                   U
 0                 change_tables                                                                                                                    U

By creating a capture instance, a change table is also created by this process. There are a couple of options that can be utilized when specifying the capture process for a table:

a) You can specify which columns you want to capture,
b) You can specify an index to help uniquely identify the rows in the change table,
c) Name of a fileroup in which that change table should be created,
d) There is a concept of net change tracking with CDC. By default, only one UDF is created to access the data in the change table: cdc.fn_cdc_get_all_changes_<name of the capture instance>. In our example from above, the name of the UDF is cdc.fn_cdc_get_all_changes_dbo_TEST_TBL. You can also use the parameter @supports_net_changes and assign it a value of 1 to create another UDF which will return only one change for each row that was changed in a given interval.

So, essentially when the DML operations are made against the source tables, these are then recorded into the transaction log of the database.  The CDC process that we just set up then reads the log records and inserts modifications made to captured columns in the corresponding change table(s) in the CDC schema.  Now, let us follow it up with an example of DML operations against the TEST_TBL table. If you recall from a previous posts in which we had covered some other T-SQL features of SQL 2008 (here and here), this table has three columns: COL1, COL2 and COL3 and this set of data:


COL1        COL2       COL3
 ----------- ---------- -----------
 1           A          10
 1           A          20
 1           A          30
 1           B          90
 2           A          30
 2           A          100
 3           C          110
 3           C          120

Now, we will execute these DML statements against this table:

delete from TEST_TBL;
insert into TEST_TBL values (1, ‘A’, 1000);

Update dbo.TEST_TBL
set COL2 = ‘X’
where col1 = 1;

And this should have created the log enteries into the table in the CDC schema. Let’s query that using the UDF that was created for us when we had set up the CDC process:

DECLARE @start_time datetime, @end_time datetime, @from_lsn binary(10), @to_lsn binary(10);
SET @start_time = GETDATE()-1
SET @end_time = GETDATE();
SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn(‘smallest greater than or equal’, @start_time);
SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn(‘largest less than or equal’, @end_time);

Show the changes that happened during that query window
SELECT __$start_lsn, __$seqval,
case __$operation
when 1 then ‘Delete’
when 2 then ‘Insert’
when 3 then ‘Update (Before)’
when 4 then ‘Update (After)’
end as Operation,
Col1, Col2, Col3
FROM cdc.fn_cdc_get_all_changes_dbo_test_tbl (@from_lsn, @to_lsn, ‘all update old’)
order by 1, 2;

__$start_lsn           __$seqval              Operation       Col1        Col2       Col3
 ---------------------- ---------------------- --------------- ----------- ---------- -----------
 0x00000019000001CF000B 0x00000019000001CF0002 Delete          1           A          10
 0x00000019000001CF000B 0x00000019000001CF0003 Delete          1           A          20
 0x00000019000001CF000B 0x00000019000001CF0004 Delete          1           A          30
 0x00000019000001CF000B 0x00000019000001CF0005 Delete          1           B          90
 0x00000019000001CF000B 0x00000019000001CF0006 Delete          2           A          30
 0x00000019000001CF000B 0x00000019000001CF0007 Delete          2           A          100
 0x00000019000001CF000B 0x00000019000001CF0008 Delete          3           C          110
 0x00000019000001CF000B 0x00000019000001CF0009 Delete          3           C          120
 0x00000019000001D20004 0x00000019000001D20002 Insert          1           A          1000
 0x00000019000001D30006 0x00000019000001D30002 Update (Before) 1           A          1000
 0x00000019000001D30006 0x00000019000001D30002 Update (After)  1           X          1000

You can imagine the benefits of the CDC feature – fine grained auditing, enhanced functionality for an ETL application that needs to incrementally populate a data mart. There are of course a lot of other things that need to be discussed pertaining CDC – what happens when the source table gets modified by a DDL command (change of data-type, addition/deletion of a column etc.), the overhead of having a CDC process – is it any different than using a DML trigger (the answer is Yes – we will cover it in a future blog post).

Posted in SQL Server | 2 Comments »