Systems Engineering and RDBMS

XML Indexes in SQL Server 2005

Posted by decipherinfosys on November 7, 2007

In our previous posts, we had covered some of the new features of XML in SQL Server 2005. You can search for XML on this site and get those posts. Since the XML data-type can hold up-to 2GB worth of data and since in the code might be querying portions of that XML document, it is important to be able to index the columns that are declared with the XML data type. There are two types of indexes that can be created on the XML data-type columns:

* Primary Indexes: These cover all the elements in the column.
* Secondary Indexes: These cover paths, values and properties.

The syntax is also very similar to the one that is used to create the clustered and non-clustered indexes in SQL Server. One key thing to note though is that you need to have a clustered primary key on the table on which you want to create the index on XML data-type column.

The primary index essentially stores information on :
a) each tag name in the XML,
b) the type and the value for that node
c) the relative path from the root of the document to that tag, and
d) the primary key for that table.

The secondary index can only be created once the primary XML index is in place and there are three types of secondary XML indexes:
a) PATH secondary XML index (used for queries that utilize XML path expressions),
b) VALUE secondary XML index (for searching for values within the XML document), and
c) PROPERTY secondary XML index (for retrieval of any specific object properties withint the XML document).

Let’s follow this up with an example:

/************************************************************************
Sample table with the Priamry Key and the XML data-type column
*************************************************************************/
CREATE TABLE TEST_XML_INDX (ID_VAL INT IDENTITY PRIMARY KEY, XML_DATA XML)
GO

/************************************************************************
Primary XML index on the XML data-type column
*************************************************************************/
CREATE PRIMARY XML INDEX TEST_XML_INDX_IND_1 on TEST_XML_INDX (XML_DATA)
GO

/************************************************************************
Secondary XML indexes (three types) on the XML data-type column
*************************************************************************/
CREATE XML INDEX TEST_XML_INDX_IND_2 ON TEST_XML_INDX (XML_DATA) USING XML INDEX TEST_XML_INDX_IND_1 FOR PATH;
GO

CREATE XML INDEX TEST_XML_INDX_IND_3 ON TEST_XML_INDX (XML_DATA) USING XML INDEX TEST_XML_INDX_IND_1 FOR VALUE;
GO

CREATE XML INDEX TEST_XML_INDX_IND_4 ON TEST_XML_INDX (XML_DATA) USING XML INDEX TEST_XML_INDX_IND_1 FOR PROPERTY;
GO

Now, let us insert into this column the same XML data that we had used in one of our posts before and see how we can access the data using the nodes() method. You can access the code for that XML data in this post:

https://decipherinfosys.wordpress.com/2007/09/02/using-nodes-method-to-de-construct-xml-as-relational-data/

You can use the script in the attachment:

blog_xml_1.doc

The output of that execution is:

Mtng_Notification    Share_Holder_ID      Votable_Shares                          Vote_Action
 -------------------- -------------------- --------------------------------------- -----------
 444                  987654321            100.00000                               FOR
 999                  45454545454          657.00000                               FOR
 999                  676767676767         654.00000                               FOR

The CROSS APPLY in the code above applies the nodes() method to each row in the XML_DATA xml data-type column and filters out the rows that do not match the filter criteria. So, what kind of an execution plan would it force in the presence of the indexes since that is what we are interested in finding out. You can run the above code with:

SET SHOWPLAN_TEXT ON
GO

and you will see in the execution plan that the PRIMARY XML index is being used for the data retrieval. A primary XML index is always a good idea when you are going to be writing queries to search against the XML columns that contain large amounts of data. Creation of secondary indexes though should be done only when you have a defined need of performing the PATH, VALUE or PROPERTY type of searches.

For more reading on this topic, we would encourage you to read the MSFT whitepaper on this topic that you can access over here.

One Response to “XML Indexes in SQL Server 2005”

  1. […] proper filter conditions in the WHERE clause on properly indexed columns and is also making use of XML indexes in order to ensure a good execution plan for the […]

Sorry, the comment form is closed at this time.

 
%d bloggers like this: