Systems Engineering and RDBMS

Archive for December, 2007

Sorting IP Addresses

Posted by decipherinfosys on December 28, 2007

This is a simple but interesting issue. One of the IT guys at a client site had asked the client DBA to provide him with the names of the servers, their domains and other bits of information, sorted by IP address. Now, IP addresses are represented in dotted decimal notation, i.e. four numbers, each ranging from 0 to 255 and separated by dots. Each number from 0 to 255 can be represented by 8 bits and is thus called an octet. Some first-octet values have special meaning: 127, for example, represents the local computer. The octet values 0 and 255 are not acceptable in some situations, though 0 can be used as the second and third octet.

So, as you can imagine, unless we store the data in a sort-friendly way, sorting it requires some string manipulation. (Two of the sample values below use 256 as the first octet; 256 is not a valid octet value, but the strings still serve to illustrate the sorting problem.) Let's follow this up with an example:

CREATE TABLE IP_ADDR (COL1 NVARCHAR(30));
INSERT INTO IP_ADDR VALUES ('30.33.33.30');
INSERT INTO IP_ADDR VALUES ('256.10.1.2');
INSERT INTO IP_ADDR VALUES ('256.255.10.2');
INSERT INTO IP_ADDR VALUES ('127.0.0.1');
INSERT INTO IP_ADDR VALUES ('132.22.33.44');
INSERT INTO IP_ADDR VALUES ('132.10.30.1');
INSERT INTO IP_ADDR VALUES ('132.1.1.132');
INSERT INTO IP_ADDR VALUES ('10.20.30.10');
Now, if we order by COL1, then we will get:

SELECT * FROM IP_ADDR ORDER BY COL1;

COL1
——————————
10.20.30.10
127.0.0.1
132.1.1.132
132.10.30.1
132.22.33.44
256.10.1.2
256.255.10.2
30.33.33.30

As you can see from above, 30.33.33.30 comes last even though it belongs right after 10.20.30.10, and the 132.x addresses are similarly mis-ordered. Now, if we use the SUBSTRING (SQL Server) or SUBSTR (Oracle) function and make use of CHARINDEX (SQL Server), INSTR (Oracle) or LOCATE (DB2 LUW), we can easily do these manipulations. Let us take SQL Server as an example; the same methodology applies to Oracle and DB2 as well, but do keep in mind the differences between CHARINDEX(), INSTR() and LOCATE(), which we have discussed before in some of our blog posts.

SELECT
    SUBSTRING(COL1, 1, CHARINDEX('.', COL1) - 1) AS FIRST_OCTET,
    CAST(SUBSTRING(COL1, CHARINDEX('.', COL1) + 1, CHARINDEX('.', SUBSTRING(COL1, CHARINDEX('.', COL1) + 1, LEN(COL1))) - 1) AS INT) AS SECOND_OCTET,
    *
FROM IP_ADDR
ORDER BY
    CAST(SUBSTRING(COL1, 1, CHARINDEX('.', COL1) - 1) AS INT),
    CAST(SUBSTRING(COL1, CHARINDEX('.', COL1) + 1, CHARINDEX('.', SUBSTRING(COL1, CHARINDEX('.', COL1) + 1, LEN(COL1))) - 1) AS INT);

The SQL above shows how to use the CHARINDEX() and SUBSTRING() functions to separate out the first and the second octets. You can do the same with the third and the fourth octets as well, and then order by all four in sequence, converting each to an integer: the first octet, then the second, and so on. Here are the results from the execution above:

FIRST_OCTET                    SECOND_OCTET COL1
------------------------------ ------------ ------------------------------
10                             20           10.20.30.10
30                             33           30.33.33.30
127                            0            127.0.0.1
132                            1            132.1.1.132
132                            10           132.10.30.1
132                            22           132.22.33.44
256                            10           256.10.1.2
256                            255          256.255.10.2
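In SQL Server, there is also a more compact alternative worth knowing about: the built-in PARSENAME() function splits any four-part dotted name, counting parts from the right. A sketch (this assumes every value has exactly four octets, since PARSENAME() returns NULL otherwise):

SELECT COL1
FROM IP_ADDR
ORDER BY
    CAST(PARSENAME(COL1, 4) AS INT),  -- first octet (part 4 is the leftmost)
    CAST(PARSENAME(COL1, 3) AS INT),
    CAST(PARSENAME(COL1, 2) AS INT),
    CAST(PARSENAME(COL1, 1) AS INT);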

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

TableDiff.exe utility in SQL Server 2005

Posted by decipherinfosys on December 27, 2007

We have talked about the command line utilities in SQL Server 2005 before and have covered some of them, like SQLCMD, in some detail. In this post, we will go over the tablediff.exe utility. This utility is typically used to troubleshoot data mismatches between two tables in a replication scenario, where the source table resides in an instance that acts as the Publisher and the destination table in an instance that acts as a Subscriber. It can also be used to perform column-level comparisons and to generate a T-SQL script to fix those discrepancies, and it has an option to do a fast comparison of the schema and the row counts of the tables. Below is a snapshot displaying all the different options of this utility and the location of the exe in a default installation of SQL Server 2005:

table_diff_1.jpg

Let’s follow this up with a simple example to illustrate the capabilities of this utility. It takes in two sets of input:

1) Connection Information for the Source and the Destination locations.

2) Comparison Options: a) comparison of schemas: regular or strict, b) comparison using row counts or column comparisons, and c) T-SQL script generation to sync up the destination table data with the source.

Let us create a table with the same structure in two different databases – for our demo sake, we are creating these on the same instance:

USE MAINDB
GO
CREATE TABLE TABLEDIFF_TEST (COL1 INT NOT NULL PRIMARY KEY, COL2 VARCHAR(10) NULL)
GO
INSERT INTO TABLEDIFF_TEST (COL1, COL2) VALUES (1, 'ABCD');

And then in another database, we will create the same table but with different data set:

USE MISYS
GO
CREATE TABLE TABLEDIFF_TEST (COL1 INT NOT NULL PRIMARY KEY, COL2 VARCHAR(10) NULL)
GO
INSERT INTO TABLEDIFF_TEST (COL1) VALUES (1);
INSERT INTO TABLEDIFF_TEST (COL1) VALUES (2);
INSERT INTO TABLEDIFF_TEST (COL1) VALUES (3);

And now, we can run the tablediff.exe utility from the command line to generate the differences and output it to an external file (we can also choose to write to a table):

table_diff_2.jpg
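The snapshot shows the actual command; for reference, the invocation was along these lines (a sketch reconstructed against the tablediff documentation; the output path is a placeholder, and -c requests column-level comparison):

tablediff.exe -sourceserver "(local)" -sourcedatabase "MISYS" -sourcetable "TABLEDIFF_TEST" -destinationserver "(local)" -destinationdatabase "MAINDB" -destinationtable "TABLEDIFF_TEST" -c -o "C:\TableDiff_Output.txt"

Using -f instead of (or along with) -o makes the utility also generate the fix script shown later in this post.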

And here is the output of that command:

Table [MISYS].[dbo].[TABLEDIFF_TEST] on (local) and Table [MAINDB].[dbo].[TABLEDIFF_TEST] on (local) have 3 differences.
Err        COL1  Col
Mismatch   1     COL2
Src. Only  2
Src. Only  3
The requested operation took 0.21875 seconds.

It is denoting that for the row where COL1 = 1 there was a mismatch in the data value of COL2, and that the records with COL1 = 2 and COL1 = 3 were present only in the source and not in the destination. Let us now generate the DML statements through this utility to bring the data in the destination database in sync (this is done by using the "-f" option):

Table [MISYS].[dbo].[TABLEDIFF_TEST] on (local) and Table [MAINDB].[dbo].[TABLEDIFF_TEST] on (local) have 3 differences.
Fix SQL written to DIFFIX.633343501980863750.sql.
Err        COL1  Col
Mismatch   1     COL2
Src. Only  2
Src. Only  3
The requested operation took 0.21875 seconds.

This time when we ran the command, it created a separate SQL file containing the DML statements (the name of the file is indicated above, and its location is the same as that of the tablediff.exe utility). Here are the contents of that file:

-- Host: (local)
-- Database: [MAINDB]
-- Table: [dbo].[TABLEDIFF_TEST]
UPDATE [dbo].[TABLEDIFF_TEST] SET [COL2]=NULL WHERE [COL1] = 1
INSERT INTO [dbo].[TABLEDIFF_TEST] ([COL1],[COL2]) VALUES (2,NULL)
INSERT INTO [dbo].[TABLEDIFF_TEST] ([COL1],[COL2]) VALUES (3,NULL)

This utility by itself would be pretty cumbersome to use, but you can combine it with a simple script file that loops through a list of tables, produces a difference report between source and destination, and generates all the DML statements needed to bring the data in sync. As stated before, it is typically used for replication but can also be used for general data and schema comparison and synchronization.

We covered it so that you are aware that something like this exists natively in SQL Server and can be used for troubleshooting issues. If you are really looking for an easy way to address such issues, though, you should look at Red Gate's tools (SQL Compare and SQL Data Compare).

Posted in SQL Server | Leave a Comment »

Black box trace in SQL Server 2000 and Default Trace in 2005

Posted by decipherinfosys on December 26, 2007

SQL Server 2000 had a black box trace which could be used to diagnose and troubleshoot issues with SQL Server. The name probably stems from the flight data recorder black box. It was typically used to diagnose AVs (Access Violations) or crashes, though it could be used for tuning purposes as well, such as capturing the trace information and feeding it to the Index Tuning Wizard in SQL Server 2000. This used to be one of the things that MSFT PSS always asked for in order to troubleshoot crashes. A SQL Server 2000 black box trace captured the following events:

1) Stored Procedure Execution (RPC:Starting),

2) T-SQL Batch Execution (SQL:BatchStarting), and

3) Errors and Warnings (Attention and Exception)

And for those events, the information that was captured was:

1) Date and Time of execution,

2) The name of the application, the name of the database, the name of the server/workstation from where the call originated and the name of the user that executed the query/procedure

3) The query and the error that occurred.

This trace file was written to the data directory, i.e. the "X:\Program Files\Microsoft SQL Server\MSSQL\data" directory, where X is the logical drive on which SQL Server is installed. The file created was blackbox.trc; it was written to in 128KB chunks and rolled over to a new file upon reaching the 5MB limit. The black box trace was started by using the system stored procedure sp_trace_create, with the trace option value set to 8 to indicate a black box trace. Example:

declare @ID int
-- option value 8 requests the black box trace
exec sp_trace_create @ID output, 8
-- status 1 starts the trace
exec sp_trace_setstatus @ID, 1

This was a very useful tool in SQL Server 2000, and it still works in SQL Server 2005. However, there is now also a default trace. It does not have much of an impact on the instance, but you should be aware that such a default trace is being collected on your behalf 🙂

In order to see this trace, you can read the data from the function fn_trace_getinfo() and pass in the value default to it.

SELECT * FROM fn_trace_getinfo(default);

On my instance, I get:

traceid     property    value
----------- ----------- --------------------------------------------------------------------------
1           1           2
1           2           C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\log_15.trc
1           3           20
1           4           NULL
1           5           1
2           1           8
2           2           \\?\C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\blackbox.trc
2           3           5
2           4           NULL
2           5           0

And this is because I am running a black box trace, and the default trace is also running on the instance. The black box trace is the one with a traceid value of 2 and the default trace is the one with a traceid value of 1. The five enumerated values in the property column represent: the trace options, the file name of the trace, the maximum size of the trace file, the stop time for the session, and the current status of the trace (a value of 0 indicates off and 1 indicates on). As you can see from above, I have stopped the black box trace but the default trace is still running.

Unlike the black box trace file, the trace file generated by the default trace is stored in the LOG directory:

X:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG

You can also check this configuration using sp_configure: after enabling "show advanced options", you can see it in the parameter "default trace enabled":

EXEC master.dbo.sp_configure 'allow updates', 1;
GO
EXEC master.dbo.sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO
sp_configure 'default trace enabled';
GO

Configuration option 'allow updates' changed from 1 to 1. Run the RECONFIGURE statement to install.
Configuration option 'show advanced options' changed from 1 to 1. Run the RECONFIGURE statement to install.

name                                minimum     maximum     config_value run_value
----------------------------------- ----------- ----------- ------------ -----------
default trace enabled               0           1           1            1

So, what does this default trace capture? The best way to look at all the events and the data that gets captured is to go to the LOG directory and open the trace file in Profiler. Then click on the properties tab to see what the trace captures. See the image below for such a snapshot:

dt_1.jpg
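Besides opening the file in Profiler, you can also query the default trace file directly from T-SQL via the fn_trace_gettable() function. A sketch (the file path is the one reported by fn_trace_getinfo() above; substitute the current path on your instance):

SELECT TOP 100
    e.name AS event_name,   -- translate the numeric EventClass into a readable name
    t.DatabaseName,
    t.ApplicationName,
    t.LoginName,
    t.StartTime
FROM fn_trace_gettable('C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\log_15.trc', DEFAULT) AS t
JOIN sys.trace_events AS e ON t.EventClass = e.trace_event_id
ORDER BY t.StartTime DESC;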

Another question that comes to mind after reading this is whether one should disable this trace, especially in production environment(s). That depends upon your environment. We would recommend that you take a look at the default trace events and columns that get captured and then make a decision keeping your production environment in mind. Do make a point to clear out the older trace files if you keep the trace running.

Posted in SQL Server | 2 Comments »

RML Utilities for SQL Server

Posted by decipherinfosys on December 23, 2007

The RML utilities are available from Microsoft for both x86 and x64. They can be used to process trace files, view reports on the performance of SQL Server instances, and establish baseline performance metrics so you can compare how an instance was performing a day before versus how it is performing now.

You can download these utilities from the MSFT site:

x86: http://www.microsoft.com/downloads/details.aspx?FamilyID=7edfa95a-a32f-440f-a3a8-5160c8dbe926&DisplayLang=en

x64: http://www.microsoft.com/downloads/details.aspx?familyid=B60CDFA3-732E-4347-9C06-2D1F1F84C342&displaylang=en

Posted in SQL Server | Leave a Comment »

Some common mis-conceptions about DISTINCT

Posted by decipherinfosys on December 22, 2007

We have seen this so many times at client sites that we decided to write about it. One common mistake is treating the DISTINCT clause as a function. People try something like:

SELECT DISTINCT (COL1), COL2
FROM DISTINCT_TEST;

and expect that they will get only the distinct values from COL1 and all the values from COL2. Let's take a step back and think about it for a second. How is that even possible? A clause like DISTINCT operates on the entire row, the whole record, and not on a single column. So the above query, though it will not give you an error, will return all the rows. Let's follow that up with an example before we delve into this further:

CREATE TABLE DISTINCT_TEST (COL1 NVARCHAR(10), COL2 INT);
INSERT INTO DISTINCT_TEST VALUES (N'ABC', 10);
INSERT INTO DISTINCT_TEST VALUES (N'ABC', 20);
INSERT INTO DISTINCT_TEST VALUES (N'XYZ', 30);
INSERT INTO DISTINCT_TEST VALUES (N'XYZ', 40);
INSERT INTO DISTINCT_TEST VALUES (N'ABC', 50);
GO

SELECT DISTINCT (COL1), COL2
FROM DISTINCT_TEST;

COL1       COL2
---------- -----------
ABC        10
ABC        20
ABC        50
XYZ        30
XYZ        40

So, as you can see from above, DISTINCT is NOT a function; it is a clause that applies to the entire record. So:

SELECT DISTINCT (COL1), COL2
FROM DISTINCT_TEST;

SELECT DISTINCT (COL1), (COL2)
FROM DISTINCT_TEST;

SELECT COL1, COL2
FROM DISTINCT_TEST;

are all one and the same thing. We have even seen people try:

SELECT DISTINCT(COL1), MAX(COL2)
FROM DISTINCT_TEST;

thinking that they will get only one unique value from COL1 and, for that value, the MAX value from COL2. Since DISTINCT is not a function, the above SQL gives an error. You need to rewrite the SQL with a GROUP BY clause, grouping on COL1:

SELECT COL1, MAX(COL2) AS MAX_VAL
FROM DISTINCT_TEST
GROUP BY COL1;

COL1       MAX_VAL
---------- -----------
ABC        50
XYZ        40

Another area of confusion that we have seen is when people try to order things by a column that is not included in the “SELECT DISTINCT …” list. Example:

SELECT DISTINCT COL1
FROM DISTINCT_TEST
ORDER BY COL2;

You will get the error:

Msg 145, Level 15, State 1, Line 1
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.

The above SQL seems perfectly fine: even if we order by COL2, we should get two records, ABC and XYZ, as the output. So why does SQL Server force us to include the ordered columns in the SELECT list as well? If you think about the data for a second, the second column has all distinct values. Since we are ordering by that column while asking for the distinct values of COL1 only, which record should the optimizer pick for ABC: the one with a COL2 value of 10, 20 or 50? And what if, say, another column were added to this table:

ALTER TABLE DISTINCT_TEST ADD COL3 NVARCHAR(30);

UPDATE DISTINCT_TEST SET COL3 = 'TEST DATA 1' WHERE COL1 = 'ABC' AND COL2 = 10;
UPDATE DISTINCT_TEST SET COL3 = 'TEST DATA 2' WHERE COL1 = 'ABC' AND COL2 = 20;
UPDATE DISTINCT_TEST SET COL3 = 'TEST DATA 2' WHERE COL1 = 'ABC' AND COL2 = 50;
UPDATE DISTINCT_TEST SET COL3 = 'TEST DATA 3' WHERE COL1 = 'XYZ' AND COL2 = 30;

So, the data now looks like:

COL1       COL2        COL3
---------- ----------- ------------------------------
ABC        10          TEST DATA 1
ABC        20          TEST DATA 2
XYZ        30          TEST DATA 3
XYZ        40          NULL
ABC        50          TEST DATA 2

And the query could be:

SELECT DISTINCT COL1, COL3
FROM DISTINCT_TEST
ORDER BY COL2;

In this case, which ordering should the optimizer pick to return a DISTINCT set of COL1 and COL3? There is no way for it to ascertain that and to place the record with COL2 = 20 ahead of the one with COL2 = 50. So the bottom line is that the column we want to order by must also be in the select list, so that the DISTINCT set of values can be determined. We will therefore change the above SQL to:

SELECT DISTINCT COL1, COL3
FROM DISTINCT_TEST
ORDER BY COL1;

COL1       COL3
---------- ------------------------------
ABC        TEST DATA 1
ABC        TEST DATA 2
XYZ        NULL
XYZ        TEST DATA 3
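If what you actually need is the distinct COL1 values ordered by COL2, one deterministic rewrite is to aggregate first and then order by an aggregate, so that it is explicit which COL2 value positions each group. A sketch using MIN():

SELECT COL1
FROM DISTINCT_TEST
GROUP BY COL1
ORDER BY MIN(COL2);  -- each distinct COL1 is placed by the smallest COL2 in its group

Being standard SQL, this rewrite works in SQL Server, Oracle and DB2 LUW alike.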

Posted in DB2 LUW, Oracle, SQL Server | 3 Comments »

Cumulative Update Package 5 for SQL Server 2005 SP2 has been released

Posted by decipherinfosys on December 21, 2007

You can get the cumulative update from this link:

http://support.microsoft.com/kb/943656/

It also has a consolidated list of all the KB articles related to the bug fixes that have been made in this cumulative update.

Posted in SQL Server | Leave a Comment »

Difference between Unique Constraint and Unique Index

Posted by decipherinfosys on December 21, 2007

This has been brought up a couple of times at client sites, so we decided to put down some comments on the topic. A unique constraint is implemented using a unique index, so from that perspective there is no difference between the two; only the syntax differs.

However, one thing that you need to be aware of is that the index creation options at your disposal when enforcing uniqueness through a unique index (other than a few, like FILLFACTOR in SQL Server) are not available to you when you use the unique constraint syntax. One can always change this later if one so wishes; our personal preference is to use the "CREATE INDEX" statement to enforce uniqueness.
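To illustrate, here is a sketch of the two syntaxes side by side in SQL Server; the table and column names are hypothetical and the option values are just examples:

-- unique constraint syntax: only a handful of index options (FILLFACTOR among them) can be specified
ALTER TABLE EMP ADD CONSTRAINT EMP_UQ UNIQUE (EMP_CODE);

-- unique index syntax: the full range of index creation options is available
CREATE UNIQUE NONCLUSTERED INDEX EMP_UQ_IND ON EMP (EMP_CODE)
WITH (FILLFACTOR = 90, PAD_INDEX = ON);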

If you want to read up on the difference between a Primary Key and a Unique Index, you can read it in one of our previous blog post here. And here’s another post detailing how one can handle multiple NULL values in a unique index in SQL Server and DB2 LUW.

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

FORCESEEK Table Hint in SQL Server 2008

Posted by decipherinfosys on December 20, 2007

In prior versions of SQL Server, there was a way to force the optimizer to consider an index in the execution plan; however, there was no way to force it to always perform a seek operation. That has now been added in the next release (tested against CTP5). A word of caution here: don't be trigger happy and start sprinkling hints everywhere. In the majority of scenarios, the optimizer makes the right decision for the execution plan; you just need to make sure it has the statistics available to build histograms that correctly reflect the data distribution, and it will do its job. However, there are always corner cases, and that is why SQL Server (as well as Oracle and DB2 LUW) has table-level, query-level and join-related hints available. The location of the hints and their syntax and usage are of course different in the three RDBMS. We have discussed some of these hints before, like the ORDERED hint in Oracle (same as OPTION (FORCE ORDER) in SQL Server).

The FORCESEEK hint is a table level hint. It can also be applied to views and applies to both the clustered as well as the non-clustered indexes. It can also be applied in conjunction with the index hint i.e. you can force the optimizer to use a specific index as well as force it to use a seek operation on it. Let’s create an artificial situation here to demonstrate the usage. We will create a table with a million records in it which has a column that has only 3 distinct values. The data will be heavily skewed for this column. Yes – we are trying to re-create the bind variable peeking (Oracle) or parameter sniffing (SQL Server) scenario to help demonstrate the usage of this new hint. We had covered that in some of our posts before – you can read them here and here.

Here goes:

SET NOCOUNT ON
GO

CREATE TABLE BIG_TABLE
(
COL1 INT IDENTITY NOT NULL,
COL2 TINYINT NOT NULL
)
GO

DECLARE @I INT
SET @I = 1
WHILE (@I <= 1000000)
BEGIN

IF (@I <=10)
INSERT INTO BIG_TABLE (COL2) VALUES (0)

IF (@I > 10 AND @I <= 100000)
INSERT INTO BIG_TABLE (COL2) VALUES (10)

IF (@I > 100000 AND @I <= 1000000)
INSERT INTO BIG_TABLE (COL2) VALUES (90)

SET @I = @I + 1
END
GO

Here is the data distribution that was created by this small script:

SELECT COL2, COUNT(*) AS REC_CNT
FROM BIG_TABLE
GROUP BY COL2
ORDER BY COL2;

COL2 REC_CNT
---- -----------
0    10
10   99990
90   900000

Now, let us create a non-clustered index on it:

CREATE NONCLUSTERED INDEX BIG_TABLE_IND_1 ON BIG_TABLE (COL2)
/*FILEGROUP CLAUSE*/
GO

And now, let's fire off a parameterized query against this using the data value of 90, which we know from above qualifies for 900,000 records, i.e. 90% of the records in the table.

SET SHOWPLAN_TEXT ON
GO
DECLARE @I INT
SET @I = 90
SELECT * FROM BIG_TABLE WHERE COL2 = @I
GO

Here is the execution plan for this statement:

StmtText
---------------------------------------------------------------------------------------------------------------
  |--Table Scan(OBJECT:([DEC_TEST_FS].[dbo].[BIG_TABLE]), WHERE:([DEC_TEST_FS].[dbo].[BIG_TABLE].[COL2]=[@I]))

And now, let us change the value being passed in to 0. As we can see from the data distribution above, the value 0 qualifies for only 10 records in the table, which gives it very good selectivity (0.001%). The optimizer should jump right at the index and use it:

DECLARE @I INT
SET @I = 0
SELECT * FROM BIG_TABLE WHERE COL2 = @I
GO

StmtText
---------------------------------------------------------------------------------------------------------------
  |--Table Scan(OBJECT:([DEC_TEST_FS].[dbo].[BIG_TABLE]), WHERE:([DEC_TEST_FS].[dbo].[BIG_TABLE].[COL2]=[@I]))

As we know from previous posts on the topic, since the execution plan is already in the cache (from the prior execution), the same plan gets used again this time even though the parameter has very good selectivity. There are ways to circumvent this issue even in prior versions of SQL Server, and we have discussed those on our blog site: always re-compiling the query regardless of the variable, using a constant instead of a parameter, or using plan guides.
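As a quick illustration of the re-compile workaround, a statement-level OPTION (RECOMPILE) forces a fresh plan for each execution, so the highly selective value of 0 gets its own plan instead of reusing the cached scan:

DECLARE @I INT
SET @I = 0
-- the RECOMPILE hint discards the cached plan and compiles one for this execution
SELECT * FROM BIG_TABLE WHERE COL2 = @I OPTION (RECOMPILE)
GO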

Yet another way would be to use the FORCESEEK hint in places where we know the code will be getting fired only for the value of 0. Let’s take a look at the execution using this hint:

DECLARE @I INT
SET @I = 0
SELECT * FROM BIG_TABLE WITH (FORCESEEK) WHERE COL2 = @I
GO

StmtText
----------------------------------------------------------------------------------------------------------------------------------------------------------
  |--Parallelism(Gather Streams)
       |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000], [Expr1005]) WITH UNORDERED PREFETCH)
            |--Index Seek(OBJECT:([DEC_TEST_FS].[dbo].[BIG_TABLE].[BIG_TABLE_IND_1]), SEEK:([DEC_TEST_FS].[dbo].[BIG_TABLE].[COL2]=[@I]) ORDERED FORWARD)
            |--RID Lookup(OBJECT:([DEC_TEST_FS].[dbo].[BIG_TABLE]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)
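And since, as mentioned earlier, FORCESEEK can be used in conjunction with the index hint, both can be listed in the same WITH clause. A sketch:

DECLARE @I INT
SET @I = 0
-- force both the specific index and a seek operation on it
SELECT * FROM BIG_TABLE WITH (INDEX(BIG_TABLE_IND_1), FORCESEEK) WHERE COL2 = @I
GO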

Posted in SQL Server | 1 Comment »

HierarchyID Data Type in SQL Server 2008

Posted by decipherinfosys on December 19, 2007

This is another blog entry in our series of SQL Server 2008 posts. In this post, we will cover the basics of one more data-type: hierarchyID. This data-type, as the name suggests, can be used to store hierarchical data structures such as organizational trees, file system structures or any other type of hierarchical tree structure. In SQL Server 2008, you can now store that data and very easily query it as well. In this post, we will just look at a basic example of putting in the hierarchy data and querying it. There is a whole lot more to it, including updating and re-arranging the hierarchy, re-ordering the data, and creating indexes to help optimize searches on hierarchical data.

Let’s look at this briefly using an example:

Let us create an EMP_MASTER table which will house only the employee data, and then two table variables: one to act as the source of the data for the hierarchies, and a second into which we recursively populate the data in a hierarchical fashion.

/*******************************************************************
Creation of the Employee Master table
********************************************************************/
CREATE TABLE EMP_MASTER
(
EMP_NBR INT NOT NULL PRIMARY KEY,
EMP_NAME NVARCHAR(50),
MGR_NBR INT NULL
)

/*******************************************************************
Populating the table with some sample data
The MGR_NBR column reflects the immediate manager of the
Employee
********************************************************************/
INSERT INTO EMP_MASTER VALUES (1, 'DON', 5);
INSERT INTO EMP_MASTER VALUES (2, 'HARI', 5);
INSERT INTO EMP_MASTER VALUES (3, 'RAMESH', 2);
INSERT INTO EMP_MASTER VALUES (4, 'JOE', 5);
INSERT INTO EMP_MASTER VALUES (5, 'DENNIS', NULL);
INSERT INTO EMP_MASTER VALUES (6, 'NIMISH', 3);
INSERT INTO EMP_MASTER VALUES (7, 'JESSIE', 3);
INSERT INTO EMP_MASTER VALUES (8, 'KEN', 5);
INSERT INTO EMP_MASTER VALUES (9, 'AMBER', 4);
INSERT INTO EMP_MASTER VALUES (10, 'JIM', 6);

/*******************************************************************
Declaration of the two table variables
@ORG_HIERARCHY will hold the actual hierarchical data after
the recursion query populates it
@HIERARCHY_TREE will hold the EMP_MASTER data with actual
numbering of the data partitioned by their manager's number
********************************************************************/
DECLARE @ORG_HIERARCHY TABLE (HIERARCHY_NODE HIERARCHYID, EMP_NBR INT, MGR_NBR INT, EMP_NAME NVARCHAR(50));

DECLARE @HIERARCHY_TREE TABLE (EMP_NBR INT, MGR_NBR INT, NBR_REC_IN_NODE INT)
INSERT INTO @HIERARCHY_TREE
SELECT EMP_NBR, MGR_NBR, ROW_NUMBER() OVER (PARTITION BY MGR_NBR ORDER BY MGR_NBR)
FROM EMP_MASTER;

/*********************************************************************
Recursive Query to take the data and populate the @ORG_HIERARCHY
table variable. This stores the Node and the path information
which we will then query using hierarchical query methods

First we get the root data (using the GetRoot() method)
and then we recurse through the tree and populate the
hierarchy_Node column. Note that you would need to do a cast
to a string and use the ToString() method to get the hierarchyid.
**********************************************************************/
WITH ORG_PATH (path, EMP_NBR)
AS (
SELECT hierarchyid::GetRoot() AS Node, EMP_NBR
FROM @HIERARCHY_TREE AS A
WHERE MGR_NBR IS NULL

UNION ALL

SELECT
CAST(p.path.ToString() + CAST(NBR_REC_IN_NODE AS nvarchar(50)) + '/' AS hierarchyid),
A.EMP_NBR
FROM @HIERARCHY_TREE AS A
JOIN ORG_PATH AS P
ON A.MGR_NBR = P.EMP_NBR
)
insert into @ORG_HIERARCHY
select PATH, A.EMP_NBR, A.MGR_NBR, A.EMP_NAME
from EMP_MASTER AS A INNER JOIN ORG_PATH AS B ON A.EMP_NBR = B.EMP_NBR;

/*********************************************************************
Now, let’s query the data using the ToString() and GetLevel()
methods and then order by the Level field. Output is shown below
**********************************************************************/
SELECT Hierarchy_Node.ToString() AS Node, Hierarchy_Node.GetLevel() as Level, *
FROM @ORG_HIERARCHY
ORDER BY Level;

Node         Level  HIERARCHY_NODE  EMP_NBR     MGR_NBR     EMP_NAME
------------ ------ --------------- ----------- ----------- --------
/            0      0x              5           NULL        DENNIS
/1/          1      0x58            8           5           KEN
/2/          1      0x68            4           5           JOE
/3/          1      0x78            1           5           DON
/4/          1      0x84            2           5           HARI
/4/1/        2      0x8560          3           2           RAMESH
/2/1/        2      0x6AC0          9           4           AMBER
/4/1/1/      3      0x856B          6           3           NIMISH
/4/1/2/      3      0x856D          7           3           JESSIE
/4/1/1/1/    4      0x856B58        10          6           JIM

As you can see from above, the different levels are defined and the nodes are defined as well. So, what exactly does this replace, i.e. what are the alternate ways of modeling such a hierarchical structure? One can use the normalized structures of a parent/child relationship, or one can make use of the XML data-type to store the un-structured data. We will do some benchmarks and post the performance results for the different methods on our blog. In future blog posts, we will cover the indexing schemes and the updating and re-arranging of data in hierarchyID data-type columns.
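To give a flavor of the other querying methods, here is a sketch (it must run in the same batch as the table variable above) that uses Parse() to convert the string representation back into a hierarchyid and IsDescendantOf() to pull the entire sub-tree under node /4/, which is HARI in our data:

DECLARE @MGR_NODE HIERARCHYID = hierarchyid::Parse('/4/');
SELECT EMP_NAME, HIERARCHY_NODE.ToString() AS NODE
FROM @ORG_HIERARCHY
WHERE HIERARCHY_NODE.IsDescendantOf(@MGR_NODE) = 1;  -- note: a node counts as its own descendant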

Posted in SQL Server | Leave a Comment »

FILESTREAM Data in SQL Server 2008

Posted by decipherinfosys on December 17, 2007

We have been covering the new features in SQL Server 2008 in some of our blog posts, and in some of those we also discussed the new data-types in that release.  You can search for SQL Server 2008 on this blog site to get to all those posts.  In today's blog post, we will discuss what can be done for designing and implementing FILESTREAM storage in SQL Server 2008.  Prior to SQL Server 2008, storage of items like documents and images was typically done by storing a link in the table column and storing the documents and images themselves in a particular mapped-out location on the SAN or other storage device.  For images, some people also used the image data-type, but that was not always the most performant solution.

In SQL Server 2008, Filestream enables the application to store such un-structured data on the file system.  The application can thus take advantage of the streaming APIs and the performance of the file system.  In addition, one can now correlate the un-structured and structured data easily.  The Win32 file system interfaces provide the streaming access to the data, and a good thing about filestream is that it does not use the SQL Server buffer cache; it uses the Windows system cache for caching the file data.

In order to make use of FILESTREAM, one needs to enable filestream for the instance of SQL Server and then create a database that supports filestream.  Once that is done, we can use the FILESTREAM clause next to the VARBINARY(MAX) data-type to take advantage of this new feature.  Let's follow this up with an example:

1) Let us first enable the instance for filestream:

USE MASTER
GO
EXEC sp_filestream_configure @enable_level = 3

This can also be changed from the GUI via the Advanced properties for the instance.  Below is the image for it.

filestream_1.jpg

You can see from the drop-down the different levels that are available for FILESTREAM.  We have chosen the highest level – Transact SQL and filesystem (local and remote client access).

2) Now that we have enabled the filestream option at the instance level, let us go ahead and create a new database.  We need to create a filegroup with the CONTAINS FILESTREAM clause so that we can designate that filegroup (and its files) to contain the filestream data.

USE MASTER
GO
CREATE DATABASE DEC_TEST_FS ON PRIMARY
( NAME = DEC_TEST_FS_data,
FILENAME = N'C:\DEC_TEST_FS_data.mdf'),
FILEGROUP DEC_TEST_FS_FG_1
( NAME = DEC_TEST_FS_REGULAR,
FILENAME = N'C:\DEC_TEST_FS_data_1.ndf'),
FILEGROUP DEC_TEST_FS_FG_2 CONTAINS FILESTREAM
( NAME = DEC_TEST_FS_FILESTREAM,
FILENAME = N'C:\DEC_TEST_FS')
LOG ON
( NAME = DEC_TEST_FS_LOG,
FILENAME = N'C:\DEC_TEST_FS_log.ldf');
GO

Please note that if you try to create this database by specifying a path for the filestream files that is not on NTFS, you will get the error message:

"The path specified by 'd:\DEC_TEST_FS' cannot be used for FILESTREAM files because it is not on NTFS."

For the filestream filegroup, the FILENAME refers to a path and not to an actual file name.  It creates that particular folder: from the example above, it created the C:\DEC_TEST_FS folder on the file system.  That folder now contains a filestream.hdr file and a $FSLOG folder.

If you already have a database, you can add a FILESTREAM filegroup to it using ALTER DATABASE command.

3) Now, that we have the instance and the database taken care of, let us create a table to take advantage of this new feature:

USE DEC_TEST_FS
GO

CREATE TABLE DEC_FS_TEST
(
COL1     INT IDENTITY,
COL2    UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
COL3     VARBINARY(MAX) FILESTREAM
);

The table definition needs a ROWGUIDCOL column – this is required by FILESTREAM.  The actual data is stored in the column COL3.

Now, let’s insert data into this table:

INSERT INTO DEC_FS_TEST (COL2, COL3) VALUES (NEWID(), NULL);
INSERT INTO DEC_FS_TEST (COL2, COL3) VALUES (NEWID(), CAST('MY DUMMY TEST' AS VARBINARY(MAX)));

And now we can select out of this table and see how this data is represented:

COL1        COL2                                 COL3
----------- ------------------------------------ ----------------------------
1           78909DBF-7B26-4CA9-A840-4D45930F7523 NULL
2           0B0F5833-1997-4C9C-A9A7-F2536D68CFED 0x4D592044554D4D592054455354

If you look at the file system, you will see that additional folders have been created under the DEC_TEST_FS folder.  That is shown in the image below.

filestream_2.jpg

One can also use T-SQL to update the filestream data if one needs to.  Deletes work as well; a delete marks the underlying file system files for deletion.  We will be doing some benchmarks on this new feature and will post our findings on the blog.
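To tie this back to the Win32 streaming access mentioned at the start of the post: the PathName() method of the FILESTREAM column and the GET_FILESTREAM_TRANSACTION_CONTEXT() function provide the two pieces of information an application needs in order to open the file through the OpenSqlFilestream API.  A sketch against the table above:

BEGIN TRANSACTION;  -- streaming access requires an active transaction context
SELECT COL3.PathName() AS FILE_PATH,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TXN_CONTEXT
FROM DEC_FS_TEST
WHERE COL1 = 2;  -- the row that actually holds filestream data
COMMIT TRANSACTION;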

Posted in SQL Server | Leave a Comment »