Systems Engineering and RDBMS

Archive for October, 2007

Server Consolidation – things to look for in SQL Server 2008

Posted by decipherinfosys on October 31, 2007

We had talked about virtualization in the database world in a previous post. In SQL Server 2008, there are a lot more features available for doing server consolidation – here is a link to the MSFT whitepaper that breaks this down into the sections of flexibility, manageability and scalability & performance:

http://download.microsoft.com/download/a/c/d/acd8e043-d69b-4f09-bc9e-4168b65aaa71/SQL2008SrvConsol.doc

If you have downloaded CTP 4, we would recommend that you get a head start with these features by playing with them in one of your VM environments.  If you have not yet upgraded to SQL Server 2005, chances are that you will be upgrading directly to SQL Server 2008 by the end of next year, once the product is released and has been tested in real-world implementations.

Posted in SQL Server

Converting Relational Data to XML and vice-versa in Oracle

Posted by decipherinfosys on October 30, 2007

In one of our previous blog posts, we had covered how to generate XML documents in Oracle. And in subsequent posts, we had covered some of the XML features in SQL Server 2005 – de-constructing XML into relational data, the difference between typed and un-typed XML, and the difference between element-centric and attribute-centric XML.  In this post, we will briefly cover the usage of the SQL functions XMLFOREST and XMLELEMENT in Oracle to convert relational data into XML and vice-versa.

Let us first create a TEST table and populate it with some data. Connect to SQL*Plus with proper authorization.

CREATE TABLE TEST (EMP_ID NUMBER(9), EMP_NAME VARCHAR2(30), BIRTH_DATE DATE)
/

INSERT INTO TEST (EMP_ID, EMP_NAME, BIRTH_DATE) VALUES (1, 'Jack', '20-JAN-65');
INSERT INTO TEST (EMP_ID, EMP_NAME, BIRTH_DATE) VALUES (2, 'John', '01-FEB-82');
INSERT INTO TEST (EMP_ID, EMP_NAME, BIRTH_DATE) VALUES (3, 'Beth', '11-OCT-70');
INSERT INTO TEST (EMP_ID, EMP_NAME, BIRTH_DATE) VALUES (4, 'Mary', '30-SEP-70');
INSERT INTO TEST (EMP_ID, EMP_NAME, BIRTH_DATE) VALUES (5, 'Allen', '3-MAR-75');
INSERT INTO TEST (EMP_ID, EMP_NAME, BIRTH_DATE) VALUES (6, 'Diana', '2-NOV-68');
INSERT INTO TEST (EMP_ID, EMP_NAME, BIRTH_DATE) VALUES (7, 'Don', '31-DEC-80');

Now, using the XMLFOREST function, we can generate the XML output. Run the following query:

SELECT EMP_ID as Employee_Number, XMLFOREST(EMP_ID, EMP_NAME, BIRTH_DATE) AS XML_DATA
FROM TEST
/

Here is the output (formatted for better readability):

EMPLOYEE_NUMBER        XML_DATA
---------------        -------------------------------------
              1        <EMP_ID>1</EMP_ID>
                       <EMP_NAME>Jack</EMP_NAME>
                       <BIRTH_DATE>1965-01-20</BIRTH_DATE>
              2        <EMP_ID>2</EMP_ID>
                       <EMP_NAME>John</EMP_NAME>
                       <BIRTH_DATE>1982-02-01</BIRTH_DATE>
              3        <EMP_ID>3</EMP_ID>
                       <EMP_NAME>Beth</EMP_NAME>
                       <BIRTH_DATE>1970-10-11</BIRTH_DATE>

In the output shown above, the column names are converted to tag names and each column value is enclosed within its tag as XML data. We can use the XMLELEMENT function to place a user-defined tag around the XML for each record. Let us see that:

SELECT EMP_ID as Employee_Number,
XMLELEMENT("Employee", XMLFOREST(EMP_ID, EMP_NAME, BIRTH_DATE)) AS XML_DATA
FROM TEST
/

The above SQL will wrap the user-defined tag "Employee" around each record displayed in the previous result set. We can also store the XML_DATA values in a table in which a column is defined as XMLTYPE. The XMLTYPE datatype has an advantage especially when searching for data within the XML – using a context search, we can easily search for a specific character or string in the XMLTYPE data.
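As a minimal, hypothetical sketch of that idea (the EMP_XML table and its contents below are ours, not part of the example above), an XMLTYPE column can be created, populated and searched with the EXISTSNODE function like this:

-- Hypothetical table with an XMLTYPE column
CREATE TABLE EMP_XML (EMP_ID NUMBER(9), XML_DATA XMLTYPE)
/

INSERT INTO EMP_XML (EMP_ID, XML_DATA)
VALUES (1, XMLTYPE('<Employee><EMP_NAME>Jack</EMP_NAME></Employee>'));

-- EXISTSNODE returns 1 when the XPath expression finds a match
SELECT EMP_ID
FROM EMP_XML
WHERE EXISTSNODE(XML_DATA, '/Employee[EMP_NAME="Jack"]') = 1;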

The above approach shows how to convert relational data to XML, but how can we convert XML data back into relational data? There are functions available to do that as well. Using the EXTRACTVALUE function, we can retrieve data back into relational format from the XML format. Here is the SQL to do that:

SELECT EXTRACTVALUE(XML_DATA,'Employee/EMP_ID') as EMP_ID,
EXTRACTVALUE(XML_DATA,'Employee/EMP_NAME') as EMP_NAME,
TO_DATE(EXTRACTVALUE(XML_DATA,'Employee/BIRTH_date'),'YYYY-MM-DD') as BIRTH_DATE
FROM
(
SELECT EMP_ID as Employee_Number,
XMLELEMENT("Employee", XMLFOREST(EMP_ID, EMP_NAME, BIRTH_DATE)) AS XML_DATA
FROM TEST
) A
/

Here is the output.

    EMP_ID EMP_NAME   BIRTH_DATE
---------- ---------- ----------
         1 Jack
         2 John
         3 Beth
         4 Mary
         5 Allen
         6 Diana
         7 Don

As you can see from the above, we got the Employee ID and Employee Name, but we did not get any data for the BIRTH_DATE column even though we know there is data in that column. This is because tags in XML are case-sensitive: we defined the tag as BIRTH_DATE but are trying to retrieve the data as BIRTH_date. If we change the tag to BIRTH_DATE in the EXTRACTVALUE function, we will get the data for the birth date column as well. So one thing to keep in mind is that we have to be extra cautious when dealing with XML data – one mismatch in upper or lower case and we may not get the desired data.

SELECT EXTRACTVALUE(XML_DATA,'Employee/EMP_ID') as EMP_ID,
EXTRACTVALUE(XML_DATA,'Employee/EMP_NAME') as EMP_NAME,
TO_CHAR(TO_DATE(EXTRACTVALUE(XML_DATA,'Employee/BIRTH_DATE'),'YYYY-MM-DD'),'MM/DD/YYYY') as BIRTH_DATE
FROM
(
SELECT EMP_ID as Employee_Number,
XMLELEMENT("Employee", XMLFOREST(EMP_ID, EMP_NAME, BIRTH_DATE)) AS XML_DATA
FROM TEST
) A
/

Here is the complete output.

    EMP_ID EMP_NAME   BIRTH_DATE
---------- ---------- ----------
         1 Jack       01/20/1965
         2 John       02/01/1982
         3 Beth       10/11/1970
         4 Mary       09/30/1970
         5 Allen      03/03/1975
         6 Diana      11/02/1968
         7 Don        12/31/1980

In future blog posts, we will cover some of the more complex XML options. This post was just meant to demonstrate a simple approach to dealing with the XML data type.

Posted in Oracle

DBCC UPDATEUSAGE, sp_spaceused and Estimating the size of the database

Posted by decipherinfosys on October 29, 2007

In some of our previous blog posts, we have covered techniques for calculating the size of a database (Oracle or SQL Server). Search for "estimating the size" on this site and you will get the three posts for Oracle. Likewise, for SQL Server, Microsoft has provided guidelines in BOL – search BOL for "Estimating the size of a database" and you will get all the steps and calculations that you need to estimate the size of a table, a heap, or a clustered or non-clustered index. Based on those, one can write a stored procedure or a script to compute the size. We had written a similar script while doing capacity planning for one of our clients. The script took @db_name, @table_name, @num_rows and @var_pcnt as parameters (database name, table name, number of rows for which the size needs to be estimated, and the percentage fill for the variable-length columns) and gave the respective size distributions as well as the total estimated size for the table and its indexes. You can also reference those calculations from the online version:

http://msdn2.microsoft.com/en-us/library/ms187445.aspx
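To make those BOL steps concrete, here is a minimal sketch of the heap (data page) portion of the calculation – the input values below are made-up for a hypothetical table, and the sketch ignores indexes and the variable-length fill percentage:

-- Inputs (assumptions for a hypothetical table; plug in your own values)
DECLARE @num_rows NUMERIC(18,0), @num_cols INT, @fixed_data_size INT,
        @num_var_cols INT, @max_var_size INT
SET @num_rows        = 1000000  -- estimated number of rows
SET @num_cols        = 5        -- total number of columns
SET @fixed_data_size = 30       -- total bytes of all fixed-length columns
SET @num_var_cols    = 2        -- number of variable-length columns
SET @max_var_size    = 100      -- total max bytes of all variable-length columns

DECLARE @null_bitmap INT, @var_data_size INT, @row_size INT,
        @rows_per_page INT, @num_pages NUMERIC(18,0)
SET @null_bitmap   = 2 + ((@num_cols + 7) / 8)             -- null block
SET @var_data_size = 2 + (@num_var_cols * 2) + @max_var_size
SET @row_size      = @fixed_data_size + @var_data_size + @null_bitmap + 4
SET @rows_per_page = 8096 / (@row_size + 2)                -- integer division rounds down
SET @num_pages     = CEILING(@num_rows / CONVERT(NUMERIC(18,2), @rows_per_page))

SELECT @num_pages * 8192 AS estimated_heap_size_in_bytes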

In SQL Server 2000, one needed to run the DBCC UPDATEUSAGE command in order to get the right page and row counts in the catalog views – in SQL Server 2005, this is no longer the case since SQL Server 2005 always maintains these correctly. However, if you have upgraded from SQL Server 2000 to SQL Server 2005, then you would need to run this command in order to remove any inaccuracies in the page and row counts, else the data shown by the sp_spaceused system procedure will be incorrect. You can get more information about this command and its different options from BOL.
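For example, here is a minimal sketch of that sequence (the table name used with sp_spaceused is hypothetical – see BOL for the full list of DBCC UPDATEUSAGE options):

-- Correct the page and row counts for the current database
-- (relevant after an upgrade from SQL Server 2000)
DBCC UPDATEUSAGE (0)
GO

-- sp_spaceused will now report accurate numbers
EXEC sp_spaceused                      -- for the whole database
EXEC sp_spaceused 'dbo.CUST_MASTER'    -- for a specific (hypothetical) table
GO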

Besides using the calculations mentioned above, you can also choose to wrap that logic in a GUI tool and provide additional value to your employer/client. One such freely available utility is the one written by Øystein Sundsbø. It is posted on his blog and you can access it here:

http://dbgoby.blogspot.com/2007/10/db-goby-v10.html

It does way more than just estimate the size of the database and is compatible with SQL Server 2005 as well as SQL Server 2008.

Posted in SQL Server

MODL – a new way of learning

Posted by decipherinfosys on October 28, 2007

Every day, new training methods evolve to reach out to customers and enhance their learning experience. Portals and online applications are coming up which allow for sharing of training content on everything from high-tech subjects to gardening, high school mathematics, etc. Microsoft has also taken a couple of such concepts and has a program called MODL – Microsoft Official Distance Learning. The official site for MODL is:

http://www.microsoft.com/learning/modl/default.mspx

This is a good combination of instructor-led training, web-based CBTs, and a virtual-reality concept similar to Second Life. It sounds like a very neat concept since it merges the best of all those worlds. Not only is it less expensive since the travel and lodging costs are saved (if going out of the city for training), it preserves the benefits of one-on-one instructor-based training classes and at the same time lets the students work at their own pace. In addition, one of the exciting things about MODL training is the phase in which the students work through a real-world problem in a virtual workspace (you can draw a parallel to the Second Life concept here) – this really helps since applying what you learn in the class to a real-world problem helps cement those concepts. We are sure that MSFT will be adding more courses to their MODL training as time passes – right now, the list of available courses can be seen at this link.

Posted in Technology

Peer to Peer Transactional Replication

Posted by decipherinfosys on October 27, 2007

SQL Server 2000 had a bi-directional transactional replication feature, and it has been greatly enhanced in SQL Server 2005, where it is called P2P or Peer-to-Peer Replication. This is Microsoft's answer to the scale-out question for SQL Server. It allows the applications working against the different nodes of SQL Server to do read as well as write operations against any of the nodes involved in the replication. The load balancing of the read operations as well as the updates across nodes allows for high availability as well as scalability.

The basic underlying principle of replication is that data is published by the Publisher and replicated to the Subscriber, where it is read-only. With P2P replication though, each node involved acts as a Publisher as well as a Subscriber to the other nodes, and each node can modify data via 2-way transactional replication. Unlike Oracle RAC, in which there is a single database and no single node owns the data, in the case of SQL Server each node has its own copy of the database with the identical schema and identical data – so the storage requirements are higher. In addition, for the DML operations (inserts, updates and deletes), an approach that is typically adopted is the usage of partitioned updates: if you have a table called CUST_MASTER that needs to be updated, the DMLs can be partitioned so that the updates for customers with names A-R happen on one node and the ones for S-Z happen on the other node (or any other combination of ranges). This requires changes to the application. If un-partitioned updates are needed from all the nodes involved in the replication, a better choice would be to use merge replication instead.
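One way to help enforce such a partitioned-update scheme at the database level is a CHECK constraint marked NOT FOR REPLICATION – a hedged sketch (the constraint name is ours, and the application-level routing is still required):

-- On the node that owns customers A-R; the S-Z node gets the mirror-image constraint.
-- NOT FOR REPLICATION lets the replication agents apply out-of-range rows from the peer,
-- while locally-originated inserts (and name changes) outside the range are rejected.
ALTER TABLE CUST_MASTER
ADD CONSTRAINT CK_CUST_MASTER_NODE_A
CHECK NOT FOR REPLICATION (LEFT(CUST_NAME, 1) BETWEEN 'A' AND 'R')
GO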

Let's consider a scenario where we have two nodes involved in P2P replication. In this case, the reads can be balanced across the two nodes by application-server assignment, and the partitioned updates can be done as mentioned above. Another scenario would be to direct all the writes to one node and the reads to all the nodes (though this can be achieved using traditional one-way transactional replication as well). In order to achieve high availability, one can code the application in such a way that if one of the nodes is down, the application's connect module re-directs the connection to the other nodes involved in the P2P replication.

You can read up more on P2P replication using these links:

http://msdn2.microsoft.com/en-us/library/ms151196.aspx

http://www.sql-server-performance.com/articles/dba/peer-to-peer_replication_p1.aspx

Posted in SQL Server

Virtualization in the database world

Posted by decipherinfosys on October 26, 2007

Virtualization is rapidly changing how we consolidate our servers for testing and deployment, for training, and for disaster recovery. We have VMWare machines running in our office with clustered SQL Server and clustered Oracle environments. It is a great way to test out new software and play around with new functionality, as well as to test your disaster recovery scenarios, without investing millions in hardware. It is also a good way to consolidate your servers and use them for development and QA purposes.

Both VMWare's VMWare Server and MSFT's Virtual Server 2005 R2 support 64-bit architecture on the host, which means more memory on the host server (up to 1 terabyte), and that translates into the capability of running many more active VMs. When you are getting ready to consolidate your servers into VMs, you should remember to allocate an extra 32MB per VM to account for the VM overhead. So, if you have an Oracle database for which you have allocated, say, 2GB of memory (overall), when you move to the VM you would need (2 * 1024) + 32 MB for that VM. And you need to ensure that there is enough RAM left for the host as well. Another thing to consider is the usage of a SAN for the host server. You should create the VM's virtual hard drive on a drive that is different from the one holding the host's operating system – this is to reduce any possibility of drive and spindle contention. The best option is to use a SAN to help improve the I/O for the VMs. One more thing to remember is that the virtual hard drives, with their default settings, can be configured to dynamically expand as needed. However, this is not good for performance. We would recommend pre-allocating a fixed size in order to avoid the performance hit of the expansion.

Here is the link to the whitepapers from VMWare on this topic:

http://www.vmware.com/solutions/whitepapers.html

Posted in Virtual Server, VMWare

Software Development and Failures

Posted by decipherinfosys on October 26, 2007

This is an excellent read – http://www.inc.com/magazine/20071101/how-hard-could-it-be-five-easy-ways-to-fail.html?partner=fogcreek

The article delves into the top 5 mistakes that lead to failures in the technology world.

Posted in Technology

SQL Nexus

Posted by decipherinfosys on October 25, 2007

If you haven't purchased the book "SQL Server 2005 Practical Troubleshooting: The Database Engine", we would highly recommend it. One of the utilities provided on the CD with this book is SQL Nexus.  This utility was developed by Ken Henderson and Bart Duncan and can be used to collect performance information pertaining to SQL Server and then visualize it via a set of SSRS (Reporting Services) based reports for further analysis.  It uses SQLDiag as the main back-end data collection tool.  The kind of diagnostic data that you wish to collect is also configurable through an XML file.

We would encourage you to download this wonderful utility and become familiar with it – this could be a real lifesaver in production environments.

Posted in SQL Server

Re-naming Database in SQL Server

Posted by decipherinfosys on October 24, 2007

In one of our previous blog posts, we had covered how to rename objects in SQL Server. Today we will cover how to rename a database in SQL Server. In order to rename a database, we need to first put it in single-user mode. We will start by creating a test database. Connect through Query Analyzer or Management Studio and issue the following command to create the database:

USE MASTER
GO
CREATE DATABASE MY_TEST
GO

We have created a database in its simplest form – one data file and one log file will be created in the default directory location. We can check this using the sp_helpfile system stored procedure. Now, using the sp_renamedb stored procedure, we will rename the database. Since we have just created the database, there is no real need to put it in single-user mode, but we will still go ahead and perform that step to avoid any errors.

EXEC sp_dboption 'MY_TEST', 'single user', 'TRUE'
EXEC sp_renamedb 'MY_TEST', 'NEW_TEST'
EXEC sp_dboption 'NEW_TEST', 'single user', 'FALSE'

The commands are self-explanatory. In the first line, we put the database in single-user mode. In the second line, we rename the database. Since the database name has already changed by then, in the third line we use the new database name to put it back in multi-user mode. We could even write a small stored procedure which takes two arguments (the old database name and the new database name) and performs the same steps – a sketch follows below. This works for both SQL Server 2000 and SQL Server 2005. We can check the information about the new database by using the sp_helpdb command.
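Here is a minimal sketch of such a wrapper procedure (the procedure name USP_RENAME_DB is ours, and error handling is omitted):

-- Wraps the single-user/rename/multi-user sequence shown above
CREATE PROCEDURE USP_RENAME_DB
    @OLD_NAME SYSNAME,
    @NEW_NAME SYSNAME
AS
BEGIN
    EXEC sp_dboption @OLD_NAME, 'single user', 'TRUE'
    EXEC sp_renamedb @OLD_NAME, @NEW_NAME
    EXEC sp_dboption @NEW_NAME, 'single user', 'FALSE'
END
GO

-- Usage:
EXEC USP_RENAME_DB 'MY_TEST', 'NEW_TEST'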

In SQL Server 2005, however, Microsoft issued a deprecation warning for the sp_renamedb stored procedure and suggested using the ALTER DATABASE command instead – though it has not yet been announced in which release sp_renamedb will be put to rest. Following is the ALTER DATABASE syntax to rename a database. Since we have already changed the database name in our previous test, we will change the database back to its original name:

ALTER DATABASE NEW_TEST MODIFY NAME = MY_TEST

The above command will also rename the database to the name specified in the MODIFY NAME clause. To rename a database, one needs either the sysadmin or the dbcreator fixed server role.
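If the database is in use at the time, the same single-user precaution can be taken with ALTER DATABASE as well – a minimal sketch:

-- Force other connections out, rename, then re-open the database
ALTER DATABASE NEW_TEST SET SINGLE_USER WITH ROLLBACK IMMEDIATE
GO
ALTER DATABASE NEW_TEST MODIFY NAME = MY_TEST
GO
ALTER DATABASE MY_TEST SET MULTI_USER
GO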

Posted in SQL Server

OPTIMIZE FOR query hint in SQL Server 2005

Posted by decipherinfosys on October 23, 2007

In one of our previous blog posts, we had covered the bind variable peeking (Oracle) and parameter sniffing (SQL Server) behavior that happens when one uses bind variables in SQL queries against data-sets that are not uniformly distributed. One can choose to use plan guides (SQL Server) or stored outlines (Oracle) in order to enforce a specific query plan for such queries, or for these specific scenarios one can also choose to use constant/literal values instead of the bind variables. That is one scenario where, in a typical transactional system, the usage of bind variables might not yield the desired results.

In this post, we will discuss a new query hint introduced in SQL Server 2005 called the “OPTIMIZE FOR” query hint. This is another way to address the parameter sniffing problem in SQL Server. This option ties in with the usage of plan guides, or it can be used independently to alter the execution plan formed by the optimizer. The hint instructs the optimizer to use a particular value for a local variable when the query goes through the optimization phase.

Let’s look at this using an example:

CREATE TABLE DECIPHER_TEST (COL1 INT IDENTITY NOT NULL, STAT_CODE INT NOT NULL)
GO

CREATE INDEX DECIPHER_TEST_IND_1 ON DECIPHER_TEST (STAT_CODE)
GO

SET NOCOUNT ON
GO

DECLARE @I INT
SET @I = 1
WHILE (@I <= 1000000)
BEGIN
    IF @I <= 10
        INSERT INTO DECIPHER_TEST (STAT_CODE) VALUES (0)

    IF (@I > 10 AND @I <= 10000)
        INSERT INTO DECIPHER_TEST (STAT_CODE) VALUES (30)

    IF (@I > 10000 AND @I <= 1000000)
        INSERT INTO DECIPHER_TEST (STAT_CODE) VALUES (90)

    SET @I = @I + 1
END

We used the above script to create a test table in our database and populated it with 1 million rows. Here is the data distribution:

SELECT STAT_CODE, COUNT(*) CNT_DISTR FROM DECIPHER_TEST GROUP BY STAT_CODE

STAT_CODE   CNT_DISTR
----------- -----------
0           10
30          9990
90          990000

And stats were collected:

UPDATE STATISTICS DECIPHER_TEST WITH SAMPLE 100 PERCENT, ALL

Now, let us use a parameterized query and use the value of 90:

SET SHOWPLAN_TEXT ON
GO

DECLARE @STAT_CODE INT
SET @STAT_CODE = 90

SELECT * FROM DECIPHER_TEST WHERE STAT_CODE = @STAT_CODE

StmtText
------------------------------------------------------------------------------------------------------------------------------------------
|--Table Scan(OBJECT: ([AdventureWorks].[dbo].[DECIPHER_TEST]), WHERE: ([AdventureWorks].[dbo].[DECIPHER_TEST].[STAT_CODE]=[@STAT_CODE]))

The optimizer does a complete table scan, as it should, since the value of 90 qualifies for a majority of the data, so going through an index gives no benefit whatsoever. Now, let us replace that with a STAT_CODE value of 0 and see whether the optimizer changes the execution plan to go through the index this time, since that value qualifies for only 0.001% of the records in the table.

DECLARE @STAT_CODE INT
SET @STAT_CODE = 0

SELECT * FROM DECIPHER_TEST WHERE STAT_CODE = @STAT_CODE

StmtText
------------------------------------------------------------------------------------------------------------------------------------------
|--Table Scan(OBJECT: ([AdventureWorks].[dbo].[DECIPHER_TEST]), WHERE: ([AdventureWorks].[dbo].[DECIPHER_TEST].[STAT_CODE]=[@STAT_CODE]))

As you can see, the execution plan still remains the same as before – that is because the execution plan for the parameterized query was cached when the previous run was done. Now, let us make use of the “OPTIMIZE FOR” query hint and see if we can force the optimizer to optimize the query for the STAT_CODE value of 0:

DECLARE @STAT_CODE INT
SET @STAT_CODE = 0

SELECT * FROM DECIPHER_TEST WHERE STAT_CODE = @STAT_CODE OPTION (OPTIMIZE FOR (@STAT_CODE = 0))

StmtText
------------------------------------------------------------------------------------------------------------------------------------------
|--Nested Loops(Inner Join, OUTER REFERENCES: ([Bmk1000]))
     |--Index Seek(OBJECT: ([AdventureWorks].[dbo].[DECIPHER_TEST].[DECIPHER_TEST_IND_1]), SEEK: ([AdventureWorks].[dbo].[DECIPHER_TEST].[STAT_CODE]=[@STAT_CODE]) ORDERED FORWARD)
     |--RID Lookup(OBJECT: ([AdventureWorks].[dbo].[DECIPHER_TEST]), SEEK: ([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD)

As you can see from the execution plan shown above, the optimizer picked the index this time and did a RID look-up to get the data that was not in the index. This makes sense since the selectivity of that data value favors an index seek operation. So, what would happen now if we execute the query with a value of 90? Since we have forced the optimizer to always form an execution plan optimized for the status code value of 0, when we use 90 it will still use the index plan shown above. So, how is this good? In this specific scenario, say the STAT_CODE values represent this:

0: Record is ready to be processed
30: Interim status code for processing
90: Processing was successful
99: Failure/Error

In a majority of the cases, the code that interacts with this table will be doing look-ups based on a status code value of either 0 or 30. The values 90 or 99 will typically be used only in queries that check the data or in reports, and those will be executed far less often than the transactional-system queries. So, in the actual transactional system, where one needs to parse once and execute many times, one can still make use of bind variables (parameterized queries) and have an execution plan that best represents the data distribution; in the reporting system, one can use a query based on constants/literals, or one without the “OPTIMIZE FOR” hint. Here is an execution of the same query with a constant/literal value, and its execution plan as rendered by the optimizer:

SELECT * FROM DECIPHER_TEST WHERE STAT_CODE = 90

StmtText
----------------------------------------------------------------------------------------------------------------------------------
|--Table Scan(OBJECT: ([AdventureWorks].[dbo].[DECIPHER_TEST]), WHERE: ([AdventureWorks].[dbo].[DECIPHER_TEST].[STAT_CODE]=(90)))
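As a side note on the plan guide tie-in mentioned at the beginning of this post: the same hint can also be attached to a parameterized statement without touching the application code, via sp_create_plan_guide. Here is a hedged sketch (it assumes the application submits the statement through sp_executesql with exactly this text; the guide name is ours):

EXEC sp_create_plan_guide
    @name = N'PG_DECIPHER_TEST_STAT_CODE',
    @stmt = N'SELECT * FROM DECIPHER_TEST WHERE STAT_CODE = @STAT_CODE',
    @type = N'SQL',
    @module_or_batch = NULL,
    @params = N'@STAT_CODE INT',
    @hints = N'OPTION (OPTIMIZE FOR (@STAT_CODE = 0))'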

Posted in SQL Server