Systems Engineering and RDBMS

Archive for the 'DB2 LUW' Category


Data Migration Scenario - CTE solution

Posted by decipherinfosys on June 12, 2008

A few days ago, we had posted the problem and the solution pertaining to a data migration issue. We had given some solutions and explained one of those in detail along with the code. A reader asked us to do the same thing using the CTE which was one of the solutions that we had proposed. Using a CTE, there are a couple of ways to get it done. Refer to the tables and the data in that post before you use this code. Here is the CTE code and the explanation follows after that:

with
N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT 1 FROM N3 AS X, N3 AS Y),
N5 (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY X.n%10) FROM N4 AS X, N4 AS Y)
select p.ID as ItemID,
row_number() over (partition by p.id order by n) as LineItem,
substring(p.profession, case when n = 1 then 1 else n + 1 end, 10) as Profession
from N5
join dbo.profession as p
on N5.n <= Len(p.Profession)
where n <= 100
and (n = 1 or n%10 = 0)
order by p.id, n

ItemID      LineItem             Profession
----------- -------------------- ----------
1           1                    I am a Pro
1           2                    grammer an
1           3                    d a baseba
1           4                    ll Champio
1           5                    n and a fo
1           6                    otball Pla
1           7                    yer
2           1                    I am a Doc
2           2                    tor
3           1                    I am an Ar
3           2                    tist

The above piece of code first constructs a number table using the CTE (this was introduced by Itzik in his SQL column in the sqlmag) and then it joins with the source table Profession and slices the string up and shows up the data in the format that we needed for the destination table.

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »

Data Migration Scenario

Posted by decipherinfosys on June 9, 2008

Over the weekend, a colleague had asked a question regarding one of the data migration issues that he was facing in one of the projects. We will mention the problem as well as the solution in this blog post.

Problem: He had a source table with the following structure and data (ID was an integer column with Identity attribute and Description was a string column with a max length of 100):

SOURCE TABLE

ID DESCRIPTION
1 I am a programmer
2 I am a doctor

In the destination database, he had a destination table with the same columns except that the Description column was only 10 characters long. His task was to load the records from the source to the destination table. If the length of the description was more than 10 characters, he was required to insert another row with the next 10 characters into the Destination table with a LineId of next incrementing number as follows.

Destination Table

ID LINEID DESCRIPTION
1 1 I am a pro
1 2 grammer
2 1 i am a doc
2 2 tor

Solution: There are many ways to solve this problem. Since he was using plain simple T-SQL and not SSIS package for the flow, we presented 4 solutions to the problem - before we present them in the blog, a quick glance at the example that he had given indicates that the problem is essentially the same as un-pivoting the data and transposing the column into rows. Since the maximum length of the source column is known before hand - nvarchar(100) … and the destination’s string max length is nvarchar(10), we can at the most get 10 lineID values for each ID record from the source. The problem would have been more complex if we were getting the data from a source where the max length was not known (we have seen such requirements as well) and in that case dynamic un-pivoting needs to be done. So, with that in mind, here are the solutions:

1) Use static union all lists since you already know the source column length (100) that goes into a maximum length of 10…so, it translates to 10 different SQLs with UNION ALL operations between them.

Or…

2) Use a CTE (Common Table Expression) to do recursive operation to break the source description column into separate line items.

Or…

3) Since the maximum length of the source is known, the issue translates to being able to unpivot the data i.e. treat that one single column a the source and un-pivot the column into rows (look at our whitepaper on pivot and unpivot).

Or…

4) The old way of using a cursor (or a while loop), looping around and creating the new records.

Let’s create the dummy tables and some data and then we will use #1 method to show how we can do this in a simple SQL statement (using SQL Server Syntax):

/********************************************************************************
create some dummy tables
Profession is the Source table and ProfessionItem is the destination table
*********************************************************************************/
create table Profession (Id int identity(1,1), Profession nvarchar(100))
create table ProfessionItem (ItemId int, LineItem int,Profession nvarchar(10))

/********************************************************************************
insert some dummy data
*********************************************************************************/
insert into Profession (Profession)
values (N’I am a Programmer and a baseball Champion and a football Player’);

insert into Profession (Profession)
values (N’I am a Doctor’);

insert into Profession (Profession)
values (N’I am an Artist’);

Now, a simple SQL statement using the method from #1 option above will give us the data in the required format:

Select *
from
(
SELECT P.Id as ItemID, 1 as LineItem, SUBSTRING(Profession, 1, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 2 as LineItem, SUBSTRING(Profession, 11, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 3 as LineItem, SUBSTRING(Profession, 21, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 4 as LineItem, SUBSTRING(Profession, 31, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 5 as LineItem, SUBSTRING(Profession, 41, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 6 as LineItem, SUBSTRING(Profession, 51, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 7 as LineItem, SUBSTRING(Profession, 61, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 8 as LineItem, SUBSTRING(Profession, 71, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 9 as LineItem, SUBSTRING(Profession, 81, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 10 as LineItem, SUBSTRING(Profession, 91, 10) as Profession
FROM dbo.Profession as P
) AS X
Where LEN(X.Profession) > 0
Order by X.ItemID, X.LineItem

ItemID      LineItem    Profession
----------- ----------- ----------
1           1           I am a Pro
1           2           grammer an
1           3           d a baseba
1           4           ll Champio
1           5           n and a fo
1           6           otball Pla
1           7           yer
2           1           I am a Doc
2           2           tor
3           1           I am an Ar
3           2           tist

A point to note is that if the source table is huge, then we can break it up in sets of 10000 based on the ID values and use that criteria in a while loop and put the ID fitler criteria in the WHERE clause of the SQL statement shown above. Since ID will be an indexed column (clustered index), the range scans will be fine and we can quickly get the data from the source to the destination table using this method.

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »

Extract Day, Month or Year from a Date Field

Posted by decipherinfosys on May 21, 2008

One of the developers at a client site was writing up a report for the yearly sales. Invoice data is stored as per date when the invoice was issued. He needed to find out a way to extract year from the date field and sum up the sales amount on yearly basis. There are many ways of doing it … in this post, we will show one of the most common and easy ways of achieving this.

Oracle uses an EXTRACT(datetime) function to extract the value of a specified datetime field from a datetime expression. Function is very handy to manipulate specific datetime field value like day, month or year. Following values can be extracted from the column having date datatype. Apart from values mentioned below, timezone related values like timezone_region, timezone_hour and timezone_minute can also be obtained using EXTRACT function if underlying datatype is TIMESTAMP WITH TIME ZONE.

* Day, Month, Year, Hour, Minute, Second

Create following table and populate it with the data. If you already have table with the same name, change all the occurrences of the table name with some other name.

CREATE TABLE Invoice
(

INVOICE_NUMBER      NUMBER(9) NOT NULL,
INVOICE_DATE          DATE NOT NULL,
CLIENT_ID                NUMBER(9)     NOT NULL,
INVOICE_AMT           NUMBER(9,2) DEFAULT 0 NOT NULL,
PAID_FLAG               NUMBER(1) DEFAULT 0 NOT NULL, — 0 Not paid/ 1 paid
CONSTRAINT PK_INVOICE PRIMARY KEY(INVOICE_NUMBER)
)
– TABLESPACE Clause
/

CREATE SEQUENCE INVOICE_SEQ
START WITH 1
CACHE 100
/

INSERT INTO INVOICE(Invoice_Number, Invoice_date, client_ID, Invoice_Amt)
VALUES(INVOICE_SEQ.NEXTVAL, sysdate-5,101,1100.00);

INSERT INTO INVOICE(Invoice_Number, Invoice_date, client_ID, Invoice_Amt)
VALUES(INVOICE_SEQ.NEXTVAL, sysdate-10,102,1100.00);

INSERT INTO INVOICE(Invoice_Number, Invoice_date, client_ID, Invoice_Amt)
VALUES(INVOICE_SEQ.NEXTVAL, sysdate-20,103,1100.00);

INSERT INTO INVOICE(Invoice_Number, Invoice_date, client_ID, Invoice_Amt)
VALUES(INVOICE_SEQ.NEXTVAL, sysdate-40,101,1100.00);

INSERT INTO INVOICE(Invoice_Number, Invoice_date, client_ID, Invoice_Amt)
VALUES(INVOICE_SEQ.NEXTVAL, sysdate-38,101,1500.00);

INSERT INTO INVOICE(Invoice_Number, Invoice_date, client_ID, Invoice_Amt)
VALUES(INVOICE_SEQ.NEXTVAL, sysdate-60,102,1100.00);

INSERT INTO INVOICE(Invoice_Number, Invoice_date, client_ID, Invoice_Amt)
VALUES(INVOICE_SEQ.NEXTVAL, sysdate-65,103,1100.00);

Run following SQL from SQL*Plus prompt. It Extracts Year from the Invoice_Date and group it by year to see yearly sales.

SQL> SELECT EXTRACT(Year FROM Invoice_date) as Year,
SUM(Invoice_Amt) as Sales_Amt
FROM Invoice
GROUP BY EXTRACT(Year FROM Invoice_Date);

YEAR  SALES_AMT
—- ———-
2006       4800
2007       3300

MS SQL Server has datepart functions. Using datepart we can get following datetime fields.

*  Day , Dayofyear, Week , Weekday, Month, Quarter, Year
* Hour, Minute, Second, Millisecond

It also has DAY, MONTH and YEAR functions which are equivalent to datepart(dd,date), datepart(mm,date) and datepart(yy,date) respectively. Let us create the table and see how we can do it.

CREATE TABLE dbo.Invoice
(
INVOICE_NUMBER      INT IDENTITY(1,1) NOT NULL,
INVOICE_DATE        DATETIME NOT NULL,
CLIENT_ID           INT    NOT NULL,
INVOICE_AMT         NUMERIC(9,2) DEFAULT 0 NOT NULL,
PAID_FLAG           TINYINT DEFAULT 0 NOT NULL, — 0 Not paid/ 1 paid
CONSTRAINT PK_INVOICE PRIMARY KEY(INVOICE_NUMBER)
)
– FILEGROUP Clause
GO

INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(getDate()-5,101,1100.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(getDate()-10,102,1100.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(getDate()-20,103,1100.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(getDate()-40,101,1100.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(getDate()-38,101,1500.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(getDate()-60,102,1100.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(getDate()-65,103,1100.00);

We can use either Year function or Datepart syntax to get the final results. Both SQLs are shown under. You can either run it from Query Analyzer or Management studio.

SELECT YEAR(Invoice_date) as Year, SUM(Invoice_Amt) as Sales_Amt
FROM Invoice
GROUP BY Year(Invoice_Date)
GO
OR
SELECT DATEPART(YY,Invoice_date) as Year, SUM(Invoice_Amt) as Sales_Amt
FROM Invoice
GROUP BY DATEPART(YY,Invoice_Date)
GO

DB2 LUW has date and timestamp datatype for storing dates. Given a date, time or timestamp value, we can extract following datetime fields.

* Day,  Month, Year (date and timestamp)
*  Hour, Minute, Second, MicroSecond (time and timestamp)

Apart from above functions, it also returns week, quarter, dayofweek  and dayofyear given a date or timestamp.

CREATE TABLE Invoice
(
INVOICE_NUMBER      INTEGER NOT NULL GENERATED BY DEFAULT AS
IDENTITY (START WITH +1, INCREMENT BY +1, CACHE 1000),
INVOICE_DATE        TIMESTAMP NOT NULL,
CLIENT_ID           INTEGER       NOT NULL,
INVOICE_AMT         DECIMAL(9,2) DEFAULT 0 NOT NULL,
PAID_FLAG           SMALLINT DEFAULT 0 NOT NULL,
CONSTRAINT PK_INVOICE PRIMARY KEY(INVOICE_NUMBER)
);

INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(current timestamp - 5 days,101,1100.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(current timestamp - 10 days,102,1100.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(current timestamp - 20 days,103,1100.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(current timestamp - 40 days,101,1100.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(current timestamp - 38 days,101,1500.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(current timestamp - 60 days,102,1100.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(current timestamp - 65 days,103,1100.00);

Following is the SQL to obtain yearly sales in DB2. You can run it from Command Editor.

SELECT YEAR(Invoice_date) as Year, SUM(Invoice_Amt) as Sales_Amt
FROM Invoice
GROUP BY Year(Invoice_Date);

Posted in DB2 LUW, Oracle, SQL Server | No Comments »

Returning a record set from a stored procedure

Posted by decipherinfosys on May 19, 2008

One of the the database developers at a client site who is responsible for writing DB code across the three RDBMS that they support wanted a simple example of a stored procedure which returns the record set to the calling application for further processing.  In this post, we will show you in brief example, how can we address this in Oracle, MS SQL Server and DB2 LUW.

Oracle:
First let us create a table and populate it with small set of data. If you already have table with the same name, change all the occurrences of the table name with some other name.

CREATE TABLE Invoice
(
INVOICE_NUMBER      NUMBER(9) NOT NULL,
INVOICE_DATE        DATE NOT NULL,
CLIENT_ID           NUMBER(9)     NOT NULL,
INVOICE_AMT         NUMBER(9,2) DEFAULT 0 NOT NULL,
PAID_FLAG           NUMBER(1) DEFAULT 0 NOT NULL, — 0 Not paid/ 1 paid
CONSTRAINT PK_INVOICE PRIMARY KEY(INVOICE_NUMBER)
)
– TABLESPACE Clause
/

CREATE SEQUENCE INVOICE_SEQ
START WITH 1
CACHE 100
/

INSERT INTO INVOICE(Invoice_Number, Invoice_date, client_ID, Invoice_Amt)
VALUES(INVOICE_SEQ.NEXTVAL, sysdate,101,1100.00);

INSERT INTO INVOICE(Invoice_Number, Invoice_date, client_ID, Invoice_Amt)
VALUES(INVOICE_SEQ.NEXTVAL, sysdate,102,1100.00);

INSERT INTO INVOICE(Invoice_Number, Invoice_date, client_ID, Invoice_Amt)
VALUES(INVOICE_SEQ.NEXTVAL, sysdate,103,1100.00);

INSERT INTO INVOICE(Invoice_Number, Invoice_date, client_ID, Invoice_Amt)
VALUES(INVOICE_SEQ.NEXTVAL, sysdate,104,1100.00);

Oracle uses cursor variables to pass query result sets between PL/SQL sub programs and to the client application. A cursor variable has data type REF CURSOR and that is what is normally known as a ref cursor (we will cover more on the ref cursors in one of our future whitepapers). Following is the code snippet to return result set from the stored procedure.

To create a cursor variable first we need to define a REF CURSOR type and then declare the cursor variable of that type.  To avoid declaring the same REF CURSOR type in each program, we will create one global type in a package specification and declare cursor of that type in our procedure.

Run following scripts to create package and procedure from SQL*Plus prompt. Testing script is to test the execution of the procedure.

/* Create package */
CREATE OR REPLACE PACKAGE types
AS
type cursorType is ref cursor;
END;
/

– Here we have declared cursor variable of type cursorType as an output variable.
CREATE OR REPLACE PROCEDURE DEC_RTN_RECORDSET
(
p_InvoiceDate      IN   DATE,
p_ResultSet        OUT  TYPES.cursorType
)
AS
BEGIN
OPEN p_ResultSet FOR
SELECT Invoice_Number, Invoice_Date, Client_ID, Invoice_Amt
FROM Invoice
WHERE Invoice_date <= p_InvoiceDate
ORDER BY Invoice_number;

END DEC_RTN_RECORDSET;
/

– Testing

VARIABLE resultSet  REFCURSOR
EXEC DEC_RTN_RECORDSET(sysdate, :resultSet);
PRINT :resultSet

Following is the output. It is formatted for more readability.

Invoice_Number       Invoice_Date  Client_ID     Invoice_Amt
————–       ————  ———     ———–
1       25-JAN-07           101            1100
2       25-JAN-07           102            1100
3       25-JAN-07           103            1100
4
25-JAN-07           104            1100

MS SQL Server does not require any declaration of variable to return record set back to calling program from the stored procedure. By simply adding a SELECT statement at the end of the procedure for qualified columns, procedure will return data back to calling program since SQL Server handles all this internally which is indeed very convenient for programming purposes.  Following is the code snippet of MS SQL Server stored procedure. It does not require specific open cursor command.

CREATE TABLE dbo.Invoice
(
INVOICE_NUMBER      INT IDENTITY(1,1) NOT NULL,
INVOICE_DATE        DATETIME NOT NULL,
CLIENT_ID           INT    NOT NULL,
INVOICE_AMT         NUMERIC(9,2) DEFAULT 0 NOT NULL,
PAID_FLAG           TINYINT DEFAULT 0 NOT NULL, — 0 Not paid/ 1 paid
CONSTRAINT PK_INVOICE PRIMARY KEY(INVOICE_NUMBER)
)
– FILEGROUP Clause
GO

INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(getDate(),101,1100.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(getDate(),102,1100.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(getDate(),103,1100.00);
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(getDate(),104,1100.00);

Create following stored procedure either from Query analyzer or management studio.

CREATE PROC DEC_RTN_RECORDSET
(

@InvoiceDate DATETIME

)

AS
BEGIN
SET NOCOUNT ON
–Get the data from the Invoice table to return it to client application
SELECT Invoice_Number, Invoice_Date, Client_ID, Invoice_Amt
FROM Invoice
WHERE Invoice_date <= @InvoiceDate
ORDER BY Invoice_number;

SET NOCOUNT OFF
END
GO

– Run followig command to test the execution of the procedure.

DECLARE @CurrentDate datetime
set @CurrentDate = GETDATE()
exec DEC_RTN_RECORDSET @CurrentDate

Following is the output. It is formatted for more readability.

Invoice_Number       Invoice_Date               Client_ID     Invoice_Amt
————–       ————               ———     ———–
1       2007-01-25 15:21:22.300          101             1100
2       2007-01-25 15:21:22.300          102             1100
3       2007-01-25 15:21:22.300          103             1100
4       2007-01-25 15:21:22.300          104             1100

DB2 LUW also does not require any variable declaration to pass back to the calling program. But we need to DECLARE the cursor first and then OPEN the cursor. When we declare a cursor we have to use WITH RETURN TO clause in order to return record set from the procedure. If result set is to be returned to invoker of the procedure then we need to use WITH RETURN TO CALLER in declaration. If result set is to be returned to originating application then we need to use WITH RETURN TO CLIENT in declaration. Here we are assuming that we are returning result set to originating application and hence we are using WITH RETURN TO CLIENT.

Let us create table first and populate it with data. You can create it either via command editor or using command window. Please make sure that statement terminator is “@” or else you may receive an error (or replace with whatever statement terminator you have set in your programming tool).

CREATE TABLE Invoice
(
INVOICE_NUMBER      INTEGER NOT NULL GENERATED BY DEFAULT AS
IDENTITY (START WITH +1, INCREMENT BY +1, CACHE 1000),
INVOICE_DATE        TIMESTAMP NOT NULL,
CLIENT_ID           INTEGER       NOT NULL,
INVOICE_AMT         DECIMAL(9,2) DEFAULT 0 NOT NULL,
PAID_FLAG           SMALLINT DEFAULT 0 NOT NULL,
CONSTRAINT PK_INVOICE PRIMARY KEY(INVOICE_NUMBER)
)
@

INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(current timestamp,101,1100.00)@
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(current timestamp,102,1100.00)@
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(current timestamp,103,1100.00)@
INSERT INTO INVOICE(Invoice_date, client_ID, Invoice_Amt) VALUES(current timestamp,104,1100.00)@

Following is the procedure to return the result set. You can run it from Command Editor or save it in a file and run it from command window using following command.

db2 �td@ -f <filename> (filename in which following code is stored).

CREATE PROCEDURE DEC_RTN_RECORDSET
(
IN  P_InvoiceDate   TIMESTAMP
)
DYNAMIC RESULT SETS 1
LANGUAGE SQL
BEGIN
–Get the data from the table to display it back to client application
BEGIN
DECLARE c2 CURSOR WITH RETURN TO CLIENT FOR
SELECT Invoice_Number, Invoice_Date, Client_ID, Invoice_Amt
FROM Invoice
WHERE Invoice_date <= p_InvoiceDate
ORDER BY Invoice_number;

OPEN c2;
END;
END
@

– To test the procedure execute following command.
call DEC_RTN_RECORDSET(current timestamp)@

Following is the output. It is formatted for more readability.

Invoice_Number       Invoice_Date               Client_ID     Invoice_Amt
————–       ————               ———     ———–
1       2007-01-25-15.45.33.991000       101             1100
2       2007-01-25-15.45.34.054000       102             1100
3       2007-01-25-15.45.34.132000       103             1100
4       2007-01-25-15.45.34.210000       104             1100

Posted in DB2 LUW, Oracle, SQL Server | No Comments »

Traversing the hierarchy - Part II

Posted by decipherinfosys on April 26, 2008

Yesterday, we had covered how to traverse a hierarchy using CTEs. We had used the schema sample from a pharmaceutical industry as an example. In today’s post, let’s make use of a more generic example. This one pertains to a recursive 1:N relationship within the same table - and this scenario exists in almost all of the HR applications out there. An employee has a manager and the top dog in the company does not have anyone to report to (well, they do have to report to the board and the shareholders of the company but they do not have those set up as employees in the day to day activities of the organization :-)).

So, let’s take a look at the table and the data that we will use to illustrate this: The hierarchy relationship example can also be applied to the employees of a company. Let’s say that the table name is EMP_MASTER with this structure:

CREATE TABLE EMP_MASTER
(
EMP_NBR INT NOT NULL PRIMARY KEY,
EMP_NAME VARCHAR(20) NOT NULL,
MGR_NBR INT NULL
)
GO

/******************************************
Now, let us insert some data into this
This uses the SQL Server syntax
*******************************************/

SET NOCOUNT ON

BEGIN TRY
BEGIN TRANSACTION
INSERT INTO EMP_MASTER VALUES (1, ‘DON’, 5);
INSERT INTO EMP_MASTER VALUES (2, ‘HARI’, 1);
INSERT INTO EMP_MASTER VALUES (3, ‘RAMESH’, 2);
INSERT INTO EMP_MASTER VALUES (4, ‘JOE’, 2);
INSERT INTO EMP_MASTER VALUES (5, ‘DENNIS’, NULL);
INSERT INTO EMP_MASTER VALUES (6, ‘NIMISH’, 5);
INSERT INTO EMP_MASTER VALUES (7, ‘JESSIE’, 6);
INSERT INTO EMP_MASTER VALUES (8, ‘KEN’, 7);
INSERT INTO EMP_MASTER VALUES (9, ‘AMBER’, 8);
INSERT INTO EMP_MASTER VALUES (10, ‘JIM’, 6);
INSERT INTO EMP_MASTER VALUES (11, ‘JOHN’, 5);
INSERT INTO EMP_MASTER VALUES (12, ‘LUTHER’, 11);
INSERT INTO EMP_MASTER VALUES (13, ‘GARY’, 11);
INSERT INTO EMP_MASTER VALUES (14, ‘ROBERTO’, 12);
INSERT INTO EMP_MASTER VALUES (15, ‘IRA’, 13);
COMMIT TRANSACTION
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
ROLLBACK TRANSACTION
SELECT
ERROR_NUMBER() AS ErrorNumber,
ERROR_SEVERITY() AS ErrorSeverity,
ERROR_STATE() AS ErrorState,
ERROR_PROCEDURE() AS ErrorProcedure,
ERROR_LINE() AS ErrorLine,
ERROR_MESSAGE() AS ErrorMessage;
END CATCH
GO

/******************************************
Let’s do a simple select to look
at the data
*******************************************/

SELECT * FROM dbo.EMP_MASTER

EMP_NBR     EMP_NAME             MGR_NBR

----------- -------------------- -----------

1           DON                  5

2           HARI                 1

3           RAMESH               2

4           JOE                  2

5           DENNIS               NULL

6           NIMISH               5

7           JESSIE               6

8           KEN                  7

9           AMBER                8

10          JIM                  6

11          JOHN                 5

12          LUTHER               11

13          GARY                 11

14          ROBERTO              12

15          IRA                  13

/******************************************
Let’s create a view which will use a
CTE in order to traverse through
the hierarchy
*******************************************/

IF EXISTS
(
SELECT 1
FROM INFORMATION_SCHEMA.VIEWS
WHERE TABLE_NAME = ‘emp_hierarchy’
AND TABLE_SCHEMA = ‘dbo’
)
BEGIN
PRINT ‘View emp_hierarchy already exists…Dropping it and recreating’
DROP VIEW emp_hierarchy
END
ELSE
BEGIN
PRINT ‘Creating View emp_hierarchy’
END
GO
create view dbo.emp_hierarchy
as
with emp (traverse_path, emp_id, mgr_id, emp_name, level, lineage)
as (
select CONVERT(VARBINARY(MAX), emp_name) AS traverse_path,
emp_nbr,
Null as mgr_id,
emp_name,
1,
cast((’/’ + str(emp_nbr,6,0) + ‘/’) as varchar(255))
from dbo.emp_master
where mgr_nbr is null
/*– the highest level in the organization*/

union all

select e.traverse_path + CONVERT(VARBINARY(MAX), em.emp_name) as traverse_path,
emp_nbr,
mgr_nbr,
em.emp_name,
e.level + 1,
cast((e.lineage + str(em.emp_nbr,6,0) + ‘/’) as varchar(255))
from dbo.emp_master as em
inner join emp e
on e.emp_id = em.mgr_nbr
)
select * from emp;
GO

If you look at the above view, it is only slightly different than the example that we had shown yesterday. The reason for the difference is that in yesterday’s example, we were handling a M:N recusrive relationship between the same entity and thus there were two tables to deal with while in today’s example, we are dealing with a single table and a 1:N recursive relationship. The logic remains exactly the same as before. We get the top dog in the company as the first record and then we do a UNION ALL operation to the recursive tree and use the traverse path as an ordering mechanism so that we can see the actual tree. The trick is to cast the employee name as a VARBINARY(MAX) and keep on appending it so that we can later on order by on it to see the complete tree structure for the organization. That helps us to see from the CEO - down to the SVP and their individual department immediately down from there till the deepest level is reached. And that is then followed by the next SVP and their hierarchy in the same fashion. And this continues that way to show a complete tree representation. The lineage column is used to just show the lineage of the tree.

You can see it by executing the following command:

select * from emp_hierarchy order by traverse_path
GO

Since the output is rather large, we will leave it up to you to execute these in your environment to see it. This is a quick and fast way to recursively build up hierarchy trees in SQL Server 2005 or Oracle and DB2 (with slight syntax differences). SQL Server 2008 also introduces a new data-type for hierarchy which we will cover in a later post.

Posted in DB2 LUW, Oracle, SQL Server | No Comments »

Traversing the hierarchy - Part I

Posted by decipherinfosys on April 25, 2008

Traversing the hierarchy in the user data is a common requirement in most of the software products. You must have faced this requirement when you are:

a) building up a tree for navigation, or
b) whether you are recursively trying to get the employee hierarchy in a HR application module, or
c) whether you are establishing the tree structure for drugs in pharmaceutical inudstry, or
d) you are traversing the hierarchy of the different parts in a manufacturing industry.

Above are just some of the examples. There are many such places where one is required to write code to help with traversing the hierarchy from the top level down to the bottom. If you look at our previous post which explains how to design a M:N recursive relationship within a schema - that example pertains to the pharmaceutical industry. It shows how to model a recursive M:N relationship within an entity into physical tables. We will use that as an example to write up T-SQL (SQL Server) code to show how to traverse through that hierarchy. We will make use of CTEs (Common Table Expressions) which are also present in Oracle and DB2 LUW (with slight differences). In Oracle, one can also make use of the CONNECT BY clause in order to traverse the hierarchy - we will cover that in a later post.

Let us create the view which will have the CTE and the logic for recursion:

IF EXISTS
(
SELECT 1
FROM INFORMATION_SCHEMA.VIEWS
WHERE TABLE_NAME = ‘drug_hierarchy’
AND TABLE_SCHEMA = ‘dbo’
)
BEGIN
PRINT ‘View drug_hierarchy already exists…Dropping it and recreating’
DROP VIEW drug_hierarchy
END
ELSE
BEGIN
PRINT ‘Creating View drug_hierarchy’
END
GO
create view dbo.drug_hierarchy
as
with drug (traverse_path, drug_id, parent_drug_id, drug_name, level, lineage)
as (
select CONVERT(VARBINARY(MAX), drug_name) AS traverse_path,
drug_master_id,
Null as parent_drug_id,
drug_name,
1,
cast((’/’ + str(drug_master_id,6,0) + ‘/’) as varchar(255))
from dbo.drug_master
where drug_master_id not in (select child_drug_id from dbo.drug_rltn)
/* the highest level in the tree*/

union all

select d.traverse_path + CONVERT(VARBINARY(MAX), dm2.drug_name) as traverse_path,
dr.child_drug_id,
dr.parent_drug_id,
dm2.drug_name,
d.level + 1,
cast((d.lineage + str(dm.drug_master_id,6,0) + ‘/’) as varchar(255))
from dbo.drug_master as dm
inner join dbo.drug_rltn as dr
on dm.drug_master_id = dr.parent_drug_id
inner join dbo.drug_master as dm2
on dr.child_drug_id = dm2.drug_master_id
inner join drug as d
on d.drug_id = dm.drug_master_id
)
select * from drug;
GO

The logic of the view is that we first get the top record in the DRUG_MASTER table as the first record (this one is not a child of any drug - if such a record does not exist, just create a dummy one in your table) and then we do a UNION ALL operation to the recursive tree and use the traverse path as an ordering mechanism so that we can see the actual tree. The trick is to cast the drug name as a VARBINARY(MAX) and keep on appending it so that we can later on order by on it to see the complete tree structure for the drug hierarchy. That helps us to see from the top drug - down to the next one in alphabetical order and their individual drug components immediately down from there till the deepest level is reached. And that is then followed by the next drug under the parent drug and their hierarchy in the same fashion. And this continues that way to show a complete tree representation. The lineage column is used to just show the lineage of the tree.

You can see it by executing the following command:

select * from drug_hierarchy order by traverse_path
GO

Since the output is rather large, we will leave it up to you to execute these in your environment to see it. This is a quick and fast way to recursively build up hierarchy trees in SQL Server 2005 or Oracle and DB2 (with slight syntax differences). SQL Server 2008 also introduces a new data-type for hierarchy which we will cover in a later post.

Tomorrow, we will cover the second part of this post which will go into how to recursively traverse a tree hierarchy by using a 1:N recursive relationship in a table like the Employee structure in an organization.

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »

Myths or so called “Rules of Thumb”

Posted by decipherinfosys on April 19, 2008

Those who have worked for some time in the IT industry know that there are many exceptions to the so called “Rules of Thumb”. That is why in many scenarios, if you ask an IT expert a question, the answer will be: It Depends :-) and no, the person is not trying to beat around the bush, he/she is actually trying to explain how the different situations effect that scenario and why the answer might be a different one depending upon a particular client situation. Yesterday, while having a discussion with one of the senior resources, he suggested that we should write something about such “Rules of Thumb” that exist in the DB world. We will list some of them in this post and then in subsequent posts, pick them one by one and show examples where the rules of thumb get violated. In addition, there are a lot of Myths out there which even senior consultants seem to propagate at client sites…here are some of them:

1) FTS (Full Table Scans) are always bad and Index usage is always good.

2) Usage of dynamic SQL within the stored procedure code is always bad even for search procedures.

3) Empty Space in an index that gets created due to the DML operations do not get used.

4) Indexes should be rebuilt at regular intervals.

5) Indexes and statistics are the same thing. Also, histograms are needed only on indexed columns.

6) Usage of cursors is always bad so avoid them like the plague.

7) Truncate command cannot be rolled back because it is a non-logged operation.

8 ) Table variables in SQL Server are always only memory resident.

9) Column order in a covered index does not matter.

10) In the case of SQL Server, one can separate the clustered index from the table.

11) Only committed data gets written to the disk.

12) Logical IOs (LIO) are not a cause of concern, only Physical IO (PIO) are.

13) Count(1) is better performing than count(*).

14) Issue frequent commits in the application to make the transaction faster and also improve concurrency.

15) Views are evil evil DB Objects that always slow down performance.

There are many many more myths that are out there. We just put down some of them over here and will start writing about each one in subsequent blogs giving examples and code samples to show the exceptions to the so called “Rules of Thumb” and the myths. One should always question the “experts” - everything should be proved using sample pieces of code if it sounds fishy. It is more of a science rather than an art guys and using code samples and understanding the internals and the boundary conditions is as important as understanding the context in which the recommendations are being applied.

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »

Using GUID as the Primary Key

Posted by decipherinfosys on April 1, 2008

We have blogged before about how to go about choosing between a natural key vs a surrogate key when doing data modeling work.  One of the recent questions that was raised was about using a GUID as a primary key column.  The need arose from the requirement that the values needed to be unique throughout the enterprise and not just in a particular table which is what you get if you choose to go with the Identity scheme (SQL Server or DB2) or a Sequence (in Oracle or DB2).  One can come up with a number range design to implement this but that is fraught with danger and is very cumbersome to maintain.  So, the solution in those situations is using a GUID but do not use it as a primary key - instead create another column as a GUID and use it as an alternate key in the table.  Choosing GUID as a primary key column has a couple of dis-advantages:

  • The values are long and obscure. The values are random and cannot accept any patterns that may make them more meaningful to users.  The storage for these values is more.
  • There is no way to determine the sequence in which uniqueidentifier values were generated. They are not suited for existing applications that depend on serially incrementing key values.
  • At 16 bytes, the uniqueidentifier data type (in SQL Server) is relatively large compared to other data types such as 4-byte integers. This means indexes built using uniqueidentifier keys may be relatively slower than implementing the indexes using an int key since it has to traverse a larger number of pages for the joins as each page can fit lesser number of values.  This also means that the joins will be slower.

In such a scenario where enterprise wide uniqueness is needed, one solution would be to use INT (or if needed - BIGINT) data type for the primary key column and implement an alternate key column using the GUID to suffice the need for having a globally unique identifier value.  That way, you have the best of both worlds and will also be able to meet the requirements.

Posted in DB2 LUW, Data Model, Oracle, SQL Server | No Comments »

The IN-List Iterator Issue

Posted by decipherinfosys on February 21, 2008

We have seen this issue time and again in many client applications. Many applications are designed to just plug in different values into the IN Clause of the filter criteria for a given column (or if the RDBMS supports row value constructors, then the set of values). That is not a scalable design. You should be specifying a criteria that qualifies for those say 1000 values (region id values or the company ids) i.e. a status code or a date criteria or something that made you select those 1000 values/records to begin with.

If you do that, then the SQL:

X IN (1, 2, 3, 4, 5, 20, 30, 40, 50….)

will just be

X.Stat_code = :1
AND X.date_field >= :2 AND X.date_field <= :3

where :1, :2 and :3 are bind variables. The idea is that there must have been some reason (some criteria) due to which those 1000 values were selected to begin with…instead of preparing an IN list, parsing it and dividing it up into multiple sets, you should use that criteria itself…what if tomorrow, instead of say 1000, a client has a requirement where you have over a million such qualifying values? Your overhead will increase…internally, these COL1 IN (x, y, z) get translated to COL1 = x OR COL1 = y OR COL1 = z…you get the picture now.

Let’s talk about the IN-List iterator for a second. As you know, the queries that use the IN clause (with values) are internally converted (by the optimizer) into a series of queries using OR statements. The optimizer calculates the estimation of cardinality and looks for a long continguous sequence where expressions are all column=constant, such as (a=1 or a=10 or a=11, …). Those tests are run against every row in the table that matches the other filter criterias and join conditions (if those are present). RDBMS short-circuit the OR comparisons (meaning if it made a match on the 3rd item in the list it will stop testing instead of testing the other 96 items in the list), but it’s messy even if most of the values are caught by the first 20% of the list. Non-matching values will still be tested against every item in the IN clause. If you use an IN clause with many values to filter a large table, the amount of overhead can be brutal compared to simply lining two ordered lists up next to each other and plucking out the matches based on the filter criteria. Performance will dip even more when there are multiple users firing off similar queries that do these things.

IMHO (can be proven by comparing logical/physical reads, CPU usage and of course execution times), the best approach on any platform with an IN list of large size is two-fold :

The first option below is the best option…

1) The criteria that was used to qualify those N number of IN values should be used directly while forming the SQL –> that way the joins & meaningful filter criteria (and thus their indexes) can be used for faster retrieval of the data.

Either that, or if the above is not feasible for whatever reason, then the next best approach is :

2) To put the IN data into a work table (by using a DB function) and use a join. The DB function is in the FROM clause and hence acts as a table and pivots the data and joins it like a derived table/inline view.

Example:

SELECT
EmpName
FROM Emp e
INNER JOIN Split(’100,101,102,103,104,105′) d
ON e.EmpID = d.Value
WHERE e.salary > 50000

One of the optimizer improvements in DB2 version 8.2 and SQL Server 2005 is that it will flatten large IN lists and will at times turn them into worktables (simulating what I have mentioned above in option #2) but this is not always guaranteed. In Oracle as well, at times, it can opt to use USE_CONCAT or NO_EXPAND (depending upon a couple of other parameter settings) if it sees the cost of “OR Expansion” for the IN-LIST to be high but again that is not always the case since the optimizer has to consider the permutations before giving up the OR expansion and that itself can take a longer parse time.

Posted in DB2 LUW, Oracle, SQL Server | No Comments »

Using a SELECT 1 vs a SELECT * in the EXISTS/NOT EXISTS query

Posted by decipherinfosys on February 13, 2008

Could not come up with a better title for this post. After reading some of our blog posts, one of our readers asked the question - “Why do you always make use of a “SELECT 1″ rather than a “SELECT *” when using an EXISTS or a NOT EXISTS clause”.

The answer is that during the query plan stages, the * is expanded to bind with the list of columns. And then since the semantics of the query is such (EXISTS or NOT EXISTS), since it does not need any of those columns, it will remove all of them. Using a SELECT 1, avoids having to look at any of the meta-data that is not even needed for that table during query compilation time.

Please do NOTE that at runtime BOTH of them will have the same plans. We can quickly check that with an example (using SQL Server Syntax):

CREATE TABLE TBL1 (COL1 INT IDENTITY PRIMARY KEY, COL2 INT)
GO
CREATE TABLE TBL2 (COL3 INT IDENTITY PRIMARY KEY, COL4 INT, COL5 INT, COL6 INT, COL7 INT, COL8 INT, COL9 INT, COL10 INT, CONSTRAINT FK_TBL2_TO_TBL1 FOREIGN KEY (COL4) REFERENCES TBL1(COL1))
GO

CREATE INDEX TBL2_IND_1 ON TBL2 (COL4)
GO

SET NOCOUNT ON
GO
DECLARE @I INT, @J INT
SELECT @I = 1, @J = 1000
WHILE (@I <= @J)
BEGIN
INSERT INTO TBL1 (COL2) VALUES (@I)

IF @I%7 = 0
INSERT INTO TBL2 (COL4, COL5) VALUES (@I, @I + 10)

SET @I = @I + 1
END
GO

SET SHOWPLAN_TEXT OFF
GO

SELECT * FROM TBL1 AS A
WHERE EXISTS (SELECT * FROM TBL2 WHERE COL4 = A.COL1)

———————————————————————————————————————————————————————–
|–Merge Join(Left Semi Join, MERGE:([A].[COL1])=([master].[dbo].[TBL2].[COL4]), RESIDUAL:([master].[dbo].[TBL2].[COL4]=[master].[dbo].[TBL1].[COL1] as [A].[COL1]))
|–Clustered Index Scan(OBJECT:([master].[dbo].[TBL1].[PK__TBL1__17E28260] AS [A]), ORDERED FORWARD)
|–Index Scan(OBJECT:([master].[dbo].[TBL2].[TBL2_IND_1]), ORDERED FORWARD)

SET SHOWPLAN_TEXT OFF
GO

DBCC FREEPROCCACHE
GO

SET SHOWPLAN_TEXT ON
GO

SELECT * FROM TBL1 AS A
WHERE EXISTS (SELECT 1 FROM TBL2 WHERE COL4 = A.COL1)

StmtText
———————————————————————————————————————————————————————–
|–Merge Join(Left Semi Join, MERGE:([A].[COL1])=([master].[dbo].[TBL2].[COL4]), RESIDUAL:([master].[dbo].[TBL2].[COL4]=[master].[dbo].[TBL1].[COL1] as [A].[COL1]))
|–Clustered Index Scan(OBJECT:([master].[dbo].[TBL1].[PK__TBL1__17E28260] AS [A]), ORDERED FORWARD)
|–Index Scan(OBJECT:([master].[dbo].[TBL2].[TBL2_IND_1]), ORDERED FORWARD)

As you can see from above, the execution plan at runtime are exactly the same regardless of whether a “SELECT 1″ is used or a “SELECT *” is used. We use SELECT 1 because of the reason mentioned in the start of this post.

Posted in DB2 LUW, Oracle, SQL Server | No Comments »