Systems Engineering and RDBMS

Archive for the 'Oracle' Category


Benchmarks and clearing the cache

Posted by decipherinfosys on July 17, 2008

This is one topic on which I hold a slightly different opinion from my colleagues. I have seen in every shop that I have been to that they clear out the cache prior to doing another benchmark run - whether they are using Oracle or SQL Server.  In SQL Server, one could use the DBCC commands to flush out the cache - DBCC FREEPROCCACHE would do that for you.  It is granular enough to allow you to even remove a specific plan if you provide it with a plan_handle or a sql_handle.  What this command does is that it clears out the plan cache and say you have a stored procedure that you are benchmarking, it would cause it to recompile since there is nothing in the cache.  Some people while doing benchmarking do this for every iteration of the benchmark in order to get “real world” numbers.  Ask yourself, do you clear out the cache in your production system as well every day?  Plan re-use and changing of the plans as one provides different input values and as one puts on the load on the system and increases up the data volume is what (IMHO) needs to be tracked rather than trying to clear out the cache each time to make a VP happy :-)

Other forms of such DBCC commands are DBCC FREESESSIONCACHE and DBCC FREESYSTEMCACHE.   You can look up more information on the syntax and their usage in BOL.

In Oracle, prior to Oracle 10g, in Oracle 9i, one could use the “alter tablespace offline/online” command to flush the buffer cache of the blocks that pertain to that tablespace.  In Oracle 10g and 11g, there is an alter system command to flush out the cache:

alter system flush buffer_cache;

Benchmarks should reflect reality and unless your benchmark consists of a single query where you want to not account for what is in the cache, I don’t see any reason for using these commands in a benchmark.  In a real world, the buffer cache is never 100% empty.  So, where are these commands useful then?  They would be useful for a developer/DBA who have their own instance and are doing performance testing on some code.

Posted in Oracle, SQL Server | No Comments »

Even Oracle launches a BI application for iPhone

Posted by decipherinfosys on July 15, 2008

Oracle released an iPhone application that enables the end users to access the analytics using OBIEE (Oracle Business Intelligence Enterprise Edition). You can read more on this here.

Posted in Oracle | No Comments »

More on Index design

Posted by decipherinfosys on July 14, 2008

We have blogged a couple of times over the factors that should be considered when designing indexes. In one of such posts, we had also covered indexes over multiple columns. In all of those posts, you will see one common thing that we try to emphasize - do not try to come up with a “rules of thumb“.  Here is a simple example (from one of the e-mails to a question to a reader);

Question: I am trying to see how to tune/optimize a query and I saw that an index on the where condition columns is missing.  Should I just go ahead and create an index and add all the columns that are used in the where clause into that index?

The answer is of course: NO.  First, we requested the reader to go ahead and read the post from above where we have discussed the column order in covered indexes and then also gave a simple example to help illustrate the point:

Say, you have this query:

select col1, col2, col3

from dbo.big_table

where col1 >= @x and col1 <= @y

and col3 = @z

For this, we might want to create a covered index (in this order): col3, col2, col1.  This is so that we can first of all process the data based on the equality operator on col3, then scan through and narrow down the data based on the >= and <= conditions on col1 and then get the value for col2 from the index itself rather than going through the table.  Now, as we have always stated, understanding your data and it’s characteristics is very very important.  Suppose that col3 is unique in this table!  In that case, we would just create a single index on col3 and be done with it.  Why?  It will qualify for a single record and thus we will just do the row ID lookup on the table for that one single record.

So, bottom line is that please spend some time to understand the data characteristics - understand how indexes work, understand how the query is accessing the data and the data distribution - density and selectivity of the column data values as well as how often that particular index is going to be used - if an index is being created to support a report that runs only once every 6 months, is it worth to have it in the schema all year round or should we just create it prior to generating that report?  Once you have that understanding, then only decide how you want to approach the design.

Posted in DB2 LUW, Oracle, SQL Server | No Comments »

Oracle 11g vs SQL Server 2008 comparison

Posted by decipherinfosys on July 8, 2008

Here is a whitepaper from Microsoft about comparisons between SQL Server 2008 and Oracle 11g:

http://www.microsoft.com/sqlserver/2008/en/us/sql2008-oracle11g.aspx

It’s from Microsoft so obviously the comparison is more geared towards favoring how well SQL Server 2008 is positioned as compared to Oracle 11g.  Oracle has a similar comparison whitepaper but with SQL Server 2005.  It will be interesting to see and read their comparison with SQL Server 2008 whenever that is made public.

Posted in Oracle, SQL Server | No Comments »

Flashback Versions Query

Posted by decipherinfosys on July 8, 2008

Before Oracle 10g, it was not possible to view series of changes made to the table in the past. Oracle introduced ‘Flashback Query’ feature in oracle 9i, which gave a view of the table at very specific time in the past. To view all the changes made to a specific row between two time intervals, Oracle introduced ‘Flashback Versions Query’ in 10g. Using this feature, we can see all the versions of the row (changes made to the row) between specific time intervals. Whenever commit happens, new version of row gets created.

First let us create a table. If you already have table with the same name, change all the occurrences of the table name with some other name.

CREATE TABLE TEST_AC
(
TEST_AC_ID NUMBER(9) NOT NULL,
TRAN_DATE DATE,
TRAN_AMT NUMBER(9,2) NOT NULL,
CONSTRAINT PK_TEST_AC PRIMARY KEY(TEST_AC_ID)
)
– TABLESPACE Clause
/

Populate it with some data so that we can test our flashback query.

INSERT INTO TEST_AC(test_ac_id,tran_date,tran_amt) VALUES(101,sysdate,5000);
COMMIT;
UPDATE TEST_AC SET TRAN_AMT = TRAN_AMT + 2000 WHERE TEST_AC_ID = 101;
COMMIT;
UPDATE TEST_AC SET TRAN_AMT = TRAN_AMT - 5000 WHERE TEST_AC_ID = 101;
COMMIT;
UPDATE TEST_AC SET TRAN_AMT = TRAN_AMT + 200 WHERE TEST_AC_ID = 101;
COMMIT;

Flashback versions query uses VERSIONS BETWEEN clause to return each ‘version of the row’ for a specific time interval along with some other pseudo columns. Pseudo columns are

* Versions_StartSCN – Starting SCN of a row version.
* Versions_EndSCN – SCN when a row version got expired.
* Versions_StartTime – Starting time of a row version.
* Versions_EndTime – Ending time when a row version got expired.
* Versions_XID – transaction ID that created a row version
* Versions_Operation – Insert/ Update/ Delete operation performed by a transaction.

For a detailed explanation of pseudo column, please refer to the Oracle Database Application Developer Guide for 10g.

Run following query to see the result set.

SELECT versions_Startscn, versions_endscn,
versions_xid, CASE
WHEN versions_operation = ‘I’ THEN ‘INSERT’
WHEN versions_operation = ‘U’ THEN ‘UPDATE’
WHEN versions_operation = ‘D’ THEN ‘DELETE’
END AS Operation,
tran_amt
FROM test_ac
VERSIONS BETWEEN TIMESTAMP minvalue AND maxvalue
ORDER BY versions_Startscn;

Output is shown below.

VERSIONS_STARTSCN VERSIONS_ENDSCN VERSIONS_XID     OPERAT   TRAN_AMT
----------------- --------------- ---------------- ------ ----------
          4168035         4168037 07001500F6070000 INSERT       5000
          4168037         4168039 09002C00420A0000 UPDATE       7000
          4168039         4168042 04002600C8070000 UPDATE       2000
          4168042                 0800030075090000 UPDATE       2200

We can also query the versions_starttime and versions_endtime instead of SCN. Run following query to see the creation and expiration time of each row version. Also instead of minvalue and maxvalue, we can use interval same way as we use it in regular flash back query. Following is the example.

SELECT versions_Starttime, versions_endtime,
versions_xid, CASE
WHEN versions_operation = ‘I’ THEN ‘INSERT’
WHEN versions_operation = ‘U’ THEN ‘UPDATE’
WHEN versions_operation = ‘D’ THEN ‘DELETE’
END AS Operation,
tran_amt
FROM test_ac
VERSIONS BETWEEN TIMESTAMP systimestamp - interval ‘10′ minute and systimestamp
ORDER BY versions_Startscn;

In above query, we are requesting all the changes made to the row in last 10 minutes. Output is shown below.

VERSIONS_STARTTIME        VERSIONS_ENDTIME      VERSIONS_XID     OPERAT   TRAN_AMT
------------------------- --------------------- ---------------- ------ ----------
22-FEB-07 10.52.02 AM     22-FEB-07 10.52.02 AM 07001500F6070000 INSERT       5000
22-FEB-07 10.52.02 AM     22-FEB-07 10.52.02 AM 09002C00420A0000 UPDATE       7000
22-FEB-07 10.52.02 AM     22-FEB-07 10.52.08 AM 04002600C8070000 UPDATE       2000
22-FEB-07 10.52.08 AM                           0800030075090000 UPDATE       2200

Flashback Transaction Query

Using flashback transaction query, we can obtain transaction information including SQL code fired by transaction, to undo the changes made by transaction. A flashback transaction query is a query on the view FLASHBACK_TRANSACTION_QUERY. We can use versions_xid column from above queries to query the view and obtain the transaction information. Run following query to get the transaction details. Output is shown below the query.

SQL>SELECT operation, logon_user, undo_Sql
FROM flashback_transaction_query where xid = ‘09002C00420A0000′;

OPERATION  LOGON_USER UNDO_SQL
---------- ---------- ------------------------------------------------------------
UPDATE     DECIPHER   update "DECIPHER"."TEST_AC" set "TRAN_AMT" = '7000' where
                      ROWID = 'AAANtqAAEAAAAHGAAA';

If you look at UNDO_SQL column carefully, you will see that update statement, sets the value to 5000 and not to 7000 because it displays the SQL to undo the changes made by the transaction. In this case, transaction changed value of tran_amt column from 5000 to 7000 and hence UNDO_SQL column shows the SQL to revert back the change. Logon_user column shows the user responsible for the change.

Warning

Flashback query uses Oracle’s multiversion read-consistency to retrieve the data by applying undo as needed. So data will be available only for the time specified by UNDO_RETENTION parameter in the database. It will not return the historical data, if time difference exceeds the time defined by UNDO_RETENTION parameter. On our database value for this parameter is set as 900 (15 minutes). So if we run the same query after 15 minutes, we don’t get anything back. Following is the example.

SELECT versions_Starttime, versions_endtime,
versions_xid, CASE
WHEN versions_operation = ‘I’ THEN ‘INSERT’
WHEN versions_operation = ‘U’ THEN ‘UPDATE’
WHEN versions_operation = ‘D’ THEN ‘DELETE’
END AS Operation,
tran_amt
FROM test_ac
VERSIONS BETWEEN TIMESTAMP minvalue AND maxvalue
ORDER BY versions_Startscn;

VERSIONS_STARTTIME        VERSIONS_ENDTIME      VERSIONS_XID     OPERATION    TRAN_AMT
------------------------- --------------------- ---------------- ---------- ----------
                                                                            2200

Also as per Oracle documentation, specify the RETENTION GUARANTEE clause for the undo tablespace to ensure that unexpired undo is not discarded.

Posted in Oracle | No Comments »

Getting the first and the last record together

Posted by decipherinfosys on June 27, 2008

While fixing a performance issue with a customer report yesterday, there was a unique requirement that came up. As the items in the warehouse move through the aisles, their datetime timestamp values get changed. In one of the sub-reports, the requirement was to display both the latest record as well as the oldest record in the system i.e. both the first and the last record based on the timestamp column. Also, another thing to add to this was that if the location for the item was not decided yet, then the those items needed to be ranked lower then those items which have a location assigned to them.

There are a couple of ways to resolve this kind of a scenario using SQL. We will present the solution in this post using T-SQL syntax though the same applies (with brief changes to Oracle and DB2 LUW as well - another thing to note also is that Oracle already has analytic functions: first_value() and last_value() that can be used with window functions to easily get these results).

Even though the actual query had a lot of table joins, for the sake of simplicity and to help explain the concept and protect the IP of the client, we will take up an example of a table variable:
set nocount on
go
declare @inventory table (item_id int not null, location_id int null, create_date_time datetime not null)
insert into @inventory values (1, null, cast(’06/27/2007 12:00:00′ as datetime))
insert into @inventory values (2, 10, cast(’06/27/2007 12:15:00′ as datetime))
insert into @inventory values (3, 20, cast(’06/26/2007′ as datetime))
insert into @inventory values (4, 30, cast(’06/21/2007′ as datetime))
insert into @inventory values (5, null, cast(’06/27/2007 1:00:00′ as datetime))
insert into @inventory values (6, 50, cast(’06/23/2007′ as datetime))
insert into @inventory values (7, 70, cast(’06/24/2007′ as datetime))
insert into @inventory values (8, 80, cast(’06/25/2007′ as datetime))
insert into @inventory values (9, 40, cast(’06/27/2007 3:00:00′ as datetime))
insert into @inventory values (10, null, cast(’06/27/2007 12:30:00′ as datetime))

select ‘***ALL Records***’, * from @inventory

                  item_id     location_id create_date_time
----------------- ----------- ----------- -----------------------
***ALL Records*** 1           NULL        2007-06-27 12:00:00.000
***ALL Records*** 2           10          2007-06-27 12:15:00.000
***ALL Records*** 3           20          2007-06-26 00:00:00.000
***ALL Records*** 4           30          2007-06-21 00:00:00.000
***ALL Records*** 5           NULL        2007-06-27 01:00:00.000
***ALL Records*** 6           50          2007-06-23 00:00:00.000
***ALL Records*** 7           70          2007-06-24 00:00:00.000
***ALL Records*** 8           80          2007-06-25 00:00:00.000
***ALL Records*** 9           40          2007-06-27 03:00:00.000
***ALL Records*** 10          NULL        2007-06-27 12:30:00.000

select top 1 ‘***LAST RECORD***’ as Last_Record, * from @inventory as I order by
case when location_id is null then 1 else 0 end ASC
,create_date_time desc

Last_Record       item_id     location_id create_date_time
----------------- ----------- ----------- -----------------------
***LAST RECORD*** 2           10          2007-06-27 12:15:00.000

select top 1 ‘***FIRST RECORD***’ as First_Record, * from @inventory as I order by
case when location_id is null then 1 else 0 end ASC
,create_date_time asc

First_Record       item_id     location_id create_date_time
------------------ ----------- ----------- -----------------------
***FIRST RECORD*** 4           30          2007-06-21 00:00:00.000

If you look at the output from above, you can see that the last record that is picked up by the logic is Item #2 even though Item #10 has the latest create_date_time timestamp value. The reason for this is that we had the requirement that if an item has not been assigned a location yet, that should rank lower. So, the above 2 SQL statements are just to show how we can get the first and the last records in a SQL statement. Now, in the report, we needed to get these two together and as you know, if one uses the UNION or UNION ALL clause between two statements, one can use only one ORDER BY clause which applies to the whole set. So, it is not as straight forward as just doing a UNION/UNION ALL operation between the two sets and applying an ORDER BY.

One of the solutions is to use derived tables:

select LR.* from
(select top 1 ‘***LAST RECORD***’ as Last_Record, * from @inventory as I order by
case when location_id is null then 1 else 0 end ASC
,create_date_time desc) as LR
UNION ALL
select FR.* from
(select top 1 ‘***FIRST RECORD***’ as First_Record, * from @inventory as I order by
case when location_id is null then 1 else 0 end ASC
,create_date_time asc) as FR

Last_Record        item_id     location_id create_date_time
------------------ ----------- ----------- -----------------------
***LAST RECORD***  2           10          2007-06-27 12:15:00.000
***FIRST RECORD*** 4           30          2007-06-21 00:00:00.000

Another solution is to use CTE (Common Table Expressions):

with results as
(select top 1 ‘***LAST RECORD***’ as Last_Record, * from @inventory as I order by
case when location_id is null then 1 else 0 end ASC
,create_date_time desc
UNION ALL
select top 1 ‘***FIRST RECORD***’ as First_Record, * from @inventory as I order by
case when location_id is null then 1 else 0 end ASC
,create_date_time asc
)
select * from results;

Last_Record        item_id     location_id create_date_time
------------------ ----------- ----------- -----------------------
***LAST RECORD***  2           10          2007-06-27 12:15:00.000
***FIRST RECORD*** 4           30          2007-06-21 00:00:00.000

Yet another solution can be the usage of the row_number() analytic function, for example:

select item_id, location_id, create_date_time
from
(
select
*
, row_number() over (order by case when location_id is null then 1 else 0 end ASC
,create_date_time desc) as RN_1
, row_number() over (order by case when location_id is null then 1 else 0 end ASC
,create_date_time asc) as RN_2
from @inventory) as IV
where RN_1 = 1 or RN_2 = 1

item_id     location_id create_date_time
----------- ----------- -----------------------
4           30          2007-06-21 00:00:00.000
2           10          2007-06-27 12:15:00.000

There are other solutions as well to this problem. One needs to test these out in one’s scenario and make sure that you have meaningful filter criterias and good execution plans. The next challenge in this report was that the project manager wanted these records listed side by side on the same record. That is pretty simple to do as well - using either pivoting or even simple join between derived tables with a match on the row_number() analytic function value. We leave that as an exercise to the reader and if you have questions, we will be more than happy to post the answer to that as well.

Posted in DB2 LUW, Oracle, SQL Server | No Comments »

Learning Oracle

Posted by decipherinfosys on June 20, 2008

One of our junior team members posed this question some time ago. He was hired as a SQL Server Developer/DBA and wanted to now expand his knowledge and also learn Oracle and Unix. This is one of the common questions from many folks even at client sites so we thought it will be good to post the information on the blog on what we consider to be a good way to learn Oracle. Here goes:

1) The starting point is either Oracle documentation (the Oracle Database Concepts) or a good book like the ones from Tom Kyte. This is because if you are a SQL Server or a DB2 DBA, Oracle is pretty different than what you are used to so getting the concepts right is the first step in the process. Here are the links:

Oracle 10g Database Concepts Documentation, Oracle 11g Database Concepts Documentation, Tom Kyte’s book on Oracle Database and Architecture, Tom Kyte’s book on Effective Oracle by Design and Steven Feuerstein’s book on PL/SQL programming.

Also, download the Oracle 10g Express Edition from here and install it on your system since the only way to learn is to play with the product. 11g Express Edition is not available right now.

The download of the express edition and the install is very straight forward and the download is pretty small - around 160MB only.

2) Participate in forums. That is the way to learn - since you will learn from others and others will learn from you. Since you are just starting in this area, chances are that the issues that you will be facing, someone else also faced them while going through the same path and can help you out. Be courteous and follow the forum etiquettes properly. Here are some good forums:

Oracle-l: This is an excellent list. If you take a look at the list, there are contributions from people like Jonathan Lewis, Cary Millsap, Pete Sharman, Niall Litchfield, Tim Gorman, Tom Kyte and many other respected Oracle professionals.

Oracle Users Co-operative FAQ: This is maintained by Jonathan Lewis and you will find a lot of good information in these FAQs.

OTN Discussion Forums: These are like the MSDN forums in the case of Microsoft. You might even get the actual product managers or the development leads or support analysts to directly help with your question(s).

Oracle Metalink: This has a wealth of information. When working on Oracle projects or even when helping colleagues with their Oracle issues, we use this very extensively. You will need to register for this one.

IOUG (Independent Oracle Users Group): This is not so good on the forums side of things but is excellent for networking, for conferences and for the technical content on the site.

3) Some other web sites:

Tom Kyte’s site is one of the best - you will find answers to most of your questions over there and if you do not, you can post it out there but search the forum first to see whether it has already been answered before.

Ora FAQ’s: There is a wealth of information available in these FAQs. And there are a lots of very good groups and categories.

Don Burleson’s site: Read the newsletters and subscribe to them. A lot of very good tips and tricks for the DBAs and Database Developers.

This list is by no means complete but for a person starting in Oracle or trying to take his/her skills to the next level, this is all you need and then a lots of practice. Keep at it and if we can help, let us know. If you would like us to cover certain topics that you would like to learn, we would be more than happy to do that on our blog or as a whitepaper. We have not started a full fledged forum yet to answer questions because of time constraints but we do plan to do that in the near future. So, keep those questions coming through e-mails to: info@decipherinfosys.com

Posted in Oracle | 1 Comment »

Common Problem - getting the first in a Group

Posted by decipherinfosys on June 18, 2008

This is one of the very common problems in the SQL world and also one that has many different solutions available. The problem description is this:

You have a table with say this structure (using SQL Server Syntax):

set nocount on
go
declare @test table (store_nbr int, product_name nvarchar(20), product_date datetime, aisle_nbr int)
insert into @test values (1, ‘A’, ‘8-12-2004′, 6)
insert into @test values (1, ‘B’, ‘7-10-2005′, 2)
insert into @test values (1, ‘C’, ‘6-11-2006′, 3)
insert into @test values (1, ‘D’, ‘11-10-2007′, 4)
insert into @test values (2, ‘E’, ‘10-12-2004′, 2)
insert into @test values (2, ‘F’, ‘5-9-2005′, 4)

And what you want to find out is that record in each store which was introduced the last so essentially in each group (i.e. each store), you want to find out the product that was introduced most recently. There are a lot of ways to solve this problem and the solutions vary among RDBMS based on syntax only. One can use analytic functions to resolve this, use a derived table or use a co-related sub-query. Here is one way to solve this issue:

select * from @test A
where A.product_date = (select top 1 B.product_date
from @test as B
where A.store_nbr = B.store_nbr
order by product_date desc)

store_nbr   product_name         product_date            aisle_nbr
----------- -------------------- ----------------------- -----------
1           D                    2007-11-10 00:00:00.000 4
2           F                    2005-05-09 00:00:00.000 4

In this code, we are sorting based on the product date in a descending order in the co-related sub-query and getting the equality match based on the TOP 1 record. We could have also used the ROW_NUMBER() function as well, example:

select store_nbr, product_name, product_date, aisle_nbr
from
(select ROW_NUMBER() over (partition by store_nbr order by product_date desc) as rn, *
from @test A
) as IV
where IV.rn = 1

One can also choose to do a MAX operation in the co-related sub-query, example:

select * from @test as A
where A.product_date = (select MAX(B.product_date)
from @test as B
where A.store_nbr = B.store_nbr)

And in SQL Server, one can also use the CROSS APPLY function in SQL Server 2005.  We will look at the execution plans and the performance characteristics of these different options in an upcoming post.

Posted in DB2 LUW, Oracle, SQL Server | No Comments »

Data Migration Scenario - CTE solution

Posted by decipherinfosys on June 12, 2008

A few days ago, we had posted the problem and the solution pertaining to a data migration issue. We had given some solutions and explained one of those in detail along with the code. A reader asked us to do the same thing using the CTE which was one of the solutions that we had proposed. Using a CTE, there are a couple of ways to get it done. Refer to the tables and the data in that post before you use this code. Here is the CTE code and the explanation follows after that:

with
N1 (n) AS (SELECT 1 UNION ALL SELECT 1),
N2 (n) AS (SELECT 1 FROM N1 AS X, N1 AS Y),
N3 (n) AS (SELECT 1 FROM N2 AS X, N2 AS Y),
N4 (n) AS (SELECT 1 FROM N3 AS X, N3 AS Y),
N5 (n) AS (SELECT ROW_NUMBER() OVER(ORDER BY X.n%10) FROM N4 AS X, N4 AS Y)
select p.ID as ItemID,
row_number() over (partition by p.id order by n) as LineItem,
substring(p.profession, case when n = 1 then 1 else n + 1 end, 10) as Profession
from N5
join dbo.profession as p
on N5.n <= Len(p.Profession)
where n <= 100
and (n = 1 or n%10 = 0)
order by p.id, n

ItemID      LineItem             Profession
----------- -------------------- ----------
1           1                    I am a Pro
1           2                    grammer an
1           3                    d a baseba
1           4                    ll Champio
1           5                    n and a fo
1           6                    otball Pla
1           7                    yer
2           1                    I am a Doc
2           2                    tor
3           1                    I am an Ar
3           2                    tist

The above piece of code first constructs a number table using the CTE (this was introduced by Itzik in his SQL column in the sqlmag) and then it joins with the source table Profession and slices the string up and shows up the data in the format that we needed for the destination table.

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »

Data Migration Scenario

Posted by decipherinfosys on June 9, 2008

Over the weekend, a colleague had asked a question regarding one of the data migration issues that he was facing in one of the projects. We will mention the problem as well as the solution in this blog post.

Problem: He had a source table with the following structure and data (ID was an integer column with Identity attribute and Description was a string column with a max length of 100):

SOURCE TABLE

ID DESCRIPTION
1 I am a programmer
2 I am a doctor

In the destination database, he had a destination table with the same columns except that the Description column was only 10 characters long. His task was to load the records from the source to the destination table. If the length of the description was more than 10 characters, he was required to insert another row with the next 10 characters into the Destination table with a LineId of next incrementing number as follows.

Destination Table

ID LINEID DESCRIPTION
1 1 I am a pro
1 2 grammer
2 1 i am a doc
2 2 tor

Solution: There are many ways to solve this problem. Since he was using plain simple T-SQL and not SSIS package for the flow, we presented 4 solutions to the problem - before we present them in the blog, a quick glance at the example that he had given indicates that the problem is essentially the same as un-pivoting the data and transposing the column into rows. Since the maximum length of the source column is known before hand - nvarchar(100) … and the destination’s string max length is nvarchar(10), we can at the most get 10 lineID values for each ID record from the source. The problem would have been more complex if we were getting the data from a source where the max length was not known (we have seen such requirements as well) and in that case dynamic un-pivoting needs to be done. So, with that in mind, here are the solutions:

1) Use static union all lists since you already know the source column length (100) that goes into a maximum length of 10…so, it translates to 10 different SQLs with UNION ALL operations between them.

Or…

2) Use a CTE (Common Table Expression) to do recursive operation to break the source description column into separate line items.

Or…

3) Since the maximum length of the source is known, the issue translates to being able to unpivot the data i.e. treat that one single column a the source and un-pivot the column into rows (look at our whitepaper on pivot and unpivot).

Or…

4) The old way of using a cursor (or a while loop), looping around and creating the new records.

Let’s create the dummy tables and some data and then we will use #1 method to show how we can do this in a simple SQL statement (using SQL Server Syntax):

/********************************************************************************
create some dummy tables
Profession is the Source table and ProfessionItem is the destination table
*********************************************************************************/
create table Profession (Id int identity(1,1), Profession nvarchar(100))
create table ProfessionItem (ItemId int, LineItem int,Profession nvarchar(10))

/********************************************************************************
insert some dummy data
*********************************************************************************/
insert into Profession (Profession)
values (N’I am a Programmer and a baseball Champion and a football Player’);

insert into Profession (Profession)
values (N’I am a Doctor’);

insert into Profession (Profession)
values (N’I am an Artist’);

Now, a simple SQL statement using the method from #1 option above will give us the data in the required format:

Select *
from
(
SELECT P.Id as ItemID, 1 as LineItem, SUBSTRING(Profession, 1, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 2 as LineItem, SUBSTRING(Profession, 11, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 3 as LineItem, SUBSTRING(Profession, 21, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 4 as LineItem, SUBSTRING(Profession, 31, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 5 as LineItem, SUBSTRING(Profession, 41, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 6 as LineItem, SUBSTRING(Profession, 51, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 7 as LineItem, SUBSTRING(Profession, 61, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 8 as LineItem, SUBSTRING(Profession, 71, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 9 as LineItem, SUBSTRING(Profession, 81, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 10 as LineItem, SUBSTRING(Profession, 91, 10) as Profession
FROM dbo.Profession as P
) AS X
Where LEN(X.Profession) > 0
Order by X.ItemID, X.LineItem

ItemID      LineItem    Profession
----------- ----------- ----------
1           1           I am a Pro
1           2           grammer an
1           3           d a baseba
1           4           ll Champio
1           5           n and a fo
1           6           otball Pla
1           7           yer
2           1           I am a Doc
2           2           tor
3           1           I am an Ar
3           2           tist

A point to note is that if the source table is huge, then we can break it up in sets of 10000 based on the ID values and use that criteria in a while loop and put the ID fitler criteria in the WHERE clause of the SQL statement shown above. Since ID will be an indexed column (clustered index), the range scans will be fine and we can quickly get the data from the source to the destination table using this method.

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »