Systems Engineering and RDBMS

Data Migration Scenario

Posted by decipherinfosys on June 9, 2008

Over the weekend, a colleague had asked a question regarding one of the data migration issues that he was facing in one of the projects. We will mention the problem as well as the solution in this blog post.

Problem: He had a source table with the following structure and data (ID was an integer column with Identity attribute and Description was a string column with a max length of 100):

SOURCE TABLE

ID DESCRIPTION
1 I am a programmer
2 I am a doctor

In the destination database, he had a destination table with the same columns except that the Description column was only 10 characters long. His task was to load the records from the source to the destination table. If the length of the description was more than 10 characters, he was required to insert another row with the next 10 characters into the Destination table with a LineId of next incrementing number as follows.

Destination Table

ID LINEID DESCRIPTION
1 1 I am a pro
1 2 grammer
2 1 i am a doc
2 2 tor

Solution: There are many ways to solve this problem. Since he was using plain simple T-SQL and not SSIS package for the flow, we presented 4 solutions to the problem – before we present them in the blog, a quick glance at the example that he had given indicates that the problem is essentially the same as un-pivoting the data and transposing the column into rows. Since the maximum length of the source column is known before hand – nvarchar(100) … and the destination’s string max length is nvarchar(10), we can at the most get 10 lineID values for each ID record from the source. The problem would have been more complex if we were getting the data from a source where the max length was not known (we have seen such requirements as well) and in that case dynamic un-pivoting needs to be done. So, with that in mind, here are the solutions:

1) Use static union all lists since you already know the source column length (100) that goes into a maximum length of 10…so, it translates to 10 different SQLs with UNION ALL operations between them.

Or…

2) Use a CTE (Common Table Expression) to do recursive operation to break the source description column into separate line items.

Or…

3) Since the maximum length of the source is known, the issue translates to being able to unpivot the data i.e. treat that one single column a the source and un-pivot the column into rows (look at our whitepaper on pivot and unpivot).

Or…

4) The old way of using a cursor (or a while loop), looping around and creating the new records.

Let’s create the dummy tables and some data and then we will use #1 method to show how we can do this in a simple SQL statement (using SQL Server Syntax):

/********************************************************************************
create some dummy tables
Profession is the Source table and ProfessionItem is the destination table
*********************************************************************************/
create table Profession (Id int identity(1,1), Profession nvarchar(100))
create table ProfessionItem (ItemId int, LineItem int,Profession nvarchar(10))

/********************************************************************************
insert some dummy data
*********************************************************************************/
insert into Profession (Profession)
values (N’I am a Programmer and a baseball Champion and a football Player’);

insert into Profession (Profession)
values (N’I am a Doctor’);

insert into Profession (Profession)
values (N’I am an Artist’);

Now, a simple SQL statement using the method from #1 option above will give us the data in the required format:

Select *
from
(
SELECT P.Id as ItemID, 1 as LineItem, SUBSTRING(Profession, 1, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 2 as LineItem, SUBSTRING(Profession, 11, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 3 as LineItem, SUBSTRING(Profession, 21, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 4 as LineItem, SUBSTRING(Profession, 31, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 5 as LineItem, SUBSTRING(Profession, 41, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 6 as LineItem, SUBSTRING(Profession, 51, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 7 as LineItem, SUBSTRING(Profession, 61, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 8 as LineItem, SUBSTRING(Profession, 71, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 9 as LineItem, SUBSTRING(Profession, 81, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 10 as LineItem, SUBSTRING(Profession, 91, 10) as Profession
FROM dbo.Profession as P
) AS X
Where LEN(X.Profession) > 0
Order by X.ItemID, X.LineItem

ItemID      LineItem    Profession
----------- ----------- ----------
1           1           I am a Pro
1           2           grammer an
1           3           d a baseba
1           4           ll Champio
1           5           n and a fo
1           6           otball Pla
1           7           yer
2           1           I am a Doc
2           2           tor
3           1           I am an Ar
3           2           tist

A point to note is that if the source table is huge, then we can break it up in sets of 10000 based on the ID values and use that criteria in a while loop and put the ID fitler criteria in the WHERE clause of the SQL statement shown above. Since ID will be an indexed column (clustered index), the range scans will be fine and we can quickly get the data from the source to the destination table using this method.

2 Responses to “Data Migration Scenario”

  1. […] Data Migration Scenario […]

  2. […] by Microsoft MVP Itzik and that we have also covered in our posts before – here and here.   So, here goes: with N1 (n) AS (SELECT 1 UNION ALL SELECT 1), N2 (n) AS (SELECT 1 FROM N1 AS […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: