Data Migration Scenario
Posted by decipherinfosys on June 9, 2008
Over the weekend, a colleague had asked a question regarding one of the data migration issues that he was facing in one of the projects. We will mention the problem as well as the solution in this blog post.
Problem: He had a source table with the following structure and data (ID was an integer column with Identity attribute and Description was a string column with a max length of 100):
SOURCE TABLE
ID DESCRIPTION
1 I am a programmer
2 I am a doctor
In the destination database, he had a destination table with the same columns except that the Description column was only 10 characters long. His task was to load the records from the source to the destination table. If the length of the description was more than 10 characters, he was required to insert another row with the next 10 characters into the Destination table with a LineId of next incrementing number as follows.
Destination Table
ID LINEID DESCRIPTION
1 1 I am a pro
1 2 grammer
2 1 i am a doc
2 2 tor
Solution: There are many ways to solve this problem. Since he was using plain simple T-SQL and not SSIS package for the flow, we presented 4 solutions to the problem – before we present them in the blog, a quick glance at the example that he had given indicates that the problem is essentially the same as un-pivoting the data and transposing the column into rows. Since the maximum length of the source column is known before hand – nvarchar(100) … and the destination’s string max length is nvarchar(10), we can at the most get 10 lineID values for each ID record from the source. The problem would have been more complex if we were getting the data from a source where the max length was not known (we have seen such requirements as well) and in that case dynamic un-pivoting needs to be done. So, with that in mind, here are the solutions:
1) Use static union all lists since you already know the source column length (100) that goes into a maximum length of 10…so, it translates to 10 different SQLs with UNION ALL operations between them.
Or…
2) Use a CTE (Common Table Expression) to do recursive operation to break the source description column into separate line items.
Or…
3) Since the maximum length of the source is known, the issue translates to being able to unpivot the data i.e. treat that one single column a the source and un-pivot the column into rows (look at our whitepaper on pivot and unpivot).
Or…
4) The old way of using a cursor (or a while loop), looping around and creating the new records.
Let’s create the dummy tables and some data and then we will use #1 method to show how we can do this in a simple SQL statement (using SQL Server Syntax):
/********************************************************************************
create some dummy tables
Profession is the Source table and ProfessionItem is the destination table
*********************************************************************************/
create table Profession (Id int identity(1,1), Profession nvarchar(100))
create table ProfessionItem (ItemId int, LineItem int,Profession nvarchar(10))
/********************************************************************************
insert some dummy data
*********************************************************************************/
insert into Profession (Profession)
values (N’I am a Programmer and a baseball Champion and a football Player’);
insert into Profession (Profession)
values (N’I am a Doctor’);
insert into Profession (Profession)
values (N’I am an Artist’);
Now, a simple SQL statement using the method from #1 option above will give us the data in the required format:
Select *
from
(
SELECT P.Id as ItemID, 1 as LineItem, SUBSTRING(Profession, 1, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 2 as LineItem, SUBSTRING(Profession, 11, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 3 as LineItem, SUBSTRING(Profession, 21, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 4 as LineItem, SUBSTRING(Profession, 31, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 5 as LineItem, SUBSTRING(Profession, 41, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 6 as LineItem, SUBSTRING(Profession, 51, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 7 as LineItem, SUBSTRING(Profession, 61, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 8 as LineItem, SUBSTRING(Profession, 71, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 9 as LineItem, SUBSTRING(Profession, 81, 10) as Profession
FROM dbo.Profession as P
union all
SELECT P.Id as ItemID, 10 as LineItem, SUBSTRING(Profession, 91, 10) as Profession
FROM dbo.Profession as P
) AS X
Where LEN(X.Profession) > 0
Order by X.ItemID, X.LineItem
ItemID LineItem Profession ----------- ----------- ---------- 1 1 I am a Pro 1 2 grammer an 1 3 d a baseba 1 4 ll Champio 1 5 n and a fo 1 6 otball Pla 1 7 yer 2 1 I am a Doc 2 2 tor 3 1 I am an Ar 3 2 tist
A point to note is that if the source table is huge, then we can break it up in sets of 10000 based on the ID values and use that criteria in a while loop and put the ID fitler criteria in the WHERE clause of the SQL statement shown above. Since ID will be an indexed column (clustered index), the range scans will be fine and we can quickly get the data from the source to the destination table using this method.


Data Migration Scenario - CTE solution « Systems Engineering and RDBMS said
[...] Data Migration Scenario [...]
Separating numbers and text in a string column « Systems Engineering and RDBMS said
[...] by Microsoft MVP Itzik and that we have also covered in our posts before – here and here. So, here goes: with N1 (n) AS (SELECT 1 UNION ALL SELECT 1), N2 (n) AS (SELECT 1 FROM N1 AS [...]