SSIS: Importing data from a text file using a package
Posted by decipherinfosys on July 28, 2008
Last week, we had covered how to create a SSIS package to export data to a text file from SQL Server. In today’s post, we are going to cover creating a SSIS package to import the data from a text file into the database schema. We have covered the export/import techniques from SQL Server, Oracle and DB2 LUW in several of our posts – this one concentrates on building up a SSIS package to achieve this task.
As described in the earlier posts, SSIS designer contains four design panes, the Control Flow pane, The Dataflow Pane, Event Handler and the Package Explorer. It has also been discussed that the Control Flow pane is used to incorporate the programming logic into the SSIS package while the Data Flow Pane is used to handle the dataflow inside the package. Since our current task is to import the data from a text file source into Sql Server, we are going to make use of Data Flow task in this current example.
For the purpose of our example, I am creating a text file, Applicants.txt which holds the name and address information of the Applicants of Credit Cards. The text file attachment can be found at the end of the post.
Our task is to create a new table in the database and load that data into that table. If the table already exists then our task is to truncate the table and if the table does not exist, we have to create the table. For this purpose we require one Execute Sql Task (configuration of this task is explained in the earlier posts) to create the table if it does not exist and to truncate the table if it exists. After successful completion of this task, we have to load the data into the table. To achieve this we require a data flow task since we are dealing with the data in our package.
The downside of the dataflow task is that we need to have the tables in the database which we use in this task. So meet this requirement, we initially create the table ‘Applications’ in the database AdventureWorks. Later on we are going to drop this table for our testing and do some work around to test our scenario to create the table if it does not exist and load the table with the data in the text file.
Execute the following script in the AdventureWorks database to create the table ‘Applications’
create table Applications
[S.No.] int identity(1,1),
ApplicationID uniqueidentifier default newid(),
Creation of the SSIS Package:
Open up the SSIS designer by navigating to
Start –> All Programs –> Microsoft Sql Server 2005 –> SQL Server Business Intelligence Development Studio.
This will open up the SSIS designer to create the SSIS package. Create the new project in the SSIS designer.
1. Creation of a new variable:
To create a new variable right click on the control flow pane and click on the Variables option. This will show the Variables window on the left. Create a new variable ‘TableName’ at the package level. We are going to make use of this variable to create the new table and also to specify the table name to be imported, inside the data flow task. The image below shows the creation of the variable:
2. Creating the Table if it does not exist:
The next step is to configure the Execute Sql Task to create the table ‘Applications’ if it does not exist or else truncate the table if exists. To do this drag the Execute Sql Task from the Control Flow Tool box and drop it in the Control Flow design pane. The configuration of the Execute Sql task was discussed in the earlier posts. Configure the Execute Sql Task to point to the AdventureWorks database. This will create a new connection manager which is renamed to DecipherDestination in this package.
Since we are providing the name of the table through the variable ‘TableName’ and creating the table at run time, we cannot directly write the create table script in the SqlStatement editor of the Execute Sql Task. Instead we have to make use of this new feature of SSIS called ‘Expressions Page’. Every control flow task has the Expressions Page, which is used to dynamically alter the task properties at run time.
To make use of this page, open up the Execute Sql Task Editor and Navigate to the Expressions Page. Click on the little ‘+’ sign and open up the expressions editor. In the property value of the expression editor, select the property SqlStatementSource. This property is made use of to generate any Sql statement dynamically. Since we are creating the table at run time we are making use of this property. Open up the property editor by clicking on the ‘…’. Copy and paste the following Sql script in the Expression window.
“if not exists (select 1 from sys.objects where name = ‘Applications’ and type = ‘U’)
create table ” + @[User::TableName] +” ([S.No.] int identity(1,1), ApplicationID uniqueidentifier default newid(), FirstName varchar(100), MiddleName char(2), LastName varchar(100), AddressLine1 varchar(100), AddressLine2 varchar(100),City varchar(100), State varchar(100), Country varchar(100))
TRUNCATE TABLE “+ @[User::TableName] +”
The image below shows the navigation to the expression window on the “Execute SQL” Task:
Click OK and return to the SSIS Control Flow designer.
3. Importing the Data into the Table using Data Flow Task:
Our next task is to define the data flow task which imports the data from the text file into the physical table we already created. Drag and drop the Data Flow Task from the Control Flow Tool Box into the Control Flow Designer. Double clicking on it will automatically take us to the Data Flow Tab of the SSIS Designer.
3.1 Configuring the Source in the Data Flow task:
On the Data Flow tab, drag and drop the “Flat File Source” from the Data Flow Sources in the Data Flow Tool Box. Double click on the task which opens up the editor for the Flat File Source.
Create the new flat file source by clicking on “New” button on the Flat File Source Editor. The image below shows the configuration of the flat file source connection manager editor:
Specify the name for the Connection Manager and hit the “Browse” button to navigate to the file system location where the text file is stored and click OK. If the text file contains the column names in the first data row then check the box “Column Names in the first data row”. Since in the text file we are using has the column names in the first data row this option is being checked in this example. The image below shows the General Page of the Flat File Connection Manager Editor:
Next click on the Columns Page of the Editor. On this page the row and column delimiters in the text file should be specified. In our text file, the row delimiter is the new line and the column delimiter in the tab space. The image below shows the configuration of the columns page:
We can navigate to the preview page if we wish to preview the data in the flat file. The configuration of any Data Flow Source in the Data Flow design is almost the same. Click on OK to go back to the Data Flow Design Pane
3.2 Configuring the Destination in the Data Flow task:
The next step in our task is to configure the destination for the data to be imported. We can make use of the same connection manager what we created to create/truncate the table since we are importing the data into the same connection (Same Sever, Database).
On the Data Flow tab, drag and drop the “OLE DB Destination” from the Data Flow Sources in the Data Flow Tool Box. Drag the path (green arrow) from the Flat File source and join to the OLE DB Destination. Double click on the task which opens up the editor for the OLE DB Destination.
As discussed, we can make use of the same connection manager which we created earlier for the Execute Sql Task. If we wish to make a different connection we can also create a new connection manager, a process similar to creation of connection manger in the Execute Sql Task. Since we are making use of the variable to store the table name, specify the data access mode to be “Table Name of View Name variable”. And in the Variable property, specify the variable name with holds the destination table name in our example it is ‘TableName’ variable. The image below shows the configuration of the OLEDB Destination:
Navigate to the Mappings page on the OLE DB Destination Editor and map the source columns to the destination columns. If the column names of the source and destination match, by default SQL Server maps them or else we have to map them manually. Since the first two columns are auto generated and the remaining column names matched in our example SQL Server automatically mapped the source and destination columns. The image below shows the Source and Destination Column Mappings:
The image below shows the Data Flow task after the configuration of the source and the destination:
The image below shows the entire package for importing the text file into a physical table inside the database:
Execute the package by hitting F5. The executables will turn green indicating the successful execution of the package. We can go to the database and select the records from the Applications table to verify that the data is imported indeed. What if the table did not exist when we ran this package? Drop the Applications table from the AdventureWorks database and run the SSIS package again. This time is throws an error as shown in the below screen shot stating that the “Table Does Not Exist’.
Even though our package instructs the SSIS Runtime engine to create the table if it does not exist, the package still throws the error back stating that the table does not exist. This happens because the validations are made before the package is run. Since the table does not exist at the compile time the SSIS Engine throws this error.
One way to get around this problem is to delay the validation of the package. That means we are not validating the package until the run time. Each SSIS task (including the SSIS package) has its own properties. The properties window of a particular task can be viewed by right clicking on the task and click on the ‘Properties’ option in the menu. This will display the properties of the executable. So in our example bring up the properties window of the Data Flow Task. In the properties window, and change the DelayValidation property from False to True. The image below shows how to do this:
This will delay the validation of the Data Flow task until run time. This means that even though the table does not exist at compile time the SSIS Engine will not through the error since the package in now validated at compile time. Now execute the package by hitting F5. Now we see that the table is created and the data is populated in the Application table of AdventureWorks database. The image below shows the successful execution of the SSIS package. Note that we changed the name of the Executables to some meaningful names rather than leaving them at default name what SSIS gives.
That’s it folks! In this post we discussed the usage of the data flow task and also looked at the configuration of the Flat File Source and OLE DB destinations. We then discussed the import of the data in the text file into the physical table inside the database by creating the table if it did not exist.