DTS vs SSIS: A basic overview
Posted by decipherinfosys on March 11, 2008
In today’s blog post, we will give a basic introduction of SSIS (SQL Server Integration Services) and provide a comparison of some of those features with DTS (Data Transformation Services), it’s predecessor.
SSIS is an ETL tool provided by Microsoft for extracting the data from various sources, transforming the data according to the business requirements and loading the data into the destination. The source and the destination could be anything that you can talk to using a connection. One important thing to understand is that SQL Server Integration Services (SSIS) is a component of Microsoft SQL Server 2005 and is not a separate install. Integration Services includes graphical tools and wizards for building and debugging packages, tasks for performing workflow functions such as FTP operations, for executing SQL statements, or for sending e-mail messages, data sources and destinations for extracting and loading data, transformations for cleaning, aggregating, merging, and copying data.
In SQL Server 2000, DTS packages are created in Enterprise Manager. The DTS packages are created in the DTS designer and you have the option of saving them in SQL Server or saving them to the filesystem (by specifying the location as “Structured Storage File”). In Integration Services, the packages are created using Business Intelligence Development Studio (BIDS). BIDS is nothing but another version of Microsoft Visual Studio. It is a modified version of Visual Studio. When we create Integration Services packages using BIDS, they are saved on the client machine (local machine). We have to explicitly deploy the Integration Services Packages onto the destination server.
The following steps show how you can launch BIDS:
The basic organization concept in DTS as well as SSIS is the concept of a package. A package is essentially a collection of SSIS objects which includes:
a) Connections to the data sources,
b) Data Flows,
c) Control Flows, and
d) Event Handlers
We will discuss these shortly. Let’s first look at the DTS Package Designer and BIDS to get an idea of the IDE layout.
Package Design Pane:
The striking difference when one sees both the designers is the way the designer is designed.
In DTS, the designer consists of a single pane. And to the right we see two options: the Connections and the Workflow tasks. The Connections are used to make the connections to a wide variety of data sources and destinations like Excel, Flat Files, Sql Server and Access essentially anything that we have an ODBC, OLEDB, Jet driver etc. connectivity to. The Work Flow tasks add the functionality to the package. The thick line or what we call the data transformation task between the source and the destination in the above example transforms the data and loads into the destination. One can write custom code for the transformations or simply use the inbuilt capabilities of the data transformation task.
In SSIS, the designer is split into 4 design panes:
a) Control Flow,
b) Data Flow,
c) Event Handlers, and
d) Package Explorer.
The control flow pane is used to design the workflow inside the package. There is also a great improvement in the number of tasks and additional functionality has been added to some of the existing tasks. The control flow is like a program flow. All the workflow tasks in the DTS come under the Control Flow tasks in SSIS except the Data Transformation Task.
The Data Transformation Task in DTS has been converted into Data Flow task. As the name suggests, the Data Flow task handles the flow of data. Microsoft has introduced a wide variety of data flow tasks in SSIS for building an entire ETL. All the data transformation from the source to the destination takes place in this Data Flow task. Examples of Data Flow Task are Multicast, OLEDB Command, Sort transformation, Pivot transformation, Unpivot transformation etc.
Event Handlers is something new to SSIS. Event handlers respond to various events raised in the package like on error, on warning, on pre execute and on post execute. When an event is raised then the control goes to the Event Handler Pane, and the package does whatever is instructed in the Event Handler Pane. Again Event Handlers Contain the Control Flow Tasks because Control Flow determines the workflow of the package.
Package Explorer is also a new pane in SSIS. It gives the overall view of the package.
The image shown above shows the package explorer. As said earlier, the package explorer gives the over all gist of the entire package like Executables, Precedence Constraints, Connection Managers, Event Handlers and the Variables. Precedence Constraints and Connection Managers are described later in the post.
The Solution Explorer:
The Solution Explorer is also a new concept in SSIS. The solution explorer consists of the SSIS packages, connections and Data Source Views (DSV). The solution contains a series of projects. Each project contains a series of objects used in the projects. In DTS we were allowed to create only independent packages. There was no possibility to store the related packages of the module in a single place. The Solution Explorer allows us to do that. We can have multiple packages relation to one module in a single Solution.
Data Source View (DSV)
This is also new to SSIS. A data source view is a document that describes the schema of an underlying data source. A data source view provides a named, browseable, persisted selection of database objects that can be used to define sources, destinations, and lookup tables for SQL Server 2005 Integration Services (SSIS) tasks, transformations, data sources, and destinations. Example: In some ERPs like SAP or even in some databases, the columns might be named like A0012, A0013, AABE etc. Naming the columns this way is typically done by some designers for obfuscation (especially in packaged applications) but does not help the SSIS developers. As far as the SSIS developer is concerned it is just another column on which data transformations have to be applied. In this case we can create a data source view over this data source and can rename the columns in some meaningful way by not renaming the columns in the data source. In other words we are overlapping the table with another transparent table with change in the column names. So we see the column names of the transparent table and the data of the original table.
Connection Manager is also something new in SSIS. To understand this let us consider the following screen shot of DTS.
In this package we see two connections. The first one is to the database on the left and the other one is to the text file on the right. What happens when we move this package from one environment to another for example from development to QA? The connections to the databases and the file locations change. Therefore we have to open up each package and then each connection to configure it to point to the intended environment. Typically each DTS package might contain up to 5-10 connections. And when we are moving a large number of packages from one environment to another it becomes cumbersome. Please do note that this can easily be overcome by using dynamic properties and disconnected edits in DTS and we can either pick up the connection information from a file at run-time or even from a configuration table from a database but one needed to code for it or the configuration wasn’t that straight forward.
Now, let’s see how the connection manager helps in making this process easy in SSIS. Let us understand this with the help of an SSIS example shown in the image below:
In the above package we see two connections and one database connection. In SSIS, we have a centralized place where we can configure the connection to a particular data source or data destination called connection managers. All the connections inside the package make use of these connection managers to make the connection to the data sources and data destinations. As in DTS, SSIS also provides a wide variety of connection managers to make connections to a variety of sources and destinations. So when moving from one environment to another, we should open up the SSIS package and just make the change only in one place, which are the connection managers as opposed to multiple places as in DTS. This saves us time and also is less cumbersome. In a little while we will see how to dynamically alter the connections even without opening up the packages. This is done with Configurations.
Variables are a powerful piece of the SSIS architecture; they allow you to dynamically control the package at runtime. There are two types of variables: system and user. System variables are ones that are built into SSIS, whereas user variables are created by the SSIS developer. Variables can also have varying scope, with the default scope being the entire package. They can also be set to be in scope of a container, task, or event handler inside the package. The addition of scope to variables is the main differentiating factor between SSIS variables and DTS global variables.
SSIS provides us with a way of altering the connection information and also the variable values at run time. This is done by using the Configuration Files. The following screen shot shows us how to open up the configuration editor in SSIS.
Right click on the control flow pane and click on the Configurations menu. This will open up the configuration wizard. This is also something which did not exist in the DTS. We will explore more of package configurations and how to create configuration files in the upcoming posts.
All Microsoft SQL Server 2005 Integration Services (SSIS) container types—packages, the For Loop, For each Loop, and Sequence containers, and the task hosts that encapsulate each task—can be configured to use transactions. Integration Services provides three options for configuring transactions: NotSupported, Supported, and Required.
• Required indicates that the container starts a transaction, unless one is already started by its parent container. If a transaction already exists, the container joins the transaction
• Supported indicates that the container does not start a transaction, but joins any transaction started by its parent container. For example, if a package with four Execute SQL tasks starts a transaction and all four tasks use the Supported option, the database updates performed by the Execute SQL tasks are rolled back if any task fails. If the package does not start a transaction, the four Execute SQL tasks are not bound by a transaction, and no database updates except the ones performed by the failed task are rolled back.
• NotSupported indicates that the container does not start a transaction or join an existing transaction. A transaction started by a parent container does not affect child containers that have been configured to not support transactions. For example, if a package is configured to start a transaction and a For Loop container in the package uses the NotSupported option, none of the tasks in the For Loop can roll back if they fail.
In this post, we covered a brief introduction to SSIS, its important features and how SSIS differs from its predecessor: DTS. In future posts, we will discuss each task in detail and will also cover the best practices as well as go into the usage of SSIS in building up business intelligence applications like explaining how to handle slowly changing dimensions etc..
6 Responses to “DTS vs SSIS: A basic overview”
Sorry, the comment form is closed at this time.