Systems Engineering and RDBMS

Archive for the ‘SQL Server’ Category

Minimally logged and the recovery model

Posted by decipherinfosys on November 9, 2009

Yesterday, while attending a conference call related to a production issue at a client site, we had to emphasize to the client DBA one of the concepts related to minimally logged operation.  It is well documented, however people tend to miss this information.

In SQL Server, there are several minimally logged operations which essentially means that only enough information  is logged to be able to recover the transaction and point in time recovery is not supported for those minimal logged operations.  Such operations are typically used for bulk data processing.  Key point to remember is that if you are running under the FULL recovery model, ALL the operations are fully logged including the operations that are supposed to do minimal logging.  You need to be in the simple or bulk logged recovery model in order to take advantage of the minimally logged operations.

As far as the inserts go, the minimal logged operations are: BULK INSERT, bcp, SELECT … INTO and in SQL Server 2008 – INSERT … SELECT.   But there are certain pre-requisites for these operations to be minimally logged which you can read more on from the BOL link below.  There are some index DDL operations and also inserts/updates to the text/ntext/image data type columns that can be minimally logged.

You can read more on this topic from BOL – here and here.  And if you are a subscriber of the SQL Server Magazine, there is an excellent article by MVP Itzik Ben Gan on minimally logged inserts – here.

Posted in SQL Server | Leave a Comment »

listagg() – new analytic function in Oracle 11g R2

Posted by decipherinfosys on October 20, 2009

We all have used different methods in order to get a concatenated list of values from a column – pivoting it out, using SYS_CONNECT_BY_PATH function which was introduced in Oracle 10g R1, writing our own functions to do it etc..  In SQL Server also, using XML PATH, one can do it easily as illustrated in some of our posts before – one of them is here.

Oracle 11gR2 now has a new analytic function called listagg() for doing the list aggregation.  So, now all that we need to specify is the name of the column and the separator string that will separate the different values.  Here is the link to the 11gR2 documentation on this wonderful function along with some examples to help illustrate it’s usefulness:

http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/functions087.htm

Posted in Oracle, SQL Server | Leave a Comment »

Splitting a number of rows into equal groups

Posted by decipherinfosys on October 16, 2009

At the client site yesterday, one of the developers asked this question: “I want to take the large set of the data that I have in my gigantic table and split it up into balanced non-overlapping sets.  I want the starting value in the set, the ending value and the number of records in that group.  I need this to do my data processing logic for starting parallel concurrent processing of those large data sets.  What is the best way to achieve this?

This client site uses both Oracle and SQL Server so the solution had to work with both the RDBMS.  Luckily enough, there is an analytic function in both which does this very easily for us.  The function that we are talking about is NTILE().  Let’s take this up with an example:

SQL Server:

Let’s say that I want to split all the objects in sys.objects into say 10 different non-overlapping sets.  And then as per the requirement, I need to list out the starting value in the set, the ending value and the total count of the records in the set.  Here is a simple code sample that will do that:

select
min(object_id) as Starting_Value
,    max(object_id) as Ending_Value
,    count(*)       as Total_Records
,    grp_nbr           as Group_Nbr
from
(
select object_id,
ntile(10) over (order by object_id) grp_nbr
from SYS.OBJECTS
) AS IV
group by grp_nbr;

What I get as the output in my test database is this:

Starting_Value Ending_Value Total_Records Group_Nbr
-------------- ------------ ------------- --------------------
4              125243501    123           1
128719511      366624349    123           2
367340373      571865104    123           3
574625090      767341798    122           4
768721791      971150505    122           5
971866529      1163151189   122           6
1165247206     1403152044   122           7
1406628054     1707153127   122           8
1709249144     1941581955   122           9
1943677972     2144726693   122           10

Now, you will notice that not all sets have equal records but they are pretty close.  That is by definition of the NTILE() function.  Here is what BOL states:

If the number of rows in a partition is not divisible by expression, this will cause groups of two sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example if the total number of rows is 53 and the number of buckets is five, the first three buckets will have 11 rows and the two remaining buckets will have 10 rows each. If on the other hand the total number of rows is divisible by the number of buckets, the rows will be distributed evenly among the buckets. For example, if the total number of rows is 50, and there are five buckets, each bucket will contain 10 rows.

Oracle:

And the same function is available in Oracle as well. And the same code shown above will work in Oracle as well.  Just replace SYS.OBJECTS with ALL_OBJECTS.

Pretty simple way of splitting a large data set into non-overlapping sets, isn’t it!

Resources:

  • SQL Server MSDN BOL entry for the NTILE() function – here.
  • Oracle documentation of the NTILE() function – here.

Posted in Oracle, SQL Server | Leave a Comment »

SSRS and Excel rendering limitations

Posted by decipherinfosys on October 13, 2009

Ran into an issue at a client site today in which one of the columns of the report was getting a concatenated string of the DC’s for the suppliers.  It turns out that for one of them, the string was over the limit that excel could handle and as a result when the end user was trying to export out the data to excel, it failed.  Here is what the error was:

SSRS_ERR_1

And when you clicked on Yes, you would get this:

SSRS_ERR_2

Needless to say, both are cryptic messages – after looking up on technet and MSDN and googling/binging :-) it for a while, came across the limitations of excel rendering which we thought that we would post here so that the readers can benefit from it as well.  The limitations are mentioned here:

http://msdn.microsoft.com/en-us/library/ms156418(SQL.90).aspx

The report design was changed as well since there was no need to have that concatenated list in this case but it is good to be aware of the limitations so that you know the next time what might be a possible cause for that weird error that excel throws at you.

Posted in SQL Server | 2 Comments »

SSIS – Data Flow Transformations

Posted by decipherinfosys on October 13, 2009

We have written before about different aspects of SQL Server Integration Services – you can read more on those blog posts here.   Until our previous articles we have been looking at mainly the Control Flow Tasks in the Sql Server Integration Services. Control Flow Tasks maintain the Flow of the logic in the SSIS. The main transformation logic in performed by the data flow task. One of the main uses of the Sql Server Integration Services (SSIS) is to perform the Extraction, Transformation, Data Cleansing and Loading (ETL or sometimes called as ETLM) of the data from various sources. As data from various sources might come in a different format, there is generally a requirement of cleaning up of the data before we load the data into our system. Data cleansing or rather transformation in Sql Server Integration Services (SSIS) is done using the Data Transformation Tasks. Today, in our post, we are going to look at a few of such SSIS Data Transformation Tasks.

One of the data flow transformations which is used for the data cleansing purposes is called the  Lookup Transform. We are going to explore the Lookup Transform in Sql Server 2005. There have been some enhancements in SQL Server 2008 which we are going to look into in future posts. The look up transform is very similar to that of the lookup tab in the data pump task in SQL Server 2000 DTS. This task is used to derive new columns based on the existing set of columns. This transform performs look up on an existing table based on the columns which are existing in the dataflow (incoming source). Let us explore more into this transform with the help of an example.

We will pick up a business use case scenario which is very common and we have faced it as well in our projects.  Let us consider a business scenario where we are getting an inbound file from a client and the file contains the health application records created on that particular day. Furthermore the file contains the City and the State Code but not the zip code. Our task is to look up the zip code by using the City and State Code Columns and populate our destination table. If the look up fails, then, we need to insert a -1 in the destination table.

For that we should have an already existing table in the database which stores the state code, city and the zip code. The following code block creates the table and populates it with the required data.

IF EXISTS(SELECT 1 FROM SYS.TABLES WHERE NAME ='lkpZipCode')
DROP TABLE lkpZipCode
GO
CREATE TABLE lkpZipCode
(
ZipCodeID INT IDENTITY(1,1) PRIMARY KEY
, CityCode VARCHAR(20)
, StateCode CHAR(2)
, ZipCode VARCHAR(20)
)

/*Using SQL Server 2008 functionality to do the inserts in one single statement*/

INSERT INTO lkpZipCode (CityCode, StateCode, ZipCode)
VALUES (‘Bellevue’,'WA’,98007),
(‘Seattle’,'WA’,98004),
(‘Atlanta’,'GA’,30339),
(‘Alpharetta’,'GA’,30005),
(‘Irvington’,'AL’,36544),
(‘Jacksonville’,'FL’,36265),
(‘Austin’,'TX’,73301),
(‘Houston’,'TX’,77001)
GO

Let us now create a table to hold the incoming values. The following piece of code creates the destination table.

IF EXISTS(SELECT 1 FROM SYS.TABLES WHERE NAME ='Customers')
DROP TABLE Customers
GO

CREATE TABLE dbo.Customers
(
CustomerID     INT IDENTITY(1,1) PRIMARY KEY
, CustomerFirstName Varchar(20) NOT NULL
, CustomerLastName  Varchar(20) NOT NULL
, City        Varchar(20)   NOT NULL
, [State]        Varchar(2)   NOT NULL
, ZipCode        Varchar(20)          NOT NULL
)

Create a text file (Customers.txt) and populate it with some sample data.

Open up the Business Intelligence Development Studio (BIDS) and create a new SSIS project.   If you are new to SSIS, we would recommended to read our previous articles about creating and running the SSIS Packages using BIDS.

Package Creation:

Since we are dealing with the flow of data in this scenario, we should use the data flow tasks to transform the data according to our needs.  Drag and drop the Data Flow task from the Control Flow task onto the Control Flow Pane of the BIDS.   Double click on the Data Flow Task which takes us to the data flow pane.   Since our source is a text file, drag and drop the Flat File Source from the data flow task on to the data flow pane.   Configure the Flat File Source to use the flat file we created earlier.

Configuration of the Lookup Transform:

Since we have to derive the zip code based on the State and City columns in the text file, by looking up the zip code in our look up table, we have to use the look up transform.   Drag and drop the lookup transform from the Data Flow Task on to the Data Flow pane.   Double click on the Transform to edit it.   The following image shows the Reference Table Tab of the Transform:

SSIS_DFT_1

This shows the reference table tab of the lookup transform.   In this tab, we can configure the table which we will use to look up our zip code based on the City and the State columns from the incoming file.   Click on the NEW button beside the OLEDB Connection Manager to create a connection to the database in which our table is located. Provide the server name and the database name in the relevant fields and click OK.   The image below shows the creation of the connection to the database:

SSIS_DFT_2

We can either specify a table or a sql query to generate the result set against which we will do a lookup using the City and State columns from the incoming file.   In our case we are using the table to do a look up.   If needed, we can specify a sql query by clicking on the radio button against the ‘Use Results of an Sql Query’ and specify the sql query.   For our scenario, drop down the ‘Use Table or View’ and select the table lkpZipCode.   The image shown below depicts the same:

SSIS_DFT_3

In the columns tab, we configure the columns on which we do a look up on the source columns and the destination columns.   The image below shows the configuration of columns tab in the Lookup Table:

SSIS_DFT_4

Drag the City column from the Source Available columns and drop it on the CityCode column of the destination.   Repeat the same with the State column in the source and the StateCode in the destination.   Since we are looking up the ZipCode column in the lookup table, mention the same in the Lookup Column of the editor.   Since this is a new column added to the data flow, select the ‘Add as new column’ in the lookup operation drop down.   Give the output alias as ZipCode.   This finishes the configuration of the Lookup Transformation.

Lookup Transform Outputs

The lookup Transform has two outputs:  Lookup Match output and the Error Output.   If the Lookup is successful then those rows are sent out to the Lookup Match output.  If the lookup fails to find any records, then those rows are sent to the Error Output.

Derived Column Transform:

As per our business scenario, if the lookup fails, then we have to insert a -1 into the look up table.   For that we have to make use of the Derived Column Transform.   A Derived Column Transform derives a new column and adds it to the data flow based on either the existing columns in the data flow or based on the expressions.   Drag and drop the Derived Column Transform on to the data flow from the data flow task and double click on it to edit the properties.   The image below shows the Transformation editor:

SSIS_DFT_5

In the derived column name provide the name of the column we intend to derive. In this case, since we are intending to insert a -1 zip code if the look up fails, type the name of the derived column as ZipCode and the Expression as -1.   In the derived column, select the ‘add as new column’ since we are adding a new column to the dataflow.  Drag the error output (red) arrow and join the Derived column Transform.   This opens up the editor as shown below:

SSIS_DFT_6

Even if the lookup fails, since we should not error the rows out, select the Redirect Row option in the Error column of the Lookup Output as shown in the image above and click OK.   This will redirect the rows on which the lookup failed to the Derived Column Transform, which adds the column ZipCode with a value of -1 to the data flow for the failed lookup records.

UNION ALL Transform

Since we have to combine the result set of both the successful lookups and un-successful lookup, we will use the UNION ALL Transform.   Drag and drop the union all transform from the data flow task on the data flow.   Drag the green arrow (look up match) arrow from the lookup transform and connect it to one of the input of the union all transform.

OLEDB Destination

As a final step, we have to load the cleansed records into our destination table.   To do that drag and drop the OLEDB Destination from the Data Flow tasks on to the data flow.   Configure the data flow to insert the data set from the UNION ALL transform into the destination table.   This transform has been dealt in detail in our previous posts. The image below shows the screen shot of the entire SSIS package:

SSIS_DFT_7

To summarize, in this blog post, we discussed the use of SSIS to cleanse/transform the data from various sources.  We understood the importance of the lookup transform and dealt with the configuration of the lookup transform in detail.   We also explored other transforms like Derived Column Transforms and UNION ALL transform which are also very handy transforms in the data flow.   In some of the upcoming posts, we will look at some of the enhancements in SQL Server 2008 and will also cover other business use case scenarios.

Resources:

  • Lookup Transform – MSDN article – here.
  • MVP Stacia Misner’s post on Lookup Transform and Caching – here.
  • MVP Jorg Klein’s post on the case sensitive nature of the lookup transform – here.
  • Arshad’s article on sqlserverperformance.com – it discusses the SSIS enhancements in SQL Server 2008 and has a series of posts on the topic – here.
  • Brian Knight’s books on SSIS – these are the best resource for anyone new or experienced in SSIS.  There are other authors as well – too many books & authors to list here – you can see all of them from this link at amazon – here.

Posted in SQL Server | 2 Comments »

Using schema.ini when reading a text file with openrowset()

Posted by decipherinfosys on October 6, 2009

At a client site, their existing code was using openrowset() to read and upload the text files (this process was later on changed to use a SSIS package to have a more robust process).  And a couple of days back, that nightly process failed because the demiliter was present as part of the column string itself.   The problem was that as part of the openrowset() command, there was no way to specify the delimiter and it relied on the default delimiters.  Here is a sample of how the command looked like:

declare @file varchar(100), @cmd varchar(1000)
select @file = 'dataupload_' + convert(varchar(10), getdate(), 112) + '.txt'

select @cmd =
'select * from OpenRowset(''MSDASQL'',''Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=Z:\Inbound_Feed;'',''select * from '+@file+''') as X'
print @cmd

The output was:

select * from OpenRowset('MSDASQL','Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=Z:\Inbound_Feed;','select * from dataupload_20091006.txt') as X

In order to be able to specify a delimiter when reading a text file like this, one needs to take the following steps:

1) Create a file called Schema.ini file.  You can read more about the contents of this file from this msdn post – here.

For our purposes, this file needed to have these enteries:

[dataupload_20091006.txt]
ColNameHeader=True
Format=Delimited(|)

The first entry is the name of the file, the second specifies that the first row has column names and the third entry specifies the custom delimiter.

2) This file needs to be placed in the same folder where the text file is going to be read from.

Once you do that, the upload will go through fine.  As you can see from above, it is a cumbersome process (can be automated through scripts though) and that is why we converted this into a SSIS package with all the bells and whistles and it works like a charm. But wanted to put it out there in case our readers face this issue.

Resources:

  • Schema.ini documentation – here.
  • Openrowset() documentation – here.

Posted in SQL Server | Leave a Comment »

Extracting large XML data using bcp

Posted by decipherinfosys on October 4, 2009

Ran into an issue today at the client site.  There was a stored procedure in the outbound interface of the client application which was responsible for collecting event based data, convert it into an XML and then the SSIS package would take that XML document, encrypt it using pgp encryption and upload it to the client’s partner companies.  Everything was working like a charm for the past couple of months till last night, out of memory exceptions started coming.

The reason was that the XML that was generated was close to 200MB that night – this was because of a large amount of event activity in the system which resulted into a generation of a very large XML.  When the SSIS package was trying to get that data out to encrypt it, it started running into the out of memory exceptions.  The fix that we made was pretty straight forward.  We used bcp to extract out the large XML document.  Here are the steps:

1) First, generate the format file.  A very simple way of doing this is to let the bcp command do it for you.  Here is a sample example:

bcp deciphertest.dbo.large_xml_tbl format nul -T -c -x -f “D:\format_files\large_xml_tbl.xml”

This command takes the table large_xml_tbl from the deciphertest database andd dumps the format file “large_xml_tbl.xml” on the D:\format_files folder.  This is just to illustrate the usage.  In real life, one would be using a query to extract the XML document out rather than the table itself.

2) Once the format file has been generated, you can then use the bcp command to extract out the large XML:

bcp deciphertest.dbo.large_xml_tbl out “D:\XML_Data\large_xml_tbl_1004090815.xml” -SDISprod -T -f “D:\format_files\Test.xml”

This took about 4-5 seconds for the 200MB file.  The encryption was done then and file was uploaded using the regular process.

Resources:

  • BCP XML format files post on mssqltips – here.
  • BCP documentation on MSDN – here.

Posted in SQL Server | 2 Comments »

Number of Files for Tempdb

Posted by decipherinfosys on July 16, 2009

For those who have worked in SQL Server for some time, know that one of the best practices that is recommended for tempdb is to have multiple data files – 1 data file per CPU as seen by SQL Server since it creates a logical scheduler for each CPU that it can access.  You can read more on this post by Robert Dorr @ the CSS SQL Server Engineers blog post.

Posted in SQL Server | Leave a Comment »

CPU usage reaching 100% with SQL Server 2005-SP3

Posted by decipherinfosys on July 15, 2009

Ran into an issue at a client site last week where the CPU on the DB Server was reaching closer to 100%.  The same application had been working fine for the past several months.  The only difference was the application of SP3 for SQL Server 2005.  So, searched the KBs for any known issue and there happens to be a bug for which a cumulative update is available.  You can read more in this KB article.

Posted in SQL Server | Leave a Comment »

sys.dm_db_persisted_sku_features

Posted by decipherinfosys on July 11, 2009

SQL Server 2008 has a new DMV which you might need if you run into an issue similar to what we ran into last week.  At one of our client sites, they were using CDC and data compression features in SQL Server 2008.  When they tried to restore the database in their development environment, the restore operation failed.  The reason is that for certain features in SQL Server 2008 which are only supported in the Enterprise Edition, if you are using those features in your environment and you try to restore a backup of such a database on an instance that does not support those features, you will not be able to do so.  This is written in BOL as well:

Some features of the SQL Server Database Engine change the way that Database Engine stores information in the database files. These features are restricted to specific editions of SQL Server. A database that contains these features cannot be moved to an edition of SQL Server that does not support them Use the sys.dm_db_persisted_sku_features dynamic management view to list all edition-specific features that are enabled in the current database.

There are four features that fall into this category: CDC (Change Data Capture), Data Compression, TDE (Transparent Data Encryption) and partitioning.  It actually was the same issue in SQL Server 2005 as well when you would have used the partitioning feature.  In such scenarios, you can use the above mentioned DMV to see whether you are using any of those four features.

Posted in SQL Server | Leave a Comment »