Systems Engineering and RDBMS

Joomla Installation

Posted by decipherinfosys on November 14, 2009

For one of the projects at a client site, the client had asked us to look into building a customized CMS for them.  We are looking at a couple of different options that are available including using drupal, joomla, MOSS, ektron’s solution etc..  Last night, when doing the installation for the first time for Joomla, after having installed the WAMP server and then unzipping the joomla 1.5 download in the www folder, we made the mistake of just re-naming the default configuration.php-dist file that comes with the install to configuration.php.

As a result, we were always getting the error of deleting the installation folder before it would allow the install to move forward.  In one of the forums a senior joomla developer suggested to simply re-name that file and configure it through the GUI which will create the new configuration.php file.  And that worked.  Now, we are all set to play with this wonderful CMS and see it’s capabilities.  Being a MSFT technologies shop, we might want to move to MOSS however for this project since the client wants some of the features that are already available in some open source CMS, we might go with an open source solution for this project not to mention that the costs will be less also.  Needless to state that we will be outsourcing that work to our off-shore partner since we do not have the open source capabilities in-house.  But this will be a good learning experience and we are looking forward to it.  Here are some helpful links for those evaluating Joomla:

http://help.joomla.org/

http://forum.joomla.org/

http://docs.joomla.org/Beginners

http://downloads.joomlacode.org/docmanfileversion/1/7/4/17471/1.5_Installation_Manual_version_0.5.pdf

Posted in Open Source | Leave a Comment »

Minimally logged and the recovery model

Posted by decipherinfosys on November 9, 2009

Yesterday, while attending a conference call related to a production issue at a client site, we had to emphasize to the client DBA one of the concepts related to minimally logged operation.  It is well documented, however people tend to miss this information.

In SQL Server, there are several minimally logged operations which essentially means that only enough information  is logged to be able to recover the transaction and point in time recovery is not supported for those minimal logged operations.  Such operations are typically used for bulk data processing.  Key point to remember is that if you are running under the FULL recovery model, ALL the operations are fully logged including the operations that are supposed to do minimal logging.  You need to be in the simple or bulk logged recovery model in order to take advantage of the minimally logged operations.

As far as the inserts go, the minimal logged operations are: BULK INSERT, bcp, SELECT … INTO and in SQL Server 2008 – INSERT … SELECT.   But there are certain pre-requisites for these operations to be minimally logged which you can read more on from the BOL link below.  There are some index DDL operations and also inserts/updates to the text/ntext/image data type columns that can be minimally logged.

You can read more on this topic from BOL – here and here.  And if you are a subscriber of the SQL Server Magazine, there is an excellent article by MVP Itzik Ben Gan on minimally logged inserts – here.

Posted in SQL Server | Leave a Comment »

Outlook Attachment Reminder

Posted by decipherinfosys on October 23, 2009

This was sent by a friend and is a helpful tip that one can use in their day to day work life so we are posting it here.   He wrote:

Sometimes we forget to attach an attachment that we meant to send in the Outlook email. I found a small macro program that checks for the actual attachment file when it sees a keyword “attach” in your message body before sending the email. If doesn’t see the attachment file, it shows an alert similar to shown below:

Outlook

The script and the instructions are available from Mark Bird’s google page over here:

http://mark.bird.googlepages.com/home

A simple and helpful tip indeed.  Thanks Mark.

Posted in Technology | Leave a Comment »

listagg() – new analytic function in Oracle 11g R2

Posted by decipherinfosys on October 20, 2009

We all have used different methods in order to get a concatenated list of values from a column – pivoting it out, using SYS_CONNECT_BY_PATH function which was introduced in Oracle 10g R1, writing our own functions to do it etc..  In SQL Server also, using XML PATH, one can do it easily as illustrated in some of our posts before – one of them is here.

Oracle 11gR2 now has a new analytic function called listagg() for doing the list aggregation.  So, now all that we need to specify is the name of the column and the separator string that will separate the different values.  Here is the link to the 11gR2 documentation on this wonderful function along with some examples to help illustrate it’s usefulness:

http://download.oracle.com/docs/cd/E11882_01/server.112/e10592/functions087.htm

Posted in Oracle, SQL Server | Leave a Comment »

Splitting a number of rows into equal groups

Posted by decipherinfosys on October 16, 2009

At the client site yesterday, one of the developers asked this question: “I want to take the large set of the data that I have in my gigantic table and split it up into balanced non-overlapping sets.  I want the starting value in the set, the ending value and the number of records in that group.  I need this to do my data processing logic for starting parallel concurrent processing of those large data sets.  What is the best way to achieve this?

This client site uses both Oracle and SQL Server so the solution had to work with both the RDBMS.  Luckily enough, there is an analytic function in both which does this very easily for us.  The function that we are talking about is NTILE().  Let’s take this up with an example:

SQL Server:

Let’s say that I want to split all the objects in sys.objects into say 10 different non-overlapping sets.  And then as per the requirement, I need to list out the starting value in the set, the ending value and the total count of the records in the set.  Here is a simple code sample that will do that:

select
min(object_id) as Starting_Value
,    max(object_id) as Ending_Value
,    count(*)       as Total_Records
,    grp_nbr           as Group_Nbr
from
(
select object_id,
ntile(10) over (order by object_id) grp_nbr
from SYS.OBJECTS
) AS IV
group by grp_nbr;

What I get as the output in my test database is this:

Starting_Value Ending_Value Total_Records Group_Nbr
-------------- ------------ ------------- --------------------
4              125243501    123           1
128719511      366624349    123           2
367340373      571865104    123           3
574625090      767341798    122           4
768721791      971150505    122           5
971866529      1163151189   122           6
1165247206     1403152044   122           7
1406628054     1707153127   122           8
1709249144     1941581955   122           9
1943677972     2144726693   122           10

Now, you will notice that not all sets have equal records but they are pretty close.  That is by definition of the NTILE() function.  Here is what BOL states:

If the number of rows in a partition is not divisible by expression, this will cause groups of two sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example if the total number of rows is 53 and the number of buckets is five, the first three buckets will have 11 rows and the two remaining buckets will have 10 rows each. If on the other hand the total number of rows is divisible by the number of buckets, the rows will be distributed evenly among the buckets. For example, if the total number of rows is 50, and there are five buckets, each bucket will contain 10 rows.

Oracle:

And the same function is available in Oracle as well. And the same code shown above will work in Oracle as well.  Just replace SYS.OBJECTS with ALL_OBJECTS.

Pretty simple way of splitting a large data set into non-overlapping sets, isn’t it!

Resources:

  • SQL Server MSDN BOL entry for the NTILE() function – here.
  • Oracle documentation of the NTILE() function – here.

Posted in Oracle, SQL Server | Leave a Comment »

SSRS and Excel rendering limitations

Posted by decipherinfosys on October 13, 2009

Ran into an issue at a client site today in which one of the columns of the report was getting a concatenated string of the DC’s for the suppliers.  It turns out that for one of them, the string was over the limit that excel could handle and as a result when the end user was trying to export out the data to excel, it failed.  Here is what the error was:

SSRS_ERR_1

And when you clicked on Yes, you would get this:

SSRS_ERR_2

Needless to say, both are cryptic messages – after looking up on technet and MSDN and googling/binging :-) it for a while, came across the limitations of excel rendering which we thought that we would post here so that the readers can benefit from it as well.  The limitations are mentioned here:

http://msdn.microsoft.com/en-us/library/ms156418(SQL.90).aspx

The report design was changed as well since there was no need to have that concatenated list in this case but it is good to be aware of the limitations so that you know the next time what might be a possible cause for that weird error that excel throws at you.

Posted in SQL Server | 2 Comments »

SSIS – Data Flow Transformations

Posted by decipherinfosys on October 13, 2009

We have written before about different aspects of SQL Server Integration Services – you can read more on those blog posts here.   Until our previous articles we have been looking at mainly the Control Flow Tasks in the Sql Server Integration Services. Control Flow Tasks maintain the Flow of the logic in the SSIS. The main transformation logic in performed by the data flow task. One of the main uses of the Sql Server Integration Services (SSIS) is to perform the Extraction, Transformation, Data Cleansing and Loading (ETL or sometimes called as ETLM) of the data from various sources. As data from various sources might come in a different format, there is generally a requirement of cleaning up of the data before we load the data into our system. Data cleansing or rather transformation in Sql Server Integration Services (SSIS) is done using the Data Transformation Tasks. Today, in our post, we are going to look at a few of such SSIS Data Transformation Tasks.

One of the data flow transformations which is used for the data cleansing purposes is called the  Lookup Transform. We are going to explore the Lookup Transform in Sql Server 2005. There have been some enhancements in SQL Server 2008 which we are going to look into in future posts. The look up transform is very similar to that of the lookup tab in the data pump task in SQL Server 2000 DTS. This task is used to derive new columns based on the existing set of columns. This transform performs look up on an existing table based on the columns which are existing in the dataflow (incoming source). Let us explore more into this transform with the help of an example.

We will pick up a business use case scenario which is very common and we have faced it as well in our projects.  Let us consider a business scenario where we are getting an inbound file from a client and the file contains the health application records created on that particular day. Furthermore the file contains the City and the State Code but not the zip code. Our task is to look up the zip code by using the City and State Code Columns and populate our destination table. If the look up fails, then, we need to insert a -1 in the destination table.

For that we should have an already existing table in the database which stores the state code, city and the zip code. The following code block creates the table and populates it with the required data.

IF EXISTS(SELECT 1 FROM SYS.TABLES WHERE NAME ='lkpZipCode')
DROP TABLE lkpZipCode
GO
CREATE TABLE lkpZipCode
(
ZipCodeID INT IDENTITY(1,1) PRIMARY KEY
, CityCode VARCHAR(20)
, StateCode CHAR(2)
, ZipCode VARCHAR(20)
)

/*Using SQL Server 2008 functionality to do the inserts in one single statement*/

INSERT INTO lkpZipCode (CityCode, StateCode, ZipCode)
VALUES (‘Bellevue’,'WA’,98007),
(‘Seattle’,'WA’,98004),
(‘Atlanta’,'GA’,30339),
(‘Alpharetta’,'GA’,30005),
(‘Irvington’,'AL’,36544),
(‘Jacksonville’,'FL’,36265),
(‘Austin’,'TX’,73301),
(‘Houston’,'TX’,77001)
GO

Let us now create a table to hold the incoming values. The following piece of code creates the destination table.

IF EXISTS(SELECT 1 FROM SYS.TABLES WHERE NAME ='Customers')
DROP TABLE Customers
GO

CREATE TABLE dbo.Customers
(
CustomerID     INT IDENTITY(1,1) PRIMARY KEY
, CustomerFirstName Varchar(20) NOT NULL
, CustomerLastName  Varchar(20) NOT NULL
, City        Varchar(20)   NOT NULL
, [State]        Varchar(2)   NOT NULL
, ZipCode        Varchar(20)          NOT NULL
)

Create a text file (Customers.txt) and populate it with some sample data.

Open up the Business Intelligence Development Studio (BIDS) and create a new SSIS project.   If you are new to SSIS, we would recommended to read our previous articles about creating and running the SSIS Packages using BIDS.

Package Creation:

Since we are dealing with the flow of data in this scenario, we should use the data flow tasks to transform the data according to our needs.  Drag and drop the Data Flow task from the Control Flow task onto the Control Flow Pane of the BIDS.   Double click on the Data Flow Task which takes us to the data flow pane.   Since our source is a text file, drag and drop the Flat File Source from the data flow task on to the data flow pane.   Configure the Flat File Source to use the flat file we created earlier.

Configuration of the Lookup Transform:

Since we have to derive the zip code based on the State and City columns in the text file, by looking up the zip code in our look up table, we have to use the look up transform.   Drag and drop the lookup transform from the Data Flow Task on to the Data Flow pane.   Double click on the Transform to edit it.   The following image shows the Reference Table Tab of the Transform:

SSIS_DFT_1

This shows the reference table tab of the lookup transform.   In this tab, we can configure the table which we will use to look up our zip code based on the City and the State columns from the incoming file.   Click on the NEW button beside the OLEDB Connection Manager to create a connection to the database in which our table is located. Provide the server name and the database name in the relevant fields and click OK.   The image below shows the creation of the connection to the database:

SSIS_DFT_2

We can either specify a table or a sql query to generate the result set against which we will do a lookup using the City and State columns from the incoming file.   In our case we are using the table to do a look up.   If needed, we can specify a sql query by clicking on the radio button against the ‘Use Results of an Sql Query’ and specify the sql query.   For our scenario, drop down the ‘Use Table or View’ and select the table lkpZipCode.   The image shown below depicts the same:

SSIS_DFT_3

In the columns tab, we configure the columns on which we do a look up on the source columns and the destination columns.   The image below shows the configuration of columns tab in the Lookup Table:

SSIS_DFT_4

Drag the City column from the Source Available columns and drop it on the CityCode column of the destination.   Repeat the same with the State column in the source and the StateCode in the destination.   Since we are looking up the ZipCode column in the lookup table, mention the same in the Lookup Column of the editor.   Since this is a new column added to the data flow, select the ‘Add as new column’ in the lookup operation drop down.   Give the output alias as ZipCode.   This finishes the configuration of the Lookup Transformation.

Lookup Transform Outputs

The lookup Transform has two outputs:  Lookup Match output and the Error Output.   If the Lookup is successful then those rows are sent out to the Lookup Match output.  If the lookup fails to find any records, then those rows are sent to the Error Output.

Derived Column Transform:

As per our business scenario, if the lookup fails, then we have to insert a -1 into the look up table.   For that we have to make use of the Derived Column Transform.   A Derived Column Transform derives a new column and adds it to the data flow based on either the existing columns in the data flow or based on the expressions.   Drag and drop the Derived Column Transform on to the data flow from the data flow task and double click on it to edit the properties.   The image below shows the Transformation editor:

SSIS_DFT_5

In the derived column name provide the name of the column we intend to derive. In this case, since we are intending to insert a -1 zip code if the look up fails, type the name of the derived column as ZipCode and the Expression as -1.   In the derived column, select the ‘add as new column’ since we are adding a new column to the dataflow.  Drag the error output (red) arrow and join the Derived column Transform.   This opens up the editor as shown below:

SSIS_DFT_6

Even if the lookup fails, since we should not error the rows out, select the Redirect Row option in the Error column of the Lookup Output as shown in the image above and click OK.   This will redirect the rows on which the lookup failed to the Derived Column Transform, which adds the column ZipCode with a value of -1 to the data flow for the failed lookup records.

UNION ALL Transform

Since we have to combine the result set of both the successful lookups and un-successful lookup, we will use the UNION ALL Transform.   Drag and drop the union all transform from the data flow task on the data flow.   Drag the green arrow (look up match) arrow from the lookup transform and connect it to one of the input of the union all transform.

OLEDB Destination

As a final step, we have to load the cleansed records into our destination table.   To do that drag and drop the OLEDB Destination from the Data Flow tasks on to the data flow.   Configure the data flow to insert the data set from the UNION ALL transform into the destination table.   This transform has been dealt in detail in our previous posts. The image below shows the screen shot of the entire SSIS package:

SSIS_DFT_7

To summarize, in this blog post, we discussed the use of SSIS to cleanse/transform the data from various sources.  We understood the importance of the lookup transform and dealt with the configuration of the lookup transform in detail.   We also explored other transforms like Derived Column Transforms and UNION ALL transform which are also very handy transforms in the data flow.   In some of the upcoming posts, we will look at some of the enhancements in SQL Server 2008 and will also cover other business use case scenarios.

Resources:

  • Lookup Transform – MSDN article – here.
  • MVP Stacia Misner’s post on Lookup Transform and Caching – here.
  • MVP Jorg Klein’s post on the case sensitive nature of the lookup transform – here.
  • Arshad’s article on sqlserverperformance.com – it discusses the SSIS enhancements in SQL Server 2008 and has a series of posts on the topic – here.
  • Brian Knight’s books on SSIS – these are the best resource for anyone new or experienced in SSIS.  There are other authors as well – too many books & authors to list here – you can see all of them from this link at amazon – here.

Posted in SQL Server | 2 Comments »

Using schema.ini when reading a text file with openrowset()

Posted by decipherinfosys on October 6, 2009

At a client site, their existing code was using openrowset() to read and upload the text files (this process was later on changed to use a SSIS package to have a more robust process).  And a couple of days back, that nightly process failed because the demiliter was present as part of the column string itself.   The problem was that as part of the openrowset() command, there was no way to specify the delimiter and it relied on the default delimiters.  Here is a sample of how the command looked like:

declare @file varchar(100), @cmd varchar(1000)
select @file = 'dataupload_' + convert(varchar(10), getdate(), 112) + '.txt'

select @cmd =
'select * from OpenRowset(''MSDASQL'',''Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=Z:\Inbound_Feed;'',''select * from '+@file+''') as X'
print @cmd

The output was:

select * from OpenRowset('MSDASQL','Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=Z:\Inbound_Feed;','select * from dataupload_20091006.txt') as X

In order to be able to specify a delimiter when reading a text file like this, one needs to take the following steps:

1) Create a file called Schema.ini file.  You can read more about the contents of this file from this msdn post – here.

For our purposes, this file needed to have these enteries:

[dataupload_20091006.txt]
ColNameHeader=True
Format=Delimited(|)

The first entry is the name of the file, the second specifies that the first row has column names and the third entry specifies the custom delimiter.

2) This file needs to be placed in the same folder where the text file is going to be read from.

Once you do that, the upload will go through fine.  As you can see from above, it is a cumbersome process (can be automated through scripts though) and that is why we converted this into a SSIS package with all the bells and whistles and it works like a charm. But wanted to put it out there in case our readers face this issue.

Resources:

  • Schema.ini documentation – here.
  • Openrowset() documentation – here.

Posted in SQL Server | Leave a Comment »

Extracting large XML data using bcp

Posted by decipherinfosys on October 4, 2009

Ran into an issue today at the client site.  There was a stored procedure in the outbound interface of the client application which was responsible for collecting event based data, convert it into an XML and then the SSIS package would take that XML document, encrypt it using pgp encryption and upload it to the client’s partner companies.  Everything was working like a charm for the past couple of months till last night, out of memory exceptions started coming.

The reason was that the XML that was generated was close to 200MB that night – this was because of a large amount of event activity in the system which resulted into a generation of a very large XML.  When the SSIS package was trying to get that data out to encrypt it, it started running into the out of memory exceptions.  The fix that we made was pretty straight forward.  We used bcp to extract out the large XML document.  Here are the steps:

1) First, generate the format file.  A very simple way of doing this is to let the bcp command do it for you.  Here is a sample example:

bcp deciphertest.dbo.large_xml_tbl format nul -T -c -x -f “D:\format_files\large_xml_tbl.xml”

This command takes the table large_xml_tbl from the deciphertest database andd dumps the format file “large_xml_tbl.xml” on the D:\format_files folder.  This is just to illustrate the usage.  In real life, one would be using a query to extract the XML document out rather than the table itself.

2) Once the format file has been generated, you can then use the bcp command to extract out the large XML:

bcp deciphertest.dbo.large_xml_tbl out “D:\XML_Data\large_xml_tbl_1004090815.xml” -SDISprod -T -f “D:\format_files\Test.xml”

This took about 4-5 seconds for the 200MB file.  The encryption was done then and file was uploaded using the regular process.

Resources:

  • BCP XML format files post on mssqltips – here.
  • BCP documentation on MSDN – here.

Posted in SQL Server | 2 Comments »

NMON (Nigel’s monitor) Analyzer for AIX

Posted by decipherinfosys on August 6, 2009

Nmon is a free tool available for both AIX and Linux to monitor the performance of the AIX and Linux servers in terms of I/O, CPU Usage, top processes etc. It is bundled with AIX and is available free for Linux. It is widely used by AIX system administrators and performance tuning specialists.

Nmon has very small footprint when in use but it captures very useful information.  Normally it is located under /usr/local/bin directory.

┌─nmon────────U=Top-with-WLM─────Host=xxxxx────────Refresh=1 secs───18:00.26─┐
│ CPU-Utilisation-Small-View ──────────────────────────────────────────────────│
│                           0———-25———–50———-75———-100│
│CPU User%  Sys% Wait% Idle%|           |            |           |            |│
│  0   0.0   0.0   0.0 100.0|        >                                        |│
│  1   0.0   0.0   0.0 100.0|>                                                |│
│  2   4.0   1.0   0.0  95.0|U         >                                      |│
│  3   0.0   0.0   0.0 100.0|     >                                           |│
│  4   4.0   1.0   0.0  95.0|U         >                                      |│
│  5   0.0   0.0   0.0 100.0|>                                                |│
│Physical Averages          +———–|————|———–|————+│
│All   1.8   0.6   0.0  97.7|>                                                |│
│                           +———–|————|———–|————+│

Above is small fragmented screen shot of nmon at work. It can be scheduled as cronjob and output can be sent to file as well for later diagnostics.

With this brief introduction of nmon, we will talk about nmon analyzer, which is the topic of the blog. Again this is free tool available by IBM, which consolidates data from nmon output and represent it in a very user friendly graphs and charts. Basically it takes output files generated by nmon tool as an input and churns out various graphs/charts in a excel format which one can print, mail or even publish on the web.

There is only one caveat. IBM does not support the tool officially so one cannot seek any help from IBM. Here is the link from which it can be downloaded for free. The tool will be downloaded as a zip file that contains excel file, sample input file and user documentation.

Resources:

  • Wikipedia entry – here.
  • Article – here.
  • Article – here.

Posted in Unix | Leave a Comment »