Systems Engineering and RDBMS

Archive for June, 2008

Getting the first and the last record together

Posted by decipherinfosys on June 27, 2008

While fixing a performance issue with a customer report yesterday, a unique requirement came up. As the items in the warehouse move through the aisles, their datetime timestamp values get changed. In one of the sub-reports, the requirement was to display both the latest record as well as the oldest record in the system, i.e. both the first and the last record based on the timestamp column. In addition, if the location for an item had not been decided yet, then those items needed to be ranked lower than the items which already had a location assigned to them.

There are a couple of ways to resolve this kind of a scenario using SQL. We will present the solution in this post using T-SQL syntax, though the same applies (with minor changes) to Oracle and DB2 LUW as well. Another thing to note is that Oracle already has the analytic functions first_value() and last_value(), which can be used with window functions to easily get these results.

Even though the actual query had a lot of table joins, for the sake of simplicity (to help explain the concept and to protect the IP of the client) we will take up an example using a table variable:
set nocount on
go
declare @inventory table (item_id int not null, location_id int null, create_date_time datetime not null)
insert into @inventory values (1, null, cast('06/27/2007 12:00:00' as datetime))
insert into @inventory values (2, 10, cast('06/27/2007 12:15:00' as datetime))
insert into @inventory values (3, 20, cast('06/26/2007' as datetime))
insert into @inventory values (4, 30, cast('06/21/2007' as datetime))
insert into @inventory values (5, null, cast('06/27/2007 1:00:00' as datetime))
insert into @inventory values (6, 50, cast('06/23/2007' as datetime))
insert into @inventory values (7, 70, cast('06/24/2007' as datetime))
insert into @inventory values (8, 80, cast('06/25/2007' as datetime))
insert into @inventory values (9, 40, cast('06/27/2007 3:00:00' as datetime))
insert into @inventory values (10, null, cast('06/27/2007 12:30:00' as datetime))

select '***ALL Records***', * from @inventory

                  item_id     location_id create_date_time
----------------- ----------- ----------- -----------------------
***ALL Records*** 1           NULL        2007-06-27 12:00:00.000
***ALL Records*** 2           10          2007-06-27 12:15:00.000
***ALL Records*** 3           20          2007-06-26 00:00:00.000
***ALL Records*** 4           30          2007-06-21 00:00:00.000
***ALL Records*** 5           NULL        2007-06-27 01:00:00.000
***ALL Records*** 6           50          2007-06-23 00:00:00.000
***ALL Records*** 7           70          2007-06-24 00:00:00.000
***ALL Records*** 8           80          2007-06-25 00:00:00.000
***ALL Records*** 9           40          2007-06-27 03:00:00.000
***ALL Records*** 10          NULL        2007-06-27 12:30:00.000

select top 1 '***LAST RECORD***' as Last_Record, * from @inventory as I order by
case when location_id is null then 1 else 0 end ASC
,create_date_time desc

Last_Record       item_id     location_id create_date_time
----------------- ----------- ----------- -----------------------
***LAST RECORD*** 2           10          2007-06-27 12:15:00.000

select top 1 '***FIRST RECORD***' as First_Record, * from @inventory as I order by
case when location_id is null then 1 else 0 end ASC
,create_date_time asc

First_Record       item_id     location_id create_date_time
------------------ ----------- ----------- -----------------------
***FIRST RECORD*** 4           30          2007-06-21 00:00:00.000

If you look at the output above, you can see that the last record picked up by the logic is Item #2, even though Item #10 has the latest create_date_time value. The reason is the requirement that an item which has not been assigned a location yet should rank lower. The above two SQL statements simply show how to get the first and the last record individually. In the report, however, we needed to get these two together, and as you know, if one uses the UNION or UNION ALL clause between two statements, only one ORDER BY clause can be used and it applies to the whole set. So, it is not as straightforward as just doing a UNION/UNION ALL operation between the two sets and applying an ORDER BY.

One of the solutions is to use derived tables:

select LR.* from
(select top 1 '***LAST RECORD***' as Last_Record, * from @inventory as I order by
case when location_id is null then 1 else 0 end ASC
,create_date_time desc) as LR
UNION ALL
select FR.* from
(select top 1 '***FIRST RECORD***' as First_Record, * from @inventory as I order by
case when location_id is null then 1 else 0 end ASC
,create_date_time asc) as FR

Last_Record        item_id     location_id create_date_time
------------------ ----------- ----------- -----------------------
***LAST RECORD***  2           10          2007-06-27 12:15:00.000
***FIRST RECORD*** 4           30          2007-06-21 00:00:00.000

Another solution is to use a CTE (Common Table Expression). Note that an ORDER BY is not allowed directly inside the branches of a UNION ALL, so the TOP 1 queries still need to sit in derived tables inside the CTE:

with results as
(
select LR.* from
(select top 1 '***LAST RECORD***' as Last_Record, * from @inventory as I order by
case when location_id is null then 1 else 0 end ASC
,create_date_time desc) as LR
UNION ALL
select FR.* from
(select top 1 '***FIRST RECORD***' as First_Record, * from @inventory as I order by
case when location_id is null then 1 else 0 end ASC
,create_date_time asc) as FR
)
select * from results;

Last_Record        item_id     location_id create_date_time
------------------ ----------- ----------- -----------------------
***LAST RECORD***  2           10          2007-06-27 12:15:00.000
***FIRST RECORD*** 4           30          2007-06-21 00:00:00.000

Yet another solution is to use the row_number() analytic function, for example:

select item_id, location_id, create_date_time
from
(
select
*
, row_number() over (order by case when location_id is null then 1 else 0 end ASC
,create_date_time desc) as RN_1
, row_number() over (order by case when location_id is null then 1 else 0 end ASC
,create_date_time asc) as RN_2
from @inventory) as IV
where RN_1 = 1 or RN_2 = 1

item_id     location_id create_date_time
----------- ----------- -----------------------
4           30          2007-06-21 00:00:00.000
2           10          2007-06-27 12:15:00.000

There are other solutions to this problem as well. One needs to test these out in one's own scenario and make sure that the filter criteria are meaningful and the execution plans are good. The next challenge in this report was that the project manager wanted these records listed side by side on the same row. That is pretty simple to do as well, using either pivoting or a simple join between derived tables with a match on the row_number() value, as sketched below. If you have questions, we will be more than happy to post a fuller answer as well.
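As an example, here is a minimal sketch of the join approach (the column aliases are just illustrative); matching the two derived tables on the row_number() value puts the first and the last record on the same row:

select F.item_id as first_item_id, F.create_date_time as first_create_date_time,
L.item_id as last_item_id, L.create_date_time as last_create_date_time
from
(select *, row_number() over (order by case when location_id is null then 1 else 0 end ASC
,create_date_time asc) as RN from @inventory) as F
inner join
(select *, row_number() over (order by case when location_id is null then 1 else 0 end ASC
,create_date_time desc) as RN from @inventory) as L
on F.RN = L.RN
where F.RN = 1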

Posted in DB2 LUW, Oracle, SQL Server | 2 Comments »

Performance Counters are missing

Posted by decipherinfosys on June 26, 2008

In yesterday's post, we discussed how to go about loading perfmon data into a database so that it can be queried using simple SQL scripts. In this post, we will cover the scenarios in which you see that a lot of your performance counters are missing. This question was asked by one of our readers and it is also a very common situation in many different shops. We have seen this happen with clustered servers or when counters get updated by a hot fix or a service pack.

If you look into the Windows Resource Kit, you will find a utility called CTRLIST.exe, which reports on all the objects and counters that are loaded on a particular server. There is also a GUI version of the utility available. We recommend either version because both let you look at the specifics of the DLLs related to each of those counters, which makes it easier to troubleshoot when a counter is missing or not working for some reason. You can download the Windows Resource Kit from here.

After reviewing the output from the GUI utility or the text file from the command line utility, you can then check the Windows Application and System logs to see if there are any warnings or errors related to the loading of the performance library. If nothing specific is noticed, we would recommend just unloading and reloading the counters, for example:

If the SQL Server counters are missing in action:

a) From the output from the GUI/text file from the command line utility, make a note of the sqlctr.ini file location.

b) Unload the SQL Server counters by executing:

Unlodctr mssqlserver

c) Reload the counters:

Lodctr "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlctr.ini"

If you are using a clustered instance, then you need to use the virtual server name in the unload and reload commands, and if you have a named instance, then you need to use MSSQL$InstanceName instead of mssqlserver; a quick sketch for a named instance follows. If that also does not work for you, look up this KB article from MSFT that talks about rebuilding the entire performance counter library. Please do all this on a test system first to become comfortable with the steps.
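For example, a minimal sketch for a hypothetical named instance called SQL2005 (the instance name and the .ini path are assumptions; substitute the values from your own install and from the CTRLIST output):

rem unload and reload the counters for the named instance SQL2005
unlodctr MSSQL$SQL2005
lodctr "C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlctr.ini"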

Posted in SQL Server, Windows | 1 Comment »

Querying Perfmon data

Posted by decipherinfosys on June 25, 2008

We have blogged before on topics related to perfmon (Performance Monitor); you can search for perfmon on this site to get to those articles.  In this post, we will cover how one can take that data, load it up in SQL Server and run simple SQL queries against it to make analysis even easier.  In the next post, we will look at an issue that was reported by one of our readers: some of the system counters were missing in her perfmon install and she was not able to figure out why that was the case.  We will share the troubleshooting tips that we provided to her in that post.

We will just run a simple perfmon trace and save the results to help demonstrate how one can go about querying that data.  Click on Start/Run, type perfmon and press OK.  It will bring up the Performance Monitor window for you.  You will see three counters by default: Pages/sec, Avg. Disk Queue Length and % Processor Time.  If you click on the + sign in the pane above, you can choose to add/remove more counters in this run.  If you are not familiar with any of the counters, you can also click on Explain to understand what that particular counter really does.

We have set up a perfmon trace called CPU_HOG as you can see from the image below:

And before we start the trace, let's make sure that we save this log as a csv file so that we can import it into SQL Server later on.  The default is the *.blg extension, which SQL Server won't recognize while importing this data into the database.  See the image below on how to go about changing the setting for the log file:

Now, let's right click on the trace and start capturing the data.  After running it for some time, we stop the trace.  We now have the file CPU_HOG_000001.csv, a csv file that can easily be loaded into SQL Server using the Export/Import Wizard.  One can even choose to write up an SSIS package to collect that information nightly from the lightweight trace(s) to help set up baselines for the servers and do reporting based on that data using simple SQL queries.

After you have stopped the trace, right click on the database in SQL Server Management Studio (SSMS) into which you want to load up this data and select Tasks/Import Data.  That will bring up the Export/Import Wizard and then you can follow these simple instructions of loading up the file:

We are selecting that csv file as the source above.  Note that you have to select Flat File as the data source, put " (the double quote character) in the text qualifier section and check the box which states that the column names are in the first data row.  Now, click on Next and select the destination:

After this, you can edit the mappings to dump this into a pre-defined table structure or even let the wizard create the table for you.  We would recommend creating your own table with meaningful column names; otherwise, if you let the wizard take the column names from the source, you will end up with column names in this format: "Full Server Name Counter Name".  Then you can simply follow the next steps and either save this as an SSIS package (refer to our SSIS posts on the blog) or just execute it to copy the data over to a table in the database that you had chosen above.  A sketch of such a pre-defined table is shown below.
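As an example, here is a minimal sketch of such a pre-defined table; the column names are purely illustrative (they match the hypothetical counters used in the query further below) and should be mapped to whatever counters you actually capture:

create table dbo.CPU_HOG_062508
(
sample_time datetime not null, /*the sample timestamp column from the csv*/
page_io_latch_waits int null,
mem_avail_bytes bigint null,
pct_processor_time float null
)
go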

Once you do that, you can then simply write and execute SQL queries against the data set, for example:

SELECT *
FROM dbo.CPU_HOG_062508
WHERE PAGE_IO_LATCH_WAITS < @PILW
AND MEM_AVAIL_BYTES < @MAB
AND ..

You can create your own stored procedure with different parameters and check for those values every night after the load is done; a minimal sketch of such a procedure is shown below.  This will help you prepare a baseline version of the performance metrics for your servers as well as help troubleshoot performance issues quickly.
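A minimal sketch of such a procedure, re-using the hypothetical table and column names from the example above (the thresholds and columns are assumptions to adjust to your own baselines):

create procedure dbo.check_perfmon_baseline
(
@PILW int,
@MAB bigint
)
as
begin
set nocount on
select *
from dbo.CPU_HOG_062508
where PAGE_IO_LATCH_WAITS < @PILW
and MEM_AVAIL_BYTES < @MAB
end
go

/*example nightly check: say, less than 500MB of available memory*/
exec dbo.check_perfmon_baseline @PILW = 100, @MAB = 524288000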

Posted in SQL Server | 1 Comment »

TOP N clause giving different results even with the same order by clause

Posted by decipherinfosys on June 23, 2008

One of our readers asked a question recently that we would like to share on the blog. The reader was using SQL Server and had a TOP N clause in a query with an ORDER BY on a non-unique column. The results of these two statements:

select top 10 * from x, y where x.col1 = y.col2 order by x.col3 asc

select top 20 * from x, y where x.col1 = y.col2 order by x.col3 asc

showed that in the results from the second query, the top 10 records were not always the same as the top 10 records returned by the first query. The explanation is that the TOP N clause is deterministic only when you are sorting on a unique column (or combination of columns). If you sort on a non-unique column, the TOP N query is non-deterministic: rows with the same col3 value can come back in any order.

So, there were two possible solutions:

a) Include a unique column in the sort after the column you actually want to sort on. That way, there is no change to your functionality.

b) If you still want to sort only on the non-unique column, then you can include the WITH TIES clause to make TOP N deterministic. What this does in the above query is that if you wanted the top 20 rows ordered by col3, it will also give you any additional rows that match the col3 value of the last row in that top 20. That is not always an acceptable solution, for obvious reasons. For the reader, solution (a) worked out well. Both fixes are sketched below.
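For example, sketched against the reader's query (pk_col stands in for whatever unique column table x actually has; it is an assumption here):

/*(a) add a unique column as the tie-breaker*/
select top 10 * from x, y where x.col1 = y.col2 order by x.col3 asc, x.pk_col asc

/*(b) or return the ties along with the top rows*/
select top 10 with ties * from x, y where x.col1 = y.col2 order by x.col3 asc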

Posted in SQL Server | Leave a Comment »

Analytic Functions and sorting on a constant value

Posted by decipherinfosys on June 21, 2008

We have covered analytic functions (ROW_NUMBER(), RANK(), DENSE_RANK() etc.) in our posts and whitepapers before. A requirement for these window functions is that you need to have an ORDER BY clause, and for obvious reasons: the values that are generated are based on a particular order of the column(s) (and if there is a PARTITION BY clause as well, then on that order within each partition).

Recently, while tuning a particular query at a client engagement, we saw that the client really did not care about the order by column and just wanted the row numbers in any order. In that case, there is really no need to pay the sort penalty of including an order by colx in the SQL code. Let's see whether avoiding it is even possible:

set nocount on
go
create table dbo.test (col1 int not null identity, colx nvarchar(128) not null) /*nvarchar(128) so that the names from sys.objects fit without truncation*/
alter table test add constraint pk_test primary key (col1)

declare @i int, @j int
select @i = 1, @j = 10
while (@i <= @j)
begin
insert into dbo.test (colx) select name from sys.objects
set @i = @i + 1
end

And now, let’s try to see the execution plan of this query:

select ROW_NUMBER() over (order by colx) as RN, * from dbo.test

StmtText
------------------------------------------------------------------------------------------
  |--Sequence Project(DEFINE:([Expr1003]=row_number))
       |--Segment
            |--Sort(ORDER BY:([DECIPHER_TEST].[dbo].[test].[colx] ASC))
                 |--Clustered Index Scan(OBJECT:([DECIPHER_TEST].[dbo].[test].[pk_test]))

You will see the sort operation above, as expected. One can argue that we can create an index to minimize the sort cost:

create index test_ind_1 on dbo.test (colx)
/*filegroup clause*/
go

Now, the execution plan looks like this:

StmtText
-----------------------------------------------------------------------------------------------
  |--Sequence Project(DEFINE:([Expr1003]=row_number))
       |--Segment
            |--Index Scan(OBJECT:([DECIPHER_TEST].[dbo].[test].[test_ind_1]), ORDERED FORWARD)

The cost of ordering is still there though, and this example oversimplifies the issue, since real-world queries have many tables joined together with filter conditions etc. So, why pay the price of sorting when we do not even need it? Remember that in this case the requirement was such that the client did not care about the order in which the row numbers were generated.

So, let’s see if we can avoid it by sorting on a constant value:

select ROW_NUMBER() over (order by 1) as RN, * from dbo.test

This time, we will get an error:

Msg 5308, Level 16, State 1, Line 1
Windowed functions do not support integer indices as ORDER BY clause expressions.

There is a workaround for this issue. Instead of doing an order by 1, we can do an order by (select 1), and that way the query is valid:

select ROW_NUMBER() over (order by (select 1)) as RN, * from dbo.test

And here is the execution plan for it:

Without the index:

StmtText
------------------------------------------------------------------------------------------
  |--Sequence Project(DEFINE:([Expr1005]=row_number))
       |--Segment
            |--Compute Scalar(DEFINE:([Expr1004]=(1)))
                 |--Clustered Index Scan(OBJECT:([DECIPHER_TEST].[dbo].[test].[pk_test]))

and with the index:

StmtText
-----------------------------------------------------------------------------------
  |--Sequence Project(DEFINE:([Expr1005]=row_number))
       |--Segment
            |--Compute Scalar(DEFINE:([Expr1004]=(1)))
                 |--Index Scan(OBJECT:([DECIPHER_TEST].[dbo].[test].[test_ind_1]))

And as you can see after comparing these execution plans to the earlier ones, the cost of sorting has been taken out. This was a very specific fix for a very specific query. Typically, one would want to generate the numbers based on a particular order of a column (or columns), but if you ever run into a situation like we did, the above solution will work for you and will ensure that you do not pay the price of sorting on a column unnecessarily.

Posted in SQL Server | 2 Comments »

Some more acronyms for you

Posted by decipherinfosys on June 21, 2008

Ahhh… acronyms. What would the IT world be without these wonderful acronyms? Yesterday, a good friend mentioned that he is working on a LAMP project. So, do you know what it stands for?

LAMP stands for:

L –> Linux

A –> Apache

M –> MySQL

P –> PHP

That’s the complete open source platform that he was using. Likewise, if you are using only a Windows based implementation, another acronym to be familiar with is WISA:

WISA stands for:

W –> Windows

I –> IIS

S –> SQL Server

A –> ASP.Net

Then, there are different permutations of these, like WISP or WASP or WIMP or WIMA (the letters stand for the same technologies as stated above).

Posted in Open Source, SQL Server, Windows | Leave a Comment »

Learning Oracle

Posted by decipherinfosys on June 20, 2008

One of our junior team members posed this question some time ago. He was hired as a SQL Server Developer/DBA and now wanted to expand his knowledge and learn Oracle and Unix as well. This is one of the common questions from many folks, even at client sites, so we thought it would be good to post on the blog what we consider to be a good way to learn Oracle. Here goes:

1) The starting point is either the Oracle documentation (the Oracle Database Concepts guide) or a good book like the ones from Tom Kyte. This is because if you are a SQL Server or a DB2 DBA, Oracle is quite different from what you are used to, so getting the concepts right is the first step in the process. Here are the links:

Oracle 10g Database Concepts Documentation, Oracle 11g Database Concepts Documentation, Tom Kyte’s book on Oracle Database and Architecture, Tom Kyte’s book on Effective Oracle by Design and Steven Feuerstein’s book on PL/SQL programming.

Also, download the Oracle 10g Express Edition from here and install it on your system since the only way to learn is to play with the product. 11g Express Edition is not available right now.

The download and install of the Express Edition are very straightforward, and the download is pretty small: only around 160MB.

2) Participate in forums. That is a great way to learn, since you will learn from others and others will learn from you. Since you are just starting in this area, chances are that someone else has already faced the issues you will be facing while going down the same path and can help you out. Be courteous and follow forum etiquette. Here are some good forums:

Oracle-l: This is an excellent list. If you take a look at the list, there are contributions from people like Jonathan Lewis, Cary Millsap, Pete Sharman, Niall Litchfield, Tim Gorman, Tom Kyte and many other respected Oracle professionals.

Oracle Users Co-operative FAQ: This is maintained by Jonathan Lewis and you will find a lot of good information in these FAQs.

OTN Discussion Forums: These are like the MSDN forums in the case of Microsoft. You might even get the actual product managers or the development leads or support analysts to directly help with your question(s).

Oracle Metalink: This has a wealth of information. When working on Oracle projects or even when helping colleagues with their Oracle issues, we use this very extensively. You will need to register for this one.

IOUG (Independent Oracle Users Group): This is not so good on the forums side of things but is excellent for networking, for conferences and for the technical content on the site.

3) Some other web sites:

Tom Kyte's site is one of the best: you will find answers to most of your questions over there, and if you do not, you can post your question, but search the site first to see whether it has already been answered before.

Ora FAQs: There is a wealth of information available in these FAQs, and there are lots of very good groups and categories.

Don Burleson's site: Read the newsletters and subscribe to them. A lot of very good tips and tricks for DBAs and database developers.

This list is by no means complete, but for a person starting in Oracle or trying to take his/her skills to the next level, this is all you need, plus a lot of practice. Keep at it and if we can help, let us know. If there are certain topics that you would like to learn, we would be more than happy to cover them on our blog or as a whitepaper. We have not started a full-fledged forum yet to answer questions because of time constraints, but we do plan to do that in the near future. So, keep those questions coming through e-mails to: info@decipherinfosys.com

Posted in Oracle | 2 Comments »

Common Problem – getting the first in a Group

Posted by decipherinfosys on June 18, 2008

This is one of the very common problems in the SQL world and also one that has many different solutions available. The problem description is this:

You have a table with, say, this structure (using SQL Server syntax):

set nocount on
go
declare @test table (store_nbr int, product_name nvarchar(20), product_date datetime, aisle_nbr int)
insert into @test values (1, 'A', '8-12-2004', 6)
insert into @test values (1, 'B', '7-10-2005', 2)
insert into @test values (1, 'C', '6-11-2006', 3)
insert into @test values (1, 'D', '11-10-2007', 4)
insert into @test values (2, 'E', '10-12-2004', 2)
insert into @test values (2, 'F', '5-9-2005', 4)

What you want to find is the record in each store that was introduced last; essentially, in each group (i.e. each store), you want to find the product that was introduced most recently. There are a lot of ways to solve this problem, and the solutions vary among RDBMS based on syntax only. One can use analytic functions, a derived table or a correlated sub-query to resolve this. Here is one way to solve this issue:

select * from @test A
where A.product_date = (select top 1 B.product_date
from @test as B
where A.store_nbr = B.store_nbr
order by product_date desc)

store_nbr   product_name         product_date            aisle_nbr
----------- -------------------- ----------------------- -----------
1           D                    2007-11-10 00:00:00.000 4
2           F                    2005-05-09 00:00:00.000 4

In this code, we are sorting based on the product date in descending order in the correlated sub-query and getting the equality match based on the TOP 1 record. We could also have used the ROW_NUMBER() function, for example:

select store_nbr, product_name, product_date, aisle_nbr
from
(select ROW_NUMBER() over (partition by store_nbr order by product_date desc) as rn, *
from @test A
) as IV
where IV.rn = 1

One can also choose to do a MAX operation in the correlated sub-query, for example:

select * from @test as A
where A.product_date = (select MAX(B.product_date)
from @test as B
where A.store_nbr = B.store_nbr)

And in SQL Server 2005, one can also use the CROSS APPLY operator; a sketch of that variant is shown below.  We will look at the execution plans and the performance characteristics of these different options in an upcoming post.
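Here is a minimal sketch of the CROSS APPLY variant (the derived table of distinct stores simply drives the outer side of the APPLY):

select S.store_nbr, CA.product_name, CA.product_date, CA.aisle_nbr
from (select distinct store_nbr from @test) as S
cross apply (select top 1 B.product_name, B.product_date, B.aisle_nbr
             from @test as B
             where B.store_nbr = S.store_nbr
             order by B.product_date desc) as CA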

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »

The ISNUMERIC() and LIKE clause gotchas

Posted by decipherinfosys on June 17, 2008

This is something that we have seen in a lot of code at many different places. People use the isnumeric() function of SQL Server without realizing some of the issues that this function does not address, and then when things do not work the way they expected, it becomes Microsoft's fault 🙂

Here are some examples to help illustrate the issues with isnumeric() and LIKE clause usage that one should be aware of:

IsNumeric():

set nocount on
go
if object_id('dbo.test') is not null
drop table test
go
create table test (col1 nvarchar(10))
insert into test values ('1')
insert into test values ('.')
insert into test values ('+')
insert into test values ('-')
insert into test values ('abcd')
insert into test values ('0001')
insert into test values (' ')
insert into test values ('0.25')
insert into test values ('$0.25')
insert into test values (CHAR(9)) /*Tab*/
insert into test values (CHAR(10)) /*Line Feed*/
go

select isnumeric(col1) as col1_numeric_fn, col1 from test

col1_numeric_fn col1
--------------- ----------
1               1
1               .
1               +
1               -
0               abcd
1               0001
0
1               0.25
1               $0.25
1
1

Any surprises for anyone here? Note that ".", "+", "-" and "$0.25" all return 1 from the function, as do the tab and line feed characters.

There are better ways of weeding out such values. If you do not have decimals stored within the string, then you can just do:

select case when col1 not like '%[^0-9]%' then 1 else 0 end as is_numeric, col1 from test

is_numeric  col1
----------- ----------
1           1
0           .
0           +
0           -
0           abcd
1           0001
0
0           0.25
0           $0.25
0
0

If you do have decimals stored in your string as well, then one can write up a simple UDF (User Defined Function) to take care of all the scenarios and use that instead; a minimal sketch of such a UDF follows.
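Here is one minimal sketch of such a UDF; the name and the exact rules are illustrative (it allows an optional leading sign, digits and at most one decimal point) and you should tailor them to your own data:

create function dbo.udf_is_numeric_strict (@str nvarchar(100))
returns bit
as
begin
declare @result bit
set @result = 0
/*strip one optional leading + or - sign*/
if @str like '[+-]%'
set @str = substring(@str, 2, len(@str))
/*only digits and dots, at least one digit and at most one dot*/
if @str not like '%[^0-9.]%'
and @str like '%[0-9]%'
and len(@str) - len(replace(@str, '.', '')) <= 1
set @result = 1
return @result
end
go

select dbo.udf_is_numeric_strict(col1) as is_numeric_strict, col1 from test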

LIKE Clause:

Another gotcha that we would like to point out is with the comparison of a non-unicode string that has trailing spaces against a unicode string. The non-unicode (ASCII) LIKE ignores trailing blanks in the expression being matched, but the unicode LIKE follows the ISO standard and treats trailing blanks as significant, so the comparison below fails:

select case when 'test ' like N'test' then 'matches' else 'does not match' end

--------------
does not match
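
For contrast, the same comparison behaves differently when the pattern is non-unicode, or when the = operator (which pads the shorter string with trailing spaces per the ANSI rules) is used; both of the statements below return 'matches':

select case when 'test ' like 'test' then 'matches' else 'does not match' end
select case when 'test ' = N'test' then 'matches' else 'does not match' end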

Posted in SQL Server | Leave a Comment »

More CRM Wars

Posted by decipherinfosys on June 16, 2008

Salesforce.com and Google partnered recently, allowing the former to bundle Google Apps along with their CRM applications. The new combined package is called Salesforce for Google Apps and it is available for free to their customers. Their CRM applications integrate with Google Apps' e-mail, documents, calendar as well as IMs.

So, what does this mean for MS Dynamics? It means more competition for the MS Dynamics CRM Online version. This could also mean a potential buyout of Salesforce.com by Google…only time will tell.

Posted in CRM | Leave a Comment »