Systems Engineering and RDBMS

Archive for May, 2007

Two common ROWNUM mistakes

Posted by decipherinfosys on May 31, 2007

As you know, ROWNUM is a pseudo-column that Oracle exposes for a given result set. Its value gets assigned to each row as the row satisfies the predicate (the WHERE clause) and makes it into the result set. Two common mistakes that beginners make when working with ROWNUM are:

1) Using an equality operator for the ROWNUM clause.
2) Using ROWNUM filter prior to applying the ORDER BY clause.

#1 above means that conditions like:

…where rownum = 2


…where rownum > 10

will never return any rows. Why is that? Because the value for the pseudo-column gets assigned ONLY as a row is returned as output: the first candidate row is always assigned ROWNUM 1, and if the predicate rejects it, that row is discarded and the next candidate again gets ROWNUM 1, so such a condition can never become true. That is also why, if you want to select say 20 random rows (without a sort), you can do:

…where rownum < 21

#2 means that if you have a query like:

select * from Emp_Master where rownum < 21 order by salary desc

and you are expecting to get the top 20 highest paid employees, that is not the result you will be getting. The reason is that the ROWNUM filter gets applied before the sort happens. If you want the top 20 highest paid employees, you need to sort the result first and then apply the filter:

select * from (select * from Emp_Master order by salary desc) where rownum < 21
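If you need a specific range rather than the first N rows (say rows 21 through 40 by salary), a minimal sketch of the same nested technique, using the same Emp_Master table, would be:

select *
from (select e.*, rownum rnum
      from (select * from Emp_Master order by salary desc) e
      where rownum <= 40)
where rnum >= 21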

We had covered this as well as how to select for different ranges in our Paging record-sets whitepaper which you can download from our site here.

Posted in Oracle | Leave a Comment »

Google’s StreetSide

Posted by decipherinfosys on May 30, 2007

Yesterday, Google launched its street-view mapping feature in an attempt to create a richer experience for its geoweb services. They have partnered with Immersive Media, and Greg Sadetsky has blogged about the technology behind this work. Google is not the only company competing in this space – Microsoft offers a bird’s eye view in its maps as well, with large coverage thanks to their relationship with Pictometry. Another company in the fray is EveryScape, as they attempt to make a 3-D Web available to users.  They have a demo on their site that you can drive through to get a feel for it.  Interesting concept.

Posted in News, Technology | Leave a Comment »

Yet Another Potential Acquisition

Posted by decipherinfosys on May 30, 2007

Looks like eBay is all set to acquire StumbleUpon (a social bookmarking and discovery service) at the close of business today for close to $75 million. This seems like a weird acquisition, but the hypothesis given by Om Malik does seem to make sense – you can read more on that in his blog post – here.

Posted in News, Technology | Leave a Comment »

Views and Synonyms

Posted by decipherinfosys on May 30, 2007

Synonyms have existed in Oracle for a long time – however, they are pretty new to the SQL Server world, as this feature was introduced in SQL Server 2005.  One of the common questions asked is what the difference is between views and synonyms…here are a few of the differences:

  • A view is created on one or more objects (tables, views, functions) and can include more than one object type as well as more than one object (example: 3-4 tables, 1 function and 1 view can be combined to form a single view – a logical definition).  A synonym is created on exactly one object – a table, a view, a function, a stored procedure etc.
  • Not all views are updateable.  A single-table view is updateable; however, a view comprising complex calculations and joins is not directly updateable – one has to write Instead Of triggers (IO triggers) in order to make it updateable.  Synonyms (when created on tables or single-table views) are updateable since they are created on a single object.
  • An index can be created on a view thus forming an indexed view (SQL Server) or a materialized view (Oracle) which materializes the data to the disk – this is pretty common in data warehousing and BI applications.  Synonyms are just a logical definition.
  • Synonyms (in Oracle) can be created as public or private synonyms.  For a view, one needs to specifically grant privileges since there is no such thing as a public or a private view.

Both views and synonyms can be created over a db-link (Oracle) or linked servers (SQL Server) and are very commonly used for enforcing a security layer in the schema.
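As a quick illustration of the two (all object names below are made up for the example):

/* SQL Server 2005: a view combining two tables vs. a synonym on one table */
create view dbo.EMP_DEPT_V
as
select e.emp_id, e.emp_name, d.dept_name
from dbo.EMP_MASTER e
inner join dbo.DEPT_MASTER d
on e.dept_id = d.dept_id;
go

create synonym dbo.EMP for dbo.EMP_MASTER;
go

/* Oracle: a synonym can be public or private */
create public synonym EMP for SCOTT.EMP_MASTER;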

Posted in Oracle, SQL Server | 1 Comment »

Vardecimal Storage Format in SP2

Posted by decipherinfosys on May 29, 2007

In case you haven’t gone through the list of all the new features included in the second service pack of SQL Server 2005, here is the link. These are also included in BOL if you have the latest one installed. We had covered one of the new features, logon triggers, in one of our blog posts before. Another interesting addition is the new storage format – vardecimal. Oracle has had variable decimal storage instead of the fixed format for a long time – Oracle Guru Tom Kyte had answered a question on his site regarding this – you can read more on that here.

Firstly, this storage format is only available in the Enterprise and Developer editions – I do not understand why MSFT would make such a distinction for something so basic in functionality. The format can be used with the numeric and decimal data-types and can significantly help reduce the storage space for VLDBs (Very Large DataBases). It applies at the table level, since this is a storage format and not a new data-type – that means you cannot mix and match: all the decimal types in a table will adhere to the variable length storage scheme. As you know, depending upon the precision (between 1 and 38) of the numeric/decimal data-type, the storage can vary from 5 to 17 bytes. With the fixed storage format, regardless of the value stored in a table, the space used is the same since it is bound to the precision. That means high storage needs for large tables with lots of decimal columns. If some of those columns are used in indexes, it also means that one can have only a few entries per index page since the page size is fixed. With variable length storage, the number of index pages for the columns using the decimal data-types reduces as well. Fewer pages mean not only reduced storage but also faster look-ups for range scans.

As per the BOL:

When a table is stored in the vardecimal storage format, the decimal columns of each row consume only the space that is required to contain the provided number, plus 2 bytes of overhead. The result is always between 5 and 20 bytes. This includes the 2 bytes of overhead to store the offset to the value. However, null values and zeros are treated specially and take only 2 bytes.

You need to first enable the database for the variable decimal storage format. You can use the system stored procedure “sp_db_vardecimal_storage_format” to do so. Example:

exec sp_db_vardecimal_storage_format 'decipherdemo', 'ON'

So, why do we have to do this at the database level when the option is really at the table level? That is to prevent attaching a vardecimal enabled database to pre-SP2 builds of SQL Server 2005, which do not support this feature. The above command internally increments the version number for the database.
MSFT has also provided a stored procedure to estimate the savings in space – sp_estimated_rowsize_reduction_for_vardecimal. You can use that stored procedure to estimate the space savings in your environment. BOL has examples of its execution.
If you have ascertained that you indeed need to change the storage format, then you can use the sp_tableoption stored procedure to turn “vardecimal storage format” on or off (1 or 0 in the actual command) at the table level – see the sketch after the query below. VLDBs and large data-warehouses stand to benefit the most. In order to find out which tables have this option enabled, you can use the objectproperty() function –

select *
from sys.objects
where objectproperty(object_id, N'TableHasVarDecimalStorageFormat') = 1
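And a minimal sketch of the estimation and the table-level step itself (dbo.Emp_Master is a made-up table name for illustration):

/* Estimate the space savings first */
exec sp_estimated_rowsize_reduction_for_vardecimal 'dbo.Emp_Master'

/* Then turn the storage format on for the table */
exec sp_tableoption 'dbo.Emp_Master', 'vardecimal storage format', 1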

Posted in Oracle, SQL Server | 1 Comment »

Using the OUTPUT and OUTPUT INTO clauses in SQL Server 2005

Posted by decipherinfosys on May 25, 2007

SQL Server 2005 introduced the OUTPUT & OUTPUT INTO clauses.  Prior to SQL 2005, one had to use 2 SQL statements in a batch to get the values that were just inserted/updated/deleted by a DML statement.  This can now be done in a single statement with the help of these clauses.  They can be very useful for logging/auditing purposes, for confirmation messages, or for general application processing.
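Before we get to a full scenario, here is a minimal, self-contained sketch of the clause itself – the ORDER_HEADER table and its columns are made up for this illustration, with order_id assumed to be an identity column:

declare @ins table (order_id int, order_name varchar(30));

/* Capture the identity value and the inserted name in one statement */
insert into dbo.ORDER_HEADER (order_name)
output inserted.order_id, inserted.order_name into @ins
values ('Test Order');

select * from @ins;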

Let’s look at one such scenario, where we need it for error logging for interfaces into the application.  Say you have 100,000 records in one of the interface tables, and out of those, 1200 failed some sort of validation.  You now need to log those as errors in the logging table, and instead of doing it row by row, you want to do the right thing and write SET based code.  Here is an example illustrating such a scenario:

Say the interface table is INPT_PO_HDR, which stores the purchase orders that you get from the source you are integrating with.  Now, suppose that for a given load, the source sent an XML file or a flat file whose data our program loads up into the table.  If the table now has, say, 100,000 records for that given load, we want to process them in a SET based fashion so that performance is optimal.  We know that we have 4 validations to run on this incoming data set prior to loading it into our transactional tables.  Say one of the validations finds that 1200 records fail its check, another finds that 10 records fail, and the remaining 2 validations go through fine.

Under such a scenario, the code is required to log the error records into the logging table (let’s call it MESSAGE_LOG) and also update the interface table’s ERROR_NBR column with the ID value from the MESSAGE_LOG table so that proper inquiries can be done against it.  Depending upon whether the number of records is going to be large or small, we can decide to use a table variable or a temp table for this purpose (we had covered these in our blog posts before).

First, create the temp table that will capture the ID values:

create table #ERROR_ID_LIST (inpt_po_hdr_id int, msg_log_id int);

Now, let’s suppose the country code validation failed and we need to get hold of the records that failed.

For the records that fail the country_code validation, our aim is to log them into the MESSAGE_LOG table as well as tie them up with INPT_PO_HDR.  In SQL Server 2005, using the OUTPUT clause, we can insert these records into the MESSAGE_LOG table and get back the list of ID values in order to go back and do the updates to the interface table (the MESSAGE_LOG column names below are assumed for illustration):


insert into MESSAGE_LOG (inpt_po_hdr_id, error_msg)
/* OUTPUT on an INSERT can only reference the inserted rows, so MESSAGE_LOG
   is assumed to carry the interface row id along with the error message */
output inserted.inpt_po_hdr_id, inserted.msg_log_id
into #ERROR_ID_LIST (inpt_po_hdr_id, msg_log_id)
select inpt_po_hdr_id, 'Invalid Country Code'
from inpt_po_hdr
where stat_code = 0
and not exists (select 1 from country where country_code = inpt_po_hdr.country_code);

update i
set ERROR_NBR = e.msg_log_id,
STAT_CODE = 90            /*Error Status*/
from INPT_PO_HDR i
inner join #ERROR_ID_LIST e
on i.inpt_po_hdr_id = e.inpt_po_hdr_id;

Now, one can easily join the two tables to look at the error messages as well as the values that were sent in and take appropriate action:

select i.*, m.error_msg
from INPT_PO_HDR i
inner join MESSAGE_LOG m
on m.msg_log_id = i.ERROR_NBR
where i.STAT_CODE = 90
AND… <additional filter criteria – maybe datetime ranges>

Posted in SQL Server | 2 Comments »

Master Domainers

Posted by decipherinfosys on May 24, 2007

Business 2.0’s latest issue had an interesting cover story. It talked about Kevin Ham and a couple of the other major league “domainers” in the world – folks who have purchased several key domain names and made an interesting business out of it. The success story of Kevin Ham is particularly interesting – the new and unique business ideas that he came up with around domain names make for an interesting read.

You can read that article at this link. And maybe start thinking about domain names that might make sense for you to buy 🙂

Posted in News, Technology | Leave a Comment »

Simulating a pipeline function in SQL Server

Posted by decipherinfosys on May 24, 2007

When doing database development, many a time there is a requirement to generate a list of running sequential numbers. One such scenario is the handling of comma separated lists, which we had discussed in one of our blog posts before. We had shown how the pipeline function in Oracle can be used for that purpose. The equivalent of the pipeline function in SQL Server would have to be a UDF (User Defined Function).  Such a UDF takes two parameters: a) the number of rows to generate, and b) the starting number from which the list of sequential running numbers needs to be generated.

/* NAME: UDF_GEN_SEQUENCE – generates sequential numbers in SQL Server */
if exists (select 1 from dbo.sysobjects where id = object_id(N'dbo.UDF_GEN_SEQUENCE') and xtype in (N'FN', N'IF', N'TF'))
begin
PRINT 'Table Function UDF_GEN_SEQUENCE already exists...Dropping it and recreating'
drop function dbo.UDF_GEN_SEQUENCE
end
go

PRINT 'Creating function UDF_GEN_SEQUENCE'
go

create function dbo.UDF_GEN_SEQUENCE
(
@Nbr_Of_Rows int,
@Start_Nbr int
)
returns @table table
(
Row_Num int not null primary key,
Data_Value int
)
as
begin
/* Seed the table with the initial value */
insert @table values (1, @Start_Nbr)

/* Now, loop through and create the list
till the rowcount of the inserts is
more than 0 */
while @@ROWCOUNT > 0
insert into @table (Row_Num, Data_Value)
select t1.Row_Num + t2.Max_Row_Num,
@Start_Nbr + t1.Row_Num + t2.Max_Row_Num - 1
from @table as t1
cross join
(select max(Row_Num) Max_Row_Num from @table) t2 /*derived table*/
where t1.Row_Num <= @Nbr_Of_Rows - t2.Max_Row_Num

return
end
go

Using this function, one can now generate a list of running sequential numbers. Here is an example:

Say, I want a running list of 1000 numbers beginning with 215, here is the execution:

select * from dbo.UDF_GEN_SEQUENCE (1000, 215)

Abridged Output:

Row_Num Data_Value
——– ———–
1 215
2 216
3 217
4 218
5 219
6 220
7 221
8 222
9 223
…
998 1212
999 1213
1000 1214

This will work fine in both SQL Server 2000 as well as SQL Server 2005.  The same can be achieved by using a CTE in SQL Server 2005 – one would need to recurse till the limit is reached (see the sketch below) – but if you need a solution that works for both SS2k and SQL 2005, the UDF above will work fine.
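As a minimal sketch of that CTE alternative (with the numbers from the example above hard-coded), note that you have to bump up the recursion limit from its default of 100:

with SEQ (Row_Num, Data_Value) as
(
select 1, 215
union all
select Row_Num + 1, Data_Value + 1
from SEQ
where Row_Num < 1000
)
select Row_Num, Data_Value
from SEQ
option (maxrecursion 1000);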

Posted in SQL Server | 1 Comment »

Un-Indexed Foreign Keys

Posted by decipherinfosys on May 23, 2007

Having foreign keys in the system that are not indexed can lead not only to performance issues but also to locking issues. We had briefly covered this in one of our previous blog posts. In the case of an un-indexed FK, if you update the parent table’s PK columns (very rare, and always a bad practice) or delete a parent table record, the child table will get locked. Performance also takes a nose dive for joins if the FK column(s) are not indexed and there is no meaningful criteria on that table to filter upon; when the FK column(s) are indexed, the optimizer can use that index to dig into the rows and filter them down to a smaller sub-set.

So, how can one easily identify the FKs that are not indexed in the system? Oracle Guru Tom Kyte has posted code for this purpose on his web-site.

And along the same lines, one can do this in SQL Server (code written and tested in SQL 2005):

select case when B.table_name is null then '******' else 'OK' end as status,
A.table_name,
A.tab_columns as fk_columns,
B.tab_columns as index_columns
from
(select a.table_name, a.constraint_name,
max(case c.ordinal_position when 1 then c.column_name else null end) +
coalesce(max(case c.ordinal_position when 2 then ', ' + c.column_name else null end), '') +
coalesce(max(case c.ordinal_position when 3 then ', ' + c.column_name else null end), '') +
coalesce(max(case c.ordinal_position when 4 then ', ' + c.column_name else null end), '') +
coalesce(max(case c.ordinal_position when 5 then ', ' + c.column_name else null end), '') +
coalesce(max(case c.ordinal_position when 6 then ', ' + c.column_name else null end), '') +
coalesce(max(case c.ordinal_position when 7 then ', ' + c.column_name else null end), '')
as tab_columns
from information_schema.constraint_column_usage a,
information_schema.table_constraints b,
information_schema.key_column_usage c
where a.constraint_name = b.constraint_name
and b.constraint_type = 'FOREIGN KEY'
and a.constraint_name = c.constraint_name   /*join on the constraint as well to avoid cross-matches*/
and a.table_name = c.table_name
and a.column_name = c.column_name
group by a.table_name, a.constraint_name) AS A
left outer join
(select object_name(b.object_id) as table_name, a.name as index_name,
max(case b.index_column_id when 1 then index_col(object_name(b.object_id), b.index_id, b.index_column_id) else null end) +
coalesce(max(case b.index_column_id when 2 then ', ' + index_col(object_name(b.object_id), b.index_id, b.index_column_id) else null end), '') +
coalesce(max(case b.index_column_id when 3 then ', ' + index_col(object_name(b.object_id), b.index_id, b.index_column_id) else null end), '') +
coalesce(max(case b.index_column_id when 4 then ', ' + index_col(object_name(b.object_id), b.index_id, b.index_column_id) else null end), '') +
coalesce(max(case b.index_column_id when 5 then ', ' + index_col(object_name(b.object_id), b.index_id, b.index_column_id) else null end), '') +
coalesce(max(case b.index_column_id when 6 then ', ' + index_col(object_name(b.object_id), b.index_id, b.index_column_id) else null end), '') +
coalesce(max(case b.index_column_id when 7 then ', ' + index_col(object_name(b.object_id), b.index_id, b.index_column_id) else null end), '')
as tab_columns
from sys.indexes a
inner join sys.index_columns b
on a.object_id = b.object_id
and a.index_id = b.index_id
where a.type in (1, 2 /*, 3 -- for XML index */)
group by object_name(b.object_id), a.name) AS B
on A.table_name = B.table_name
and B.tab_columns like A.tab_columns + '%'

This SQL code will flag the FK constraints that are not indexed in your system. The output will look something like this (dummy tables used for illustration):

STATUS          TABLE_NAME    FK_COLUMNS    INDEX_COLUMNS
******          TABLE_A       COL1          NULL
******          TABLE_B       COLX          NULL
OK              TABLE_C       COL1          COL1, COL2
OK              TABLE_D       COL10         COL10

and likewise for other tables. The ones flagged with “******” are the ones that need to be evaluated on a case by case basis.
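Once a flagged FK has been evaluated and found to need an index, the fix is simply to create an index on the FK column(s) – a minimal sketch against the dummy TABLE_A from the output above:

create index TABLE_A_IND_1 on TABLE_A (COL1);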

Posted in Oracle, SQL Server | 1 Comment »

Handling a comma separated list in DB code

Posted by decipherinfosys on May 22, 2007

SQL Server MVP Erland Sommarskog has an excellent post about arrays and lists in SQL Server – he covers the different options that are available for handling a comma separated list in DB code (XML, comma separated lists, the IN operator etc.).  It is very well written, with the pros and cons of each option.  Let’s take one of the examples that he had and extend it to Oracle as well.

Let’s see the code first and then we can evaluate how it operates.  The driver of this whole logic is the pipeline function “f_SequenceNumbers”.  It is based on a collection type, and its purpose is to return a list of numbers from the starting number that is passed in to the ending number that is passed in.  That logic helps in creating the records in an array format so that the comma separated list can be converted from one big fat string into a column with different rows.

/*Oracle Syntax*/

/* The collection type that the pipelined function is based on */
CREATE OR REPLACE TYPE t_SequenceNumbers AS TABLE OF NUMBER;
/

CREATE OR REPLACE FUNCTION f_SequenceNumbers(startNumber INT,
endNumber   INT)
RETURN t_SequenceNumbers PIPELINED
IS
BEGIN
FOR i IN startNumber..endNumber LOOP
PIPE ROW(i);
END LOOP;
RETURN;
EXCEPTION
WHEN OTHERS THEN
RAISE_APPLICATION_ERROR(-20002, 'Error in getting Sequence Numbers' || CHR(10) || SQLERRM);
END f_SequenceNumbers;
/

So, an execution like:

select column_value row_num from table(f_sequencenumbers(1, 1000))

will return a column called ROW_NUM with 1000 rows, with the starting number as 1 and the ending number as 1000.  Now, if you have more than 1000 values in your comma separated string, then you can change this number accordingly.


I doubt that anyone would ever want to pass more values than that (if that is the case, there is bound to be something wrong with the design), but in case you do have a requirement to do so, you can just put the right number in the call to the above function.  With this pipeline function in mind, let’s now take a look at the actual code that does the conversion:

set autoprint on
variable per1 refcursor;

declare
v_string varchar2(1000) := 'E050160059,E050342378,E050384514,E050384518,E050384519,E050384520';
begin
open :per1 for
SELECT ltrim(rtrim(substr(',' || v_string || ',', IV.Row_Num + 1,
INSTR(',' || v_string || ',', ',', IV.Row_Num + 1) - IV.Row_Num - 1))) COL_VALUES
FROM   (select column_value row_num from table(f_sequencenumbers(1, 1000))) IV
WHERE  IV.Row_Num <= length(',' || v_string || ',') - 1
AND  substr(',' || v_string || ',', IV.Row_Num, 1) = ',';
end;
/


COL_VALUES
--------------------------------------------
E050160059
E050342378
E050384514
E050384518
E050384519
E050384520

6 rows selected.

The first thing to observe is the expression ',' || v_string || ',' that occurs no fewer than four times. By adding the delimiter at the beginning and at the end of the string, the first and last items in the string appear in the same context as all other items.

Next, let’s turn to the WHERE clause. The expression:

substr(',' || v_string || ',', IV.Row_Num, 1) = ','

evaluates to TRUE for all positions in the string where the delimiter appears. The expression:

IV.Row_Num <= length(',' || v_string || ',') - 1

simply sets an upper limit on the numbers we are using.

Let’s now look at the SELECTed column value. We extract the strings with substr, starting at the position after the delimiter. We find the length for substr by searching for the next delimiter with INSTR, making use of its third parameter, which tells INSTR where to start searching. Once the next delimiter is found, we subtract the position of the current delimiter and then one more (as the delimiter itself should not be included in the extracted string) to get the length of the list item.  Finally, the code also applies ltrim and rtrim to make sure that any leading and trailing spaces after/before the comma are taken care of as well.

This approach can be used for creating arrays out of a comma separated list provided by the application; the output can then very easily be treated as if you were querying/joining or operating upon a table, which makes it very easy to manipulate the records and apply conditional logic as needed.  Please do note that whenever possible, instead of passing a string of comma delimited values, one should strive to use the criteria that qualifies that list of values directly in the SQL.  In other words, if the application is going to pass say 1000 values in a comma separated string, and it can instead specify the criteria (say stat_code = 30 and some_column_val = 1) which qualified those 1000 values, then that criteria should be used in the SQL statements.  However, there are scenarios where such a criteria cannot be applied – especially when the user can pick and choose the values that they need to process – and in those cases the above piece of code will do the trick for you.  A SQL Server version of the same technique is sketched below.
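As a minimal sketch (not production code), here is a SQL Server version of the same parsing query – it uses the UDF_GEN_SEQUENCE function from our earlier post in place of the pipelined function, with substring/charindex/len standing in for substr/INSTR/length; the variable and the sample string are just for illustration:

declare @v_string varchar(1000)
set @v_string = 'E050160059,E050342378,E050384514'

select ltrim(rtrim(substring(',' + @v_string + ',', IV.Row_Num + 1,
charindex(',', ',' + @v_string + ',', IV.Row_Num + 1) - IV.Row_Num - 1))) as COL_VALUES
from dbo.UDF_GEN_SEQUENCE (1000, 1) IV   /*1000 rows beginning with 1*/
where IV.Row_Num <= len(',' + @v_string + ',') - 1
and substring(',' + @v_string + ',', IV.Row_Num, 1) = ','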

Posted in DB2 LUW, Oracle, SQL Server | 4 Comments »