Systems Engineering and RDBMS

Archive for March, 2007

Enforcing rounding rules

Posted by decipherinfosys on March 31, 2007

This was an issue that one of our clients faced recently.  It had more to do with incorrect interface design and missing checks in the schema design than anything else: there were essentially no rules to prevent the bad data from getting into the system.  A third-party interface was sometimes sending money amounts with decimals, but the initial design expected whole numbers only, so the database team had defined that column as NUMBER(8) (Oracle).  From some of the interfaces, however, data with decimals was coming in, for example 12345678.50.  The database would round the value, if possible, to make the data fit, and would only raise an error if the value could not fit even after rounding.  This of course meant that there was a mismatch in dollar amounts between the two systems, and over a period of time that grew into a huge difference.

Design wise, they had agreed that the amount really needs to be a whole number (NUMBER(8)) and that if a record comes in with decimal values, instead of letting the database do the rounding, that record should be rejected.  This logic could of course be enforced on the application tier as well as the database tier.  On the database tier, it is as simple as adding a check constraint to error out that record, and that is the solution we went with.
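On the Oracle side, the idea would be to let the column accept the incoming decimals (so that Oracle does not silently round them first) and then have a check constraint reject anything that is not a whole number.  The sketch below is illustrative only – the table, column and constraint names are our own and do not come from the client’s schema:

-- NUMBER(10,2) still allows 8 integer digits but keeps the incoming decimals visible to the constraint
CREATE TABLE INTERFACE_AMOUNT_TEST
(
MONEY_AMOUNT    NUMBER(10,2)
);

-- Reject any amount that still has a decimal portion instead of letting Oracle round it
ALTER TABLE INTERFACE_AMOUNT_TEST ADD CONSTRAINT CKC_WHOLE_AMOUNT
CHECK (MONEY_AMOUNT = TRUNC(MONEY_AMOUNT));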

We will take the example of SQL Server below since the same issue applies:

Let’s create a table DEMO_ROUND_TEST with a single TINYINT column just to demonstrate the issue and the fix:

CREATE TABLE DEMO_ROUND_TEST
(
MONEY_AMOUNT    TINYINT
)
GO

Now, insert three records in it –

INSERT INTO DEMO_ROUND_TEST VALUES (123)
GO
INSERT INTO DEMO_ROUND_TEST VALUES (123.12)
GO
INSERT INTO DEMO_ROUND_TEST VALUES (123.90)
GO

-- Let's check the data now:

SELECT * FROM DEMO_ROUND_TEST

MONEY_AMOUNT
————
123
123
123

So, as you can see, only 123 got inserted in each case and the decimal portion was truncated – over these two records alone, that is already a discrepancy of a dollar and two cents.  Let’s create a check constraint now to prevent those records from getting in, i.e. let’s enforce the tinyint restriction not as the column’s data definition but rather as a constraint:

drop table DEMO_ROUND_TEST
go

CREATE TABLE DEMO_ROUND_TEST
(
MONEY_AMOUNT    NUMERIC(5,2)
)
GO

alter table demo_round_test add constraint ckc_money check (cast(money_amount as tinyint) = money_amount)
go

Now, let’s try to insert again:

INSERT INTO DEMO_ROUND_TEST VALUES (123)
GO
INSERT INTO DEMO_ROUND_TEST VALUES (123.12)
GO

The second INSERT will fail with the error message:

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the CHECK constraint "ckc_money".

Only the first record gets inserted.  This way, we can prevent decimal values from getting into integer/number column(s) and avoid the rounding issues.  This approach was taken because changing the interface design was ruled out; alternatively, the same rule could be enforced in the interface itself, rejecting the bad records before the data is loaded.

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

Conditional Compilation in 10gR2

Posted by decipherinfosys on March 30, 2007

Conditional compilation is a new feature in 10gR2’s PL/SQL – it essentially lets you tell the compiler to include or ignore sections of code at compile time. Let’s take an example first and then we will mention the advantages of this feature:

SQL> create or replace procedure test
2 as
3 begin
4 $IF $$debug_test $THEN
5 dbms_output.put_line( 'My debug test code' );
6 dbms_output.put_line( 'is put right here.' );
7 $END
8 dbms_output.put_line( 'My actual code starts here' );
9 end;
10 /
Procedure created.

Now, let’s execute it:

SQL> exec test
My actual code starts here
PL/SQL procedure successfully completed.

The debug code was completely ignored in the execution above. Now, by enabling the “debug_test” flag, we can turn that debug code on – do note that debug_test is just our name for the flag; it doesn’t have to be called that, and you can define your own flags.

SQL> alter procedure test compile
2 plsql_ccflags = 'debug_test:true' reuse settings;
Procedure altered.

SQL> exec test
My debug test code
is put right here.
PL/SQL procedure successfully completed.

This helps in code instrumentation by enabling or disabling the debug code at will. It also helps in writing code that can work across versions: if you want to use the same code base but take advantage of some neat feature available only in 10gR2, you can compile one set of code for, say, 9iR2 and the other one for 10gR2 – without maintaining two separate code sets. This helps vendor companies who provide solutions to clients who may choose to run either version.
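The same mechanism can be used for that kind of version-specific compilation. Below is a minimal sketch using a hypothetical procedure and a flag name of our own choosing (use_10gr2_features) – the flag would be set to true when compiling against 10gR2 and left false (or unset) for the older release:

create or replace procedure load_data
as
begin
$IF $$use_10gr2_features $THEN
   dbms_output.put_line( 'code path that relies on a 10gR2-only feature' );
$ELSE
   dbms_output.put_line( 'fallback code path for the older release' );
$END
end;
/

-- compile this way on the 10gR2 instance; omit the flag (or set it to false) elsewhere
alter procedure load_data compile plsql_ccflags = 'use_10gr2_features:true' reuse settings;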

Posted in Oracle | 1 Comment »

Autotrace enhancement in 10gR2

Posted by decipherinfosys on March 30, 2007

In 10gR2, autotrace uses DBMS_XPLAN to display the explain plans.  In prior versions, one had to call the DBMS_XPLAN package directly to get the same output.  This gives much more detailed information to the end users – especially important is the addition of the predicate information at the bottom of the plan, which clearly shows the step at which Oracle’s optimizer applies each predicate.
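As a quick reference, autotrace is toggled from SQL*Plus. A typical way to look at just the plan (without fetching the result set to the screen) would be something along these lines – the table in the query is only a placeholder:

SET AUTOTRACE TRACEONLY EXPLAIN
SELECT * FROM EMP WHERE EMPNO = 7839;
SET AUTOTRACE OFF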

Posted in Oracle | Leave a Comment »

Trace Analyzer

Posted by decipherinfosys on March 29, 2007

Trace Analyzer is an Oracle utility used to generate a comprehensive report from the raw SQL Trace files (with binds/waits) that are generated by event 10046 at levels 4, 8 or 12. The report can be used to troubleshoot common SQL performance issues, or functional issues where one needs to find the values of bind variables. You can refer to Note 224270 on Metalink for further details on this utility.
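For reference, one common way of generating such a trace for your own session (level 12 captures both bind values and wait events) is shown below – the exact approach will vary depending on whether you are tracing your own session or someone else’s:

alter session set timed_statistics = true;
alter session set events '10046 trace name context forever, level 12';
-- run the workload that needs to be analyzed
alter session set events '10046 trace name context off';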

From that Metalink Note, here is what this utility offers:

/********************************************************************************************************************************************/

Metalink Note: 224270:

The Trace Analyzer report includes, for any given Raw SQL Trace:

  1. Total User Elapsed Time (actual); Accounted Elapsed and CPU Times (same as TKPROF); WAIT Times (IDLE and NON-IDLE).
  2. Top 5 expensive SQL in terms of CPU, Elapsed, non-idle and idle wait times.
  3. Gaps of no tracing activity; Oracle Errors; End of Transactions (COMMIT or ROLLBACK).
  4. Summary of CALLs (PARSE, EXECUTE, FETCH) by USER/INTERNAL and RECURSIVE/NON-RECURSIVE (similar to TRCSUMMARY and last page of TKPROF).
  5. Summary of CALLs per Command Type (SELECT, INSERT, etc.) by USER/INTERNAL and RECURSIVE/NON-RECURSIVE.
  6. Summary of WAITs (IDLE and NON-IDLE) by USER/INTERNAL and RECURSIVE/NON-RECURSIVE (similar to TRCSUMMARY).
  7. Hottest 5 blocks (most times waited for)
  8. Summary of CPU, Elapsed Time and WAITs per SQL Statement (CURSOR).
  9. Summary of Physical Reads, Logical Reads, Number of Rows and Library Cache Misses per SQL Statement (CURSOR).
  10. For each SQL Statement (CURSOR).
    1. CURSOR_ID, Length, Address, Hash Value, Optimizer Goal, USER, Parsing Errors.
    2. SQL Statement Text.
    3. CALLs Summary, including CALLs Count, CPU Time, Elapsed Time, Physical Reads (disk), Logical Reads (query and current), Rows, and Library Cache Misses.
    4. Row Source Plan.
    5. Explain Plan (if SQL Trace is processed on same Instance where it was generated). Includes execution order.
    6. List of tables being accessed by SQL according to Explain Plan, and indexes for these tables. Includes CBO Stats: number of rows, blocks, sample size and last analyzed date.
    7. For listed indexes, their indexed columns and if the index is unique or not.
    8. WAIT EVENTs and corresponding details (ex: Table/Index names and count of accessed blocks).
    9. Bind Variables Values used for each Execution.
    10. Execution Performance of each set of Bind Variables Values.

/********************************************************************************************************************************************/

We used this utility recently to help us find duplicate SQLs that were being fired by the application code – those SQLs were being dynamically generated using an ORM (Object Relational Mapper) and the application was unnecessarily re-loading the same data again and again. The utility was used to identify the bind variable values, confirm that the statements were indeed duplicates, and then match them against the application code to see why the data was being re-loaded. We were able to remove a lot of such redundant calls, which in the end improved the performance of the application significantly since there were savings not only on the network round trips and the database execution but also on the application tier.

Getting SQL Counts

Assuming that you have run the Trace Analyzer utility on the generated trace files (we have covered tracing in Oracle in previous blog posts – you can search for them if you want to read up on it), you can use these scripts to find the repeated SQL. First, you need to find the TRACE_ID value for your trace file. The following SQL will return the TRACE_ID for your trace file output.

SQL> Select Trace_Id From Trca$Trace
Where Trace_Filename = '<trace filename>';

Then you can use this script:

Set Pages 1000 Lines 120 Veri Off Feed Off Echo Off
DEF p_trace_id = &1;

Col Ad Head 'Address' Format a10 wrap
Col Hv Head 'Hash|Value' Format 9999999999 wrap
Col Sql_Text Head 'Sql Text' Format a75 wrap
Col Without_Binds head 'Count|W/O Binds' format 9999999
Col With_Binds head 'Count|W/ Binds' format 9999999
Ttitle "SQL Statements that are executed more than once with/without Bind values"
Brea On Cursor_Id On Ad On HV On Without_Binds On With_Binds
Select Trca$cursor.Ad,
Trca$cursor.Hv,
Trca$Sql_Counts.Without_Binds,
Trca$Sql_Counts.With_Binds,
Trim(Trca$sql_Text.Sql_Text) As Sql_Text
From Trca$cursor
Inner Join Trca$sql_Text
On Trca$cursor.Trace_Id = Trca$sql_Text.Trace_Id
And Trca$cursor.Cursor_Id = Trca$sql_Text.Cursor_Id
Inner Join (Select /*+rule */
Trca$parsing_In_Cursor.Trace_Id As Trace_Id,
Trca$parsing_In_Cursor.Cursor_Id As Cursor_Id,
Count(*) As Without_Binds,
0 As With_Binds
From Trca$parsing_In_Cursor
Where Trca$parsing_In_Cursor.Trace_Id = &&p_trace_id
And Not Exists
(Select 1
From Trca$binds
Where Trca$binds.Cursor_Id = Trca$parsing_In_Cursor.Cursor_Id
And Trca$binds.Trace_Id = Trca$parsing_In_Cursor.Trace_Id)
And Exists
(Select 1
From Trca$cursor
Where Trca$cursor.Cursor_Id = Trca$parsing_In_Cursor.Cursor_Id
And Trca$cursor.Trace_Id = Trca$parsing_In_Cursor.Trace_Id
And Trca$cursor.Uid$ <> 0)
Group By Trca$parsing_In_Cursor.Trace_Id,
Trca$parsing_In_Cursor.Cursor_Id
Having Count(*) > 1
Union
Select /*+rule */
Trca$parsing_In_Cursor.Trace_Id,
Trca$parsing_In_Cursor.Cursor_Id,
0,
Count(*)
From Trca$parsing_In_Cursor
Where Trca$parsing_In_Cursor.Trace_Id = &&p_trace_id
And Exists
(Select 1
From Trca$binds
Where Trca$binds.Cursor_Id = Trca$parsing_In_Cursor.Cursor_Id
And Trca$binds.Trace_Id = Trca$parsing_In_Cursor.Trace_Id)
And Exists
(Select 1
From Trca$cursor
Where Trca$cursor.Cursor_Id = Trca$parsing_In_Cursor.Cursor_Id
And Trca$cursor.Trace_Id = Trca$parsing_In_Cursor.Trace_Id
And Trca$cursor.Uid$ <> 0)
Group By Trca$parsing_In_Cursor.Trace_Id,
Trca$parsing_In_Cursor.Cursor_Id
Having Count(*) > 1) Trca$sql_counts
On Trca$Cursor.Trace_Id = Trca$Sql_Counts.Trace_Id
And Trca$Cursor.Cursor_Id = Trca$Sql_Counts.Cursor_Id
Where Trca$Cursor.Trace_Id = &&p_trace_id
Order By Trca$cursor.Cursor_Id, Trca$Sql_Text.piece
/

Usage:
Using SQL*Plus (log into Trace Analyzer repository)
SQL> start SqlCounts <trace_id>

Sample Output:
                  Hash     Count    Count
Address          Value W/O Binds W/ Binds Sql Text
---------- ----------- --------- -------- ----------------------------------------------------------------
8b85b778    1497648559       696        0 SELECT ADDRESS.ADDRESS_LINE_1 FROM ADDRESS

Where:
Address: the SQL address of the query, which can be joined to V$SQLAREA for any additional info. It should be used along with Hash_Value; note that this value should be matched using the INSTR function.
Hash_Value: the SQL hash value of the query, which can be joined to V$SQLAREA for any additional info. It should be used along with Address.
Count W/O Binds: the number of times the query was executed without bind values.
Count W/ Binds: the number of times the query was executed with bind values.

Getting Repeated SQLs with Bind values

The script shown below displays information about SQL statements that are executed more than once using the same bind value(s). The output of this script is in two parts:

1. The text of the SQL statements that are executed more than once, along with the CURSOR_ID, ADDRESS and HASH_VALUE information.
2. The actual bind values used in those queries, along with the CURSOR_ID and the number of repeats. You have to match the bind value output to the first part of the output using the Cursor_Id value.

Set Pages 1000 Lines 120 Veri Off Feed Off Echo Off

DEF p_trace_id = &1;

Col Cursor_Id Head 'Cur|Id' Format 999
Col Address Head 'Address' Format a12
Col Hash_Value Head 'Hash|Value' Format 999999999999
Col Sql_Text Head 'Sql Text' Format a80
Ttitle "SQL Statements that are executed repeatedly with same Bind values"
Brea On Cursor_Id On Address On Hash_Value skip 1
Select /*+rule */
Trca$cursor.Cursor_Id As Cursor_Id,
Trca$cursor.Ad As Address,
Trca$cursor.Hv As Hash_Value,
Trca$sql_Text.Sql_Text As Sql_Text
From Trca$cursor Inner Join Trca$sql_Text
On Trca$cursor.Trace_Id = Trca$sql_Text.Trace_Id
And Trca$cursor.Cursor_Id = Trca$sql_Text.Cursor_Id
Where Trca$cursor.Trace_Id = &&p_Trace_Id
And Trca$cursor.Uid$ <> 0
And Exists
(Select 1
From Trca$parsing_In_Cursor,
Trca$binds
Where Trca$parsing_In_Cursor.Trace_Id = Trca$binds.Trace_Id
And Trca$parsing_In_Cursor.Line_Id = Trca$binds.Parsing_In_Cursor_Line_Id
And Trca$parsing_In_Cursor.Cursor_# = Trca$binds.Cursor_#
And Trca$parsing_In_Cursor.Cursor_Id = Trca$binds.Cursor_Id
And Trca$parsing_In_Cursor.Trace_Id = Trca$cursor.Trace_Id
And Trca$parsing_In_Cursor.Cursor_Id = Trca$cursor.Cursor_Id
Group By Trca$parsing_In_Cursor.Trace_Id,
Trca$parsing_In_Cursor.Cursor_Id,
Trca$parsing_In_Cursor.Ad,
Trca$parsing_In_Cursor.Hv,
Trca$binds.Binds
Having Count(*) > 1)
Order By Trca$cursor.Cursor_Id,
Trca$sql_Text.Piece Asc
/
Ttitle Off
Ttitle "Bind Value Information for above Statement's Cursor Id"
Col Bind_Value Head 'Bind|Value' Form a80
Col Repeat_Count head '# of|Repeats' form 999,999,999
Select /*+rule */
Trca$parsing_In_Cursor.Cursor_Id As Cursor_Id,
Trca$binds.Binds As Bind_Value,
Count(*) As Repeat_Count
From Trca$parsing_In_Cursor,
Trca$binds
Where Trca$parsing_In_Cursor.Trace_Id = Trca$binds.Trace_Id
And Trca$parsing_In_Cursor.Line_Id = Trca$binds.Parsing_In_Cursor_Line_Id
And Trca$parsing_In_Cursor.Cursor_# = Trca$binds.Cursor_#
And Trca$parsing_In_Cursor.Cursor_Id = Trca$binds.Cursor_Id
And Trca$parsing_In_Cursor.Trace_Id = &&p_Trace_Id
And Exists
(Select 1
From Trca$cursor
Where Trca$cursor.Cursor_Id = Trca$parsing_In_Cursor.Cursor_Id
And Trca$cursor.Trace_Id = Trca$parsing_In_Cursor.Trace_Id
And Trca$cursor.Uid$ <> 0)
Group By Trca$parsing_In_Cursor.Trace_Id,
Trca$parsing_In_Cursor.Cursor_Id,
Trca$parsing_In_Cursor.Ad,
Trca$parsing_In_Cursor.Hv,
Trca$binds.Binds
Having Count(*) > 1
/

clear breaks columns
undef 1 p_trace_id

Usage:
Using SQL*Plus (log into Trace Analyzer repository)
SQL> start SqlRepeatsWithBinds.sql <trace_id>

Sample Output:

 Cur                       Hash
  Id Address               Value Sql Text
---- ------------ ------------- ---------------------------------------------------------------------------
  14 8f2b83f8        4008304268 SELECT USER_MASTER.USER_NAME
                                FROM USER_MASTER
                                WHERE ( (((( USER_MASTER.ID = :1 ) AND ( USER_MASTER.CO = :2 ) )
                                AND (USER_MASTER.DIV = :3 )) AND (USER_MASTER.TYPE = :4 ))
                                AND ( USER_MASTER.VAL_FLAG = :5 ) )

 Cur Bind                                                                                     # of
  Id Value                                                                                 Repeats
---- -------------------------------------------------------------------------------- ------------
  14 0:"100" 1:"100" 2:"100" 3:101 4:"L"                                                        348
     0:"100" 1:"100" 2:"100" 3:101 4:"U"                                                        348

Please do note that the same can be achieved by doing code instrumentation at the application layer: write the statements out to a file and then run a parser against it to present the data in a readable format. This client was using an in-house built data access layer that already had this functionality baked into the product, so that was also used to write the statements along with their bind values to a text file; using Perl, we were able to parse through that file and present it in the right format, and the output was the same as shown above.

Posted in Oracle | 1 Comment »

Bookmark Look-up Operator

Posted by decipherinfosys on March 28, 2007

A bookmark look-up operator in a SQL Server execution plan indicates that a non-clustered index was used for the retrieval of the data and that a bookmark was then used to look up additional data in the clustered index or the heap in order to satisfy the query.  This look-up is typically needed to retrieve columns in the select list that are not part of the index.  It means that at least twice the I/O is necessary in order to satisfy the query.  If the original index leading to the bookmark is selective enough, then having a bookmark look-up is not an issue; however, if the original index does not filter the data down to a small enough result set, this bookmark look-up can cause severe performance issues, which becomes obvious in the case of large tables since the data is typically spread across many pages.

In SQL Server 2005, the bookmark look-up operator is no longer shown.  A clustered index seek operation (or a RID (Row-ID) look-up against a heap) is used instead, and if you have Service Pack 2 installed, the Key Look-up operator provides the same functionality.
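One common way of eliminating the look-up altogether is to make the non-clustered index cover the query.  In SQL Server 2005, non-key columns can be added with the INCLUDE clause; the example below is purely hypothetical (table, column and index names are our own) and is only meant to illustrate the idea:

-- Including the selected columns in the leaf level of the index lets the query
-- be answered from the index alone, with no bookmark/key look-up per row.
CREATE NONCLUSTERED INDEX ORDER_HEADER_IND_1
ON ORDER_HEADER (CUSTOMER_ID)
INCLUDE (ORDER_DATE, TOTAL_AMOUNT)
GO

With ORDER_DATE and TOTAL_AMOUNT included in the index, a query that filters on CUSTOMER_ID and selects only those columns no longer needs the look-up.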

Posted in SQL Server | Leave a Comment »

Using Stored Procedures vs Dynamic SQL generated by ORM

Posted by decipherinfosys on March 27, 2007

This is one of those topics that always generates a lot of heat among developers/architects/DBAs etc. People typically take a very strong stance on one approach over the other. We will take a neutral approach and highlight the pros and the cons in a humble effort to put our opinion forth. As with most IT situations, the answer to this question is: “It Depends”. It really depends upon the type of your application and what the application is supposed to do. We are only talking about OLTP applications here…for BI applications where heavy data churning is required, it is always advisable to keep the code in the DB layer and use SET based logic for your processing, or to use an ETL tool for the processing logic.

The advantages of using stored procedures are:

1) Network Bandwidth: Assume that the application server(s) and the database server(s) are separate machines. Since the source code is stored on the database server, the application only needs to send the procedure name and its parameters, which reduces the amount of data sent over the network. When you use embedded SQL or SQL generated dynamically through an ORM, the full text of the commands must be transmitted, and in a scenario with a lot of concurrent activity and a large user load this can very quickly become a performance bottleneck. This can be mitigated in part by a judicious usage of views as well.

2) Abstraction Layer: It helps in separating the business logic from the data manipulation logic. Since the interface to the application remains the same, changes made internally to the stored procedures/packages do not affect the application, which in turn leads to easier deployment of changes.

3) It offers simplified batching of the commands. Since stored procedures/packages are meant to do a “Unit of Work”, this procedural logic can be simple to maintain and offers additional advantages like making use of the rich feature functionality that the database engines provide. SQL is a SET based language and using SET based procedural logic is the easiest and most performant way of dealing with the data. With every new release of Oracle, SQL Server or DB2 LUW, new features are being introduced in PL/SQL, T-SQL and/or SQL/PL which makes handling of different types of requirements very easy in the DB layer code.

4) Increased data security and integrity: One can secure the tables for direct access and only grant privileges on the stored procedures/packages.

5) By using bind variables (parameterized queries), SQL injection issues are mitigated as well.

6) Code Instrumentation and tracing logic can be built in very easily using stored procedures. This is one thing that we implemented for one of our clients recently (see the sketch after this list). We created a table which had a list of the DB code being used in the schema, and this table had a trace_flag column which could have five different values:

0 (no code instrumentation),

1 (log the sequence of events),

2 (log the sequence of events and the time taken by those SQL statements),

3 (log the sequence of events + the time taken + the execution plan from that point in time – since under a load scenario the execution plan can easily be different from the one you get when you run the statement separately), and

4 (start the trace – for example, a 10046 level 12 trace in the case of Oracle).

Using this logic, code instrumentation and troubleshooting of production issues became very easy. One could then run reports against the logged data and present them to the end user or the support personnel. Code instrumentation can be done in the application tier as well using the same guidelines (or using logging blocks like the MSFT logging block in .Net), and a good programmer will always instrument their code. For the DB code, however, this kind of instrumentation is much easier to implement.
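The exact implementation will differ from shop to shop; a minimal sketch of the kind of control table described above, with purely hypothetical names, could look like this:

-- Hypothetical control table; names and data-types are for illustration only
CREATE TABLE DB_CODE_TRACE_CONFIG
(
MODULE_NAME    VARCHAR(128) NOT NULL PRIMARY KEY,  -- procedure/package being instrumented
TRACE_FLAG     TINYINT NOT NULL DEFAULT 0 CHECK (TRACE_FLAG BETWEEN 0 AND 4)  -- 0 through 4 as described above
)
GO

Each procedure would then read its TRACE_FLAG value at the start of execution and, depending on the value, log the sequence of events, the timings, the plans, or start a trace.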

Cons of using stored procedures:

1) If your application runs on multiple RDBMS, example: You are a vendor and you need to provide your product that runs on Oracle, SQL Server and DB2 LUW in order to expand your customer base, then in that scenario, you have to code or put fixes in for three different code bases. Not only that, you need to have proper staffing to ensure that the code written is optimal since the locking and concurrency mechanisms are very different between these RDBMS. Also, the language used by all these “big 3” is very different as well.

2) We have seen client shops which offload all of their CRUD operations to the DB tier – as a result they end up with one-liner stored procedures, and if you have say 200 objects, you now have 200 * 4 (select/insert/update/delete) stored procedures, or one procedure per object with a flag parameter to indicate the operation and dynamic SQL inside the procedure to handle the conditional parameter logic.  Maintenance becomes a nightmare in this case.  Also, developers/DBAs sometimes get carried away with this approach and forget that SQL is a set based language – as one example, using this scheme a client shop was doing purges of the data by calling the delete procedure over and over, passing in one ID (surrogate key) value at a time, when they could have purged and archived the data using SET based logic.  Procedures are supposed to do a unit of work – having one-liner procedures and invoking that many calls in a batch does not yield any benefit.  In addition, each of those one-liner calls has to incur the cost of checking permissions and looking up the plan, albeit a very minuscule cost.

3) Parsing of strings is not one of the fortes of the DB languages – that code is better suited to the application tier unless you start using CLR or Java code in the database.

So, the bottom line is that one needs to carefully evaluate which portion(s) of the application really belong in stored procedures/packages. In applications that work on volumes of data and do bulk data processing, it is always advisable to have that logic in stored procedures/packages residing on the DB layer so that one can take advantage of the SET based approach of the SQL language.

One can also use ORMs (Object Relational Mappers) like Hibernate to prepare the data access strategy – one can then extend it to make calls to database stored procedures (if so desired in certain situations), have externalized SQL statements, have dynamic SQL statements prepared by the ORM, etc. One just has to make the right decision depending upon the application and its requirements.

Another thing to remember: when people point out that an advantage of stored procedures is that the code is always pre-compiled, that is not always true – there are scenarios that can lead to re-compilation. Also, if proper bind variables are being used for the queries built by an ORM, they serve the same purpose (i.e. they have the same advantage as a parsed/compiled stored procedure query) since the plan for such a SQL statement is parsed and compiled once. Depending upon a lot of factors (cache sizes, concurrent load, statistics updates etc.), that plan may or may not still be available the next time the call is made.

A good mix of an ORM plus DB code is usually the best compromise to get the best of both worlds. Deciding what goes where is the key, and though there are guidelines for that, it can vary from application to application depending upon the requirements.

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »

Doing Deletes in an OLTP application

Posted by decipherinfosys on March 27, 2007

Last week, while working at a client site, a question arose about handling deletes in an OLTP application.  This client application had over 700 tables and was close to a third normal form implementation.  The application also had a large concurrent user load requirement.  The question was whether the application should take care of doing the deletes or whether that task should be handled by a scheduled job.  Another question was whether cascading DRI should be implemented in the system.  Yet another was the archiving of the data and allowing reporting over the transactional data plus the archived data – a very vast topic in itself, and we will blog at a later stage about the approach that we took for that one.

Regarding the deletes, since some of the tables had circular references, implementing cascading DRI for those was not possible.  Cascading DRI was implemented for some of the tables, though.  For the others, we opted for updating a stat_code column with a value indicating that the particular record is “Marked for Deletion”.  We chose this technique for a couple of fairly obvious reasons.  During peak load, when there are a lot of concurrent users on the system, firing the deletes would have meant validating the foreign keys in the child tables, and that could have become a performance bottleneck.  In addition, internal fragmentation issues could arise as well, since this application needed to delete stale data very often.  Using partitioning schemes we could mitigate that issue, but that cannot be done in the short term due to the application changes involved.

So, the code now marks the record for deletion using the stat_code column, and the nightly scheduled task does the purging and the archiving.  If the internal fragmentation is above a certain threshold, it takes care of that as well.  This client had a window of 4 hours every night where this work could be done, so the implementation was pretty smooth.  Only the actual transactional data plus 4 weeks worth of old data was persisted in the OLTP database – the rest was purged and archived.  And currently, we are building a data mart for this client to facilitate BI reporting.  A pretty interesting and challenging project.
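To make the idea concrete, here is a minimal sketch using hypothetical table/column names and a made-up status value – the application issues a soft delete, and the nightly job purges in a set-based fashion, children first:

-- Application tier: mark the record instead of deleting it
UPDATE ORDER_HEADER
SET STAT_CODE = 90   -- assumed value for "Marked for Deletion"
WHERE ORDER_ID = 12345;

-- Nightly job: archive the marked rows (not shown) and then purge them set-wise
DELETE FROM ORDER_DETAIL
WHERE ORDER_ID IN (SELECT ORDER_ID FROM ORDER_HEADER WHERE STAT_CODE = 90);

DELETE FROM ORDER_HEADER
WHERE STAT_CODE = 90;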

Posted in DB2 LUW, Oracle, SQL Server | Leave a Comment »

One more meta-data script for SQL Server 2005

Posted by decipherinfosys on March 26, 2007

Here’s another script to get a list of all the tables, along with their columns, nullability, data-types, data-type lengths and column default values:

select table_name, column_name, is_nullable, data_type,
case when data_type in ('int', 'smallint', 'tinyint', 'bigint') then cast(numeric_precision as varchar(10))
when data_type in ('numeric', 'decimal', 'float') then 'Numeric(' + cast(numeric_precision as varchar(10)) + ',' + cast(numeric_scale as varchar(10)) + ')'
when data_type in ('datetime', 'smalldatetime') then cast(datetime_precision as varchar(10))
when data_type = 'bit' then '1'
when data_type in ('char', 'nchar', 'text', 'ntext', 'varchar', 'nvarchar') then case cast(character_maximum_length as varchar(10))
when -1 then 'varchar(max)'
else cast(character_maximum_length as varchar(10))
end
when data_type = 'xml' then '**XML**'
when data_type = 'sql_variant' then '**Variable data-type**'
else data_type
end as data_type_length,
column_default as Default_Value
from information_schema.columns
where table_name in (select table_name from information_schema.tables where table_type = 'Base Table')
order by table_name, ordinal_position

Posted in SQL Server | Leave a Comment »

A very simple way of pivoting row to column data in SQL Server

Posted by decipherinfosys on March 26, 2007

We had covered pivoting and un-pivoting of data in one of our previous posts related to our whitepaper on the subject.  Here is yet another very simple way of doing the pivoting if you need to get the data into a variable.

declare @str varchar(8000)
set @str = ''
select @str = @str + ', ' + name
from sysobjects
where xtype = 'u'
order by name
print substring(@str, 3, len(@str))

This will prepare a comma separated list of all the tables in the schema.  If your list is huge, you can replace varchar(8000) with varchar(max) in SQL Server 2005.  This technique can be used in different places within the application code as well and works in SQL Server 2000 as well as SQL Server 2005.

Posted in SQL Server | Leave a Comment »

Bill Gates MVP Summit keynote

Posted by decipherinfosys on March 25, 2007

It’s always nice to hear Bill Gates speak or to read his keynotes. This one was delivered at the latest MVP Summit.  He talks about parallel programming challenges, multicore hardware architectures, his vision of the changes coming in the future and much more.

Posted in News | Leave a Comment »