Posted by decipherinfosys on April 2, 2007
In an OLTP application, one should be aiming at the usage of bind variables in majority of the scenarios and thus using constant values won’t be a good idea. In a transaction system, we execute hundreds/thousands of queries per second. Therefore we MUST use bind variables and reduce the number of parses as much as possible. OLAP/data-warehouse apps. are different but let’s not go there in this blog post.
However, using a constant value can be useful at some times when we are in a need to use literals in some cases — say you have a table called PURCHASE_ORDER which has a column called STAT_CODE which has only say 3 distinct values in a 10 million row table. Typically, this will be a very bad column for the creation of an index - however the application rarely queries on 90 and 99 and always queries on only 0 value which at any given point of time will have very less number of records (this is the classic density vs selectivity discussion for index creation) so an index would make sense in this case. But this index may not get used under certain scenario even with a value of 0…consider this:
Suppose that the first time that this SQL runs, a value of 90 is passed (say a purge program), then the optimizer would have hard parsed and opted for a different plan (i.e. not use that index and rightfully so since the data-value of 90 qualifies for say 80% of the records)…the second time (normal processing in the application) that this SQL comes in again (this time with a value of 0), it will be soft parsed and since the plan is still in the cache from the previous run of 90, because of the usage of the bind variable, it will use the same plan again that it generated for the value of 90…even though in this case, this is un-optimal to do so (this issue is also termed as “bind variable peeking” in Oracle or “parameter sniffing” in SQL Server)…you can read up more on bind value peeking from the Oracle Guru: Tom Kyte:
http://asktom.oracle.com/pls/ask/f?p=4950:8:948105846467748805::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:7832114438832
In such scenarios, when you know that the data distribution will be heavily skewed and you know that the OLTP application is always interested in only that value that qualifies for lesser number of records and that value is “never” going to change in the query, usage of a constant is better so that it can make use of the data-distribution stats that are available to it (bottom-line: you do not need to bind those values that do not change from execution to execution)…but such requirements are small in an OLTP application. Majority of them fall in the criteria mentioned above (the bind variable usage).
Posted in DB2 LUW, Oracle, SQL Server | 3 Comments »
Posted by decipherinfosys on April 2, 2007
A key design for interfacing with another application where the integration is not real time and the volume of data involved is high is that the data should get into the system as fast as possible. Whether that data is in the form of XML of just data records doesn’t matter. We had a similar situation at a client site over the weekend. The client was integrating with another application and they get their data loads in the form of an XML document. This data is provided to them every 3 hours and they are supposed to receive the XML, consume it and process it as well as send back an XML document detailing any errors that occured in the load so that the host system could correct it on their end. It was also required to store the error messages within their system and to provide the supervisor the ability to correct any issues with the data that was provided so that they do not have to wait for another 3 hours to get the data again. The host system in this case was a mainframe system which was capable of providing the data only ever 3 hours since it required a lot of processing to be done.
This client application was being benchmarked over the weekend and they started running into performance issues as soon as the XML document size increased. It started taking them nearly 6 hours to process the document. And the reason for that was that the application was loading the document into memory and then firing off the inserts one record at a time after doing the transformations and look-ups against it. I got a call on Saturday night to help identify and resolve this issue since this week they are supposed to go live. It did not take much time to identify the performance bottleneck - after it was identified, we decided to move to SQLXML (this client was using SQL Server 2005). Once we switched to SQLXML and used bulk data load into a set of interface tables, the data upload was finished in a matter of minutes. Once the data was loaded into these interface tables, all that remained was to process this data from the interface tables to the actual transactional tables. We wrote database stored procedure to do that and the entire processing from start to finish was done in less than 10 minutes. Any errors that occured were logged into a message_log table and was cross-referenced with the data in the interface tables. We then prepared a GUI for their supervisor to help see the failed records and correct them, if needed. This will be converted to an alert today and via a dashboard, the supervisor will be able to see the summary as well as the failed records.
The aim in such situations should be to get the data into the system as soon as possible since bulk data load will reduce the loading time. The transformation, look-up and subsequent updates to the transacational tables can be then done through database code in a SET based fashion. The stored procedure code can first weed out the bad records via validation routine and then process the remaining good records in a SET based fashion to do the insert/update/delete logic.
Posted in SQL Server, Technology | No Comments »