Systems Engineering and RDBMS

Archive for January, 2007

Update data in one table with data from another table

Posted by decipherinfosys on January 31, 2007

This blog post illustrates how to update more than one column in a table with values from columns in another table and explains how to do it in the three RDBMS that we support.

Table Structures and values:

TableA has four columns: a, b, c, d (a is the primary key column)
TableB has five columns: a1, b1, c1, d1, e1 (a1 and b1 together constitute the primary key for this table)

The foreign key relationship between the two tables is based on A.a = B.a1

The data in these 2 tables is as follows:
I.    TableA
a    b    c    d
1    x    y    z
2    a    b    c
3    t    x    z

II.    TableB
a1    b1    c1    d1    e1
1    x1    y1    z1    40
2    a1    b1    c1    50

The requirement is to write a SQL to update columns b, c and d in TableA from the columns b1, c1 and d1 from TableB where-ever the join condition satisfies and e1 > 40 in TABLEB.

Oracle:

UPDATE TABLEA
SET (b, c, d) = (SELECT b1, c1, d1 from TABLEB WHERE TABLEB.a1 = TABLEA.a and TABLEB.e1 > 40)
WHERE EXISTS (SELECT 1 from TABLEB WHERE TABLEB.a1 = TABLEA.a and TABLEB.e1 > 40)
/

Results after the update:

a    b    c    d
————————————
1     x          y           z
2     a1        b1         c1
3     t           x           z

SQL Server:

UPDATE TABLEA
SET     b = TABLEB.b1,
c = TABLEB.c1,
d = TABLEB.d1
FROM TABLEA, TABLEB
WHERE TABLEA.a = TABLEB.a1
AND TABLEB.e1 > 40
GO

Note: This is an extension in SQL Server i.e. the FROM clause – it does make it simple to understand and is a nice feature.

Results after the update:

a    b    c    d
————————————
1     x          y           z
2     a1        b1         c1
3     t           x           z

DB2 LUW:

–Same as Oracle–

UPDATE TABLEA
SET (b, c, d) = (SELECT b1, c1, d1 from TABLEB WHERE TABLEB.a1 = TABLEA.a and TABLEB.e1 > 40)
WHERE EXISTS (SELECT 1 from TABLEB WHERE TABLEB.a1 = TABLEA.a and TABLEB.e1 > 40);

Results after the update:

a    b    c    d
————————————
1     x          y           z
2     a1        b1         c1
3     t           x           z

NOTE:

It is very important to make sure that your where clause for the update statement is correct since that is what identifies the records that the update statement is going to qualify and do the update upon.  If it is incorrect, then you can get wrong results.  The reason I am mentioning this is because I have seen people write wrong where clauses and then wondering what went wrong because they specified the correct condition in the SET clause.

In the above example, if the Where condition was omitted, the other record’s columns would be updated to NULL value and this will be the final result set:

a    b    c    d
————————————
1     Null      Null      Null
2     a1        b1         c1
3     Null     Null      Null

Posted in DB2 LUW, Oracle, SQL Server | 5 Comments »

Dead Client Detection (DCD) in Oracle

Posted by decipherinfosys on January 31, 2007

A connection in Oracle might be one of many types — most popular begin DEDICATED server and SHARED server. We will discuss the “DEDICATED” mode only in this post.

Sid and Serial# jointly identify a record in the v$session view. Suppose the client application makes a connection and starts a transaction and holds locks and just then, the network connectivity is lost or a power-failure occurs on the machine from where the client application is running from (assuming that it is not running on the DB Server), in all such scenarios, the client is not able to communicate to the DB that the connection is going away and hence leaves hung sessions and if those sessions were holding locks, open locks on the DB. Such sessions are called as orphaned sessions. PMON has no visibility into these and hence it is not able to clean this session up (nor it’s resources).

Issues with Orphaned Sessions:

Orphaned sessions use server resources, and may have open cursors, temporary tables, or locks. These locks may block other connections from performing useful work, and can sometimes be the result of a major “pile up” of locks. In severe cases, it can appear that Oracle has stopped working.

Resolutions:

You can enable dead client detection (DCD) in the sqlnet.ora file on the DB Server (not on the client application machines). It is done by setting the SQLNET.EXPIRE_TIME setting in the sqlnet.ora file on the DB Server. The time specified is in minutes and what it does is that it sends a probe to verify that client/server connections are active. Setting a value greater than 0 ensures that connections are not left open indefinitely, due to an abnormal client termination. If the probe finds a terminated connection, or a connection that is no longer in use, it returns an error, causing the server process to exit. This setting is intended for the database server, which typically handles multiple connections at any one time.

Limitations on using this terminated connection detection feature are:

* It is not allowed on bequeathed connections.
* Though very small, a probe packet generates additional traffic that may downgrade network performance. So, if you put a lower number for the time parameter, it can create issues for you.
* Depending on which operating system is in use, the server may need to perform additional processing to distinguish the connection probing event from other events that occur. This can also result in downgrading network performance.

The advantage of setting this parameter is that you won’t have “dead” clients hogging resources on your DB.

Posted in Oracle | 1 Comment »

What RAID is Best for You?

Posted by decipherinfosys on January 30, 2007

Most of you are familiar with the basic RAID technologies avaible out there today, but it is always good to have too much information about this topic than not enough. Here is a brief yet informative summary of the most popular hardware RAID configurations, including pros and cons for each:

RAID-0 (Striped)

  • Does not provide fault tolerance
  • Minimum number of disks required = 2
  • Usable storage capacity = 100%
  • This is the fastest of the RAID configurations from a read-write standpoint
  • Is the least expensive RAID solution because there is no duplicate data
  • Recommended use for temporary data only

RAID-1 (Mirrored)

  • Fault tolerant – you can lose multiple disks as long as a mirrored pair is not lost
  • Minimum number of disks required = 2
  • Usable storage capacity = 50%
  • Good read performance, relatively slow write performance
  • Recommended for operating system log files

RAID-5 (Striped with Parity)

  • Fault tolerant – can afford to lose one disk only
  • Minimum number of disks required = 3
  • Usable storage capacity = subtract 1 whole disk from the total number in the array (i.e. 3 60Gig hard drives would provide 120Gig of usable disk space)
  • Generally good performance, and increases with concurrency – the more drives in the array the faster the performance
  • Recommended for operating system files, shared data, and application files

RAID-0+1 (Striped with Mirrors)

  • Fault tolerant – you can lose multiple disks as long as both are not part of a mirrored pair
  • Minimum number of disks required = 4
  • Usable storage capacity = 50%
  • Generally good performance, and increases with concurrency – the more drives in the array the faster the performance
  • Recommended for operating systems, shared data, application files, and log files

Additional Things to Keep in Mind

  • If you are using more than two disks, RAID 0+1 is a better solution than RAID 1
  • Usable storage capacity increases as the amount of disks increases, but so does the cost of the configuration
  • Performance increases as you add disks, but again, so does cost

Posted in Hardware | 7 Comments »

Microsoft Load Balancing in a Switched Environment

Posted by decipherinfosys on January 30, 2007

Here is some information on how to configure two or more Windows servers running Microsoft Load Balancing Services (WLBS) in a switched environment.

For WLBS to work properly, every packet directed to the virtual IP (VIP) address must be able to reach all WLBS hosts in the cluster. The hosts then decide which packets to accept.

When the cluster network adapter of each WLBS host is connected to a port on a Layer 2 switch instead of a hub, the switch attempts to learn the MAC (hardware) address of the computer connected to each port so that it can associate a port to a MAC address. Ethernet switches send frames to a MAC address by referencing the port associated with that address.

If a switch associates the cluster’s virtual MAC address to a port, WLBS will be unable to correctly load balance the traffic. Below are some steps to prevent a switch from associating the cluster’s virtual MAC address with a port:

1. Use Unicast Mode – This will ensure that requests to the cluster will be routed through the cluster’s network adapter only, and the appropriate host will reply through its dedicated network adapter, which connects to a dedicated port on the switch.  Unicast Mode is configured within the Properties of Network Load Balancing.

Note: Unicast mode requires multiple network interface cards (NICs).

Note: For best results, do notconfigure a default gateway on the NIC assigned to the WLBS cluster.

2. Mask the WLBS Cluster MAC Address – When you use Unicast Mode, make sure the MaskSourceMAC registry value is set to its default value of 1. This forces the cluster to use a virtual MAC address when sending packets through the switch. The switch then maps this virtual MAC address to a port, but sends traffic to the real cluster MAC address to all ports on the switch.

If a switch cannot associate a MAC address to a port it will send the frames to all of its ports, which results in ‘flooding’.

The MaskSourceMAC registry key is located at:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WLBS\Parameters

Posted in Load Balancing | 1 Comment »

IN vs EXISTS

Posted by decipherinfosys on January 30, 2007

Functionally, they are the same (as compared to NOT IN vs NOT EXISTS which are functionally different in one scenario – read this post for the differences between NOT IN and NOT EXISTS clauses : https://decipherinfosys.wordpress.com/2007/01/21/32/ ). However, there are performance implications of using one over the other that one needs to be aware of. Assume that we have two tables : TABLE_A and TABLE_B and the match is being done on TABLE_A.col1 = TABLE_B.col2. In that scenario, an in statement like:

select <select column list> from TABLE_A where col1 in (Select col2 from TABLE_B)

will get processes in this way:

1) The sub-query gets evaluated first and the results are distinct’ed and indexed,

2) The output from it is then joined with TABLE_A.

Re-writing the above query using the EXISTS clause will give:

Select <select column list> from TABLE_A

where exists (select 1 from Table_B where Table_B.col2 = Table_A.col1)

This gets evaluated in this order:

1) For every value of Table_A.col1, loop through and match the values in Table_B.col2.

2) If we get a match, select that value and move on to the next one. If there is no match, discard that value.

So, where should one use an IN vs the EXISTS clause? If the result of the sub-query “Select col2 from TABLE_B” is huge and the TABLE_A is a relatively small set and executing “select 1 from Table_B where Table_B.col2 = Table_A.col1” is very fast because of proper index on Table_B.col2, then an exists clause will be better since the optimizer can do a FTS on Table_A and then use the index to do the probe/seek operations for Table_B.

If the result of the sub-query is small, then the IN clause is much faster. If the results of the both the sub-query as well as the outer query is large, then either IN or EXISTS would work the same – it depends upon your indexing scheme.

Please do note that the example used above is a very simplistic one in order to illustrate the point – in real world, you would have queries that have additional filter criteria on those tables that narrows down the result sets. As a generic rule, if the result of the outer query is small and the result set of the inner sub-query is large, then use EXISTS – if it is the other way around, then use the IN clause.

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »

Session Level trace in Oracle

Posted by decipherinfosys on January 30, 2007

There are many ways to trace out a session in Oracle.  This post does not talk about the DBA looking for the session information and tracing it out.  A better way to do the trace would be to write up a log-on trigger and have logic in it say if you want to trace sessions coming from a particular program, you can put that logic in the log-on trigger and trace out the sessions that way.  Another option is to write up a session stored procedure and if your application can make a procedure call as part of the code instrumentation, that will trace it out as well.  An example of such a procedure is shown below:

— This procedure should be called when you want to turn on the tracing using events
— in the current session.  10046 Event can be used to trace for SQL with binds,
— SQL with Waits or both. 

 

— Valid values for the Trace Level are:
—     4: To trace the SQL statements for binds
—     8: To trace the SQL statements for waits
—    12: To trace the SQL statements for both binds and waits.

 

— NOTE: As most of the database use a default size of 50M for trace files generated by Oracle,
—       setting the trace levels to 12, will probably reach the limit on the trace filesize.
—       A default value of 4 is specified for the trace level.

 

CREATE OR REPLACE PROCEDURE USP_SESSION_TRACE
(traceLevel INT DEFAULT 4)
AS
    EXP_INVALID_TRACE_LEVEL     EXCEPTION;
BEGIN
    IF traceLevel NOT IN (4, 8, 12) THEN RAISE EXP_INVALID_TRACE_LEVEL; END IF;

 

    EXECUTE IMMEDIATE ‘alter session set max_dump_file_size = unlimited’;
    EXECUTE IMMEDIATE ‘alter session set events ”10046 trace name context forever, level ‘ || TO_CHAR(traceLevel) || ””;
EXCEPTION
    WHEN EXP_INVALID_TRACE_LEVEL THEN
        RAISE_APPLICATION_ERROR(-20001, ‘Cannot set event 10046 with invalid trace level.  Valid Trace Level values are 4,8,12’);
    WHEN OTHERS THEN
        RAISE_APPLICATION_ERROR(-20002, SQLERRM);
END;
/

  • Timed_Statistics should be set to true – that is it’s default value as well (based on statistcs_level parameter’s default value: TYPICAL).  

  • In addition, set the max_dump_file_size to unlimited.  Just make sure that you remove the files after you are done else you will be occupying space un-necessarily.

  • Default of 4 is fine since most of the times we will be dealing with getting the SQLs only…if further analysis of waits is required, you can go with 8 or 12.

Posted in Oracle | 4 Comments »

Name Value Pair Design (EAV)

Posted by decipherinfosys on January 29, 2007

This is one of the design anomalies in data modeling.  Infact, this even has a name for it: EAV (Entity Attribute Value).  Database Guru Joe Celko has written a lot about it in newsgroups as well as in his presentations.  The reason why I am blogging about this today is because I have seen this approach time and again and it is either done by novice DBAs/database developers or by application programmers who try to use their programming context in the relational world.  Here are the disadvantages of this approach :

1)  It is next to impossible to enforce business rule constraints like uniqueness, enforcing check constraints per attribute, adding defaults per attribute, enforcing PK/FK relationships based on attributes etc.  Since the attributes are being modeled as data elements, it is very error prone since we have now opened up our attribute design to the users.  All data integrity is destroyed (violation of 1NF itself).  Any typo becomes a new attribute.

2)  Different data-type attributes will be modeled using the string data-type for the value column and this will result into performance issues while doing comparisons since the index sorts will be an issue and not only that, an effective index design startegy cannot be formulated for this.  This also increases the index size un-necessarily.  In addition, it will require un-necessary cast operations like for the date/timestamp data-types, the numeric data-types with precision and a non-zero scale etc.

3)  It would lead to the usage of cursor based (and dynamic SQL) approach for reads since the number of attributes is virtually un-known.  In the case of complex reports or other read operations, this will be a nightmare…un-necessary CASE statements, un-necessary sub-queries with self references and un-necessary pivoting/un-pivoting of data would ensure that the reports are very slow…as your data grows, performance will get exponentially worse.

If you do have a situation where you are dealing with an entity that has thousands of attributes (I have seen requirements like these) and all of them are static i.e. once they are done and have data populated in them, they never change.  In those scenarios, in order to not exceed the rowlength (like SQL 2000 has a limitation on the rowsize being 8060), one can either chose to create a hard 1:1 relationship tables or chose this design to accomodate for such requirements.  Since the list of attributes is static and the data in them is static as well, a simple pivot view can be used to flatten it out.  But that is the only scenario where this approach can be used/considered.

If one really needs to model a system where the attributes can be generated by the end user (which itself is questionable but can happen in certain verticles – especially those where the end user needs to use the software for planning and forecasting), then a better approach will be to use a static list of attributes (for the different data-types) and have another table for their name look up so that storage is done in a relational manner but for the display piece, one can use the static list and can call an attribute whatever they want.  This is the most reasonable compromise for data integrity, performance and sufficing the need of the application.  We will be writing up a “How-To” article or a whitepaper on this taking an example from the real world and demostrating how you can solve this issue in a reasonable fashion using the above approach rather than using an EAV design.

Posted in Data Model | 4 Comments »

Could not obtain exclusive lock on database ‘Model’

Posted by decipherinfosys on January 29, 2007

How many times have you faced this error during the installation and wondered what the heck is SQL Server complaining about.  All that I want is to create my database and move on.  The error that I am talking about is shown below:

model.JPG

In SQL Server,  whenever a new user database is created, it takes it’s properties from the “model” database (this system database acts as a template for new user databases) and if there are over-rides to some of the parameters for the new user database, those are overridden based on what is specified in the create database script.  SQL Server attempts to obtain an EXCLUSIVE lock on the model database during this process.  This is to prevent other processes from updating the model database’s schema whilst the new database is being created, so that the new database’s schema is in a known, consistent state.

There are four reasons why you can get this error:

1)  It can happen if someone is executing something against the MODEL database via Enterprise Manager/Query Analyzer/Management Studio (SQL 2005) or some other tool i.e. if there is an active session working against the “MODEL” database when the create database command is being executed, you will get this error (this is very simple to prove also..in QA, open up a window and select model database from the drop down :

USE MODEL

GO

And from a second session, try running “CREATE DATABASE TEST”…this will fail to obtain exclusive access to the “model” database and will give the 1807 error.

In order to circumvent the problem from happening via your install scripts, you need to check and make sure of that no sessions are open for the model database…if there are, get their spid and kill ’em before running the “CREATE DATABASE command”.  Also, if 2 “create database” commands are running in parallel and one takes a long time to finish, the other one would time out with this error.

If it is really because someone else is accessing the model database at this stage, you can re-start the service and re-try since that will release the connection to the model database (or check the existing connections using sp_who2 and kill the one that connects to the model database).  If it is an issue with the installation code, then that would need to be fixed.

2)  It can also occur if the “AUTOCLOSE” property is set to ON for the “model” database as a result of which when a new user database is being created, if the model database is closed at that time (when this property is on, the database is shutdown cleanly when it is not in use), you can get this error because the new database takes it’s default properties from the model database and it is close at that time as a result of a log-off from some other session.

NOTE:  AutoClose should always be OFF for all databases.  Setting this to ON is strictly against the recommendations.

If it is occuring because of this reason, correct the setting and re-start the service.

3)  It can also happen if the machine name has been changed (you need to follow proper steps to re-register the new name) but if this is the cause, then the error will happen all the time till that issue is corrected.  select @@servername will tell you the name of the local machine for point #3.

4)  Another reason is that if you have previously opened the model database in Enterprise Manager/Management Studio (SQL 2005), then closed it, the connection to the database remains open, which means that the Create Database command cannot acquire the exclusive access lock that it requires to execute successfully.  For the AutoClose:

SET NOCOUNT ON
go
SELECT DATABASEPROPERTYEX( ‘model’ , ‘IsAutoClose’ ) AS [AutoClose]
go
ALTER DATABASE [model] SET AUTO_CLOSE OFF
go
For the connections:

select spid, hostname, program_name
from master..sysprocesses
where db_name(dbid) = ‘model’
go

Posted in SQL Server | Leave a Comment »

Auditing capabilities in Oracle

Posted by decipherinfosys on January 29, 2007

There are many different ways to audit the data, the users, the DML/DDL, the “select” statements within Oracle.  Here are the options that you have at your disposal (this does not talk about the third party tools that are out there but only the OEM functionality that is available):

  •  You can write auditing triggers for the objects in the schema.  This is done for the DML statements.  This is one of the most common auditing techniques that are used and can help in building up an audit trail as well as help in troubleshooting specific application issues like the quantity for a particular product going negative in the schema because of an application bug.
  • You can use LogMiner utility which helps in the auditing of both the DML statements as well as the DDL statements.
  • You can write up system level triggers – this can be used for auditing both the DML as well as the DDL statements and is a very powerful auditing technique for auditing changes to your system.
  • You can also chose to use the flashback feature of Oracle to go back in time and do DML statements auditing.
  • You can use FGA (Fine Grained Auditing).  Using this feature, you can audit the DML statements as well as plain simple read (select) statements in Oracle 10g.  In 9i, you can audit only the select statements.

The object level triggers, LogMiner and flashback are the most commonly used techniques.  LogMiner approach ensures that all changes are recorded and kept for auditing purposes.   FGA approach has the additional benefit of recording the select statement that was issued by the application.   It does have it’s advantages over the trigger approach if you have a highly transactional system.  Since the trigger is per row, if the DML is modifying say 1 million records, it will create a lot of overhead while auditing.  Using a statement level trigger will not help in this case since you cannot capture the before and after values for the individual records using that type of trigger.  In the FGA approach, only one record gets created in the audit log since it executes once per statement, not once per row and thus the effect on performance is negligible.  For extracting the before and after values, flashback queries can be used (size your UNDO properly for it).

Obviously, one single solution does not fit everyone’s needs which explains why Oracle has provided so many different options.  You can pick whichever suits your need depending upon the type of application that you have and your specific needs.

Posted in Oracle | 1 Comment »

Installing FTP on Red Hat Linux AS 3.0

Posted by decipherinfosys on January 29, 2007

The RPM files referenced in the document can be found in the following location on the installation media: the sub-directories containing the RPMS have the following Format: “CDn/RedHatRPMS”

To set up an FTP server, make sure the vsftpd package is installed by issuing the following command as root:
rpm -q vsftpd

You should see:
vsftpd-1.2.0-4

If not installed, then use the following directions.

1)    The vsftp package can be obtained on your distribution media, or through your system appropriate channel of the Red Hat Network

vsftpd-1.2.0-4.i386.rpm  (CD1/RedHat/RPMS)

2)    If the package is not installed, insert the installation CD-ROM containing the appropriate package, change to the /mnt/cdrom/RedHat/RPMS/ directory, and, as root, type the following command:

rpm -i vsftpd-1.2.0-4.i386.rpm

3)    Check for installation:

rpm -qa | grep vsftp               (alternatively rpm -i vsftp)

You should see:
vsftpd-1.2.0-4

4)    Once installed, start the vsftpd service by issuing the following command:

service vsftpd start

5)    To make the service available for the next boot sequence chkconfig the service on by issuing the following command:

chkconfig vsftpd on

Additional information can be found in the vsftpd man page:
man vsftpd

Posted in Linux | Leave a Comment »