Systems Engineering and RDBMS

Archive for January 30th, 2007

What RAID is Best for You?

Posted by decipherinfosys on January 30, 2007

Most of you are familiar with the basic RAID technologies avaible out there today, but it is always good to have too much information about this topic than not enough. Here is a brief yet informative summary of the most popular hardware RAID configurations, including pros and cons for each:

RAID-0 (Striped)

  • Does not provide fault tolerance
  • Minimum number of disks required = 2
  • Usable storage capacity = 100%
  • This is the fastest of the RAID configurations from a read-write standpoint
  • Is the least expensive RAID solution because there is no duplicate data
  • Recommended use for temporary data only

RAID-1 (Mirrored)

  • Fault tolerant – you can lose multiple disks as long as a mirrored pair is not lost
  • Minimum number of disks required = 2
  • Usable storage capacity = 50%
  • Good read performance, relatively slow write performance
  • Recommended for operating system log files

RAID-5 (Striped with Parity)

  • Fault tolerant – can afford to lose one disk only
  • Minimum number of disks required = 3
  • Usable storage capacity = subtract 1 whole disk from the total number in the array (i.e. 3 60Gig hard drives would provide 120Gig of usable disk space)
  • Generally good performance, and increases with concurrency – the more drives in the array the faster the performance
  • Recommended for operating system files, shared data, and application files

RAID-0+1 (Striped with Mirrors)

  • Fault tolerant – you can lose multiple disks as long as both are not part of a mirrored pair
  • Minimum number of disks required = 4
  • Usable storage capacity = 50%
  • Generally good performance, and increases with concurrency – the more drives in the array the faster the performance
  • Recommended for operating systems, shared data, application files, and log files

Additional Things to Keep in Mind

  • If you are using more than two disks, RAID 0+1 is a better solution than RAID 1
  • Usable storage capacity increases as the amount of disks increases, but so does the cost of the configuration
  • Performance increases as you add disks, but again, so does cost

Posted in Hardware | 2 Comments »

Microsoft Load Balancing in a Switched Environment

Posted by decipherinfosys on January 30, 2007

Here is some information on how to configure two or more Windows servers running Microsoft Load Balancing Services (WLBS) in a switched environment.

For WLBS to work properly, every packet directed to the virtual IP (VIP) address must be able to reach all WLBS hosts in the cluster. The hosts then decide which packets to accept.

When the cluster network adapter of each WLBS host is connected to a port on a Layer 2 switch instead of a hub, the switch attempts to learn the MAC (hardware) address of the computer connected to each port so that it can associate a port to a MAC address. Ethernet switches send frames to a MAC address by referencing the port associated with that address.

If a switch associates the cluster’s virtual MAC address to a port, WLBS will be unable to correctly load balance the traffic. Below are some steps to prevent a switch from associating the cluster’s virtual MAC address with a port:

1. Use Unicast Mode – This will ensure that requests to the cluster will be routed through the cluster’s network adapter only, and the appropriate host will reply through its dedicated network adapter, which connects to a dedicated port on the switch.  Unicast Mode is configured within the Properties of Network Load Balancing.

Note: Unicast mode requires multiple network interface cards (NICs).

Note: For best results, do notconfigure a default gateway on the NIC assigned to the WLBS cluster.

2. Mask the WLBS Cluster MAC Address – When you use Unicast Mode, make sure the MaskSourceMAC registry value is set to its default value of 1. This forces the cluster to use a virtual MAC address when sending packets through the switch. The switch then maps this virtual MAC address to a port, but sends traffic to the real cluster MAC address to all ports on the switch.

If a switch cannot associate a MAC address to a port it will send the frames to all of its ports, which results in ‘flooding’.

The MaskSourceMAC registry key is located at:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WLBS\Parameters

Posted in Load Balancing | 1 Comment »

IN vs EXISTS

Posted by decipherinfosys on January 30, 2007

Functionally, they are the same (as compared to NOT IN vs NOT EXISTS which are functionally different in one scenario – read this post for the differences between NOT IN and NOT EXISTS clauses : http://decipherinfosys.wordpress.com/2007/01/21/32/ ). However, there are performance implications of using one over the other that one needs to be aware of. Assume that we have two tables : TABLE_A and TABLE_B and the match is being done on TABLE_A.col1 = TABLE_B.col2. In that scenario, an in statement like:

select <select column list> from TABLE_A where col1 in (Select col2 from TABLE_B)

will get processes in this way:

1) The sub-query gets evaluated first and the results are distinct’ed and indexed,

2) The output from it is then joined with TABLE_A.

Re-writing the above query using the EXISTS clause will give:

Select <select column list> from TABLE_A

where exists (select 1 from Table_B where Table_B.col2 = Table_A.col1)

This gets evaluated in this order:

1) For every value of Table_A.col1, loop through and match the values in Table_B.col2.

2) If we get a match, select that value and move on to the next one. If there is no match, discard that value.

So, where should one use an IN vs the EXISTS clause? If the result of the sub-query “Select col2 from TABLE_B” is huge and the TABLE_A is a relatively small set and executing “select 1 from Table_B where Table_B.col2 = Table_A.col1″ is very fast because of proper index on Table_B.col2, then an exists clause will be better since the optimizer can do a FTS on Table_A and then use the index to do the probe/seek operations for Table_B.

If the result of the sub-query is small, then the IN clause is much faster. If the results of the both the sub-query as well as the outer query is large, then either IN or EXISTS would work the same – it depends upon your indexing scheme.

Please do note that the example used above is a very simplistic one in order to illustrate the point – in real world, you would have queries that have additional filter criteria on those tables that narrows down the result sets. As a generic rule, if the result of the outer query is small and the result set of the inner sub-query is large, then use EXISTS – if it is the other way around, then use the IN clause.

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »

Session Level trace in Oracle

Posted by decipherinfosys on January 30, 2007

There are many ways to trace out a session in Oracle.  This post does not talk about the DBA looking for the session information and tracing it out.  A better way to do the trace would be to write up a log-on trigger and have logic in it say if you want to trace sessions coming from a particular program, you can put that logic in the log-on trigger and trace out the sessions that way.  Another option is to write up a session stored procedure and if your application can make a procedure call as part of the code instrumentation, that will trace it out as well.  An example of such a procedure is shown below:

– This procedure should be called when you want to turn on the tracing using events
– in the current session.  10046 Event can be used to trace for SQL with binds,
– SQL with Waits or both. 

 

– Valid values for the Trace Level are:
–     4: To trace the SQL statements for binds
–     8: To trace the SQL statements for waits
–    12: To trace the SQL statements for both binds and waits.

 

– NOTE: As most of the database use a default size of 50M for trace files generated by Oracle,
–       setting the trace levels to 12, will probably reach the limit on the trace filesize.
–       A default value of 4 is specified for the trace level.

 

CREATE OR REPLACE PROCEDURE USP_SESSION_TRACE
(traceLevel INT DEFAULT 4)
AS
    EXP_INVALID_TRACE_LEVEL     EXCEPTION;
BEGIN
    IF traceLevel NOT IN (4, 8, 12) THEN RAISE EXP_INVALID_TRACE_LEVEL; END IF;

 

    EXECUTE IMMEDIATE ‘alter session set max_dump_file_size = unlimited’;
    EXECUTE IMMEDIATE ‘alter session set events ”10046 trace name context forever, level ‘ || TO_CHAR(traceLevel) || ””;
EXCEPTION
    WHEN EXP_INVALID_TRACE_LEVEL THEN
        RAISE_APPLICATION_ERROR(-20001, ‘Cannot set event 10046 with invalid trace level.  Valid Trace Level values are 4,8,12′);
    WHEN OTHERS THEN
        RAISE_APPLICATION_ERROR(-20002, SQLERRM);
END;
/

  • Timed_Statistics should be set to true – that is it’s default value as well (based on statistcs_level parameter’s default value: TYPICAL).  

  • In addition, set the max_dump_file_size to unlimited.  Just make sure that you remove the files after you are done else you will be occupying space un-necessarily.

  • Default of 4 is fine since most of the times we will be dealing with getting the SQLs only…if further analysis of waits is required, you can go with 8 or 12.

Posted in Oracle | 4 Comments »