Systems Engineering and RDBMS

Archive for February 19th, 2007

VMWare ESX Server Disk Structure

Posted by decipherinfosys on February 19, 2007

Disk structure and technologies used in a virtualized environment are quite different than in the physical computing world. Each virtualization vendor has their own methods for implementing a virtual disk environment. This blog entry focuses exclusively on the disk structure of VMWare’s enterprise virtualization solution – ESX Server.

Virtual Machine File System

ESX server uses VMWare’s virtual machine file system (VMFS) for the storage of virtual machines. VMFS is a high-performance file system that runs on physical SCSI disks and partitions, including storage attached network (SAN) environments. It’s designed purpose is to store large files such as virtual disk images, as well as the memory images of suspended virtual machines.

Virtual Machine Disks

Virtual machines are stored on an ESX Server’s VMFS partitions. However, each virtual machine also has its own set of files. These files are in a special format and use a .vmdk file extension. A virtual machine can have one or many .vmdk files associated with it, depending on the size and configuration of that virtual machine. The .vmdk file can contain all of the virtual machine’s information, or if the virtual it can serve as a symbolic link from a VMFS to a raw logical unit number (LUN) when raw device mappings (RDM) are used from a SAN.

Redo Log Files

The most straightforward scenario is where the .vmdk file contains the virtual machine’s information. All changes to the virtual machine’s virtual disk are immediately written to the .vmdk file. However, there is a snapshot feature that administrators can use to save the current state of a virtual machine. This snapshot generates a redo log file, and all changes to the virtual machine from that point on are written to the log file, which has a .vmdk.REDO extension. The .vmdk file itself remains static. In this case, the term ‘log’ file is somewhat of a misnomer, as the REDO file is more than just a log, but a complete record of the virtual machine’s state.

 Accessing and Managing Virtual Disks

VMWare virtual disks are accessible via the following methods:

  • The ESX Server Service Console
  • VMWare Virtual Center
  • The VMWare SDK

From the service console, files can be viewed and manipulated using ordinary command line commands. The directory where the files are located is always /vmfs.

Additionally, administrators can use the vmkfstools to import and export .vmdk files from different VMWare host servers. The command can also be used to convert .vmdk files from one version of VMWare Server to another.

Hopefully this high-level overview of the basice VMWare virtual server file structure will provide you with more insight into the overall virtualization infrastructure. Future blog entries will dig deeper into each component of a VMWare implementation, including Service Console usage and features as well as more details on how to use the ‘vmkfstools’ command.

Posted in VMWare | Leave a Comment »

Loading Chinese data in Oracle

Posted by decipherinfosys on February 19, 2007

Here are the steps to load chinese data in Oracle using sql *ldr from a Windows box and to view it through SQL PLUS. It assumes that the database is already set with AL32UTF8 character set and has the NLS_LENGTH_SEMANTICS set to CHAR.

1. Start –> settings –> control panel –> double click Regional options.

Click ‘General’ tab. From ‘Language setting for the system’ select Simplified Chinese (do not select the Traditional Chinese check-box). Click Apply. This part will require to install some extra components. It will prompt for Windows CD, please insert the CD or point it to correct network folder so that missing components can be installed. Click ok. This requires reboot. Upon prompted for reboot, cancel it, we will reboot it later. See the image below for illustration:

chn.jpg

 

2. Now Click ‘Input Locales’ tab. Click on Add button. Select ‘Chinese (PRC)’ from Input Locale drop down list box. Make sure that it is selected. Click Ok. See Figure below for illustration.

chn_2.jpg

 

Again Click ‘General’ tab (see the first figure).  Choose ‘Chinese (PRC) for ‘Your locale (Location)’. Upon selecting this, ‘Sorting Order’ drop down will display ‘pronunciation’. Make sure that from ‘Language setting for the system’, Simplified Chinese is checked. Click ‘Set Default …’. This is very important as it will set the code page correctly in the registry. Click Apply and Click ok. If it asks for reboot, click on ‘Cancel’ and do not reboot the box. We will check the registry settings before rebooting the box.

1. Start–>Run. Type regedit. Check following registry keys.

· HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\Codepage. At the very bottom, entry will be there for ACP. After performing last step, value for ACP should be set to 936 (Code page for Simplified Chinese). If value for ACP key is not set to 936, some steps were missed from previous section. Please revisit the previous steps.

· HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOMEX (Where x could be 0,1 etc..) Update following keys with the values specified.

Ø NLS_LANG: SIMPLIFIED CHINESE_CHINA.ZHS16GBK

Ø SQLPLUS_FONT: PMingLiu

Ø SQLPLUS_FONT_CHARSET: CHINESEBIG5

If SQLPLUS_ registry entries are not there, create both of them.

· HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE

NLS_LANG: SIMPLIFIED CHINESE_CHINA.ZHS16GBK

· HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\ALL_HOMES\ID0

NLS_LANG: SIMPLIFIED CHINESE_CHINA.ZHS16GBK

2. Come out of registry and reboot the box.

For loading the data, you can use this sample control file and use sql *ldr:

The control file will look like this:

LOAD DATA
CHARACTERSET UTF8
INFILE ‘the_text_file.txt’
BADFILE ‘table_name.bad’
APPEND
INTO TABLE TABLE_NAME
FIELDS TERMINATED BY ‘,’ OPTIONALLY ENCLOSED BY ‘”‘
TRAILING NULLCOLS
(
Your columns and their default values etc.
)

Posted in Oracle | Leave a Comment »

Mirror, Mirror on the Wall, which is the best RDBMS of all?

Posted by decipherinfosys on February 19, 2007

We get this question a lot from our clients. Since we provide consulting in all three leading RDBMS in the world –> Oracle, MS SQL Server and DB2 LUW, there is a very frequent question that is posed to us about which RDBMS do we think is the best of all. Other names like MS Access, MySQL, Postgress, Sybase and Informix get thrown into this mix as well but generally the discussion centers amongst the top 3 that are mentioned in the beginning.

Well, personal preferences aside, the answer is: “It Depends” 🙂 Yes, the (in) famous IT answer to everything. Well, we believe that is the fair answer as well in this case. It really depends upon a lot of things. There are database developers or DBAs who are fanatics about which one is the better RDBMS – but at the end of the day, it is the business that drives it. If we can implement a system for a client using even MS Access and do it such that the business is successful and can keep it’s operating costs less, then why not do it? For Tier -1 clients, however the throughput needs are much more and hence smaller (perceived) RDBMS don’t cut it for them. Another reason why the answer is what it is is also because if you do not have the necessary people to support the system and do not have that skill set in house, what’s the point in increasing your operating costs when you can make-do with what you have. The business is what should really be driving the decision. If it demands that you need to have Tera-byte data in the database with tons and tons of concurrent users, one needs to carefully evaluate which RDBMS will fit your needs. Each RDBMS has literally thousands of features and functions. If you only prepare a checklist for comparison sakes, you will notice that that feature is implemented in all of those, however, how they have implemented that feature is what makes them different. It is this “internal” implementation knowledge that becomes the key differentiator. This only comes with experience – been there, done that, seen that.

For an example, Oracle’s multi-version concurrency model is unique to them and it is an excellent feature. MS SQL Server 2005 does allow that by using snapshot isolation level but it’s internal workings still have a bit of an overhead over the way Oracle does it. The concept of a lock manager in SQL Server and DB2 LUW versus the locking at the block level concept i.e. the absence of a lock manager in Oracle is very different. So, even though on a piece of sheet, one can say that they all support row level locking, how it is implemented and how it effects the applications is very very different. Here is our take on the existing leading RDBMS in the market:

  1. Informix: IBM bought it and is no longer promoting it. It is favoring DB2 over Informix. It is a great RDBMS and very scalable.
  2. Sybase: They are on a decline and have been losing market revenue every year. Unless it is a legacy system (and most of the companies are now migrating those to one of the three (Oracle, MS SQL Server and DB2 LUW) now, it really doesn’t have much future for the new applications.
  3. DB2: With Viper, IBM has improved the product a lot but it still has issues especially centered around concurrency when it comes to highly transactional applications (we have specifically seen those issues for applications written with Oracle’s multiversion concurrency in mind and then ported over to DB2…those are not a direct port because of lock/blocking issues and require a major change). The volume of the data is not an issue in DB2 and neither is the language/functionality support within the product – the engine also handles the load well but when it comes to the Oracle ported applications which were architected under the assumption of the multiversion concurrency being available in all RDBMS, without a major re-write, the application doesn’t scale very well. However, if the applications are built specifically with DB2 in mind, it works very well by tuning the knobs and configuring it correctly. It does have a very powerful SQL/PL language and a good set of tools and utilities available.
  4. MS SQL Server: With their latest offering : SQL Server 2005, on the Windows OS, they are the kings. Both Oracle and IBM don’t stand a chance of ever taking that market share from them. This product is rock solid and is laden with feature functionality that brings it very very close to Oracle. The drawback though is the OS itself since SQL Server runs on only Windows. It has very rich set of tools that makes configuring it a breeze and keeps the costs low. It is very well documented and is very well supported by newsgroups, end users and MSFT forums.
  5. Oracle: Most scalable, most feature rich, and just a wonderful RDBMS. However, it is also the most expensive. Runs on LUW and has the most powerful PL/SQL language. As compared to SQL Server, it is still very cumbersome to configure and tune it but Oracle 11g is addressing some of those issues like automatic memory management. In addition, the GUI tools could be improved upon – they are much better in 10g but we hope that Oracle will improve upon them further in 11g. That will make administration easier.

In addition, you need to look at your hardware costs and the skill set of your in-house staff as well as the kind of application you are building. If you are building up an application that needs to run on multiple RDBMS, make sure that that application has been architected for the lowest common denominator i.e. the RDBMS that is the most restrictive of all else you will be in for a surprise when you do a port from one to the other.

Posted in DB2 LUW, Oracle, SQL Server | 1 Comment »

NLS_LENGTH_SEMANTICS

Posted by decipherinfosys on February 19, 2007

This parameter in the init<sid>.ora parameter file plays an important role in the Globalization Support for Oracle databases.  It can be controlled at the database level or the session level using the “ALTER SESSION” command. It can also be controlled via an environment variable on the client.

The default value of this parameter is BYTE which is fine if the database that you are using uses a single byte encoding scheme.  However, if the database character set uses a multi-byte encoding scheme (UTF8 or AL32UTF8), then the number of bytes no longer always equal to the number of characters because a character can require one or more bytes for storage.  Thus, you would either need to define your the lengths of your CHAR and VARCHAR2 columns to be sufficiently large if you want to stick with the BYTE semantics or you can switch to the CHAR semantics for this parameter.

English characters take 1 byte for storage but chinese characters for example can take anywhere between 1-4 bytes for storage.  If the character set is UTF8, then the maximum bytes for a character that are supported is 3 and if you go with AL32UTF8, then it is 4 because of the supplementary characters.  For latin derived languages like French, Spanish, German etc., they almost all characters need 1 byte for storage except the accented characters that take 2 bytes and the Euro symbol that takes 3 bytes.

So, does this effect your data-pumps and your exp/imp when you are moving from a non-CHAR semantics to a CHAR semantics database?  Yes, it does.  In one of the following blog posts, we will post steps on how you can import into a CHAR semantic database and the scripts that you can use to do so.

Posted in Oracle | Leave a Comment »

New “How-To” Articles Posted

Posted by decipherinfosys on February 19, 2007

Three new “How-To” articles have been posted on our web site:

  1. How can I alter a regular column to be an identity column in SQL Server?
  2. How can I enable event level tracing in Oracle?
  3. How can I reset the sequence value in Oracle and identity current value in MS SQL Server and DB2 LUW?

Posted in Decipher News | Leave a Comment »