Systems Engineering and RDBMS

Difference between UTF8 and AL32UTF8 character sets in Oracle

Posted by decipherinfosys on January 28, 2007

Recently, one of our clients had a question on the differences between these two character sets since they were in the process of making their application global. In an upcoming whitepaper, we will discuss in detail what it takes (from an RDBMS perspective) to address localization and globalization issues. As far as these two character sets go in Oracle, the key difference is how they store characters beyond U+FFFF (supplementary characters): AL32UTF8 stores them as four bytes, exactly as the Unicode standard defines UTF-8. Oracle's legacy "UTF8" character set instead stores each such character as a sequence of two UTF-16 surrogate code units, each encoded separately in UTF-8, yielding six bytes per character (an encoding scheme commonly known as CESU-8). Beyond this storage difference, AL32UTF8 also provides better overall support for supplementary characters and is kept current with newer Unicode versions.
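To make the byte-level difference concrete, here is a small illustrative sketch in Python (not Oracle itself) that encodes a supplementary character both ways: once as standard UTF-8 (what AL32UTF8 stores) and once as two separately UTF-8-encoded UTF-16 surrogates (what Oracle's legacy "UTF8"/CESU-8 stores). The specific character chosen is just an example.

```python
import struct

# U+10400 (DESERET CAPITAL LETTER LONG I) -- any character beyond
# U+FFFF would show the same behavior.
ch = "\U00010400"

# Standard UTF-8, as stored by AL32UTF8: one 4-byte sequence.
utf8 = ch.encode("utf-8")
print(len(utf8))   # 4 bytes

# Oracle's legacy "UTF8" (CESU-8): split the character into its two
# UTF-16 surrogate code units, then encode each surrogate as its own
# 3-byte UTF-8 sequence -- 6 bytes total.
hi, lo = struct.unpack(">2H", ch.encode("utf-16-be"))
cesu8 = b"".join(chr(u).encode("utf-8", "surrogatepass") for u in (hi, lo))
print(len(cesu8))  # 6 bytes
```

This is why a column sized in bytes can overflow sooner under the legacy "UTF8" character set when supplementary characters are stored.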

 



 