Difference between UTF8 and AL32UTF8 character sets in Oracle
Posted by decipherinfosys on January 28, 2007
Recently, one of our clients asked about the differences between these two character sets, since they were in the process of making their application global. In an upcoming whitepaper, we will discuss in detail what it takes (from an RDBMS perspective) to address localization and globalization issues. As far as these two character sets go in Oracle, the main difference is how they store characters beyond U+FFFF (supplementary characters). AL32UTF8 stores each such character as four bytes, exactly as Unicode defines UTF-8. Oracle’s “UTF8” stores it as a pair of UTF-16 surrogate code units, each encoded separately as three bytes of UTF-8, for six bytes per character. Beyond this storage difference, AL32UTF8 also provides better support for supplementary characters.
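To make the byte counts concrete, here is a minimal Python sketch of the two storage schemes described above. It does not talk to Oracle at all; it only reproduces the encodings in plain Python. The function names `encode_al32utf8_style` and `encode_oracle_utf8_style` are made up for this illustration, and the second one is our assumption of how the "two surrogates, each encoded as UTF-8" scheme plays out byte by byte.

```python
import struct


def encode_al32utf8_style(text: str) -> bytes:
    """Standard UTF-8: a supplementary character takes four bytes."""
    return text.encode("utf-8")


def encode_oracle_utf8_style(text: str) -> bytes:
    """Illustrative only: encode each UTF-16 code unit separately as UTF-8.

    A supplementary character becomes two surrogates in UTF-16, and each
    surrogate then takes three bytes, for six bytes in total.
    """
    utf16 = text.encode("utf-16-be")
    # Split the UTF-16 byte string into 16-bit code units.
    units = struct.unpack(f">{len(utf16) // 2}H", utf16)
    # "surrogatepass" lets Python emit the three-byte form for a lone surrogate,
    # which strict UTF-8 would normally reject.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


if __name__ == "__main__":
    samples = {
        "U+00E9 (BMP)": "\u00e9",
        "U+20AC (BMP)": "\u20ac",
        "U+1D11E (supplementary)": "\U0001D11E",
    }
    for label, ch in samples.items():
        a = encode_al32utf8_style(ch)
        b = encode_oracle_utf8_style(ch)
        print(f"{label}: AL32UTF8-style {len(a)} bytes, UTF8-style {len(b)} bytes")
```

Characters at or below U+FFFF come out identical under both schemes; only the supplementary character differs, at four bytes versus six.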
One Response to “Difference between UTF8 and AL32UTF8 character sets in Oracle”


When PHP and Oracle assume the worst about each other - Maggie Nelson said
[...] as a sequence of two UTF-16 surrogate characters encoded using UTF-8 (or six bytes per character). (More on UTF8 vs. AL32UTF8 in Oracle.) Needless to say, even when you *know* you set up your database correctly for supporting UTF8, the [...]