We have covered Unicode characters, data-types and application changes related to them before in our posts. Over the weekend, while consuming an XML from another application from the same client company, QA team saw bad data making it’s way to the database. Upon further research, it was found that the application that was sending the XML was forming that XML by integrating data from several different sources. It was then inserting those data elements into an existing XML and passing it along when it was done with it. And in so doing, it was not taking into account the unicode nature of the strings. Just like you would take care of it when inserting a record into a unicode data-type column, you would do the same thing with the XML as well. Here is an example:
CREATE TABLE UNICODE_TEST_TBL (COL1 NVARCHAR(100), COL2 VARCHAR(100));
In this table, we have a column called COL1 which is declared to be of Unicode data-type and another one – COL2 which has been declared as a non-Unicode data type column. Let us try to insert this string into it:
After inserting, this is how the data will look like:
As you can see, by using the prefix of N before the string, we were able to indicate properly to the database engine that the data that is being inserted is a unicode string and we used that for the unicode data type column and hence the data was inserted correctly where as it was corrupted for the non-unicode data type column. The same thing can be applied for XML as well.
As you can see from above, we first tried to insert the non-unicode string and got corrupted data and then we tried to insert the unicode data-type variable and it appears fine without any issues. Here is the final XML:
And the same thing applies, if we were replacing (updating) a value in that XML document – example: Suppose, we want to replace the value of “testing” with the same Unicode string:
Pretty simple fix. Once the change was made, all the data was perfectly fine. Doing so at design/development/QA time would have saved a weekend