Can DB XML parse my unusually encoded XML document?
DB XML uses the Xerces-C library for much of its XML parsing. Out of the box, Xerces-C can parse XML documents in a number of well-known encodings, including (but not limited to) UTF-8, UTF-16 and ISO-8859-1. If your documents use an encoding that is not supported (Big-5, for instance), there is still a solution: compile the Xerces-C library with ICU support, which allows DB XML to transcode and parse over 500 different character encodings. One way to do this is to pass the following option to the buildall.sh script that comes with DB XML:

    ./buildall.sh --with-xerces-conf="-t icu"

Using the ICU library also fixes a bug in versions of DB XML up to 2.2.13, where the fn:upper-case() and fn:lower-case() functions did not handle Unicode characters correctly.
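
To find out which of your documents might need an ICU-enabled build, it can help to look at the encoding each one declares. The following is a minimal sketch in Python, not part of DB XML or Xerces-C. The function names and the BUILTIN_ENCODINGS set are hypothetical, and the set holds only the three encodings named above, so the real list of built-in encodings is longer and depends on your Xerces-C version.

```python
import re

# Assumption: only the encodings named in the text above. Xerces-C supports
# more than these without ICU, so treat a "True" result as "check further".
BUILTIN_ENCODINGS = {"utf-8", "utf-16", "iso-8859-1"}

# Matches the encoding pseudo-attribute of an XML declaration at the start
# of ASCII-compatible input.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(head: bytes) -> str:
    """Return the lowercased declared encoding, or 'utf-8' if none is found.

    UTF-16 input is not decoded here, so it falls through to the default.
    """
    # Skip a UTF-8 byte order mark if one is present.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    match = _DECL.match(head)
    return match.group(1).decode("ascii").lower() if match else "utf-8"


def may_need_icu(head: bytes) -> bool:
    """True if the declared encoding is outside the assumed built-in set."""
    return declared_encoding(head) not in BUILTIN_ENCODINGS
```

For example, passing the first few hundred bytes of a file whose declaration reads encoding="Big5" to may_need_icu() returns True, while a file with no encoding declaration is treated as UTF-8 and returns False.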