What are the optimizations that VTD-XML has implemented?
Every object that is created must eventually be garbage-collected, so there is a round-trip penalty. Every time one takes the document apart for a small change, one has to put everything back together, so there is another round-trip penalty. Every time one decodes the entire document (e.g. from UTF-8 to UCS-2) for a small change, one has to encode it again when writing it out to disk, so there is yet another round-trip penalty. With all these overheads added together, XML processing performance is unlikely to be good. VTD-XML is designed from the ground up to overcome these overheads. The first thing VTD-XML does is keep the document intact in memory and undecoded. Tokenization is done by recording only the starting offset and length of each token. Next, VTD-XML represents tokens as 64-bit integers (VTD records). Because VTD records are constant in length, they can be stored in large memory blocks rather than as individually allocated objects, which results in a very significant memory saving. Finally, VTD-XML’s intern