Can it convert documents automatically from Word/ODT to XML, and if so, what sort of heuristics are used?
Yes. Automatic, hands-free conversion is what Lemon8-XML is designed for. The approach is loosely based on looking for visual “markers” within a document: for example, a section title that is bold and larger than the surrounding text, a list of references at the end of the document, or a caption immediately before or after an embedded figure. Although the parsers are far from perfect, they have been developed against dozens of scholarly articles and continue to be improved.
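
To give a feel for what a marker-based heuristic can look like, here is a minimal sketch in Python. It is not Lemon8-XML's actual code: the `Block` class, the `label_blocks` function, the regular expressions and the "larger than the median font size" rule are all assumptions made up for illustration. It takes a document already reduced to a flat list of blocks with basic formatting information and labels each one using the three markers described above.

```python
import re
from dataclasses import dataclass
from statistics import median


@dataclass
class Block:
    """Hypothetical flattened document element (not a Lemon8-XML type)."""
    text: str
    font_size: float = 12.0
    bold: bool = False
    is_figure: bool = False


# Assumed patterns; a real parser would need many more variants.
CAPTION_RE = re.compile(r"^(fig(ure)?\.?|table)\s*\d+", re.IGNORECASE)
REFS_RE = re.compile(r"^(references|bibliography|works cited)\s*$", re.IGNORECASE)


def label_blocks(blocks):
    """Label each block as heading, figure, caption, reference or paragraph."""
    # Treat the median font size of the text blocks as the body text size.
    body_size = median(b.font_size for b in blocks if not b.is_figure)
    labels = []
    in_refs = False
    for i, block in enumerate(blocks):
        if block.is_figure:
            labels.append("figure")
            continue
        # Marker 1: bold and larger than the surrounding text -> section title.
        if block.bold and block.font_size > body_size:
            # Marker 2: a "References" title opens the reference list.
            in_refs = bool(REFS_RE.match(block.text.strip()))
            labels.append("heading")
            continue
        # Marker 3: caption-like text immediately before or after a figure.
        neighbours = blocks[max(i - 1, 0):i] + blocks[i + 1:i + 2]
        next_to_figure = any(n.is_figure for n in neighbours)
        if next_to_figure and CAPTION_RE.match(block.text.strip()):
            labels.append("caption")
        elif in_refs:
            labels.append("reference")
        else:
            labels.append("paragraph")
    return labels


if __name__ == "__main__":
    sample = [
        Block("Introduction", font_size=16, bold=True),
        Block("Some body text."),
        Block("", is_figure=True),
        Block("Figure 1: A sample chart."),
        Block("References", font_size=16, bold=True),
        Block("Smith, J. (2005). A title."),
    ]
    for block, label in zip(sample, label_blocks(sample)):
        print(f"{label:10} {block.text}")
```

Even this toy version shows why such parsers are imperfect: a bold, oversized pull quote would be labelled as a heading, and a caption that does not start with "Figure" or "Table" would be missed.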