Why do some objects contain raw HTML tags?
If the SOIF produced from an object (or the output displayed by the broker) contains raw HTML tags, the HTML in the page was not parsed correctly. By default, Harvest uses a strict, SGML-based parser, which fails on pages that contain invalid HTML. To see which pages are failing, and why, turn on error reporting by altering a line in the file $HARVEST_HOME/lib/gatherer/SGML.
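
To check a batch of summaries for this symptom, a small script can flag text that still contains tag-shaped strings. The following is a minimal sketch, not part of Harvest: the names TAG_RE, find_raw_tags and has_raw_tags are hypothetical, and it assumes you have already pulled the attribute values out of the SOIF as plain Python strings. The pattern is a heuristic and may miss unusual markup or flag legitimate text.

```python
import re

# Hypothetical helper, not part of Harvest. Matches strings shaped like an
# HTML tag: "<", optional "/", a tag name, optional attributes, then ">".
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9]*(?:\s[^<>]*)?/?>")


def find_raw_tags(text):
    """Return every tag-like substring found in text, in order."""
    return TAG_RE.findall(text)


def has_raw_tags(text):
    """Return True if text contains at least one tag-like substring."""
    return TAG_RE.search(text) is not None
```

Calling has_raw_tags on each extracted value and listing the objects for which it returns True gives a rough set of pages to examine against the parser's error output. Comparison expressions such as "a < b and c > d" are not flagged, because a tag name must follow the opening angle bracket directly.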