How can I get htdig not to index JavaScript code or CSS?
The HTML parser in htdig recognizes and parses only HTML, which is all there should be within an HTML file. If your HTML files contain in-line JavaScript code or Cascading Style Sheets (CSS), that code is clearly not HTML and should be enclosed within an HTML comment, so that it is hidden from the HTML parser and from any web client that is not JavaScript-aware or CSS-aware. See Behind the Scenes with JavaScript for a description of the technique, which applies equally well to in-line style sheets. If fixing up all the non-compliant JavaScript or CSS code in your HTML files is not an option, see question 4.15 for an alternate technique.

Unlike previous versions, the HTML parser in htdig 3.1.6 tries to skip over bare in-line JavaScript code in HTML, but a small bug in the parser causes it to be thrown off by a "<" sign in the JavaScript, and it may then miss the closing tag. This can be fixed by applying this patch.
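As a rough illustration of the comment-hiding technique described above, the fragment below wraps an in-line script and an in-line style sheet in HTML comments. It is only a sketch: the function name showTotal, the variable names, and the style rule are made up for the example and are not taken from the htdig documentation.

```html
<script type="text/javascript">
<!-- begin hiding script contents from parsers that are not JavaScript-aware
function showTotal(count, limit) {   // hypothetical function, for illustration only
  if (count < limit) {               // the "<" sign sits inside the comment
    document.write("Total: " + count);
  }
}
// end hiding -->
</script>

<style type="text/css">
<!--
body { color: black; background: white; }   /* hypothetical rule */
-->
</style>
```

The closing line of the script comment starts with "//" so that a JavaScript-aware client treats the "-->" as a JavaScript comment rather than as code. It is also safest to avoid a double hyphen inside the hidden code, since some parsers may take it as the end of the comment.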