How are documents from HTML4 and earlier versions parsed?
All documents with a text/html media type (including those with no DOCTYPE and those with an HTML 2.0, HTML 3.2, HTML4, or XHTML1 DOCTYPE) will be parsed using the same parser algorithm, as defined by the HTML spec. This matches what Web browsers have done for HTML documents so far and keeps code complexity down, which in turn is good for security and maintainability and generally keeps the number of bugs down. The HTML syntax as now defined therefore does not require a new parser; documents with an HTML4 DOCTYPE, for example, will be parsed as described by the new HTML specification. Validators are allowed to have different code paths for previous levels of HTML.
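As a rough illustration of the rule above, the choice of parser can be thought of as depending on the media type alone, never on the DOCTYPE. The sketch below is not taken from the HTML spec or from any browser; the function name `choose_parser` and the return labels `"html"` and `"other"` are hypothetical names made up for this example.

```python
from typing import Optional


def choose_parser(content_type: str, doctype: Optional[str] = None) -> str:
    """Pick a parser label from the media type only (hypothetical helper).

    The doctype argument is accepted but deliberately ignored, mirroring the
    rule that every text/html document goes through the same parser.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    # Anything else is outside the scope of this answer.
    return "other"


# The same parser is selected with an HTML4 DOCTYPE, an HTML 3.2 DOCTYPE,
# or no DOCTYPE at all.
html4 = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'
html32 = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">'
for doctype in (html4, html32, None):
    print(choose_parser("text/html; charset=utf-8", doctype))
```

A validator, by contrast, would be free to inspect `doctype` and branch on it, since validators are allowed different code paths for previous levels of HTML.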