Here are some interesting stats from Google’s “Reader” blog:
| % of errors | Error description |
|---|---|
| 15.6% | Input claims to be UTF-8 but contains invalid characters |
| 14.9% | Opening and closing tags do not match |
| 13.9% | An undefined entity is used (e.g. ``&nbsp;`` in an XML document without importing the HTML set) |
| 7.8% | Document expected to begin with a start tag, but no ``<`` was found |
| 5.7% | Disallowed control characters present |
| 5.5% | Extra content at the end of the document |
| 4.2% | Unterminated entity reference (missing semi-colon) |
| 4.2% | Unquoted attribute value |
| 3.8% | Premature end of data in tag (truncated feed) |
| 3.3% | Naked ampersand (should be represented as ``&amp;``) |
| 2.1% | XML declaration allowed only at the start of the document |
| 1.8% | Namespace prefix is used but not defined |
| 0.75% | Comment not terminated |
| 0.64% | Attribute without value |
| 0.17% | Unescaped ``<`` not allowed in attribute values |
| 0.11% | Malformed numerical entity reference |
| 0.11% | Unsupported/invalid encoding |
| 0.10% | Comment must not contain '--' |
| 0.10% | Attribute defined more than once |
| 0.07% | Char out of allowed range |
| 0.03% | Comment not terminated |
| 0.02% | Sequence ``]]>`` not allowed in content |
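
To get a feel for a few of these error classes, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. This is an assumption made for illustration only: it is not the parser that produced the statistics above, and a different parser will word its messages differently. The `SAMPLES` snippets and the `check` helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets, each triggering one error class from the table.
SAMPLES = {
    "mismatched tags": "<feed><title>Hi</feed>",
    "undefined entity": "<feed>a&nbsp;b</feed>",
    "naked ampersand": "<feed>Tom & Jerry</feed>",
    "unquoted attribute": "<feed><link href=x/></feed>",
    "extra content": "<feed/><feed/>",
    "well-formed": "<feed>Tom &amp; Jerry</feed>",
}


def check(xml_text):
    """Return None if xml_text is well-formed, else the parser's message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)


# Print what the parser reports for each sample.
for label, text in SAMPLES.items():
    print(f"{label}: {check(text) or 'ok'}")
```

Only the last sample parses cleanly. A strict XML parser rejects the whole document on the first such error, which is why even a single naked ampersand is enough to break a feed.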