Stats on XML Errors in Feeds from Google

22 Mar 2006

Here are some interesting stats from Google’s “Reader” blog:

% of errors  Error description
15.6%        Input claims to be UTF-8 but contains invalid characters.
14.9%        Opening and ending tags mismatch
13.9%        An undefined entity is used (e.g. ``&nbsp;`` in an XML document without importing the HTML set)
7.8%         Document expected to begin with a start tag, but no ``<`` was found
5.7%         Disallowed control characters present
5.5%         Extra content at the end of the document
4.2%         Unterminated entity reference (missing semi-colon)
4.2%         Unquoted attribute value
3.8%         Premature end of data in tag (truncated feed)
3.3%         Naked ampersand (should be represented as ``&amp;``)
2.1%         XML declaration allowed only at the start of the document
1.8%         Namespace prefix is used but not defined
0.75%        Comment not terminated
0.64%        Attribute without value
0.17%        Unescaped ``<`` not allowed in attribute values
0.11%        Malformed numerical entity reference
0.11%        Unsupported/invalid encoding
0.10%        Comment must not contain '--'
0.10%        Attribute defined more than once
0.07%        Char out of allowed range
0.03%        Comment not terminated
0.02%        Sequence ``]]>`` not allowed in content
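
To get a feel for how a few of these errors surface in practice, here is a minimal sketch using Python's standard-library, expat-backed parser. The sample feeds and the ``check_feed`` helper are made up for illustration and are not from Google's post. The messages printed are expat's own wording, which does not map one-to-one onto the categories in the table.

```python
import xml.etree.ElementTree as ET
from xml.parsers.expat import ErrorString


def check_feed(data: bytes):
    """Return None if data is well-formed XML, else expat's error message.

    Hypothetical helper for illustration only.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # err.code is the numeric expat error code; ErrorString maps it to text.
        return ErrorString(err.code)
    return None


# Tiny hand-made samples, one per error category from the table above.
samples = {
    "mismatched tags": b"<feed><title>x</feed>",
    "undefined entity": b"<feed>a&nbsp;b</feed>",
    "naked ampersand": b"<feed>a & b</feed>",
    "unquoted attribute": b"<feed id=1/>",
    "invalid UTF-8": b'<?xml version="1.0" encoding="UTF-8"?><feed>\xff</feed>',
    "well-formed": b"<feed><title>x</title></feed>",
}

for label, data in samples.items():
    print(f"{label}: {check_feed(data) or 'ok'}")
```

A strict parser like this one rejects every malformed sample outright, so a feed reader that wants to cope with such feeds needs some kind of recovery step on top.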