Stats on XML Errors in Feeds from Google
Here are some interesting stats from Google’s Reader blog on the XML errors found in feeds:
% of errors   Error description
15.6%         Input claims to be UTF-8 but contains invalid characters
14.9%         Opening and ending tags mismatch
13.9%         An undefined entity is used (e.g. &nbsp; in an XML document without importing the HTML set)
7.8%          Document expected to begin with a start tag, but no < was found
5.7%          Disallowed control characters present
5.5%          Extra content at the end of the document
4.2%          Unterminated entity reference (missing semi-colon)
4.2%          Unquoted attribute value
3.8%          Premature end of data in tag (truncated feed)
3.3%          Naked ampersand (should be represented as &amp;)
2.1%          XML declaration allowed only at the start of the document
1.8%          Namespace prefix is used but not defined
0.75%         Comment not terminated
0.64%         Attribute without value
0.17%         Unescaped < not allowed in attribute values
0.11%         Malformed numerical entity reference
0.11%         Unsupported/invalid encoding
0.10%         Comment must not contain '--'
0.10%         Attribute defined more than once
0.07%         Char out of allowed range
0.03%         Comment not terminated
0.02%         Sequence ]]> not allowed in content
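To make a few of these error classes concrete, here is a minimal Python sketch that feeds deliberately broken snippets to the standard library's strict XML parser and prints what it reports. This is only an illustration: the sample strings and the check() helper are made up for this post, and the parser Google used, along with its exact messages, may well differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets, each meant to trigger one error class from the list.
SAMPLES = {
    "mismatched tags": "<a><b></a></b>",
    "undefined entity": "<a>&nbsp;</a>",
    "naked ampersand": "<a>Q & A</a>",
    "unquoted attribute value": "<a href=x></a>",
    "truncated feed": "<a><b>",
    "extra content at end": "<a></a><b></b>",
}


def check(xml_text):
    """Return None if xml_text is well-formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as exc:
        return str(exc)


for name, text in SAMPLES.items():
    # Every sample is malformed, so each line shows a parser error message.
    print(f"{name}: {check(text)}")
```

A strict parser stops at the first problem it hits, so a feed with several of these errors will only surface one of them per parse attempt.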
