bluedot.net

March 22, 2006

Stats on XML Errors in Feeds from Google

Filed under: Markup, Google — sps @ 1:35 pm

Here are some interesting stats from the Google Reader blog:

% of errors   Error description
15.6%         Input claims to be UTF-8 but contains invalid characters
14.9%         Opening and ending tags mismatch
13.9%         An undefined entity is used (e.g. &nbsp; in an XML document without importing the HTML set)
7.8%          Document expected to begin with a start tag, but no < was found
5.7%          Disallowed control characters present
5.5%          Extra content at the end of the document
4.2%          Unterminated entity reference (missing semicolon)
4.2%          Unquoted attribute value
3.8%          Premature end of data in tag (truncated feed)
3.3%          Naked ampersand (should be represented as &amp;)
2.1%          XML declaration allowed only at the start of the document
1.8%          Namespace prefix is used but not defined
0.75%         Comment not terminated
0.64%         Attribute without value
0.17%         Unescaped < not allowed in attribute values
0.11%         Malformed numerical entity reference
0.11%         Unsupported/invalid encoding
0.10%         Comment must not contain '--'
0.10%         Attribute defined more than once
0.07%         Char out of allowed range
0.03%         Comment not terminated
0.02%         Sequence ]]> not allowed in content
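
To get a feel for a few of these, here is a minimal sketch that feeds some deliberately broken snippets to Python's standard XML parser and prints what it complains about. The sample snippets and the helper names (`well_formedness_error`, `is_valid_utf8`) are my own illustrations, not anything from Google's post, and the exact error wording depends on the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical broken snippets, one per error class from the table.
SAMPLES = {
    "mismatched tags": "<feed><title>hi</feed>",
    "undefined entity": "<feed>a&nbsp;b</feed>",
    "naked ampersand": "<feed>AT&T</feed>",
    "unquoted attribute": "<feed><link href=http://example.com/></feed>",
    "truncated feed": "<feed><entry>",
    "extra content": "<feed/><feed/>",
}


def well_formedness_error(text):
    """Return the parser's error message, or None if text is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


def is_valid_utf8(data):
    """Check whether raw bytes really decode as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for label, text in SAMPLES.items():
    print(f"{label}: {well_formedness_error(text)}")

# A Latin-1 encoded "café" is not valid UTF-8, whatever the feed claims.
print("latin-1 bytes valid as UTF-8?", is_valid_utf8(b"caf\xe9"))
```

A feed reader that wants to survive the real world would presumably need some recovery strategy on top of this; the sketch only detects the problems.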
end

January 30, 2006

Google Bookmarks bookmarklet

Filed under: Markup, Technology, Google — sps @ 6:11 pm

According to Digg, Google has released a del.icio.us-esque bookmarking service:

Access your bookmarks from any computer. Supports labels, stars, and notes.

I love del.icio.us and don't plan on switching anytime soon, but for the heck of it I created a bookmarklet that adds the current page to your Google Bookmarks.

Drag this link to your toolbar (works in Safari; I don't know about other browsers):

Add To Google Bookmarks

end

December 15, 2005

Designers: How Much To Charge

Filed under: Misc, Markup, Technology — sps @ 4:30 pm

Found this via NSLog();

Designers: How much to charge

end

June 27, 2004

Dynamic Text Replacement

Filed under: Markup — sps @ 7:28 pm

Let your server do the walking! Whether you’re replacing one headline or a thousand, Stewart Rosenberger’s Dynamic Text Replacement automatically swaps XHTML text with an image of that text, consistently displayed in any font you own. The markup is clean, semantic, and accessible. No CSS hacks are required, and you needn’t open Photoshop or any other image editor. Read about it today; use it on personal and commercial web projects tomorrow.

[read the article]

end

June 17, 2004

The Atom Link Model

Filed under: Markup — sps @ 2:54 pm

Atom is an emerging XML vocabulary and protocol for syndication and editing. Atom has a coherent linking model to express a number of different types of links. Atom borrows heavily from the <link> element in HTML, although they are not identical. This article explores several of the most common link types that are already deployed in Atom feeds today.
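
As a rough illustration of the link model, here is a small sketch that groups an entry's links by their `rel` value. The sample entry, the URLs, and the `links_by_rel` helper are made up for this example, and I'm assuming the Atom 1.0 namespace and its rule that a missing `rel` means `alternate`; feeds from the earlier drafts the article covers use a different namespace and may differ in detail.

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0 namespace (earlier drafts used a different one).
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical entry with three kinds of links.
SAMPLE_ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example post</title>
  <link href="http://example.org/post/1"/>
  <link rel="edit" href="http://example.org/edit/1"/>
  <link rel="enclosure" type="audio/mpeg" href="http://example.org/post/1.mp3"/>
</entry>
"""


def links_by_rel(entry_xml):
    """Map each rel value to the list of hrefs that carry it."""
    entry = ET.fromstring(entry_xml)
    grouped = {}
    for link in entry.findall(ATOM_NS + "link"):
        # In Atom 1.0 a link without rel is treated as rel="alternate".
        rel = link.get("rel", "alternate")
        grouped.setdefault(rel, []).append(link.get("href"))
    return grouped


for rel, hrefs in links_by_rel(SAMPLE_ENTRY).items():
    print(rel, hrefs)
```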

[read the article]

end

June 14, 2004

Improve XML transport performance

Filed under: Markup — sps @ 12:45 pm

XML is a text markup format designed for clarity and ease of use, without concern for conciseness. Because of these design choices, text XML can be costly in terms of both document size and processing overhead. Part 1 of this two-part article shows you some of the issues involved in alternative non-text representations of XML, and covers a few of the approaches being developed for this purpose; Part 2 will add some actual performance measurements so you can get a feel for the level of improvements possible.
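
To see how much room there is, here is a quick sketch that builds a repetitive XML document and compares its raw size with a gzip-compressed copy. Plain gzip is only my stand-in for "a non-text representation"; it is not one of the binary XML approaches the article discusses, and the `readings` document is invented for the example.

```python
import gzip
import xml.etree.ElementTree as ET

# Build a hypothetical, highly repetitive document.
root = ET.Element("readings")
for i in range(1000):
    ET.SubElement(root, "reading", sensor="s1", value=str(i))

text = ET.tostring(root, encoding="utf-8")  # textual XML as bytes
packed = gzip.compress(text)                # stand-in compact form

print("text XML bytes:", len(text))
print("gzipped bytes: ", len(packed))
print("round-trips:   ", gzip.decompress(packed) == text)
```

The saving in size comes at the cost of extra processing on both ends, which is the trade-off the article is weighing.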

[read the article]

end

June 5, 2004

Thinking XML: Use the Atom format for syndicating news and more

Filed under: Markup — sps @ 10:29 pm

The Web has always included sites that present series of articles, events, and other postings which are meant to be shared and cross-referenced. With large parts of the Web becoming conversational communities, many in these communities have come together to work on an XML-based standard for such interchange and cross-reference. Atom is the product of this effort — a format and API for exchanging Web metadata.

[read the article]

end

May 25, 2004

Non-Extractive Parsing for XML

Filed under: Development, Markup — sps @ 3:28 pm

Text processing is one of the most common tasks in application development. Whether it is a Java Servlet or a VOIP application, the conversion from a raw text-based input message to a machine-readable internal representation almost always requires parsing (or tokenization), which, in its current form, refers to a process of extracting tokens (the smallest unit of relevant textual information) and storing them in null-terminated memory blocks, also known as strings. Over the years, people have invented various automation techniques and tools, e.g. regular expressions and Lex, to reduce the complexity of manual parsing. Proven both useful and stable, those techniques and tools have stood the test of time. As a result, the current framework of text processing is generally considered to be fairly well-established.
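
As I understand the idea, the contrast looks roughly like the sketch below: extractive tokenization copies each token into its own string, while a non-extractive tokenizer only records where each token sits in the original buffer. The function names and the whitespace-splitting rule are my own simplifications, not the article's code.

```python
import re


def extractive_tokens(text):
    """Classic approach: copy every token out into its own string."""
    return text.split()


def non_extractive_tokens(text):
    """Record (offset, length) pairs and leave the source text untouched."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"\S+", text)]


def token_text(text, token):
    """Materialize a token only when its content is actually needed."""
    offset, length = token
    return text[offset:offset + length]


message = "GET /index.html HTTP/1.1"
tokens = non_extractive_tokens(message)
print(tokens)
print([token_text(message, t) for t in tokens] == extractive_tokens(message))
```

Python strings are not null-terminated memory blocks, so this only shows the bookkeeping, not the memory savings the article is after.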

[read the article]

end

May 21, 2004

Normalizing Syndicated Feed Content

Filed under: Development, Markup — sps @ 10:39 pm

So you want to write a program to read RSS and Atom syndicated feeds. Sounds simple enough. After all, RSS stands for “Really Simple Syndication” (or “Rich Site Summary”, or “RDF Site Summary”, or something), and Atom is just RSS with different tag names, right? Well, not exactly.
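
Even the easy part, pulling entry titles out of both formats, already needs two code paths. Here is a bare-bones sketch; the `entry_titles` helper and the sample feeds are mine, I'm assuming RSS 2.0 and the Atom 1.0 namespace, and it ignores the content-escaping problems the article is really about.

```python
import xml.etree.ElementTree as ET

# Assumption: Atom 1.0 namespace; RSS 2.0 elements have no namespace.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def entry_titles(feed_xml):
    """Return the entry/item titles from an Atom 1.0 or RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    if root.tag == ATOM_NS + "feed":
        entries = root.findall(ATOM_NS + "entry")
        return [e.findtext(ATOM_NS + "title", default="") for e in entries]
    if root.tag == "rss":
        return [i.findtext("title", default="") for i in root.iter("item")]
    raise ValueError(f"unrecognized feed root element: {root.tag}")


# Hypothetical sample feeds.
RSS_SAMPLE = (
    '<rss version="2.0"><channel><title>Site</title>'
    "<item><title>One</title></item>"
    "<item><title>Two</title></item>"
    "</channel></rss>"
)
ATOM_SAMPLE = (
    '<feed xmlns="http://www.w3.org/2005/Atom"><title>Site</title>'
    "<entry><title>One</title></entry>"
    "<entry><title>Two</title></entry>"
    "</feed>"
)

print(entry_titles(RSS_SAMPLE))
print(entry_titles(ATOM_SAMPLE))
```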

[read the article]

end

May 20, 2004

XML 1.1 and Namespaces 1.1 Revealed

Filed under: Markup — sps @ 7:16 pm

This article attempts to clear up the mystery that seems to surround XML 1.1 and its companion specification, Namespaces 1.1. With this information in hand you are prepared to deal with XML 1.1 should you ever be asked to support it in your programs. XML 1.1 is not a revolution — it’s merely an evolution of XML 1.0 that does not require major changes. Most people will end up with XML 1.1 processors as they upgrade their parsers, just as all the Xerces users already did. Indeed, since version 2.3.0 was released over a year ago Xerces Java can parse XML 1.1 documents! And since the recent version 2.5.0, Xerces C++ can too. So, even though you may not know it, if you have already picked up one of these versions or a more recent one you can already process XML 1.1 documents. The nature of the changes brought by XML 1.1 and Namespaces 1.1 do not necessitate such a change in the Infoset specification. When the W3C released the other two recommendations, they also released a new edition of the XML Information Set Recommendation in which the impact of these specs is described, but basically it is limited in what content one can find in the Infoset. No structural change was made to the data model, and therefore you don’t need to define new information items or modify existing ones.

[read the article]

end