bluedot.net

May 25, 2004

Non-Extractive Parsing for XML

Filed under:Development, Markup — sps @ 3:28 pm

Text processing is one of the most common tasks in application development. Whether the application is a Java Servlet or a VoIP client, converting a raw text-based input message into a machine-readable internal representation almost always requires parsing (or tokenization). In its current form, parsing means extracting tokens (the smallest units of relevant textual information) and storing them in null-terminated memory blocks, also known as strings. Over the years, people have invented various techniques and tools, such as regular expressions and Lex, to reduce the complexity of manual parsing. These techniques and tools have proven both useful and stable, and they have stood the test of time. As a result, the current framework of text processing is generally considered fairly well established.
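To make the idea of extracting tokens into separate strings concrete, here is a minimal sketch in Python. It is not taken from the linked article. The sample message, the whitespace-based token pattern, and the function names extract_tokens and locate_tokens are all hypothetical. The second function assumes that "non-extractive" in the title means recording where each token sits in the original message instead of copying it out, which this excerpt does not spell out.

```python
import re

# Hypothetical token pattern for illustration: runs of non-whitespace characters.
TOKEN = re.compile(r"\S+")


def extract_tokens(message):
    """Extractive tokenization: every token is copied into its own new string."""
    return [match.group(0) for match in TOKEN.finditer(message)]


def locate_tokens(message):
    """Assumed non-extractive variant: keep only (offset, length) pairs.

    The original message stays the single copy of the text; a token is
    read back on demand by slicing the message.
    """
    return [(match.start(), match.end() - match.start())
            for match in TOKEN.finditer(message)]


# Hypothetical input message.
message = "GET /index.html HTTP/1.1"

tokens = extract_tokens(message)   # list of separate string objects
spans = locate_tokens(message)     # list of (offset, length) integer pairs

for offset, length in spans:
    print(offset, length, message[offset:offset + length])
```

In the first function each token becomes a separate string object, which is the Python analogue of the null-terminated memory blocks described above. In the second, the only per-token data is a pair of integers, and the token text can still be recovered from the original message when it is needed.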

[read the article]

