All Stuff and More: Parsing a Text on Android Phone

Monday, September 7, 2009

Parsing a Text on Android Phone

Introduction
I have wiki markup text that I need to parse. My first version, which I use in the DroidWiki application for Android, is a custom wiki parser. I wrote my own because I couldn't find parsing code on the internet that was light enough to run on an Android phone. That code does a regular expression match for each wiki tag. So, every line is matched at least [...] Recently, I decided to try my hand at fancier parsing: just parse the text into a sequence of tokens. Below are the results of my research on this topic.

Using One-Character-At-A-Time Parsing
One way to solve the problem is one-pass parsing: have the Java code look at each character exactly once and isolate the tokens that way. Using a character iterator goes like this:

import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

StringCharacterIterator iter = new StringCharacterIterator(markup);

for (char c = iter.first(); c != CharacterIterator.DONE; c = iter.next()) {
  // process the char: is it one of the characters that starts any of the syntax tokens?
}

This could be fast, but the code would be complicated. I decided on a different approach.
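For illustration only, here is a minimal sketch of what the body of such a one-pass loop might look like. It only splits the input into runs of regular text and single syntax characters. The class name OnePassScanner, the method scan and the SYNTAX_CHARS set are my own assumptions, not code from DroidWiki; the two characters are borrowed from HTML just to have something to match.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.ArrayList;
import java.util.List;

class OnePassScanner {
    // Hypothetical set of characters that count as syntax; replace with the real markup's characters.
    static final String SYNTAX_CHARS = "<>";

    // Visits every character once and returns text runs and single syntax characters, in order.
    static List<String> scan(String markup) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder text = new StringBuilder();
        CharacterIterator iter = new StringCharacterIterator(markup);
        for (char c = iter.first(); c != CharacterIterator.DONE; c = iter.next()) {
            if (SYNTAX_CHARS.indexOf(c) >= 0) {
                // A syntax character ends the current run of regular text, if there is one.
                if (text.length() > 0) {
                    tokens.add(text.toString());
                    text.setLength(0);
                }
                tokens.add(String.valueOf(c));
            } else {
                text.append(c);
            }
        }
        // Flush the trailing run of regular text.
        if (text.length() > 0) {
            tokens.add(text.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("a <b>bold</b> word"));
    }
}
```

Even this small version needs a buffer and two flush points, and recognizing multi-character wiki tags would add more state on top, which is roughly where the complexity comes from.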

Using Interpreter Design Pattern
The idea:

1. Parse the text and convert it to a sequence of basic tokens:
   - every continuous piece of regular text is a token
   - every sequence of syntax characters is a token (for example, when parsing HTML text, the less-than character would be a single token by itself)
2. Process the list of these basic tokens and aggregate them into complete syntactical elements (for example, every complete HTML tag would be a single token).
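The second step can be sketched roughly as follows, using the HTML example from the list. This is an assumption-laden illustration, not the DroidWiki code: the class TagAggregator and its methods are hypothetical names, the basic tokens are plain strings, and a "complete tag" is taken to be exactly a less-than token, one text token and a greater-than token in a row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TagAggregator {
    // Hypothetical helper: is this basic token one of the syntax characters?
    static boolean isSyntax(String token) {
        return token.equals("<") || token.equals(">");
    }

    // Merges "<", text, ">" triples into one token; every other token passes through unchanged.
    static List<String> aggregate(List<String> basic) {
        List<String> out = new ArrayList<String>();
        int i = 0;
        while (i < basic.size()) {
            String t = basic.get(i);
            boolean completeTag = t.equals("<")
                    && i + 2 < basic.size()
                    && !isSyntax(basic.get(i + 1))
                    && basic.get(i + 2).equals(">");
            if (completeTag) {
                out.add("<" + basic.get(i + 1) + ">");
                i += 3;
            } else {
                out.add(t); // regular text, or a stray syntax character
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> basic = Arrays.asList("a ", "<", "b", ">", "bold", "<", "/b", ">", " word");
        System.out.println(aggregate(basic));
    }
}
```

A stray less-than sign with no matching greater-than is simply left as its own token, so text like "1 < 2" survives the pass unchanged.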

If you want, create

- a SimpleToken class for item 1 above,
- a base Token class, specialized for the more significant/complex syntactical elements, for item 2 above.
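A minimal sketch of how these classes might look is below. Only the names SimpleToken and Token come from the text above; everything else is an assumption made for illustration: that SimpleToken extends Token, that tokens expose an interpret() method returning HTML, and the BoldToken specialization itself.

```java
import java.util.ArrayList;
import java.util.List;

// Base class: every token can interpret itself (in this sketch, render itself as HTML).
abstract class Token {
    abstract String interpret();
}

// Item 1: a basic token that just carries its text.
class SimpleToken extends Token {
    private final String text;

    SimpleToken(String text) {
        this.text = text;
    }

    String interpret() {
        return text;
    }
}

// Item 2: a hypothetical complete element, built from the tokens it encloses.
class BoldToken extends Token {
    private final List<Token> children;

    BoldToken(List<Token> children) {
        this.children = children;
    }

    String interpret() {
        StringBuilder sb = new StringBuilder("<b>");
        for (Token child : children) {
            sb.append(child.interpret());
        }
        return sb.append("</b>").toString();
    }
}

class TokenDemo {
    public static void main(String[] args) {
        List<Token> inner = new ArrayList<Token>();
        inner.add(new SimpleToken("bold"));

        List<Token> doc = new ArrayList<Token>();
        doc.add(new SimpleToken("Some "));
        doc.add(new BoldToken(inner));
        doc.add(new SimpleToken(" text"));

        StringBuilder out = new StringBuilder();
        for (Token t : doc) {
            out.append(t.interpret());
        }
        System.out.println(out); // prints: Some <b>bold</b> text
    }
}
```

With this shape, rendering a page is just a walk over the token list calling interpret() on each element, which is the part that resembles the Interpreter pattern.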
