GitHub - pvdz/ZeParser: My JavaScript parser

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 59 Commits
.gitignore		.gitignore
.npmignore		.npmignore
LICENSE		LICENSE
README		README
Tokenizer.js		Tokenizer.js
ZeParser.js		ZeParser.js
benchmark.html		benchmark.html
interactive.html		interactive.html
package.json		package.json
test-parser.html		test-parser.html
test-tokenizer.html		test-tokenizer.html
tests.js		tests.js
unicodecategories.js		unicodecategories.js

Repository files navigation

This is a JavaScript parser.
http://github.com/qfox/ZeParser
(c) Peter van der Zee
http://qfox.nl


Benchmark
http://qfox.github.com/ZeParser/benchmark.html

The Tokenizer is used by the parser. The parser tells the tokenizer whether the next token may be a regular expression or not. Without the parser, the tokenizer will fail if regular expression literals are used in the input.

Usage:
ZeParser.parse(input);

Returns a "parse tree" which is a tree of an array of arrays with tokens (regular objects) as leafs. Meta information embedded as properties (of the arrays and the tokens).

ZeParser.createParser(input);

Returns a new ZeParser instance which has already parsed the input. Amongst others, the ZeParser instance will have the properties .tree, .wtree and .btree.

.tree is the parse tree mentioned above.
.wtree ("white" tree) is a regular array with all the tokens encountered (including whitespace, line terminators and comments)
.btree ("black" tree) is just like .wtree but without the whitespace, line terminators and comments. This is what the specification would call the "token stream".

I'm aware that the naming convention is a bit awkward. It's a tradeoff between short and descriptive. The streams are used quite often in the analysis.

Tokens are regular objects with several properties. Amongst them are .tokposw and .tokposw, they correspond with their own position in the .wtree and .btree.

The parser has two modes for parsing: simple and extended. Simple mode is mainly for just parsing and returning the streams and a simple parse tree. There's not so much meta information here and this mode is mainly built for speed. The other mode has everything required for Zeon to do its job. This mode is toggled by the instance property .ast, which is true by default :)


Non-factory example:

var input = "foo";
var tree = []; // this should probably be refactored away some day
var tokenizer = new Tokenizer(input); // dito
var parser = new ZeParser(input, tokenizer, tree);
parser.parse(); // returns tree..., should never throw errors
parser.tokenizer.fixValues(); // makes sure all tokens have a .value property


Highlighting example:

var parser = ZeParser.createParser(textarea.value); // textarea.value:input
parser.tokenizer.fixValues(); // makes sure all tokens have a .value property
var wtree = parser.tokenizer.wtree; // all the tokens ("token stream", including whitespace)
textarea.className = '';
var tokenstrings = wtree.map(function(t){
	if (t.name == 14) textarea.className = 'error';
	return '<span class="t'+t.name+'">'+(t.name==13?'\u29e6':(t.name==14?'\u292c':t.value)).replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;')+'</span>';
});
// the string that would contain highlighted code
// tokenstrings.join('');