
Significantly decrease parser file size by compacting parser table #234

Merged
merged 4 commits into zaach:master on Aug 17, 2014

Conversation

RubenVerborgh
Contributor

Summary

This pull request nearly halves the gzipped size of generated parsers.

Problem

The largest part of a Jison-generated parser is its table: a large array containing objects with numeric keys and (arrays of) numeric values.

Two kinds of patterns occur frequently in such a table:

  1. repeated long numerical arrays

    example: tables = [{ 5: [1,3,4,6,7,8,15], 6: 7 }, { 20: [1,3,4,6,7,8,15], 9: 8 }]
  2. objects where all keys have the same value

    example: tables = [{ 5: [200,204], 17: [200,204], 20: [200,204], 21: [200,204] }]

Solution

I tackled the first case by storing frequently occurring arrays in temporary variables:

var a = [1,3,4,6,7,8,15],
    tables = [{ 5: a, 6: 7 }, { 20: a, 9: 8 }];

I tackled the second case by creating such objects with an auxiliary function o:

var tables = [o([200,204], [5, 17, 20, 21])];

That also introduces new long numeric arrays (the key lists), which can in turn be optimized by the first technique.
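
For illustration, a minimal sketch of what such a helper could look like, following the o(value, keys) shape of the example above (the actual helper emitted by the pull request may differ in detail):

// Sketch of an auxiliary function o(): every listed key maps to the same
// shared value, so that value is written out (and allocated) only once.
function o(value, keys) {
  var object = {};
  for (var i = 0; i < keys.length; i++) {
    object[keys[i]] = value;
  }
  return object;
}

// o([200,204], [5, 17, 20, 21]) then expands to
// { 5: [200,204], 17: [200,204], 20: [200,204], 21: [200,204] },
// with all four keys pointing to the same array instance.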

Not only does this significantly decrease the file size of the parser, it also reduces memory usage, since it avoids keeping multiple copies of the same array in memory.

To support such chunks of reusable code, the generateModule_ function has been updated to return an object with commonCode and moduleCode (instead of only moduleCode).
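
Roughly, the new return shape can be consumed like this (the surrounding caller code is illustrative only, not the actual Jison source; presumably commonCode holds the shared array variables and the o() helper, while moduleCode holds the parser definition itself):

// Illustrative only: a caller concatenates the shared code and the module code
// when assembling the generated parser file.
var generated = generateModule_();   // { commonCode: "...", moduleCode: "..." }
var source = generated.commonCode + "\n" + "var parser = " + generated.moduleCode + ";";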

Results

A parser I am working on benefited significantly from the new table generation function:

  • before: 173 KB (generated), 155 KB (minified), 36 KB (gzipped)
  • optimization 1: 138 KB (generated), 112 KB (minified), 36 KB (gzipped)
  • optimizations 1 and 2: 91 KB (generated), 71 KB (minified), 19 KB (gzipped)

The decrease from 36 KB to 19 KB gzipped is a 47% reduction.

@RubenVerborgh
Copy link
Contributor Author

There are two immediate cases left for further optimization:

  1. recognition of sublists
    e.g., [1,11,12,13,14,15,100] and [2,11,12,13,14,15,200] share the majority of elements
  2. objects with almost all identical values
    e.g., {1: X, 2: X, 3: X, 4: X, 5: Y} is almost a candidate for optimization 2 (one possible shape is sketched after this list)
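
Purely as an illustration of the second case (not necessarily how it would be implemented), the o() helper sketched above could be extended to accept a small set of exceptions:

// Hypothetical extension of o(): most keys share a default value, and a few
// explicit exceptions are merged in afterwards.
function o2(value, keys, exceptions) {
  var object = o(value, keys);        // e.g. { 1: X, 2: X, 3: X, 4: X }
  for (var key in exceptions) {
    object[key] = exceptions[key];    // e.g. { 5: Y }
  }
  return object;
}

// {1: X, 2: X, 3: X, 4: X, 5: Y} could then be written as o2(X, [1, 2, 3, 4], { 5: Y }).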

    } while (id !== 0);
    return name;
}
var nextVariableId = 0;
Owner

Should generateTableCode reset this to 0? If you were creating multiple parsers, the second one would start at an arbitrary position, if that matters.

Contributor Author

Indeed, new parsers could reset this to 0, but it is not necessary. If they do, variable names can be shorter (they only need to be unique within a single parser, not across all parsers). However, minification will reassign variable names anyway, making them as short as possible.

Summarizing: you could make createVariable a member function with nextVariableId as a member variable, so names will be shorter in unminified versions.

Contributor Author

Or maybe the best and easiest option: generateModule_ can set nextVariableId to 0.
(That way, other methods can also create and use new variables.)
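
A minimal sketch of that idea (illustrative only, not the exact code):

// Hypothetical: reset the counter at the start of module generation, so every
// generated parser numbers its shared-table variables from zero again.
function generateModule_() {
  nextVariableId = 0;  // names only need to be unique within one generated parser
  // ... build commonCode and moduleCode as before ...
}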

Owner

I like that 👍. In general it's nice to keep functions as "pure" as possible.

Contributor Author

True. Maybe the cleanest option would have been to make it a member function, but that would add unnecessary complexity. The important thing is that there are no side effects.

@zaach
Owner

zaach commented Aug 17, 2014

Two thumbs up 👍 👍

zaach added a commit that referenced this pull request on Aug 17, 2014
Significantly decrease parser file size by compacting parser table
@zaach zaach merged commit 8543cc4 into zaach:master Aug 17, 2014
@RubenVerborgh
Contributor Author

Yihaa, thanks for merging! Any chance you could publish a new version to npm?

I plan to have a look next week at the other optimizations suggested in my comment above. They probably won't be as spectacular, but we might still shave off a few kilobytes.

@RubenVerborgh
Contributor Author

Pull request #235 implements the suggestion “objects with almost all identical values”.

I also tried “recognition of sublists”, but, as expected, this doesn't bring the gzipped size down (gzip already compresses such repeated sequences well); it also doesn't significantly change the minified version. Therefore, I haven't included it.

@ericprud

Hi, this looks really cool and I'm trying to understand the impact on the development process. The build script for your SPARQL parser calls jison directly:

./node_modules/jison/lib/cli.js ./lib/sparql.jison -p slr -o ./lib/SparqlParser.js

Should I see a call to a minimizer which would replace identical sequences in the generated table?

@RubenVerborgh
Contributor Author

There is no explicit call to a minimizer; the table compaction is part of Jison's own code generation.
Concretely, generateModule_ uses the compacted table.
