
Significantly decrease parser file size by compacting parser table #234

Merged
merged 4 commits into zaach:master on Aug 17, 2014

Conversation

RubenVerborgh
Contributor

Summary

This pull request nearly halves the gzipped size of generated parsers.

Problem

The largest part of a Jison-generated parser is its table: a large array containing objects with numeric keys and (arrays of) numeric values.

Two kinds of patterns occur frequently in such a table:

  1. repeated long numerical arrays

    example: tables = [{ 5: [1,3,4,6,7,8,15], 6: 7 }, { 20: [1,3,4,6,7,8,15], 9: 8 }]
  2. objects where all keys have the same value

    example: tables = [{ 5: [200,204], 17: [200,204], 20: [200,204], 21: [200,204] }]

Solution

I tackled the first case by storing frequently occurring arrays in temporary variables:

var a = [1,3,4,6,7,8,15],
    tables = [{ 5: a, 6: 7 }, { 20: a, 9: 8 }];

I tackled the second case by creating such objects with an auxiliary function o:

var tables = [o([200,204], [5, 17, 20, 21])];

That also introduces new long numeric arrays (the key lists), which can in turn be optimized by the first technique.
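
For illustration, a minimal sketch of what such a helper could look like, following the o(value, keys) shape of the example above (the actual helper emitted by the pull request may differ in detail):

// Sketch of an auxiliary function o(): every listed key maps to the same
// shared value, so that value is written out (and allocated) only once.
function o(value, keys) {
  var object = {};
  for (var i = 0; i < keys.length; i++) {
    object[keys[i]] = value;
  }
  return object;
}

// o([200,204], [5, 17, 20, 21]) then expands to
// { 5: [200,204], 17: [200,204], 20: [200,204], 21: [200,204] },
// with all four keys pointing to the same array instance.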

Not only does this significantly decrease the file size of the parser, it also reduces memory usage, since it avoids keeping multiple copies of the same array in memory.

To support such chunks of reusable code, the generateModule_ function has been updated to return an object with commonCode and moduleCode (instead of only moduleCode).
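
Roughly, the new return shape can be consumed like this (the surrounding caller code is illustrative only, not the actual Jison source; presumably commonCode holds the shared array variables and the o() helper, while moduleCode holds the parser definition itself):

// Illustrative only: a caller concatenates the shared code and the module code
// when assembling the generated parser file.
var generated = generateModule_();   // { commonCode: "...", moduleCode: "..." }
var source = generated.commonCode + "\n" + "var parser = " + generated.moduleCode + ";";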

Results

A parser I am working on benefited significantly from the new table generation function:

  • before: 173 KB (generated), 155 KB (minified), 36 KB (gzipped)
  • optimization 1: 138 KB (generated), 112 KB (minified), 36 KB (gzipped)
  • optimizations 1 and 2: 91 KB (generated), 71 KB (minified), 19 KB (gzipped)

The decrease from 36 KB to 19 KB gzipped is a 47% reduction.

@RubenVerborgh
Copy link
Contributor Author

There are two immediate cases left for further optimization:

  1. recognition of sublists
    e.g., [1,11,12,13,14,15,100] and [2,11,12,13,14,15,200] share the majority of elements
  2. objects with almost all identical values
    e.g., {1: X, 2: X, 3: X, 4: X, 5: Y} is almost a candidate for optimization 2 (one possible shape is sketched after this list)
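
Purely as an illustration of the second case (not necessarily how it would be implemented), the o() helper sketched above could be extended to accept a small set of exceptions:

// Hypothetical extension of o(): most keys share a default value, and a few
// explicit exceptions are merged in afterwards.
function o2(value, keys, exceptions) {
  var object = o(value, keys);        // e.g. { 1: X, 2: X, 3: X, 4: X }
  for (var key in exceptions) {
    object[key] = exceptions[key];    // e.g. { 5: Y }
  }
  return object;
}

// {1: X, 2: X, 3: X, 4: X, 5: Y} could then be written as o2(X, [1, 2, 3, 4], { 5: Y }).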

    } while (id !== 0);
    return name;
}
var nextVariableId = 0;
Owner

Should generateTableCode reset this to 0? If you were creating multiple parsers, the second one would start at an arbitrary position, if that matters.

Contributor Author

Indeed, new parsers could reset this to 0, but it is not necessary. If they do, variable names can be shorter (they only need to be unique within a single parser, not across all parsers). However, minification will reassign variable names anyway, making them as short as possible.

Summarizing: you could make createVariable a member function with nextVariableId as a member variable, so names will be shorter in unminified versions.

Contributor Author

Or maybe the best and easiest option: generateModule_ can set nextVariableId to 0.
(That way, other methods can also create and use new variables.)
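
A minimal sketch of that idea (illustrative only, not the exact code):

// Hypothetical: reset the counter at the start of module generation, so every
// generated parser numbers its shared-table variables from zero again.
function generateModule_() {
  nextVariableId = 0;  // names only need to be unique within one generated parser
  // ... build commonCode and moduleCode as before ...
}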

Owner

I like that 👍. In general it's nice to keep functions as "pure" as possible.

Contributor Author

True. Maybe the cleanest option would have been to make it a member function, but that would add unnecessary complexity. The important thing is that there are no side effects.

@zaach
Owner

zaach commented Aug 17, 2014

Two thumbs up 👍 👍

zaach added a commit that referenced this pull request on Aug 17, 2014
Significantly decrease parser file size by compacting parser table
@zaach zaach merged commit 8543cc4 into zaach:master Aug 17, 2014
@RubenVerborgh
Contributor Author

Yihaa, thanks for merging! Any chance you could publish a new version to npm?

I plan to have a look next week at the other optimizations suggested in my comment above. They probably won't be as spectacular, but we might still shave off a few kilobytes.

@RubenVerborgh
Contributor Author

Pull request #235 implements the suggestion “objects with almost all identical values”.

I also tried “recognition of sublists”, but, as expected, this doesn't bring the gzipped size down (gzip already compresses such repeated sequences well); it also doesn't significantly change the minified version. Therefore, I haven't included it.

@ericprud

Hi, this looks really cool and I'm trying to understand the impact on the development process. The build script for your SPARQL parser calls jison directly:

./node_modules/jison/lib/cli.js ./lib/sparql.jison -p slr -o ./lib/SparqlParser.js

Should I see a call to a minimizer which would replace identical sequences in the generated table?

@RubenVerborgh
Contributor Author

There is no explicit call to a minimizer; the table compaction is part of Jison's own code generation.
Concretely, generateModule_ uses the compacted table.
