Significantly decrease parser file size by compacting parser table #234
There are two immediate cases left for further optimization:
```javascript
    } while (id !== 0);
    return name;
}
var nextVariableId = 0;
```
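The fragment above is the tail of a function that turns the counter `nextVariableId` into a variable name. Below is a minimal sketch of such a generator. It assumes a `$V` prefix and a 64-character alphabet of identifier characters. Both are illustrative assumptions, not necessarily what Jison uses. The counter is written in base 64, least significant digit first.

```javascript
// Hypothetical sketch: a counter-based generator of short, unique variable names.
// The "$V" prefix and this 64-character alphabet are illustrative assumptions.
var variableTokens = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_$';
var variableTokensLength = variableTokens.length; // 64
var nextVariableId = 0;

// Returns a fresh identifier: the prefix followed by the counter's
// base-64 digits, least significant digit first.
function createVariable() {
  var id = nextVariableId++;
  var name = '$V';
  do {
    name += variableTokens[id % variableTokensLength];
    id = Math.floor(id / variableTokensLength);
  } while (id !== 0);
  return name;
}
```

Successive calls yield `$V0`, `$V1`, and so on up to `$V$`, followed by two-character suffixes such as `$V01`. The names stay short as long as the counter stays small.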
Should `generateTableCode` reset this to `0`? If you were creating multiple parsers, the second one would start at an arbitrary position, if that matters.
Indeed, new parsers could initialize this to `0`, but it is not necessary. If they do, variable names can be shorter, because they then only have to be unique within a single parser rather than across all parsers. However, minification reassigns variable names anyway, making them as short as possible.
Summarizing: you could make `createVariable` a member function with `nextVariableId` as a member variable, so names will be shorter in unminified versions.
Or maybe the best and easiest option: `generateModule_` can set `nextVariableId` to `0`. (That way, other methods can also create and use new variables.)
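Here is a minimal sketch of resetting the counter in the module generator. `createVariable` is simplified to a decimal suffix. `generateModule_` is a hypothetical stand-in that only declares one variable per array; the real function takes different arguments.

```javascript
// Hypothetical sketch: a module-level counter that the (simplified,
// illustrative) module generator resets, so each parser's names start at $V0.
var nextVariableId = 0;

// Simplified name generator: decimal suffix instead of a compact alphabet.
function createVariable() {
  return '$V' + nextVariableId++;
}

// Illustrative stand-in for generateModule_: declares one variable per array.
function generateModule_(arrays) {
  nextVariableId = 0; // names only need to be unique within a single parser
  if (arrays.length === 0) {
    return '';
  }
  var declarations = arrays.map(function (array) {
    return createVariable() + '=' + JSON.stringify(array);
  });
  return 'var ' + declarations.join(',') + ';';
}
```

Calling `generateModule_([[1,2],[3]])` twice produces `var $V0=[1,2],$V1=[3];` both times, so generated names no longer depend on how many parsers were built before.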
I like that 👍. In general it's nice to keep functions as "pure" as possible.
True. Maybe the cleanest would have been to make it a member function, but that would add unnecessary complexity. The important thing is that there are no side-effects.
Two thumbs up 👍 👍
Yihaa, thanks for merging! Any chance you could publish a new version to npm? Next week I plan to look at the other optimizations suggested in my comment above. They probably won't be as spectacular, but we might still shave a few kilobytes off.
Pull request #235 implements the suggestion “objects with almost all identical values”. I also tried “recognition of sublists”, but, as expected, it does not bring the gzipped size down, and it does not significantly change the minified size either. Therefore, I haven't included it.
Hi, this looks really cool, and I'm trying to understand the impact on the development process. The build script for your SPARQL parser calls jison directly:
Should I expect to see a call to a minimizer that replaces identical sequences in the generated table?
There is no explicit call to a minimizer; the compaction is part of the Jison build process itself.
Summary
This pull request nearly halves the gzipped size of generated parsers.
Problem
The largest part of a Jison-generated parser is its `table`: a large array containing objects with numeric keys and (arrays of) numeric values. Two kinds of patterns occur frequently in such a table:
1. The same array of numbers occurs many times, for example:
```javascript
tables = [{ 5: [1,3,4,6,7,8,15], 6: 7}, { 20: [1,3,4,6,7,8,15], 9: 8 }]
```
2. Many keys of one object share the same value, for example:
```javascript
tables = [{ 5: [200,204], 17: [200,204], 20: [200,204], 21: [200,204] }]
```
Solution
I tackled the first case by storing frequently occurring arrays in temporary variables.
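Here is a minimal sketch of this first optimization. It assumes the table is serialized with `JSON.stringify` and that its arrays hold only non-negative integers, as in the examples. The function name `compactArrays` and the `$V` prefix are hypothetical. Scanning the JSON text with a regular expression is a simplification; a real generator might walk the table object instead.

```javascript
// Hypothetical sketch: hoist every numeric array literal that occurs more than
// once in the serialized table into a shared variable ($V0, $V1, ...).
// Assumes arrays contain only non-negative integers, as in the examples.
function compactArrays(table) {
  var tableCode = JSON.stringify(table);

  // Count each innermost array literal such as [1,3,4,6,7,8,15].
  var counts = {};
  (tableCode.match(/\[[0-9,]+\]/g) || []).forEach(function (literal) {
    counts[literal] = (counts[literal] || 0) + 1;
  });

  var declarations = [];
  Object.keys(counts).forEach(function (literal) {
    if (counts[literal] > 1) {
      var name = '$V' + declarations.length;
      declarations.push(name + '=' + literal);
      // split/join replaces every occurrence without regex escaping.
      tableCode = tableCode.split(literal).join(name);
    }
  });

  return {
    commonCode: declarations.length > 0 ? 'var ' + declarations.join(',') + ';' : '',
    tableCode: tableCode
  };
}
```

For the first example table, this yields `var $V0=[1,3,4,6,7,8,15];` as common code and `[{"5":$V0,"6":7},{"9":8,"20":$V0}]` as the table expression. `JSON.stringify` lists integer keys in ascending order, which is why `9` now precedes `20`.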
I tackled the second case by creating such objects with an auxiliary function `o`. This also produces new long arrays of numbers, which can in turn be optimized as in the first case.
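The auxiliary function can be sketched as follows. Several details are assumptions for illustration, not Jison's actual code: the signature of `o` (keys, shared value, optional object to extend), the encoder `compactObject`, and its `minKeys` threshold.

```javascript
// Hypothetical helper in the spirit of `o`: assigns the same value to every
// key in `keys`, optionally extending an existing object `obj`.
function o(keys, value, obj) {
  obj = obj || {};
  for (var i = 0; i < keys.length; i++) {
    obj[keys[i]] = value;
  }
  return obj;
}

// Emits JavaScript source for `object`, replacing every group of at least
// `minKeys` keys that share one value with a single call to `o`.
function compactObject(object, minKeys) {
  // Group keys by the serialized form of their value.
  var groups = {};
  Object.keys(object).forEach(function (key) {
    var value = JSON.stringify(object[key]);
    (groups[value] = groups[value] || []).push(Number(key));
  });

  // Entries whose value is not shared often enough stay in a plain literal.
  var rest = {};
  var shared = [];
  Object.keys(groups).forEach(function (value) {
    var keys = groups[value];
    if (keys.length >= minKeys) {
      shared.push({ keys: keys, value: value });
    } else {
      keys.forEach(function (key) { rest[key] = JSON.parse(value); });
    }
  });

  // Nest the calls: o(keysA, valueA, o(keysB, valueB, {remaining literal})).
  var code = Object.keys(rest).length > 0 ? JSON.stringify(rest) : '';
  shared.forEach(function (group) {
    code = 'o(' + JSON.stringify(group.keys) + ',' + group.value + (code ? ',' + code : '') + ')';
  });
  return code || '{}';
}
```

For the second example table, `compactObject(..., 2)` returns `o([5,17,20,21],[200,204])`. The key list `[5,17,20,21]` is itself a numeric array literal, so if it recurs, it can be hoisted into a variable like any other array.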
This not only significantly decreases the file size of the parser but also reduces memory usage, since it avoids keeping multiple copies of the same array in memory.
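The memory saving comes from object identity. A hoisted array is created once and referenced everywhere, whereas each repeated literal creates a separate array. The small illustration below assumes that the parser only reads the table and never mutates these arrays; otherwise sharing them would be unsafe.

```javascript
// With a hoisted variable, both table entries reference one array object.
var $V0 = [1, 3, 4, 6, 7, 8, 15];
var shared = [{ 5: $V0, 6: 7 }, { 20: $V0, 9: 8 }];

// With repeated literals, each literal allocates its own array.
var copied = [{ 5: [1, 3, 4, 6, 7, 8, 15], 6: 7 }, { 20: [1, 3, 4, 6, 7, 8, 15], 9: 8 }];

console.log(shared[0][5] === shared[1][20]); // true: one array in memory
console.log(copied[0][5] === copied[1][20]); // false: two equal but separate arrays
```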
To support such chunks of reusable code, the `generateModule_` function has been updated to return an object with `commonCode` and `moduleCode` (instead of only `moduleCode`).
Results
A parser I am working on benefited significantly from the new table generation function:
The decrease from 36 KB to 19 KB is a 47% reduction.
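The percentage follows directly from the two sizes:

```latex
\frac{36 - 19}{36} = \frac{17}{36} \approx 0.472 \approx 47\%
```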