[Merged by Bors] - Lexer string interning #1758
Codecov Report
@@ Coverage Diff @@
## main #1758 +/- ##
==========================================
- Coverage 57.02% 55.72% -1.30%
==========================================
Files 199 201 +2
Lines 16842 17336 +494
==========================================
+ Hits 9604 9661 +57
- Misses 7238 7675 +437
Test262 conformance changes (VM implementation)
I will try to solve #503 with this.
I tried to start implementing this for the parser, but the changes are huge. I created a project and will create new issues to implement the interner for the parser, the compiler and the executor, but I think this is ready for review and merge. I ran some benchmarks to see which backend was faster and, for now, I selected the one that was fastest on my machine. This might change once we implement the interner in more places of the engine.
Having the interner in a global hidden by an API might make it easier to develop boa
(since we wouldn't need to pass its reference around everywhere), but I don't feel too strongly either way.
@Razican correct me if I'm wrong, but I think that would not work, because the interner must be specific to one parsed block of code, so it can be deallocated when the code block is dropped, right?
If you don't deallocate, you could in theory use the same interner multiple times, but if the application runs for a long time, you might end up with huge memory usage.
Yeah, I guess depending on the use case you would want to drop the As I said, I can accept this choice, just not sure it makes much of a difference, if the Maybe for game engines or other embedding use cases it would be relevant to not use just 1 ( (I feel this last point is good enough reason to keep it in the |
I'm also thinking of servers using it as their scripting language of choice. If they run multiple scripts, on different days or so, I think this approach could be better.
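The trade-off discussed above can be sketched roughly as follows. This is an illustration only, not Boa's code: `Interner`, `intern_global`, `global_len`, `run_with_global` and `run_with_local` are hypothetical names, the interner is a simplified standard-library stand-in for the `string_interner` crate, and "running a script" is reduced to interning every whitespace-separated word.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical, simplified interner; symbols are plain indices here.
#[derive(Default)]
struct Interner {
    map: HashMap<String, usize>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing index for `s`, or stores `s` and returns a new one.
    fn get_or_intern(&mut self, s: &str) -> usize {
        if let Some(&id) = self.map.get(s) {
            return id;
        }
        let id = self.strings.len();
        self.strings.push(s.to_owned());
        self.map.insert(s.to_owned(), id);
        id
    }

    /// Number of distinct strings currently stored.
    fn len(&self) -> usize {
        self.strings.len()
    }
}

thread_local! {
    /// Global (per-thread) interner hidden behind `intern_global`.
    static GLOBAL: RefCell<Interner> = RefCell::new(Interner::default());
}

/// Interns into the hidden global; nothing is freed until the thread exits.
fn intern_global(s: &str) -> usize {
    GLOBAL.with(|i| i.borrow_mut().get_or_intern(s))
}

fn global_len() -> usize {
    GLOBAL.with(|i| i.borrow().len())
}

/// Stand-in for running a script against the global interner.
fn run_with_global(source: &str) {
    for word in source.split_whitespace() {
        intern_global(word);
    }
}

/// Stand-in for running a script with its own interner, which is
/// dropped (and its strings freed) when the function returns.
fn run_with_local(source: &str) -> usize {
    let mut interner = Interner::default();
    for word in source.split_whitespace() {
        interner.get_or_intern(word);
    }
    interner.len()
}

fn main() {
    run_with_global("let a = 1");
    run_with_global("let b = 2");
    println!("global interner now holds {} strings", global_len());
    println!("local interner held {} strings", run_with_local("let b = 2"));
}
```

With the global variant, strings from every script accumulate for as long as the thread lives; with the local variant, each run starts empty and releases its strings at the end, at the cost of passing the interner around.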
LGTM
I agree with Rageknify's concern around this being passed everywhere, but I don't see any other option from what has been discussed
/// The string interner for Boa.
///
/// This is a type alias that makes it easier to reference it in the code.
pub type Interner = StringInterner<BucketBackend<Sym>>;
How come you chose `BucketBackend` over `StringBackend`? I'm guessing because of the use of statics.
Statics seem to be the reason. It probably makes sense to try out `StringBackend` when we use the interner in the parser.
The reason was that I tried all the backends locally and this was the one giving the best results, but I have no personal preference xD. In the future, once we use the interner everywhere, we can benchmark again.
bors r+
This Pull Request is part of #279.
It adds a string interner to Boa, which allows many types to not contain heap-allocated strings, and just contain a `NonZeroUsize` instead. This can move types to the stack (hopefully I'll be able to move `Token`, for example, and maybe some `Node` types too).
Note that the interner is for now only available in the lexer. Next steps (in this PR or future ones) would include also using interning in the parser, and finally in execution. The idea is that strings should be represented with a `Sym` until they are displayed.
Talking about display: I have changed the `ParseError` type so that it does not contain anything that could contain a `Sym` (basically tokens), which might be a bit faster, but what is important is that we don't depend on the interner when displaying errors.
The issue I have now is how to display tokens. This requires the interner if we want to know identifiers, for example. The issue here is that Rust doesn't allow using a `fmt::Formatter` (only in nightly), which is making my head hurt. Maybe one of you can find a better way of doing this.
Then, about `cursor.expect()`: this is the only place where we don't have the expected token type as a static string, so it's failing to compile. We have the option of changing the type definition of `ParseError` to contain an owned string, but maybe we can avoid this by having a `&'static str` come from a `TokenKind` with the default values, such as "identifier" for an identifier. I wanted you to think about it; maybe we can just add that and avoid allocations there.
Oh, and this depends on the VM-only branch, so that has to be merged before :)
Another thing to check: should the interner be in its own module?
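A minimal sketch of the ideas in this description, using only the standard library instead of the `string_interner` crate the PR relies on. Everything here is an assumption made for illustration: `Sym`, `Interner`, `TokenKind`, `expected_name`, `display` and `DisplayToken` are hypothetical stand-ins, not Boa's actual definitions. It shows a symbol backed by `NonZeroUsize`, a static description for error messages that needs no interner, and one common workaround for `Display` not accepting extra arguments: a small adapter that borrows the interner.

```rust
use std::collections::HashMap;
use std::fmt;
use std::num::NonZeroUsize;

/// Hypothetical symbol type: a non-zero, 1-based index into the interner.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Sym(NonZeroUsize);

/// Hypothetical, simplified interner (not the `string_interner` crate).
#[derive(Default)]
struct Interner {
    map: HashMap<String, Sym>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing symbol for `s`, or stores `s` and returns a new one.
    fn get_or_intern(&mut self, s: &str) -> Sym {
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        self.strings.push(s.to_owned());
        // Symbols start at 1, so the length after the push is never zero.
        let index = NonZeroUsize::new(self.strings.len()).expect("non-zero after push");
        let sym = Sym(index);
        self.map.insert(s.to_owned(), sym);
        sym
    }

    /// Looks up the string behind a symbol, if it came from this interner.
    fn resolve(&self, sym: Sym) -> Option<&str> {
        self.strings.get(sym.0.get() - 1).map(String::as_str)
    }
}

/// Hypothetical token kind: identifiers carry a `Sym`, not a `String`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenKind {
    Identifier(Sym),
    Semicolon,
}

impl TokenKind {
    /// Static description, usable in errors without touching the interner.
    fn expected_name(&self) -> &'static str {
        match self {
            TokenKind::Identifier(_) => "identifier",
            TokenKind::Semicolon => "';'",
        }
    }

    /// Returns an adapter that carries the interner alongside the token.
    fn display<'a>(&'a self, interner: &'a Interner) -> DisplayToken<'a> {
        DisplayToken { kind: self, interner }
    }
}

/// Adapter implementing `Display` for a token plus its interner.
struct DisplayToken<'a> {
    kind: &'a TokenKind,
    interner: &'a Interner,
}

impl fmt::Display for DisplayToken<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.kind {
            TokenKind::Identifier(sym) => {
                f.write_str(self.interner.resolve(*sym).unwrap_or("<unknown>"))
            }
            TokenKind::Semicolon => f.write_str(";"),
        }
    }
}

fn main() {
    let mut interner = Interner::default();
    let sym = interner.get_or_intern("foo");
    let token = TokenKind::Identifier(sym);
    println!("{} ({})", token.display(&interner), token.expected_name());
}
```

Because `NonZeroUsize` has a niche at zero, `Option<Sym>` in this sketch is the same size as a `usize`, and `TokenKind` is `Copy` with no heap allocation of its own.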
Build failed: |
I will rebase this; it seems to fail due to Rust 1.58 lints.
One option here would be to have the interner in the parser, as a reference. Might make sense actually, and I might do so in the other PR, if you're OK with it.
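As a rough sketch of what "the interner in the parser, as a reference" could look like: the parser borrows an interner owned by the caller, so the caller decides when it is dropped. `Parser`, `Interner` and the whitespace-splitting `parse` method below are hypothetical simplifications, not Boa's API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified interner; symbols are plain indices here.
#[derive(Default)]
struct Interner {
    map: HashMap<String, usize>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing index for `s`, or stores `s` and returns a new one.
    fn get_or_intern(&mut self, s: &str) -> usize {
        if let Some(&id) = self.map.get(s) {
            return id;
        }
        let id = self.strings.len();
        self.strings.push(s.to_owned());
        self.map.insert(s.to_owned(), id);
        id
    }

    /// Number of distinct strings currently stored.
    fn len(&self) -> usize {
        self.strings.len()
    }
}

/// Hypothetical parser that borrows the interner for its own lifetime.
struct Parser<'a> {
    interner: &'a mut Interner,
}

impl<'a> Parser<'a> {
    fn new(interner: &'a mut Interner) -> Self {
        Self { interner }
    }

    /// Stand-in for parsing: interns every whitespace-separated word.
    fn parse(&mut self, source: &str) -> Vec<usize> {
        source
            .split_whitespace()
            .map(|word| self.interner.get_or_intern(word))
            .collect()
    }
}

fn main() {
    let mut interner = Interner::default();
    let symbols = Parser::new(&mut interner).parse("let a = a ;");
    // The parser's borrow has ended; the caller still owns the interner.
    println!("{} symbols, {} distinct strings", symbols.len(), interner.len());
}
```

The methods of the parser reach the interner through `self`, so it does not have to be threaded through every call as a separate argument.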
I had to revert the upgrade to wasm-bindgen 0.2.79 due to rustwasm/wasm-bindgen#2774.
bors r+
Pull request successfully merged into main. Build succeeded: |
Actually, now that I check it, the only way to do this would be to change all the methods to receive a |
This builds on top of #1758 to try to bring #1763 to life. Something that should probably be done here would be to convert `JsString` to a `Sym` internally. Then, further optimizations could be done adding common strings to a custom interner type (those that we know statically). This is definitely work in progress, but I would like to have feedback on the API, and feel free to contribute. Co-authored-by: raskad <[email protected]>