-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parser: limit maximum number of tokens #3684
Conversation
✅ Deploy Preview for compassionate-pike-271cb3 ready!
To edit notification comments on pull requests, go to your Netlify site settings. |
Hi @IvanGoncharov, I'm @github-actions bot happy to help you with this PR 👋 Supported commandsPlease post this commands in separate comments and only one per comment:
|
Motivation: Parser CPU and memory usage is linear to the number of tokens in a document however in extreme cases it becomes quadratic due to memory exhaustion. On my mashine it happens on queries with 2k tokens. For example: ``` { a a <repeat 2k times> a } ``` It takes 741ms on my machine. But if we create document of the same size but smaller number of tokens it would be a lot faster. Example: ``` { a(arg: "a <repeat 2k times> a" } ``` Now it takes only 17ms to process, which is 43 time faster. That mean if we limit document size we should make this limit small since it take only two bytes to create a token, e.g. ` a`. But that will hart legit documents that have long tokens in them (comments, describtions, strings, long names, etc.). That's why this PR adds a mechanism to limit number of token in parsed document. Also exact same mechanism implemented in graphql-java, see: graphql-java/graphql-java#2549 I also tried alternative approach of counting nodes and it gives slightly better approximation of how many resources would be consumed. However comparing to the tokens, AST nodes is implementation detail of graphql-js so it's imposible to replicate in other implementation (e.g. to count this number on a client).
e04ce2e
to
814efb8
Compare
This comment has been minimized.
This comment has been minimized.
@saihaj Please, see benchmark results here: https://github.com/graphql/graphql-js/runs/7580332109?check_suite_focus=true#step:6:1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be something that lives on server side code not in a reference implementation. The users of library can chose to limit the tokens onParse
phase
I think by having ability to throw an error within parse, you don't have to have a pre-parse step that would separately count the number of tokens? |
@IvanGoncharov looks good to me, I suggested changes to the wording on a few of the comments if you think an improvement. |
Co-authored-by: Yaacov Rydzinski <[email protected]>
Thanks, @yaacovCR I merged those.
@saihaj The problem here is that |
Maybe we should consider making |
@IvanGoncharov will you have a default max token size or is that just a new option that you add and leave the setting up to the user? |
@michaelstaib The idea is to leave the default limit to the more high-level libraries. |
@saihaj This is a way bigger task. I propose merging this as a solution to a problem and return back to the discussion if we will have an async parser in the future. |
Backport of graphql#3684 Motivation: Parser CPU and memory usage is linear to the number of tokens in a document however in extreme cases it becomes quadratic due to memory exhaustion. On my mashine it happens on queries with 2k tokens. For example: ``` { a a <repeat 2k times> a } ``` It takes 741ms on my machine. But if we create document of the same size but smaller number of tokens it would be a lot faster. Example: ``` { a(arg: "a <repeat 2k times> a" } ``` Now it takes only 17ms to process, which is 43 time faster. That mean if we limit document size we should make this limit small since it take only two bytes to create a token, e.g. ` a`. But that will hart legit documents that have long tokens in them (comments, describtions, strings, long names, etc.). That's why this PR adds a mechanism to limit number of token in parsed document. Also exact same mechanism implemented in graphql-java, see: graphql-java/graphql-java#2549 I also tried alternative approach of counting nodes and it gives slightly better approximation of how many resources would be consumed. However comparing to the tokens, AST nodes is implementation detail of graphql-js so it's imposible to replicate in other implementation (e.g. to count this number on a client). * Apply suggestions from code review Co-authored-by: Yaacov Rydzinski <[email protected]> Co-authored-by: Yaacov Rydzinski <[email protected]>
Backport of graphql#3684 Motivation: Parser CPU and memory usage is linear to the number of tokens in a document however in extreme cases it becomes quadratic due to memory exhaustion. On my mashine it happens on queries with 2k tokens. For example: ``` { a a <repeat 2k times> a } ``` It takes 741ms on my machine. But if we create document of the same size but smaller number of tokens it would be a lot faster. Example: ``` { a(arg: "a <repeat 2k times> a" } ``` Now it takes only 17ms to process, which is 43 time faster. That mean if we limit document size we should make this limit small since it take only two bytes to create a token, e.g. ` a`. But that will hart legit documents that have long tokens in them (comments, describtions, strings, long names, etc.). That's why this PR adds a mechanism to limit number of token in parsed document. Also exact same mechanism implemented in graphql-java, see: graphql-java/graphql-java#2549 I also tried alternative approach of counting nodes and it gives slightly better approximation of how many resources would be consumed. However comparing to the tokens, AST nodes is implementation detail of graphql-js so it's imposible to replicate in other implementation (e.g. to count this number on a client). * Apply suggestions from code review Co-authored-by: Yaacov Rydzinski <[email protected]> Co-authored-by: Yaacov Rydzinski <[email protected]>
Motivation: Parser CPU and memory usage are linear to the number of tokens in a document however, in extreme cases, it becomes quadratic due to memory exhaustion.
On my machine, it happens on queries with 2k tokens.
For example:
It takes 741ms on my machine.
But if we create a document of the same size but a smaller number of tokens, it would be a lot faster.
Example:
Now it takes only 17ms to process, which is 43 times faster.
If we just limit document size, we should make this limit small since it takes only two bytes to create a token, e.g.
a
.But that will create issues for legit documents with long tokens (comments, descriptions, strings, long names, etc.).
That's why this PR adds a mechanism to limit the number of tokens in the parsed document.
Also, exact same mechanism is implemented in graphql-java, see:
graphql-java/graphql-java#2549
I also tried the alternative approach of counting nodes, and it gives
slightly better approximation of how many resources would be consumed.
However, compared to the tokens, AST nodes are an implementation detail of graphql-js
so it's impossible to replicate in other implementations (e.g. to count
this number on a client).