-
Notifications
You must be signed in to change notification settings - Fork 29.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
http: simplify checkInvalidHeaderChar #18381
Conversation
lib/_http_common.js
Outdated
return true; | ||
} | ||
return false; | ||
return !headerCharRegex.test(val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Rather than doing
!/^[...]*$/.test(val)
, it might be simpler to do/[^...]/.test(val)
. - It seems there are some regressions in the benchmark. If they still exist after swapping the conditions as I did above, we might want to add a fast case for things like the empty string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am relatively certain that there will still be a certain amount of regression even with that change. If not: wonderful. But in case the regressions continue to exist: I would like to keep the implementation and add the RegExp
for val.length > 5
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delayed response, and thanks for the feedback!
- Agreed,
/[^...]/.test(val)
is prettier. I'll update it. - I think regressing some particular strings is okay if the expected overall throughput on realistic data is improved. I don't have real-world data, but I do have data from an AcmeAir run (which uses popular modules in relatively straightforward ways, so might be somewhat representative of real data), and it tends to call this method with longer strings where the regex is faster. The same sort of suggestions came up in the previous change to simplify checkIsHttpToken, and people ended up deciding that the readability win was worth regressing a few benchmark cases. Furthermore, V8 folks like @bmeurer and @psmarshall are committed to ensuring that idiomatic code is performant, and there is already an open bug for improving regex performance on short inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparison between iterations (old is !/^[...]*$/.test(val)
, new is /[^...]/.test(val)
):
improvement confidence p.value
http\\check_invalid_header_char.js n=1000000 input="\177" -10.43 % *** 1.145512e-11
http\\check_invalid_header_char.js n=1000000 input="" 20.55 % *** 5.135039e-20
http\\check_invalid_header_char.js n=1000000 input="\\t\\t\\t\\t\\t\\t\\t\\t\\t\\tFoo bar baz" -1.96 % 3.430864e-01
http\\check_invalid_header_char.js n=1000000 input="1" 18.08 % *** 1.373559e-22
http\\check_invalid_header_char.js n=1000000 input="20091" 13.54 % *** 4.766386e-09
http\\check_invalid_header_char.js n=1000000 input="ä,-æ-╪å`¢" -9.09 % *** 2.598671e-09
http\\check_invalid_header_char.js n=1000000 input="close" 14.75 % *** 5.152685e-10
http\\check_invalid_header_char.js n=1000000 input="en-US" 15.73 % *** 4.681282e-09
http\\check_invalid_header_char.js n=1000000 input="foo\\nbar" -1.83 % 1.630593e-01
http\\check_invalid_header_char.js n=1000000 input="group_acmeair" -13.64 % *** 8.914585e-15
http\\check_invalid_header_char.js n=1000000 input="gzip" 14.60 % *** 7.965079e-26
http\\check_invalid_header_char.js n=1000000 input="Here is a value that is real..."(truncated) 55.78 % *** 9.914895e-30
http\\check_invalid_header_char.js n=1000000 input="keep-alive" 10.51 % *** 5.515997e-09
http\\check_invalid_header_char.js n=1000000 input="private" 9.98 % *** 4.858798e-09
http\\check_invalid_header_char.js n=1000000 input="SAMEORIGIN" 7.52 % *** 1.855577e-13
http\\check_invalid_header_char.js n=1000000 input="Sat, 07 May 2016 16:54:48 GMT" -7.75 % *** 1.633302e-04
http\\check_invalid_header_char.js n=1000000 input="text/html; charset=utf-8" -8.10 % *** 2.957433e-07
http\\check_invalid_header_char.js n=1000000 input="text/plain" 7.21 % *** 2.673265e-11
These changes are generally pretty small compared to the change-versus-baseline numbers. I'll update the table in the description as soon as I push the updated code.
Ping @sethbrenith |
e4b1747
to
54f84e9
Compare
In the spirit of [17399](nodejs#17399), we can also simplify checkInvalidHeaderChar to use regex matching instead of a loop. This makes it faster on long matches and slower on short matches or non-matches. This change also includes some sample data from an AcmeAir benchmark run, as a rough proxy for real-world data.
54f84e9
to
1179bab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This should no be backported to 4 or 6, as I suspect the performance might be completely different. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
In the spirit of [17399](nodejs#17399), we can also simplify checkInvalidHeaderChar to use regex matching instead of a loop. This makes it faster on long matches and slower on short matches or non-matches. This change also includes some sample data from an AcmeAir benchmark run, as a rough proxy for real-world data. PR-URL: nodejs#18381 Reviewed-By: Ruben Bridgewater <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Benedikt Meurer <[email protected]> Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
Landed in 862389b 🎉 |
Thanks everyone! |
In the spirit of [17399](#17399), we can also simplify checkInvalidHeaderChar to use regex matching instead of a loop. This makes it faster on long matches and slower on short matches or non-matches. This change also includes some sample data from an AcmeAir benchmark run, as a rough proxy for real-world data. PR-URL: #18381 Reviewed-By: Ruben Bridgewater <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Benedikt Meurer <[email protected]> Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
In the spirit of [17399](#17399), we can also simplify checkInvalidHeaderChar to use regex matching instead of a loop. This makes it faster on long matches and slower on short matches or non-matches. This change also includes some sample data from an AcmeAir benchmark run, as a rough proxy for real-world data. PR-URL: #18381 Reviewed-By: Ruben Bridgewater <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Benedikt Meurer <[email protected]> Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
In the spirit of [17399](nodejs#17399), we can also simplify checkInvalidHeaderChar to use regex matching instead of a loop. This makes it faster on long matches and slower on short matches or non-matches. This change also includes some sample data from an AcmeAir benchmark run, as a rough proxy for real-world data. PR-URL: nodejs#18381 Reviewed-By: Ruben Bridgewater <[email protected]> Reviewed-By: Matteo Collina <[email protected]> Reviewed-By: Joyee Cheung <[email protected]> Reviewed-By: Benedikt Meurer <[email protected]> Reviewed-By: Tiancheng "Timothy" Gu <[email protected]>
Does not land cleanly on 8.x. Will need a backport PR in order to land. |
In the spirit of 17399, we can also simplify checkInvalidHeaderChar to use regex matching instead of a loop. This makes it faster on long matches and slower on short matches or non-matches. This change also includes some sample data from an AcmeAir benchmark run, as a rough proxy for real-world data.
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passesAffected core subsystem(s)
Benchmark results