-
-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise strings.ToLower() #1170
base: main
Are you sure you want to change the base?
Conversation
Benchmark:
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1170 +/- ##
==========================================
+ Coverage 82.72% 83.56% +0.84%
==========================================
Files 162 166 +4
Lines 9080 9572 +492
==========================================
+ Hits 7511 7999 +488
- Misses 1319 1323 +4
Partials 250 250
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
// asciiToLowerInPlace converts ASCII characters in the string to lowercase | ||
// starting from the specified index. | ||
func asciiToLowerInPlace(s string, start int) string { | ||
res := []byte(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you have access to medium but this can be done more efficiently https://josestg.medium.com/140x-faster-string-to-byte-and-byte-to-string-conversions-with-zero-allocation-in-go-200b4d7105fc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"returned byte slice MUST NOT be modified since it shares the same backing array with the given string."
Not sure, if this is the best use here, i tried to copy the byte slice before modifying it, and it is causing unexpected behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not per se against this change (performance is a key concept), but I would like to get a better understanding of the tradeoff we have in order to gain this extra performance.
- Being just meant for ASCII chars, what are the transformations that we would not perform anymore? (First of all, I'm thinking at chars like
ÀÈÉÖ
). - Should we restrict the usage of the new
stringsutil.AsciiToLower
for comparisons where we expect to deal with ASCII and keep the standard one to deal with generic inputs? (but I get that it would not really be that beneficial in terms of performance gains) - Otherwise, we should consider it a breaking change, shouldn't we? Running the official ToLower tests with the new function might give more clarity about the behavior's change: https://github.com/golang/go/blob/master/src/strings/strings_test.go#L640-L650.
Thanks!
@M4tteoP I think the main assumption here is that variable keys and collection names will always be ascii. |
Hey @M4tteoP , coraza/internal/strings/strings_test.go Line 107 in dd8793d
As @jcchavezs mentioned, the main assumption here is that variable keys and collection names will always be ascii, and this should not cause coraza to miss any evasions as well. |
Hey @jcchavezs @M4tteoP , Thanks! |
Make sure that you've checked the boxes below before you submit PR:
Thanks for your contribution ❤️