-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
util: avoid allocations when escaping multibyte characters #88671
util: avoid allocations when escaping multibyte characters #88671
Conversation
buf.WriteString(`\u0000`) | ||
} else { | ||
buf.WriteString(`\U00000000`) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if simply having
var scratch [8]byte
writing to that, and just buf.Append(scratch[:ln]) might be better/faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be surprised if it was faster, but I'll try that out later. My version writes to the same place twice, but yours writes the same bytes twice to different places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might not be faster... But... Wirting directly to internal Bytes() buffer -- sometimes feels a bit magical.
Scratch I think is clearer. And fwiw, it's used a lot in Appent* functions;
} else { | ||
buf.WriteString(`\U00000000`) | ||
} | ||
for i := 1; r > 0; r /= 16 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to do something like hex.Encode does vs mod/div.
dst[j] = hextable[v>>4]
dst[j+1] = hextable[v&0x0f]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really hope the compiler just does this optimization on its own but I guess it can't hurt to just write it that way, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB, @miretskiy, and @yuzefovich)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @HonoreDB and @miretskiy)
Does seem like using bitwise operations helps, at least on my machine! Using scratch space instead of writing directly to the buffer costs significantly, though, do you mind if I leave that part out? It is part of the contract of buffer.Bytes() that modifications will show up in the next read call. |
EncodeEscapedChar (which is called in EncodeSQLStringWithFlags) is pretty optimized, but for escaping a multibyte character it was using fmt.FPrintf, which means every multibyte character ended up on the heap due to golang/go#8618. This had a noticeable impact in changefeed benchmarking. This commit just hand-compiles the two formatting strings that were being used into reasonably efficient go, eliminating the allocs. Benchmark encoding the first 10000 runes shows a 4x speedup: Before: BenchmarkEncodeNonASCIISQLString-16 944 1216130 ns/op After: BenchmarkEncodeNonASCIISQLString-16 3468 300777 ns/op Release note: None
b81cba0
to
9ce721d
Compare
bors r=[miretskiy, yuzefovich] |
Build failed (retrying...): |
Build succeeded: |
Interest / value in a v22.2 backport here @HonoreDB? |
blathers backport 22.1 22.2 |
EncodeEscapedChar (which is called in EncodeSQLStringWithFlags) is pretty optimized, but for escaping a multibyte character it was using fmt.FPrintf, which means every multibyte character ended up on the heap due to golang/go#8618. This had a noticeable impact in changefeed benchmarking.
This commit just hand-compiles the two formatting strings that were being used into reasonably efficient go, eliminating the allocs.
Benchmark encoding the first 10000 runes shows a 4x speedup:
Before: BenchmarkEncodeNonASCIISQLString-16 944 1216130 ns/op
After: BenchmarkEncodeNonASCIISQLString-16 3468 300777 ns/op
Release note: None