Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util: avoid allocations when escaping multibyte characters #88671

Merged
merged 1 commit into from
Sep 30, 2022

Conversation

HonoreDB
Copy link
Contributor

EncodeEscapedChar (which is called in EncodeSQLStringWithFlags) is pretty optimized, but for escaping a multibyte character it was using fmt.FPrintf, which means every multibyte character ended up on the heap due to golang/go#8618. This had a noticeable impact in changefeed benchmarking.

This commit just hand-compiles the two formatting strings that were being used into reasonably efficient go, eliminating the allocs.

Benchmark encoding the first 10000 runes shows a 4x speedup:

Before: BenchmarkEncodeNonASCIISQLString-16 944 1216130 ns/op
After: BenchmarkEncodeNonASCIISQLString-16 3468 300777 ns/op

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

buf.WriteString(`\u0000`)
} else {
buf.WriteString(`\U00000000`)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if simply having

var scratch [8]byte

writing to that, and just buf.Append(scratch[:ln]) might be better/faster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be surprised if it was faster, but I'll try that out later. My version writes to the same place twice, but yours writes the same bytes twice to different places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might not be faster... But... Wirting directly to internal Bytes() buffer -- sometimes feels a bit magical.
Scratch I think is clearer. And fwiw, it's used a lot in Appent* functions;

} else {
buf.WriteString(`\U00000000`)
}
for i := 1; r > 0; r /= 16 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably better to do something like hex.Encode does vs mod/div.

dst[j] = hextable[v>>4]
dst[j+1] = hextable[v&0x0f]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really hope the compiler just does this optimization on its own but I guess it can't hurt to just write it that way, thanks.

@miretskiy miretskiy self-requested a review September 26, 2022 22:52
Copy link
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB, @miretskiy, and @yuzefovich)

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @HonoreDB and @miretskiy)

@HonoreDB
Copy link
Contributor Author

name \ time/op              orig_pr     with_bit_twiddling  with_scratch_and_bit_twiddling
EncodeNonASCIISQLString-16  302µs ± 2%          293µs ± 1%                      373µs ± 3%

Does seem like using bitwise operations helps, at least on my machine! Using scratch space instead of writing directly to the buffer costs significantly, though, do you mind if I leave that part out? It is part of the contract of buffer.Bytes() that modifications will show up in the next read call.

EncodeEscapedChar (which is called in EncodeSQLStringWithFlags)
is pretty optimized, but for escaping a multibyte character it
was using fmt.FPrintf, which means every multibyte character
ended up on the heap due to golang/go#8618.
This had a noticeable impact in changefeed benchmarking.

This commit just hand-compiles the two formatting strings that
were being used into reasonably efficient go, eliminating the allocs.

Benchmark encoding the first 10000 runes shows a 4x speedup:

Before: BenchmarkEncodeNonASCIISQLString-16    	     944	   1216130 ns/op
After: BenchmarkEncodeNonASCIISQLString-16    	    3468	    300777 ns/op

Release note: None
@HonoreDB HonoreDB force-pushed the avoid_allocs_in_escape_multibyte branch from b81cba0 to 9ce721d Compare September 27, 2022 19:17
@HonoreDB
Copy link
Contributor Author

bors r=[miretskiy, yuzefovich]

@craig
Copy link
Contributor

craig bot commented Sep 30, 2022

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Sep 30, 2022

Build succeeded:

@craig craig bot merged commit 2b58861 into cockroachdb:master Sep 30, 2022
@shermanCRL
Copy link
Contributor

Interest / value in a v22.2 backport here @HonoreDB?

@HonoreDB
Copy link
Contributor Author

blathers backport 22.1 22.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants