go/mysql: performance optimizations in protocol encoding #16341
Conversation
Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

- General
- Tests
- Documentation
- New flags
- If a workflow is added or modified
- Backward compatibility
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main   #16341      +/-   ##
==========================================
- Coverage   68.72%   68.71%     -0.02%
==========================================
  Files        1547     1547
  Lines      198267   198317       +50
==========================================
+ Hits       136264   136271        +7
- Misses      62003    62046       +43
```

☔ View full report in Codecov by Sentry.
Force-pushed from c23d96b to c549579.
You'll need to update the commits and force push to ensure the DCO sign off.
@mattrobenolt Is clearing this way faster than using the following?

```go
func writeZeroes(data []byte, pos int, len int) int {
	data = data[pos : pos+len]
	for i := range data {
		data[i] = 0
	}
	return pos + len
}
```

The Go compiler recognizes that pattern and optimizes it into a memclr.
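For context, the range-loop-assign-zero shape suggested above is the idiom the Go compiler recognizes and lowers to a single runtime memclr call instead of a byte-at-a-time loop. A minimal, self-contained sketch (the function name and layout here are illustrative, not necessarily the PR's final code):

```go
package main

import "fmt"

// writeZeroes zeroes n bytes of data starting at pos and returns the new
// position. Reslicing first and then zeroing in a plain range loop lets the
// compiler replace the loop body with one memclr call.
func writeZeroes(data []byte, pos int, n int) int {
	buf := data[pos : pos+n]
	for i := range buf {
		buf[i] = 0 // recognized loop form; compiled to memclr
	}
	return pos + n
}

func main() {
	b := []byte{1, 2, 3, 4, 5, 6}
	pos := writeZeroes(b, 1, 4)
	fmt.Println(pos, b) // 5 [1 0 0 0 0 6]
}
```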
@dbussink lol so
This employs a couple of tricks that, combined, seemed fruitful:

- Swapping to `binary.LittleEndian.Put*` on the basic calls gets us a free boost while removing code. The main win from this swap is the slice boundary check, resulting in a massive boost. I kept it inlined, but added my own boundary checking in `writeLenEncInt`, since swapping it out here resulted in a very minor performance regression from the current results. I assume from the extra coercion needed to the uint* type, and another reslice.
- Reslicing the byte slice early so all future operations work on 0-index rather than pos+ indexing. This seemed to be a pretty sizeable win without needing to do more addition on every operation later to determine the index; the indexes get swapped out for constants.
- The read path employs the same early reslicing, but already has explicit bounds checks.
- Rewrite `writeZeroes` to utilize the Go memclr optimization.

```
$ benchstat {old,new}.txt
goos: darwin
goarch: arm64
pkg: vitess.io/vitess/go/mysql
                                  │   old.txt    │              new.txt               │
                                  │    sec/op    │   sec/op     vs base               │
EncWriteInt/16-bit-10               0.4685n ± 0%   0.3516n ± 0%  -24.94% (p=0.000 n=10)
EncWriteInt/16-bit-lenencoded-10     2.049n ± 0%    2.049n ± 0%        ~ (p=0.972 n=10)
EncWriteInt/24-bit-lenencoded-10     1.987n ± 0%    2.056n ± 0%   +3.45% (p=0.000 n=10)
EncWriteInt/32-bit-10               0.7819n ± 0%   0.3906n ± 0%  -50.05% (p=0.000 n=10)
EncWriteInt/64-bit-10               1.4080n ± 0%   0.4684n ± 0%  -66.73% (p=0.000 n=10)
EncWriteInt/64-bit-lenencoded-10     3.126n ± 0%    2.051n ± 0%  -34.40% (p=0.000 n=10)
EncWriteZeroes/4-bytes-10           2.5030n ± 0%   0.3123n ± 0%  -87.52% (p=0.000 n=10)
EncWriteZeroes/10-bytes-10          4.3815n ± 0%   0.3120n ± 0%  -92.88% (p=0.000 n=10)
EncWriteZeroes/23-bytes-10          8.4575n ± 0%   0.3124n ± 0%  -96.31% (p=0.000 n=10)
EncWriteZeroes/55-bytes-10         20.8750n ± 10%  0.6245n ± 0%  -97.01%
EncReadInt/16-bit-10                 2.050n ± 0%    2.068n ± 1%   +0.90% (p=0.001 n=10)
EncReadInt/24-bit-10                 2.034n ± 0%    2.050n ± 0%   +0.76% (p=0.000 n=10)
EncReadInt/64-bit-10                 2.819n ± 1%    2.187n ± 0%  -22.41% (p=0.000 n=10)
geomean                              2.500n        0.8363n       -66.55%
```

Signed-off-by: Matt Robenolt <[email protected]>
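The first two tricks in the description can be sketched together. `binary.LittleEndian.PutUint32` performs one bounds check up front and then does fixed-offset stores, and reslicing with `data[pos:]` means the stores use constant indexes 0..3 rather than recomputed `pos+n` offsets. The helper name below is illustrative, not necessarily the PR's exact code:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// writeUint32 writes v at pos in little-endian byte order (as the MySQL wire
// protocol uses) and returns the next write position. Reslicing to data[pos:]
// lets PutUint32 address bytes 0..3 with constant offsets after a single
// bounds check.
func writeUint32(data []byte, pos int, v uint32) int {
	binary.LittleEndian.PutUint32(data[pos:], v)
	return pos + 4
}

func main() {
	buf := make([]byte, 8)
	pos := writeUint32(buf, 0, 0x01020304)
	pos = writeUint32(buf, pos, 0xDEADBEEF)
	fmt.Printf("%d % x\n", pos, buf) // 8 04 03 02 01 ef be ad de
}
```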
Force-pushed from c549579 to 2a6a739.
@dbussink Updated to use the memclr optimization, and added benchmarks to the PR description.
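As background on the `writeLenEncInt` path mentioned in the description: MySQL length-encoded integers store values below 251 in a single byte, and larger values as a marker byte (0xfc, 0xfd, or 0xfe) followed by 2, 3, or 8 little-endian bytes. A hedged sketch of that encoding, not the PR's code:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// writeLenEncInt encodes i as a MySQL length-encoded integer at pos and
// returns the next write position: values < 251 fit in one byte; larger
// values get a marker byte (0xfc, 0xfd, 0xfe) followed by 2, 3, or 8
// little-endian bytes.
func writeLenEncInt(data []byte, pos int, i uint64) int {
	switch {
	case i < 251:
		data[pos] = byte(i)
		return pos + 1
	case i < 1<<16:
		data[pos] = 0xfc
		binary.LittleEndian.PutUint16(data[pos+1:], uint16(i))
		return pos + 3
	case i < 1<<24:
		data[pos] = 0xfd
		data[pos+1] = byte(i)
		data[pos+2] = byte(i >> 8)
		data[pos+3] = byte(i >> 16)
		return pos + 4
	default:
		data[pos] = 0xfe
		binary.LittleEndian.PutUint64(data[pos+1:], i)
		return pos + 9
	}
}

func main() {
	buf := make([]byte, 16)
	pos := writeLenEncInt(buf, 0, 250)    // one byte: fa
	pos = writeLenEncInt(buf, pos, 65535) // marker fc + two bytes: ff ff
	fmt.Printf("%d % x\n", pos, buf[:pos]) // 4 fa fc ff ff
}
```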
Nice work!
Related issue: #16789