Improve string encoding by following json approach #1350
Benchmark results
goos: linux
goarch: amd64
pkg: go.uber.org/zap/zapcore
cpu: AMD EPYC 7B13
                 │ /tmp/old.txt │            /tmp/new.txt            │
                 │    sec/op    │   sec/op     vs base               │
ZapJSON-8          89.10µ ± 1%    33.38µ ± 3%  -62.54% (p=0.000 n=10)
StandardJSON-8     40.74µ ± 1%    42.46µ ± 1%   +4.22% (p=0.000 n=10)
geomean            60.25µ         37.65µ       -37.52%
Codecov Report
@@ Coverage Diff @@
## master #1350 +/- ##
=======================================
Coverage 98.40% 98.41%
=======================================
Files 52 52
Lines 3457 3471 +14
=======================================
+ Hits 3402 3416 +14
Misses 46 46
Partials 9 9
Very nice!
I would prefer to do this without unsafe, and I have some ideas on how we could do that. Let me try something out locally.
buffer/buffer.go
Outdated
func (b *Buffer) AppendByteV(v ...byte) {
	b.bs = append(b.bs, v...)
}
This should probably be named AppendBytes and take a []byte, not a vararg. (Also, the docstring is inaccurate.)
(FYI, there's also Buffer.Write which does the same, while satisfying io.Writer, but there's no problem with also having this method. However, if we have both, maybe Write should call AppendBytes.)
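A minimal sketch of what this suggestion could look like, assuming a Buffer reduced to its backing slice. The Buffer type here is a stand-in written for illustration, not zap's actual buffer.Buffer, and the String helper exists only to make the example observable.

```go
package main

import (
	"fmt"
	"io"
)

// Buffer is an illustrative stand-in for zap's buffer.Buffer,
// reduced to the backing byte slice.
type Buffer struct {
	bs []byte
}

// AppendBytes writes the given slice of bytes to the Buffer.
func (b *Buffer) AppendBytes(v []byte) {
	b.bs = append(b.bs, v...)
}

// Write implements io.Writer by delegating to AppendBytes,
// so both entry points share one code path.
func (b *Buffer) Write(p []byte) (int, error) {
	b.AppendBytes(p)
	return len(p), nil
}

// String returns a copy of the buffered bytes as a string.
func (b *Buffer) String() string { return string(b.bs) }

// Compile-time check that *Buffer satisfies io.Writer.
var _ io.Writer = (*Buffer)(nil)

func main() {
	var b Buffer
	b.AppendBytes([]byte("hello, "))
	fmt.Fprint(&b, "world") // goes through Write
	fmt.Println(b.String())
}
```

Taking a []byte instead of a vararg lets callers pass an existing slice without the trailing `...`, and routing Write through AppendBytes keeps the two methods from drifting apart.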
zapcore/json_encoder.go
Outdated
enc.safeAddByteString(*(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
	Data: (*reflect.StringHeader)(unsafe.Pointer(&s)).Data,
	Len: len(s),
	Cap: len(s),
})))
Putting aside that this is decidedly Not Safe (in a function called safeAddString), starting with Go 1.20 the above is no longer the best way to do an unsafe string-to-byte-slice conversion.
It's better to now do: unsafe.Slice(unsafe.StringData(s), len(s)).
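For reference, the Go 1.20+ conversion mentioned above can be wrapped roughly as follows. The helper name unsafeBytes and the empty-string guard are additions made for this sketch; the returned slice aliases the string's memory and must never be written to.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeBytes returns a []byte that shares memory with s, without copying.
// The result must be treated as read-only. Requires Go 1.20 or newer.
// (Hypothetical helper name, used only for this illustration.)
func unsafeBytes(s string) []byte {
	if len(s) == 0 {
		// The pointer returned by StringData is unspecified for empty strings,
		// so return early instead of building a slice from it.
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := unsafeBytes("zap")
	fmt.Println(len(b), string(b))
}
```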
zapcore/json_encoder.go
Outdated
if s[i] >= 0x20 && s[i] != '\\' && s[i] != '"' {
	i++
	continue
Very nice! This right here is the performance win.
It scans ahead to the next position where escaping is needed instead of appending one byte at a time.
I'm in favor of such a change, but I would prefer if we could do this without the unsafe.
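The run-scanning idea can be sketched on its own, without unsafe, roughly like this. The function name appendEscaped is hypothetical, and the sketch makes a simplifying assumption: bytes at or above 0x80 are copied through unchanged with no UTF-8 validation.

```go
package main

import "fmt"

const hexDigits = "0123456789abcdef"

// appendEscaped appends s to dst with JSON-style escaping.
// Instead of appending byte by byte, it remembers where the current run of
// "plain" bytes started and appends the whole run once an escape is needed.
// Simplification for this sketch: bytes >= 0x80 are not validated as UTF-8.
func appendEscaped(dst []byte, s string) []byte {
	last := 0 // start of the pending run of bytes that need no escaping
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c >= 0x20 && c != '\\' && c != '"' {
			continue // nothing to escape: extend the run
		}
		dst = append(dst, s[last:i]...) // flush the run in one append
		switch c {
		case '\\', '"':
			dst = append(dst, '\\', c)
		case '\n':
			dst = append(dst, '\\', 'n')
		case '\r':
			dst = append(dst, '\\', 'r')
		case '\t':
			dst = append(dst, '\\', 't')
		default:
			// Remaining control characters are written as \u00XX.
			dst = append(dst, '\\', 'u', '0', '0', hexDigits[c>>4], hexDigits[c&0xF])
		}
		last = i + 1
	}
	return append(dst, s[last:]...) // flush the trailing run
}

func main() {
	fmt.Println(string(appendEscaped(nil, "say \"hi\"\n")))
}
```

For typical log strings most bytes fall into the plain case, so the loop body is mostly a comparison and a continue, and the copying happens in a few bulk appends.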
@cdvr1993 I just pushed a change to your branch that drops the unsafe/reflect in favor of generics.
I think that may be acceptable since this is still a pretty massive net improvement. WDYT?
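As an illustration of how generics can stand in for the unsafe conversion, a single function can be written over both string and []byte. The constraint name byteSeq and the helper nextSpecial below are hypothetical and are not taken from the PR; they only show that indexing and len work on either type.

```go
package main

import "fmt"

// byteSeq is a hypothetical constraint covering both input kinds,
// so one implementation serves strings and byte slices without unsafe.
type byteSeq interface {
	~string | ~[]byte
}

// nextSpecial returns the index of the first byte at or after start that
// needs JSON escaping (control character, backslash, or double quote),
// or len(s) if there is none.
func nextSpecial[S byteSeq](s S, start int) int {
	for i := start; i < len(s); i++ {
		if c := s[i]; c < 0x20 || c == '\\' || c == '"' {
			return i
		}
	}
	return len(s)
}

func main() {
	fmt.Println(nextSpecial("abc\"def", 0))       // string input
	fmt.Println(nextSpecial([]byte("a\\b"), 0))   // []byte input
}
```

The compiler instantiates the function separately for each type, which is why this approach avoids both the copy of a plain conversion and the risks of unsafe.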
LGTM, on my computer the delta is like 1-2%, so probably you have some noise there.
This no longer needs to be a separate function.
Adds a fuzz test for the string and []byte versions of safeAppendStringLike that verifies that both variants are able to decode the original string back.
The optimization is basically "instead of appending a byte at a time, skip over non-special bytes and append them all together." The original optimization applies only to single-byte runes. This applies the same to multi-byte runes.
Flips the logic a little to be easier to follow. The shape is basically:

    if multi-byte rune {
        if no special handling {
            skip
            continue
        }
        special handling
    } else {
        if no special handling {
            skip
            continue
        }
        special handling
    }

This makes the logic much more obvious while retaining performance.
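The shape described above could look roughly like the following string-only sketch. The name appendJSONString is hypothetical, and replacing invalid UTF-8 with \ufffd is an assumption modeled on what encoding/json does; the main function round-trips a few valid strings through encoding/json as a sanity check, in the spirit of the fuzz test mentioned earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

const hexDigits = "0123456789abcdef"

// appendJSONString appends s to dst as JSON string contents (without the
// surrounding quotes), skipping over runs of bytes that need no handling.
func appendJSONString(dst []byte, s string) []byte {
	last := 0 // start of the pending run
	for i := 0; i < len(s); {
		if c := s[i]; c >= utf8.RuneSelf {
			// Multi-byte rune.
			r, size := utf8.DecodeRuneInString(s[i:])
			if r != utf8.RuneError || size != 1 {
				i += size // valid rune: no special handling, skip it
				continue
			}
			// Invalid UTF-8: flush the run and write the replacement rune.
			dst = append(dst, s[last:i]...)
			dst = append(dst, `\ufffd`...)
			i++
			last = i
		} else {
			// Single-byte rune.
			if c >= 0x20 && c != '\\' && c != '"' {
				i++ // no special handling, skip it
				continue
			}
			dst = append(dst, s[last:i]...)
			switch c {
			case '\\', '"':
				dst = append(dst, '\\', c)
			case '\n':
				dst = append(dst, '\\', 'n')
			case '\r':
				dst = append(dst, '\\', 'r')
			case '\t':
				dst = append(dst, '\\', 't')
			default:
				dst = append(dst, '\\', 'u', '0', '0', hexDigits[c>>4], hexDigits[c&0xF])
			}
			i++
			last = i
		}
	}
	return append(dst, s[last:]...)
}

func main() {
	// Round-trip check: decoding the escaped form should give back the input.
	for _, in := range []string{"plain", "tab\tquote\"", "héllo, 世界", "\x00\x1f"} {
		quoted := append(appendJSONString([]byte{'"'}, in), '"')
		var out string
		if err := json.Unmarshal(quoted, &out); err != nil {
			panic(err)
		}
		fmt.Println(out == in)
	}
}
```

Both branches share the same "skip and continue" fast path, which is what keeps the common case cheap for ASCII and non-ASCII text alike.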
@cdvr1993 I realized that the same idea (skip over characters that don't need special handling) could be used for the multi-byte runes as well. That yields a small improvement too. I've pushed that and a small readability fix on top.
tangential: why did we never escape \f as well?
I don't know what the original reasoning for that choice is, but it's definitely handled: zap/zapcore/json_encoder_impl_test.go Lines 73 to 76 in 82c728b
We could change that and still be okay.
Thanks, @cdvr1993!
Retrospective LGTM!
I measured this on a pretty beefy machine but here's the results:
name        old time/op  new time/op  delta
ZapJSON-96  57.1µs ±25%  27.8µs ± 2%  -51.34% (p=0.000 n=10+8)
@@ -42,6 +42,11 @@ func (b *Buffer) AppendByte(v byte) {
	b.bs = append(b.bs, v)
}

// AppendBytes writes a single byte to the Buffer.
Just came here from the 1.26.0 release notes. Great job! This comment, though, doesn't seem to be right in that the method writes all bytes to the buffer. /cc @abhinav
Oops, yes, you're right. I missed this. Thanks!
Recently we found an application where using zap.Reflect was faster than using zapcore.ObjectMarshaler. After profiling, we found that string encoding is really expensive. I replicated what encoding/json does and was able to get better performance than Reflect.
I had to modify the benchmark to have a greater number of strings.