-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zap.Any: Reduce stack size with generics #1310
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1310 +/- ##
=======================================
Coverage 98.08% 98.09%
=======================================
Files 50 50
Lines 3242 3249 +7
=======================================
+ Hits 3180 3187 +7
Misses 53 53
Partials 9 9
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice. I did wonder if generics could help here, but this is clearly beyond me.
For the fun of it, I'll try running this later on arm and amd to compare against previous numbers. I'll also try running the current head against gotip just to verify that this didn't change and that go1.22 doesn't change anything.
After Prashant suggestion,I also planned pulling the benchmarks into a separate PR and adding some other cases. Stay tuned.
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stock size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - a "create and long in goroutine in a pre-warmed system" that does the above, but before tries to affect the starting goroutine stack size to provide an realistic example. Without this, for any tests with 2+ goroutines, the cost of `zap.Any` is not visible, as we always end up expanding the stack even in the strongly typed methods. The test results are: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string-typ-no-logger 166879518 6.988 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-no-logger-12 167398297 6.973 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger 87669631 13.97 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger-12 86760837 14.11 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-logger 3059485 395.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-12 3141176 379.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger 2995699 401.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger-12 3071046 391.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-go 784323 1351 ns/op 146 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-12 2000835 613.9 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go 477486 2479 ns/op 117 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-12 1830955 680.0 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack 841566 1328 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack-12 2625226 479.6 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack 486084 2493 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack-12 2658640 667.9 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-no-logger 147314238 8.034 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-no-logger-12 157857937 7.436 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger 58872349 20.19 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger-12 60532305 20.27 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-logger 3094204 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-12 3163489 383.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger 2981940 427.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger-12 2777792 394.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-go 911761 1335 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-12 2006440 605.2 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go 467934 2518 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-12 1786076 683.1 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack 855794 1316 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack-12 2598783 434.5 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack 473282 2474 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack-12 2020183 651.9 ns/op 112 B/op 2 allocs/op PASS ok go.uber.org/zap 53.516s ```
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stock size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - -a "create and long in goroutine in a pre-warmed system" that does the above- we decided it's not worth the complication. The test results are: ``` ❯ go test -bench BenchmarkAny -benchmem -cpu 1 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string/field-only/typed 161981473 7.374 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/field-only/any 82343354 14.67 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/log/typed 2965648 416.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log/any 2920292 418.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log-go/typed 1000000 1158 ns/op 112 B/op 3 allocs/op BenchmarkAny/string/log-go/any 553144 2152 ns/op 128 B/op 3 allocs/op BenchmarkAny/stringer/field-only/typed 160509367 7.548 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/field-only/any 51330402 23.45 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/log/typed 3221404 377.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log/any 2726443 393.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log-go/typed 1000000 1129 ns/op 112 B/op 3 allocs/op BenchmarkAny/stringer/log-go/any 558602 2147 ns/op 128 B/op 3 allocs/op PASS ok go.uber.org/zap 19.426s ``` On gotip: ``` ❯ gotip test -bench BenchmarkAny -benchmem -cpu 1 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string/field-only/typed 155084869 7.603 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/field-only/any 82740788 14.55 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/log/typed 2800495 411.6 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log/any 2896258 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log-go/typed 1000000 1155 ns/op 112 B/op 3 allocs/op BenchmarkAny/string/log-go/any 551599 2168 ns/op 128 B/op 3 allocs/op BenchmarkAny/stringer/field-only/typed 159505488 7.578 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/field-only/any 51406354 23.78 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/log/typed 3011210 388.6 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log/any 3010370 395.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log-go/typed 1000000 1161 ns/op 112 B/op 3 allocs/op BenchmarkAny/stringer/log-go/any 553860 2161 ns/op 128 B/op 3 allocs/op PASS ok go.uber.org/zap 19.391s ``` on amd64 (similar, 2x worse stack growth impact) ``` % go test -bench BenchmarkAny -benchmem -cpu 1 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/string/field-only/typed 47534053 25.23 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/field-only/any 36913526 32.57 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/log/typed 1693508 725.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log/any 1576172 765.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log-go/typed 516832 2343 ns/op 112 B/op 3 allocs/op BenchmarkAny/string/log-go/any 243692 4404 ns/op 128 B/op 3 allocs/op BenchmarkAny/stringer/field-only/typed 48735537 24.73 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/field-only/any 26115684 47.24 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/log/typed 1761630 677.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log/any 1646913 705.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log-go/typed 534187 2275 ns/op 112 B/op 3 allocs/op BenchmarkAny/stringer/log-go/any 273787 4348 ns/op 128 B/op 3 allocs/op PASS ok go.uber.org/zap 18.890s ```
I ran the new benchmarks:
Which (1) seems comparable with all the other PRs (2) shows improvement vs #1311 (apart from the field-only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved, but I don't think I can rebase this for you, can I?
Rebased! |
@@ -410,6 +410,14 @@ func Inline(val zapcore.ObjectMarshaler) Field { | |||
} | |||
} | |||
|
|||
type anyFieldC[T any] func(string, T) Field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment about why we're doing all of this
ideally we'd have an upstream issue linked which describes the compiler issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more details. I don't think there's an upstream compiler issue yet. I believe there are plans to write one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue now exists, adding a link in #1387
Yet another attempt at reducing the stack size of zap.Any, borrowing from uber-go#1301, uber-go#1303, uber-go#1304, uber-go#1305, uber-go#1307, and 1308. This approach defines a generic data type for field constructors of a specific type. This is similar to the lookup map in uber-go#1307, minus the map lookup, the interface match, or reflection. type anyFieldC[T any] func(string, T) Field The generic data type provides a non-generic method matching the interface: interface{ Any(string, any) Field } Stack size: The stack size of zap.Any following this change is 0xc0. % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0 This is just 8 bytes more than uber-go#1305, which is the smallest stack size of all other attempts. Allocations: Everything appears to get inlined with no heap escapes: go build -gcflags -m 2>&1 | grep field.go | perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' | grep 'escapes' (Line 413 declares anyFieldC) Besides that, the output of `-m` for the relevant section of code consists of almost entirely: ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any [...] ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any Followed by: ./field.go:428:10: leaking param: key ./field.go:428:22: leaking param: value Maintainability: Unlike some of the other approaches, this variant is more maintainable. The `zap.Any` function looks roughly the same. Adding new branches there is obvious, and requires no duplication. Performance: This is a net improvement on all BenchmarkAny calls except "any-no-logger" which calls `zap.Any` and discards the result. ``` name old time/op new time/op delta Any/str-no-logger-2 8.77ns ± 0% 8.75ns ± 1% ~ (p=0.159 n=4+5) Any/any-no-logger-2 54.1ns ± 0% 81.6ns ± 0% +50.71% (p=0.016 n=5+4) Any/str-with-logger-2 1.38µs ± 3% 1.38µs ± 4% ~ (p=0.841 n=5+5) Any/any-with-logger-2 1.60µs ±22% 1.37µs ± 1% ~ (p=0.151 n=5+5) Any/str-in-go-2 3.41µs ± 1% 3.42µs ± 5% ~ (p=0.905 n=4+5) Any/any-in-go-2 5.98µs ± 1% 3.68µs ± 6% -38.44% (p=0.008 n=5+5) Any/str-in-go-with-stack-2 3.42µs ± 2% 3.46µs ± 3% ~ (p=0.421 n=5+5) Any/any-in-go-with-stack-2 5.98µs ± 3% 3.65µs ± 3% -38.95% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Any/str-no-logger-2 0.00B 0.00B ~ (all equal) Any/any-no-logger-2 0.00B 0.00B ~ (all equal) Any/str-with-logger-2 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/any-with-logger-2 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/str-in-go-2 88.5B ± 1% 88.0B ± 0% ~ (p=0.429 n=4+4) Any/any-in-go-2 88.0B ± 0% 88.0B ± 0% ~ (all equal) Any/str-in-go-with-stack-2 88.0B ± 0% 88.0B ± 0% ~ (all equal) Any/any-in-go-with-stack-2 88.0B ± 0% 88.0B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Any/str-no-logger-2 0.00 0.00 ~ (all equal) Any/any-no-logger-2 0.00 0.00 ~ (all equal) Any/str-with-logger-2 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/any-with-logger-2 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/str-in-go-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) Any/any-in-go-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) Any/str-in-go-with-stack-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) Any/any-in-go-with-stack-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) ``` I believe this is acceptable because that's not a real use case; we expect the result to be used with a logger.
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stack size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - a "create and long in goroutine in a pre-warmed system" that does the above, but before tries to affect the starting goroutine stack size to provide an realistic example. Without this, for any tests with 2+ goroutines, the cost of `zap.Any` is not visible, as we always end up expanding the stack even in the strongly typed methods. The test results are: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string-typ-no-logger 166879518 6.988 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-no-logger-12 167398297 6.973 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger 87669631 13.97 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger-12 86760837 14.11 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-logger 3059485 395.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-12 3141176 379.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger 2995699 401.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger-12 3071046 391.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-go 784323 1351 ns/op 146 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-12 2000835 613.9 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go 477486 2479 ns/op 117 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-12 1830955 680.0 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack 841566 1328 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack-12 2625226 479.6 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack 486084 2493 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack-12 2658640 667.9 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-no-logger 147314238 8.034 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-no-logger-12 157857937 7.436 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger 58872349 20.19 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger-12 60532305 20.27 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-logger 3094204 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-12 3163489 383.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger 2981940 427.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger-12 2777792 394.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-go 911761 1335 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-12 2006440 605.2 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go 467934 2518 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-12 1786076 683.1 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack 855794 1316 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack-12 2598783 434.5 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack 473282 2474 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack-12 2020183 651.9 ns/op 112 B/op 2 allocs/op PASS ok go.uber.org/zap 53.516s ```
Yet another attempt at reducing the stack size of zap.Any, borrowing from #1301, #1303, #1304, #1305, #1307, and #1308. This approach defines a generic data type for field constructors of a specific type. This is similar to the lookup map in #1307, minus the map lookup, the interface match, or reflection. type anyFieldC[T any] func(string, T) Field The generic data type provides a non-generic method matching the interface: interface{ Any(string, any) Field } **Stack size**: The stack size of zap.Any following this change is 0xc0 (192 bytes). % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0 This is just 8 bytes more than #1305, which is the smallest stack size of all other attempts. **Allocations**: Everything appears to get inlined with no heap escapes: % go build -gcflags -m 2>&1 | grep field.go | perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' | grep 'escapes' [no output] (Line 413 declares anyFieldC) Besides that, the output of `-m` for the relevant section of code consists of almost entirely: ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any [...] ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any Followed by: ./field.go:428:10: leaking param: key ./field.go:428:22: leaking param: value **Maintainability**: Unlike some of the other approaches, this variant is more maintainable. The `zap.Any` function looks roughly the same. Adding new branches there is obvious, and requires no duplication. **Performance**: This is a net improvement against master on BenchmarkAny's log-go checks that log inside a new goroutine. ``` name old time/op new time/op delta Any/string/field-only/typed 25.2ns ± 1% 25.6ns ± 2% ~ (p=0.460 n=5+5) Any/string/field-only/any 56.9ns ± 3% 79.4ns ± 0% +39.55% (p=0.008 n=5+5) Any/string/log/typed 1.47µs ± 0% 1.49µs ± 4% +1.58% (p=0.016 n=4+5) Any/string/log/any 1.53µs ± 2% 1.55µs ± 1% +1.37% (p=0.016 n=5+5) Any/string/log-go/typed 5.97µs ± 6% 5.99µs ± 1% ~ (p=0.151 n=5+5) Any/string/log-go/any 10.9µs ± 0% 6.2µs ± 0% -43.32% (p=0.008 n=5+5) Any/stringer/field-only/typed 25.3ns ± 1% 25.5ns ± 1% +1.09% (p=0.008 n=5+5) Any/stringer/field-only/any 85.5ns ± 1% 124.5ns ± 0% +45.66% (p=0.008 n=5+5) Any/stringer/log/typed 1.43µs ± 1% 1.42µs ± 2% ~ (p=0.175 n=4+5) Any/stringer/log/any 1.50µs ± 1% 1.56µs ± 6% +4.20% (p=0.008 n=5+5) Any/stringer/log-go/typed 5.94µs ± 0% 5.92µs ± 0% -0.40% (p=0.032 n=5+5) Any/stringer/log-go/any 11.1µs ± 2% 6.3µs ± 0% -42.93% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Any/string/field-only/typed 0.00B 0.00B ~ (all equal) Any/string/field-only/any 0.00B 0.00B ~ (all equal) Any/string/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/string/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00B 0.00B ~ (all equal) Any/stringer/field-only/any 0.00B 0.00B ~ (all equal) Any/stringer/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/stringer/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Any/string/field-only/typed 0.00 0.00 ~ (all equal) Any/string/field-only/any 0.00 0.00 ~ (all equal) Any/string/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/string/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00 0.00 ~ (all equal) Any/stringer/field-only/any 0.00 0.00 ~ (all equal) Any/stringer/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` It causes a regression in "field-only" which calls the field constructor and discards the result without using it in a logger. I believe this is acceptable because that's not a real use case; we expect the result to be used with a logger.
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stock size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - a "create and long in goroutine in a pre-warmed system" that does the above, but before tries to affect the starting goroutine stack size to provide an realistic example. Without this, for any tests with 2+ goroutines, the cost of `zap.Any` is not visible, as we always end up expanding the stack even in the strongly typed methods. The test results are: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string-typ-no-logger 166879518 6.988 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-no-logger-12 167398297 6.973 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger 87669631 13.97 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger-12 86760837 14.11 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-logger 3059485 395.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-12 3141176 379.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger 2995699 401.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger-12 3071046 391.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-go 784323 1351 ns/op 146 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-12 2000835 613.9 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go 477486 2479 ns/op 117 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-12 1830955 680.0 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack 841566 1328 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack-12 2625226 479.6 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack 486084 2493 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack-12 2658640 667.9 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-no-logger 147314238 8.034 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-no-logger-12 157857937 7.436 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger 58872349 20.19 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger-12 60532305 20.27 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-logger 3094204 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-12 3163489 383.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger 2981940 427.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger-12 2777792 394.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-go 911761 1335 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-12 2006440 605.2 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go 467934 2518 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-12 1786076 683.1 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack 855794 1316 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack-12 2598783 434.5 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack 473282 2474 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack-12 2020183 651.9 ns/op 112 B/op 2 allocs/op PASS ok go.uber.org/zap 53.516s ``` reviews rename review farewell, my friend dummy
Yet another attempt at reducing the stack size of zap.Any, borrowing from #1301, #1303, #1304, #1305, #1307, and #1308. This approach defines a generic data type for field constructors of a specific type. This is similar to the lookup map in #1307, minus the map lookup, the interface match, or reflection. type anyFieldC[T any] func(string, T) Field The generic data type provides a non-generic method matching the interface: interface{ Any(string, any) Field } **Stack size**: The stack size of zap.Any following this change is 0xc0 (192 bytes). % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0 This is just 8 bytes more than #1305, which is the smallest stack size of all other attempts. **Allocations**: Everything appears to get inlined with no heap escapes: % go build -gcflags -m 2>&1 | grep field.go | perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' | grep 'escapes' [no output] (Line 413 declares anyFieldC) Besides that, the output of `-m` for the relevant section of code consists of almost entirely: ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any [...] ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any Followed by: ./field.go:428:10: leaking param: key ./field.go:428:22: leaking param: value **Maintainability**: Unlike some of the other approaches, this variant is more maintainable. The `zap.Any` function looks roughly the same. Adding new branches there is obvious, and requires no duplication. **Performance**: This is a net improvement against master on BenchmarkAny's log-go checks that log inside a new goroutine. ``` name old time/op new time/op delta Any/string/field-only/typed 25.2ns ± 1% 25.6ns ± 2% ~ (p=0.460 n=5+5) Any/string/field-only/any 56.9ns ± 3% 79.4ns ± 0% +39.55% (p=0.008 n=5+5) Any/string/log/typed 1.47µs ± 0% 1.49µs ± 4% +1.58% (p=0.016 n=4+5) Any/string/log/any 1.53µs ± 2% 1.55µs ± 1% +1.37% (p=0.016 n=5+5) Any/string/log-go/typed 5.97µs ± 6% 5.99µs ± 1% ~ (p=0.151 n=5+5) Any/string/log-go/any 10.9µs ± 0% 6.2µs ± 0% -43.32% (p=0.008 n=5+5) Any/stringer/field-only/typed 25.3ns ± 1% 25.5ns ± 1% +1.09% (p=0.008 n=5+5) Any/stringer/field-only/any 85.5ns ± 1% 124.5ns ± 0% +45.66% (p=0.008 n=5+5) Any/stringer/log/typed 1.43µs ± 1% 1.42µs ± 2% ~ (p=0.175 n=4+5) Any/stringer/log/any 1.50µs ± 1% 1.56µs ± 6% +4.20% (p=0.008 n=5+5) Any/stringer/log-go/typed 5.94µs ± 0% 5.92µs ± 0% -0.40% (p=0.032 n=5+5) Any/stringer/log-go/any 11.1µs ± 2% 6.3µs ± 0% -42.93% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Any/string/field-only/typed 0.00B 0.00B ~ (all equal) Any/string/field-only/any 0.00B 0.00B ~ (all equal) Any/string/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/string/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00B 0.00B ~ (all equal) Any/stringer/field-only/any 0.00B 0.00B ~ (all equal) Any/stringer/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/stringer/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Any/string/field-only/typed 0.00 0.00 ~ (all equal) Any/string/field-only/any 0.00 0.00 ~ (all equal) Any/string/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/string/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00 0.00 ~ (all equal) Any/stringer/field-only/any 0.00 0.00 ~ (all equal) Any/stringer/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` It causes a regression in "field-only" which calls the field constructor and discards the result without using it in a logger. I believe this is acceptable because that's not a real use case; we expect the result to be used with a logger.
@abhinav do we want to revert this with
I kinda like your code, though. After the initial 5 minutes of wth, I appreciate the elegance of it, every time. Edit: pre your change.
after the fix:
So your fix is still 110 bytes smaller than it would be if we reverted. |
I think it makes sense to keep it as-is until Go 1.24 is released, since Go 1.22 is still supported by Zap. |
Yet another attempt at reducing the stack size of zap.Any,
borrowing from #1301, #1303, #1304, #1305, #1307, and #1308.
This approach defines a generic data type for field constructors
of a specific type. This is similar to the lookup map in #1307,
minus the map lookup, the interface match, or reflection.
The generic data type provides a non-generic method
matching the interface:
Stack size:
The stack size of zap.Any following this change is 0xc0 (192 bytes).
This is just 8 bytes more than #1305,
which is the smallest stack size of all other attempts.
Allocations:
Everything appears to get inlined with no heap escapes:
(Line 413 declares anyFieldC)
Besides that, the output of
-m
for the relevant section of codeconsists of almost entirely:
Followed by:
Maintainability:
Unlike some of the other approaches, this variant is more maintainable.
The
zap.Any
function looks roughly the same.Adding new branches there is obvious, and requires no duplication.
Performance:
This is a net improvement against master on BenchmarkAny's
log-go checks that log inside a new goroutine.
It causes a regression in "field-only"
which calls the field constructor and discards the result
without using it in a logger.
I believe this is acceptable because that's not a real use case;
we expect the result to be used with a logger.