Move benchmark-unrelated code out of the hot path #18

Open
wants to merge 4 commits into
base: master

rockdaboot
Contributor

@rockdaboot rockdaboot commented Dec 21, 2023

As the title says. This should also make #16 obsolete.
Every string created by key(i) caused a heap allocation, which skewed the benchmarks.
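
To illustrate the allocation effect described above, here is a minimal sketch, not the code from this PR. The body of `key` is an assumption (the real helper is not shown in this thread), and `makeKeys`, `allocsKeyInLoop`, `allocsPrebuilt`, and `sink` are hypothetical names introduced only for this example. It uses `testing.AllocsPerRun` to compare building a key inside the measured function with indexing into keys built beforehand.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

const maxEntryCount = 10000

// key is a stand-in for the helper named in the description; its real body
// is not shown in this thread, so this one is an assumption.
func key(i int) string {
	return "key-" + strconv.Itoa(i)
}

// makeKeys builds every key once, before any measurement starts.
func makeKeys(n int) []string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = key(i)
	}
	return keys
}

// sink is a package-level variable so the compiler cannot discard the strings.
var sink string

// allocsKeyInLoop reports the average allocations per call when the key
// string is built inside the measured function.
func allocsKeyInLoop() float64 {
	i := 0
	return testing.AllocsPerRun(1000, func() {
		sink = key(i)
		i++
	})
}

// allocsPrebuilt reports the average allocations per call when the measured
// function only indexes into keys that already exist.
func allocsPrebuilt(keys []string) float64 {
	i := 0
	return testing.AllocsPerRun(1000, func() {
		sink = keys[i]
		i++
		if i >= len(keys) {
			i = 0
		}
	})
}

func main() {
	keys := makeKeys(maxEntryCount)
	fmt.Println("allocs/op, key built in loop:", allocsKeyInLoop())
	fmt.Println("allocs/op, prebuilt keys:   ", allocsPrebuilt(keys))
}
```

With keys built up front, the measured function performs only a slice lookup, so any remaining allocations in a benchmark can be attributed to the cache API under test.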

[UPDATE]
In the meantime, I have added more commits that move all code not directly related to benchmarking set/get out of the hot path. For example, creating keys and values, including random number generation, should not count toward the measurements of the cache API. The exception is serialization/deserialization code, which some cache APIs require while others either don't need it or do it internally.
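
As a rough sketch of that idea (not the code in this PR), the hypothetical benchmark below creates all keys and random values before the timer starts and calls `b.ResetTimer()` so that only the map writes are measured. The name `benchmarkMapSet`, the key format, and the use of a plain `map[string]int` are assumptions made for the example. `testing.Benchmark` lets it run as an ordinary program.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"testing"
)

const maxEntryCount = 10000

// benchmarkMapSet is a hypothetical benchmark: everything that is not the
// map write itself happens before the timer is reset.
func benchmarkMapSet(b *testing.B) {
	// Setup: keys and random values are created outside the measured loop.
	rng := rand.New(rand.NewSource(1))
	keys := make([]string, maxEntryCount)
	values := make([]int, maxEntryCount)
	for i := range keys {
		keys[i] = "key-" + strconv.Itoa(i)
		values[i] = rng.Int()
	}
	m := make(map[string]int, maxEntryCount)

	b.ReportAllocs()
	b.ResetTimer() // discard the time and allocations spent on setup

	// Hot path: only the set operation and the index bookkeeping.
	id := 0
	for i := 0; i < b.N; i++ {
		m[keys[id]] = values[id]
		id++
		if id >= maxEntryCount {
			id = 0
		}
	}
}

func main() {
	res := testing.Benchmark(benchmarkMapSet)
	fmt.Println(res.String(), res.MemString())
}
```

The same pattern applies to the get benchmarks: fill the cache and prepare the lookup keys first, then reset the timer before the loop that calls the API under test.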

Before

$ go version
go version go1.21.4 linux/amd64
goos: linux
goarch: amd64
pkg: github.com/allegro/bigcache-bench
cpu: 12th Gen Intel(R) Core(TM) i7-12800H
BenchmarkMapSetForStruct-20                         4574           1050507 ns/op          697285 B/op      19746 allocs/op
BenchmarkSyncMapSetForStruct-20                     1512           3062503 ns/op         1828855 B/op      69662 allocs/op
BenchmarkOracamanMapSetForStruct-20                 3039           1593436 ns/op         1216733 B/op      20144 allocs/op
BenchmarkFreeCacheSetForStruct-20                   1172           4117920 ns/op         6987192 B/op      40287 allocs/op
BenchmarkBigCacheSetForStruct-20                    1362           3640569 ns/op         3771568 B/op      42459 allocs/op
BenchmarkMapSetForBytes-20                          2886           1722816 ns/op         2097688 B/op      29749 allocs/op
BenchmarkSyncMapSetForBytes-20                      1399           3438202 ns/op         3112061 B/op      79919 allocs/op
BenchmarkOracamanMapSetForBytes-20                  2138           2121852 ns/op         2943254 B/op      30175 allocs/op
BenchmarkFreeCacheSetForBytes-20                    1424           3446344 ns/op         7867129 B/op      30537 allocs/op
BenchmarkBigCacheSetForBytes-20                     1602           3034032 ns/op         4651572 B/op      32713 allocs/op
BenchmarkMapGetForStruct-20                     38352931               111.5 ns/op            23 B/op          1 allocs/op
BenchmarkSyncMapGetForStruct-20                 35730880               133.3 ns/op            23 B/op          1 allocs/op
BenchmarkOracamanMapGetForStruct-20             36561675               130.1 ns/op            23 B/op          1 allocs/op
BenchmarkFreeCacheGetForStruct-20                9007900               525.5 ns/op           271 B/op          8 allocs/op
BenchmarkBigCacheGetForStruct-20                 9274410               515.5 ns/op           287 B/op          9 allocs/op
BenchmarkMapGetForBytes-20                      42340047               111.4 ns/op            23 B/op          1 allocs/op
BenchmarkSyncMapGetForBytes-20                  35446474               135.0 ns/op            23 B/op          1 allocs/op
BenchmarkOracamanMapGetForBytes-20              34858657               135.3 ns/op            23 B/op          1 allocs/op
BenchmarkFreeCacheGetForBytes-20                22743499               207.3 ns/op           135 B/op          2 allocs/op
BenchmarkBigCacheGetForBytes-20                 25927096               186.1 ns/op           151 B/op          3 allocs/op
BenchmarkSyncMapSetParallelForStruct-20          8432505               687.5 ns/op            72 B/op          5 allocs/op
BenchmarkOracamanMapSetParallelForStruct-20     113340970               42.98 ns/op           30 B/op          2 allocs/op
BenchmarkFreeCacheSetParallelForStruct-20       84426818                54.65 ns/op           53 B/op          4 allocs/op
BenchmarkBigCacheSetParallelForStruct-20        62022183                74.63 ns/op          222 B/op          4 allocs/op
BenchmarkSyncMapSetParallelForBytes-20           7928991               647.5 ns/op           200 B/op          6 allocs/op
BenchmarkOracamanMapSetParallelForBytes-20      88186278                55.89 ns/op          143 B/op          3 allocs/op
BenchmarkFreeCacheSetParallelForBytes-20        84665589                52.74 ns/op          141 B/op          3 allocs/op
BenchmarkBigCacheSetParallelForBytes-20         57134434                90.22 ns/op          506 B/op          3 allocs/op
BenchmarkSyncMapGetParallelForStruct-20         189450904               25.36 ns/op           23 B/op          1 allocs/op
BenchmarkOracamanMapGetParallelForStruct-20     186230124               26.05 ns/op           23 B/op          1 allocs/op
BenchmarkFreeCacheGetParallelForStruct-20       45472926                99.74 ns/op          271 B/op          8 allocs/op
BenchmarkBigCacheGetParallelForStruct-20        43793809               109.5 ns/op           288 B/op          9 allocs/op
BenchmarkSyncMapGetParallelForBytes-20          206990751               23.17 ns/op           23 B/op          1 allocs/op
BenchmarkOracamanMapGetParallelForBytes-20      190093576               25.74 ns/op           23 B/op          1 allocs/op
BenchmarkFreeCacheGetParallelForBytes-20        91603290                48.53 ns/op          135 B/op          2 allocs/op
BenchmarkBigCacheGetParallelForBytes-20         90818000                47.96 ns/op          152 B/op          3 allocs/op
PASS
ok      github.com/allegro/bigcache-bench       194.972s

After

$ go version
go version go1.21.4 linux/amd64
$ go test -bench=. -benchmem -benchtime=4s ./... -timeout 30m
goos: linux
goarch: amd64
pkg: github.com/allegro/bigcache-bench
cpu: 12th Gen Intel(R) Core(TM) i7-12800H
BenchmarkMapSetForStruct-20                     738278595                6.465 ns/op           0 B/op          0 allocs/op
BenchmarkSyncMapSetForStruct-20                 73809016                68.92 ns/op           40 B/op          3 allocs/op
BenchmarkOracamanMapSetForStruct-20             179029155               27.51 ns/op            0 B/op          0 allocs/op
BenchmarkFreeCacheSetForStruct-20               58267056                71.22 ns/op            8 B/op          1 allocs/op
BenchmarkBigCacheSetForStruct-20                43320409               100.3 ns/op           128 B/op          1 allocs/op
BenchmarkMapSetForBytes-20                      145805667               32.41 ns/op          112 B/op          1 allocs/op
BenchmarkSyncMapSetForBytes-20                  41185538               112.0 ns/op           168 B/op          4 allocs/op
BenchmarkOracamanMapSetForBytes-20              73706602                61.97 ns/op          112 B/op          1 allocs/op
BenchmarkFreeCacheSetForBytes-20                45121453               100.9 ns/op           112 B/op          1 allocs/op
BenchmarkBigCacheSetForBytes-20                 29798607               143.3 ns/op           463 B/op          1 allocs/op
BenchmarkMapGetForStruct-20                     917468594                5.063 ns/op           0 B/op          0 allocs/op
BenchmarkSyncMapGetForStruct-20                 350010081               12.55 ns/op            0 B/op          0 allocs/op
BenchmarkOracamanMapGetForStruct-20             229678330               20.87 ns/op            0 B/op          0 allocs/op
BenchmarkFreeCacheGetForStruct-20               51396169                81.19 ns/op           32 B/op          2 allocs/op
BenchmarkBigCacheGetForStruct-20                81617512                58.30 ns/op           32 B/op          2 allocs/op
BenchmarkMapGetForBytes-20                      816561459                5.388 ns/op           0 B/op          0 allocs/op
BenchmarkSyncMapGetForBytes-20                  387219021               14.09 ns/op            0 B/op          0 allocs/op
BenchmarkOracamanMapGetForBytes-20              239492982               20.39 ns/op            0 B/op          0 allocs/op
BenchmarkFreeCacheGetForBytes-20                46738532               111.6 ns/op           136 B/op          2 allocs/op
BenchmarkBigCacheGetForBytes-20                 63684832                81.63 ns/op          136 B/op          2 allocs/op
BenchmarkSyncMapSetParallelForStruct-20         18331375               274.9 ns/op            41 B/op          2 allocs/op
BenchmarkOracamanMapSetParallelForStruct-20     169913868               27.97 ns/op            0 B/op          0 allocs/op
BenchmarkFreeCacheSetParallelForStruct-20       144847314               31.37 ns/op            8 B/op          1 allocs/op
BenchmarkBigCacheSetParallelForStruct-20        96001162                56.98 ns/op          192 B/op          1 allocs/op
BenchmarkSyncMapSetParallelForBytes-20          13546615               351.2 ns/op           170 B/op          4 allocs/op
BenchmarkOracamanMapSetParallelForBytes-20      121414638               37.81 ns/op          112 B/op          1 allocs/op
BenchmarkFreeCacheSetParallelForBytes-20        131029023               36.91 ns/op          112 B/op          1 allocs/op
BenchmarkBigCacheSetParallelForBytes-20         68330320                75.51 ns/op          482 B/op          1 allocs/op
BenchmarkSyncMapGetParallelForStruct-20         1000000000               3.839 ns/op           0 B/op          0 allocs/op
BenchmarkOracamanMapGetParallelForStruct-20     506150209                9.661 ns/op           0 B/op          0 allocs/op
BenchmarkFreeCacheGetParallelForStruct-20       222971008               24.01 ns/op           32 B/op          2 allocs/op
BenchmarkBigCacheGetParallelForStruct-20        348548276               14.01 ns/op           32 B/op          2 allocs/op
BenchmarkSyncMapGetParallelForBytes-20          1000000000               3.824 ns/op           0 B/op          0 allocs/op
BenchmarkOracamanMapGetParallelForBytes-20      495279145                9.952 ns/op           0 B/op          0 allocs/op
BenchmarkFreeCacheGetParallelForBytes-20        172124143               28.57 ns/op          136 B/op          2 allocs/op
BenchmarkBigCacheGetParallelForBytes-20         203385400               24.08 ns/op          136 B/op          2 allocs/op
PASS
ok      github.com/allegro/bigcache-bench       216.286s

rockdaboot changed the title from "Create/allocate keys outside the hot path" to "Move benchmark-unrelated code out of the hot path" on Dec 23, 2023
@@ -17,13 +19,13 @@ const maxEntrySize = 256
 const maxEntryCount = 10000

 type myStruct struct {
-	Id int `json:"id"`
+	Id int
Collaborator

Can you change this to ID, please?

Contributor Author

No problem. I'll do it when I'm back at my laptop.

m := make(map[string]T, maxEntryCount)
for n := 0; n < maxEntryCount; n++ {
m[key(n)] = cs.Get(n)
if id >= maxEntryCount {
Collaborator

id = (id + 1) % maxEntryCount ?

Contributor Author

You know that modulo is an IDIV operation, which is one of the slowest CPU instructions on x86 (and on other architectures as well)? It can take up to 100 CPU cycles or so, while regular instructions can take as little as 0.2 cycles. I try to avoid % and / in tight loops where possible.

Collaborator

I know, but I bet that this op will dominate in the benchmark. Also, the main idea of any benchmark is to be fair: if every benchmark function does the modulo, it will not change the benchmark result (the raw numbers might differ, but the ratio will remain the same 🤷).

Collaborator

Whatever, feel free to leave it as it is. I just proposed a shorter version 😉
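
For reference, the two ways of wrapping the index discussed in this thread can be sketched side by side. This is an illustration, not code from the PR: `nextCompare` and `nextModulo` are hypothetical helper names, and only the `if id >= maxEntryCount` check and the `(id + 1) % maxEntryCount` expression come from the review. Both produce the same sequence for ids in the range 0 to maxEntryCount-1.

```go
package main

import "fmt"

const maxEntryCount = 10000

// nextCompare advances id and wraps with a compare-and-reset, matching the
// `if id >= maxEntryCount` check in the diff under review.
func nextCompare(id int) int {
	id++
	if id >= maxEntryCount {
		id = 0
	}
	return id
}

// nextModulo is the shorter form suggested in the review.
func nextModulo(id int) int {
	return (id + 1) % maxEntryCount
}

func main() {
	// Check that both forms agree for every valid id.
	for id := 0; id < maxEntryCount; id++ {
		if nextCompare(id) != nextModulo(id) {
			panic(fmt.Sprintf("mismatch at id=%d", id))
		}
	}
	fmt.Println("both forms agree for all ids below", maxEntryCount)
}
```

Because maxEntryCount is a compile-time constant here, a compiler may lower the modulo to cheaper multiply-and-shift instructions rather than an actual division, so the cost difference between the two forms is worth measuring rather than assuming.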

2 participants