
Performance benchmarks and alternatives to map lookups #19

Closed
mishak87 opened this issue Apr 2, 2024 · 5 comments · Fixed by #20

Comments


mishak87 commented Apr 2, 2024

This is my follow-up from the r/golang thread on Reddit; I did some benchmarking.

Each unmarshal benchmark iteration parses all 5 values (four of them 4-6 characters long, one <empty>) plus one bad value (4 characters long), i.e. abcd, bcdef, cdefg, defg, <empty>, and zabc.
Similarly, each marshal benchmark iteration serializes all 5 values.

A map lookup is significantly slower than a loop for 5 items. Converting the bytes to a string and comparing is faster than comparing bytes with bytes.Equal, because bytes.Equal converts both arguments to string anyway.

Array indexing is 30x faster than a map lookup. Map lookups in the MarshalText() and String() functions are even 40+% slower than in UnmarshalText.
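For context, here is a minimal sketch of the two MarshalText strategies being compared (hypothetical names, not the issue's actual code): a map lookup keyed by the enum value versus a direct slice index, with the name bytes precomputed so neither variant allocates.

```go
package main

import (
	"errors"
	"fmt"
)

// MarketType is a hypothetical enum with contiguous values starting at 0.
type MarketType uint8

const (
	Abcd MarketType = iota
	Bcdef
	Cdefg
	Defg
)

var errUnknown = errors.New("unknown MarketType")

// Map-based variant: every call pays for hashing the key.
var marketTypeNameByValue = map[MarketType][]byte{
	Abcd: []byte("abcd"), Bcdef: []byte("bcdef"),
	Cdefg: []byte("cdefg"), Defg: []byte("defg"),
}

func (t MarketType) MarshalTextMap() ([]byte, error) {
	b, ok := marketTypeNameByValue[t]
	if !ok {
		return nil, errUnknown
	}
	return b, nil
}

// Slice-based variant: a bounds check plus a direct index.
// Both variants return a shared []byte; callers must not mutate it.
var marketTypeNames = [][]byte{
	[]byte("abcd"), []byte("bcdef"), []byte("cdefg"), []byte("defg"),
}

func (t MarketType) MarshalTextSlice() ([]byte, error) {
	if int(t) >= len(marketTypeNames) {
		return nil, errUnknown
	}
	return marketTypeNames[t], nil
}

func main() {
	a, _ := Bcdef.MarshalTextMap()
	b, _ := Bcdef.MarshalTextSlice()
	fmt.Println(string(a), string(b))
}
```

The slice variant only works because the values are contiguous and start at 0, which is exactly the constraint discussed below.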

func BenchmarkMarshalText(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_, _ = Abcd.MarshalText()
		_, _ = Bcdef.MarshalText()
		_, _ = Cdefg.MarshalText()
		_, _ = Defg.MarshalText()
		_, _ = UnknownType.MarshalText()
	}
}

func BenchmarkUnmarshalText(b *testing.B) {
	x := MarketType{}
	for i := 0; i < b.N; i++ {
		x.UnmarshalText([]byte("abcd"))
		x.UnmarshalText([]byte("bcdef"))
		x.UnmarshalText([]byte("cdefg"))
		x.UnmarshalText([]byte("defg"))
		x.UnmarshalText([]byte(""))
		x.UnmarshalText([]byte("zabc"))
	}
}
cpu: AMD Ryzen 9 7950X3D 16-Core Processor
BenchmarkMarshalText-32                            19269684      62.31 ns/op    0 B/op    0 allocs/op
BenchmarkMarshalTextLookup-32                     502871172      2.200 ns/op    0 B/op    0 allocs/op
BenchmarkMarshalTextLookupString-32               541060243      2.202 ns/op    0 B/op    0 allocs/op
BenchmarkString-32                                 20310391      57.20 ns/op    0 B/op    0 allocs/op
BenchmarkStringLookupString-32                    439474542      2.646 ns/op    0 B/op    0 allocs/op
BenchmarkStringLookupBytes-32                     100000000      10.97 ns/op    0 B/op    0 allocs/op
BenchmarkStringSwitch-32                          372506850      3.955 ns/op    0 B/op    0 allocs/op
BenchmarkStringSwitchValue-32                     262102232      3.888 ns/op    0 B/op    0 allocs/op
BenchmarkUnmarshalText-32                          33958742      35.89 ns/op    0 B/op    0 allocs/op
BenchmarkUnmarshalTextLookup-32                    38068413      31.15 ns/op    0 B/op    0 allocs/op
BenchmarkUnmarshalTextLookupString-32              52166524      23.68 ns/op    0 B/op    0 allocs/op
BenchmarkUnmarshalTextLookupStringAndConvert-32    31179402      38.46 ns/op    0 B/op    0 allocs/op

nikolaydubina commented Apr 3, 2024

hey, thanks!

I was able to repro 5x improvement in encoding.

However, I still don't get how you speed up decoding with arrays. Can you share code/a PR/Gist/repo showing how you do it?

So far I don't see a good way to switch to array-based encoding/decoding without: 1) requiring users to specify values contiguously and starting from 0; and 2) adding code to read the numeric value from the var/const declaration's AST statement. So it looks like this is a no-go.

Feel free to commandeer it, add your own PR like this, or leave comments in the PR by 2024-04-09.

#20


mishak87 commented Apr 3, 2024

I was thinking about how I use enums, and every time it is to optimize two things: memory footprint and comparison speed.

In my code I needed only uint8 and fast encoding/decoding to database/JSON and text. So I am thinking about extending stringer into an enumer that adds that functionality and wraps the value in type Enum struct { v uint8 }.

The source code for stringer is interesting. The author chose an arbitrary cutoff: with more than 10 contiguous intervals it uses a map; otherwise it compares the integer and slices the appropriate per-interval string.
The one-string-per-interval optimization cuts the number of compiled strings down to one.
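For illustration, this is roughly the shape stringer generates for a small contiguous enum (hypothetical Color type; the real generated code uses names like _Color_name and _Color_index): all names concatenated into one compiled string, sliced by precomputed offsets.

```go
package main

import "fmt"

type Color uint8

const (
	Red Color = iota
	Green
	Blue
)

// One compiled string for the whole contiguous interval, plus offsets
// marking where each name begins and ends.
const colorNames = "RedGreenBlue"

var colorIndexes = [...]uint8{0, 3, 8, 12}

func (c Color) String() string {
	if int(c) >= len(colorIndexes)-1 {
		return fmt.Sprintf("Color(%d)", c)
	}
	return colorNames[colorIndexes[c]:colorIndexes[c+1]]
}

func main() {
	fmt.Println(Green) // prints: Green
}
```

The double lookup mentioned above is visible here: one index into colorIndexes to find the boundaries, then a slice of colorNames.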

However, I still don't get how you speed up decoding with arrays. Can you share code/a PR/Gist/repo showing how you do it?

// pseudo code
type enum struct{ v uint8 }

var A, B, C = enum{0}, enum{1}, enum{2}
var enumStrings = []string{"a", "b", "c"}
var enumAll = []enum{A, B, C}

// stringer uses a single string plus index boundaries; performance might be
// slightly slower due to the double lookup
var enumStringerString = "abc"
var enumStringerIndexes = []int{0, 1, 2, 3}

func (e *enum) UnmarshalText(value []byte) error {
	// TODO optimize: len(value) == 0 => set Undefined if enabled
	// TODO fast path: len(value) > longest enum value => ErrBadValue
	// TODO optimize: skip the Undefined value and start from 1
	// NOTE hoisting s := string(value) and comparing enumStrings[i] == s
	// is slower than converting on every iteration
	for i := 0; i < len(enumStrings); i++ {
		// stringer variant:
		// if enumStringerString[enumStringerIndexes[i]:enumStringerIndexes[i+1]] == string(value) {
		if enumStrings[i] == string(value) {
			*e = enumAll[i]
			return nil
		}
	}
	return ErrBadValue
}

I have not tested how contiguous integers would perform against the map, but there will be a cutoff where hashing the key and fetching the entry becomes faster than N range checks plus a for loop over the matching interval.
EDIT: There is no benefit to checking multiple ranges; it would be better to always generate one array of all values and one array of all strings.

EDIT: I am not sure it could be made significantly faster than the for loop for a small number of enum values. Maybe by writing a tree of if statements, or switches that check specific bytes before comparing against the most likely value.
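A sketch of that byte-switch idea (hypothetical names, untested for performance): switch on the first byte so most inputs are narrowed to a single candidate before the full comparison.

```go
package main

import (
	"errors"
	"fmt"
)

type MarketType uint8

const (
	Abcd MarketType = iota
	Bcdef
	Cdefg
	Defg
)

var ErrBadValue = errors.New("bad value")

// UnmarshalText narrows the candidates by first byte before doing the full
// string comparison, so most bad inputs are rejected after one switch.
func (e *MarketType) UnmarshalText(value []byte) error {
	if len(value) == 0 {
		return ErrBadValue
	}
	switch value[0] {
	case 'a':
		if string(value) == "abcd" {
			*e = Abcd
			return nil
		}
	case 'b':
		if string(value) == "bcdef" {
			*e = Bcdef
			return nil
		}
	case 'c':
		if string(value) == "cdefg" {
			*e = Cdefg
			return nil
		}
	case 'd':
		if string(value) == "defg" {
			*e = Defg
			return nil
		}
	}
	return ErrBadValue
}

func main() {
	var x MarketType
	fmt.Println(x.UnmarshalText([]byte("cdefg")))
}
```

This only pays off when the first bytes are mostly distinct; with many values sharing a prefix, the switch degenerates back into sequential comparisons.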

@nikolaydubina
Owner

I could not reproduce this. In my benchmarks this solution (loop over array) leads to slightly faster encoding but slower decoding.

name                    old time/op    new time/op    delta
MarshalText_Color-16      10.3ns ± 0%     7.5ns ± 0%   ~     (p=1.000 n=1+1)
UnmarshalText_Color-16    11.5ns ± 0%    13.8ns ± 0%   ~     (p=1.000 n=1+1)

name                    old alloc/op   new alloc/op   delta
MarshalText_Color-16       0.00B          0.00B        ~     (all equal)
UnmarshalText_Color-16     0.00B          0.00B        ~     (all equal)

name                    old allocs/op  new allocs/op  delta
MarshalText_Color-16        0.00           0.00        ~     (all equal)
UnmarshalText_Color-16      0.00           0.00        ~     (all equal)

From my previous experience dealing with encoding/decoding enums, for large enum sets (256 values) a map is significantly better than a loop: it literally becomes O(1) vs O(N).
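For the large-set case, a minimal sketch (hypothetical names) of the map-based decode that stays constant-time regardless of the number of values:

```go
package main

import (
	"errors"
	"fmt"
)

type Code uint8

var ErrBadValue = errors.New("bad value")

// With hundreds of values, one hash lookup beats scanning a slice.
var codeFromString = map[string]Code{}

func init() {
	for i := 0; i < 256; i++ {
		codeFromString[fmt.Sprintf("code-%03d", i)] = Code(i)
	}
}

func (c *Code) UnmarshalText(value []byte) error {
	// The Go compiler optimizes a map index with a string(value) key
	// so the []byte-to-string conversion does not allocate.
	v, ok := codeFromString[string(value)]
	if !ok {
		return ErrBadValue
	}
	*c = v
	return nil
}

func main() {
	var c Code
	fmt.Println(c.UnmarshalText([]byte("code-200")), c)
}
```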

@nikolaydubina
Owner

Let's keep the map for now. But of course feel free to open a PR that beats the benchmarks; ideally keep in mind that the code should be minimal.


mishak87 commented Apr 4, 2024

Just to clarify my point.

I could not reproduce this. In my benchmarks this solution (loop over array) leads to slightly faster encoding but slower decoding.

The fastest, by an 8-9 ns margin, for 5 valid values and without other optimizations is BenchmarkUnmarshalTextLookupString-32: using a []string array rather than [][]byte, and checking in for i := 0; i < len(allStrings); i++ { if string(value) == allStrings[i] { return allEnums[i] } }.

Using for range compared to a plain for loop is slow because it declares a variable and copies the value into it on every iteration.
The internal map hash function may run a few loops to compute the hash, so for small data it makes sense that a single array scan over strings (an immutable, comparable type, unlike []byte) is faster.
