Nerdsniped - GC2023 #2

dancantos · 2023-11-10T02:15:38Z

Attempting an improvement on the (use of the) Trigram function.

We do this by moving the slice allocation outside the trigram function. Trigrams is only used by Tokenize, and that function can actually compute the result slice size while cleaning up the input. The trigram function now takes the result slice and a location to start populating the output.

Since this inherently changes the function signature, we unexport it, and export the Trigram method as a wrapper for this unexported function that implements the old function signature. Since the exported method must behave as per the old code, it still needs to do its own memory alloc, and as a result does not see much improvement. Most of the improvement here is seen in the Tokenize method.
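A minimal sketch of the shape described above, not the actual diff: the names `fillTrigrams` and `trigramCount`, and the simplified whitespace-splitting `Tokenize`, are assumptions made for illustration. It only shows how the caller can size the result once and let an unexported helper fill it from a given start index.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// trigramCount reports how many trigrams a string of n runes yields.
func trigramCount(n int) int {
	if n < 3 {
		return 0
	}
	return n - 2
}

// fillTrigrams (hypothetical name) writes the trigrams of text into out,
// beginning at index start, and returns the next free index.
// The caller owns the allocation of out.
func fillTrigrams(text string, out []string, start int) int {
	runes := []rune(text) // this conversion still allocates
	for i := 0; i+3 <= len(runes); i++ {
		out[start] = string(runes[i : i+3])
		start++
	}
	return start
}

// Trigrams keeps the old exported signature, so it has to allocate
// its own result slice before delegating to the helper.
func Trigrams(text string) []string {
	out := make([]string, trigramCount(utf8.RuneCountInString(text)))
	fillTrigrams(text, out, 0)
	return out
}

// Tokenize is a simplified stand-in: it computes the total number of
// trigrams up front, allocates once, then fills the slice word by word.
func Tokenize(text string) []string {
	words := strings.Fields(strings.ToLower(text))
	total := 0
	for _, w := range words {
		total += trigramCount(utf8.RuneCountInString(w))
	}
	out := make([]string, total) // single allocation for all trigrams
	next := 0
	for _, w := range words {
		next = fillTrigrams(w, out, next)
	}
	return out
}

func main() {
	fmt.Println(Tokenize("Go is fun today")) // prints [fun tod oda day]
}
```

In this sketch the exported wrapper gains nothing, because it still performs its own allocation per call; the saving comes from `Tokenize` replacing one result slice per word with a single pre-sized slice.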

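Benchmarks on Tokenize (on my machine) show a ~1.4x improvement in time per op (11273 ns down to 7814 ns) and roughly 2.5x fewer bytes allocated (21608 B down to 8760 B):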

# NEW
goos: darwin
goarch: arm64
pkg: indexer
BenchmarkTokenize-10    	  139582	      7814 ns/op	    8760 B/op	     243 allocs/op
PASS
ok  	indexer	1.287s

# OLD
goos: darwin
goarch: arm64
pkg: indexer
BenchmarkTokenize-10    	  101376	     11273 ns/op	   21608 B/op	     312 allocs/op
PASS
ok  	indexer	1.369s

UPDATE: credit to @Merovius for this idea.
Second optimization: remove the allocation of the []rune slice in the trigram function. Instead, walk the string with 3 variables that track a start, middle, and end index, and use the utf8 package DecodeRune function to find the appropriate increment on each iteration, so no rune slice is needed.
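The index walk could look roughly like the sketch below. It is an illustration under assumptions rather than the code in this PR: `fillTrigrams` is a hypothetical name, and it uses `utf8.DecodeRuneInString`, the string variant of `DecodeRune`, since the input is a string. Each trigram is taken as a substring of the input, so no []rune slice is built.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// fillTrigrams (hypothetical name) writes every three-rune window of text
// into out starting at index pos and returns the next free index.
// It tracks byte offsets instead of converting text to []rune.
func fillTrigrams(text string, out []string, pos int) int {
	// start, mid and end are the byte offsets of the three runes
	// in the current window.
	start := 0
	_, w := utf8.DecodeRuneInString(text)
	mid := w
	_, w = utf8.DecodeRuneInString(text[mid:])
	end := mid + w
	for end < len(text) {
		// w is the byte width of the third rune in the window.
		_, w = utf8.DecodeRuneInString(text[end:])
		out[pos] = text[start : end+w] // substring, no copy of the bytes
		pos++
		start, mid, end = mid, end, end+w
	}
	return pos
}

// Trigrams sizes the result from the rune count, then fills it.
func Trigrams(text string) []string {
	n := utf8.RuneCountInString(text) - 2
	if n < 0 {
		n = 0
	}
	out := make([]string, n)
	fillTrigrams(text, out, 0)
	return out
}

func main() {
	fmt.Println(Trigrams("héllo")) // prints [hél éll llo]
}
```

One behavioural difference worth noting in this sketch: for invalid UTF-8, the substrings keep the original bytes, whereas converting through []rune would replace each invalid byte with U+FFFD. The number of trigrams is the same either way.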

This brings the final time down to ~2.6us per call to Tokenize, a ~4.2x improvement over the original, and cuts allocations from 312 to 4 per op.

goos: darwin
goarch: arm64
pkg: indexer
BenchmarkTokenize-10    	  458174	      2656 ns/op	    6848 B/op	       4 allocs/op
PASS
ok  	indexer	2.477s

The improvements inherently involve a change in function signature to allow the calling function to pass in the result slice and start location as input parameters. This allows the caller to do all the memory allocation. This helps us because the Tokenize method can pre-compute the final slice size before doing a single large allocation. The new function is unexported, while the main Trigram method remains exported to preserve the func signature. However, since this method must still allocate its own memory, there is no significant improvement here. What IS greatly improved is the way Tokenize uses the unexported trigram function. There is also a removal of an unnecessary if branch.

optimization first.

boyter · 2023-11-12T21:42:12Z

Sweet. I'm going to benchmark all of the PRs against my best effort, turn it into a blog post, and then merge the winner.

dancantos added 4 commits November 10, 2023 13:02

add correctness tests

f327698

add benchmark test

d94e21a

remove final memory alloc. credit to Merovius for finding this

d0c4d44

optimization first.

jamesrom mentioned this pull request Nov 10, 2023

[GC2023] Simple fix for allocation overhead #3

Open
