
Add Go compiler directive 'noescape' to assembly implementations #1

Merged: 1 commit, Feb 11, 2019

Conversation

@smira (Contributor) commented Feb 8, 2019

Without this directive, Go considers that `data []byte` escapes from the
function and forces heap allocation even for values that could perfectly
well stay on the stack.

This removes an extra allocation when the value passed in is not already
on the heap.

See also: golang/go#4099

@smira (Contributor, Author) commented Feb 11, 2019

These hints allow the compiler to avoid allocating the argument on the heap; go build -gcflags '-m -m' shows that:

./murmur128_decl.go:12:13: Sum128 data does not escape
./murmur128_decl.go:21:38: SeedSum128 data does not escape
./murmur128_decl.go:26:19: StringSum128 data does not escape
./murmur128_decl.go:31:44: SeedStringSum128 data does not escape

@twmb (Owner) commented Feb 11, 2019

This LGTM, thank you!

(Sorry about the late reply, I didn't get a notification for this for some reason)

@twmb merged commit f46bb17 into twmb:master on Feb 11, 2019
@twmb (Owner) commented Feb 11, 2019

@smira is there a way to get this to show up in a benchmark?

func BenchmarkNoescape32(b *testing.B) {
        var buf [8192]byte
        for i := 0; i < b.N; i++ {
                Sum32(buf[:])
        }
}

Shows no allocation both before and after for me.

@twmb (Owner) commented Feb 11, 2019

Scratch that,

func BenchmarkNoescape32(b *testing.B) {
        for i := 0; i < b.N; i++ {
                var buf [8192]byte
                Sum32(buf[:])
        }
}

shows the difference. I'll commit that benchmark (and the corresponding 128 one) once I'm on a network not blocking port 22. Thanks again!

@twmb (Owner) commented Feb 12, 2019

I've pushed the benchmark. I also reran comparisons against spaolacci/murmur3 and ended up removing the 32-bit assembly implementation altogether: the Go compiler has caught up and matches or beats my prior implementation, so there's no reason to keep it around. Lastly, this repo now supports Go modules (with a basic go.mod).

Thanks again for the reminder to touch up this repo.

@smira (Contributor, Author) commented Feb 12, 2019

Thanks a lot!

The last thing I had in mind, but haven't gotten around to implementing, is an optimized version of 128-bit murmur3 for when I have to hash two byte slices concatenated. If I actually concatenate them, that costs a memory copy; if I don't, I have to use the New128, Write, Sum interface, which doesn't have an assembly version. I haven't benchmarked this thoroughly, but it looks like the assembly version should be faster for the 128-bit case.

@twmb (Owner) commented Feb 12, 2019

So you were thinking something along the lines of...

ManySum128(data ...[]byte)

?
(and the corresponding string / 32-bit functions for parity)

@smira (Contributor, Author) commented Feb 13, 2019

yep, exactly.

I have two byte slices in my case, and I have to do:

h := New128()
h.Write(slice1)
h.Write(slice2)
s := h.Sum128()

This implementation is still Go-based, but the slices are somewhat large (several KB up to 1 MB); I don't know whether an assembly version would give enough of a performance increase to make it worth the effort. In the README I saw you expect a 1.12x speedup for larger byte sequences.

@twmb (Owner) commented May 1, 2019

Coming back to this: I considered pursuing it, but the use case is a little obscure to justify a one-off function, and being consistent with the existing functions would basically require adding 6 or 12 new functions.

@smira (Contributor, Author) commented May 6, 2019

yep, agreed. I think it would be too much work and ongoing support for such a use case. Thanks!
