Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a more performant Hyperloglog. #190

Merged
merged 3 commits into from
Oct 5, 2017

Conversation

martinpinto
Copy link
Contributor

Summary

Changed the current implementation of the Hyperloglog by Clark Duvall to the Axiom one.

Motivation

Based on the intent on https://news.ycombinator.com/item?id=14636699 I took a stab at using the axiomhq implementation to reduce resource usage.

Test plan

I adapted and used the existing samplers_test.go test case.

Rollout/monitoring/revert plan

A rollout should be fairly easy as changing the import and changing a few method calls.

@cory-stripe
Copy link
Contributor

Hey @martinpinto! We saw that new library as well and were looking forward to trying this out. You beat us to it!

Due to to it being a holiday week we might be a bit slow to evaluate this, but I will try and do so soon as I'm trying to burn down our PR/issue list this week. :)

Thanks!

@stripe-ci
Copy link

Gerald Rule: Copy Observability on Veneur and Unilog pull requests

cc @stripe/observability
cc @stripe/observability-stripe

@martinpinto
Copy link
Contributor Author

sure :) no stress! happy to be able to contribute :)

@cory-stripe
Copy link
Contributor

Great! I'm winding up to do some profiling on my end for this and #189.

@cory-stripe
Copy link
Contributor

cory-stripe commented Jul 5, 2017

I did some benchmarking this morning for this and #189 since they are related. The micro benchmark made a single Set sampler and added the current iteration integer as a string. Since b.N grows and go "figured it out", I'm assuming this is an ok test even if it also tracks the time spent in string conversion. :)

func BenchmarkSet(b *testing.B) {

	s := NewSet("ab.b.c", nil)
	for n := 0; n < b.N; n++ {
		s.Sample(strconv.Itoa(n), 1.0)
	}
	_ = s.Flush()
}

Here are the results. I ran each 3 times so I could average them. That's probably not necessary considering how go handles benchmarks, but whatevs:

ClarkDuvall, 18 (current implementation, precision):

BenchmarkSet-8   	 5000000	       211 ns/op	      30 B/op	       3 allocs/op
BenchmarkSet-8   	 5000000	       213 ns/op	      30 B/op	       3 allocs/op
BenchmarkSet-8   	 5000000	       209 ns/op	      30 B/op	       3 allocs/op

ClarkDuvall, 14 (current implementation, lower precision suggested by @seiflotfy):

BenchmarkSet-8   	10000000	       114 ns/op	      24 B/op	       3 allocs/op
BenchmarkSet-8   	10000000	       117 ns/op	      24 B/op	       3 allocs/op
BenchmarkSet-8   	10000000	       119 ns/op	      24 B/op	       3 allocs/op

Axiom, 18 (new implementation, same precision)

BenchmarkSet-8   	10000000	       123 ns/op	      17 B/op	       2 allocs/op
BenchmarkSet-8   	10000000	       124 ns/op	      17 B/op	       2 allocs/op
BenchmarkSet-8   	10000000	       122 ns/op	      17 B/op	       2 allocs/op

Axiom, 14 (new implementation, new precision)

BenchmarkSet-8   	20000000	        91.9 ns/op	      16 B/op	       2 allocs/op
BenchmarkSet-8   	20000000	        88.4 ns/op	      16 B/op	       2 allocs/op
BenchmarkSet-8   	20000000	        93.9 ns/op	      16 B/op	       2 allocs/op

Assuming that the results of each of these implementation is Good Enough™ it's pretty clear that the new implementation + new precision are a net win showing a 47% reduction in B/op and 57% reduction in ns/op.

I will touch base with @aditya-stripe in the next few days and get his feels on this, but it seems like a win to me!

@cory-stripe
Copy link
Contributor

It's also worth nothing that this PR removes the use of the fnv hash in Set's Sample method. The existing implementation expects a hashed value to be passed in whereas the axiom implementation hashes for us.

@seiflotfy
Copy link
Contributor

Any updates?

@cory-stripe
Copy link
Contributor

Nope! Pretty sure we'll merge this soon but we're a bit busy this week with some travel and team stuff. Sorry for the delay! I don't think we'll need anything extra except maybe a rebase in the future? :)

@martinpinto
Copy link
Contributor Author

Yeah no problem! We can do that. :)

@cory-stripe
Copy link
Contributor

I'm doing a bunch of merges this morning, so I'll ping this when I'm done.

Also, you can give me permission to rebase this myself!

@martinpinto
Copy link
Contributor Author

Allow edits from maintainers is checked. Feel free :)

@cory-stripe
Copy link
Contributor

Perfect, thanks!

@seiflotfy seiflotfy force-pushed the new_hyperloglog branch 2 times, most recently from 585fd77 to 267a653 Compare July 13, 2017 07:35
@cory-stripe
Copy link
Contributor

Update!

Last week we decided that this PR would be held out while we merged the release we were already working on. That release — something we'd been working on for internal use — consumed most of our time.

This week part of the team is OoO so this is most likely something we'll target for next week.

@seiflotfy
Copy link
Contributor

any update?

@ChimeraCoder
Copy link
Contributor

ChimeraCoder commented Jul 28, 2017

Hey @seiflotfy - thanks for following up! We're back fully in the office this week, so just having a chance to take a look now.

This looks good to us, but it'd be nice for us to have some solid benchmarks (before/after) based on the actual runtime performance. We tried adding that a few months ago, but it turned out that we couldn't do this in 1.8; 1.9 includes a patch that fixes the problem. So we have a PR that gives us the GC metrics, and it's just waiting on 1.9 to hit stable. We also have a number of other changes regarding memory usage that are also waiting on that.

Since this isn't urgent, and 1.9 is only a week or two away, we'd prefer to hold off until after 1.9 is released and #144 is merged. The upside is that, by waiting a few weeks, we'll actually be able to do a proper blue/green test of this change and measure its effects on our production data, with a nice before-and-after graph to accompany it! :)

@seiflotfy
Copy link
Contributor

seiflotfy commented Jul 28, 2017 via email

@martinpinto
Copy link
Contributor Author

Same here, would love to help out :)

@ChimeraCoder
Copy link
Contributor

Short of anything that would get the 1.9 stable release out faster, not really! ;)

We also have some August OoOs on the team, so it may not be immediate, but we've got this bookmarked and are excited to try it out. Stay tuned!

@cory-stripe
Copy link
Contributor

Hey folks!

Just a note that we're now rolling out 1.9 in our local tests, and will be able to start looking at the stats + this merge in the near term. Thanks for your patience!

@gphat
Copy link
Contributor

gphat commented Aug 18, 2017

Note that our GC metrics have been merged, so we should be able to get to this soon.

@ChimeraCoder
Copy link
Contributor

@asf-stripe is the release captain for v1.7: Can you make sure we include this? Keep in mind that we'll need to rebase, and since this is before we migrated to dep, we'll need to incorporate that into the rebase here as well. (Basically, the resultant diff shouldn't contain a vendor.json file)

@asf-stripe
Copy link
Contributor

I just rebased using dep, will merge when tests pass (they do locally)

Copy link
Contributor

@asf-stripe asf-stripe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@asf-stripe asf-stripe merged commit 3364544 into stripe:master Oct 5, 2017
@asf-stripe
Copy link
Contributor

Thank you so much for your contribution, @martinpinto!! This will be in 1.7 - I'll copy you on a pull request to update the CHANGELOG (:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants