
[wip] start maintaining an internal count of time series produced per metric #246

Closed
wants to merge 7 commits

Conversation

Contributor

@kiran-stripe kiran-stripe commented Sep 7, 2017

Summary

Maintain an internal count of the time series produced per metric, which we then flush out to sinks.

Still to do:

  • Ensure that the memory consumption is within bounds (and tune the precision)
  • Determine the cardinality width of each type of metric and ensure that we account for this correctly
  • Determine whether we'd only want to flush the top k highest-cardinality metrics to Datadog

Motivation

This lays down the groundwork for a few features:

Test plan

  • Automated tests

Rollout/monitoring/revert plan

deploy veneur

@@ -72,6 +74,101 @@ type JSONMetric struct {
Value []byte `json:"value"`
}

const CardinalityCountName = "cardinality.count"
Contributor

I'd recommend adding comments explaining the difference here

// is unique across all types)
func (cc *CardinalityCount) Sample(name string, joinedTags string) {
if _, present := cc.MetricCardinality[name]; !present {
hll, _ := hyperloglog.NewPlus(18)
Contributor

can we extract 18 into a const? we can keep it unexported for now

// Export groups the CardinalityCount struct into sub structs, grouped by what
// their metric consistently hashes to
func (cc *CardinalityCount) ExportSets() ([]JSONMetric, error) {
jsonMetrics := make([]JSONMetric, 0, len(cc.MetricCardinality))
Contributor

Yes! Thank you for pre-make-ing! :D

jsonMetrics := make([]JSONMetric, 0, len(cc.MetricCardinality))
for metricName, hll := range cc.MetricCardinality {

buf := new(bytes.Buffer)
Contributor

nit: I'm not much of a fan of new, and it's generally avoided style-wise (for no particularly good reason) in Go projects.

In this case, the zero value would work if we then pass &buf everywhere; we could also do buf := &bytes.Buffer{}.
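For reference, a minimal sketch of the two alternatives mentioned above, both equivalent in behavior to new(bytes.Buffer):

package main

import (
    "bytes"
    "encoding/gob"
)

func main() {
    // Zero value: a bytes.Buffer is usable without initialization;
    // take its address wherever an io.Writer is needed.
    var buf bytes.Buffer
    _ = gob.NewEncoder(&buf)

    // Composite literal: the same thing, already a pointer.
    buf2 := &bytes.Buffer{}
    _ = gob.NewEncoder(buf2)
}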

err := encoder.Encode(map[string]*hyperloglog.HyperLogLogPlus{metricName: hll})
if err != nil {
// TODO (kiran): do we return an array of metrics, instead?
log.Printf("failed to export cardinality count for metric name:%s, error:%v", metricName, err)
Contributor

At the moment, we don't use logrus (and therefore our Sentry hooks) here. Ideally, this should go to Sentry, though that'll require a little more work to configure, since this is in a subpackage.

Contributor Author

from IRL: we're going to make a custom error type to return to the caller here, and have the caller emit to Sentry via logrus, instead
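For context, a rough sketch of that custom error type and the caller-side logging, assuming this lives in the samplers package (the field names and the caller snippet are illustrative, not the final implementation):

package samplers

import "fmt"

// CardinalityExportError wraps an export failure so the caller can decide
// how to report it (e.g. via logrus, which carries the Sentry hook).
type CardinalityExportError struct {
    MetricName string
    Err        error
}

func (e *CardinalityExportError) Error() string {
    return fmt.Sprintf("failed to export cardinality count for metric %q: %v", e.MetricName, e.Err)
}

// Caller-side sketch:
//   if _, err := cc.ExportSets(); err != nil {
//       log.WithError(err).Error("cardinality export failed")
//   }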

cc.MetricCardinality[name].Add(hasher)
}

// Export groups the CardinalityCount struct into sub structs, grouped by what
Contributor

comment about 'sub structs'


err := decoder.Decode(&decodedMap)
if err != nil {
panic(err)
Contributor

return instead of panic?

}
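A sketch of the suggested shape, with the decode error returned to the caller instead of panicking (the helper name and the hyperloglog import path are assumptions):

package samplers

import (
    "encoding/gob"
    "io"

    "github.com/clarkduvall/hyperloglog" // assumed import path for the HLL library
)

// decodeCardinality is a hypothetical helper: decode failures are returned
// so the caller can log or wrap them, rather than panicking inside the sampler.
func decodeCardinality(r io.Reader) (map[string]*hyperloglog.HyperLogLogPlus, error) {
    decodedMap := map[string]*hyperloglog.HyperLogLogPlus{}
    if err := gob.NewDecoder(r).Decode(&decodedMap); err != nil {
        return nil, err
    }
    return decodedMap, nil
}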

// // HLLs are approximate, and we've seen error of +-1 here in the past, so
// // we're giving the test some room for error to reduce flakes
Contributor

Can we extract the margin of error into a constant?

Contributor

You can use InDelta here!
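For reference, a quick sketch of how InDelta reads, with the margin pulled into a constant as suggested above (the names and values are placeholders):

package samplers

import (
    "testing"

    "github.com/stretchr/testify/assert"
)

// cardinalityErrorMargin is the error we tolerate from the approximate HLL count.
const cardinalityErrorMargin = 1.0

func TestCardinalityCountApprox(t *testing.T) {
    expected, actual := 100.0, 101.0 // placeholders for the real expected/observed counts
    // InDelta asserts |expected - actual| <= delta, which states the HLL's
    // margin of error directly instead of hand-rolled bounds checks.
    assert.InDelta(t, expected, actual, cardinalityErrorMargin, "cardinality count")
}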

worker_test.go
@@ -78,3 +80,25 @@ func TestWorkerImportHistogram(t *testing.T) {
wm := w.Flush()
assert.Len(t, wm.histograms, 1, "number of flushed histograms")
}

func TestWorkerImportCardinality(t *testing.T) {
w := NewWorker(1, nil, logrus.New())
Contributor

I think we can actually do log here instead of logrus.New(), because it will pick up the global one initialized in server.go

Contributor Author

ooh yes

}

func (e *CardinalityExportError) Error() string {
return "oh no"
Contributor

(reaction image)

for metricName, hll := range cc.MetricCardinality {

buf := &bytes.Buffer{}
encoder := gob.NewEncoder(buf)
Contributor

We recently had #220 show up which got us into a bit of trouble because the transmission of this sort of data isn't versioned or anything. I get a bit worried when I see new gob-encoded things. This is probably not actionable now but something we should consider for the future. Should we version the import endpoint or something?


// cardinalityPrecision is the precision of the HLL that stores cardinality for
// each metric name.
const cardinalityPrecision = 18
Contributor

Should this be configurable? #189 has pointed out that we seem to use an overly "big" HLL @ precision of 18.


// CardinalityCount is a sampler that records an approximate cardinality count
// by metric name.
type CardinalityCount struct {
Contributor

I think it's really cool that you implemented this as a sampler!

hll, _ := hyperloglog.NewPlus(cardinalityPrecision)
cc.MetricCardinality[name] = hll
}
hasher := fnv.New64a()
Contributor

The UDPMetric has a Digest wherein we've already computed the hash of tags + type + name. Rather than rehashing, you can just use that. Then your function can be Sample(name string, digest uint32) with the map keyed by the digest, and you save a hash operation.

}
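One possible reading of this suggestion, building on the CardinalityCount type and cardinalityPrecision constant already in this diff: keep the per-name map and feed the precomputed digest into the HLL. The Sum64 adapter assumes the HLL library's Add accepts anything with a Sum64() uint64 method, as the fnv hasher above does.

// hashDigest adapts an already-computed uint32 digest to the Hash64
// interface the HLL's Add expects.
type hashDigest uint32

func (h hashDigest) Sum64() uint64 { return uint64(h) }

// Sample records one observation, reusing the UDPMetric's precomputed digest
// instead of rehashing name + type + tags.
func (cc *CardinalityCount) Sample(name string, digest uint32) {
    if _, present := cc.MetricCardinality[name]; !present {
        hll, _ := hyperloglog.NewPlus(cardinalityPrecision)
        cc.MetricCardinality[name] = hll
    }
    cc.MetricCardinality[name].Add(hashDigest(digest))
}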

// Flush emits the names + cardinality of all the metrics in the map
// TODO (kiran): investigate whether we'd want to emit only the top k metrics
Contributor

Could we also emit a counter here based on len(cc.MetricCardinality) to get the total # of unique time series seen?
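A sketch of computing that total, as a hypothetical helper on the type already in this diff; emitting it as a counter next to the per-metric values would then be a one-liner in Flush:

// totalSeriesEstimate is a hypothetical helper: len(cc.MetricCardinality)
// gives the number of distinct metric names, while summing each HLL's Count()
// approximates the total number of distinct series across all names.
func (cc *CardinalityCount) totalSeriesEstimate() (names int, series uint64) {
    for _, hll := range cc.MetricCardinality {
        series += hll.Count()
    }
    return len(cc.MetricCardinality), series
}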

@@ -54,6 +57,7 @@ func NewWorkerMetrics() WorkerMetrics {
histograms: make(map[samplers.MetricKey]*samplers.Histo),
sets: make(map[samplers.MetricKey]*samplers.Set),
timers: make(map[samplers.MetricKey]*samplers.Histo),
cardinality: samplers.NewCardinalityCount(),
Contributor

For posterity maybe note that we can't use a map[MetricKey]HLL here because we need a per-name map not a per-name, per-unique-tag map.
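A possible doc comment for the field that captures this note (the wording is a suggestion; the field type is taken from the diff above):

// MetricCardinality maps metric name -> HLL of tag sets seen for that name.
// It is intentionally keyed by name rather than samplers.MetricKey: a
// MetricKey already folds in the joined tags, so keying by it would give one
// entry per unique tag set instead of one HLL per metric name.
MetricCardinality map[string]*hyperloglog.HyperLogLogPlus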

@kiran-stripe
Contributor Author

closing this for now -- I think I have a better solution down the road.
