
Emit GC metrics at flush time #144

Closed
wants to merge 2 commits into from

Conversation

@cory-stripe commented Mar 21, 2017 (Contributor)

Summary

Emit GC metrics at each remote metrics flush.

Adds:

  • A gauge veneur.mem.heap_alloc_bytes tracking the size of the allocated heap in bytes
  • A gauge veneur.gc.number tracking the number of GC runs
  • A gauge veneur.gc.pause_total_ns tracking the total duration of GC pauses
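For reference, all three values come straight out of `runtime.MemStats`. A minimal sketch of how the gauges could be computed at flush time (the `gcGauges` helper is hypothetical, not Veneur's actual flush code):

```go
package main

import (
	"fmt"
	"runtime"
)

// gcGauges reads the current runtime memory statistics and returns the
// three gauge values this PR emits. Note that before Go 1.9,
// runtime.ReadMemStats stopped the world (see the discussion in this thread).
func gcGauges() map[string]float64 {
	mem := &runtime.MemStats{}
	runtime.ReadMemStats(mem)
	return map[string]float64{
		"veneur.mem.heap_alloc_bytes": float64(mem.HeapAlloc),
		"veneur.gc.number":            float64(mem.NumGC),
		"veneur.gc.pause_total_ns":    float64(mem.PauseTotalNs),
	}
}

func main() {
	for name, v := range gcGauges() {
		fmt.Printf("%s %.0f\n", name, v)
	}
}
```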

Motivation

Now that we're using 1.9 RCs for our deploys, it's safe to merge this. In past versions Go would do a Stop-The-World collection when we asked for these stats. 1.9 fixes that.

We're adding this because, before now, it was difficult to measure the impact that changes to veneur's code had on production machines.

With these metrics we can determine whether changes, such as increasing ringbuffer sizes for traces or switching to the new HLL implementation in #190, affect either the amount of memory allocated or the amount of time spent GCing.

These metrics will be invaluable as Veneur grows!

Test plan

Just metric emission, so should be fine.

Rollout/monitoring/revert plan

Deploy to globals and monitor. Eventually merge to the fleet and let it naturally work its way out.

r? @aditya-stripe

@cory-stripe cory-stripe changed the title Emit GC metrics at flush time. Emit GC metrics at flush time Mar 21, 2017
@ChimeraCoder left a comment (Contributor)


One nit.

Besides that, my concern isn't that it'd block flush times; it's more that it'd impact metric collection times, since this performs a full dump of all goroutines. On the globalstats boxes this isn't a huge problem, but since Veneur submits metrics to itself (running in the same process), we could impact Veneur's ability to collect metrics about itself.

That said, it'd be pretty easy to test that out in QA and see if it'd be a problem.

Also, just in case you forgot, we aren't collecting this in Datadog, but we do have access to runtime debugging information on all globalstats boxes:

https://:8200/debug/pprof/heap?debug=1

flusher.go Outdated
@@ -274,6 +275,14 @@ func (s *Server) reportGlobalMetricsFlushCounts(ms metricsSummary) {
// flushRemote breaks up the final metrics into chunks
// (to avoid hitting the size cap) and POSTs them to the remote API
func (s *Server) flushRemote(finalMetrics []samplers.DDMetric) {

mem := new(runtime.MemStats)

nit: &runtime.MemStats{}

@gphat commented Mar 22, 2017 (Contributor)

After reading around — I was curious how you knew that! — I learned that currently ReadMemStats causes a full STW collection. Yuck!

Thankfully this was noticed and there's a patch in the current 1.9 work that brings this down to ~25µs which is 🎉.

The downside is that measuring this as it stands will cause the data to change. Since the call itself causes a full GC, we'll artificially influence the heap. Bleh. I'll skip this, but wanted it as a baseline for a patch I'll send tomorrow. Oh well! I'll rely on the benchmark output. :)

@cory-stripe cory-stripe deleted the cory-gc-metrics branch March 22, 2017 14:23
@cory-stripe cory-stripe restored the cory-gc-metrics branch June 23, 2017 20:20
@cory-stripe (Contributor, Author)

Reopening this so it's here when 1.9 ships in ~August…

@stripe-ci

Gerald Rule: Copy Observability on Veneur and Unilog pull requests

cc @stripe/observability
cc @stripe/observability-stripe

@cory-stripe cory-stripe assigned ghost Aug 7, 2017
@stripe-ci stripe-ci assigned cory-stripe and unassigned aditya-stripe and ghost Aug 7, 2017
@cory-stripe (Contributor, Author)

This got merged elsewhere.

5 participants