-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify storage resolution #125
Conversation
If metrics are retrieved at a higher resolution than 1min we should store them at a higher resolution
backend/cloudwatch.go
Outdated
@@ -62,6 +62,7 @@ func (cb *CloudWatchBackend) Collect(r *collector.Result) error { | |||
|
|||
svc := cloudwatch.New(sess) | |||
metrics := []*cloudwatch.MetricDatum{} | |||
duration := r.PollDuration.Milliseconds() / 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the API has the ability to change the PollDuration dynamically with a response header:
buildkite-agent-metrics/collector/collector.go
Lines 152 to 159 in a0e44e3
if pollSeconds := res.Header.Get(PollDurationHeader); pollSeconds != "" { | |
pollSecondsInt, err := strconv.ParseInt(pollSeconds, 10, 64) | |
if err != nil { | |
log.Printf("Failed to parse %s header: %v", PollDurationHeader, err) | |
} else { | |
result.PollDuration = time.Duration(pollSecondsInt) * time.Second | |
} | |
} |
buildkite-agent-metrics/main.go
Lines 122 to 164 in a0e44e3
f := func() (time.Duration, error) { | |
t := time.Now() | |
result, err := c.Collect() | |
if err != nil { | |
return time.Duration(0), err | |
} | |
if !*dryRun { | |
err = bk.Collect(result) | |
if err != nil { | |
return time.Duration(0), err | |
} | |
} | |
log.Printf("Finished in %s", time.Now().Sub(t)) | |
return result.PollDuration, nil | |
} | |
minPollDuration, err := f() | |
if err != nil { | |
fmt.Println(err) | |
} | |
if *interval > 0 { | |
for { | |
waitTime := *interval | |
// Respect the min poll duration returned by the API | |
if *interval < minPollDuration { | |
log.Printf("Increasing poll duration based on rate-limit headers") | |
waitTime = minPollDuration | |
} | |
log.Printf("Waiting for %v (minimum of %v)", waitTime, minPollDuration) | |
time.Sleep(waitTime) | |
minPollDuration, err = f() | |
if err != nil { | |
fmt.Println(err) | |
} | |
} | |
} |
Though the API's source code seems to have it hard-coded to 10s.
This would mean that everyone using the cloudwatch backend would suddenly be using the high resolution storage if this were merged, and they started using it.
I think we should make the high resolution storage optional for this reason. Perhaps we can make the default to use the high resolution storage but allow customers to turn it off (and poll less frequently) if it costs them more than they expected.
I don't think we should increase the hard-coded 10s value in the API. Presumably it is more relevant to other backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh weird, we do send an interval along in the user agent, but the API ignores it?
…ariable The API poll duration is just a lower bounds
Avoid accidentally incuring more charges unless it's explicitly enabled
@@ -146,7 +149,7 @@ func Handler(ctx context.Context, evt json.RawMessage) (string, error) { | |||
if err != nil { | |||
return "", err | |||
} | |||
metricsBackend = backend.NewCloudWatchBackend(awsRegion, dimensions) | |||
metricsBackend = backend.NewCloudWatchBackend(awsRegion, dimensions, int64(time.Since(lastPollTime).Seconds()), enableHighResolution) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first time this runs lastPollTime will be 0 so this will be epoch seconds (i.e. 1720764897). But since we only care if it's less than or more than 60 seconds then that's inconsequential. The first metric we send within this lambda context won't be high frequency, but that should be OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻 🚀
If metrics are retrieved at a higher resolution than 1min we should store them at a higher resolution in CloudWatch