*: support authentication and TLS for Alertmanager #1838
Awesome, very nice, it would be nice to add some e2e tests TBH, but it looks good from static review (:
Force-pushed from c76a305 to 358fdcf.
@bwplotka it's ready now.
Nice! It looks great, but I have some suggestions. I think we improved the single `alertmanagers.urls` flag along the way, which is nice. (: Thanks!
I think all of those are minor suggestions and this is generally good!
cmd/thanos/rule.go
	return err
}
var (
	alertingcfg alert.AlertingConfig
I think we stick to `camelCase` here, but it's not a big deal, it looks readable (:
done
pkg/alert/alert.go
	"msg", "sending alerts failed",
	"alertmanager", u.Host,
	"numAlerts", len(alerts),
	"err", err)
I think `)` should be on the next line in terms of formatting (:
done
pkg/alert/alert.go
@@ -248,12 +238,15 @@ func (q *Queue) Push(alerts []*Alert) {
	}
}

type AlertmanagerDoer interface {
Everywhere else we refer to this as `Client`. Should we rename it here as well? (:
👍
pkg/alert/alert.go
@@ -308,7 +294,7 @@ func NewSender(
	return s
}

-// Send an alert batch to all given Alertmanager URLs.
+// Send an alert batch to all given Alertmanager client.
-// Send an alert batch to all given Alertmanager client.
+// Send an alert batch to all given Alertmanager clients.
done
docs/components/rule.md
### Alertmanager

The configuration format supported by the `--alertmanagers.config` and `--alertmanagers.config-file` flags is the following:
Can we mention something like:
The configuration allows specifying multiple Alertmanagers. Those entries are treated as a single HA group. This means that an alert send failure is reported only if the Ruler fails to send to all instances.
I think we might be missing this, as users could try to use it in a different way (sharding alerts, fanout, etc.).
done
test/e2e/rule_test.go
r := rule(a.New(), a.New(), rulesDir, amCfg, []address{qAddr}, nil)
q := querier(qAddr, a.New(), []address{r.GRPC}, nil)

ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute)
`1*time.Minute`: this might not be enough for our sometimes slow CI, let's make it 3m.
done
still 1m?
right! it should be ok now.
test/e2e/rule_test.go
}))

// Update the Alertmanager file service discovery configuration.
writeAlertmanagerFileSD(t, filepath.Join(amDir, "targets.yaml"), am.HTTP.HostPort())
-writeAlertmanagerFileSD(t, filepath.Join(amDir, "targets.yaml"), am.HTTP.HostPort())
+writeRulerAlertmanagerFileSD(t, filepath.Join(amDir, "targets.yaml"), am.HTTP.HostPort())
test/e2e/rule_test.go
	return nil
}))

// Update the Alertmanager file service discovery configuration.
It sounds like we are updating some Alertmanager file SD, not the Ruler's file SD for Alertmanager (: Can we clarify a bit?
I've removed `writeAlertmanagerFileSD`, which wasn't really needed since it was only called once. Hopefully it's clearer now.
test/e2e/rule_test.go
	<-exit
}()

// Wait for a couple of evaluations.
Can we comment on what we are waiting for?
done
func TestRuleAlertmanagerFileSD(t *testing.T) {
	a := newLocalAddresser()

	am := alertManager(a.New())
What do you think about this versus using an Alertmanager mock? I like the e2e compatibility check against Alertmanager. I guess it would be too hard to use a proper Alertmanager in `TestRuleAlertmanagerHTTPClient` as well? (:
I went with a "fake" Alertmanager for `TestRuleAlertmanagerHTTPClient` because Alertmanager doesn't support TLS and authentication natively, so we would have to deploy something else in front of it. Since the other tests still exercise the "real" Alertmanager API, I felt that it was worth the trade-off.
Side-note: with the Alertmanager v2 API and its Open API specification, it's even less needed to run a "real" Alertmanager server as you can generate the server code and probably hook into it from the e2e tests.
This makes total sense, worth a comment maybe? (:
> Side-note: with the Alertmanager v2 API and its Open API specification, it's even less needed to run a "real" Alertmanager server as you can generate the server code and probably hook into it from the e2e tests.
From an API perspective yes (methods, required parameters, etc.), but it's always nice to have e2e tests against the actual implementation. This is useful to check against hidden invariants etc.
Agreed. It would be worth revisiting this part when we add support for Alertmanager API v2. I quickly tried to hack something together with `httputil.ReverseProxy`, but I fell short.
Comment added.
Force-pushed from 8150c9f to 69083ff.
Awesome! Looks like `provider`, some docs, and the test timeout are the only things left to address (:
LGTM otherwise.
// TODO(simonpasquier): add support for API version (v1 or v2).
type AlertmanagerConfig struct {
	// HTTP client configuration.
	HTTPClientConfig HTTPClientConfig `yaml:"http_config"`
Fair, it's just a bit more work, but I'm happy with this.
This change adds support for authentication with basic auth, client certificates and bearer tokens. It also makes it possible to configure TLS settings for the Alertmanager endpoints. Most of the work leverages the existing Prometheus configuration format and code. In particular, TLS certificate files are automatically reloaded whenever they change. Signed-off-by: Simon Pasquier <[email protected]>
…re both defined Signed-off-by: Simon Pasquier <[email protected]>
Force-pushed from 69083ff to f467acf.
Rdy for review? (:
yep
🚄 Let's go! LGTM, thanks for this good work @simonpasquier
Thanks a lot for the speedy reviews!
var userAgent = fmt.Sprintf("Thanos/%s", version.Version)

type AlertingConfig struct {
	Alertmanagers []AlertmanagerConfig `yaml:"alertmanagers"`
Just thinking... It might be nice to also use `alert_relabel_configs` instead of the `--alert.label-drop` flag? That would give users more flexibility.
Also, what about adding external labels to the config instead of the `--label` flag?
Not sure how far we want to take this configuration. @bwplotka WDYT?
(Sorry for the late comments, I had no time lately.)
Closes #606
Changes
This change adds support for authentication with basic auth, client
certificates and bearer tokens. It also makes it possible to configure TLS
settings for the Alertmanager endpoints.
Most of the work leverages the existing Prometheus configuration format
and code. In particular TLS certificate files are automatically reloaded
whenever they change.
Verification
End-to-end tests added to cover various HTTP client configurations (TLS, authentication) and file SD integration.