*: support authentication and TLS for Alertmanager #1838

simonpasquier · 2019-12-04T16:27:11Z

Closes #606

I added CHANGELOG entry for this change.
Change is not relevant to the end user.

Changes

This change adds support for authentication with basic auth, client
certificates and bearer tokens. It also makes the TLS settings for the
Alertmanager endpoints configurable.

Most of the work leverages the existing Prometheus configuration format
and code. In particular TLS certificate files are automatically reloaded
whenever they change.
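
As a rough illustration of what bearer-token authentication combined with custom TLS settings means on the client side, here is a minimal Go sketch using only the standard library. It is not the code from this PR, which leverages the existing Prometheus configuration format and code; `bearerTransport`, `newClient` and `observedAuthHeader` are hypothetical names, and the request path is only illustrative.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// bearerTransport is a hypothetical RoundTripper that adds a bearer token
// to every outgoing request before delegating to the wrapped transport.
type bearerTransport struct {
	token string
	next  http.RoundTripper
}

func (t bearerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// Clone the request: a RoundTripper should not mutate the caller's request.
	r := req.Clone(req.Context())
	r.Header.Set("Authorization", "Bearer "+t.token)
	return t.next.RoundTrip(r)
}

// newClient builds an HTTP client with the given TLS settings and bearer token.
func newClient(tlsCfg *tls.Config, token string) *http.Client {
	return &http.Client{
		Transport: bearerTransport{
			token: token,
			next:  &http.Transport{TLSClientConfig: tlsCfg},
		},
	}
}

// observedAuthHeader starts a throwaway TLS server, posts an empty alert
// batch to it and returns the Authorization header the server received.
func observedAuthHeader(token string) (string, error) {
	var got string
	srv := httptest.NewTLSServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got = r.Header.Get("Authorization")
	}))
	defer srv.Close()

	// Trust only the test server's certificate.
	pool := x509.NewCertPool()
	pool.AddCert(srv.Certificate())

	client := newClient(&tls.Config{RootCAs: pool}, token)
	resp, err := client.Post(srv.URL+"/api/v1/alerts", "application/json", strings.NewReader("[]"))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return got, nil
}

func main() {
	header, err := observedAuthHeader("s3cr3t")
	fmt.Println(header, err)
}
```

Wrapping the transport keeps authentication separate from the TLS settings, so either can be changed without touching the other.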

Verification

End-to-end tests added to cover various HTTP client configurations (TLS, authentication) and file SD integration.

bwplotka

Awesome, very nice, it would be nice to add some e2e tests TBH, but it looks good from static review (:

cmd/thanos/rule.go

docs/components/rule.md

scripts/cfggen/main.go

simonpasquier · 2019-12-11T09:09:47Z

@bwplotka it's ready now.

bwplotka

Nice! It looks great, but I have some suggestions. I think we improved the single alertmanagers.urls flag along the way, which is nice. (: Thanks!

I think all of those are minor suggestions and this is generally good!

bwplotka · 2019-12-16T09:47:49Z

cmd/thanos/rule.go

+		return err
+	}
+	var (
+		alertingcfg alert.AlertingConfig


I think we stick to camelCase here, but not a big deal, looks readable (:

bwplotka · 2019-12-16T10:49:42Z

pkg/alert/alert.go

+						"msg", "sending alerts failed",
+						"alertmanager", u.Host,
+						"numAlerts", len(alerts),
+						"err", err)


) should be in next line I think in terms of formatting (:

bwplotka · 2019-12-16T10:50:32Z

pkg/alert/alert.go

@@ -248,12 +238,15 @@ func (q *Queue) Push(alerts []*Alert) {
 	}
 }

+type AlertmanagerDoer interface {


Everywhere else we refer to this as Client. Should we rename it here as well? (:

bwplotka · 2019-12-16T10:51:54Z

pkg/alert/alert.go

@@ -308,7 +294,7 @@ func NewSender(
 	return s
 }

-// Send an alert batch to all given Alertmanager URLs.
+// Send an alert batch to all given Alertmanager client.


Suggested change

-// Send an alert batch to all given Alertmanager client.
+// Send an alert batch to all given Alertmanager clients.

bwplotka · 2019-12-16T10:55:50Z

docs/components/rule.md

+
+### Alertmanager
+
+The configuration format supported by the `--alertmanagers.config` and `--alertmanagers.config-file` flags is the following:


Can we mention something like:

The configuration allows specifying multiple Alertmanagers. Those entries are treated as a single HA group. This means that alert send failure is claimed only if Ruler fails to send to all instances.

I think we might be missing this as users could use it in a different way (sharding alerts, fanout etc)

bwplotka · 2019-12-16T12:39:58Z

test/e2e/rule_test.go

+	r := rule(a.New(), a.New(), rulesDir, amCfg, []address{qAddr}, nil)
+	q := querier(qAddr, a.New(), []address{r.GRPC}, nil)
+
+	ctx, cancel := context.WithTimeout(context.Background(), 1*time.Minute)


1*time.Minute might not be enough for our sometimes slow CI, let's make it 3m.

right! it should be ok now.

bwplotka · 2019-12-16T12:43:01Z

test/e2e/rule_test.go

+	}))
+
+	// Update the Alertmanager file service discovery configuration.
+	writeAlertmanagerFileSD(t, filepath.Join(amDir, "targets.yaml"), am.HTTP.HostPort())


Suggested change

-writeAlertmanagerFileSD(t, filepath.Join(amDir, "targets.yaml"), am.HTTP.HostPort())
+writeRulerAlertmanagerFileSD(t, filepath.Join(amDir, "targets.yaml"), am.HTTP.HostPort())

bwplotka · 2019-12-16T12:43:41Z

test/e2e/rule_test.go

+		return nil
+	}))
+
+	// Update the Alertmanager file service discovery configuration.


It sounds like we are updating some Alertmanager file SD, not the Ruler's file SD for Alertmanager (: Can we clarify a bit?

I've removed writeAlertmanagerFileSD which wasn't really needed since it was only called once. Hopefully it's clearer now.

bwplotka · 2019-12-16T12:45:19Z

test/e2e/rule_test.go

+		<-exit
+	}()
+
+	// Wait for a couple of evaluations.


Can we comment on what we are waiting for?

bwplotka · 2019-12-16T12:49:52Z

test/e2e/rule_test.go

+func TestRuleAlertmanagerFileSD(t *testing.T) {
+	a := newLocalAddresser()
+
+	am := alertManager(a.New())


What do you think about this and using alertmanager Mock? I like e2e compatibility check against Alertmanager. I guess it would be too hard to use proper alertmanager in TestRuleAlertmanagerHTTPClient as well? (:

I went with a "fake" Alertmanager for TestRuleAlertmanagerHTTPClient because Alertmanager doesn't support TLS and authentication natively so we would have to deploy something else in front of it. Since the other tests still exercise the "real" Alertmanager API, I felt that it was worth the trade off.

Side-note: with the Alertmanager v2 API and its Open API specification, it's even less needed to run a "real" Alertmanager server as you can generate the server code and probably hook into it from the e2e tests.

This makes sense totally, worth to comment maybe? (:

Side-note: with the Alertmanager v2 API and its Open API specification, it's even less needed to run a "real" Alertmanager server as you can generate the server code and probably hook into it from the e2e tests.

From an API perspective yes (methods, required parameters etc.), but it's always nice to have e2e tests against the actual implementation. This is useful to check against hidden invariants etc.

Agreed. It would be worth revisiting this part when we add support for Alertmanager API v2. I quickly tried to hack something together with httputil.ReverseProxy but I fell short.

Comment added.
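
For readers wondering what such a "fake" Alertmanager might look like, here is a minimal sketch built on `net/http/httptest`. It is an assumption-laden illustration rather than the test code from this PR: `fakeAlertmanager`, `post` and `runDemo` are hypothetical names, the credentials are made up, and the handler accepts any path.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
)

// fakeAlertmanager is a hypothetical test double. It accepts requests only
// when they carry the expected basic-auth credentials and counts them.
type fakeAlertmanager struct {
	accepted   int64
	user, pass string
}

func (f *fakeAlertmanager) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	u, p, ok := r.BasicAuth()
	if !ok || u != f.user || p != f.pass {
		w.WriteHeader(http.StatusUnauthorized)
		return
	}
	atomic.AddInt64(&f.accepted, 1)
	w.WriteHeader(http.StatusOK)
}

// post sends an empty alert batch over TLS with the given credentials and
// returns the HTTP status code.
func post(srv *httptest.Server, user, pass string) (int, error) {
	req, err := http.NewRequest(http.MethodPost, srv.URL+"/api/v1/alerts", strings.NewReader("[]"))
	if err != nil {
		return 0, err
	}
	req.SetBasicAuth(user, pass)
	// srv.Client() is preconfigured to trust the test server's certificate.
	resp, err := srv.Client().Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// runDemo posts once with valid and once with invalid credentials.
func runDemo() (good, bad int, accepted int64, err error) {
	am := &fakeAlertmanager{user: "thanos", pass: "secret"}
	srv := httptest.NewTLSServer(am)
	defer srv.Close()

	if good, err = post(srv, "thanos", "secret"); err != nil {
		return
	}
	if bad, err = post(srv, "thanos", "wrong"); err != nil {
		return
	}
	return good, bad, atomic.LoadInt64(&am.accepted), nil
}

func main() {
	fmt.Println(runDemo())
}
```

A double like this exercises the TLS and authentication paths of the client, while the other tests keep covering the real Alertmanager API.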

bwplotka

Awesome! Looks like provider and some docs and test timeout are the only things to address (:

LGTM otherwise.

bwplotka · 2019-12-16T19:08:45Z

pkg/alert/client.go

+// TODO(simonpasquier): add support for API version (v1 or v2).
+type AlertmanagerConfig struct {
+	// HTTP client configuration.
+	HTTPClientConfig HTTPClientConfig `yaml:"http_config"`


Fair, it's just a bit more work, but I'm happy with this.

pkg/alert/client_test.go

bwplotka · 2019-12-18T14:56:13Z

Rdy for review? (:

simonpasquier · 2019-12-18T15:05:17Z

yep

bwplotka

🚄 Let's go! LGTM, thanks for this good work @simonpasquier

simonpasquier · 2019-12-18T16:33:10Z

Thanks a lot for the speedy reviews!

FUSAKLA · 2020-01-14T00:19:01Z

pkg/alert/client.go

+var userAgent = fmt.Sprintf("Thanos/%s", version.Version)
+
+type AlertingConfig struct {
+	Alertmanagers []AlertmanagerConfig `yaml:"alertmanagers"`


Just thinking... It might be nice to also use alert_relabel_configs instead of --alert.label-drop?
That would give users more flexibility.
Also, what about adding external labels to the config instead of the --label flag?
Not sure how far we want to take this configuration. @bwplotka WDYT?
(Sorry for the late comments, I had no time lately.)

bwplotka reviewed Dec 4, 2019

simonpasquier force-pushed the tls-and-auth-for-alertmanager branch 2 times, most recently from c76a305 to 358fdcf Compare December 11, 2019 08:16

simonpasquier marked this pull request as ready for review December 11, 2019 09:07

bwplotka reviewed Dec 16, 2019

bwplotka mentioned this pull request Dec 16, 2019

packaging alertmanager sets and add a tests. #1894

Closed

simonpasquier force-pushed the tls-and-auth-for-alertmanager branch 2 times, most recently from 8150c9f to 69083ff Compare December 16, 2019 16:42

bwplotka reviewed Dec 17, 2019

simonpasquier added 10 commits December 17, 2019 15:47

Fail hard when --alertmanagers.url and --alertmanagers.config flags are both defined (5673d74)
Update CHANGELOG.md (a245cc2)
Move tests from cmd/thanos to pkg/alert (282681e)
Add end-to-end for Alertmanager file SD (16f1858)
test/e2e: add test with different alerting HTTP clients (351841d)
Fix panic in pkg/alert/client_test.go (075283c)
Address Bartek's comments (65f00de)
Re-use dns.Provider for resolving Alertmanager addresses (07f711b)
update documentation (f467acf)

Each commit is signed off by Simon Pasquier.

simonpasquier force-pushed the tls-and-auth-for-alertmanager branch from 69083ff to f467acf Compare December 18, 2019 07:41

bwplotka approved these changes Dec 18, 2019

bwplotka merged commit 56abeab into thanos-io:master Dec 18, 2019

simonpasquier deleted the tls-and-auth-for-alertmanager branch December 18, 2019 16:33

simonpasquier mentioned this pull request Dec 19, 2019

cmd/thanos/rule: remove unused metric #1912

Merged

simonpasquier mentioned this pull request Jan 6, 2020

*: support TLS and authentication for Thanos Ruler queries #1939

Merged

2 tasks

FUSAKLA reviewed Jan 14, 2020

pgier mentioned this pull request Jan 21, 2020

add thanos-ruler operator prometheus-operator/prometheus-operator#2943

Merged
