[new feature] promtail: Add config reload endpoint / signal to promtail #7247

liguozhong · 2022-09-26T13:10:58Z

What this PR does / why we need it:
Add config reload endpoint / signal to promtail.

Reload is a dangerous feature: it can easily make promtail panic. The failure history of another log agent is a useful reference:

● vectordotdev/vector#10485
● vectordotdev/vector#10412
● vectordotdev/vector#13228

But "/reload" makes promtail more flexible, makes Loki easier to use with k8s, and provides a better foundation for a log tail CRD (like the Prometheus ServiceMonitor).

config

server:
  enable_runtime_reload: true

Detailed config

server:
  http_listen_port: 9080
  grpc_listen_port: 0
  enable_runtime_reload: true

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
- job_name: system
  static_configs:
  - targets:
      - localhost
    labels:
      job: varlogs-reload-promtail-test
      __path__: /var/log/cloudprintupdate.log

Which issue(s) this PR fixes:
Fixes #16

Special notes for your reviewer:
Old config

curl "/reload"

New config: the label "job: varlogs-reload-promtail-test" has changed

Checklist

Reviewed the CONTRIBUTING.md guide
Documentation added
Tests updated
CHANGELOG.md updated
Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

trevorwhitney

Thanks for this addition. I think this is a good idea for promtail, I just have a couple thoughts and questions.

trevorwhitney · 2022-09-26T17:57:12Z

clients/pkg/promtail/promtail.go

+	return promtail, nil
+}
+
+func (p *Promtail) reloadConfig(cfg config.Config) error {


Why is this not a pointer to config.Config? It seems like we're mutating it within this function (i.e. cfg.PositionsConfig.ReadOnly).

Thank you for such fast feedback. Done.
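The point behind this comment can be shown with a small sketch. `Config` here is a hypothetical one-field stand-in for config.Config, not the real struct: a struct passed by value is copied, so a mutation inside the function is lost, while a pointer lets the caller see it.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for config.Config.
type Config struct {
	ReadOnly bool
}

// setReadOnlyByValue mutates a copy; the caller never sees the change.
func setReadOnlyByValue(cfg Config) { cfg.ReadOnly = true }

// setReadOnlyByPointer mutates the caller's Config.
func setReadOnlyByPointer(cfg *Config) { cfg.ReadOnly = true }

func main() {
	var a, b Config
	setReadOnlyByValue(a)
	setReadOnlyByPointer(&b)
	fmt.Println(a.ReadOnly, b.ReadOnly) // prints: false true
}
```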

trevorwhitney · 2022-09-26T17:59:38Z

clients/pkg/promtail/promtail.go

+	promtailServer := p.server.(*server.PromtailServer)
+	for {
+		select {
+		case <-hup:


sorry, not super familiar with these live reload patterns, can you explain the SIGHUP flow?

I referred to the Prometheus reload documentation; we should provide this reload method as well.

https://prometheus.io/docs/prometheus/latest/configuration/configuration/
A configuration reload is triggered by sending a SIGHUP.

trevorwhitney · 2022-09-26T18:01:47Z

clients/pkg/promtail/promtail_test.go

@@ -693,7 +693,7 @@ func Test_DryRun(t *testing.T) {
 			PositionsFile: f.Name(),
 			SyncPeriod:    time.Second,
 		},
-	}, clientMetrics, false, nil)
+	}, nil, clientMetrics, false, nil)


Do we want a test that this new configReload function is called? Not sure if we need to go so far as the whole config gets reloaded, but maybe at least the function is called, and test the flow where watch config is disabled when the function is not provided?

done.
Hi, your review suggestion was great: through this test I found a bug where "PromtailServer.promtailCfg" was not updated correctly.

dannykopping

Thanks for the nice feature @liguozhong!

This is a very delicate feature which has to be handled and communicated very carefully. I've left a bunch of small nits relating to wording, and identified a couple areas that I'm a bit concerned about - most notably the lack of sad-day tests and the behaviour of invalid configs exiting the process on reload.

clients/pkg/promtail/promtail.go

dannykopping · 2022-09-27T07:22:46Z

clients/pkg/promtail/promtail.go

+		level.Warn(p.logger).Log("msg", "disable watchConfig")
+		return
+	}
+	level.Warn(p.logger).Log("msg", "enable watchConfig")


These log messages should be more clear, please.

dannykopping · 2022-09-27T07:23:30Z

clients/pkg/promtail/promtail.go

+			cfg := p.newConfig()
+			if err := p.reloadConfig(cfg); err != nil {
+				level.Error(p.logger).Log("msg", "Error reloading config", "err", err)
+			}
+		case rc := <-promtailServer.Reload():
+			cfg := p.newConfig()
+			if err := p.reloadConfig(cfg); err != nil {
+				level.Error(p.logger).Log("msg", "Error reloading config", "err", err)


I would prefer if we centralised this logic rather than repeating it

dannykopping · 2022-09-27T07:24:26Z

clients/pkg/promtail/server/server.go

@@ -51,6 +54,8 @@ type Config struct {
 	ExternalURL       string `yaml:"external_url"`
 	HealthCheckTarget *bool  `yaml:"health_check_target"`
 	Disable           bool   `yaml:"disable"`
+	Reload            bool   `yaml:"reload"`


We should consider naming this more clearly.

I would suggest enable_runtime_reload

dannykopping · 2022-09-27T07:27:51Z

clients/pkg/promtail/promtail.go

+		return nil, err
+	}
+	server, err := server.New(cfg.ServerConfig, promtail.logger, promtail.targetManagers, cfg.String())
+	if err != nil {
+		return nil, err


Please wrap these errors

dannykopping · 2022-09-27T07:31:27Z

clients/pkg/promtail/server/server.go

@@ -51,6 +54,8 @@ type Config struct {
 	ExternalURL       string `yaml:"external_url"`
 	HealthCheckTarget *bool  `yaml:"health_check_target"`
 	Disable           bool   `yaml:"disable"`
+	Reload            bool   `yaml:"reload"`
+	NewByReload       bool


What is this for?

This is redundant: an unused variable left over from my development process.

dannykopping · 2022-09-27T07:31:42Z

clients/pkg/promtail/server/server.go

@@ -60,6 +65,7 @@ func (cfg *Config) RegisterFlagsWithPrefix(prefix string, f *flag.FlagSet) {
 	cfg.Config.RegisterFlags(f)

 	f.BoolVar(&cfg.Disable, prefix+"server.disable", false, "Disable the http and grpc server.")
+	f.BoolVar(&cfg.Reload, prefix+"server.reload", false, "Enable reload via HTTP request.")


It can also be reloaded via SIGHUP, right?

As far as I know, reload via SIGHUP cannot be disabled in Prometheus.
I suggest we keep the same behavior as Prometheus.

clients/pkg/promtail/promtail.go

clients/pkg/promtail/promtail_test.go

dannykopping · 2022-09-27T07:35:20Z

clients/cmd/promtail/main.go

+		var config Config
+		if err := cfg.DefaultUnmarshal(&config, args, flag.NewFlagSet(os.Args[0], flag.ExitOnError)); err != nil {
+			fmt.Println("Unable to parse config:", err)
+			os.Exit(1)


I certainly don't think this behaviour is desired; if a runtime config cannot be reloaded, it should not kill the process.
I think we need to distinguish between the initial load and subsequent (runtime) reloads.

This will also better match how Loki works with its overrides.
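A minimal sketch of the distinction requested here, with hypothetical `Config`, `parse` and `App` types standing in for the real ones: the parser returns an error instead of exiting, the initial load may still treat that error as fatal, and a runtime reload keeps the previous config when the new one is invalid.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config is a hypothetical, minimal stand-in for the Promtail config.
type Config struct{ Job string }

// parse is a placeholder for real parsing; it returns an error instead of exiting.
func parse(raw string) (*Config, error) {
	if raw == "" {
		return nil, errors.New("empty config")
	}
	return &Config{Job: raw}, nil
}

// App holds the config currently in use.
type App struct{ current *Config }

// Reload swaps in the new config only if it parses; otherwise the old one stays active.
func (a *App) Reload(raw string) error {
	cfg, err := parse(raw)
	if err != nil {
		return fmt.Errorf("reload config: %w", err)
	}
	a.current = cfg
	return nil
}

func main() {
	// Initial load: a failure here may still be fatal.
	cfg, err := parse("varlogs")
	if err != nil {
		fmt.Println("Unable to parse config:", err)
		os.Exit(1)
	}
	app := &App{current: cfg}

	// Runtime reload: a failure is reported, and the process keeps running.
	if err := app.Reload(""); err != nil {
		fmt.Println("keeping previous config:", err)
	}
	fmt.Println("active job:", app.current.Job)
}
```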

Co-authored-by: Danny Kopping <[email protected]>

grafanabot · 2022-09-27T12:58:21Z

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

liguozhong · 2022-09-27T13:08:45Z

I have made targeted changes for all the review comments.

grafanabot · 2022-09-28T04:46:41Z

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
- querier/queryrange	-0.1%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

liguozhong · 2022-10-10T07:46:16Z

@dannykopping hi, this PR is ready for review.
Please review it again when you have time.

dannykopping

Looking good @liguozhong - a few minor points to address then I think this is ready to go

clients/cmd/promtail/main.go

dannykopping · 2022-10-10T08:52:05Z

clients/pkg/promtail/promtail.go

@@ -8,6 +8,7 @@ import (

 	"github.com/go-kit/log"
 	"github.com/go-kit/log/level"
+	"github.com/pkg/errors"


Let's please use the stdlib errors where possible

dannykopping · 2022-10-10T08:52:36Z

clients/pkg/promtail/promtail.go

@@ -65,20 +74,24 @@ func New(cfg config.Config, newConfig func() *config.Config, metrics *client.Met
 		metrics: metrics,
 		dryRun:  dryRun,
 	}
+	err := promtail.reg.Register(reloadTotal)
+	if err != nil {
+		return nil, err


Please wrap this error

dannykopping · 2022-10-10T08:54:23Z

clients/pkg/promtail/promtail.go

@@ -87,9 +100,14 @@ func New(cfg config.Config, newConfig func() *config.Config, metrics *client.Met
 }

 func (p *Promtail) reloadConfig(cfg *config.Config) error {
-	level.Info(p.logger).Log("msg", "Loading configuration file")
+	level.Info(p.logger).Log("msg", "Reloading configuration file")


I think let's move this into the condition on L107 so that it only logs when a change is made.
Alternatively, you could change this to a debug level and add an info level log line in the condition on L107
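A sketch of the first option, assuming the reloader keeps the serialized form of the last applied config and compares against it. The `reloader` type and its fields are hypothetical; the comparison in the PR may be implemented differently.

```go
package main

import "fmt"

// reloader remembers the serialized form of the last applied config.
type reloader struct {
	loaded  string
	applied int
}

// reload applies raw only if it differs from the loaded config.
// It reports whether a reload actually happened.
func (r *reloader) reload(raw string) bool {
	if raw == r.loaded {
		// Unchanged: skip the reload and the info-level log line.
		return false
	}
	fmt.Println("Reloading configuration file")
	r.loaded = raw
	r.applied++
	return true
}

func main() {
	r := &reloader{loaded: "job: a"}
	fmt.Println(r.reload("job: a")) // unchanged, so no log line and no reload
	fmt.Println(r.reload("job: b")) // changed, so it logs and reloads
}
```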

dannykopping · 2022-10-10T08:56:35Z

clients/pkg/promtail/promtail.go

+	}
+	promtailServer, ok := p.server.(*server.PromtailServer)
+	if !ok {
+		level.Warn(p.logger).Log("msg", "disable watchConfig", "reason", "promtailServer cast fail")


This log message won't mean much to the user. Can we return the actual parse error here?

dannykopping · 2022-10-10T08:58:12Z

clients/pkg/promtail/promtail.go

+	}
+	err = p.reloadConfig(cfg)
+	if err != nil {
+		reloadTotal.With(prometheus.Labels{"code": "500"}).Inc()


Are these 200 and 500 codes meant to mimic HTTP status codes?
If so, I think that's confusing. Let's be clear here and have two separate metrics - one for successful reloads and one for failures.

Co-authored-by: Danny Kopping <[email protected]>

liguozhong · 2022-10-10T10:25:19Z

Auto-merging clients/cmd/promtail/main.go
CONFLICT (content): Merge conflict in clients/cmd/promtail/main.go
Automatic merge failed; fix conflicts and then commit the result.

I have to merge the main branch.

# Conflicts: # clients/cmd/promtail/main.go

grafanabot · 2022-10-10T10:34:42Z

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
- querier/queryrange	-0.1%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

grafanabot · 2022-10-10T10:44:42Z

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

liguozhong · 2022-10-10T11:07:31Z

Looking good @liguozhong - a few minor points to address then I think this is ready to go

Thank you for investing so much time in reviewing this PR.
Following your suggestions, I have committed new code, and CI/CD has also passed.

dannykopping

Last couple of small nits around errors, and then we can get this merged 👍

dannykopping · 2022-10-10T12:25:11Z

clients/pkg/promtail/promtail.go

 	if err != nil {
-		return nil, err
+		return nil, errors.Wrap(err, "error register prometheus collector reloadSuccessTotal")


Please replace these with fmt.Errorf

dannykopping · 2022-10-10T13:07:33Z

clients/pkg/promtail/promtail.go

+	cfg, err := p.newConfig()
+	if err != nil {
+		reloadFailTotal.Inc()
+		return errors.Wrap(err, "Error new Config")


Please use fmt.Errorf

grafanabot · 2022-10-11T05:16:25Z

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0.1%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

liguozhong · 2022-10-11T05:45:15Z

Done, and CI/CD has passed.

dannykopping

LGTM, thanks a lot for this!

[new feature] promtail: support /reload config

0c7e77c

liguozhong requested a review from a team as a code owner September 26, 2022 13:10

pull-request-size bot added the size/L label Sep 26, 2022

liguozhong added 7 commits September 26, 2022 21:20

fix lint

f7d7244

lint

b9c89d5

lint

0b7e536

lint

07a00bd

lint

9e82c9f

lint

4bdf6cd

lint

ce7ddb0

trevorwhitney reviewed Sep 26, 2022

liguozhong added 3 commits September 27, 2022 13:55

lint,and *pointer

e01b8bd

add promtail reload test

4d6ee77

lint

69489b9

dannykopping suggested changes Sep 27, 2022

liguozhong and others added 12 commits September 27, 2022 15:48

Update clients/pkg/promtail/promtail.go

646b120

Co-authored-by: Danny Kopping <[email protected]>

fix review tip

ebc0e35

wrap server error

df6f15d

do not panic

f838787

do not panic when reload config fail.

3074a1e

add reload metrics and test

9bd469f

delete cnt int

ab2dd37

do not panic,add more detail log for watchConfig

6bc47e3

do not reload when config not changed

6651e62

lint

2817f60

lint

a2809cd

lint

b980753

add test for promtailServer.configLoaded field

41b584f

dannykopping reviewed Oct 10, 2022

liguozhong and others added 2 commits October 10, 2022 17:30

Update clients/cmd/promtail/main.go

5e2dd98

Co-authored-by: Danny Kopping <[email protected]>

fix review tips

489f9cf

Merge branch 'main' into promtail_reload

5582e95

# Conflicts: # clients/cmd/promtail/main.go

trigger ci agent

dba1290

dannykopping reviewed Oct 10, 2022

liguozhong added 2 commits October 11, 2022 12:57

replace errors.Wrap with std fmt.Errorf

3928fff

Merge branch 'main' into promtail_reload

3396a6f

dannykopping approved these changes Oct 11, 2022

dannykopping merged commit fb26baa into grafana:main Oct 11, 2022

This was referenced Oct 11, 2022

add promtail reload changelog and doc #7386

Merged

[loki-operator] CRD: support podMonitor k8s CRD #7387

Open

lxwzy pushed a commit to lxwzy/loki that referenced this pull request Nov 7, 2022

[new feature] promtail: Add config reload endoint / signal to promtail (

ef17299

grafana#7247)

changhyuni pushed a commit to changhyuni/loki that referenced this pull request Nov 8, 2022

[new feature] promtail: Add config reload endoint / signal to promtail (

3d3cb0c

grafana#7247)

pschulten mentioned this pull request Nov 21, 2022

promtail: config reload fails #7734

Closed

Abuelodelanada pushed a commit to canonical/loki that referenced this pull request Dec 1, 2022

[new feature] promtail: Add config reload endoint / signal to promtail (

7b2e57a

grafana#7247)

jkroepke mentioned this pull request Feb 7, 2023

[promtail] Implement config-reloader grafana/helm-charts#2187

Merged

cstyan mentioned this pull request Nov 10, 2023

Promtail runtime config reload #6388

Closed
