Add file watch to support config reload on file change #4454

vjsamuel · 2021-11-18T08:40:32Z

Description:
Fixes: #4397

This PR allows the main config.yml to be reloaded each time the config file changes. The entire pipeline gets reloaded.

This PR doesn't use an fsnotify/inotify-style watcher, as we have seen that Kubernetes doesn't support such watches when a config is mounted as a config file. Since os.Stat is cheap, it can be run every second to detect a change and trigger a reload.
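For illustration, the polling approach described above could look roughly like the following. This is a minimal sketch and not the code in this PR: the names `stamp`, `statStamp`, and `pollFile` are hypothetical, and it assumes that comparing modification time and size is enough to detect a change.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"time"
)

// stamp holds the os.FileInfo fields compared between polls.
type stamp struct {
	modTime time.Time
	size    int64
}

// differs reports whether two stamps indicate different file contents.
func (s stamp) differs(o stamp) bool {
	return !s.modTime.Equal(o.modTime) || s.size != o.size
}

// statStamp stats path and returns its stamp.
func statStamp(path string) (stamp, error) {
	fi, err := os.Stat(path)
	if err != nil {
		return stamp{}, err
	}
	return stamp{modTime: fi.ModTime(), size: fi.Size()}, nil
}

// pollFile (hypothetical name) records the current stamp, then checks path
// every interval in a goroutine and calls onChange for each detected change
// until ctx is cancelled.
func pollFile(ctx context.Context, path string, interval time.Duration, onChange func()) {
	last, _ := statStamp(path) // a missing file leaves the zero stamp
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				cur, err := statStamp(path)
				if err != nil {
					continue // cannot stat: keep the last known state
				}
				if cur.differs(last) {
					last = cur
					onChange()
				}
			}
		}
	}()
}

func main() {
	f, err := os.CreateTemp("", "config-*.yml")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	f.Close()

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	changed := make(chan struct{}, 1)
	pollFile(ctx, f.Name(), 10*time.Millisecond, func() { changed <- struct{}{} })

	if err := os.WriteFile(f.Name(), []byte("receivers: {}\n"), 0o600); err != nil {
		panic(err)
	}
	select {
	case <-changed:
		fmt.Println("change detected")
	case <-time.After(time.Second):
		fmt.Println("no change detected")
	}
}
```

The sketch skips a poll when the file cannot be stat'ed instead of treating that as a change, so a temporarily missing file does not trigger a reload.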

tigrannajaryan · 2021-11-18T14:31:34Z

> This PR doesn't use an FS notify/inotify style watcher as we have seen that Kubernetes doesn't support such watches when a config is mounted as a config file. os.Stat given that it is cheap can be run every second to trigger a reload.

Can we measure exactly how much CPU polling every second uses?

tigrannajaryan · 2021-11-18T14:20:45Z

config/configmapprovider/file_watch.go

+			if os.IsNotExist(err) && lastfi != nil {
+				return errNoOp
+			}
+			return err


This appears to trigger reloading. Why do we reload if we can't stat the file?

Fixed this one. Thanks for catching this. As a result, I changed the flow slightly to ensure that if someone writes a faulty config, we preserve the last sane state as long as we don't shut down the process. Such a bug would bring down the collector across the entire Kubernetes cluster if there was a faulty config map update.
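To make the "preserve the last sane state" idea concrete, here is a small sketch. It is not the code from this PR: `Config`, `holder`, `reload`, and `errInvalid` are hypothetical names, and the loader function stands in for reading and validating the config file.

```go
package main

import (
	"errors"
	"fmt"
)

// Config stands in for the parsed collector configuration (hypothetical).
type Config struct{ Raw string }

// errInvalid is a placeholder for any parse or validation failure.
var errInvalid = errors.New("invalid config")

// holder keeps the last configuration that loaded successfully.
type holder struct{ current Config }

// reload replaces the current config only when load succeeds; on failure the
// previous config stays active and the error is returned to the caller.
func (h *holder) reload(load func() (Config, error)) error {
	next, err := load()
	if err != nil {
		return fmt.Errorf("keeping previous config: %w", err)
	}
	h.current = next
	return nil
}

func main() {
	h := &holder{current: Config{Raw: "v1"}}

	// A faulty update is rejected and v1 stays active.
	err := h.reload(func() (Config, error) { return Config{}, errInvalid })
	fmt.Println(h.current.Raw, err)

	// A valid update replaces the active config.
	err = h.reload(func() (Config, error) { return Config{Raw: "v2"}, nil })
	fmt.Println(h.current.Raw, err)
}
```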

tigrannajaryan · 2021-11-18T14:24:26Z

config/configmapprovider/file_watch.go

+	// Perform an initial check.
+	err := check()
+	if err != nil && err != errNoOp {
+		return err


Why do we give up watching if the initial check fails? I think we can keep checking and reload when a change is detected.

I have removed this.

tigrannajaryan · 2021-11-18T14:26:38Z

config/configmapprovider/file_watch.go

+				// If check returns a valid event, exit the loop. A new watch will be placed on the next Retrieve()
+				err := check()
+				if err == nil || err != errNoOp {
+					onChange(&ChangeEvent{


This will reload the config immediately when a change is detected. That's undesirable behavior, since the file may be in the middle of being written and we may read a partially written file. It is better to wait for a small amount of time (e.g. 1 second) after the last change to the file and only then trigger reloading, to increase the chance that the entire content of the file has been written.

Addressed by adding a sleep.

I don't see where this is addressed, please point me to the code.

Apologies. I somehow removed it during cleanup. Added it back.

vjsamuel · 2021-11-19T06:38:30Z

Raised #4460 to fix the blocking channel

vjsamuel · 2021-11-19T06:43:54Z

@tigrannajaryan the benchmark I ran used the following code:

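package configmapprovider

import (
	"io/ioutil"
	"os"
	"testing"

	"github.com/stretchr/testify/require"
)

func BenchmarkOsStat(b *testing.B) {
	file, err := ioutil.TempFile("", "file_watcher_test")
	require.NoError(b, err)
	// Remove the temporary file once the benchmark finishes.
	defer os.Remove(file.Name())
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		os.Stat(file.Name())
	}
}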

and the result was:

goos: darwin
goarch: amd64
pkg: go.opentelemetry.io/collector/config/configmapprovider
cpu: Intel(R) Core(TM) i9-8950HK CPU @ 2.90GHz
BenchmarkOsStat
BenchmarkOsStat-12    	  197296	      6127 ns/op	     288 B/op	       2 allocs/op
PASS

We use this logic in some of our processing-intensive code flows internally, and we haven't run into issues so far.

vjsamuel · 2021-11-19T06:47:53Z

I have marked this PR as ready for review. It will require #4460 to be reviewed first, and this PR to be rebased on it, before this one can go in if approved.

tigrannajaryan · 2021-11-22T23:29:34Z

config/configmapprovider/file.go

+				// If check returns a valid event, exit the loop. A new watch will be placed on the next Retrieve()
+				err := check()
+				if err == nil || err != errNoOp {
+					time.Sleep(time.Second * 2)


Nit: this will result in reading the file 2 seconds after the first modification. A slightly better approach is to wait 2 seconds after the last modification. The difference is small but may be visible if we have slow writers. It is probably fine for now.
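One way to wait after the last modification instead of the first is to count quiet polls. The sketch below is only an illustration of that idea, not code from this PR: `debouncer` and `observe` are hypothetical names, and it assumes the watcher polls at a fixed interval, so `quiet: 2` with a one-second poll means roughly two seconds after the last observed change.

```go
package main

import "fmt"

// debouncer (hypothetical) signals a reload only after `quiet` consecutive
// polls without a change have followed the most recent change.
type debouncer struct {
	quiet   int  // number of unchanged polls required before reloading
	pending bool // a change was seen and not yet acted on
	stable  int  // unchanged polls seen since the last change
}

// observe records one poll result and reports whether to reload now.
func (d *debouncer) observe(changed bool) bool {
	if changed {
		// Every new modification restarts the quiet period.
		d.pending = true
		d.stable = 0
		return false
	}
	if !d.pending {
		return false
	}
	d.stable++
	if d.stable >= d.quiet {
		d.pending = false
		return true
	}
	return false
}

func main() {
	d := &debouncer{quiet: 2}
	// The second change (poll 2) restarts the wait started by poll 0.
	for i, changed := range []bool{true, false, true, false, false, false} {
		fmt.Printf("poll %d: changed=%v reload=%v\n", i, changed, d.observe(changed))
	}
}
```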

tigrannajaryan · 2021-11-23T02:43:11Z

config/configmapprovider/file.go

 type fileMapProvider struct {
 	fileName string
+	watching bool


What's the purpose of this? It doesn't seem to be set anywhere.

Fixed this one. This is to ensure that we create a watch only once, as Retrieve is called each time onChange is invoked during a file change.

tigrannajaryan · 2021-11-23T02:45:42Z

service/config_watcher.go

 	close(cm.watcher)
-	return cm.ret.Close(ctx)


Why is this deleted?

I have modified the code flow in a way that the config watcher is only created once in the lifecycle of the collector, as compared to the original implementation, where a config watcher was created per change in the config file. Once that change was made, it didn't make sense to close the Retrieved and pass the error down. I moved that logic into a get() method that follows the Retrieve() -> watch -> Close() lifecycle.

bogdandrutu · 2021-11-23T17:55:11Z

Please rebase, and mark as resolved comments that are resolved.

codecov · 2021-11-24T07:46:02Z

Codecov Report

Merging #4454 (2132485) into main (adca4fb) will decrease coverage by 0.07%.
The diff coverage is 77.52%.

@@            Coverage Diff             @@
##             main    #4454      +/-   ##
==========================================
- Coverage   90.77%   90.70%   -0.08%     
==========================================
  Files         179      179              
  Lines       10412    10468      +56     
==========================================
+ Hits         9452     9495      +43     
- Misses        743      754      +11     
- Partials      217      219       +2

Impacted Files	Coverage Δ
service/collector.go	`73.91% <52.38%> (+0.02%)`	⬆️
service/config_watcher.go	`80.00% <79.16%> (-9.66%)`	⬇️
config/configmapprovider/file.go	`91.07% <88.37%> (-8.93%)`	⬇️
config/configmapprovider/properties.go	`89.65% <100.00%> (ø)`
config/configmapprovider/simple.go	`50.00% <0.00%> (-50.00%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

tigrannajaryan · 2021-12-01T14:05:53Z

config/configmapprovider/file.go

+		watchFile(ctx, fmp.fileName, onChange)
+		fmp.watching = true
+	}
+
 	return &simpleRetrieved{confMap: cp}, nil


I don't think this will work as expected by the Provider interface. The Retrieved that is returned is expected to implement a Close function that stops watching and guarantees that onChange will not be called after that. See opentelemetry-collector/config/configmapprovider/provider.go, line 76 in 96a882a:

// Close signals that the configuration for which it was used to retrieve values is
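A sketch of what such a Close guarantee could look like follows. It is a simplified assumption, not the Provider interface from the repository: `watchedRetrieved` and `startWatch` are hypothetical names, Close takes no context and returns no error here, and change events arrive on a plain channel.

```go
package main

import (
	"context"
	"fmt"
)

// watchedRetrieved (hypothetical) owns one watch goroutine.
type watchedRetrieved struct {
	cancel context.CancelFunc
	done   chan struct{} // closed when the watch goroutine has exited
}

// startWatch forwards each event to onChange until Close is called.
func startWatch(events <-chan struct{}, onChange func()) *watchedRetrieved {
	ctx, cancel := context.WithCancel(context.Background())
	w := &watchedRetrieved{cancel: cancel, done: make(chan struct{})}
	go func() {
		defer close(w.done)
		for {
			select {
			case <-ctx.Done():
				return
			case _, ok := <-events:
				if !ok {
					return
				}
				onChange()
			}
		}
	}()
	return w
}

// Close stops the watch and waits for the goroutine to exit, so onChange is
// never called after Close returns. onChange must not call Close itself,
// because Close would then wait on its own goroutine.
func (w *watchedRetrieved) Close() {
	w.cancel()
	<-w.done
}

func main() {
	events := make(chan struct{}, 1)
	calls := 0
	w := startWatch(events, func() { calls++ })
	w.Close()
	events <- struct{}{} // nobody is listening any more
	fmt.Println("onChange calls after Close:", calls)
}
```

Waiting on `done` is what provides the guarantee; cancelling alone would leave a window in which an in-flight event could still reach onChange after Close returns.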

tigrannajaryan · 2021-12-01T14:08:48Z

service/config_watcher.go

 	close(cm.watcher)
-	return cm.ret.Close(ctx)


> i have modified the code flow in a way that the config watcher is only created once in the lifecycle of the collector

The new rearranged code is harder to follow and understand. Please refactor it to clearly show that the code follows the lifecycle described in the Provider comments:

// The typical usage is the following:
//
//   r := mapProvider.Retrieve()
//   r.Get()
//   // wait for onChange() to be called.
//   r.Close()
//   r = mapProvider.Retrieve()
//   r.Get()
//   // wait for onChange() to be called.
//   r.Close()
//   // repeat Retrieve/Get/wait/Close cycle until it is time to shut down the Collector process.
//   // ...
//   mapProvider.Shutdown()

It was more visible before this change. Admittedly it was not ideal, but it was better than what we have now. Now it is even harder to see that we are actually following the required lifecycle. All the current loop in runAndWaitForShutdownEvent shows is a watch, followed by get().
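As an illustration of a loop that makes the Retrieve/Get/wait/Close cycle visible, something along these lines might work. It is a sketch under assumptions: the `Provider` and `Retrieved` interfaces here are simplified stand-ins with hypothetical signatures, `run` is a hypothetical name, and `fakeProvider` exists only to exercise the loop.

```go
package main

import "fmt"

// Retrieved and Provider are simplified, hypothetical stand-ins for the
// interfaces discussed in this thread; the real signatures differ.
type Retrieved interface {
	Get() (string, error)
	Close() error
}

type Provider interface {
	Retrieve(onChange func()) (Retrieved, error)
	Shutdown() error
}

// run repeats Retrieve -> Get -> wait -> Close until shutdown is signalled,
// then shuts the provider down.
func run(p Provider, shutdown <-chan struct{}, apply func(string)) error {
	defer p.Shutdown()
	for {
		changed := make(chan struct{}, 1)
		r, err := p.Retrieve(func() {
			select {
			case changed <- struct{}{}:
			default: // a change is already pending
			}
		})
		if err != nil {
			return err
		}
		cfg, err := r.Get()
		if err != nil {
			r.Close()
			return err
		}
		apply(cfg)

		select {
		case <-changed: // config changed: close and start the next cycle
			if err := r.Close(); err != nil {
				return err
			}
		case <-shutdown:
			return r.Close()
		}
	}
}

// fakeProvider simulates a fixed number of changes, then requests shutdown.
type fakeProvider struct {
	changes int
	stop    chan struct{}
	log     string
}

func (f *fakeProvider) Retrieve(onChange func()) (Retrieved, error) {
	f.log += "Retrieve "
	if f.changes > 0 {
		f.changes--
		onChange()
	} else {
		close(f.stop)
	}
	return fakeRetrieved{f}, nil
}

func (f *fakeProvider) Shutdown() error {
	f.log += "Shutdown"
	return nil
}

type fakeRetrieved struct{ p *fakeProvider }

func (r fakeRetrieved) Get() (string, error) {
	r.p.log += "Get "
	return "config", nil
}

func (r fakeRetrieved) Close() error {
	r.p.log += "Close "
	return nil
}

func main() {
	p := &fakeProvider{changes: 1, stop: make(chan struct{})}
	if err := run(p, p.stop, func(string) {}); err != nil {
		panic(err)
	}
	fmt.Println(p.log)
}
```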

github-actions · 2021-12-09T03:15:57Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions · 2021-12-18T03:16:12Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions · 2022-01-02T03:16:06Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

seh · 2022-01-14T15:10:31Z

> This PR doesn't use an FS notify/inotify style watcher as we have seen that Kubernetes doesn't support such watches when a config is mounted as a config file.

Can you clarify why this doesn't work? Tools like jimmidyson/configmap-reload attempt to detect such changes, as does the Thanos reloader used with Prometheus.

seh · 2022-01-14T15:12:23Z

This capability could also help with #1591.

github-actions · 2022-01-29T03:15:45Z

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions · 2022-02-13T03:15:40Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

tigrannajaryan reviewed Nov 18, 2021

tigrannajaryan mentioned this pull request Nov 18, 2021

configmapprovider.File does not watch for file changes #4397

Closed

bogdandrutu reviewed Nov 18, 2021

vjsamuel marked this pull request as ready for review November 19, 2021 06:44

vjsamuel requested review from a team and owais November 19, 2021 06:44

tigrannajaryan reviewed Nov 23, 2021

vjsamuel force-pushed the add_config_reload branch 2 times, most recently from 826531e to 3a36712 (November 24, 2021 07:45)

vjsamuel added 2 commits (November 28, 2021 23:32)

52a9456: Add file watch to support config reload on file change

bbb008b: Incorporate review comments; Add sleep to the onChange; correct usage of Retrieved and add change log

vjsamuel force-pushed the add_config_reload branch 6 times, most recently from e77a604 to f907a2d (November 29, 2021 08:13)

2132485: Fix windows UT

vjsamuel force-pushed the add_config_reload branch from f907a2d to 2132485 (November 29, 2021 08:16)

tigrannajaryan reviewed Dec 1, 2021

github-actions bot added the Stale label Dec 9, 2021

bogdandrutu removed the Stale label Dec 10, 2021

github-actions bot added the Stale label Dec 18, 2021

bogdandrutu removed the Stale label Dec 18, 2021

github-actions bot added the Stale label Jan 2, 2022

bogdandrutu removed the Stale label Jan 3, 2022

github-actions bot added the Stale label Jan 29, 2022

github-actions bot closed this Feb 13, 2022

bogdandrutu mentioned this pull request Aug 8, 2022

Watch config file for changes #273

Closed

3 tasks

dmitryax mentioned this pull request Mar 12, 2023

Add a flag for enabling config watch #6300

Closed

cforce mentioned this pull request Oct 10, 2023

Capability to update otel configuration dynamically at runtime #4205

Closed
