Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: /etc/nsswitch.conf auto-reloading #56515

Closed
mateusz834 opened this issue Nov 1, 2022 · 19 comments
Closed

net: /etc/nsswitch.conf auto-reloading #56515

mateusz834 opened this issue Nov 1, 2022 · 19 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@mateusz834
Copy link
Member

Glibc recently started to auto-reload the /etc/nsswitch.conf (https://sourceware.org/bugzilla/show_bug.cgi?id=12459, since Glibc 2.33). It rechecks the content of the nsswitch.conf on every call to the getaddrinfo.
I think that we should align the behavior and do something similar. Maybe just like with /etc/hosts and /etc/resolv.conf check for updates every 5s?

@mateusz834
Copy link
Member Author

mateusz834 commented Nov 1, 2022

The man page of nsswitch.conf still documents the old behaviour:

Within each process that uses nsswitch.conf, the entire file is read only once. If the file is later changed, the process will continue using the old configuration.

But I checked it and the auto-reloading takes place.

Edit: https://sourceware.org/bugzilla/show_bug.cgi?id=29750

@seankhliao seankhliao added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 1, 2022
@seankhliao
Copy link
Member

cc @neild

@thediveo
Copy link

thediveo commented Nov 1, 2022

absolute nightmare. having lots of go-based process all checking some file every few seconds. in many containers (where they never change after engine developers learnt their lessons the hard way). apart from not being able to properly detect when changes to these files are fully committed.

I'm willing to hold your beer.

@mateusz834
Copy link
Member Author

mateusz834 commented Nov 1, 2022

@thediveo Yep, But consider the other case, someone changed the nsswitch.conf to use a new nss source, then half of the system is using the new config, the other half is using the old configuration (not ideal situation). When dynamic configuration of the net resolver is being used we should as much as possible emulate the glibc behaviour to avoid inconsistencies. (this is my opinion)

@thediveo
Copy link

thediveo commented Nov 1, 2022

@thediveo Yep, But consider the other case, someone changed the nsswitch.conf to use a new nss source, then half of the system is using the new config, half is using the old configuration (not ideal situation). When dynamic configuration of the net resolver is being used we should as much as possible emulate the glibc behaviour to avoid inconsistencies. (this is my opinion)

Yes, I did this as we actually had this discussion a few weeks ago while trying to hunt down problems with a corporate cloud-based endpoint security thingie. There's a reason why systemd-resolved operates by providing as stable /etc/resolv.conf and then not changing anything except within the resolver itself. BTW, this cloud-based sec thingie water-sheds over the system configuration files and then users wonder why things start to fail in a seemingly random manner.

Look at the lessons Docker learnt, there are issues about corrupted /etc/hosts. I'm considering these too. Docker thus uses an embedded intermediate resolver to ensure stable and reliable operation.

There's a pattern here: don't mess with configuration files that simply don't age well in a container workload world. Provide intermediate DNS resolver services that shield the changes.

Could it be that you mix up the nsswitch elements with the specific DNS resolver configuration? IIRC go hasn't anything like nsswitch implemented in its native resolver. Otherwise you opt in to the glibc based resolver implementation anyway, and let's not start talking about musl based distributions.

Bottom line, we considered a great many use cases in our company and I'm afraid of what gets proposed here as it is an immediate thread to a reliable operation for a functionality that isn't even used nor needed in many cases, not least cloud/container workloads.

My offer still holds to hold your beer. Can I give your contact details to our operations fire bridgade?

@seankhliao
Copy link
Member

if you need glibc behaviour, maybe you should just use glibc (run with the cgo resolver).

@mateusz834
Copy link
Member Author

IIRC go hasn't anything like nsswitch implemented in its native resolver. Otherwise you opt in to the glibc based resolver implementation

It is using it to decide whether to use cgo or the go resolver (when compiled with cgo).

if you need glibc behaviour, maybe you should just use glibc (run with the cgo resolver).

Not exactly, it requires the use of GODEBUG=netdns=cgo.

@ianlancetaylor
Copy link
Contributor

@thediveo Go's resolver does already parse the /etc/nsswitch.conf file on Unix systems. The reload check is a single stat system call. Doesn't seem that terrible. Note that the Go resolver already does a similar check for /etc/resolv.conf, though it only makes the stat calls once every 5 seconds.

@neild
Copy link
Contributor

neild commented Nov 1, 2022

Matching the behavior of checking /etc/resolv.conf and checking for a change to /etc/nsswitch.conf at most once every five seconds seems reasonable to me.

Is there an equivalent to the resolv.conf no-reload option?

@thediveo
Copy link

thediveo commented Nov 2, 2022

What about the atimr nodes being constantly updated: is this still a problem on flash file systems?

Having lots of workload process all uselessly checking every 5s some file that never changes cause lots of cache trashing and cpu spiking for no reason.

@mateusz834
Copy link
Member Author

mateusz834 commented Nov 2, 2022

@thediveo It is not going to be a special goroutine that will time.Sleep(5 * time.Second) and stat, this check will only happen when you use LookupHost (and others) and it finds out that the nsswitch.conf was last checked more than 5s ago. Same as with /etc/hosts and /etc/resolv.conf.

@mateusz834
Copy link
Member Author

mateusz834 commented Nov 2, 2022

We now have two different implementation for auto-reloading (/etc/hosts and /etc/resolv.conf), adding a third one might be a bit messy. I think that we should implement a new net internal package for this kind of rechecking, which we will use for the /etc/nsswitch.conf and maybe in the future migrate the /etc/hosts and /etc/resolv.conf to it.
I was thinking about something like (net/internal/rechecker):

type Rechecker[T any] struct {
	File     string
	Parse    func(content []byte) (*T, error)
	Duration time.Duration

        // plus few unexported fields for the implementation
}
// Get on the initial call reads r.File and calls r.Parse with the contents of that file.
// On next calls when more than r.Duration has passed, Get stats the r.File to detect
// changes to it, when it is modified it calls r.Parse again. Get is safe to call concurrently.
func (r *Rechecker[T]) Get() (*T, error)

Thoughts?

@thediveo
Copy link

thediveo commented Nov 2, 2022

how to you ensure you get a complete updated file and aren't reading at the same time the file gets updated?

@mateusz834
Copy link
Member Author

mateusz834 commented Nov 2, 2022

@thediveo I am expecting the rename trick to be used while updating files. How would I be able to detect that someone else is writing to the same file at the same time?? Most text editors use the rename trick so I don't consider it to be a huge problem. Currently the auto-updating of /etc/resolv.conf and /etc/hosts does not have such defense.

@thediveo
Copy link

thediveo commented Nov 2, 2022

but then why did Docker had so much trouble, moby/moby#17190?

@mateusz834
Copy link
Member Author

@thediveo moby/moby#17190 (comment)
It looks like they were not using the rename trick, just directly writing to the /etc/hosts file.

@thediveo
Copy link

thediveo commented Nov 2, 2022

ah, thanks; the rename is probably impossible on a bind mount.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/448075 mentions this issue: net: auto-reload the /etc/nsswitch.conf on unix systems

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/451420 mentions this issue: doc/go1.20: add release notes for net package

gopherbot pushed a commit that referenced this issue Nov 18, 2022
For #50101
For #51152
For #53482
For #55301
For #56515

Change-Id: I11edeb4be0a7f80fb72fd7680a3407d081f83b8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/451420
TryBot-Result: Gopher Robot <[email protected]>
Auto-Submit: Damien Neil <[email protected]>
Run-TryBot: Damien Neil <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
@golang golang locked and limited conversation to collaborators Nov 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants