aws-iam-authenticator: nodes of role master fail to build 17% of the time #9580

Closed
johngmyers opened this issue Jul 16, 2020 · 2 comments · Fixed by #9581

johngmyers commented Jul 16, 2020

1. What kops version are you running? The command kops version will display
this information.

Private build off of master branch

2. What Kubernetes version are you running? kubectl version will print the
version if a cluster is running or provide the Kubernetes version specified as
a kops flag.

1.19.0-rc.1

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

Terminated master. Got unlucky.

5. What happened after the commands executed?

New master failed to come up. Saw in logs:

Jul 16 05:29:34 ip-172-20-32-91 nodeup[1886]: I0716 05:29:34.932799    1886 user.go:97] Creating user "kube-apiserver-healthcheck"
Jul 16 05:29:34 ip-172-20-32-91 nodeup[1886]: I0716 05:29:34.932891    1886 user.go:99] running command: useradd -u 10012 -s /sbin/nologin -d /etc/kubernetes/kube-apiserver-healthcheck/secrets kube-apiserver-healthcheck
Jul 16 05:29:34 ip-172-20-32-91 groupadd[1915]: group added to /etc/group: name=docker, GID=998
Jul 16 05:29:35 ip-172-20-32-91 groupadd[1915]: group added to /etc/gshadow: name=docker
Jul 16 05:29:35 ip-172-20-32-91 groupadd[1915]: new group: name=docker, GID=998
Jul 16 05:29:35 ip-172-20-32-91 useradd[1916]: new group: name=user, GID=10012
Jul 16 05:29:35 ip-172-20-32-91 useradd[1916]: new user: name=user, UID=10012, GID=10012, home=/var/etcd, shell=/sbin/nologin, from=none
Jul 16 05:29:35 ip-172-20-32-91 useradd[1918]: failed adding user 'kube-apiserver-healthcheck', data deleted

Subsequent attempts all failed:

Jul 16 05:29:46 ip-172-20-32-91 nodeup[1886]: W0716 05:29:46.302066    1886 executor.go:128] error running task "UserTask/kube-apiserver-healthcheck" (9m48s remaining to succeed): error creating user: exit status 4
Jul 16 05:29:46 ip-172-20-32-91 nodeup[1886]: Output: useradd: UID 10012 is not unique

6. What did you expect to happen?

nodeup to complete

7. Please provide your cluster manifest. Execute
kops get --name my.example.com -o yaml to display your cluster manifest.
You may want to remove your cluster name and other sensitive information.

n/a

8. Please run the commands with the most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or into a gist and provide the gist link here.

n/a

9. Anything else we need to know?

EtcdBuilder doesn't specify a uid for the "user" user it creates. If it needs such a user, it should specify one.
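
To make that concrete, here is a minimal Go sketch (a hypothetical helper, not the actual nodeup UserTask code) of the useradd invocations in the log above: a task with a pinned uid passes -u and can fail with "UID is not unique", while a task without one lets useradd pick the next free UID, so the etcd "user" user's uid depends on what was created before it.

```go
package main

import (
	"fmt"
	"strconv"
)

// useraddArgs is a hypothetical helper, not the real nodeup code. It builds
// the argument list for useradd the way the log above shows it being run.
// A uid of 0 means "let useradd choose", which is what happens for the etcd
// "user" user today.
func useraddArgs(name, home string, uid int) []string {
	var args []string
	if uid != 0 {
		args = append(args, "-u", strconv.Itoa(uid)) // pinned, e.g. 10012
	}
	return append(args, "-s", "/sbin/nologin", "-d", home, name)
}

func main() {
	// Pinned uid: fails with "UID 10012 is not unique" if something already took it.
	fmt.Println(useraddArgs("kube-apiserver-healthcheck", "/etc/kubernetes/kube-apiserver-healthcheck/secrets", 10012))
	// No uid: useradd assigns the next free UID, so the result depends on ordering.
	fmt.Println(useraddArgs("user", "/var/etcd", 0))
}
```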

@johngmyers (Member, Author)

This was probably triggered by trying to use uid 10011 for kops-controller in my private build. That might be the uid the "user" user is normally assigned.

@johngmyers (Member, Author)

Looks like when the "user" user is created after another UserTask, it gets assigned the next UID. So this is a race which happens when there are at least three UserTasks, which means it probably also happens to people who use aws-iam-authenticator.
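
As a rough check on the 17% figure, here is a toy Go model, not the real nodeup executor. It assumes useradd without -u picks a uid just above the highest existing one, that the highest pre-existing regular uid on a fresh host is about 1000, and that the three UserTasks run in a random order; the pinned uids 10011 (kops-controller, per the previous comment) and 10012 (kube-apiserver-healthcheck, per the log) come from above. Only the ordering 10011, then "user", then 10012 collides, i.e. 1 of 6 orderings, about 17%.

```go
package main

import "fmt"

// task models a nodeup UserTask in this toy example; uid == 0 means the task
// does not pin a uid and useradd chooses one.
type task struct {
	name string
	uid  int
}

// collides reports whether running useradd for the tasks in this order would
// make a pinned useradd fail with "UID ... is not unique". Auto-assignment is
// modeled as "highest existing UID + 1", an assumption about useradd defaults.
func collides(order []task) bool {
	taken := map[int]bool{}
	maxUID := 1000 // assumed highest pre-existing regular uid on a fresh host
	for _, t := range order {
		uid := t.uid
		if uid == 0 {
			uid = maxUID + 1 // auto-assigned, like the etcd "user" user
		}
		if taken[uid] {
			return true
		}
		taken[uid] = true
		if uid > maxUID {
			maxUID = uid
		}
	}
	return false
}

// permute returns every ordering of the tasks.
func permute(ts []task) [][]task {
	if len(ts) <= 1 {
		return [][]task{append([]task{}, ts...)}
	}
	var out [][]task
	for i := range ts {
		rest := append(append([]task{}, ts[:i]...), ts[i+1:]...)
		for _, p := range permute(rest) {
			out = append(out, append([]task{ts[i]}, p...))
		}
	}
	return out
}

func main() {
	tasks := []task{
		{"kops-controller", 10011},            // pinned (private build)
		{"kube-apiserver-healthcheck", 10012}, // pinned
		{"user", 0},                           // etcd user, no uid specified
	}
	all := permute(tasks)
	bad := 0
	for _, order := range all {
		if collides(order) {
			bad++
		}
	}
	fmt.Printf("%d of %d orderings collide (~%.0f%%)\n", bad, len(all), 100*float64(bad)/float64(len(all)))
}
```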

johngmyers changed the title from "Etcd user created with uid conflicting with kube-apiserver-healthcheck" to "aws-iam-authenticator: nodes of role master fail to build 17% of the time" on Jul 16, 2020