
Update rhcos to 42.80.20190827.1 #766

Merged: 1 commit merged into openshift-metal3:master on Aug 29, 2019

Conversation

stbenjam (Member)

We'll need this when openshift/installer#2264 lands.

Eventually we can avoid this if and when openshift/installer#2092 lands, otherwise there's not a convenient way to get the data out of the installer.

@hardys commented Aug 27, 2019

I've been testing this locally, and the bootstrap VM doesn't come up correctly. I'm still trying to figure out why, but it's also been reproduced by @karmab and a few other folks. Reverting openshift/installer to 0454021 appears to resolve the issue, so I suspect some badness in the new qemu image.

@stbenjam added the "CI check this PR with CI" label on Aug 27, 2019
@metal3ci: Build FAILURE, see build http://10.8.144.11:8080/job/dev-tools/1099/

@stbenjam (Member, Author) commented Aug 27, 2019

On the latest rhcos, the code that retries while waiting for the etcd cluster to come up isn't working...

Aug 27 18:22:46 localhost bootkube.sh[1669]: https://etcd-0.ostest.test.metalkube.org:2379 is unhealthy: failed to connect: dial tcp: lookup etcd-0.ostest.test.metalkube.org on 192.168.111.1:53: no such host
Aug 27 18:22:46 localhost bootkube.sh[1669]: Error: unhealthy cluster
Aug 27 18:22:46 localhost bootkube.sh[1669]: etcd cluster up. Killing etcd certificate signer...

This should retry:

https://github.com/openshift/installer/blob/master/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L272-L288

but it doesn't.
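
For context, the pattern in that part of the template is a shell loop that keeps re-running the etcdctl health check through podman until it succeeds, so it can only keep retrying if a failed check actually exits non-zero. A rough sketch of the pattern (not the actual template; ETCD_IMAGE and ETCD_ENDPOINTS here are placeholders for whatever the template defines, and the waiting message is illustrative):

# Rough sketch of the bootkube.sh retry pattern, simplified.
# The loop only retries while the health check exits non-zero.
until bootkube_podman_run --rm --name etcdctl --env ETCDCTL_API=3 \
        --volume /opt/openshift/tls:/opt/openshift/tls:ro,z \
        --entrypoint etcdctl "${ETCD_IMAGE}" \
        --dial-timeout=5s \
        --cacert=/opt/openshift/tls/etcd-ca-bundle.crt \
        --cert=/opt/openshift/tls/etcd-client.crt \
        --key=/opt/openshift/tls/etcd-client.key \
        --endpoints="${ETCD_ENDPOINTS}" \
        endpoint health
do
    echo "etcd cluster not available yet, waiting 5 seconds..."
    sleep 5
done
echo "etcd cluster up. Killing etcd certificate signer..."

If podman reports exit status 0 even when etcdctl fails, the until condition is satisfied on the first pass and the loop falls straight through, which matches the log above.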

@stbenjam (Member, Author) commented Aug 27, 2019

I can verify it's exiting 0, so we never retry:

# bootkube_podman_run \
>                 --rm \
>                 --name etcdctl \
>                 --env ETCDCTL_API=3 \
>                 --volume /opt/openshift/tls:/opt/openshift/tls:ro,z \
>                 --entrypoint etcdctl \
>                 6f3f175b53ca \
>                 --dial-timeout=5s \
>                 --cacert=/opt/openshift/tls/etcd-ca-bundle.crt \
>                 --cert=/opt/openshift/tls/etcd-client.crt \
>                 --key=/opt/openshift/tls/etcd-client.key \
>                 --endpoints=https://nonsense.example:2379 \
>                 endpoint health
https://nonsense.example:2379 is unhealthy: failed to connect: dial tcp: lookup nonsense.example on 127.0.0.1:53: no such host
Error: unhealthy cluster
[root@localhost openshift]# echo $?
0

Here's the full debug logs:

https://gist.github.com/stbenjam/24bf10326664ddee968df407c716e20b

podman version:

[root@localhost openshift]# podman --version
podman version 1.4.2-stable1

I don't notice any changes to etcd; its last commit is from July.

@stbenjam (Member, Author)

Interactively in the container, it does return 1:

[root@localhost /]# etcdctl --dial-timeout=5s --cacert=/opt/openshift/tls/etcd-ca-bundle.crt --cert=/opt/openshift/tls/etcd-client.crt --key=/opt/openshift/tls/etcd-client.key --endpoints=https://nonsense.example:2379  endpoint health
https://nonsense.example:2379 is unhealthy: failed to connect: dial tcp: lookup nonsense.example on 127.0.0.1:53: no such host
Error: unhealthy cluster
[root@localhost /]# echo $?
1
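
So etcdctl itself behaves correctly, and the non-zero status is being lost somewhere in the podman invocation; podman run is supposed to exit with the container's exit status. A quick way to confirm that independently of etcdctl is to run a container that deliberately exits non-zero (a sketch only; it assumes the image referenced above ships a shell):

# podman run should exit with the container's exit status;
# on a broken podman this prints 0 instead of 7.
podman run --rm --entrypoint sh 6f3f175b53ca -c 'exit 7'
echo $?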

@stbenjam (Member, Author)

Summary: etcdctl exits non-zero inside the container, but when run via bootkube_podman_run the exit status comes back as 0, so the retry loop in bootkube.sh never retries. It looks like a podman exit-status propagation problem rather than a change in etcd.

@stbenjam (Member, Author) commented Aug 28, 2019

42.80.20190827.1 is out with the fixes; I got a good local install.

openshift/installer#2277 bumps the installer rhcos.
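
For reference, the change in this PR itself is essentially the version bump in ocp_install_env.sh that points the dev scripts at the new RHCOS build. Roughly like this (the variable name below is illustrative only; the real name and image URL are whatever ocp_install_env.sh defines):

# Illustrative sketch only: bump the RHCOS build used by the dev scripts.
RHCOS_VERSION="42.80.20190827.1"   # previously 42.80.20190823.0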

@stbenjam changed the title from "Update rhcos to 42.80.20190823.0" to "Update rhcos to 42.80.20190827.1" on Aug 28, 2019
@metal3ci: Build FAILURE, see build http://10.8.144.11:8080/job/dev-tools/1103/

@metal3ci: Build FAILURE, see build http://10.8.144.11:8080/job/dev-tools/1104/

@metal3ci: Build FAILURE, see build http://10.8.144.11:8080/job/dev-tools/1105/

@hardys commented Aug 28, 2019

OK, with openshift/installer#2277 and rebased to include #775, this is working for me locally 👍

@stbenjam thanks for all the effort yesterday debugging the podman issues! :)

Review comment on ocp_install_env.sh (outdated, resolved)
@metal3ci: Build FAILURE, see build http://10.8.144.11:8080/job/dev-tools/1107/

@metal3ci: Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/1108/

@hardys self-requested a review on August 28, 2019 at 18:14
@hardys commented Aug 28, 2019

Proven locally and in CI, so this should be good to go once the installer PR lands and we remove the temporary changes added here to test it 👍

@stbenjam (Member, Author)

Great, thanks! The openshift/installer PR landed. We should get a nightly with that fix in about an hour.

@hardys commented Aug 29, 2019

Looks like the nightly picked up the new image, so I removed the changes that ran the installer from source and force-pushed. I then saw #778, which looks good too; we can discuss later which approach you'd prefer to go with.

@metal3ci: Build SUCCESS, see build http://10.8.144.11:8080/job/dev-tools/1111/

@russellb (Member)

Since CI passed here, let's merge this. #778 can go in on top of it later.

@russellb merged commit 88a71ad into openshift-metal3:master on Aug 29, 2019