WIP: kubernetes: 1.19.5 -> 1.20.1 #109275

Closed
wants to merge 2 commits into from

ymatsiuk
Contributor

@ymatsiuk ymatsiuk commented Jan 13, 2021

Motivation for this change

upversion

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

Member

@saschagrunert saschagrunert left a comment

Code LGTM. Did you have a chance to run the tests locally, @ymatsiuk? :)

@johanot
Contributor

johanot commented Jan 13, 2021

Thanks for the PR! I haven't read the 1.20 release notes yet, so I don't know whether there's a breaking change in there that we should mention in the NixOS release notes.

Also, we should ensure that the kubernetes nixos-tests pass. I tried running them locally against your branch with nix-build ./nixos/release.nix -A tests.kubernetes.rbac.singlenode -A tests.kubernetes.rbac.multinode -A tests.kubernetes.dns.singlenode -A tests.kubernetes.dns.multinode, but the initial attempts failed.

Please ensure that tests run clean before merging :)

@ymatsiuk
Contributor Author

seems like etcd doesn't start. let me dig into the test. this is what I've got:

machine1 # [   12.101119] kube-addon-manager-pre-start[791]: Error in configuration:
machine1 # [   12.117839] kube-addon-manager-pre-start[791]: * unable to read client-cert /var/lib/kubernetes/secrets/cluster-admin.pem for cluster-admin due to open /var/lib/kubernetes/secrets/cluster-admin.pem: no such file or directory
machine1 # [   12.152365] kube-addon-manager-pre-start[791]: Error in configuration:
machine1 # * unable to read client-cert /var/lib/kubernetes/secrets/cluster-admin.pem for cluster-admin due to open /var/lib/kubernetes/secrets/cluster-admin.pem: no such file or directory
machine1 # * unable to read client-key /var/lib/kubernetes/secrets/cluster-admin-key.pem for cluster-admin due to open /var/lib/kubernetes/secrets/cluster-admin-key.pem: no such file or directory
machine1 # * unable to read certificate-authority /var/lib/kubernetes/secrets/ca.pem for local due to open /var/lib/kubernetes/secrets/ca.pem: no such file or directory
machine1 # * unable to read client-key /var/lib/kubernetes/secrets/cluster-admin-key.pem for cluster-admin due to open /var/lib/kubernetes/secrets/cluster-admin-key.pem: no such file or directory
machine1 # [   12.211946] kube-addon-manager-pre-start[791]: * unable to read certificate-authority /var/lib/kubernetes/secrets/ca.pem for local due to open /var/lib/kubernetes/secrets/ca.pem: no such file or directory
machine1 # [   12.258458] etcd[783]: recognized and used environment variable ETCD_NAME=machine1.my.zyx
machine1 # [   12.289704] dhcpcd[635]: [   12.438440] NET: Registered protocol family 17
machine1 # eth0: soliciting a DHCP lease
machine1 # [   12.347200] etcd[783]: recognized and used environment variable ETCD_PEER_CERT_FILE=/var/lib/kubernetes/secrets/etcd.pem
machine1 # [   12.372703] dhcpcd[635]: eth0: offered 10.0.2.15 from 10.0.2.2
machine1 # [   12.419357] etcd[783]: recognized and used environment variable ETCD_PEER_KEY_FILE=/var/lib/kubernetes/secrets/etcd-key.pem
machine1 # [   12.444483] dhcpcd[635]: eth0: leased 10.0.2.15 for 86400 seconds
machine1 # [   12.465023] etcd[783]: recognized and used environment variable ETCD_PEER_TRUSTED_CA_FILE=/var/lib/kubernetes/secrets/ca.pem
machine1 # [   12.489775] dhcpcd[635]: eth0: adding route to 10.0.2.0/24
machine1 # [   12.516009] etcd[783]: recognized and used environment variable ETCD_TRUSTED_CA_FILE=/var/lib/kubernetes/secrets/ca.pem
machine1 # [   12.540971] dhcpcd[635]: eth0: adding default route via 10.0.2.2
machine1 # [   12.565945] etcd[783]: unrecognized environment variable ETCD_DISCOVERY=
machine1 # [   12.590926] etcd[783]: etcd Version: 3.3.25
machine1 #
machine1 # [   12.617981] etcd[783]: Git SHA: GitNotFound
machine1 #
machine1 # [   12.656951] etcd[783]: Go Version: go1.15.6
machine1 #
machine1 # [   12.674554] etcd[783]: Go OS/Arch: linux/amd64
machine1 #
machine1 # [   12.704185] etcd[783]: setting maximum number of CPUs to 1, total number of available CPUs is 1
machine1 # [   12.734944] etcd[783]: failed to detect default host (could not find default route)
machine1 # [   12.756655] etcd[783]: peerTLS: cert = /var/lib/kubernetes/secrets/etcd.pem, key = /var/lib/kubernetes/secrets/etcd-key.pem, ca = , trusted-ca = /var/lib/kubernetes/secrets/ca.pem, client-cert-auth = false, crl-file =
machine1 # [   12.807299] etcd[783]: open /var/lib/kubernetes/secrets/etcd.pem: no such file or directory
machine1 # [   12.852314] systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
machine1 # [   12.890346] systemd[1]: etcd.service: Failed with result 'exit-code'.
machine1 # [   12.913368] systemd[1]: Failed to start etcd key-value store.

if etcd doesn't start then it doesn't make sense to run the rest of the test
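To summarize which files the log above reports as missing, a small script along these lines might help. It is only a sketch: the helper name `missing_files` is made up for illustration, and the two sample lines are copied from the log above.

```python
import re

# Matches the "open <path>: no such file or directory" fragment seen in the log.
MISSING_FILE = re.compile(r"open (\S+): no such file or directory")


def missing_files(log_lines):
    """Return the sorted, de-duplicated paths that the log reports as missing."""
    found = set()
    for line in log_lines:
        found.update(MISSING_FILE.findall(line))
    return sorted(found)


# Two lines taken verbatim from the test log above.
sample = [
    "machine1 # [   12.807299] etcd[783]: open /var/lib/kubernetes/secrets/etcd.pem: no such file or directory",
    "machine1 # * unable to read certificate-authority /var/lib/kubernetes/secrets/ca.pem for local due to open /var/lib/kubernetes/secrets/ca.pem: no such file or directory",
]

for path in missing_files(sample):
    print(path)
```

Every path it finds in this excerpt sits under /var/lib/kubernetes/secrets, which suggests the certificates were simply not there yet when the services first started.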

@johanot
Contributor

johanot commented Jan 13, 2021

@ymatsiuk

> seems like etcd doesn't start

It looks like the PKI has not been bootstrapped yet. Note that this is normal behavior, i.e. daemons might restart a couple of times until their prerequisites are met.

> it doesn't make sense to run the rest of the test

In this case, you'll probably have to wait until certmgr has provisioned the certificates for the cluster.
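The "keep retrying until the prerequisites are met" behavior can be sketched as a simple polling loop. This is a hedged illustration only, not how systemd, certmgr, or the NixOS test driver implement it; `wait_until` and its parameters are hypothetical names.

```python
import time


def wait_until(check, timeout=60.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or the timeout elapses.

    Returns True on success and False if the deadline passed first.
    sleep and clock are injectable so the loop can be exercised without waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Demo: a check that only succeeds on the third attempt, much like a daemon
# that fails twice before its certificates have been provisioned.
attempts = {"count": 0}


def certificates_present():
    attempts["count"] += 1
    return attempts["count"] >= 3


ok = wait_until(certificates_present, timeout=5.0, interval=0.0)
print(ok, attempts["count"])
```

In a real check, the predicate could be something like `os.path.exists("/var/lib/kubernetes/secrets/etcd.pem")`, using the path from the log above.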

@ymatsiuk
Contributor Author

@johanot thanks for helping out. I'm trying to figure out what successful tests look like, and it seems they fail for nixos-20.09 as well

@ymatsiuk
Contributor Author

first issue is:

machine1 # [  107.748194] kube-apiserver[2185]: Error: [service-account-issuer is a required flag, --service-account-signing-key-file and --service-account-issuer are required flags]

digging into the module

@SuperSandro2000
Member

This is a semi-automatically executed nixpkgs-review which does not build all packages (e.g. lumo, tensorflow or pytorch).
If you find any bugs or have suggestions for further things to search or run, please reach out to SuperSandro2000 on IRC.

Result of nixpkgs-review pr 109275 run on x86_64-darwin

2 packages built:
  • kubectl
  • kubernetes

@ymatsiuk ymatsiuk changed the title from "kubernetes: 1.19.5 -> 1.20.1" to "WIP: kubernetes: 1.19.5 -> 1.20.1" Jan 13, 2021
@ymatsiuk
Contributor Author

rbac tests are passing now. digging into dns. Currently I see this error:

machine1: waiting for success: kubectl get pods -n kube-system | grep 'coredns.*1/1'
machine1 # No resources found in kube-system namespace.
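For reference, the readiness check above only succeeds once some output line contains `coredns` followed by `1/1`. A rough Python equivalent of the `grep 'coredns.*1/1'` filter is sketched below; the helper name and the pod lines are hypothetical and not taken from a real run, while the "No resources found" line is the one reported above.

```python
import re

# Same pattern as the grep in the test: "coredns", anything, then "1/1".
READY_COREDNS = re.compile(r"coredns.*1/1")


def coredns_ready(kubectl_output):
    """Return True if any line of the output matches the readiness pattern."""
    return any(READY_COREDNS.search(line) for line in kubectl_output.splitlines())


# Output currently seen in the failing test.
empty = "No resources found in kube-system namespace."

# Hypothetical pod listings, for illustration only.
not_ready = "coredns-example   0/1   Pending   0   10s"
ready = "coredns-example   1/1   Running   0   2m"

print(coredns_ready(empty), coredns_ready(not_ready), coredns_ready(ready))
```

With no pods listed at all, the pattern can never match, so the wait keeps retrying until the coredns deployment is actually created.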

@ofborg ofborg bot added the "6.topic: nixos" (Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS) and "8.has: module (update)" (This PR changes an existing module in `nixos/`) labels Jan 14, 2021
@johanot
Contributor

johanot commented Feb 19, 2021

@ymatsiuk I finally got the time to look at this, sorry it took so long.

I believe these two commits fix the coredns deployment issues:

johanot@e839f26
(ref: coredns/coredns#3737)

and

johanot@4a821aa
(ref: kubernetes/kubernetes#99232)

You are welcome to cherry-pick these onto your branch.

Can you please also bump this to kubernetes v1.20.4 (in a separate commit) and maybe rebase with nixos-unstable?

Hopefully we can then get this show on the road :)

@blaggacao
Contributor

blaggacao commented Mar 1, 2021

We also have to keep an eye on kubernetes/kubernetes#98946, targeted for the 1.21 code freeze (1.21 is scheduled for release on April 8th).

@blaggacao
Contributor

@johanot why should nixos have the same-PR priority as pkgs? A strong argument against it is that nixos is only one possible downstream consumer (yes, it is a downstream consumer), and the longer this takes to merge, the more we penalize other downstream users who have less of a voice on this repo.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/wild-idea-how-about-splitting-nixpkgs-and-nixos/11487/22

@ymatsiuk
Contributor Author

ymatsiuk commented Mar 1, 2021

Guys, apologies for the delay; I've been busy lately with other job-related tasks. @blaggacao thanks for picking this up.

@johanot
Contributor

johanot commented Mar 7, 2021

superseded by #114737

@johanot johanot closed this Mar 7, 2021
@ymatsiuk ymatsiuk deleted the kubernetesbump branch May 19, 2021 12:02
Labels
  • 6.topic: nixos (Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS)
  • 8.has: module (update) (This PR changes an existing module in `nixos/`)
  • 10.rebuild-darwin: 1-10
  • 10.rebuild-linux: 1-10
6 participants