Release dependency management improvements umbrella #601

Open
tpepper opened this issue Apr 23, 2019 · 26 comments
Labels
area/release-eng Issues or PRs related to the Release Engineering subproject help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/release Categorizes an issue or PR as relevant to SIG Release.

@tpepper
Member

tpepper commented Apr 23, 2019

There is an ongoing need for better managing external dependencies.

The release team regularly scrambles to collect the current preferred dependency versions. These are inconsistently articulated in multiple files across multiple repos in non-machine-readable ways. And some are even untracked outside of anecdotal lore.

Various prior issues have been opened, for example #400, and this regularly comes up in release retrospectives.

SIG Release needs to draft a KEP, for implementation by the release team, that outlines the problem space and possible solutions. We need a machine-readable, structured, single source of truth. It should have a broad OWNERS set to get wide review on changes and not be blocked on a small set of reviewers. Code in the project that needs to get “etcd” should get the version specified in this file. Release notes should draw from this file and its changelog. A PR changing a dependency in this file might get a special label, ensured release notes inclusion, and special review. Special review can be needed to ensure we avoid the cycle where one group upgrades for a fix and introduces a regression in some other code, then those owners revert the upgrade, re-introducing the prior bug (this has actually happened multiple times).

One potential problem with this approach, which has been a past blocker, is that this could mean work in a sub-project repo requires checking out some other repo in order to get this hypothetical yaml saying what are the preferred versions.
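To make the "single source of truth" idea concrete, here is a minimal sketch of the lookup that project code could do. The manifest layout, the field names, and the `preferred_version` helper are all assumptions for illustration, not an existing file format or API; the manifest is an in-memory dict standing in for the hypothetical yaml.

```python
# Hypothetical manifest: stands in for the structured file proposed above.
# Field names ("dependencies", "name", "version") are assumptions.
MANIFEST = {
    "dependencies": [
        {"name": "etcd", "version": "3.4.3"},
        {"name": "golang", "version": "1.13.4"},
    ]
}


def preferred_version(manifest, name):
    """Return the declared version for `name`; raise KeyError if it is untracked."""
    for dep in manifest["dependencies"]:
        if dep["name"] == name:
            return dep["version"]
    raise KeyError(f"dependency {name!r} is not tracked in the manifest")


if __name__ == "__main__":
    # Code that needs "etcd" asks the manifest instead of hard-coding a version.
    print(preferred_version(MANIFEST, "etcd"))
```

Failing loudly on an untracked name is a deliberate choice in this sketch: it turns "anecdotal lore" into an explicit error.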

@tpepper tpepper added the area/release-team Issues or PRs related to the release-team subproject label Apr 23, 2019
@tpepper tpepper added this to the v1.15 milestone Apr 23, 2019
@jeefy
Member

jeefy commented Apr 23, 2019 via email

@justaugustus
Member

/area release-eng
/priority important-soon

@k8s-ci-robot k8s-ci-robot added area/release-eng Issues or PRs related to the Release Engineering subproject priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels May 1, 2019
@claurence

Circling back to this item: I expressed interest in helping out here. For the purposes of 1.15, I mostly care about defining the list of dependencies that we need to track.

I can start a draft KEP that lists those dependencies.

@yastij
Member

yastij commented Jun 4, 2019

/assign

@figo

figo commented Jun 5, 2019

/cc

@justaugustus
Member

Initial PR for discussion here: kubernetes/kubernetes#79366

@justaugustus
Member

Notice sent to k-dev, @kubernetes/sig-release, @kubernetes/release-team, and @kubernetes/release-engineering regarding the merged changes in kubernetes/kubernetes#79366: https://groups.google.com/d/topic/kubernetes-dev/cTaYyb1a18I/discussion

@yastij
Member

yastij commented Jul 11, 2019

also error message improvements here: kubernetes/kubernetes#80060

@idealhack idealhack modified the milestones: v1.15, v1.16 Jul 29, 2019
@justaugustus
Member

build/external: Move dependencies.yaml and update OWNERS - kubernetes/kubernetes#80799

@tpepper
Member Author

tpepper commented Aug 19, 2019

I propose we un-milestone 1.16 this umbrella issue and remove the area release team, assuming that the release notes team for 1.16 (@saschagrunert @onyiny-ang @cartyc @kcmartin @paulbouwer ) have the dependencies.yaml file documented and codified as the source of info for the dependencies section https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#dependencies

It may also be about time to go ahead and close this as complete for the first go and move from the giant long lived umbrella issue to smaller point issues (like is happening already above) for incremental improvement.

@saschagrunert
Member

saschagrunert commented Aug 19, 2019

Thanks for the hint, I assume that we still update the release notes dependency section manually for 1.16. :)

That may be a bit out of scope, but I wrote a tool some time ago to diff go modules between git releases automatically: https://github.com/saschagrunert/go-modiff
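For readers unfamiliar with the idea, diffing module requirements between two releases can be sketched roughly as below. This is not how go-modiff is implemented; it is a simplified illustration that only reads `require` directives, and the module paths in the sample texts are made up.

```python
def parse_requirements(go_mod_text):
    """Collect module -> version pairs from the require directives of a go.mod text."""
    requirements = {}
    in_block = False
    for raw in go_mod_text.splitlines():
        line = raw.split("//")[0].strip()  # drop trailing comments such as "// indirect"
        if line.startswith("require ("):
            in_block = True
        elif in_block and line == ")":
            in_block = False
        elif in_block and line:
            module, version = line.split()[:2]
            requirements[module] = version
        elif line.startswith("require "):
            module, version = line.split()[1:3]
            requirements[module] = version
    return requirements


def diff_requirements(old, new):
    """Return sorted (module, old_version, new_version) tuples; None marks added/removed."""
    changes = []
    for module in sorted(set(old) | set(new)):
        before, after = old.get(module), new.get(module)
        if before != after:
            changes.append((module, before, after))
    return changes


# Hypothetical go.mod contents for two releases.
OLD = "module example.com/demo\n\nrequire (\n\texample.com/alpha v1.0.0\n\texample.com/beta v0.2.0 // indirect\n)\n"
NEW = "module example.com/demo\n\nrequire (\n\texample.com/alpha v1.1.0\n\texample.com/gamma v0.1.0\n)\n"

if __name__ == "__main__":
    for change in diff_requirements(parse_requirements(OLD), parse_requirements(NEW)):
        print(change)
```

A real tool would also need to handle `replace` and `exclude` directives, which this sketch ignores.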

@lachie83
Member

/remove-area area/release-team

@k8s-ci-robot
Contributor

@lachie83: Those labels are not set on the issue: area/area/release-team

In response to this:

/remove-area area/release-team

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@lachie83
Member

/remove-area release-team

@k8s-ci-robot k8s-ci-robot removed the area/release-team Issues or PRs related to the release-team subproject label Aug 21, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 19, 2019
@justaugustus justaugustus removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 4, 2019
@justaugustus justaugustus modified the milestones: v1.16, v1.18 Dec 4, 2019
@justaugustus justaugustus added the sig/release Categorizes an issue or PR as relevant to SIG Release. label Dec 9, 2019
@justaugustus
Member

/lifecycle frozen
/unassign @claurence

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Feb 6, 2020
@justaugustus
Member

@Pluies reached out to me before the holidays with this:

> Hi Stephen!
>
> Florent here, I wanted to get in touch to thank you for sharing this – I've been thinking about the "pinned infra dependencies" problem for a long time, and really enjoyed reading about the way Kubernetes deals with this!
>
> I've used it as an inspiration to write Zeitgeist, a language-agnostic dependency checker: https://github.com/Pluies/zeitgeist
>
> It includes the "dependencies declaration in yaml" and "checking all occurrences of dependencies are in sync" features of verifydependencies.go, and extends them with a way to check if the current version is up-to-date with its upstream (which could be releases in a GitHub repo, a Helm chart...). Upstreams are based on a plugin system, so more types of upstreams can be added as desired.
>
> Let me know if this is something that could be of interest for the k8s project, I'd be happy to help with the integration (which should be pretty much drop-in). :)
>
> Cheers,
> Florent Delannoy

What do we think about using zeitgeist?
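For context, the "all occurrences are in sync" check described in the message can be sketched as follows. This is neither Zeitgeist's nor verifydependencies.go's actual code: the `ref_paths`/`match` field names, the file paths, and the `find_drift` helper are assumptions, and file contents are passed in as a dict so the example stays self-contained.

```python
import re


def find_drift(dependencies, files):
    """Return (dependency name, path) pairs where the declared version is not found.

    `files` maps a path to its text content. A reference counts as in sync when
    some line matching the reference's `match` regex contains the declared version.
    """
    drift = []
    for dep in dependencies:
        for ref in dep["ref_paths"]:
            content = files.get(ref["path"], "")
            matching = [ln for ln in content.splitlines() if re.search(ref["match"], ln)]
            if not any(dep["version"] in ln for ln in matching):
                drift.append((dep["name"], ref["path"]))
    return drift


# Hypothetical declaration and file contents, for illustration only.
DEPS = [
    {
        "name": "etcd",
        "version": "3.4.3",
        "ref_paths": [
            {"path": "cluster/config.sh", "match": r"ETCD_VERSION="},
            {"path": "images/Makefile", "match": r"ETCD_VERSION\?="},
        ],
    }
]
FILES = {
    "cluster/config.sh": 'ETCD_VERSION="3.4.3"\n',
    "images/Makefile": "ETCD_VERSION?=3.3.10\n",
}

if __name__ == "__main__":
    print(find_drift(DEPS, FILES))
```

The substring test is intentionally naive; a stricter check would anchor the version in the regex itself.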

cc: @dims @cblecker @BenTheElder @liggitt

@BenTheElder
Member

Other than vendor/ I don't think the original post makes much sense with code like kubeadm going out of tree.

etcd is not specified in tree by anything other than cluster provisioning tooling, which we have issues open about removing from the tree.

vendor/ already has an established dependency review system, and I don't think it needs any new tooling.

what other dependencies are we talking about?

@BenTheElder
Member

I would also note that in order to maintain a tool that brings up clusters you pretty much need the freedom to update dependencies at will. We do not force all cluster tools to synchronize on some specific version of e.g. containerd today, and I would not be in favor of doing so in the future.

@tpepper
Member Author

tpepper commented Feb 10, 2020

I agree with @BenTheElder that users (and vendors) need the ability to override project preferred defaults, if that's what was stated ;) My primary point is we need a stronger definition of "project preferred defaults". We do have these sprinkled around the code. We do bring up clusters, intentionally with certain components and component versions, and run tests with intention of proving specific combinations. We observe and fix real bugs relative to specific external non-golang dependency name/version/release tuples.

At some point we, the collective us as a community, need to understand what we're engineering, coding and testing against, and giving "support". IMO we should do that more strongly.
(Also am open to conversation around if we could also not do that and assume vendors will manage that in a sufficiently coherent way, or expect that the dependencies don't have incompatible skews.)

From the older example linked above, there was a time where we tried to track more:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.15.md#dependencies

Since then the list of non-go-modules dependencies which are tracked is down to golang and etcd:
https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.17.md#dependencies
If I were to read the difference between that 1.15 and 1.17 list, might I infer that Kubernetes 1.17 and higher now run fine with any cri-tools, cluster autoscaler, cadvisor, CNI, CSI, klog, etc.? I'd love for the ecosystem of projects to be stable enough that we don't need to actively track in detail. Yet patches to some of those dependencies' version-in-use are frequently proposed for cherry-pick on release branches, which I take as evidence we do seem to track.

Another point that changelog shows is that we don't have a canonical source of truth. The user-focused message there is coming from the long series of commits and gives the sum of those (in arbitrary order?):

  • Update etcd client side to v3.4.3 (#83987, @wenjiaswe)
  • Kubernetes now requires go1.13.4+ to build (#82809, @liggitt)
  • Update to use go1.12.12 (#84064, @cblecker)
  • Update to go 1.12.10 (#83139, @cblecker)
  • Update default etcd server version to 3.4.3 (#84329, @jingyih)
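As a rough illustration of what a reader has to do mentally with that list, the sketch below collapses a series of update entries into a net state per dependency. The entries are hand-labeled from the five changelog lines above; the dependency names and the rule "the highest version is the final state" are assumptions of this sketch, and the rule would be wrong if an entry were a deliberate downgrade.

```python
# Hand-labeled (dependency, version) pairs taken from the changelog lines above.
# The dependency names are labels chosen for this example.
ENTRIES = [
    ("etcd-client", "3.4.3"),
    ("golang", "1.13.4"),
    ("golang", "1.12.12"),
    ("golang", "1.12.10"),
    ("etcd-server", "3.4.3"),
]


def version_key(version):
    """Turn '1.12.10' into (1, 12, 10) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def net_state(entries):
    """Keep the highest version seen per dependency, regardless of entry order."""
    latest = {}
    for name, version in entries:
        if name not in latest or version_key(version) > version_key(latest[name]):
            latest[name] = version
    return latest


if __name__ == "__main__":
    for name, version in sorted(net_state(ENTRIES).items()):
        print(f"{name}: {version}")
```

A canonical manifest would make this reconstruction unnecessary, since the net state would be stated directly.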

Also interesting to me: K3s proves one may not even need etcd at all (or via a small patch anyway, and with a set of caveats on runtime robustness).

The kubeadm departure from k/k is an interesting case. Since they intend to branch and version with k/k, are they implicitly following any k/k's implicit dependencies? They (and other installers) do actively track some dependencies.

I'm open to arguments that it's not on us to manage. As the monolith splits and we move toward a more loosely coupled future, can we argue there is no longer a need for common base expectations? To me the split feels like it makes worse the potential for unmanaged risk of implicit dependencies and end-user confusion.

@neolit123
Member

> The kubeadm departure from k/k is an interesting case. Since they intend to branch and version with k/k, are they implicitly following any k/k's implicit dependencies?

yes, unless something is broken.

> They (and other installers) do actively track some dependencies.

most installers usually trail behind.

@BenTheElder
Member

+1 @neolit123

@tpepper :

> My primary point is we need a stronger definition of "project preferred defaults". We do have these sprinkled around the code. We do bring up clusters, intentionally with certain components and component versions, and run tests with intention of proving specific combinations. We observe and fix real bugs relative to specific external non-golang dependency name/version/release tuples.

IMO project preferred defaults is a problematic topic for political rather than technical reasons.

Are we going to start advertising preferred CRI and CNI ...?

> At some point we, the collective us as a community, need to understand what we're engineering, coding and testing against, and giving "support". IMO we should do that more strongly.

We don't provide support for external tools. Doing so is perhaps not the best idea.

Complete solutions like kops, minikube, kind etc. do package some external tools necessarily and provide their own support there, but for kubernetes to do so seems like a mis-step unless we're prepared to pick a favorite for each option...

> If I were to read the difference between that 1.15 and 1.17 list, might I infer that Kubernetes 1.17 and higher now run fine with any cri-tools, cluster autoscaler, cadvisor, CNI, CSI, klog, etc.? I'd love for the ecosystem of projects to be stable enough that we don't need to actively track in detail. Yet patches to some of those dependencies' version-in-use are frequently proposed for cherry-pick on release branches, which I take as evidence we do seem to track.

Cluster autoscaler should advertise its own compatibility with kubernetes and not vice versa, as should CNI implementations and CSI implementations etc. klog is ??? not an issue??

> Also interesting to me: K3s proves one may not even need etcd at all (or via a small patch anyway, and with a set of caveats on runtime robustness).

Zero patches away, you can "simply" implement the etcd wire protocol but there are some problems there that I'd rather discuss in another forum :+)

@tpepper
Member Author

tpepper commented Feb 11, 2020

> IMO project preferred defaults is a problematic topic for political rather than technical reasons.
>
> Are we going to start advertising preferred CRI and CNI ...?

Inasmuch as there are classes of interfaces or providers, as an open source project with limited resources I feel like we have a few paths:

  1. Treat one as a reference implementation with NVR declared programmatically and used consistently in test variations. If politics is the worry, this could be the worst case. Drop the admittedly awkward word "preferred" and with an open mind this is the easiest for us to rationalize that our project is functional when integrated with something. Simplest test matrix. As long as some of us can keep the one thing running, we're good. In the face of an issue, an (alternate?, non-preferred?, non-default?, something politically correct?) provider has the onus to demonstrate whether the issue is theirs or upstream, fix their issues and get involved in upstream ones where applicable on behalf of their customers.
  2. Have multiple in test. Better than option 1 as it gives A/B comparison across reference implementations. Slippery slope of test matrix. Still has politics: "You let T, U, V in, therefore now you must take a patch for my X too". Odds become good that some of these aren't actually kept in a working state. We spend a lot of cycles trying to specialize beyond our specialization to understand vendor specifics. Can our community rationalize much about a bug that only shows in some deployment variations? Or meh the vendors will fix it and let us know if not? This gets expensive in CI. This gets expensive in human dev/test/debug time. Overall CI health is fuzzy on average.
  3. Have all variations in test. No politics here. Are they all actually in a working state though? Can our community rationalize much about a bug that only shows on some of them? If somebody wants to pay for the CI and staff the engineers...

Seriously though the latter is obviously highly unlikely to happen. The middle is where we are now. The first is simpler but for the choice of which.

> At some point we, the collective us as a community, need to understand what we're engineering, coding and testing against, and giving "support". IMO we should do that more strongly.
>
> We don't provide support for external tools. Doing so is perhaps not the best idea.

We don't support external tools, but we debug problem reports. We support our code running in conjunction with external components both in CI and we welcome end-users' problem reports. That requires our finite resources have an understanding of and ability to debug a not-very-finite set of runtime combinations. Can we actively manage that complexity or must it be a free for all?

I feel like if we declare the things we run in test, do that in common (across the org?), reduce the size of the test matrix, then we can have more realistic conversations about what is containable beyond a simple, common short list of variations and at what cost. We feel out of balance and unsustainable where we are today.

@tpepper
Member Author

tpepper commented Aug 3, 2020

Relates IMO partly to the conversation in kubernetes/test-infra#18551 and #966 around establishing a cleaner test plan.

@justaugustus justaugustus removed this from the v1.18 milestone Mar 31, 2021
@justaugustus justaugustus added this to the v1.22 milestone Mar 31, 2021
@saschagrunert saschagrunert modified the milestones: v1.22, v1.23 Sep 6, 2021
@justaugustus
Member

/unassign
/help

@k8s-ci-robot
Contributor

@justaugustus:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/unassign
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Dec 5, 2021