Prepare initial Zuul CI setup #9103

Closed
3 of 4 tasks
webknjaz opened this issue Nov 4, 2020 · 20 comments

Comments

@webknjaz
Member

webknjaz commented Nov 4, 2020

This is a spin-off of #7279 where folks pre-agreed to explore the possibility of extending pip's testing setup with external Zuul CI resources. Let's use this issue to coordinate that effort.

@ssbarnea offered to help with the maintenance of the CI itself.

Action items:

@pradyunsg
Member

Well, the app is installed. :)

@ianw
Contributor

ianw commented Nov 4, 2020

@pradyunsg thank you. I'll shepherd things through on our side and get back

@pradyunsg
Member

:)

As a heads-up, our CI takes ~20 minutes on Linux, on the 2 vCPU machines we get from most commercial CI providers (like GitHub Actions, Travis CI, Azure Pipelines). I'm not sure what the details of these external Zuul instances/checks/resources would be [1] but I do think that if we're going to run tests on an even slightly decent matrix, they'd need to be parallelized to not take, like, 2 hours.

[1]: If someone with visibility could share the details on this, that'd be great!
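
(For a concrete picture of the parallelization: pip's suite is pytest-based, so one hedged sketch of the idea is pytest-xdist; the flags and paths below are illustrative only, not necessarily what pip's CI actually runs.)

```console
$ python -m pip install pytest-xdist
$ python -m pytest -n auto tests/  # pytest-xdist spreads the tests across all available CPUs
```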

@ianw
Contributor

ianw commented Nov 5, 2020

If someone with visibility could share the details on this, that'd be great!

Our basic host is a dedicated VM with 8GB RAM and 8 CPUs. This is mentioned at [1]. These are donated by a range of hosting providers (you can in fact see them all at [2]).

[1] https://docs.opendev.org/opendev/system-config/latest/contribute-cloud.html#contributing-cloud-test-resources
[2] https://opendev.org/openstack/project-config/src/branch/master/nodepool

@pradyunsg
Member

pradyunsg commented Nov 5, 2020

Those are about 4x our existing setups, and I'd expect our tests to scale pretty well with that (I think our tests are I/O bound w/ bursty compute).

That sounds great actually -- it's probably big enough to run the entire suite with RAM disks, I think, which would probably work even better since it might reduce the CI times a fair bit (which is better for everyone, given that it's shared+donated resources).
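
(A minimal sketch of the RAM-disk idea, assuming a Linux node where we control the mounts; the mount point and size here are made up for illustration.)

```console
$ sudo mkdir -p /mnt/ramdisk
$ sudo mount -t tmpfs -o size=4g tmpfs /mnt/ramdisk    # a 4 GB filesystem backed by RAM
$ TMPDIR=/mnt/ramdisk python -m pytest -n auto tests/  # pytest's temp dirs now land in RAM
```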

@mnaser

mnaser commented Nov 5, 2020

FWIW, just chiming in, using Zuul as nothing but a third-party signal and continuous integration utility really defeats the purpose of it. IMHO, it makes Zuul quite meaningless when it comes to all of its really powerful features. It really shines when you let it gate your project, which I think should be what is taken into consideration.

Otherwise, it's just a boring job runner like any other hosted CI.

@pfmoore
Member

pfmoore commented Nov 5, 2020

It really shines when you let it gate your project, which I think should be what is taken into consideration.

One problem I have with the Zuul docs, from my (very brief, out of necessity) skim, is that they use a lot of terminology that I'm not familiar with. Here's an example - what does it mean to "gate the project"?

@ianw
Contributor

ianw commented Nov 5, 2020

gate the project

When using Zuul to its full potential, humans do not merge changes/pull requests. You indicate to Zuul that a change is reviewed and tell it that it is safe to merge, and it applies the change to the current HEAD, runs CI and commits the change only after that has passed. You can no longer merge a broken change because you ran the CI 3 days ago against a now out-of-date HEAD and someone else has merged, say, an API change in between -- that would cause the "gate" CI to fail and the change would be rejected; you would rework it, re-review it and submit it again. That's "gating".
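
(Roughly, the gate does something like the following for an approved change; this is only an illustration of the idea, not Zuul's actual implementation, and the branch name approved-change is made up.)

```console
$ git fetch origin master
$ git checkout -B speculative origin/master  # start from the *current* tip, not the PR's old base
$ git merge --no-ff approved-change          # speculatively apply the approved change
$ python -m pytest tests/                    # run CI against the merged result
$ # the merge is kept only if this passes; otherwise the change is rejected and reworked
```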

Zuul can certainly operate like this on this project. However, the current app doesn't have write permissions, so it's not configured for it (the Zuul that Ansible uses in various ways is, however).

I have not beaten it into shape for committing to the docs, but I have written up

https://review.opendev.org/#/c/683085/3/doc/source/discussion/zen.rst

which is a more "conversational" view of what Zuul does and is perhaps of interest to you.

@albinvass

One problem I have with the Zuul docs, from my (very brief, out of necessity) skim, is that they use a lot of terminology that I'm not familiar with. Here's an example - what does it mean to "gate the project"?

There's a nice short video on the front page that should make it a bit clearer: https://zuul-ci.org/

:)

@pfmoore
Member

pfmoore commented Nov 5, 2020

When using Zuul to its full potential, humans do not merge changes/pull requests.

Right. I doubt we'd want to do anything like that. All we're looking for (at least, in my opinion) is a CI runner. This whole conversation was triggered because Travis changed their Ts&Cs, and as a result we have ended up on just one runner, GitHub Actions. Our limitation is CI runtime.

Add to that some comments that people feel we should have a bigger test matrix (which I'm not sure we need; @pradyunsg explained why on the other thread) and we ended up discussing Zuul. But for me, it's still just "how can we push through our test suite on CI faster", along with some people (not me) being concerned that we currently rely on just one platform.

Specifically, I don't want to see the pip developers spending our very limited time on re-engineering our CI. As a volunteer project, I don't get to dictate, but my hope is that we focus on improving pip's code, and just have CI that's "good enough" (or maintained on our behalf by others 🙂). So I'm very happy to see others offering CI for pip, but I want it to be low (or zero) effort for the pip devs to adopt.

There's a nice short video on the front page

I don't do videos for stuff like that, sorry. I prefer to skim text at my own pace.

@albinvass

I don't do videos for stuff like that, sorry. I prefer to skim text at my own pace.

Alright. I usually think things are a bit clearer when I have an image in front of me showing how things would fit together.
Maybe this will suit you better: https://zuul-ci.org/docs/zuul/discussion/concepts.html#zuul-concepts

@ssbarnea
Contributor

ssbarnea commented Nov 5, 2020

@pfmoore In short, gating is not unique to Zuul; it's the process where the merge is not made by a human but by the CI/CD pipeline when the right conditions are met, mainly only after testing the final form of the code again. There are lots of GitHub projects using a gated approach where no human has merge rights; they usually use a label to mark a change as "ready-to-merge" and the bot takes care of rebasing, retesting and doing the merge. It produces a chain of changes that gradually go into the final product without requiring a human steward to watch them.

Over the years I have seen lots of accidents where a change broke the code because CI ran on an older version of the code. It's the old case where changes A and B are perfectly fine in isolation, but once you put them together they break. GitHub has an option to require an up-to-date branch before merging, but that proves not to work well with active projects, ones that have many changes being made, especially when combined with long-running jobs. It creates a long cascade of rebases.

@pfmoore
Member

pfmoore commented Nov 5, 2020

Got it, thanks.

IMO pip's workflow isn't perfect, but it's an exercise in balance between catching as much as we can and not spending too much of our precious developer time on infrastructure. Like pretty much everything else in the world 🙂

@ianw
Contributor

ianw commented Nov 6, 2020

The tenant and basic configuration are now merged and live

https://zuul.opendev.org/t/pypa/status

@webknjaz
Member Author

webknjaz commented Nov 6, 2020

A separate tenant — yaaay! 🎉

@webknjaz
Member Author

webknjaz commented Nov 6, 2020

@pfmoore @pradyunsg re: gating — the current effort is to just add more resources and do small incremental steps so that it's not too overwhelming. That's why I didn't even bring up gating in my messages.

But I'd still like to make a comment on this. Both of you seem to think that gating would introduce maintenance burden, friction, and consume an enormous amount of time. I think there are two separate things that got mixed up in this point of view.

Gating itself is just letting some automated system do the merge instead of you. Essentially this means that if a PR gets reviewed and labeled as approved before all of its CI statuses come in, you don't have to babysit it in order to catch the moment when the Merge button becomes green. You can spend this time doing something useful instead. So it actually saves time rather than consumes it.
Another case is an example of real-world friction that you're not protected from in a normal workflow: you get PR A and PR B, they both have green statuses, but when you merge them both into master, it becomes red. What would you do normally? I guess it'd be firefighting and debugging the CI on the red master, which is time-consuming and introduces friction for anybody sending in PRs, because their PRs are all now red because of something unrelated or, even worse, have a mixture of tests failing both because of the broken master and their own bugs, and they have to somehow figure out which problems are relevant.
Now, if you have gating in place, then in this scenario you'll be notified about a breaking change early, before it gets into master, and everybody sending PRs won't be as frustrated. So, in my mind, it's another example of how gating improves the experience rather than making it worse.

Another thing that got into the mix-up is extending the envs matrix. This is what could really introduce more burden, because one may end up needing to debug more platforms than they are knowledgeable about. It's also something that the initial setup will probably avoid, and it will need to be agreed upon with the folks who are actually affected by that burden.


That's all I wanted to say for now. And to reiterate: this issue doesn't have a goal to introduce anything gating-related. But in the future, it'd be interesting to explore it and maybe enable it for a day or a week for people to try it out and get actual feedback on whether it's annoying for them or helpful.

@webknjaz
Member Author

webknjaz commented Nov 6, 2020

@pradyunsg FYI the configuration PR already has some status reports: https://github.com/pypa/pip/pull/9107/checks?check_run_id=1361306493.

@webknjaz
Member Author

@pradyunsg @pfmoore: I think it may be interesting for you to watch https://youtu.be/mjUPThomu4Q and maybe https://youtu.be/vb0Iuf-6wHs.

@ichard26
Member

Are we still interested in using Zuul as part of CI? If not, let's close this.

@webknjaz
Member Author

I think it's safe to assume that nobody's going to drive this effort either way...

webknjaz closed this as not planned on Apr 18, 2024
The github-actions bot locked this conversation as resolved and limited it to collaborators on May 19, 2024