Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zuul CI configuration #9107

Closed
wants to merge 5 commits into from
Closed

Zuul CI configuration #9107

wants to merge 5 commits into from

Conversation

ianw
Copy link
Contributor

@ianw ianw commented Nov 6, 2020

No description provided.

@webknjaz
Copy link
Member

webknjaz commented Nov 6, 2020

@ianw if you set it to Draft, will Zuul still test it?

@ianw
Copy link
Contributor Author

ianw commented Nov 6, 2020

@ianw if you set it to Draft, will Zuul still test it?

Yes, I think so

@ianw ianw marked this pull request as draft November 6, 2020 00:37
@webknjaz
Copy link
Member

webknjaz commented Nov 6, 2020

@ianw I think this change may need to update MANIFEST.in too, in order to satisfy the linter. @pradyunsg probably knows better tho.

@webknjaz
Copy link
Member

webknjaz commented Nov 6, 2020

Oh, nice, the reporting in Zuul UI seems to be set up way better than in other places.

@ianw
Copy link
Contributor Author

ianw commented Nov 6, 2020

Ok, so the first run is in

https://zuul.opendev.org/t/pypa/build/06bdbc5eafed4c3380fb19ff6801c1b3/log/job-output.txt

the failures appear to be related to our use of mirrors. Our test nodes setup pip to use a reverse proxy to a cloud-local mirror; this stops them having to go over the internet as much as possible

e.g.

2020-11-06 00:38:47.905346 | ubuntu-focal | E           AssertionError: assert '\nUsage:   \...the future.\n' == '\nUsage:   \...the future.\n'
2020-11-06 00:38:47.905359 | ubuntu-focal | E             Skipping 9514 identical leading characters in diff, use -v to show
2020-11-06 00:38:47.905375 | ubuntu-focal | E             +   https://pypi.org/simple). This should point to a
2020-11-06 00:38:47.905388 | ubuntu-focal | E             -   https://mirror.kna1.airship-
2020-11-06 00:38:47.905400 | ubuntu-focal | E             -                               citycloud.opendev.org/pypi/simple). This should
2020-11-06 00:38:47.905413 | ubuntu-focal | E             -                               point to a repository compliant with PEP 503
2020-11-06 00:38:47.905426 | ubuntu-focal | E             ?                              -----------
2020-11-06 00:38:47.905438 | ubuntu-focal | E             +                               repository compliant with PEP 503 (the simple...
2020-11-06 00:38:47.905451 | ubuntu-focal | E
2020-11-06 00:38:47.905463 | ubuntu-focal | E             ...Full output truncated (63 lines hidden), use '-vv' to show

UPDATE : I removed the /etc/pip.conf we write and this seemed to go away. Probably a bug that the help is different

@ianw
Copy link
Contributor Author

ianw commented Nov 6, 2020

  • tests/functional/test_help.py::test_help_commands_equally_functional
  • tests/functional/test_install.py::test_basic_install_from_pypi - Asser...
  • tests/functional/test_install_user.py::Tests_UserSite::test_upgrade_user_conflict_in_globalsite
  • tests/functional/test_install_user.py::Tests_UserSite::test_install_user_conflict_in_globalsite_and_usersite
  • tests/functional/test_install_user.py::Tests_UserSite::test_install_user_conflict_in_globalsite
  • tests/functional/test_uninstall_user.py::Tests_UninstallUserSite::test_uninstall_from_usersite_with_dist_in_global_site
  • tests/functional/test_new_resolver_user.py::test_new_resolver_install_user_reinstall_global_site
  • tests/functional/test_new_resolver_user.py::test_new_resolver_install_user_conflict_in_global_site
  • tests/functional/test_new_resolver_user.py::test_new_resolver_install_user_conflict_in_global_and_user_sites
  • tests/functional/test_search.py::test_run_method_should_return_no_matches_found_when_does_not_find_pkgs

@ianw
Copy link
Contributor Author

ianw commented Nov 6, 2020

It looks like these failures are very similar to #7785

UPDATE I've added --use-venv which has made these go away for now

@ianw
Copy link
Contributor Author

ianw commented Nov 6, 2020

test_run_method_should_return_no_matches_found_when_does_not_find_pkgs() fails with

xmlrpc.client.Fault: <Fault -32500: 'HTTPTooManyRequests: The action could not be performed because there were too many requests by the client. Limit may reset in 1 seconds.'>

I think this might be because we're running 8-way parallel. These tests are probably hitting pypi/fastly in very quick succession so we get rate limited


Not sure what is up with test_search_exit_status_code_when_finds_no_package

2020-11-06 01:40:26.475550 | ubuntu-focal | [gw5] linux -- Python 3.8.5 /home/zuul/src/github.com/pypa/pip/.tox/py38/bin/python
2020-11-06 01:40:26.475561 | ubuntu-focal |
2020-11-06 01:40:26.475572 | ubuntu-focal | script = <tests.lib.PipTestEnvironment object at 0x7f4eeae5fc10>
2020-11-06 01:40:26.475583 | ubuntu-focal |
2020-11-06 01:40:26.475593 | ubuntu-focal |     @pytest.mark.network
2020-11-06 01:40:26.475604 | ubuntu-focal |     def test_search_exit_status_code_when_finds_no_package(script):
2020-11-06 01:40:26.475615 | ubuntu-focal |         """
2020-11-06 01:40:26.475626 | ubuntu-focal |         Test search exit status code for no matches
2020-11-06 01:40:26.475636 | ubuntu-focal |         """
2020-11-06 01:40:26.475657 | ubuntu-focal |         result = script.pip('search', 'nonexistentpackage', expect_error=True)
2020-11-06 01:40:26.475672 | ubuntu-focal | >       assert result.returncode == NO_MATCHES_FOUND, result.returncode
2020-11-06 01:40:26.475683 | ubuntu-focal | E       AssertionError: 2
2020-11-06 01:40:26.475694 | ubuntu-focal | E       assert 2 == 23
2020-11-06 01:40:26.475705 | ubuntu-focal | E        +  where 2 = <tests.lib.TestPipResult object at 0x7f4eeae5f580>.returncode

@ianw ianw force-pushed the zuul branch 3 times, most recently from 3302777 to 7225f26 Compare November 6, 2020 02:42
@ianw
Copy link
Contributor Author

ianw commented Nov 6, 2020

recheck

@ianw ianw force-pushed the zuul branch 4 times, most recently from 64f53e7 to b70952f Compare November 6, 2020 06:23
@ianw
Copy link
Contributor Author

ianw commented Nov 6, 2020

So this now works, and I think is a good proof of concept. Things to note

  • I've had to remove our pip configuration to use cloud-local pypi reverse proxies that we setup to avoid CI jobs hitting the internet. This is setup by our base jobs at mirror.yaml and the template is pip.conf.j2 I feel like this is a bug in the pip testing that it doesn't work in tests/functional/test_help.py::test_help_commands_equally_functional and tests/functional/test_install.py::test_basic_install_from_pypi
  • It needed the --use-venv flag ... this went a bit over my head but looks like Tests fail when executing test_install_user.py locally #7785
  • A few pauses were required in tests/functional/test_search.py to avoid hitting pypi rate limits. Not sure if existing CI is just slower, or perhaps runs in cloud providers that have some agreement with fastly that prevents this. This happened even if I ran the network marked tests sequentially.
  • test_sort_best_candidate__* appeared to fail in unit tests @ here but has not appeared on re-runs (update: another and now another) I think there is probably a small race lurking somewhere there
  • You can see all the job runs @ https://zuul.opendev.org/t/pypa/builds. More data needed but looks like a run is ~5m:30s
  • It is a manual process, but we can put nodes on hold and add people's ssh keys if we need to debug things on the host. Zuul also watches for a comment "recheck" and will re-run testing if seen (this is configured @ https://opendev.org/pypa/project-config/src/branch/master/zuul.d/pipelines.yaml#L38). Easiest way to do this is to reach out in freenode #opendev where admins live.

At this point, the world is our oyster, as they say. This can be updated to run more python versions, or different platforms, or whatever we like. I don't think I'm the right person to drive this, but am certainly happy to help as required @ssbarnea @SeanMooney

@ianw ianw changed the title [Not for merge yet] Zuul CI configuration Zuul CI configuration Nov 6, 2020
@ianw
Copy link
Contributor Author

ianw commented Nov 6, 2020

recheck

1 similar comment
@ianw
Copy link
Contributor Author

ianw commented Nov 6, 2020

recheck

@ssbarnea
Copy link
Contributor

ssbarnea commented Nov 6, 2020

This was a very busy week for me but I will look into during the weekend. I was expecting to find pip bugs while setting up the new pipelines.

Some may be considered infra related but some are test design bugs (or design limitation). Hopefully we will sort them.

For example the fact that testing fails when user has his own pip.conf config is a testing bug, the lack of isolation from machine setup. In fact is a very common bug, I fixes several and also made it.

I am always connected to #opendev and #pypa channels using @zbr nick, feel free to ping me there for faster responses. It would be very useful to see one or two pip core team members watching this ticket.

Regarding approach: I would slowly increase testing coverage while addressing the bugs we discover, especially as we also have a day job.

I am really pleased on how fast we managed to join efforts and get results. Thanks to everyone! This is, at least for myself a morale booster.

Comment on lines 22 to 26
- name: Run tox integration tests with no network
include_role:
name: tox
vars:
tox_extra_args: '-- -m integration -n auto --use-venv'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can units and integrations run in parallel jobs? Would it make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can. The other existing jobs seem to run separately so it fails earlier. Personally I'd just run it all as you suggest because that way you see all failures your change introduces, without having to run it through multiple times

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. The current CI setup resorts to using hacks with early cancellations because of the limited resources. So if Zuul can handle the load, it's better to build a more responsive setup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the unit/integration split

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weren't we going to do the opposite?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Is it possible to use pre-baked images to improve the cold start? Like other CIs do.

Copy link
Contributor Author

@ianw ianw Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's always room for improvement, but there's probably two parts to that. Part of the setup is cloud and host specific things; stuff that we don't actually know until the node starts like what mirror to point it at and ssh keys.

The other bits are setting up tox and pip. We have very generic jobs because these are widely portable -- people use these building blocks not just on the VM images Opendev builds, but their own images in openshift, k8s, google compute, aws etc etc. For that reason, a lot of the framework in https://zuul-ci.org/docs/zuul-jobs/ is very intentionally lowest-common-denominator.

It's a trade off -- other CI systems probably don't have a goal of making completely re-usable jobs you can roll out to a wide range of very minimal testing environments. So we're always considering that when we build bits of the jobs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that. But let me rephrase: how hard would it be to make some image from say quay.io run this build?

Copy link
Contributor Author

@ianw ianw Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the way that I think you mean, this isn't possible in Opendev as the Opendev node environment is not container-based. Note that Zuul can certainly be setup to provide resources in many different ways, but in Opendev's Zuul instance you are getting a virtual-machine from the images we pre-build daily (qcow images are available @ https://nb01.opendev.org/images/ and I can go into how and why they are built, but I don't think that's the point here). So you can't tell this job to run on an arbitrary container.

You can install docker/podman on the host, and do things inside that like run tasks in whatever container you want. For example, the pypa manylinux builds use this to build wheels in a container; e.g. https://github.com/pyca/cryptography/blob/master/.zuul.playbooks/playbooks/wheel/roles/build-wheel-manylinux/tasks/main.yaml

This is a flexibility win, but because you have to install docker and pull the image, not a speed win for running something like tox.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks! I was just thinking about what we could do in the context of what @pradyunsg wished for when saying that it better be fast.

news/9103.feature.rst Outdated Show resolved Hide resolved
@webknjaz webknjaz mentioned this pull request Nov 8, 2020
4 tasks
@pfmoore
Copy link
Member

pfmoore commented Nov 10, 2020

Massively superficial (and hence I wouldn't class it as a "review") but a couple of comments:

  1. Could we put everything under a single .zuul directory (maybe .zuul.d -> .zuul and .zuul.playbooks -> .zuul/playbooks) to reduce clutter in the top-level directory? On Windows, directories named with a dot aren't hidden...
  2. There's only one link to an example of output that I could find. It's basically just a textfile, so it's much harder to read (or find the important bits) than our other CI systems (which highlight "important" bits). And when I tried clicking around in the interface on that page, I couldn't find any sort of "just pip" view, so I rapidly got lost in the various jobs and projects (the "projects" page for pypa/pip is empty).

I'd like to see a better example of how this would look "for real" before committing to it. As it stands, it doesn't feel as usable as our existing CI.

@ianw
Copy link
Contributor Author

ianw commented Nov 10, 2020

1. Could we put everything under a single `.zuul` directory (maybe `.zuul.d` -> `.zuul` and `.zuul.playbooks` -> `.zuul/playbooks`) to reduce clutter in the top-level directory? On Windows, directories named with a dot aren't hidden...

Zuul was considering everything under .zuul.d as job config, but I added https://review.opendev.org/#/c/744811/ to allow projects to do this. I'll have to double check if we have restarted things since that merged to see if it's live, but the answer is yes. The idea is that the Ansible playbooks, however, are fairly abstract, and can be useful to anyone using Ansible. I will admit, this is less important for pip, rather than things like server projects that you tend to deploy, where it makes great sense to be using the same logic for deploying in CI as in production as much as possible.

2. There's only one link to an example of output that I could find. It's basically just a textfile, so it's much harder to read (or find the important bits) than our other CI systems (which highlight "important" bits). And when I tried clicking around in the interface on that page, I couldn't find any sort of "just pip" view, so I rapidly got lost in the various jobs and projects (the "projects" page for pypa/pip is empty).

The build will report in the checks interfaces a link to a build result, e.g.

https://zuul.opendev.org/t/pypa/build/94c241f5c2b749bcb0ea4c8d6250142d

For the quickest results; click "artifacts -> pytest results"

For more info, start by going to "console". There you will see each ansible task run. You can scroll down to "Run tox" which is the task that ... runs tox :) You can click to expand, or click on the magnifying glass (actually, we've had some debate about how to make this more discoverable ... so suggestions welcome!)

You can also click on "logs" and see various text-based logs. This has everything, but in the most verbose/least UI-friendly mode.

You can always go to https://zuul.opendev.org/t/pypa/builds to see the latest builds that have run

This moves the playbooks under the Zuul config directory; note the
.zuul.ignore to avoid them being read as configuration.
@pfmoore
Copy link
Member

pfmoore commented Nov 10, 2020

The build will report in the checks interfaces a link to a build result, e.g.

Thanks, that interface is a lot better. Unfamiliar, but it's unfair to criticise a new thing for being unfamiliar, so that's just something I'll have to get used to.

I'm still concerned over who's going to maintain this long-term (I have no personal interest in learning zuul) but if we get a good answer to that, then I'm essentially neutral on adding zuul as an extra CI. Ask me again if the proposal is to replace our existing CI with zuul, though 🙂

@albinvass
Copy link

I'm still concerned over who's going to maintain this long-term (I have no personal interest in learning zuul) but if we get a good answer to that, then I'm essentially neutral on adding zuul as an extra CI. Ask me again if the proposal is to replace our existing CI with zuul, though slightly_smiling_face

Keep in mind zuul is not allowed to do what it's best at which is avoiding regression since it's not gating the project.

If I didn't already have so many other things bogging me down I would be happy to help you out but I don't think I would have (much) time to spend maintaining anything. But you can always find me on #zuul irc if you need any help.

This makes the console output less cluttered when you go to look at
the actual tox run.
@webknjaz
Copy link
Member

Add pytest-html and collect with Zuul as artifact

@ianw any chance to configure the output with contrast+accessibility? I know that it should be possible to feed custom styles into this plugin... The current color scheme is rather hard to consume.

.zuul.d/jobs.yaml Outdated Show resolved Hide resolved
Comment on lines +6 to +10
- name: Remove OpenDev mirror config
file:
path: /etc/pip.conf
state: absent
become: yes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe set PIP_CONFIG_FILE=/dev/null env var instead?
cc @pradyunsg

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If yes, this could probably be done per-test to make use of the mirrors in tests that don't need to hit the production PyPI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or even PIP_ISOLATED=true (is this a thing?)

@ianw
Copy link
Contributor Author

ianw commented Nov 10, 2020

@ianw any chance to configure the output with contrast+accessibility? I know that it should be possible to feed custom styles into this plugin... The current color scheme is rather hard to consume.

I guess you're referring to https://github.com/pytest-dev/pytest-html#appearance but I think that's just an example, as I can't see those CSS files actually defined anywhere?

However, if someone wants to write a custom css file, that is most certainly possible to deploy onto the host and have collected in the logs. You just want to use ansible copy: command to put it on the testing host, and collect it in the same way so it's alongside the html log. It seems to me you could download the existing HTML and work on a CSS file locally, and then I'm happy to work with anyone to help deploy it in the job.

Comment on lines +9 to +14
- job:
name: pypa-pip-py38
parent: pypa-pip-base
nodeset: ubuntu-focal
vars:
tox_envlist: py38
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we look into adding more envs? Other distros?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that was the initial reasoning for exploring Zuul :) @ssbarnea I think was most involved in that disucssion

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just wasn't sure if it's expected that they'd appear in this PR or follow-ups

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is probably to not overload this initial support with a bunch more things, but I'm open to suggestion. Note that any new jobs proposed in a pull request are speculatively tested by Zuul, so things will never get into a failure state

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by saying that things won't get into a failure state. I think I may be unfamiliar with what speculative testing actually means in this context. Mind explaining?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is just like this change, any pull requests that modify/add/remove jobs in .zuul.d will be tested by Zuul with that config applied. i.e. if we add a pull request to add centos/debian/XYZ jobs as a follow-up pull-request, those jobs will run on that pull request before it is committed, and we can iterate, debug, etc. That's what I mean by speculative testing; the proposed config changes are tested by Zuul from the pull request, without being committed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, so it's basically the same as what other CIs do... I didn't know there's a special term for this behavior.

.zuul.d/jobs.yaml Outdated Show resolved Hide resolved
@pradyunsg
Copy link
Member

To set expectations about the timeline here: I think I can come back to this in a few days after 20.3 is out.

@pfmoore
Copy link
Member

pfmoore commented Nov 20, 2020

... and to set expectations about the next steps: I think the pip developers still need a discussion about whether we even want to use zuul before we move forward with this. And getting enough of the pip devs to comment will take more time than just having any one of us be available to review this PR.

@webknjaz webknjaz requested a review from pradyunsg February 3, 2021 14:01
@webknjaz
Copy link
Member

webknjaz commented Feb 3, 2021

In the light of #9087 (comment), I think it's time to revisit this effort.

@webknjaz
Copy link
Member

webknjaz commented Apr 2, 2021

@pradyunsg do you think this effort could be resurrected anytime soon?

@webknjaz
Copy link
Member

webknjaz commented Apr 2, 2021

@ianw the Zuul logs have been cleaned up. Maybe rebase this?

@pradyunsg pradyunsg removed their request for review April 2, 2021 12:07
@pradyunsg
Copy link
Member

I think the next steps here are outlined in #7279 (comment), and I won't have time in the near future to look into this.

@github-actions github-actions bot added the needs rebase or merge PR has conflicts with current master label Aug 15, 2021
@pradyunsg
Copy link
Member

Closing this for now.

As an update, we've since consolidated pip's entire CI into GitHub Actions. I still think it's valuable to have additional CI resources but this is basically blocked on a discussion/decision, as noted in my previous comment. I hope this isn't percieved as a "no, never" but more along the lines of acknowledging that we're still a few steps away from saying "Yes" if we're going to.

Thanks all for the discussions and effort toward this so far! ^.^

@pradyunsg pradyunsg closed this Feb 25, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
needs rebase or merge PR has conflicts with current master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants