Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: check package versions and downgrades #609

Closed
wants to merge 1 commit into from

Conversation

mike-nguyen
Copy link
Member

Add two tests:

  • Verify that all packages from the plashets have the correct OCP
    release
  • Verify that no packages are downgraded from the previous OCP release

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 25, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mike-nguyen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 25, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 25, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mike-nguyen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I think we should bear in mind as a general rule is that our team can and IMO should also add tests to the main OpenShift tests, not just our kola tests. For example I did one a while ago in openshift/origin#25993 (That one was specifically about in-cluster stuff so it needed to be there of course)

I think this one would also be a good potential candidate to live in the openshift e2e tests, because it isn't actually testing the operating system itself, just metadata about the build. It's also tied to OCP and the release image.

While RHEL currently does not revert packages unfortunately, to me the ability to do so is a key selling point of image based updates. I guess if we explicitly wanted to revert, we could just denylist this test temporarily or something?

We also had a similar debate in FCOS over here: coreos/fedora-coreos-config#841

VERSION=$(oc adm release info $PAYLOAD -o json | jq -r '.displayVersions."machine-os".Version')
OCP_COMMIT=$(curl -sSL https://art-rhcos-ci.s3.amazonaws.com/releases/rhcos-$RELEASE/$VERSION/x86_64/meta.json | jq -r '."ostree-commit"')

curl -SL https://art-rhcos-ci.s3.amazonaws.com/releases/rhcos-$RELEASE/$VERSION/x86_64/rhcos-$VERSION-ostree.x86_64.tar -o $STREAM.tar
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break after we try again with #593

I think it'd be a bit cleaner to download the meta.json and then do jq .images.ostree.path on it, then we can conditionalize on whether it ends in .ociarchive or .tar as is done in e.g.
https://github.com/coreos/coreos-assembler/blob/88efaff63b38542763336e12249b11f53d606776/src/cosalib/cmdlib.py#L251

Also avoid duplicating the URL would be good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cgwalters it looks like the version of rpm-ostree in the latest rhcos does not have the ex-container functionality. I see that you converted the oci-archive to tar inside cosa before and copied it to the test system [0] but this is an external test, and I don't think that's possible. Do you have any ideas to work around this?

Some ideas:

  • create a ostree container to handle the conversion
  • rewrite test as a mantle kola test and do the same thing as [0]

I also saw that in the future the container may contain a http server. Would anything need to change in the oci-archive or would it just ab an additional http server to serve the ostree repo?

[0] https://github.com/coreos/coreos-assembler/blob/350c1d93bc3343c8f523d642664bccd8dc0424c9/mantle/kola/tests/rhcos/upgrade.go#L95

Copy link
Member

@cgwalters cgwalters Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's more alternatives:

  • Add cosa build --disallow-downgrades and use it in the pipeline by default (and have a declarative file of allowed downgrades?)
  • Add the above, but implement in rpm-ostree (implicity adding allowed-downgrades: [] i.e. empty YAML list would disallow downgrades, then the list is an allowlist)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, this is basically the same thing as coreos/fedora-coreos-config#892 right? So if we deduplicate in either coreos-assembler or rpm-ostree it would be a win (less pipeline code, more easily reproduced outside of pipelines, etc.)

I kind of like the allowed-downgrades in rpm-ostree natively but it's also pretty easy to implement outside it in a cosa workdir which is what the FCOS pipeline PR above is doing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jlebon's comment may apply here because it would be checking from the last build and not the last release. Maybe this test should be scrapped and changed to an upgrade test from the previous releases and db diff instead of just doing a db diff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think going the cosa build --disallow-downgrades + YAML path could work. There'd be overhead involved, but maybe that's OK. I do like that with coreos/fedora-coreos-config#892, it's just an easy "rubber-stamping" thing: we don't have to do anything else than just force merge the PR. In the pipeline itself, nothing cares about downgrades.

For RHCOS, that's harder to do without lockfiles because unexpected downgrades could happen anytime up to and including during the prod pipeline build. So it's much harder to make it as lightweight as in FCOS.

# 3. Get the difference between all the releases and the "from release
# indexes". This is the latest because there is no update from this
# release.
PAYLOAD=$(echo $GRAPH | jq -r '. as $graph | [$graph.edges[][0]] | unique as $from | $graph.nodes | to_entries as $indexed | [$indexed[].key] | unique as $nodes | ($nodes - $from)[] as $latest | $indexed[] | select(.key == $latest) | .value.payload')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is some veritable jq sorcery.


RHCOS_COMMIT=$(rpm-ostree status --json | jq -r .deployments[0].checksum)

if [[ $(rpm-ostree db diff $OCP_COMMIT $RHCOS_COMMIT | grep -A1000 Downgraded) ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, we should add something like db diff --downgrades or something so this test could just validate it's empty.

@mike-nguyen
Copy link
Member Author

@cgwalters

One thing I think we should bear in mind as a general rule is that our team can and IMO should also add tests to the main OpenShift tests, not just our kola tests. For example I did one a while ago in openshift/origin#25993 (That one was specifically about in-cluster stuff so it needed to be there of course)

We definitely should have more OpenShift tests!

I think this one would also be a good potential candidate to live in the openshift e2e tests, because it isn't actually testing the operating system itself, just metadata about the build. It's also tied to OCP and the release image.

Do you know if we can gate on OpenShift e2e tests? Although the test doesn't test the OS itself it checks the composition of the OS which is also tied to OpenShift. I like it as a kola test because because our team closely monitors these tests, we can gate early in the process, and it isn't resource intensive as standing up a cluster.

While RHEL currently does not revert packages unfortunately, to me the ability to do so is a key selling point of image based updates. I guess if we explicitly wanted to revert, we could just denylist this test temporarily or something?

We also had a similar debate in FCOS over here: coreos/fedora-coreos-config#841

We could denylist with the new snooze support from @saqibali-2k coreos/coreos-assembler#2307

Comment on lines +27 to +30
GRAPH=$(curl -sfH "Accept:application/json" "https://api.openshift.com/api/upgrades_info/v1/graph?channel=$STREAM")
if [[ $? -ne 0 ]]; then
fatal "Unable to get graph"
fi
Copy link
Member

@miabbott miabbott Aug 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, if you provide a non-existent channel as an arg to that URL, it will still return a result:

$ curl -sfH Accept:application/json 'https://api.openshift.com/api/upgrades_info/v1/graph?channel=foobar'
{"nodes":[],"edges":[]}
$ echo $?
0

So unless the infra is down, your return code is always going to be a success. Maybe you want to do some additional check that the content returned isn't empty? (edit: I see the check below now)

Also, since the script uses set -e, if curl does return a failure the whole script will exit, so you don't need to check the return code.

# Verify all rhaos packages contain the same OpenShift version
test_package_versions() {
if [[ $(rpm -qa | grep rhaos | grep -v $OPENSHIFT_VERSION) ]]; then
fatal "Error: rhaos packages do not match OpenShift version"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care if this function causes the test to exit and not do the downgrade check? Or should the test try to accumulate errors?

Copy link
Member Author

@mike-nguyen mike-nguyen Aug 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accumulating errors is probably better so we catch everything but we don't seem to do it on any of our tests

Add two tests:
- Verify that all packages from the plashets have the correct OCP
  release
- Verify that no packages are downgraded from the previous OCP release
@mike-nguyen
Copy link
Member Author

closing in favor of #635 and coreos/coreos-assembler#2459

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants