test: check package versions and downgrades #609
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mike-nguyen. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing
One thing I think we should bear in mind as a general rule is that our team can and IMO should also add tests to the main OpenShift tests, not just our kola tests. For example I did one a while ago in openshift/origin#25993 (That one was specifically about in-cluster stuff so it needed to be there of course)
I think this one would also be a good potential candidate to live in the openshift e2e tests, because it isn't actually testing the operating system itself, just metadata about the build. It's also tied to OCP and the release image.
While RHEL currently does not revert packages unfortunately, to me the ability to do so is a key selling point of image based updates. I guess if we explicitly wanted to revert, we could just denylist this test temporarily or something?
We also had a similar debate in FCOS over here: coreos/fedora-coreos-config#841
```shell
VERSION=$(oc adm release info $PAYLOAD -o json | jq -r '.displayVersions."machine-os".Version')
OCP_COMMIT=$(curl -sSL https://art-rhcos-ci.s3.amazonaws.com/releases/rhcos-$RELEASE/$VERSION/x86_64/meta.json | jq -r '."ostree-commit"')

curl -SL https://art-rhcos-ci.s3.amazonaws.com/releases/rhcos-$RELEASE/$VERSION/x86_64/rhcos-$VERSION-ostree.x86_64.tar -o $STREAM.tar
```
This will break after we try again with #593
I think it'd be a bit cleaner to download the `meta.json` and then do `jq .images.ostree.path` on it; then we can conditionalize on whether it ends in `.ociarchive` or `.tar`, as is done in e.g.
https://github.com/coreos/coreos-assembler/blob/88efaff63b38542763336e12249b11f53d606776/src/cosalib/cmdlib.py#L251
Also, avoiding duplicating the URL would be good.
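The suggested approach could be sketched roughly like this (a hypothetical sketch, not the test's actual code: the inline `meta.json` content stands in for a single download of the real file):

```shell
# Sketch: branch on the ostree image format taken from meta.json instead of
# hardcoding the "-ostree.x86_64.tar" filename. The meta_json value here is
# a made-up example; in the real test it would come from one curl of the
# meta.json URL.
meta_json='{"images":{"ostree":{"path":"rhcos-ostree.x86_64.ociarchive"}}}'
ostree_path=$(echo "$meta_json" | jq -r '.images.ostree.path')
case "$ostree_path" in
    *.ociarchive) echo "format: ociarchive" ;;
    *.tar)        echo "format: tar" ;;
    *)            echo "unknown ostree image format: $ostree_path" >&2; exit 1 ;;
esac
```

Downloading `meta.json` once and deriving the artifact name from it also removes the duplicated URL.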
@cgwalters it looks like the version of rpm-ostree in the latest RHCOS does not have the ex-container functionality. I see that you converted the oci-archive to a tar inside cosa before and copied it to the test system [0], but this is an external test, and I don't think that's possible. Do you have any ideas to work around this?
Some ideas:
- create an ostree container to handle the conversion
- rewrite the test as a mantle kola test and do the same thing as [0]
I also saw that in the future the container may contain an HTTP server. Would anything need to change in the oci-archive, or would it just be an additional HTTP server to serve the ostree repo?
There are more alternatives:
- Add `cosa build --disallow-downgrades` and use it in the pipeline by default (and have a declarative file of allowed downgrades?)
- Add the above, but implement it in rpm-ostree (implicitly adding `allowed-downgrades: []`, i.e. an empty YAML list would disallow downgrades; then the list is an allowlist)
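As a sketch of what that declarative file might look like (the field name and format here are hypothetical, not an existing rpm-ostree or cosa schema):

```yaml
# Hypothetical manifest fragment. An empty list would disallow all
# downgrades; any listed packages would be an allowlist of known,
# intentional downgrades.
allowed-downgrades: []
```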
Either way, this is basically the same thing as coreos/fedora-coreos-config#892 right? So if we deduplicate in either coreos-assembler or rpm-ostree it would be a win (less pipeline code, more easily reproduced outside of pipelines, etc.)
I kind of like the `allowed-downgrades` in rpm-ostree natively, but it's also pretty easy to implement outside it in a cosa workdir, which is what the FCOS pipeline PR above is doing.
jlebon's comment may apply here, because it would be checking against the last build and not the last release. Maybe this test should be scrapped and changed to an upgrade test from the previous release plus a db diff, instead of just doing a db diff.
I think going the `cosa build --disallow-downgrades` + YAML path could work. There'd be overhead involved, but maybe that's OK. I do like that with coreos/fedora-coreos-config#892, it's just an easy "rubber-stamping" thing: we don't have to do anything other than force-merge the PR. In the pipeline itself, nothing cares about downgrades.
For RHCOS, that's harder to do without lockfiles because unexpected downgrades could happen anytime up to and including during the prod pipeline build. So it's much harder to make it as lightweight as in FCOS.
```shell
# 3. Get the difference between all the releases and the "from release
# indexes". This is the latest because there is no update from this
# release.
PAYLOAD=$(echo $GRAPH | jq -r '. as $graph | [$graph.edges[][0]] | unique as $from | $graph.nodes | to_entries as $indexed | [$indexed[].key] | unique as $nodes | ($nodes - $from)[] as $latest | $indexed[] | select(.key == $latest) | .value.payload')
```
This is some veritable jq sorcery.
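For anyone untangling the sorcery, here is the same jq pipeline annotated and run against a tiny made-up graph (three nodes with edges 0→1 and 1→2, so node 2 is the only one with no outgoing update edge and is therefore the latest):

```shell
# Annotated copy of the jq query above, applied to a hypothetical graph.
graph='{"nodes":[{"payload":"a"},{"payload":"b"},{"payload":"c"}],"edges":[[0,1],[1,2]]}'
echo "$graph" | jq -r '
  . as $graph
  | [$graph.edges[][0]] | unique as $from   # node indexes with an outgoing edge
  | $graph.nodes | to_entries as $indexed   # [{key: index, value: node}, ...]
  | [$indexed[].key] | unique as $nodes     # all node indexes
  | ($nodes - $from)[] as $latest           # indexes with no update *from* them
  | $indexed[] | select(.key == $latest)
  | .value.payload'                         # payload of the latest release(s)
```

With the sample graph this prints `c`, the payload of the node nothing updates away from.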
```shell
RHCOS_COMMIT=$(rpm-ostree status --json | jq -r .deployments[0].checksum)

if [[ $(rpm-ostree db diff $OCP_COMMIT $RHCOS_COMMIT | grep -A1000 Downgraded) ]]; then
```
Hm, we should add something like `db diff --downgrades` so this test could just validate that its output is empty.
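Until such a flag exists, one way to make the check tighter than `grep -A1000` is to cut out just the Downgraded section and assert it is empty. A sketch, using made-up sample `db diff` output:

```shell
# Extract only the indented package lines under "Downgraded:" from
# `rpm-ostree db diff`-style output. The diff text here is a hypothetical
# sample, not real rpm-ostree output from this build.
diff_output='Upgraded:
  foo 1.0-1 -> 1.1-1
Downgraded:
  bar 2.0-1 -> 1.9-1'
downgrades=$(echo "$diff_output" | awk '/^Downgraded:/{f=1; next} /^[A-Za-z]/{f=0} f')
if [ -n "$downgrades" ]; then
    echo "found downgrades:"
    echo "$downgrades"
fi
```

The awk flag turns on at the `Downgraded:` header and off at the next section header, so only the downgraded packages are captured.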
We definitely should have more OpenShift tests!
Do you know if we can gate on OpenShift e2e tests? Although the test doesn't test the OS itself, it checks the composition of the OS, which is also tied to OpenShift. I like it as a kola test because our team closely monitors these tests, we can gate early in the process, and it isn't as resource intensive as standing up a cluster.
We could denylist with the new snooze support from @saqibali-2k: coreos/coreos-assembler#2307
```shell
GRAPH=$(curl -sfH "Accept:application/json" "https://api.openshift.com/api/upgrades_info/v1/graph?channel=$STREAM")
if [[ $? -ne 0 ]]; then
    fatal "Unable to get graph"
fi
```
Interestingly, if you provide a non-existent channel as an arg to that URL, it will still return a result:
```shell
$ curl -sfH Accept:application/json 'https://api.openshift.com/api/upgrades_info/v1/graph?channel=foobar'
{"nodes":[],"edges":[]}
$ echo $?
0
```
So unless the infra is down, your return code is always going to be a success. Maybe you want to do some additional check that the content returned isn't empty? (edit: I see the check below now)
Also, since the script uses `set -e`, if `curl` does return a failure the whole script will exit, so you don't need to check the return code.
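Following that observation, the extra check could validate the payload rather than curl's exit code. A sketch, with the empty-graph response inlined as an assumption instead of being fetched:

```shell
# The graph endpoint returns {"nodes":[],"edges":[]} with exit code 0 for
# an unknown channel, so test the content itself. This response is a
# hard-coded example of what the endpoint returns for a bad channel.
graph='{"nodes":[],"edges":[]}'
if [ "$(echo "$graph" | jq '.nodes | length')" -eq 0 ]; then
    echo "empty graph: is the channel name valid?" >&2
fi
```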
```shell
# Verify all rhaos packages contain the same OpenShift version
test_package_versions() {
    if [[ $(rpm -qa | grep rhaos | grep -v $OPENSHIFT_VERSION) ]]; then
        fatal "Error: rhaos packages do not match OpenShift version"
    fi
}
```
Do we care if this function causes the test to exit and not do the downgrade check? Or should the test try to accumulate errors?
Accumulating errors is probably better so we catch everything, but we don't seem to do that in any of our tests.
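An error-accumulating version could look something like this (a sketch: the check names and the `false`/`true` commands are placeholders standing in for the real package-version and downgrade checks):

```shell
# Record failing checks instead of exiting on the first one, so every
# check still runs; `false` and `true` are stand-ins for real test
# functions such as test_package_versions.
failed=""
run_check() {
    local name=$1; shift
    if ! "$@"; then
        failed="$failed $name"
    fi
}
run_check package-versions false   # stand-in: a failing check
run_check downgrades true          # stand-in: a passing check
if [ -n "$failed" ]; then
    echo "failed checks:$failed"
fi
```

The test would then report all failures at the end rather than stopping before the downgrade check.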
Add two tests:
- Verify that all packages from the plashets have the correct OCP release
- Verify that no packages are downgraded from the previous OCP release
Closing in favor of #635 and coreos/coreos-assembler#2459.