test: check package versions and downgrades #609
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: mike-nguyen. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing
One thing I think we should bear in mind as a general rule is that our team can and IMO should also add tests to the main OpenShift tests, not just our kola tests. For example I did one a while ago in openshift/origin#25993 (That one was specifically about in-cluster stuff so it needed to be there of course)
I think this one would also be a good potential candidate to live in the openshift e2e tests, because it isn't actually testing the operating system itself, just metadata about the build. It's also tied to OCP and the release image.
While RHEL currently does not revert packages unfortunately, to me the ability to do so is a key selling point of image based updates. I guess if we explicitly wanted to revert, we could just denylist this test temporarily or something?
We also had a similar debate in FCOS over here: coreos/fedora-coreos-config#841
```shell
VERSION=$(oc adm release info $PAYLOAD -o json | jq -r '.displayVersions."machine-os".Version')
OCP_COMMIT=$(curl -sSL https://art-rhcos-ci.s3.amazonaws.com/releases/rhcos-$RELEASE/$VERSION/x86_64/meta.json | jq -r '."ostree-commit"')

curl -SL https://art-rhcos-ci.s3.amazonaws.com/releases/rhcos-$RELEASE/$VERSION/x86_64/rhcos-$VERSION-ostree.x86_64.tar -o $STREAM.tar
```
This will break after we try again with #593
I think it'd be a bit cleaner to download the `meta.json` and then do `jq .images.ostree.path` on it; then we can conditionalize on whether it ends in `.ociarchive` or `.tar`, as is done in e.g.
https://github.com/coreos/coreos-assembler/blob/88efaff63b38542763336e12249b11f53d606776/src/cosalib/cmdlib.py#L251
Also, avoiding duplicating the URL would be good.
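The suggested approach could be sketched roughly like this (a hypothetical sketch, not the test's actual code: the inline `meta.json` content stands in for a single download of the real file):

```shell
# Sketch: branch on the ostree image format taken from meta.json instead of
# hardcoding the "-ostree.x86_64.tar" filename. The meta_json value here is
# a made-up example; in the real test it would come from one curl of the
# meta.json URL.
meta_json='{"images":{"ostree":{"path":"rhcos-ostree.x86_64.ociarchive"}}}'
ostree_path=$(echo "$meta_json" | jq -r '.images.ostree.path')
case "$ostree_path" in
    *.ociarchive) echo "format: ociarchive" ;;
    *.tar)        echo "format: tar" ;;
    *)            echo "unknown ostree image format: $ostree_path" >&2; exit 1 ;;
esac
```

Downloading `meta.json` once and deriving the artifact name from it also removes the duplicated URL.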
@cgwalters it looks like the version of rpm-ostree in the latest RHCOS does not have the ex-container functionality. I see that you converted the oci-archive to a tar inside cosa before and copied it to the test system [0], but this is an external test, and I don't think that's possible. Do you have any ideas to work around this?
Some ideas:
- create an ostree container to handle the conversion
- rewrite the test as a mantle kola test and do the same thing as [0]
I also saw that in the future the container may contain an HTTP server. Would anything need to change in the oci-archive, or would it just be an additional HTTP server to serve the ostree repo?
There are more alternatives:
- Add `cosa build --disallow-downgrades` and use it in the pipeline by default (and have a declarative file of allowed downgrades?)
- Add the above, but implement it in rpm-ostree (implicitly adding `allowed-downgrades: []`, i.e. an empty YAML list would disallow downgrades; then the list is an allowlist)
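As a sketch of what that declarative file might look like (the field name and format here are hypothetical, not an existing rpm-ostree or cosa schema):

```yaml
# Hypothetical manifest fragment. An empty list would disallow all
# downgrades; any listed packages would be an allowlist of known,
# intentional downgrades.
allowed-downgrades: []
```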
Either way, this is basically the same thing as coreos/fedora-coreos-config#892 right? So if we deduplicate in either coreos-assembler or rpm-ostree it would be a win (less pipeline code, more easily reproduced outside of pipelines, etc.)
I kind of like the `allowed-downgrades` in rpm-ostree natively, but it's also pretty easy to implement outside it in a cosa workdir, which is what the FCOS pipeline PR above is doing.
jlebon's comment may apply here, because it would be checking against the last build and not the last release. Maybe this test should be scrapped and changed to an upgrade test from the previous release plus a db diff, instead of just doing a db diff.
I think going the `cosa build --disallow-downgrades` + YAML path could work. There'd be overhead involved, but maybe that's OK. I do like that with coreos/fedora-coreos-config#892, it's just an easy "rubber-stamping" thing: we don't have to do anything other than force-merge the PR. In the pipeline itself, nothing cares about downgrades.
For RHCOS, that's harder to do without lockfiles because unexpected downgrades could happen anytime up to and including during the prod pipeline build. So it's much harder to make it as lightweight as in FCOS.
```shell
# 3. Get the difference between all the releases and the "from release
# indexes". This is the latest because there is no update from this
# release.
PAYLOAD=$(echo $GRAPH | jq -r '. as $graph | [$graph.edges[][0]] | unique as $from | $graph.nodes | to_entries as $indexed | [$indexed[].key] | unique as $nodes | ($nodes - $from)[] as $latest | $indexed[] | select(.key == $latest) | .value.payload')
```
This is some veritable jq sorcery.
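For anyone untangling the sorcery, here is the same jq pipeline annotated and run against a tiny made-up graph (three nodes with edges 0→1 and 1→2, so node 2 is the only one with no outgoing update edge and is therefore the latest):

```shell
# Annotated copy of the jq query above, applied to a hypothetical graph.
graph='{"nodes":[{"payload":"a"},{"payload":"b"},{"payload":"c"}],"edges":[[0,1],[1,2]]}'
echo "$graph" | jq -r '
  . as $graph
  | [$graph.edges[][0]] | unique as $from   # node indexes with an outgoing edge
  | $graph.nodes | to_entries as $indexed   # [{key: index, value: node}, ...]
  | [$indexed[].key] | unique as $nodes     # all node indexes
  | ($nodes - $from)[] as $latest           # indexes with no update *from* them
  | $indexed[] | select(.key == $latest)
  | .value.payload'                         # payload of the latest release(s)
```

With the sample graph this prints `c`, the payload of the node nothing updates away from.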
```shell
RHCOS_COMMIT=$(rpm-ostree status --json | jq -r .deployments[0].checksum)

if [[ $(rpm-ostree db diff $OCP_COMMIT $RHCOS_COMMIT | grep -A1000 Downgraded) ]]; then
```
Hm, we should add something like `db diff --downgrades` so this test could just validate that its output is empty.
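Until such a flag exists, one way to make the check tighter than `grep -A1000` is to cut out just the Downgraded section and assert it is empty. A sketch, using made-up sample `db diff` output:

```shell
# Extract only the indented package lines under "Downgraded:" from
# `rpm-ostree db diff`-style output. The diff text here is a hypothetical
# sample, not real rpm-ostree output from this build.
diff_output='Upgraded:
  foo 1.0-1 -> 1.1-1
Downgraded:
  bar 2.0-1 -> 1.9-1'
downgrades=$(echo "$diff_output" | awk '/^Downgraded:/{f=1; next} /^[A-Za-z]/{f=0} f')
if [ -n "$downgrades" ]; then
    echo "found downgrades:"
    echo "$downgrades"
fi
```

The awk flag turns on at the `Downgraded:` header and off at the next section header, so only the downgraded packages are captured.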
We definitely should have more OpenShift tests!
Do you know if we can gate on OpenShift e2e tests? Although the test doesn't test the OS itself, it checks the composition of the OS, which is also tied to OpenShift. I like it as a kola test because our team closely monitors these tests, we can gate early in the process, and it isn't as resource intensive as standing up a cluster.
We could denylist with the new snooze support from @saqibali-2k: coreos/coreos-assembler#2307
```shell
GRAPH=$(curl -sfH "Accept:application/json" "https://api.openshift.com/api/upgrades_info/v1/graph?channel=$STREAM")
if [[ $? -ne 0 ]]; then
    fatal "Unable to get graph"
fi
```
Interestingly, if you provide a non-existent channel as an arg to that URL, it will still return a result:
```shell
$ curl -sfH Accept:application/json 'https://api.openshift.com/api/upgrades_info/v1/graph?channel=foobar'
{"nodes":[],"edges":[]}
$ echo $?
0
```
So unless the infra is down, your return code is always going to be a success. Maybe you want to do some additional check that the content returned isn't empty? (edit: I see the check below now)
Also, since the script uses `set -e`, if `curl` does return a failure the whole script will exit, so you don't need to check the return code.
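Following that observation, the extra check could validate the payload rather than curl's exit code. A sketch, with the empty-graph response inlined as an assumption instead of being fetched:

```shell
# The graph endpoint returns {"nodes":[],"edges":[]} with exit code 0 for
# an unknown channel, so test the content itself. This response is a
# hard-coded example of what the endpoint returns for a bad channel.
graph='{"nodes":[],"edges":[]}'
if [ "$(echo "$graph" | jq '.nodes | length')" -eq 0 ]; then
    echo "empty graph: is the channel name valid?" >&2
fi
```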
```shell
# Verify all rhaos packages contain the same OpenShift version
test_package_versions() {
    if [[ $(rpm -qa | grep rhaos | grep -v $OPENSHIFT_VERSION) ]]; then
        fatal "Error: rhaos packages do not match OpenShift version"
    fi
}
```
Do we care if this function causes the test to exit and not do the downgrade check? Or should the test try to accumulate errors?
Accumulating errors is probably better so we catch everything, but we don't seem to do that in any of our tests.
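An error-accumulating version could look something like this (a sketch: the check names and the `false`/`true` commands are placeholders standing in for the real package-version and downgrade checks):

```shell
# Record failing checks instead of exiting on the first one, so every
# check still runs; `false` and `true` are stand-ins for real test
# functions such as test_package_versions.
failed=""
run_check() {
    local name=$1; shift
    if ! "$@"; then
        failed="$failed $name"
    fi
}
run_check package-versions false   # stand-in: a failing check
run_check downgrades true          # stand-in: a passing check
if [ -n "$failed" ]; then
    echo "failed checks:$failed"
fi
```

The test would then report all failures at the end rather than stopping before the downgrade check.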
Add two tests:
- Verify that all packages from the plashets have the correct OCP release
- Verify that no packages are downgraded from the previous OCP release
Closing in favor of #635 and coreos/coreos-assembler#2459.