rhcos: Embed full build metadata in binary #1423

Merged 1 commit on Apr 4, 2019

cgwalters
Member

This way we avoid needing a service available at least for AWS
installs. The AMIs now get hardcoded into the binary.

Further, in order to make it easier for developers to test
alternative builds of RHCOS, add an override environment variable
just like is available for the release payload.

@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 15, 2019
@cgwalters
Member Author

Some discussion in the previous PR #1402 (comment)

pkg/rhcos/builds.go (outdated, resolved)
@abhinavdahiya abhinavdahiya requested review from wking and removed request for tomassedovic and steveej March 15, 2019 22:53
@cgwalters cgwalters force-pushed the rhcos-pin-meta branch 2 times, most recently from 73b02bd to 3756253 Compare March 16, 2019 12:56
data/data/rhcos.json (outdated, resolved)
pkg/rhcos/ami.go (outdated, resolved)
```go
} else {
	build = rhcosBuildID
}
url := fmt.Sprintf("%s/%s/%s/meta.json", baseURL, rhcosBuildChannel, build)
```
Member

I'd rather just stick with our existing OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE for overrides. Coupled with the bundled metadata for the pinned RHCOS that this PR is adding, that would mean we could drop this JSON request entirely. Different override approaches have been floated before in #1168 and #1402 though, so there's clearly some workflow where "supply the override AMI (or QCOW2 image URI, etc.)" is not seen as sufficient. Can you point me at that place, so I can float an approach that uses OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE?

Member Author

I think anyone who wants to test something related to RHCOS will find it far easier to supply a version number rather than parsing the version number and getting the AMI or extracting the qcow2 manually. I can very much say this is the case for me.

Further, I expect us to at some point have CI that overrides this. "Test this release payload with updated rhcos bootimage" is a valid thing to want to do for the same reason we want to override release payloads.

Member Author

Really we need both. The existing image override is useful to test an unofficial bootimage.

In fact just now I used both - tested with the latest rhcos from our buildsystem, hit an issue, did a local build with a test fix, then used the local override to point to file:///.
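Pointing the image override at a local build via `file:///` implies the override value is parsed as a URI and dispatched on its scheme. The sketch below shows one way to do that with Go's standard `net/url`; `imageSource` and its scheme handling are hypothetical illustrations, not the installer's actual override code.

```go
package main

import (
	"fmt"
	"net/url"
)

// imageSource classifies an image override URI (hypothetical helper):
// file URIs name a locally built image, http(s) URIs must be downloaded,
// and anything else is rejected.
func imageSource(raw string) (kind, location string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	switch u.Scheme {
	case "file":
		// For file:///path the host is empty and Path holds the local path.
		return "local", u.Path, nil
	case "http", "https":
		return "remote", u.String(), nil
	default:
		return "", "", fmt.Errorf("unsupported image URI scheme %q in %q", u.Scheme, raw)
	}
}

func main() {
	for _, raw := range []string{
		"file:///home/user/rhcos-qemu.qcow2",
		"https://example.com/rhcos/rhcos-qemu.qcow2.gz",
	} {
		kind, loc, err := imageSource(raw)
		fmt.Println(kind, loc, err)
	}
}
```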

Member

> I think anyone who wants to test something related to RHCOS will find it far easier to supply a version number rather than parsing the version number and getting the AMI or extracting the qcow2 manually.

How many people need this, though?

> Further, I expect us to at some point have CI that overrides this. "Test this release payload with updated rhcos bootimage" is a valid thing to want to do for the same reason we want to override release payloads.

Sure, but my preferred approach for this is "override the release image" (via #1286).

> Really we need both. The existing image override is useful to test an unofficial bootimage.

It can be used for testing official boot images too, right? The only issue is whether this lookup needs to be built into the installer, or if you can use an external tool like this script.

Member Author

> Sure, but my preferred approach for this is "override the release image" (via #1286).

You know I agree 💯% with that, particularly because if, say, one is overriding the image to test a kernel change (e.g. to a NIC driver for a new cloud platform), it will probably surprise most people that the change gets quickly overwritten by machine-os-content, which you need a custom release payload to change. (And we need to make changing both ergonomic, but that's another issue.)

But my real argument here is:

> The only issue is whether this lookup needs to be built into the installer

The code's already there though! It's just missing a few config knobs.

> How many people need this, though?

It's not just the number of people; it's also that I don't want lots of places parsing that release schema. While we're unlikely to change it soon, again, the installer already has the code, so let's just make it more useful.

Member

> The code's already there though! It's just missing a few config knobs.

But this is our chance to get rid of it, and "more knobs" feels like doubling down on a known dead-end approach :p.

Member Author

> But this is our chance to get rid of it, and "more knobs" feels like doubling down on a known dead-end approach :p.

I don't think we're going to deploy the dependencies necessary in #1286 anytime soon.

Also, this code would be useful for us to deal with the Ignition 3 change.

pkg/rhcos/qemu.go (outdated, resolved)
pkg/rhcos/builds.go (outdated, resolved)
@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 20, 2019
@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 20, 2019
@cgwalters
Member Author

(Sorry, had a force push accident here...reset now)

@cgwalters cgwalters force-pushed the rhcos-pin-meta branch 2 times, most recently from e245bc9 to 4cc5fdd Compare March 22, 2019 21:18
@cgwalters
Member Author

OK updated now 🆕 to address comments.

@cgwalters
Member Author

/retest

@ashcrow
Member

ashcrow commented Apr 2, 2019

/retest

```go
// rhcosBuildChannel is a name for a stream of builds
var rhcosBuildChannel string

func fetchRHCOSBuild(ctx context.Context) (metadata, error) {
```
Contributor

This needs to be rewired, since the overrides were dropped.

Member Author

I kept most of the code to handle multiple builds and just dropped the override environment variables. What do you want exactly? Delete everything except support for reading the pinned JSON?

Member

> Delete everything except support for reading the pinned JSON?

We still need a way to override for CI, but that override does not need any build fetching, so this signature looks fine to me.

@wking
Member

wking commented Apr 3, 2019

I've floated a fixup in 009285da2df880, adding support to the script for earlier Pythons, like my RHEL 7.5 CSB's 3.4.9. I'm also including the base URI from which the metadata was sourced, which can be used later in Go for constructing the bootimage URIs (libvirt QCOW2, etc.), so we don't have to code the likely source into the Go. I'll go back later tonight and adjust the Go to match. Thoughts?
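Recording the base URI alongside the metadata lets the Go side derive bootimage locations from relative file names. A minimal sketch using the standard library's `net/url`, assuming a hypothetical base URI and image file name (`joinBaseURI` is an illustrative name, not installer code). Note that `ResolveReference` follows RFC 3986, so a base without a trailing slash silently drops its last path segment; the sketch normalizes that.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// joinBaseURI resolves a relative image path against the base URI the
// metadata was sourced from. A trailing slash is enforced on the base so
// RFC 3986 resolution keeps the build directory instead of replacing it.
func joinBaseURI(base, rel string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	if !strings.HasSuffix(b.Path, "/") {
		b.Path += "/"
	}
	r, err := url.Parse(rel)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(r).String(), nil
}

func main() {
	base := "https://example.com/rhcos/builds/example-build" // hypothetical
	u, err := joinBaseURI(base, "rhcos-qemu.qcow2.gz")
	fmt.Println(u, err)
}
```

Without the slash normalization, resolving against the same base would yield `https://example.com/rhcos/builds/rhcos-qemu.qcow2.gz`, which is the kind of joining mistake that is easy to miss.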

@wking
Member

wking commented Apr 3, 2019

Ok, I've floated cabafdc54 as well, which adds BaseURI to the Go metadata structure and drops the HTTP fetching, so we're left with just the local asset and OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE. Thoughts?

@wking
Member

wking commented Apr 3, 2019

And now I've floated 14cfc9965 which:

@ashcrow
Member

ashcrow commented Apr 3, 2019

/retest

@abhinavdahiya
Contributor

> And now I've floated 14cfc99 which:

Please no. This is too much Python with too many options for what I'm comfortable supporting in the installer repo.

> • Restores the regions commented out by bd88157 (hack/build: Pin to RHCOS 400.7.20190306.0 #1407), and instead answers "which regions have AMIs?" by looking at the local rhcos.json asset. This removes the need to manually sync that region list as RHCOS grows (or removes) AMIs from different regions.

This needs to be a follow up.

@wking
Member

wking commented Apr 3, 2019

> • Restores the regions commented out by bd88157 (hack/build: Pin to RHCOS 400.7.20190306.0 #1407), and instead answers "which regions have AMIs?" by looking at the local rhcos.json asset. This removes the need to manually sync that region list as RHCOS grows (or removes) AMIs from different regions.

> This needs to be a follow up.

Dropped from my squash commit with 14cfc9965 -> b1daba1cc.

I've left my expanded Python script alone for now, but am fine trimming it back down if we decide to only focus on updating rhcos.json.

@ashcrow
Member

ashcrow commented Apr 3, 2019

/retest

@wking
Member

wking commented Apr 3, 2019

New squash, trimming the Python back down to just updating rhcos.json: 551acf1da

@wking wking changed the title rhcos: Embed full build metadata in binary, add override envvar rhcos: Embed full build metadata in binary Apr 3, 2019
@wking
Member

wking commented Apr 3, 2019

Updated the PR topic now that there are no longer new env vars; probably update the commit message too when squashing.

@wking
Member

wking commented Apr 3, 2019

Pushed 551acf1da -> e726a2b6c, fixing a bug in BaseURI joining. I think the squash commit is ready to go. @cgwalters, did you want to pull it into your branch?

This way we avoid needing a service available at least for AWS
installs.  The AMIs now get hardcoded into the binary.

`hack/update-rhcos-bootimage.py` is a small script which accepts
a URL to build metadata and updates our cached version with just
the subset of keys we care about (or potentially care about); e.g.
not the pkgdiff.

From Trevor:

Drop the HTTP stuff in favor of just the local asset or
`OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE`.

The codecs business works around the lack of byte-stream support in
json.load before Python 3.6 [1].

[1]: https://docs.python.org/3/library/json.html#json.load

Co-authored-by: Trevor King <[email protected]>
@cgwalters
Member Author

> @cgwalters, did you want to pull it into your branch?

Done, thanks!

@wking
Member

wking commented Apr 3, 2019

e2e-aws:

error: could not run steps: some steps failed:
  * could not update output imagestreamtag: Operation cannot be fulfilled on imagestreamtags.image.openshift.io "stable": the object has been modified; please apply your changes to the latest version and try again
  * could not update output imagestreamtag: Operation cannot be fulfilled on imagestreamtags.image.openshift.io "stable": the object has been modified; please apply your changes to the latest version and try again

/retest

@wking
Member

wking commented Apr 3, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 3, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, wking

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 3, 2019
@wking
Member

wking commented Apr 4, 2019

e2e-aws:

Flaky tests:

[Conformance][templates] templateinstance impersonation tests should pass impersonation creation tests [Suite:openshift/conformance/parallel/minimal]
[Feature:Builds] build have source revision metadata  started build should contain source revision information [Suite:openshift/conformance/parallel]
[Feature:Builds][Conformance] oc new-app  should fail with a --name longer than 58 characters [Suite:openshift/conformance/parallel/minimal]
[Feature:ImageLayers][registry] Image layer subresource should return layers from tagged images [Suite:openshift/conformance/parallel]
[Feature:Platform][Smoke] Managed cluster should start all core operators [Suite:openshift/conformance/parallel]
[k8s.io] Probing container should be restarted with a exec "cat /tmp/health" liveness probe [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]

Failing tests:

[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]

/retest

@wking
Member

wking commented Apr 4, 2019

e2e-aws:

Flaky tests:

[Feature:DeploymentConfig] deploymentconfigs with minimum ready seconds set [Conformance] should not transition the deployment to Complete before satisfied [Suite:openshift/conformance/parallel/minimal]
[Feature:DeploymentConfig] deploymentconfigs with multiple image change triggers [Conformance] should run a successful deployment with multiple triggers [Suite:openshift/conformance/parallel/minimal]
[Feature:Platform][Smoke] Managed cluster should start all core operators [Suite:openshift/conformance/parallel]
[sig-cli] Kubectl client [k8s.io] Kubectl copy should copy a file from a running Pod [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-cli] Kubectl client [k8s.io] Simple pod should support port-forward [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-scheduling] ResourceQuota [Feature:PodPriority] should verify ResourceQuota's priority class scope (quota set to pod count: 1) against a pod with different priority class (ScopeSelectorOpNotIn). [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] EmptyDir volumes volume on tmpfs should have the correct mode [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Pre-provisioned PV (default fs)] subPath should support existing directories when readOnly specified in the volumeSource [Suite:openshift/conformance/parallel] [Suite:k8s]

Failing tests:

[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]
[sig-storage] In-tree Volumes [Driver: cinder] [Testpattern: Pre-provisioned PV (default fs)] subPath should support existing single file [Suite:openshift/conformance/parallel] [Suite:k8s]

Crashlooping (etcd certs) is being tracked in rhbz#1694169 with a patch in coreos/kubecsr#25. Cinder failure seems to have been a one-off flake:

deck-build-log

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit f04777c into openshift:master Apr 4, 2019
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
7 participants