Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Render osImageURL, handle "bootstrap" case in MCD #324

Closed
wants to merge 10 commits into from

Conversation

cgwalters
Copy link
Member

This is option 2 for #273

Still a WIP, it builds but haven't tested updates with it.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 18, 2019
@openshift-ci-robot openshift-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 18, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 18, 2019
@@ -91,21 +91,16 @@ func generateMachineConfigs(config *RenderConfig, templateDir string) ([]*mcfgv1
return cfgs, nil
}

// GenerateMachineConfigsForRole is part of generateMachineConfigs; it operates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding the documentation!

@cgwalters
Copy link
Member Author

/test e2e-aws-op

1 similar comment
@cgwalters
Copy link
Member Author

/test e2e-aws-op

@ashcrow
Copy link
Member

ashcrow commented Jan 18, 2019

Terraform/AWS flake and

level=fatal msg="waiting for openshift-console URL: context deadline exceeded"

/retest

@cgwalters cgwalters force-pushed the cvo-integration-4 branch 3 times, most recently from fb3a622 to fbd06d1 Compare January 18, 2019 22:37
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 18, 2019
@cgwalters
Copy link
Member Author

OK rolled in #329 to this and also added a hacky patch to give us some debugging data when our e2e fails.

@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 18, 2019
@cgwalters
Copy link
Member Author

/retest

I think we have this in CI but we're not noticing it.
If it's happening we need to fix it.

Ref: openshift#301
Since something is doing that, and I want to know if it's somehow
succeeding.
Have the MCC take `osImageURL` as provided by the cluster update/release payload
and generate a `00-{master,worker}-osimageurl` MC from it, which ensures
the MCD will update the node to it.

However, we need special handling for the *initial* case where we boot
into a target config, but we may be using an old OS image.  Currently
the MCD would treat this as "config drift" and go degraded.

Today we write the node annotations to a file in `/etc` as part of the
rendered Ignition.  Use that as a "bootstrap may be required" flag,
and handle it specially - if we need to pivot, do *just* that and
reboot.

We also clean things up by unlinking that node annotation file; after
that, if the `osImageURL` drifts from the expected config, we'll go
degraded, just like if someone modified a file.

Closes: openshift#183
I'd like to know how long the parts of the initial sync take.
The MCC is what's going to start churning the MachineConfigs.
And also after this we should have the pool pause until
all nodes are out of Bootstrap.

This should also help with races.
This is an ugly hack, better to do the config first with
osimageurl, but I don't quite understand the bootstrap code yet.
@ashcrow
Copy link
Member

ashcrow commented Jan 21, 2019

/retest

@ashcrow
Copy link
Member

ashcrow commented Jan 22, 2019

Errors trying to read the e2e log file

/retest

@ashcrow
Copy link
Member

ashcrow commented Jan 22, 2019

ISE trying to see logs

/retest

@ashcrow
Copy link
Member

ashcrow commented Jan 22, 2019

http2: server sent GOAWAY and closed the connection; LastStreamID=11, ErrCode=NO_ERROR, debug=""
error: watch closed before Until timeout

/retest

@openshift-ci-robot
Copy link
Contributor

@cgwalters: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws-op 85c1e17 link /test e2e-aws-op
ci/prow/e2e-aws 85c1e17 link /test e2e-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci-robot
Copy link
Contributor

@cgwalters: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 25, 2019
@cgwalters
Copy link
Member Author

OK so here's the problem I'm struggling with...the "bootstrap MC" and the "inital rendered MC" need to be the same. If we land the installer change without this PR, or vice versa - things will break because booted nodes won't be able to find the MC they expect.

That was the core problem that brought this PR to a halt.

We need to figure out how to "ratchet" the PRs to both the installer and here.

What I'm vaguely thinking about is that we somehow detect whether or not the installer set up the osimageurl. Maybe stuff something in the install-config-v1 configmap? If it didn't, then we also don't render in the non-bootstrap path?

@cgwalters
Copy link
Member Author

Splitting this PR into:

#362
#363

@cgwalters cgwalters closed this Feb 1, 2019
osherdp pushed a commit to osherdp/machine-config-operator that referenced this pull request Apr 13, 2021
…le-baseimages-to-mach-ocp-build-data-config

Bug 1878163: Updating Dockerfile baseimages to mach ocp-build-data config
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants