Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ROX-26276: update DNS after cluster reassignment #2064

Merged
merged 15 commits into from
Nov 4, 2024

Conversation

johannes94
Copy link
Contributor

@johannes94 johannes94 commented Oct 8, 2024

Description

  • Change FM to send UPSERT type requests to Route53, this way we can Update and Create CNAME records with the same operation
  • Change Admin Endpoint for assign-cluster to reset route creation DB fields
  • Change Admin Endpoint for assign-cluster to reset tenant to "provisioning" which will trigger resending of routes from FS -> Update in DB -> Upsert operation through cname_mgr -> Updated CNAME record after migration
  • Added an additional DB field EnteredProvisioning to work around timeout issues, since CreatedAt is not a valid reference time for provisioning after reassignment. Adjusted preparing and provisioning operations to set and act according to the new field if present.
  • Depends on ROX-26276: replace dinosaurs with centrals for cname components #2056 and ROX-26275: cluster reassignment admin API #2053

Checklist (Definition of Done)

  • Unit and integration tests added
  • Added test description under Test manual
  • Documentation added if necessary (i.e. changes to dev setup, test execution, ...)
  • CI and all relevant tests are passing
  • Add the ticket number to the PR title if available, i.e. ROX-12345: ...
  • Discussed security and business related topics privately. Will move any security and business related topics that arise to private communication channel.
  • Add secret to app-interface Vault or Secrets Manager if necessary
  • RDS changes were e2e tested manually
  • Check AWS limits are reasonable for changes provisioning new resources
  • (If applicable) Changes to the dp-terraform Helm values have been reflected in the addon on integration environment

Test manual

  • Adjusted integration tests to test DNS reset on AssignCentralCluster Admin API as well
  • Tested on an infractl OSD as well:
    • Setup infra cluster
    • make deploy/bootstrap deploy/dev
    • Set RHACS_CLUSTER_MIGRATION=true on FM
    • Set --enable-central-external-certificate=true on FM
    • Create a central
    • make sure DNS records are properly created
    • connect to the FM database to inspect routes_creation_id
    • Call the AssignCluster admin API endpoint (reassign to current cluster)
    • See in DB that all routes field are reseted and tenant is back to "provisioning" state
    • Verify tenant reaches ready state again, view that a new ID is stored in "routes_creation_id"
    • Verify that the change in AWS route53 has the correct timestamp (when new routes should be created)
    • Make sure to delete the tenant so that AWS resources are cleaned up

@openshift-ci openshift-ci bot added the approved label Oct 8, 2024
@johannes94 johannes94 changed the title ROX-26276: NO REVIEW, TESTING CI update DNS after cluster reassignment ROX-26276: NO REVIEW, Need to rebase update DNS after cluster reassignment Oct 9, 2024
@johannes94 johannes94 force-pushed the jmalsam/ROX-26276-cluster-assignment-dns branch from 678895b to b750f06 Compare October 14, 2024 13:37
@johannes94 johannes94 changed the title ROX-26276: NO REVIEW, Need to rebase update DNS after cluster reassignment ROX-26276: update DNS after cluster reassignment Oct 14, 2024
@johannes94 johannes94 force-pushed the jmalsam/ROX-26276-cluster-assignment-dns branch from b750f06 to a2d7c73 Compare October 14, 2024 13:57
@johannes94 johannes94 requested a review from ebensh October 14, 2024 13:58
Copy link
Collaborator

@ebensh ebensh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change Admin Endpoint for assign-cluster to reset tenant to "provisioning" which will trigger resending of routes from FS -> Update in DB -> Upsert operation through cname_mgr -> Updated CNAME record after migration

This is the one that makes me nervous. It's adding complexity to the state flow that other pieces may not be expecting - i.e. in the future, whenever we're reasoning about Provisioning and Ready, it's no longer a one-way transition.

@johannes94
Copy link
Contributor Author

Change Admin Endpoint for assign-cluster to reset tenant to "provisioning" which will trigger resending of routes from FS -> Update in DB -> Upsert operation through cname_mgr -> Updated CNAME record after migration

This is the one that makes me nervous. It's adding complexity to the state flow that other pieces may not be expecting - i.e. in the future, whenever we're reasoning about Provisioning and Ready, it's no longer a one-way transition.

I didn't like that part as well. The alternative I thought of was to have FS always report the routes as opposed to only reporting routes if the tenant is not in ready state yet.

That would essentially make the state flow one-way but bloat every state communications request from FS to FM with additional data that is not necessary in most cases (if routes already created).

A third idea would be to adjust the FM <- FS polling API so that we additional to the state report the value of RoutesCreated boolean for a tenant and act based on that.

What approach would you choose or do you have a different idea?

@johannes94
Copy link
Contributor Author

/retest

@johannes94 johannes94 force-pushed the jmalsam/ROX-26276-cluster-assignment-dns branch from b7312af to ebb9556 Compare October 25, 2024 12:39
@johannes94
Copy link
Contributor Author

/retest
cluster create failed

@johannes94
Copy link
Contributor Author

/retest
cluster create failure

@ebensh
Copy link
Collaborator

ebensh commented Oct 30, 2024

LGTM, thanks for the changes!

@openshift-ci openshift-ci bot added the lgtm label Oct 30, 2024
Copy link
Contributor

openshift-ci bot commented Oct 30, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ebensh, johannes94

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@johannes94
Copy link
Contributor Author

/retest

1 similar comment
@johannes94
Copy link
Contributor Author

/retest

@johannes94 johannes94 merged commit c12b5f3 into main Nov 4, 2024
15 checks passed
@johannes94 johannes94 deleted the jmalsam/ROX-26276-cluster-assignment-dns branch November 4, 2024 07:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants