Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: git push force to github on second retry #1313

Conversation

harishv7
Copy link
Contributor

@harishv7 harishv7 commented Apr 16, 2024

Problem

Some incidences of a divergence between EFS and GitHub recently occurred. In reality, we trust EFS as our source-of-truth. Hence, it makes sense for our to automatically recover from divergences by doing a force push to GitHub.

There still exists a case when Ops might edit on GitHub while site users are editing which can cause the pushes to override Ops' changes. For this, a solution will be to enable Ops to lock the repo -> perform the edit -> perform GGS repair using form + unlock affected repo automatically. For now, engs can manually add/remove the lock on DDB

Closes ISOM-947

Solution

On failure to git push, we retry twice. On the second retry, we use a git push --force option to forcefully push EFS commits to GitHub.

Breaking Changes

  • Yes - this PR contains breaking changes
    • Details ...
  • No - this PR is backwards compatible with ALL of the following feature flags in this doc

Screenshots of before and after

To simulate this on staging I created a divergence by editing the repo on GitHub.
Screenshot 2024-05-02 at 2 08 38 PM

Further editing on CMS caused no errors and the divergence auto-recovered by taking EFS's state as the source of truth.
Screenshot 2024-05-02 at 2 09 11 PM

The commit on GH was overriden by the CMS changes

Tests

  • Intentionally, create a divergence between EFS and GitHub for a test site by editing the test site on GitHub
  • On CMS, login via email and perform a valid update or create file action
  • This would mean there is a divergence in the Git history of EFS vs GH
  • At this point, observe the repo using git status to see if the first 2 attempts fail due to the divergence and check logs
  • Eventually, GitHub should still have been updated
  • A log line of "Performing a force push to GitHub as earlier retries have failed for ${repoName}" should be found

@harishv7 harishv7 requested a review from a team April 16, 2024 16:45
Copy link
Contributor

@alexanderleegs alexanderleegs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, what are your thoughts on just always force pushing since EFS is our source of truth? I think we just need to be very deliberate about ensuring that sites are repaired when users are migrated over to email login (which is already covered in our runbook!) - that way we won't run the risk of accidentally losing user commits which were previous made directly to github.

@harishv7
Copy link
Contributor Author

Hmm, what are your thoughts on just always force pushing since EFS is our source of truth? I think we just need to be very deliberate about ensuring that sites are repaired when users are migrated over to email login (which is already covered in our runbook!) - that way we won't run the risk of accidentally losing user commits which were previous made directly to github.

@alexanderleegs what does "ensuring that sites are repaired when users are migrated over to email login" mean?

@kishore03109
Copy link
Contributor

@harishv7 I think the form solution should go out with this tho, making the lock manual + deviating from the original flow feels incident prone. Could you just add this in the deployment section that this ticket should be completed together with this?

@alexanderleegs
Copy link
Contributor

Hmm, what are your thoughts on just always force pushing since EFS is our source of truth? I think we just need to be very deliberate about ensuring that sites are repaired when users are migrated over to email login (which is already covered in our runbook!) - that way we won't run the risk of accidentally losing user commits which were previous made directly to github.

@alexanderleegs what does "ensuring that sites are repaired when users are migrated over to email login" mean?

Currently cloning the site is done automatically when agencies are migrated from netlify -> amplify, but we don't always do the github login -> email login step at the same time! This means that it's possible for user content to be lost if we migrate to amplify > user edits site more > we migrate the email login but forget to run the site repair form > user makes edit on new login, which causes a force push. Just something to be aware of, it's already in our runbook to run site repair form after doing an email login migration

Copy link

linear bot commented Apr 19, 2024

kishore03109 added a commit that referenced this pull request Apr 22, 2024
## Problem

#1313 introduces git force pushes. While this is ok from the user point of view, this impedes with operations since now ops will need to ask eng to lock the repo first, then edit it on github, then unlock the repo, then submit the repair form (since the repair form also attempts to acquire the lock)


## Solution

There are 3 modes for the repair form now. 
- Lock only
- Repair without lock  
- Repair with lock


Repair with lock is what the repair form used to do. This is now not the recommended flow for editing sites. The recommended flow to edit the sites manually should be 
1. submit lock only repair form
2. make edits 
3. submit form to repair without lock


Intentionally leaving the option to continue to repair without lock in case there is any lingering divergence bet staging and staging lite. 

**Breaking Changes**

<!-- Does this PR contain any backward incompatible changes? If so, what are they and should there be special considerations for release? -->

- [ ] Yes - this PR contains breaking changes
  - Details ...
- [X] No - this PR is backwards compatible with ALL of the following feature flags in this [doc](https://www.notion.so/opengov/Existing-feature-flags-518ad2cdc325420893a105e88c432be5)

## Before & After Screenshots

**BEFORE**:

<!-- [insert screenshot here] -->

**AFTER**:

Changes to form:

<img width="910" alt="Screenshot 2024-04-22 at 11 13 02 AM" src="https://github.com/isomerpages/isomercms-backend/assets/42832651/11409ccf-45c3-4ea7-9c42-50dfdc5c983a">

## Tests
- [ ] submit repair from to just lock an email login repo
- [ ] assert that the repo is not able to be edited on the cms 
- [ ] make some changes in the email login repo directly on github 
- [ ] submit the repair form without locking (since the locking was already done on the first step)
- [ ] assert that you receive a successful repaired email 
- [ ] submit the repair form one last time for the mode "repair with locking"
- [ ] assert that you receive a successful repaired email 
 

## Deploy Notes

Modify the production form to be similar to the one in staging, need to add one more dropdown as shown in the screenshot above. This needs to be released prior/in the same release as #1313. Once this is live, inform the isomer channel about the newer flow.
@alexanderleegs alexanderleegs mentioned this pull request Apr 24, 2024
11 tasks
@harishv7
Copy link
Contributor Author

With #1327, we can now merge this PR in as discussed during incident meetings. Going forward, any divergence between EFS and GitHub will automatically take EFS as the source-of-truth and perform a git push if the normal git push (without force) fails the first time.

@harishv7 harishv7 force-pushed the harish/isom-947-fix-divergence-between-github-and-efs-automatically branch from cd223c0 to 6c02753 Compare May 2, 2024 05:49
@harishv7 harishv7 merged commit aa07992 into develop May 2, 2024
23 checks passed
@mergify mergify bot deleted the harish/isom-947-fix-divergence-between-github-and-efs-automatically branch May 2, 2024 06:11
@harishv7 harishv7 mentioned this pull request May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants