feat: git push force to github on second retry #1313

harishv7 · 2024-04-16T16:45:36Z

Problem

Some incidences of a divergence between EFS and GitHub recently occurred. In reality, we trust EFS as our source-of-truth. Hence, it makes sense for our to automatically recover from divergences by doing a force push to GitHub.

There still exists a case when Ops might edit on GitHub while site users are editing which can cause the pushes to override Ops' changes. For this, a solution will be to enable Ops to lock the repo -> perform the edit -> perform GGS repair using form + unlock affected repo automatically. For now, engs can manually add/remove the lock on DDB

Closes ISOM-947

Solution

On failure to git push, we retry twice. On the second retry, we use a git push --force option to forcefully push EFS commits to GitHub.

Breaking Changes

Yes - this PR contains breaking changes
- Details ...
No - this PR is backwards compatible with ALL of the following feature flags in this doc

Screenshots of before and after

To simulate this on staging I created a divergence by editing the repo on GitHub.

Further editing on CMS caused no errors and the divergence auto-recovered by taking EFS's state as the source of truth.

The commit on GH was overriden by the CMS changes

Tests

Intentionally, create a divergence between EFS and GitHub for a test site by editing the test site on GitHub
On CMS, login via email and perform a valid update or create file action
This would mean there is a divergence in the Git history of EFS vs GH
At this point, observe the repo using git status to see if the first 2 attempts fail due to the divergence and check logs
Eventually, GitHub should still have been updated
A log line of "Performing a force push to GitHub as earlier retries have failed for ${repoName}" should be found

src/services/db/GitFileSystemService.ts

alexanderleegs

Hmm, what are your thoughts on just always force pushing since EFS is our source of truth? I think we just need to be very deliberate about ensuring that sites are repaired when users are migrated over to email login (which is already covered in our runbook!) - that way we won't run the risk of accidentally losing user commits which were previous made directly to github.

harishv7 · 2024-04-18T15:29:25Z

Hmm, what are your thoughts on just always force pushing since EFS is our source of truth? I think we just need to be very deliberate about ensuring that sites are repaired when users are migrated over to email login (which is already covered in our runbook!) - that way we won't run the risk of accidentally losing user commits which were previous made directly to github.

@alexanderleegs what does "ensuring that sites are repaired when users are migrated over to email login" mean?

kishore03109 · 2024-04-19T00:48:17Z

@harishv7 I think the form solution should go out with this tho, making the lock manual + deviating from the original flow feels incident prone. Could you just add this in the deployment section that this ticket should be completed together with this?

alexanderleegs · 2024-04-19T02:14:17Z

Hmm, what are your thoughts on just always force pushing since EFS is our source of truth? I think we just need to be very deliberate about ensuring that sites are repaired when users are migrated over to email login (which is already covered in our runbook!) - that way we won't run the risk of accidentally losing user commits which were previous made directly to github.

@alexanderleegs what does "ensuring that sites are repaired when users are migrated over to email login" mean?

Currently cloning the site is done automatically when agencies are migrated from netlify -> amplify, but we don't always do the github login -> email login step at the same time! This means that it's possible for user content to be lost if we migrate to amplify > user edits site more > we migrate the email login but forget to run the site repair form > user makes edit on new login, which causes a force push. Just something to be aware of, it's already in our runbook to run site repair form after doing an email login migration

src/services/db/GitFileSystemService.ts

linear · 2024-04-19T07:21:23Z

ISOM-947 Fix divergence between GitHub and EFS automatically

## Problem #1313 introduces git force pushes. While this is ok from the user point of view, this impedes with operations since now ops will need to ask eng to lock the repo first, then edit it on github, then unlock the repo, then submit the repair form (since the repair form also attempts to acquire the lock) ## Solution There are 3 modes for the repair form now. - Lock only - Repair without lock - Repair with lock Repair with lock is what the repair form used to do. This is now not the recommended flow for editing sites. The recommended flow to edit the sites manually should be 1. submit lock only repair form 2. make edits 3. submit form to repair without lock Intentionally leaving the option to continue to repair without lock in case there is any lingering divergence bet staging and staging lite. **Breaking Changes**  - [ ] Yes - this PR contains breaking changes - Details ... - [X] No - this PR is backwards compatible with ALL of the following feature flags in this [doc](https://www.notion.so/opengov/Existing-feature-flags-518ad2cdc325420893a105e88c432be5) ## Before & After Screenshots **BEFORE**:  **AFTER**: Changes to form: <img width="910" alt="Screenshot 2024-04-22 at 11 13 02 AM" src="https://github.com/isomerpages/isomercms-backend/assets/42832651/11409ccf-45c3-4ea7-9c42-50dfdc5c983a"> ## Tests - [ ] submit repair from to just lock an email login repo - [ ] assert that the repo is not able to be edited on the cms - [ ] make some changes in the email login repo directly on github - [ ] submit the repair form without locking (since the locking was already done on the first step) - [ ] assert that you receive a successful repaired email - [ ] submit the repair form one last time for the mode "repair with locking" - [ ] assert that you receive a successful repaired email ## Deploy Notes Modify the production form to be similar to the one in staging, need to add one more dropdown as shown in the screenshot above. This needs to be released prior/in the same release as #1313. Once this is live, inform the isomer channel about the newer flow.

harishv7 · 2024-04-24T13:38:32Z

With #1327, we can now merge this PR in as discussed during incident meetings. Going forward, any divergence between EFS and GitHub will automatically take EFS as the source-of-truth and perform a git push if the normal git push (without force) fails the first time.

harishv7 requested a review from a team April 16, 2024 16:45

timotheeg reviewed Apr 17, 2024

View reviewed changes

src/services/db/GitFileSystemService.ts Show resolved Hide resolved

alexanderleegs reviewed Apr 17, 2024

View reviewed changes

timotheeg reviewed Apr 19, 2024

View reviewed changes

src/services/db/GitFileSystemService.ts Outdated Show resolved Hide resolved

kishore03109 mentioned this pull request Apr 21, 2024

feat(repair): add locking functionailty #1327

Merged

9 tasks

alexanderleegs mentioned this pull request Apr 24, 2024

release v0.82.0 #1329

Merged

11 tasks

harishv7 requested review from alexanderleegs, timotheeg, dcshzj, kishore03109 and seaerchin April 25, 2024 03:52

kishore03109 approved these changes Apr 28, 2024

View reviewed changes

feat: git push force to github on second retry

6c02753

harishv7 force-pushed the harish/isom-947-fix-divergence-between-github-and-efs-automatically branch from cd223c0 to 6c02753 Compare May 2, 2024 05:49

harishv7 merged commit aa07992 into develop May 2, 2024
23 checks passed

mergify bot deleted the harish/isom-947-fix-divergence-between-github-and-efs-automatically branch May 2, 2024 06:11

harishv7 mentioned this pull request May 2, 2024

release v0.85.0 #1362

Merged

kishore03109 pushed a commit that referenced this pull request May 9, 2024

feat: git push force to github on second retry (#1313)

de3b512

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: git push force to github on second retry #1313

feat: git push force to github on second retry #1313

harishv7 commented Apr 16, 2024 •

edited

Loading

alexanderleegs left a comment

harishv7 commented Apr 18, 2024

kishore03109 commented Apr 19, 2024

alexanderleegs commented Apr 19, 2024

linear bot commented Apr 19, 2024

harishv7 commented Apr 24, 2024

feat: git push force to github on second retry #1313

feat: git push force to github on second retry #1313

Conversation

harishv7 commented Apr 16, 2024 • edited Loading

Problem

Solution

Screenshots of before and after

Tests

alexanderleegs left a comment

Choose a reason for hiding this comment

harishv7 commented Apr 18, 2024

kishore03109 commented Apr 19, 2024

alexanderleegs commented Apr 19, 2024

linear bot commented Apr 19, 2024

harishv7 commented Apr 24, 2024

harishv7 commented Apr 16, 2024 •

edited

Loading