Integrate with GitHub merge queues #3926

BenHenning · 2021-10-14T06:29:42Z

From core maintainers chat:

github/roadmap#272 could help solve two different problems that are particularly useful for us:

It eliminates cases where two subsequent PRs together cause a CI breakage without conflicts preventing the second PR from being merged
(Hopefully) If GitHub adds an event for entering the merge queue, we might be able to split off expensive or long-running CI tasks to be blockers of the merge queue rather than for each commit. This could speed up development without needing to sacrifice test fidelity since the longer test runs only occur during merging. This could be coupled with more aggressive retry policies to reduce flakes blocking PR merges.

We saw (1) happen a few times this summer, and (2) could be very useful for future screenshot testing, end-to-end tests in CI, and Espresso tests in CI (/cc @vinitamurthi).

BenHenning · 2022-03-18T19:56:25Z

Setting this up seems really straightforward: https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/using-a-merge-queue. Unfortunately, it only supports standard merging & rebase/merge currently, not squash & merge (which we use). We'll need to wait until squash support is added before opting into queue support (plus, it's in beta so we might not qualify for early access, anyway).

BenHenning · 2023-03-21T00:28:04Z

FWIW this has now been released to open beta (https://github.blog/changelog/2023-02-08-pull-request-merge-queue-public-beta/) and squash & merge is available. I'll look into enabling this so that we can lift the "require branches to be up-to-date" requirement.

## Explanation Fix part of #59 Fix part of #3926 As part of developing downstream PRs for #59, it was discovered that PRs which change a LOT of files (such as #4937) can run into problems where ComputeAffectedTests simply times out trying to compute the entire list of affected targets. **Critical performance and compatibility fixes** There have been past efforts to optimize the affected tests workflows (bucketing, breaking up some of the computations), but it was discovered that the most expensive part of the process is running the ``rbuildfiles`` query to figure out which BUILD files were affected by changed files. It was known this was an expensive step in the past, but it wasn't clear until this PR exactly how to address it. This PR changes the script to now: - Filter ``rbuildfiles`` to only run under first-party targets (since it shouldn't be possible for third-party BUILD files to be affected by first-party changes). This reduces the search graph. - Introduce Bazel command sharding for this step like the script already does for others. This breaks up the command to run on a subset of files multiple times, combining the output rather than running a single command with a large input. It seems that ``rbuildfiles`` does not scale linearly with its input size, so this drastically improves the script's performance. It's thought that this approach is also more logically correct due to more correct sibling matching semantics, but it's a bit hard to reason about ``bazel query`` behavior at times so I can't be 100% confident in this. Nevertheless, the existing tests pass and I haven't seen any issues from using these changes in downstream PRs. Separately, another issue was discovered wherein some commands (including certain Bazel commands) can actually cause ``CommandExecutorImpl`` to soft-lock and always time out. This was due to an issue in the previous implementation wherein it would wait to read a command's stdout until after the timeout has been completed (i.e. it assumed the process would finish). This isn't correct, however: stdout is blocking I/O, and some commands are implemented to only continue execution after their standard output is read. The new implementation makes use of coroutine actors to consume stdout and stderr at the same time as waiting for execution to ensure all commands can continue execution and that they finish within the desired timeout. Note that the new ``ScriptBackgroundCoroutineDispatcher`` was actually introduced (in #5313) specifically to support this new ``CommandExecutorImpl`` implementation, though it has since been found to have lots of other nice benefits by providing scripts with a reliable mechanism for performing asynchronous operations without having to manage their own execution dispatcher. Command execution for Bazel commands has also been updated to time out after 5 minutes rather than the previous 1 minute. Despite the optimizations and robustness improvements above, some commands still take quite some time to run for especially large and complex cases. While this change may result in a slower failure turnaround in cases when commands are soft-locked, it should result in better CI and script robustness in the long-term. **Better support for chained PRs & possibly merge queues** ComputeAffectedTests was also updated to use a merge base commit rather than a reference to the develop branch (where this new commit hash is provided by the CI workflow). The idea behind this is that the merge base commit is: - More logically correct (as ``ComputeAffectedTests`` is meant to run in contexts where a branch wants to be merged into a destination). - Better compatible with chained PRs. This allows for **significantly** better CI performance in chained PR situations since now only the tests relevant to the child PR will be run rather than all tests for the child & its parental hierarchy (see downstream PRs' CI runs to see this working in action). - Hopefully better compatibility with merge queues (#3926). This hasn't been verified, but the flexibility in customizing the destination for affected tests should be the main prerequisite to properly supporting merge queues. **Other changes** ``GitClient`` was updated to have a peace-of-mind check to ensure the base commit (provided as explained in the previous section) matches the merge base of the current branch. This should always be true (except maybe in merge queues--this will need to be verified). Note that this is only a soft warning, not an assertion failure. ``RepositoryFile`` was cleaned up slightly to be a bit more consistent with other directory management approaches done in scripts. I can see this being refactored more in the future. Callsite behavior isn't expected to be affected by these changes. Some script tests were updated to have consistent formatting (which required updating the TODO exemptions). ``TodoIssueResolvedCheckTest`` and ``TodoOpenCheckTest`` also had some of their test file management cleaned up a bit. **A note on testing** These are inherently difficult things to test. I've verified what I could via CI and general observation, but I've also largely relied on existing tests to catch regressions (and many were caught during changes to the script). Since these are mainly implementation and not behavioral changes, I'm comfortable with the level of testing that was done. ## Essential Checklist - [x] The PR title and explanation each start with "Fix #bugnum: " (If this PR fixes part of an issue, prefix the title with "Fix part of #bugnum: ...".) - [x] Any changes to [scripts/assets](https://github.com/oppia/oppia-android/tree/develop/scripts/assets) files have their rationale included in the PR explanation. - [x] The PR follows the [style guide](https://github.com/oppia/oppia-android/wiki/Coding-style-guide). - [x] The PR does not contain any unnecessary code changes from Android Studio ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#undo-unnecessary-changes)). - [x] The PR is made from a branch that's **not** called "develop" and is up-to-date with "develop". - [x] The PR is **assigned** to the appropriate reviewers ([reference](https://github.com/oppia/oppia-android/wiki/Guidance-on-submitting-a-PR#clarification-regarding-assignees-and-reviewers-section)). ## For UI-specific PRs only N/A -- This is an infrastructure-only change. --------- Co-authored-by: Adhiambo Peres <[email protected]> Co-authored-by: Sean Lip <[email protected]>

BenHenning added temp: triage for beta and removed temp: triage for beta labels Jun 11, 2022

Broppia added the issue_type_infrastructure label Jun 13, 2022

Broppia added the Impact: Low Low perceived user impact (e.g. edge cases). label Jul 7, 2022

BenHenning mentioned this issue Jul 12, 2022

Fix build on tip-of-tree #4427

Closed

BenHenning added the issue_user_developer label Sep 16, 2022

BenHenning added this to [Team] Developer Workflow & Infrastructure - Android Sep 16, 2022

BenHenning moved this to Needs Triage in [Team] Developer Workflow & Infrastructure - Android Sep 16, 2022

BenHenning added the Z-ibt Temporary label for Ben to keep track of issues he's triaged. label Sep 16, 2022

BenHenning self-assigned this Mar 21, 2023

BenHenning mentioned this issue Mar 21, 2023

Fix part of #3926: Add support for CI checks in merge queues [Blocked: #4929] #4913

Closed

6 tasks

seanlip added enhancement End user-perceivable enhancements. and removed issue_type_infrastructure labels Mar 28, 2023

seanlip moved this to Todo in [Team] Developer Workflow & Infrastructure - Android Jun 4, 2023

MohitGupta121 added the Work: Medium The means to find the solution is clear, but it isn't at good-first-issue level yet. label Jun 16, 2023

MohitGupta121 moved this from Todo to In Progress in [Team] Developer Workflow & Infrastructure - Android Jun 16, 2023

BenHenning mentioned this issue May 16, 2024

Fix part of #59, part of #3926: Upgrade compute affected tests #4929

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integrate with GitHub merge queues #3926

Integrate with GitHub merge queues #3926

BenHenning commented Oct 14, 2021

BenHenning commented Mar 18, 2022 •

edited

Loading

BenHenning commented Mar 21, 2023

Integrate with GitHub merge queues #3926

Integrate with GitHub merge queues #3926

Comments

BenHenning commented Oct 14, 2021

BenHenning commented Mar 18, 2022 • edited Loading

BenHenning commented Mar 21, 2023

BenHenning commented Mar 18, 2022 •

edited

Loading