Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: optimise local diff #1334

Merged
merged 5 commits into from
May 10, 2024
Merged

feat: optimise local diff #1334

merged 5 commits into from
May 10, 2024

Conversation

jacobkwan
Copy link
Contributor

@jacobkwan jacobkwan commented Apr 25, 2024

Problem

Closes https://linear.app/ogp/issue/ISOM-833/benchmark-impact-of-local-git-diff-on-cpu-usage
Local diff was implemented a while back but kept behind a feature flag.
Feature flag has not been enabled in production due to concerns about performance especially for larger diffs.

Solution

Understanding bottlenecks via Datadog

Everything below is in the context of motac2 with ~444 files changed in one RR
Trace for unoptimised implementation
Significant bottlenecks (in order of priority):

  1. getFilesChanged 27s / 40s or ~66%
  2. calling getLatestCommitOfPath for every single file changed
    • called in parallel but still each call is about 10s / 40s or 25%
  3. querying Users for every commit to get email (@timotheeg had already fixed this for the GitHub diff flow), actually doesn't add much to latency

Optimisations

  1. Refactor getFilesChanged to call git diff-tree -r --name-only master..staging
    • this change alone cuts the 27s to ~70ms
  2. Refactor the entire flow. Like @timotheeg mentioned, writing "clean" and "elegant" code in the original implementation - mapping over the filenames and performing the operations on that individual level - is not always the best when it comes to efficiency. Additionally, there is extra network cost from interacting with EFS many times.
    • instead getCommitsBetweenMasterAndStaging once
    • loop over these commits, constructing a filenameToLogMap. At the same time, get unique userIds from commit messages to dedupe queries
    • query DB for the unique users and construct a userIdToUserMap
    • finally use the two maps to construct the full response for each filename - minimal changes here just that it no longer needs to query DB and get latest commit from EFS for every single filename, as the two data structures are sufficient.

Final flame graph after optimisations:

Bottleneck is now the many parallel reads on EFS. Out of scope for this PR and its also current behaviour for GitHub flow.
TLDR: reads are needed to get the permalink from frontmatter, which is then used to construct the stagingUrl

Screen.Recording.2024-04-25.at.10.30.55.PM.mov

Breaking Changes

  • Yes - this PR contains breaking changes
    • Details ...
  • No - this PR is backwards compatible with ALL of the following feature flags in this doc

Tests

Copied over tests from #1172

  • Toggle feature flag is_local_diff_enabled for account being tested https://app.growthbook.io/features/is_local_diff_enabled
  • Create a page
  • Edit a page
  • Delete a page
  • Move a page
  • Try to create an RR and check that the diff shown is correct for all files (this tests that /compare works)
  • After creating the RR, navigate to the RR page and check that the diff shown is correct (this tests that /listFullReviewRequest works)
  • Approve and publish the RR
  • Check that diff is empty now (tests that master has been fastforwarded)
  • Finally, last sanity check that the diff still works after making another edit

@jacobkwan jacobkwan requested a review from a team April 25, 2024 14:28
Copy link

linear bot commented Apr 25, 2024

src/services/db/GitFileSystemService.ts Show resolved Hide resolved
@@ -777,49 +777,6 @@ describe("GitFileSystemService", () => {
})
})

describe("getLatestLocalCommitOfPath", () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm should add a getCommitsBetweenMasterAndStaging test case hor

src/services/review/ReviewRequestService.ts Show resolved Hide resolved
@kishore03109 kishore03109 self-requested a review May 9, 2024 13:57
@kishore03109 kishore03109 force-pushed the fix/optimise-local-diff branch from d736724 to 30a9257 Compare May 9, 2024 13:58
@kishore03109 kishore03109 merged commit f3fe899 into develop May 10, 2024
22 of 23 checks passed
@mergify mergify bot deleted the fix/optimise-local-diff branch May 10, 2024 07:00
@dcshzj dcshzj mentioned this pull request May 13, 2024
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants