
Safer pagination and refresh for site cache #1239

Merged
merged 14 commits into from
Apr 24, 2024

Conversation

timotheeg
Contributor

@timotheeg timotheeg commented Mar 22, 2024

Problem

The way isomercms retrieves and caches the list of repos relies on an environment variable, ISOMERPAGES_REPO_PAGE_COUNT. That variable contains a number, which is used to create parallel requests to GitHub.

In practice this works, but operationally it is not great:

  • It introduces a mental load to update the variable if the number of supported repos grows beyond the current page count (easy to forget, and a potential incident!)
  • The concurrent fetches optimize for time in all cases, while for the internal cache refreshes, optimizing for load distribution would be safer

GitHub's APIs have a very robust pagination system, so we can derive the number of pages from it to make the system dynamic, and remove the (small) operational risk of forgetting to update the env var.

Additionally, the cache only has one exposed API, which retrieves the lastUpdated time by repoName. The cached repos are not indexed by name; they are stored as a list, so every lookup does a linear search.
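For the lookup change, a minimal sketch of indexing the cached list by name; the `CacheEntry` shape and function names here are illustrative, not the actual service code:

```typescript
// Hypothetical shape of a cached repo entry; the real service stores more fields.
interface CacheEntry {
  repoName: string
  lastUpdated: string
}

// Before: a linear scan over the cached list on every lookup.
function findLinear(
  repos: CacheEntry[],
  repoName: string
): CacheEntry | undefined {
  return repos.find((r) => r.repoName === repoName)
}

// After: index the list once by name, then look up in O(1).
function buildRepoIndex(repos: CacheEntry[]): Record<string, CacheEntry> {
  const index: Record<string, CacheEntry> = {}
  for (const repo of repos) index[repo.repoName] = repo
  return index
}
```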

Solution

  1. split the getAllRepoData() calls into 2 distinct use cases:
    a. exported API optimized for time: we fetch the first page, assess how many pages there are in total, and fetch all remaining pages in parallel
    b. optimized for load distribution: we fetch the pages sequentially by following the next link of GitHub's pagination system
  2. populate the cache at bootstrap time with the time-optimized call
  3. cache refresh uses the load-optimized call
  4. Use a dictionary to store the repos indexes by name for O(1) lookup
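The two fetch modes in step 1 can be sketched as follows; `Page`, `FetchPage`, and the function names are illustrative stand-ins, not the actual SitesCacheService code:

```typescript
// Hypothetical result of fetching one page; the real service derives
// lastPage and nextUrl from GitHub's Link response header.
interface Page {
  repos: string[]
  lastPage?: number // parsed from rel="last" on the first page
  nextUrl?: string // parsed from rel="next"
}

type FetchPage = (pageOrUrl: number | string) => Promise<Page>

// (a) Time-optimized: fetch page 1, learn the total, fan out for pages 2..N.
async function fetchAllParallel(fetchPage: FetchPage): Promise<string[]> {
  const first = await fetchPage(1)
  if (!first.lastPage || first.lastPage <= 1) return first.repos
  const rest = await Promise.all(
    Array.from({ length: first.lastPage - 1 }, (_, i) => fetchPage(i + 2))
  )
  return first.repos.concat(...rest.map((p) => p.repos))
}

// (b) Load-optimized: follow rel="next" links one page at a time.
async function fetchAllSequential(fetchPage: FetchPage): Promise<string[]> {
  const repos: string[] = []
  let next: number | string | undefined = 1
  while (next !== undefined) {
    const page: Page = await fetchPage(next)
    repos.push(...page.repos)
    next = page.nextUrl
  }
  return repos
}
```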

Notes:

  1. The time-optimized fetch is still slower than the previous version, since the first page call is always made sequentially
  2. We could (to discuss) cache the number of pages, such that:
    a. only the first bootstrap call has the 2 step fetch
    b. every refresh call updates the number of pages
    c. when the number of pages is cached, the fast getAllRepoData() does all its calls in parallel again.
  3. We add a custom span to instrument the time the cache refresh process takes.

Breaking Changes

  • Yes - this PR contains breaking changes
  • No - this PR is backwards compatible with ALL of the following feature flags in this doc

Reference:

Results:
Sample trace for the new refresh process, showing that fetching the pages sequentially works:

(screenshot: sample trace of the sequential refresh)

Post Deploy Work (after stability is observed)

  • delete the SSM entries 'PROD_ISOMERPAGES_REPO_PAGE_COUNT' and 'STAGING_ISOMERPAGES_REPO_PAGE_COUNT'. This should ONLY be done when we are sure we will not need to roll back; possibly wait one full release.

@mergify mergify bot requested a review from a team April 21, 2024 11:32

mergify bot commented Apr 21, 2024

This pull request has been stale for more than 30 days! Could someone please take a look at it @isomerpages/iso-engineers?

@timotheeg timotheeg marked this pull request as ready for review April 23, 2024 06:40
@timotheeg timotheeg force-pushed the safer-pagination-and-refresh-for-site-cache branch from 8e84614 to 475d999 Compare April 23, 2024 06:42
@timotheeg timotheeg force-pushed the safer-pagination-and-refresh-for-site-cache branch from 87ead11 to f0a22d5 Compare April 23, 2024 07:54
Before:

```typescript
afterEach(() => jest.clearAllMocks())
```

After:

```typescript
afterEach(() => {
  jest.clearAllMocks()
  mockAxios.get.mockReset()
})
```
@timotheeg timotheeg Apr 24, 2024
Without this, the tests previously were not actually doing what we think they were doing!! 😱

@timotheeg
Contributor Author

Note: Typically another way to deal with pagination and fetching data is the GitHub SDK octokit.js, which has pagination handling built in. I did not introduce it in this PR because the goal was just to remove the env var ISOMERPAGES_REPO_PAGE_COUNT, and I kept the parallel fetch for pages 2-N, which octokit wouldn't do.

That should probably be revisited at some point.

src/services/identity/SitesCacheService.ts Outdated Show resolved Hide resolved
@timotheeg
Contributor Author

All comments addressed @seaerchin, let me know if it's good to go. Thanks! 🙏

@seaerchin seaerchin self-requested a review April 24, 2024 05:06
```typescript
// example value: link: <https://api.github.com/organizations/40887764/repos?page=2>; rel="next", <https://api.github.com/organizations/40887764/repos?page=34>; rel="last"
const links: LinkSet = {}

const linkRe = /<(?<url>[^>]+)>; rel="(?<relid>[a-z]+)"(, )?/g
```
@timotheeg timotheeg Apr 24, 2024
Note: this regex is not actually validating anything about the urls, or that relid should be one of "first" | "last" | "prev" | "next", even though we do assert it in the types below with as IterableIterator<LinkMatch>.

Because we are the one calling github, we trust their APIs, but we could add paranoia checks, and warning logging here 🤔

Or the regexp could be made more "validating" in that sense, for example like so, which would at least guarantee that relid is correct according to the type:

Suggested change:

```diff
- const linkRe = /<(?<url>[^>]+)>; rel="(?<relid>[a-z]+)"(, )?/g
+ const linkRe = /<(?<url>[^>]+)>; rel="(?<relid>first|last|prev|next)"(, )?/g
```

@timotheeg timotheeg Apr 24, 2024
I'd rather leave the regexp loose tbh; in case GitHub introduces more link relations in the future, they'd already be captured properly as part of the parsing routine.
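Putting the loose-regex approach together, a runnable sketch of the parsing routine; the `parseLinkHeader` wrapper is illustrative, while `LinkSet` and the regex mirror the diff above:

```typescript
type LinkSet = Record<string, string>

// Loosely parse a GitHub-style Link header into { relid: url }.
// Deliberately does not restrict relid, so any future link relations
// GitHub introduces are still captured by the same routine.
function parseLinkHeader(header: string): LinkSet {
  const links: LinkSet = {}
  const linkRe = /<(?<url>[^>]+)>; rel="(?<relid>[a-z]+)"(, )?/g
  for (const match of header.matchAll(linkRe)) {
    const groups = match.groups as { url: string; relid: string }
    links[groups.relid] = groups.url
  }
  return links
}
```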

@timotheeg timotheeg merged commit c26c016 into develop Apr 24, 2024
12 checks passed
@timotheeg timotheeg deleted the safer-pagination-and-refresh-for-site-cache branch April 24, 2024 07:11
@alexanderleegs alexanderleegs mentioned this pull request Apr 26, 2024
3 tasks
@timotheeg
Contributor Author

Result in prod:

  • As expected: /v2/sites is slower now since the first page is always fetched sequentially (sample trace here)
    (screenshot: sample trace)
  • The refresh is a cheaper process now, since all the pages are fetched sequentially (sample internal trace)
    (screenshot: sample internal trace)
