Safer pagination and refresh for site cache #1239
This pull request has been stale for more than 30 days! Could someone please take a look at it @isomerpages/iso-engineers
- afterEach(() => jest.clearAllMocks())
+ afterEach(() => {
+   jest.clearAllMocks()
+   mockAxios.get.mockReset()
Without this, the tests were not actually doing what we thought they were doing!! 😱
Note: Typically another way to deal with pages and fetching data is to use the GitHub SDK octokit.js, which has pagination handling built in. I did not introduce it in this PR because the goal was just to remove the env var. That should probably be revisited at some point.
All comments addressed, @seaerchin. Let me know if this is good to go. Thanks! 🙏
// example value: link: <https://api.github.com/organizations/40887764/repos?page=2>; rel="next", <https://api.github.com/organizations/40887764/repos?page=34>; rel="last"
const links: LinkSet = {}

const linkRe = /<(?<url>[^>]+)>; rel="(?<relid>[a-z]+)"(, )?/g
Note: this regex does not actually validate anything about the urls, or that relid is one of "first" | "last" | "prev" | "next", even though we do assert it in the types below with as IterableIterator<LinkMatch>.
Because we are the ones calling github, we trust their APIs, but we could add paranoia checks and warning logging here 🤔
Or the regexp could be made more "validating" in that sense, like so for example, which would at least guarantee that relid is correct according to the type:
- const linkRe = /<(?<url>[^>]+)>; rel="(?<relid>[a-z]+)"(, )?/g
+ const linkRe = /<(?<url>[^>]+)>; rel="(?<relid>first|last|prev|next)"(, )?/g
I'd rather leave the regexp loose tbh, in case github introduces more link relations in the future, they'd already be captured properly as part of the parsing routine.
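For reference, here is a minimal sketch of how the loose regexp discussed in this thread could turn a Link header into a lookup object. The names parseLinkHeader and pageOf, and the Record-based LinkSet type, are assumptions made for illustration; they are not necessarily the definitions used in this PR.

```typescript
// Hypothetical type: the PR's real LinkSet may be stricter than this.
type LinkSet = Record<string, string>

// Same loose regexp as in the diff: any lowercase rel value is accepted.
const linkRe = /<(?<url>[^>]+)>; rel="(?<relid>[a-z]+)"(, )?/g

// Collects every `<url>; rel="relid"` pair of a Link header into an object.
function parseLinkHeader(header: string): LinkSet {
  const links: LinkSet = {}
  for (const match of header.matchAll(linkRe)) {
    const { url, relid } = match.groups ?? {}
    if (url && relid) links[relid] = url
  }
  return links
}

// Reads the `page` query parameter of a link, if there is one.
function pageOf(url: string | undefined): number | undefined {
  const found = url?.match(/[?&]page=(\d+)/)
  return found ? Number(found[1]) : undefined
}
```

With the example header from the code comment above, this yields one entry for next and one for last, and pageOf applied to the last link gives the total page count. Because the regexp stays loose, an unknown relation would simply show up as an extra key.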
Result in prod:
Problem
The way isomercms retrieves and caches the list of repos relies on an environment variable, ISOMERPAGES_REPO_PAGE_COUNT. That variable contains a number, which is used to create parallel requests to github. In practice that works, but operationally this is not great:
Github APIs have a very robust pagination system, so we can derive the number of pages from that to make the system dynamic, and remove the (small) operational risk of forgetting to update the env var.
Additionally, the cache only has one exposed API, which retrieves the lastUpdated time by repoName. The repos are not stored in the cache by name but as a list, so every lookup does a list search.
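To illustrate the lookup cost described above, here is a small sketch contrasting a list search with a Map keyed by repoName. The RepoInfo shape and the helper names are hypothetical; only repoName and lastUpdated come from the description.

```typescript
// Hypothetical cache entry: only these two field names appear in the description.
interface RepoInfo {
  repoName: string
  lastUpdated: string
}

// List-based cache: every lookup scans the array, O(n) in the number of repos.
function lastUpdatedFromList(repos: RepoInfo[], repoName: string): string | undefined {
  return repos.find((repo) => repo.repoName === repoName)?.lastUpdated
}

// Builds the index once; later lookups are a single hash access (O(1) on average).
function indexByName(repos: RepoInfo[]): Map<string, RepoInfo> {
  const index = new Map<string, RepoInfo>()
  for (const repo of repos) index.set(repo.repoName, repo)
  return index
}

function lastUpdatedFromMap(index: Map<string, RepoInfo>, repoName: string): string | undefined {
  return index.get(repoName)?.lastUpdated
}
```

Both lookups return the same value for a given name; the difference is only in how much work each call does once the list grows to many pages of repos.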
Solution
getAllRepoData() calls are split into 2 distinct use cases:
a. exported API optimized for time: we fetch the first page, assess how many pages there are in total, and fetch all remaining pages in parallel
b. optimized for load distribution: we fetch the pages sequentially by following github pagination system's next link
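The two use cases above can be sketched roughly as follows. This is not the code from the PR: the Page shape, the FetchPage callback and both function names are assumptions, and the callback stands in for an HTTP call whose Link header has already been reduced to next and last page numbers.

```typescript
// Hypothetical page result: items plus page numbers derived from the Link header.
interface Page<T> {
  items: T[]
  next?: number // absent on the last page
  last?: number // absent when there is only one page
}

// Hypothetical fetcher, injected so the sketch does not depend on a real HTTP client.
type FetchPage<T> = (page: number) => Promise<Page<T>>

// a. Optimized for time: fetch page 1, read the page count, fetch the rest in parallel.
async function fetchAllParallel<T>(fetchPage: FetchPage<T>): Promise<T[]> {
  const first = await fetchPage(1)
  const lastPage = first.last ?? 1
  const rest = await Promise.all(
    Array.from({ length: lastPage - 1 }, (_, i) => fetchPage(i + 2))
  )
  const items: T[] = [...first.items]
  for (const page of rest) items.push(...page.items)
  return items
}

// b. Optimized for load distribution: one request at a time, following `next`.
async function fetchAllSequential<T>(fetchPage: FetchPage<T>): Promise<T[]> {
  const items: T[] = []
  let page: number | undefined = 1
  while (page !== undefined) {
    const current: Page<T> = await fetchPage(page)
    items.push(...current.items)
    page = current.next
  }
  return items
}
```

Both variants return the same items in the same order. The parallel one trusts the page count reported by the first response, while the sequential one never has more than one request in flight.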
O(1) lookup
Notes:
a. only the first bootstrap call has the 2 step fetch
b. every refresh call updates the number of pages
c. when the number of pages is cached, the fast getAllRepoData() does all its calls in parallel again.
Breaking Changes
Reference:
Results:
Sample trace for the new refresh process, showing that fetching the pages sequentially works:
Post Deploy Work (after stability is observed)
STAGING_ISOMERPAGES_REPO_PAGE_COUNT. This should ONLY be done when we are sure we will not need to roll back; possibly wait one full release.