fix(rubygems): Ensure consistency between versions and metadata #25127

zharinov · 2023-10-10T20:52:39Z

Changes

Ensure that /versions endpoint data is always consistent with /api/v1/gems responses; otherwise, fall back to results containing only the version field.

This is important because /versions can contain fresh data while the /api/v1/gems endpoint may still return older data for a short time. If we cache the data at that moment, we risk storing inconsistent data for a very long period.

To solve this, we hash the list of versions from the /versions endpoint, and if the hash has changed, we invalidate the cache.

The key point of this PR: when persisting the cache for the long term, we don't use the previously calculated hash from the /versions endpoint; instead, we calculate it from the /api/v1/versions response (which we are about to cache). This should make both cache layers consistent.
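
A minimal sketch of the idea described above, not the actual code in this PR. The names `CachedMetadata`, `hashVersions`, `toCacheEntry`, `isConsistent` and `resolveReleases` are hypothetical, and using SHA-256 over a sorted, comma-joined version list is an assumption about how such a hash could be computed. The point it illustrates: the hash persisted with the long-term entry is derived from the API response being cached, and the /versions data is only used for comparison.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical shape of a long-term cache entry (not Renovate's real type).
interface CachedMetadata {
  hash: string; // hash of the versions contained in `releases`
  releases: { version: string; releaseTimestamp?: string }[];
}

// Hash a list of versions; sorting makes the result order-independent.
function hashVersions(versions: string[]): string {
  const sorted = [...versions].sort();
  return createHash('sha256').update(sorted.join(',')).digest('hex');
}

// Build the entry to persist. The hash is computed from the API response
// itself, never copied from the /versions endpoint.
function toCacheEntry(apiReleases: CachedMetadata['releases']): CachedMetadata {
  return {
    hash: hashVersions(apiReleases.map((r) => r.version)),
    releases: apiReleases,
  };
}

// A cached entry is usable only if it matches what /versions reports now.
function isConsistent(entry: CachedMetadata, versionsFromIndex: string[]): boolean {
  return entry.hash === hashVersions(versionsFromIndex);
}

// Fall back to version-only results when the cached metadata is stale.
function resolveReleases(
  entry: CachedMetadata,
  versionsFromIndex: string[],
): CachedMetadata['releases'] {
  return isConsistent(entry, versionsFromIndex)
    ? entry.releases
    : versionsFromIndex.map((version) => ({ version }));
}
```

With this shape, an API response that lags behind /versions produces a hash mismatch on the next lookup, so the stale entry is not served as if it were complete.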

Context

Closes: Rubygems cache: check package cache first #25448

Documentation (please check one with an [x])

I have updated the documentation, or
No documentation update is required

How I've tested my work (please select one)

I have verified these changes via:

Code inspection only, or
Newly added/modified unit tests, or
No unit tests but ran on a real repository, or
Both unit tests + ran on a real repository

lib/modules/datasource/rubygems/metadata-cache.ts

rarkins

Please add a readme.md to this datasource folder describing how the caching works, particularly for rubygems.org. Either that, or add many more inline code comments.

zharinov · 2023-10-12T22:42:02Z

Here is the state diagram I'm about to place in the doc.
It isn't as visually appealing as I'd like, but at least we have a diagram for this complex logic instead of just a text description.

stateDiagram-v2
  [*] --> Empty

  state "Empty" as Empty
  Empty --> FullSync: getPkgReleases()

  state "Synced" as Synced
  Synced --> DeltaSync

  state "Unsupported" as Unsupported
  Unsupported --> [*]

  state "Full sync" as FullSync : GET /versions (~20MB)
  state full_sync_result <<choice>>
  FullSync --> full_sync_result: Response
  full_sync_result --> Synced: (1) Status 200
  full_sync_result --> Unsupported: (2) Status 404
  full_sync_result --> Empty: (3) Status other than 200 or 404\n Clear cache and throw ExternalHostError

  state "Delta sync" as DeltaSync: GET /versions with "Range" header
  state delta_sync_result <<choice>>
  DeltaSync --> delta_sync_result: Successful response
  delta_sync_result --> Synced: (1) Status other than 206\nFull data is received, extract and replace old cache\n (as if it is the full sync)
  delta_sync_result --> FullSync: (2) The head of response doesn't match\n the tail of the previously fetched data
  delta_sync_result --> Synced: (3) The head of response matches\n the tail of the previously fetched data

  state delta_sync_error <<choice>>
  DeltaSync --> delta_sync_error: Error response
  delta_sync_error --> FullSync: (1) Status 416 should not happen\nbut moves to full sync
  delta_sync_error --> Unsupported: (2) Status 404
  delta_sync_error --> Empty: (3) Status other than 404 or 416
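
The transitions in the diagram can also be read as three small pure functions. This is only a sketch for reading the diagram, not the code in metadata-cache.ts: the type `SyncState` and the functions `afterFullSync`, `afterDeltaSuccess`, `afterDeltaError` and `headMatchesTail` are hypothetical names, and the fixed-size `overlap` comparison is an assumption about how "head matches tail" could be checked.

```typescript
type SyncState = 'Empty' | 'Synced' | 'Unsupported' | 'FullSync';

// Next state after a full sync (GET /versions) responds with `status`.
function afterFullSync(status: number): SyncState {
  if (status === 200) return 'Synced';
  if (status === 404) return 'Unsupported';
  // Any other status: the cache is cleared and the caller would throw
  // ExternalHostError, so the state goes back to Empty.
  return 'Empty';
}

// Assumed check: the first `overlap` characters of the ranged response must
// equal the last `overlap` characters of the previously fetched data.
function headMatchesTail(previous: string, rangeBody: string, overlap: number): boolean {
  if (overlap <= 0 || previous.length < overlap) return false;
  return previous.slice(-overlap) === rangeBody.slice(0, overlap);
}

// Next state after a successful delta sync (GET /versions with "Range").
function afterDeltaSuccess(status: number, matches: boolean): SyncState {
  if (status !== 206) return 'Synced'; // full body received, replace old cache
  return matches ? 'Synced' : 'FullSync';
}

// Next state after a delta sync that ended with an error response.
function afterDeltaError(status: number): SyncState {
  if (status === 416) return 'FullSync'; // should not happen, but resync fully
  if (status === 404) return 'Unsupported';
  return 'Empty';
}
```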

lib/modules/datasource/rubygems/metadata-cache.spec.ts

lib/modules/datasource/rubygems/metadata-cache.ts

lib/modules/datasource/rubygems/readme.md

Co-authored-by: Michael Kriese <[email protected]>

viceice

still an open question

rarkins · 2023-10-21T08:09:12Z

@zharinov so how many cache layers do we have for rubygems.org? E.g., how is /versions cached? And do we cache per-version too? How is the data in /versions joined with the API data?

zharinov · 2023-10-21T13:22:37Z

@rarkins Answered you in the readme

lib/modules/datasource/rubygems/metadata-cache.ts

rarkins · 2023-10-26T05:53:41Z

When I run this locally, it appears to do a full /versions sync/download every run. i.e. if I run it against one repo over and over, it's downloading the full /versions every time without any caching. Is that both the current as well as new behavior, or did it change?

zharinov · 2023-10-26T11:15:58Z

Data from /versions has always been stored in memory.

rarkins · 2023-10-26T11:43:11Z

Is there any "hot" cache per package? e.g. if I have a repo with one ruby dependency, and I run on that repo two times in a row, I wouldn't expect to see any datasource lookups at all - just the package cache.

zharinov · 2023-10-26T12:19:50Z

It should query the /versions endpoint once and the package-related endpoint once, if the two runs happen in the same Node process.

rarkins · 2023-10-26T12:32:26Z

But why do we query /versions at all if we have a <1 minute "hot" cache of the exact dependency we are looking up?

rarkins · 2023-10-26T12:34:51Z

Remember that the hosted app does one repo per run so will never have /versions cached between jobs, meaning it downloads the entire /versions unnecessarily many times. It would be better if we consult the package cache for each package we look up and return that result without downloading or syncing /versions. But should it be a separate issue/PR?
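
A rough sketch of the "package cache first" order suggested here, under stated assumptions: `ReleaseResult`, `makeLookup` and `syncVersionsAndFetch` are hypothetical names, the cache is a plain `Map` without expiry, and everything is synchronous for brevity, whereas a real package cache and HTTP lookup would be asynchronous.

```typescript
interface ReleaseResult {
  releases: { version: string }[];
}

// Returns a lookup function that consults the package cache before doing
// any /versions sync or per-package request.
function makeLookup(
  packageCache: Map<string, ReleaseResult>,
  syncVersionsAndFetch: (name: string) => ReleaseResult,
): (name: string) => ReleaseResult {
  return (name) => {
    const cached = packageCache.get(name);
    if (cached !== undefined) {
      return cached; // cache hit: nothing is downloaded or synced
    }
    // Cache miss: only now pay for the /versions sync and the API call.
    const fresh = syncVersionsAndFetch(name);
    packageCache.set(name, fresh);
    return fresh;
  };
}
```

In this ordering, a job that only looks up already-cached packages never triggers the expensive sync.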

zharinov · 2023-10-26T13:07:14Z

Just to clarify, the Rubygems in-memory data is stored at the module level, so it isn't reset between repo runs the way the memory cache from the util folder is. I.e., it lives for the duration of the Node process.
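
The difference between the two lifetimes can be illustrated with a small sketch. All names here (`versionsData`, `downloadCount`, `runCache`, `startRepoRun`, `getVersionsData`) are hypothetical and do not mirror the actual Renovate modules.

```typescript
// Module-level state: initialized once when the module is first loaded and
// kept until the Node process exits.
let versionsData: string | undefined;
let downloadCount = 0;

// Per-run cache: replaced with an empty map at the start of each repo run.
let runCache = new Map<string, unknown>();

function startRepoRun(): void {
  runCache = new Map();
}

// Downloads only when the module-level data has not been populated yet.
function getVersionsData(download: () => string): string {
  if (versionsData === undefined) {
    versionsData = download();
    downloadCount += 1;
  }
  return versionsData;
}
```

Calling `startRepoRun()` clears `runCache` but leaves `versionsData` untouched, which is why the data survives across repositories only when they are processed in the same process.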

rarkins · 2023-10-26T13:20:32Z

I understand, but in the hosted app we have one run per job, so that doesn't help

zharinov · 2023-10-26T13:21:36Z

Okay, in that case, yes. I didn't like the current approach anyway, but it's what we've done historically up to now.

rarkins · 2023-10-26T13:27:31Z

I'll create a new issue for this part then

lib/modules/datasource/rubygems/index.ts

renovate-release · 2023-10-30T13:54:14Z

🎉 This PR is included in version 37.36.1 🎉

The release is available on:

GitHub release
37.36.1

Your semantic-release bot 📦🚀

fix(rubygems): Handle hash mismatch for cached results

19a9f60

zharinov requested review from viceice and rarkins October 10, 2023 20:52

Fix

07707ca

zharinov commented Oct 10, 2023

lib/modules/datasource/rubygems/metadata-cache.ts (Outdated)

zharinov added 3 commits October 11, 2023 17:13

Rename hash variables

65e5872

Merge branch 'main' into fix/rubygems-inconsistent-cache

8dd515f

Simplify implementation

18c65e6

zharinov changed the title ~~fix(rubygems): Handle hash mismatch for cached results~~ fix(rubygems): Ensure consistency between versions and timetsamp metadata Oct 11, 2023

zharinov changed the title ~~fix(rubygems): Ensure consistency between versions and timetsamp metadata~~ fix(rubygems): Ensure consistency between versions and metadata Oct 11, 2023

rarkins requested changes Oct 11, 2023

Merge branch 'main' into fix/rubygems-inconsistent-cache

d3d7ca2

zharinov added 2 commits October 15, 2023 20:15

Merge branch 'main' into fix/rubygems-inconsistent-cache

4ce7e63

Add readme

0c52aa5

viceice reviewed Oct 16, 2023

Apply suggestions from code review

916aa97

Co-authored-by: Michael Kriese <[email protected]>

zharinov requested a review from viceice October 16, 2023 23:17

viceice reviewed Oct 20, 2023

Merge branch 'main' into fix/rubygems-inconsistent-cache

614774b

rarkins requested review from viceice and rarkins October 20, 2023 12:45

zharinov added 2 commits October 20, 2023 15:05

Fix

2e9d168

Merge branch 'main' into fix/rubygems-inconsistent-cache

7772031

zharinov requested review from viceice and removed request for viceice October 20, 2023 18:06

Clarify docs

d11781f

More

d184731

viceice reviewed Oct 24, 2023

lib/modules/datasource/rubygems/metadata-cache.ts (Outdated)

zharinov added 3 commits October 25, 2023 12:37

Fix

c715198

Fix coverage

bc36986

Merge branch 'main' into fix/rubygems-inconsistent-cache

8f76955

zharinov requested review from rarkins and viceice October 25, 2023 17:10

Merge branch 'main' into fix/rubygems-inconsistent-cache

29ccd92

Use decorator as the first caching layer

ddb1095

zharinov mentioned this pull request Oct 27, 2023

Rubygems cache: check package cache first #25448

Closed

Merge branch 'main' into fix/rubygems-inconsistent-cache

5b00c5b

rarkins reviewed Oct 27, 2023

lib/modules/datasource/rubygems/index.ts

rarkins approved these changes Oct 28, 2023

Merge branch 'main' into fix/rubygems-inconsistent-cache

0329571

rarkins added this pull request to the merge queue Oct 30, 2023

Merged via the queue into renovatebot:main with commit bb0a2d3 Oct 30, 2023
34 checks passed

rarkins deleted the fix/rubygems-inconsistent-cache branch October 30, 2023 13:45

jon4hz pushed a commit to jon4hz/renovate that referenced this pull request Nov 9, 2023

fix(rubygems): Ensure consistency between versions and metadata (reno…

10a8a0e

…vatebot#25127) Co-authored-by: Michael Kriese <[email protected]>

github-actions bot locked as resolved and limited conversation to collaborators Nov 30, 2023
