fix(rubygems): Ensure consistency between versions and metadata #25127

Merged on Oct 30, 2023 (24 commits)
19a9f60
fix(rubygems): Handle hash mismatch for cached results
zharinov Oct 10, 2023
07707ca
Fix
zharinov Oct 10, 2023
65e5872
Rename hash variables
zharinov Oct 11, 2023
8dd515f
Merge branch 'main' into fix/rubygems-inconsistent-cache
zharinov Oct 11, 2023
18c65e6
Simplify implementation
zharinov Oct 11, 2023
d3d7ca2
Merge branch 'main' into fix/rubygems-inconsistent-cache
zharinov Oct 12, 2023
4ce7e63
Merge branch 'main' into fix/rubygems-inconsistent-cache
zharinov Oct 15, 2023
0c52aa5
Add readme
zharinov Oct 16, 2023
916aa97
Apply suggestions from code review
zharinov Oct 16, 2023
614774b
Merge branch 'main' into fix/rubygems-inconsistent-cache
zharinov Oct 20, 2023
2e9d168
Fix
zharinov Oct 20, 2023
7772031
Merge branch 'main' into fix/rubygems-inconsistent-cache
zharinov Oct 20, 2023
d11781f
Clarify docs
zharinov Oct 21, 2023
d184731
More
zharinov Oct 21, 2023
64d404f
More precise cache description in readme
zharinov Oct 21, 2023
9510281
Make `CacheError` a union discriminated by `type` field
zharinov Oct 21, 2023
533b77b
Return stale cache
zharinov Oct 23, 2023
c715198
Fix
zharinov Oct 25, 2023
bc36986
Fix coverage
zharinov Oct 25, 2023
8f76955
Merge branch 'main' into fix/rubygems-inconsistent-cache
zharinov Oct 25, 2023
29ccd92
Merge branch 'main' into fix/rubygems-inconsistent-cache
zharinov Oct 26, 2023
ddb1095
Use decorator as the first caching layer
zharinov Oct 27, 2023
5b00c5b
Merge branch 'main' into fix/rubygems-inconsistent-cache
zharinov Oct 27, 2023
0329571
Merge branch 'main' into fix/rubygems-inconsistent-cache
zharinov Oct 28, 2023
11 changes: 11 additions & 0 deletions lib/modules/datasource/rubygems/index.ts
@@ -1,6 +1,7 @@
import { Marshal } from '@qnighy/marshal';
import type { ZodError } from 'zod';
import { logger } from '../../../logger';
import { cache } from '../../../util/cache/package/decorator';
import { HttpError } from '../../../util/http';
import { AsyncResult, Result } from '../../../util/result';
import { getQueryString, joinUrlParts, parseUrl } from '../../../util/url';
@@ -46,6 +47,16 @@ export class RubyGemsDatasource extends Datasource {

private readonly versionsEndpointCache: VersionsEndpointCache;

@cache({
namespace: `datasource-${RubyGemsDatasource.id}`,
key: ({ packageName, registryUrl }: GetReleasesConfig) =>
// TODO: types (#22198)
`releases:${registryUrl!}:${packageName}`,
cacheable: ({ registryUrl }: GetReleasesConfig) => {
const registryHostname = parseUrl(registryUrl)?.hostname;
return registryHostname === 'rubygems.org';
},
})
async getReleases({
packageName,
registryUrl,
119 changes: 114 additions & 5 deletions lib/modules/datasource/rubygems/metadata-cache.spec.ts
@@ -8,17 +8,18 @@ jest.mock('../../../util/cache/package');
const packageCache = mocked(_packageCache);

describe('modules/datasource/rubygems/metadata-cache', () => {
const cache: Map<string, unknown> = new Map();
const packageCacheMock: Map<string, unknown> = new Map();

beforeEach(() => {
cache.clear();
packageCacheMock.clear();

packageCache.get.mockImplementation(
(ns, key) => Promise.resolve(cache.get(`${ns}::${key}`)) as never
(ns, key) =>
Promise.resolve(packageCacheMock.get(`${ns}::${key}`)) as never
);

packageCache.set.mockImplementation((ns, key, value) => {
cache.set(`${ns}::${key}`, value);
packageCacheMock.set(`${ns}::${key}`, value);
return Promise.resolve() as never;
});
});
@@ -64,7 +65,11 @@ describe('modules/datasource/rubygems/metadata-cache', () => {
homepage_uri: 'https://example.com',
});

const res = await cache.getRelease('https://rubygems.org', 'foobar', []);
const res = await cache.getRelease('https://rubygems.org', 'foobar', [
'1.0.0',
'2.0.0',
'3.0.0',
]);

expect(res).toEqual({
changelogUrl: 'https://example.com/changelog',
@@ -93,6 +98,110 @@ describe('modules/datasource/rubygems/metadata-cache', () => {
});
});

it('handles inconsistent data between versions and endpoint', async () => {
const cache = new MetadataCache(new Http('test'));

httpMock
.scope('https://rubygems.org')
.get('/api/v1/versions/foobar.json')
.reply(200, [
{ number: '1.0.0', created_at: '2021-01-01' },
{ number: '2.0.0', created_at: '2022-01-01' },
{ number: '3.0.0', created_at: '2023-01-01' },
])
.get('/api/v1/gems/foobar.json')
.reply(200, {
name: 'foobar',
created_at: '2023-01-01',
changelog_uri: 'https://example.com/changelog',
source_code_uri: 'https://example.com/source',
homepage_uri: 'https://example.com',
});

const res = await cache.getRelease('https://rubygems.org', 'foobar', [
'1.0.0',
'2.0.0',
'3.0.0',
'4.0.0',
]);

expect(res).toEqual({
releases: [
{ version: '1.0.0' },
{ version: '2.0.0' },
{ version: '3.0.0' },
{ version: '4.0.0' },
],
});
});

it('handles inconsistent data between cache and endpoint', async () => {
packageCacheMock.set(
'datasource-rubygems::metadata-cache:https://rubygems.org:foobar',
{
hash: '123',
createdAt: '2021-01-01',
data: {
releases: [
{ version: '1.0.0' },
{ version: '2.0.0' },
{ version: '3.0.0' },
],
},
}
);
const cache = new MetadataCache(new Http('test'));

httpMock
.scope('https://rubygems.org')
.get('/api/v1/versions/foobar.json')
.reply(200, [
{ number: '1.0.0', created_at: '2021-01-01' },
{ number: '2.0.0', created_at: '2022-01-01' },
{ number: '3.0.0', created_at: '2023-01-01' },
])
.get('/api/v1/gems/foobar.json')
.reply(200, {
name: 'foobar',
created_at: '2023-01-01',
changelog_uri: 'https://example.com/changelog',
source_code_uri: 'https://example.com/source',
homepage_uri: 'https://example.com',
});

const res = await cache.getRelease('https://rubygems.org', 'foobar', [
'1.0.0',
'2.0.0',
'3.0.0',
'4.0.0',
]);

expect(res).toEqual({
releases: [
{ version: '1.0.0' },
{ version: '2.0.0' },
{ version: '3.0.0' },
],
});
expect(packageCache.set).toHaveBeenCalledWith(
'datasource-rubygems',
'metadata-cache:https://rubygems.org:foobar',
{
createdAt: '2021-01-01',
data: {
releases: [
{ version: '1.0.0' },
{ version: '2.0.0' },
{ version: '3.0.0' },
],
},
hash: '123',
isFallback: true,
},
24 * 60
);
});

it('returns cached data', async () => {
const cache = new MetadataCache(new Http('test'));

99 changes: 74 additions & 25 deletions lib/modules/datasource/rubygems/metadata-cache.ts
@@ -1,3 +1,4 @@
import { logger } from '../../../logger';
import * as packageCache from '../../../util/cache/package';
import { toSha256 } from '../../../util/hash';
import type { Http } from '../../../util/http';
@@ -9,8 +10,26 @@ import { getV1Releases } from './common';
interface CacheRecord {
hash: string;
data: ReleaseResult;
isFallback?: true;
}

function hashVersions(versions: string[]): string {
return toSha256(versions.sort().join(','));
}

function hashReleases(releases: ReleaseResult): string {
return hashVersions(releases.releases.map((release) => release.version));
}

type CacheNotFoundError = { type: 'cache-not-found' };
type CacheStaleError = {
type: 'cache-stale';
cache: CacheRecord;
};
type CacheInvalidError = { type: 'cache-invalid' };
type CacheLoadError = CacheNotFoundError | CacheStaleError;
type CacheError = CacheNotFoundError | CacheStaleError | CacheInvalidError;

export class MetadataCache {
constructor(private readonly http: Http) {}

@@ -21,44 +40,74 @@ export class MetadataCache {
): Promise<ReleaseResult> {
const cacheNs = `datasource-rubygems`;
const cacheKey = `metadata-cache:${registryUrl}:${packageName}`;
const hash = toSha256(versions.join(''));
const versionsHash = hashVersions(versions);

const loadCache = (): AsyncResult<ReleaseResult, NonNullable<unknown>> =>
Result.wrapNullable(
const loadCache = (): AsyncResult<ReleaseResult, CacheLoadError> =>
Result.wrapNullable<CacheRecord, CacheLoadError, CacheLoadError>(
packageCache.get<CacheRecord>(cacheNs, cacheKey),
'cache-not-found' as const
{ type: 'cache-not-found' }
).transform((cache) => {
return hash === cache.hash
return versionsHash === cache.hash
? Result.ok(cache.data)
: Result.err('cache-outdated' as const);
: Result.err({ type: 'cache-stale', cache });
});

const saveCache = async (data: ReleaseResult): Promise<ReleaseResult> => {
const saveCache = async (
cache: CacheRecord,
ttlMinutes = 100 * 24 * 60,
ttlDelta = 10 * 24 * 60
): Promise<void> => {
const registryHostname = parseUrl(registryUrl)?.hostname;
if (registryHostname === 'rubygems.org') {
const newCache: CacheRecord = { hash, data };
const ttlMinutes = 100 * 24 * 60;
const ttlRandomDelta = Math.floor(Math.random() * 10 * 24 * 60);
await packageCache.set(
cacheNs,
cacheKey,
newCache,
ttlMinutes + ttlRandomDelta
);
const ttlRandomDelta = Math.floor(Math.random() * ttlDelta);
const ttl = ttlMinutes + ttlRandomDelta;
await packageCache.set(cacheNs, cacheKey, cache, ttl);
}

return data;
};

return await loadCache()
.catch(() =>
getV1Releases(this.http, registryUrl, packageName).transform(saveCache)
)
.catch(() =>
Result.ok({
releases: versions.map((version) => ({ version })),
})
.catch((err) =>
getV1Releases(this.http, registryUrl, packageName).transform(
async (
data: ReleaseResult
): Promise<Result<ReleaseResult, CacheError>> => {
const dataHash = hashReleases(data);
if (dataHash === versionsHash) {
await saveCache({
hash: dataHash,
data,
});
return Result.ok(data);
}

/**
* Return stale cache for 24 hours,
* if metadata is inconsistent with versions list.
*/
if (err.type === 'cache-stale') {
const staleCache = err.cache;
if (!staleCache.isFallback) {
await saveCache(
{ ...staleCache, isFallback: true },
24 * 60,
0
);
}
return Result.ok(staleCache.data);
}

return Result.err({ type: 'cache-invalid' });
}
)
)
.catch((err) => {
logger.debug(
{ err },
'Rubygems: error fetching rubygems data, falling back to versions-only result'
);
const releases = versions.map((version) => ({ version }));
return Result.ok({ releases } as ReleaseResult);
})
.unwrapOrThrow();
}
}
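
The control flow of `getRelease` after this change can be summarised as a pure decision function. The following is a minimal sketch, not the PR's code: `fingerprint`, `hashVersionsSketch`, `decide`, and the `Outcome` type are hypothetical names, the identity function stands in for `toSha256`, and the random TTL jitter and the `rubygems.org`-only save check are left out. It also copies the array before sorting, whereas `hashVersions` in the diff calls `versions.sort()` on its argument, which sorts the caller's array in place.

```typescript
interface CacheRecordSketch {
  hash: string;
  isFallback?: true;
}

type Outcome =
  | { use: 'cache' }
  | { use: 'metadata'; ttlMinutes: number }
  | { use: 'stale-cache'; resaveForMinutes: number | null }
  | { use: 'versions-only' };

// Hypothetical stand-in for `toSha256`; identity keeps the sketch dependency-free.
const fingerprint = (input: string): string => input;

// Order-insensitive hash; the copy avoids reordering the caller's array.
function hashVersionsSketch(versions: readonly string[]): string {
  return fingerprint([...versions].sort().join(','));
}

// `fetchedVersions` models the metadata response; null means the request failed.
// In the diff the request is only made after the cache lookup misses.
function decide(
  versions: readonly string[],
  cached: CacheRecordSketch | undefined,
  fetchedVersions: readonly string[] | null
): Outcome {
  const versionsHash = hashVersionsSketch(versions);

  // Cache hit: stored hash matches the current versions list.
  if (cached && cached.hash === versionsHash) {
    return { use: 'cache' };
  }

  // Metadata request failed: fall back to a versions-only result.
  if (fetchedVersions === null) {
    return { use: 'versions-only' };
  }

  // Metadata agrees with the versions list: save it long-term (base TTL only).
  if (hashVersionsSketch(fetchedVersions) === versionsHash) {
    return { use: 'metadata', ttlMinutes: 100 * 24 * 60 };
  }

  // Metadata disagrees: reuse the stale cache, re-saving it for 24 hours
  // unless it has already been marked as a fallback.
  if (cached) {
    return {
      use: 'stale-cache',
      resaveForMinutes: cached.isFallback ? null : 24 * 60,
    };
  }

  // No cache and inconsistent metadata ('cache-invalid' in the diff).
  return { use: 'versions-only' };
}
```

The `stale-cache` branch corresponds to the new test "handles inconsistent data between cache and endpoint", where a record with hash `'123'` is re-saved with `isFallback: true` and a TTL of `24 * 60` minutes.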
66 changes: 66 additions & 0 deletions lib/modules/datasource/rubygems/readme.md
@@ -0,0 +1,66 @@
# Rubygems datasource

Datasource query order depends on the registry.

## Querying `rubygems.org`

Rubygems rate limits are easy to hit, so we need to be careful with the queries.
This is implemented with a two-level cache:

- First, we query the `https://rubygems.org/versions` endpoint for the current versions of all packages.

Either full or delta sync is performed, depending on the cache state.

All the data of this layer is stored in memory as a mapping `packageName -> version[]`.

```mermaid
stateDiagram-v2
[*] --> Empty

state "Empty" as Empty
Empty --> FullSync: getPkgReleases()

state "Synced" as Synced
Synced --> DeltaSync

state "Unsupported" as Unsupported
Unsupported --> [*]

state "Full sync" as FullSync : GET /versions (~20Mb)
state full_sync_result <<choice>>
FullSync --> full_sync_result: Response
full_sync_result --> Synced: (1) Status 200
full_sync_result --> Unsupported: (2) Status 404
full_sync_result --> Empty: (3) Status other than 200 or 404\n Clear cache and throw ExternalHostError

state "Delta sync" as DeltaSync: GET /versions with "Range" header
state delta_sync_result <<choice>>
DeltaSync --> delta_sync_result: Successful response
delta_sync_result --> Synced: (1) Status other than 206\nFull data is received, extract and replace old cache\n (as if it is the full sync)
delta_sync_result --> FullSync: (2) The head of response doesn't match\n the tail of the previously fetched data
delta_sync_result --> Synced: (3) The head of response matches\n the tail of the previously fetched data

state delta_sync_error <<choice>>
DeltaSync --> delta_sync_error: Error response
delta_sync_error --> FullSync: (1) Status 416 should not happen\nbut moves to full sync
delta_sync_error --> Unsupported: (2) Status 404
delta_sync_error --> Empty: (3) Status other than 404 or 416
```

- Then, more data is obtained from `https://rubygems.org/api/v1/versions/<package>.json` and `https://rubygems.org/api/v1/gems/<package>.json`.

The cache key is formed from the `packageName` supplied by the previous layer. The list of versions is also hashed and stored alongside the data to ensure consistency, so these API endpoints are queried only when the key has expired or the list of versions has changed.

The data for this cache layer is persisted in the longer-term package cache.
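
The delta-sync transitions in the state diagram above can be sketched as a small pure function. This is only an illustration under stated assumptions: `nextStateAfterDeltaSync`, `DeltaResponse`, and the `headMatchesTail` flag are hypothetical names, and "successful response" is assumed to mean any 2xx status.

```typescript
type NextState = 'Synced' | 'FullSync' | 'Unsupported' | 'Empty';

interface DeltaResponse {
  status: number;
  // Only meaningful for a 206 response: does the head of the response body
  // match the tail of the previously fetched data?
  headMatchesTail?: boolean;
}

function nextStateAfterDeltaSync(res: DeltaResponse): NextState {
  // Assumption: "successful" means a 2xx status code.
  const successful = res.status >= 200 && res.status < 300;

  if (successful) {
    // (1) Not a partial response: full data was received, replace the cache.
    if (res.status !== 206) {
      return 'Synced';
    }
    // (2) / (3) Partial response: accept it only if it lines up with old data.
    return res.headMatchesTail ? 'Synced' : 'FullSync';
  }

  // Error responses.
  if (res.status === 416) {
    return 'FullSync'; // should not happen, but recover with a full sync
  }
  if (res.status === 404) {
    return 'Unsupported';
  }
  return 'Empty';
}
```

For example, `nextStateAfterDeltaSync({ status: 206, headMatchesTail: false })` yields `'FullSync'`, which matches transition (2) of the successful-response branch.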

## Querying `rubygems.pkg.github.com` or `gitlab.com`

These registries are queried using the obsolete API:

- `/api/v1/dependencies`

## Other registries

- Fetch from `/api/v1/versions/<package>.json`
- Fall back to `/info/<package>`, if the above fails
- Fall back to the obsolete `/api/v1/dependencies`, if the above fails
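
The per-registry query order described in this readme can be written down as a lookup. This is a minimal sketch: `endpointsFor` is a hypothetical helper, and matching on the exact hostname is an assumption, since the readme does not say how registries are recognised.

```typescript
// Returns endpoint paths in the order this readme describes them.
function endpointsFor(registryHostname: string, packageName: string): string[] {
  const versionsApi = `/api/v1/versions/${packageName}.json`;
  const dependenciesApi = '/api/v1/dependencies';

  switch (registryHostname) {
    case 'rubygems.org':
      // Two cache layers rather than fallbacks: the global versions list
      // first, then the per-package metadata endpoints.
      return ['/versions', versionsApi, `/api/v1/gems/${packageName}.json`];
    case 'rubygems.pkg.github.com':
    case 'gitlab.com':
      // Only the obsolete dependencies API is used.
      return [dependenciesApi];
    default:
      // Each entry is tried only if the previous one fails.
      return [versionsApi, `/info/${packageName}`, dependenciesApi];
  }
}
```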