Skip to content

Commit

Permalink
feat(target-gh): Check asset MD5 hash by downloading the asset (#328)
Browse files Browse the repository at this point in the history
The GitHub API doesn't provide any endpoints to get the hash of an asset.
GitHub's response contains an ETag header with the MD5 hash of the asset.  In
certain cases, the ETag header is not present. This means we don't have the
hash of the asset, and thus, we cannot verify the asset correctness. To verify
it, we download the file and calculate the hash locally.

Files are downloaded into memory, and their size is not checked. This has a
risk of downloading too big files, which is something we accept at the moment.

The GitHub SDK throws exceptions when server responses are HTTP errors. These
interactions are wrapped and more detailed errors are thrown.
  • Loading branch information
iker-barriocanal authored Nov 18, 2021
1 parent 78d2ca9 commit d14a988
Showing 1 changed file with 58 additions and 29 deletions.
87 changes: 58 additions & 29 deletions src/targets/github.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ import { Octokit, RestEndpointMethodTypes } from '@octokit/rest';
import { RequestError } from '@octokit/request-error';
import { readFileSync, promises, statSync } from 'fs';
import { basename } from 'path';
import { createHash } from 'crypto';
import { BinaryLike, createHash } from 'crypto';

import { getConfiguration } from '../config';
import {
Expand Down Expand Up @@ -378,12 +378,33 @@ export class GithubTarget extends BaseTarget {
);
}

// XXX: This is a bit hacky as we rely on two things:
// 1. GitHub issuing a redirect to S3, where they store the artifacts,
// or at least pass those request headers unmodified to us
// 2. AWS S3 using the MD5 hash of the file for its ETag cache header
// when we issue a HEAD request.
const response = await this.github.request(`HEAD ${url}`, {
const remoteChecksum = await this.checksumFromUrl(url);
const localChecksum = this.checksumFromData(file);
if (localChecksum !== remoteChecksum) {
throw new Error(
`Uploaded asset checksum does not match local asset checksum for "${name} (${localChecksum} != ${remoteChecksum})`
);
}
uploadSpinner.succeed(`Uploaded asset "${name}".`);
return url;
} catch (e) {
uploadSpinner.fail(`Cannot upload asset "${name}".`);

throw e;
}
}

private async checksumFromUrl(url: string): Promise<string> {
// XXX: This is a bit hacky as we rely on various things:
// 1. GitHub issuing a redirect to AWS S3.
// 2. S3 using the MD5 hash of the file for its ETag cache header.
// 3. The file being small enough to fit in memory.
//
// Note that if assets are large (5GB) assumption 2 is not correct. See
// https://github.com/getsentry/craft/issues/322#issuecomment-964303174
let response;
try {
response = await this.github.request(`HEAD ${url}`, {
headers: {
// WARNING: You **MUST** pass this accept header otherwise you'll
// get a useless JSON API response back, instead of getting
Expand All @@ -395,32 +416,40 @@ export class GithubTarget extends BaseTarget {
Accept: DEFAULT_CONTENT_TYPE,
},
});
} catch (e) {
throw new Error(
`Cannot get asset on GitHub. Status: ${(e as any).status}\n` + e
);
}

const etag = response.headers['etag'];
if (etag && etag.length > 0) {
// ETag header comes in quotes for some reason so strip those
const remoteChecksum = response.headers['etag']?.slice(1, -1);
if (remoteChecksum) {
// XXX: This local checksum calculation only holds for simple cases:
// https://github.com/getsentry/craft/issues/322#issuecomment-964303174
const localChecksum = createHash('md5').update(file).digest('hex');
if (localChecksum !== remoteChecksum) {
logger.trace(`Checksum mismatch for "${name}"`, response.headers);
throw new Error(
`Uploaded asset MD5 checksum does not match local asset checksum for "${name} (${localChecksum} != ${remoteChecksum})`
);
}
uploadSpinner.succeed(`Uploaded asset "${name}".`);
} else {
logger.warn(`Response headers:`, response.headers);
uploadSpinner.succeed(
`Uploaded asset "${name}, but failed to find a remote checksum. Cannot verify uploaded file is as expected.`
);
}
return url;
} catch (e) {
uploadSpinner.fail(`Cannot upload asset "${name}".`);
return etag.slice(1, -1);
}

throw e;
return await this.md5FromUrl(url);
}

private async md5FromUrl(url: string): Promise<string> {
this.logger.debug('Downloading asset from GitHub to check MD5 hash: ', url);
let response;
try {
response = await this.github.request(`GET ${url}`, {
headers: {
Accept: DEFAULT_CONTENT_TYPE,
},
});
} catch (e) {
throw new Error(
`Cannot download asset from GitHub. Status: ${(e as any).status}\n` + e
);
}
return this.checksumFromData(Buffer.from(response.data));
}

private checksumFromData(data: BinaryLike): string {
return createHash('md5').update(data).digest('hex');
}

private async handleGitHubUpload(
Expand Down

0 comments on commit d14a988

Please sign in to comment.