Feature: cache archive entry checksums #1004

Merged
merged 18 commits into from
Mar 21, 2024
2 changes: 1 addition & 1 deletion docs/alternatives.md
@@ -21,7 +21,7 @@ There are a few different popular ROM managers that have similar features:
| Archives: creation formats | ❌ `.zip` only by design ([writing archives docs](output/writing-archives.md)) | ✅ `.zip`, `.7z`, `.rar` | ⚠️ `.zip` (TorrentZip), `.7z` | ⚠️ `.zip`, `.7z` |
| ROMs: DAT matching strategies | ✅ CRC32+size, MD5, SHA1 | ✅ CRC32+size, MD5, SHA1 | ✅ CRC32+size, MD5, SHA1 | ❓ |
| ROMs: CHD scanning | ❌ | ⚠️ via chdman | ✅ v1-5 natively | ⚠️ v1-4 natively |
| ROMs: scan/checksum caching | ❌ by design | ❌ | ✅ | ✅ |
| ROMs: scan/checksum caching | | ❌ | ✅ | ✅ |
| ROMs: header parsing | ✅ | ✅ | ✅ | ⚠️ via plugins |
| ROMs: header removal | ✅ automatic and forced | ❌ | ❌ | ❌ |
| ROMs: supported merge types | ✅ full non-merged, non-merged, split, merged | ✅ full non-merged, non-merged, split, merged | ⚠️ full non-merged, split, merged | ⚠️ full non-merged, split, merged |
6 changes: 1 addition & 5 deletions src/constants.ts
@@ -35,6 +35,7 @@ const ROOT_DIR = path.dirname(PACKAGE_JSON_PATH);

const GLOBAL_TEMP_DIR = fsPoly.mkdtempSync(path.join(os.tmpdir(), COMMAND_NAME));
process.once('beforeExit', async () => {
// WARN: Jest won't call this: https://github.com/jestjs/jest/issues/10927
await fsPoly.rm(GLOBAL_TEMP_DIR, {
force: true,
recursive: true,
@@ -102,11 +103,6 @@ export default class Constants {
*/
static readonly OUTPUT_CLEANER_BATCH_SIZE = 100;

/**
* Number of file checksums to cache in memory at once.
*/
static readonly FILE_CHECKSUM_CACHE_SIZE = 128;

/**
* Max {@link fs} highWaterMark chunk size to read and write at a time.
*/
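The `beforeExit` hook in `constants.ts` deletes `GLOBAL_TEMP_DIR` once the event loop has no more work. A standalone sketch of the same pattern, assuming Node's `fs.promises.rm` in place of `fsPoly.rm` and a placeholder `example-` prefix instead of `COMMAND_NAME`:

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// One temp directory per process; 'example-' is a placeholder for COMMAND_NAME.
const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'example-'));

// 'beforeExit' fires when the event loop drains, but not on process.exit() or
// an uncaught exception. The async rm() schedules new work, which would make
// 'beforeExit' fire again after it finishes; once() limits cleanup to one run.
process.once('beforeExit', async () => {
  await fs.promises.rm(tempDir, { force: true, recursive: true });
});
```

As the `WARN` comment in the diff points out, Jest never emits this event, so under Jest the directory may be left behind.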
7 changes: 5 additions & 2 deletions src/modules/datScanner.ts
@@ -18,7 +18,6 @@ import MameDAT from '../types/dats/mame/mameDat.js';
import ROM from '../types/dats/rom.js';
import File from '../types/files/file.js';
import { ChecksumBitmask } from '../types/files/fileChecksums.js';
import FileFactory from '../types/files/fileFactory.js';
import Options from '../types/options.js';
import Scanner from './scanner.js';

@@ -91,7 +90,11 @@ export default class DATScanner extends Scanner {
this.progressBar.logTrace(`${datFile.toString()}: downloading`);
const downloadedDatFile = await datFile.downloadToTempPath('dat');
this.progressBar.logTrace(`${datFile.toString()}: downloaded to '${downloadedDatFile.toString()}'`);
return await FileFactory.filesFrom(downloadedDatFile.getFilePath(), ChecksumBitmask.NONE);
return await this.getFilesFromPaths(
[downloadedDatFile.getFilePath()],
this.options.getReaderThreads(),
ChecksumBitmask.NONE,
);
} catch (error) {
this.progressBar.logError(`${datFile.toString()}: failed to download: ${error}`);
return [];
8 changes: 6 additions & 2 deletions src/polyfill/fsPoly.ts
@@ -98,7 +98,7 @@ export default class FsPoly {
}

static async inode(pathLike: PathLike): Promise<number> {
return (await util.promisify(fs.stat)(pathLike)).ino;
return (await this.stat(pathLike)).ino;
}

static async isDirectory(pathLike: string): Promise<boolean> {
@@ -336,7 +336,7 @@ export default class FsPoly {
*/
static async size(pathLike: PathLike): Promise<number> {
try {
return (await util.promisify(fs.stat)(pathLike)).size;
return (await this.stat(pathLike)).size;
} catch {
return 0;
}
@@ -369,6 +369,10 @@ export default class FsPoly {
return path.relative(path.dirname(realLink), realTarget);
}

static async stat(pathLike: PathLike): Promise<fs.Stats> {
return util.promisify(fs.stat)(pathLike);
}

static async touch(filePath: string): Promise<void> {
const dirname = path.dirname(filePath);
if (!await this.exists(dirname)) {
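The `fsPoly.ts` change routes `inode()` and `size()` through one new `stat()` helper instead of repeating `util.promisify(fs.stat)`. A standalone sketch of the same shape, using `fs.promises.stat` (equivalent for this purpose) and free functions rather than the `FsPoly` class:

```typescript
import * as fs from 'node:fs';

// Single entry point for stat calls; callers pick the field they need.
async function stat(pathLike: fs.PathLike): Promise<fs.Stats> {
  return fs.promises.stat(pathLike);
}

async function inode(pathLike: fs.PathLike): Promise<number> {
  return (await stat(pathLike)).ino;
}

// Like FsPoly.size(): report 0 rather than throwing if the path can't be stat'd.
async function size(pathLike: fs.PathLike): Promise<number> {
  try {
    return (await stat(pathLike)).size;
  } catch {
    return 0;
  }
}
```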
90 changes: 76 additions & 14 deletions src/types/cache.ts
@@ -6,19 +6,23 @@ import * as zlib from 'node:zlib';
import { Mutex } from 'async-mutex';

import FsPoly from '../polyfill/fsPoly.js';
import Timer from '../timer.js';

interface CacheData {
data: string,
}

export interface CacheProps {
filePath?: string,
fileFlushMillis?: number,
saveOnExit?: boolean,
maxSize?: number,
}

/**
* A cache of a fixed size that ejects the oldest inserted key.
*/
export default class Cache<V> implements CacheProps {
export default class Cache<V> {
private static readonly BUFFER_ENCODING: BufferEncoding = 'binary';

private keyOrder: Set<string> = new Set();
@@ -29,11 +33,23 @@ export default class Cache<V> implements CacheProps {

private readonly keyMutexesMutex = new Mutex();

private saveToFileTimeout?: NodeJS.Timeout;
private hasChanged: boolean = false;

private saveToFileTimeout?: Timer;

readonly filePath?: string;

readonly fileFlushMillis?: number;

readonly maxSize?: number;

constructor(props?: CacheProps) {
this.filePath = props?.filePath;
this.fileFlushMillis = props?.fileFlushMillis;
if (props?.saveOnExit) {
// WARN: Jest won't call this: https://github.com/jestjs/jest/issues/10927
process.once('beforeExit', this.save);
}
this.maxSize = props?.maxSize;
}

@@ -93,16 +109,38 @@ export default class Cache<V> implements CacheProps {
this.keyOrder.add(key);
}
this.keyValues.set(key, val);
this.saveWithTimeout();

// Evict old values (FIFO)
if (this.maxSize !== undefined && this.keyValues.size > this.maxSize) {
const staleKey = this.keyOrder.keys().next().value;
this.keyOrder.delete(staleKey);
this.keyValues.delete(staleKey);
this.keyMutexes.delete(staleKey);
this.deleteUnsafe(staleKey);
}
}

/**
* Delete a key in the cache.
*/
public async delete(key: string | RegExp): Promise<void> {
let keys: string[];
if (key instanceof RegExp) {
keys = [...this.keys().keys()].filter((k) => k.match(key));
} else {
keys = [key];
}

await Promise.all(keys.map(async (k) => {
await this.lockKey(k, () => this.deleteUnsafe(k));
}));
}

private deleteUnsafe(key: string): void {
this.keyOrder.delete(key);
this.keyValues.delete(key);
this.keyMutexes.delete(key);
this.saveWithTimeout();
}

private async lockKey<R>(key: string, runnable: () => (R | Promise<R>)): Promise<R> {
// Get a mutex for `key`
const keyMutex = await this.keyMutexesMutex.runExclusive(() => {
@@ -117,11 +155,16 @@ export default class Cache<V> implements CacheProps {
}

/**
* Load a cache file from disk.
* Load the cache from a file.
*/
public async load(filePath: string): Promise<void> {
public async load(): Promise<Cache<V>> {
if (this.filePath === undefined || !await FsPoly.exists(this.filePath)) {
// Cache doesn't exist, so there is nothing to load
return this;
}

const cacheData = JSON.parse(
await util.promisify(fs.readFile)(filePath, { encoding: Cache.BUFFER_ENCODING }),
await util.promisify(fs.readFile)(this.filePath, { encoding: Cache.BUFFER_ENCODING }),
) as CacheData;
const compressed = Buffer.from(cacheData.data, Cache.BUFFER_ENCODING);
const decompressed = await util.promisify(zlib.inflate)(compressed);
@@ -131,18 +174,36 @@ export default class Cache<V> implements CacheProps {
if (this.maxSize !== undefined) {
this.keyOrder = new Set(Object.keys(keyValuesObject));
}

return this;
}

private saveWithTimeout(): void {
this.hasChanged = true;
if (this.filePath === undefined
|| this.fileFlushMillis === undefined
|| this.saveToFileTimeout !== undefined
) {
return;
}

this.saveToFileTimeout = Timer.setTimeout(async () => this.save(), this.fileFlushMillis);
}

/**
* Save this cache to a file on disk.
* Save the cache to a file.
*/
public async save(filePath: string): Promise<void> {
public async save(): Promise<void> {
// Clear any existing timeout
if (this.saveToFileTimeout !== undefined) {
clearTimeout(this.saveToFileTimeout);
this.saveToFileTimeout.cancel();
this.saveToFileTimeout = undefined;
}

if (this.filePath === undefined || !this.hasChanged) {
return;
}

const keyValuesObject = Object.fromEntries(this.keyValues);
const decompressed = JSON.stringify(keyValuesObject);
const compressed = await util.promisify(zlib.deflate)(decompressed);
@@ -151,18 +212,19 @@ export default class Cache<V> implements CacheProps {
} satisfies CacheData;

// Ensure the directory exists
const dirPath = path.dirname(filePath);
const dirPath = path.dirname(this.filePath);
if (!await FsPoly.exists(dirPath)) {
await FsPoly.mkdir(dirPath, { recursive: true });
}

// Write to a temp file first, then overwrite the old cache file
const tempFile = await FsPoly.mktemp(filePath);
const tempFile = await FsPoly.mktemp(this.filePath);
await util.promisify(fs.writeFile)(
tempFile,
JSON.stringify(cacheData),
{ encoding: Cache.BUFFER_ENCODING },
);
await FsPoly.mv(tempFile, filePath);
await FsPoly.mv(tempFile, this.filePath);
this.hasChanged = false;
}
}
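The reworked `Cache` now owns its `filePath`, marks itself dirty on every write, debounces flushes by `fileFlushMillis`, and saves through a temp file that is then moved over the old cache file. A minimal sketch of that pattern, assuming plain `setTimeout` instead of igir's `Timer` wrapper and storing raw deflated bytes rather than the `{ data }` JSON envelope; `PersistedCache` is a hypothetical name, not the real class:

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';
import * as zlib from 'node:zlib';

class PersistedCache<V> {
  private readonly keyValues = new Map<string, V>();

  private readonly filePath: string;

  private readonly fileFlushMillis?: number;

  private hasChanged = false;

  private saveTimeout?: ReturnType<typeof setTimeout>;

  private tempFileCounter = 0;

  constructor(filePath: string, fileFlushMillis?: number) {
    this.filePath = filePath;
    this.fileFlushMillis = fileFlushMillis;
  }

  get(key: string): V | undefined {
    return this.keyValues.get(key);
  }

  set(key: string, val: V): void {
    this.keyValues.set(key, val);
    this.saveWithTimeout();
  }

  // Mark the cache dirty and schedule at most one pending flush.
  private saveWithTimeout(): void {
    this.hasChanged = true;
    if (this.fileFlushMillis === undefined || this.saveTimeout !== undefined) {
      return;
    }
    this.saveTimeout = setTimeout(() => {
      this.save().catch((error) => console.error(`cache save failed: ${error}`));
    }, this.fileFlushMillis);
  }

  async save(): Promise<void> {
    if (this.saveTimeout !== undefined) {
      clearTimeout(this.saveTimeout);
      this.saveTimeout = undefined;
    }
    if (!this.hasChanged) {
      return; // nothing new to persist
    }

    // Clear the flag before the snapshot so writes that land mid-save re-mark it
    this.hasChanged = false;
    try {
      const compressed = zlib.deflateSync(JSON.stringify(Object.fromEntries(this.keyValues)));
      await fs.promises.mkdir(path.dirname(this.filePath), { recursive: true });

      // Write a uniquely named temp file, then rename it over the real cache file
      const tempFile = `${this.filePath}.${process.pid}.${this.tempFileCounter++}.tmp`;
      await fs.promises.writeFile(tempFile, compressed);
      await fs.promises.rename(tempFile, this.filePath);
    } catch (error) {
      this.hasChanged = true; // keep the data marked as unsaved
      throw error;
    }
  }

  async load(): Promise<this> {
    if (!fs.existsSync(this.filePath)) {
      return this; // nothing to load yet
    }
    const compressed = await fs.promises.readFile(this.filePath);
    const parsed = JSON.parse(zlib.inflateSync(compressed).toString()) as Record<string, V>;
    Object.entries(parsed).forEach(([key, val]) => this.keyValues.set(key, val));
    return this;
  }
}
```

One design choice in this sketch: `hasChanged` is reset before the map is serialized rather than after the rename, so a `set()` that arrives while the write is in flight still leaves the cache dirty and schedules another flush.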
41 changes: 38 additions & 3 deletions src/types/files/archives/archiveEntry.ts
@@ -1,6 +1,10 @@
import path from 'node:path';
import { Readable } from 'node:stream';

import {
Exclude, Expose, instanceToPlain, plainToClassFromExist,
} from 'class-transformer';

import Constants from '../../../constants.js';
import fsPoly from '../../../polyfill/fsPoly.js';
import Patch from '../../patches/patch.js';
@@ -14,9 +18,11 @@ export interface ArchiveEntryProps<A extends Archive> extends Omit<FileProps, 'f
readonly entryPath: string;
}

@Exclude()
export default class ArchiveEntry<A extends Archive> extends File implements ArchiveEntryProps<A> {
readonly archive: A;

@Expose()
readonly entryPath: string;

protected constructor(archiveEntryProps: ArchiveEntryProps<A>) {
@@ -34,11 +40,17 @@ export default class ArchiveEntry<A extends Archive> extends File implements Arc
): Promise<ArchiveEntry<A>> {
let finalSize = archiveEntryProps.size;
let finalCrcWithHeader = archiveEntryProps.crc32;
let finalCrcWithoutHeader = archiveEntryProps.crc32WithoutHeader;
let finalCrcWithoutHeader = archiveEntryProps.fileHeader
? archiveEntryProps.crc32WithoutHeader
: archiveEntryProps.crc32;
let finalMd5WithHeader = archiveEntryProps.md5;
let finalMd5WithoutHeader = archiveEntryProps.md5WithoutHeader;
let finalMd5WithoutHeader = archiveEntryProps.fileHeader
? archiveEntryProps.md5WithoutHeader
: archiveEntryProps.md5;
let finalSha1WithHeader = archiveEntryProps.sha1;
let finalSha1WithoutHeader = archiveEntryProps.sha1WithoutHeader;
let finalSha1WithoutHeader = archiveEntryProps.fileHeader
? archiveEntryProps.sha1WithoutHeader
: archiveEntryProps.sha1;
let finalSymlinkSource = archiveEntryProps.symlinkSource;

if (await fsPoly.exists(archiveEntryProps.archive.getFilePath())) {
@@ -101,6 +113,29 @@ export default class ArchiveEntry<A extends Archive> extends File implements Arc
});
}

static async entryOfObject<A extends Archive>(
archive: A,
obj: ArchiveEntryProps<A>,
): Promise<ArchiveEntry<A>> {
const deserialized = plainToClassFromExist(
new ArchiveEntry({ archive, entryPath: '' }),
obj,
{
enableImplicitConversion: true,
excludeExtraneousValues: true,
},
);
return this.entryOf({ ...deserialized, archive });
}

toObject(): object {
return instanceToPlain(this, {
exposeUnsetFields: false,
});
}

// Property getters

getArchive(): A {
return this.archive;
}
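In `entryOf()`, the `...WithoutHeader` checksums now fall back to the full checksums whenever `fileHeader` is unset: with no header to strip, the headerless bytes are the whole entry. A minimal sketch of that rule as a standalone function; `ChecksumProps` and `headerlessChecksums` are hypothetical names, and only the checksum fields from the diff are kept:

```typescript
// Hypothetical, trimmed-down stand-in for ArchiveEntryProps.
interface ChecksumProps {
  fileHeader?: unknown;
  crc32?: string;
  crc32WithoutHeader?: string;
  md5?: string;
  md5WithoutHeader?: string;
  sha1?: string;
  sha1WithoutHeader?: string;
}

interface HeaderlessChecksums {
  crc32?: string;
  md5?: string;
  sha1?: string;
}

// With a header, use the checksums computed after stripping it; without one,
// nothing is stripped, so the whole-entry checksums already are "headerless".
function headerlessChecksums(props: ChecksumProps): HeaderlessChecksums {
  if (props.fileHeader) {
    return {
      crc32: props.crc32WithoutHeader,
      md5: props.md5WithoutHeader,
      sha1: props.sha1WithoutHeader,
    };
  }
  return { crc32: props.crc32, md5: props.md5, sha1: props.sha1 };
}
```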
5 changes: 3 additions & 2 deletions src/types/files/archives/rar.ts
@@ -3,9 +3,10 @@ import path from 'node:path';
import async, { AsyncResultCallback } from 'async';
import { Mutex } from 'async-mutex';
import unrar from 'node-unrar-js';
import { Memoize } from 'typescript-memoize';

import Constants from '../../../constants.js';
import FileCache from '../fileCache.js';
import { ChecksumBitmask } from '../fileChecksums.js';
import Archive from './archive.js';
import ArchiveEntry from './archiveEntry.js';

@@ -19,7 +20,7 @@ export default class Rar extends Archive {
return new Rar(filePath);
}

@Memoize()
@FileCache.CacheArchiveEntries({ skipChecksumBitmask: ChecksumBitmask.CRC32 })
async getArchiveEntries(checksumBitmask: number): Promise<ArchiveEntry<Rar>[]> {
const rar = await unrar.createExtractorFromFile({
filepath: this.getFilePath(),
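Only `Rar` passes `skipChecksumBitmask: ChecksumBitmask.CRC32` to the cache decorator. One plausible reading, not confirmed by this diff, is that the cache can be bypassed when every requested checksum is one the archive listing already reports (RAR file headers typically record each entry's CRC32). A sketch of that subset check; the flag values and `canSkipCache` are hypothetical:

```typescript
// Hypothetical flag values; the real ChecksumBitmask in fileChecksums.ts may differ.
const ChecksumBitmask = {
  NONE: 0,
  CRC32: 1 << 0,
  MD5: 1 << 1,
  SHA1: 1 << 2,
} as const;

// True when every requested bit is also in `skipChecksumBitmask`, i.e. the
// request is a subset of what the archive listing provides for free.
function canSkipCache(requested: number, skipChecksumBitmask: number): boolean {
  return (requested & ~skipChecksumBitmask) === 0;
}
```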
4 changes: 2 additions & 2 deletions src/types/files/archives/sevenZip.ts
@@ -3,10 +3,10 @@ import path from 'node:path';
import _7z, { Result } from '7zip-min';
import async, { AsyncResultCallback } from 'async';
import { Mutex } from 'async-mutex';
import { Memoize } from 'typescript-memoize';

import Constants from '../../../constants.js';
import fsPoly from '../../../polyfill/fsPoly.js';
import FileCache from '../fileCache.js';
import Archive from './archive.js';
import ArchiveEntry from './archiveEntry.js';

@@ -39,7 +39,7 @@ export default class SevenZip extends Archive {
return new SevenZip(filePath);
}

@Memoize()
@FileCache.CacheArchiveEntries()
async getArchiveEntries(checksumBitmask: number): Promise<ArchiveEntry<SevenZip>[]> {
/**
* WARN(cemmer): even with the above mutex, {@link _7z.list} will still sometimes return no
4 changes: 2 additions & 2 deletions src/types/files/archives/tar.ts
@@ -2,10 +2,10 @@ import fs from 'node:fs';
import path from 'node:path';

import tar from 'tar';
import { Memoize } from 'typescript-memoize';

import Constants from '../../../constants.js';
import FsPoly from '../../../polyfill/fsPoly.js';
import FileCache from '../fileCache.js';
import FileChecksums from '../fileChecksums.js';
import Archive from './archive.js';
import ArchiveEntry from './archiveEntry.js';
@@ -21,7 +21,7 @@ export default class Tar extends Archive {
return new Tar(filePath);
}

@Memoize()
@FileCache.CacheArchiveEntries()
async getArchiveEntries(checksumBitmask: number): Promise<ArchiveEntry<Tar>[]> {
const archiveEntryPromises: Promise<ArchiveEntry<Tar>>[] = [];

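`Rar`, `SevenZip`, and `Tar` all swap `@Memoize()`, which remembers results only in memory per instance, for `@FileCache.CacheArchiveEntries()`. That decorator's implementation (`fileCache.ts`) is not part of this diff. As an illustration only, here is a hypothetical higher-order function (not a decorator, and not igir's code) that caches an archive's entry list and invalidates it when the file's size or modification time changes:

```typescript
import * as fs from 'node:fs';

// Hypothetical signature for "list the entries of the archive at filePath".
type Lister<T> = (filePath: string, checksumBitmask: number) => Promise<T[]>;

function cacheByFileStats<T>(list: Lister<T>): Lister<T> {
  const cache = new Map<string, T[]>();
  return async (filePath, checksumBitmask) => {
    // A changed size or mtime produces a new key, so stale results are not reused
    const stats = await fs.promises.stat(filePath);
    const key = `${filePath}|${stats.size}|${stats.mtimeMs}|${checksumBitmask}`;
    const cached = cache.get(key);
    if (cached !== undefined) {
      return cached;
    }
    const entries = await list(filePath, checksumBitmask);
    cache.set(key, entries);
    return entries;
  };
}
```

Keying on size and `mtimeMs` is a cheap staleness check that can miss a same-size rewrite within the timestamp resolution, and this sketch never evicts old keys; a bounded store such as `Cache` with `maxSize` would address the latter.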