Test load balancing using jest's --shard option #65
Conversation
The example test sequencer here https://jestjs.io/docs/configuration#testsequencer-string shards the tests based on the test file path, sorted alphabetically and sliced into equal chunks.
The job is now configured with `parallel: 2`:

```yaml
parallel: 2
script:
  - >
    nix-shell --run '
    npm run build --verbose;
    npm test -- --ci --coverage --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL;
    '
```
This is how the jobs look in the pipeline: *(screenshot not captured)*. And here is the test part of the output for each shard: *(logs not captured)*.
It's only outputting the coverage for the tests that get run in the shard (which is better than the test-pipelines script, which doesn't do this), although it may be difficult to combine the coverage reports.
Some related issues to work through here: *(links not captured)*
Note also that we want to continue using the child pipelines, so that we at least have a way to dynamically compute the shard count when creating them.
This is the code for jest's default `TestSequencer`:

```js
/**
* The TestSequencer will ultimately decide which tests should run first.
* It is responsible for storing and reading from a local cache
* map that stores context information for a given test, such as how long it
* took to run during the last run and if it has failed or not.
* Such information is used on:
* TestSequencer.sort(tests: Array<Test>)
* to sort the order of the provided tests.
*
* After the results are collected,
* TestSequencer.cacheResults(tests: Array<Test>, results: AggregatedResult)
* is called to store/update this information on the cache map.
*/
class TestSequencer {
_cache = new Map();
_getCachePath(testContext) {
const {config} = testContext;
const HasteMapClass = _jestHasteMap().default.getStatic(config);
return HasteMapClass.getCacheFilePath(
config.cacheDirectory,
`perf-cache-${config.id}`
);
}
_getCache(test) {
const {context} = test;
if (!this._cache.has(context) && context.config.cache) {
const cachePath = this._getCachePath(context);
if (fs().existsSync(cachePath)) {
try {
this._cache.set(
context,
JSON.parse(fs().readFileSync(cachePath, 'utf8'))
);
} catch {}
}
}
let cache = this._cache.get(context);
if (!cache) {
cache = {};
this._cache.set(context, cache);
}
return cache;
}
/**
* Select tests for shard requested via --shard=shardIndex/shardCount
* Sharding is applied before sorting
*
* @param tests All tests
* @param options shardIndex and shardCount to select
*
* @example
* ```typescript
* class CustomSequencer extends Sequencer {
* shard(tests, { shardIndex, shardCount }) {
* const shardSize = Math.ceil(tests.length / shardCount);
* const shardStart = shardSize * (shardIndex - 1);
* const shardEnd = shardSize * shardIndex;
* return [...tests]
* .sort((a, b) => (a.path > b.path ? 1 : -1))
* .slice(shardStart, shardEnd);
* }
* }
* ```
*/
shard(tests, options) {
const shardSize = Math.ceil(tests.length / options.shardCount);
const shardStart = shardSize * (options.shardIndex - 1);
const shardEnd = shardSize * options.shardIndex;
return tests
.map(test => {
const relativeTestPath = path().posix.relative(
(0, _slash().default)(test.context.config.rootDir),
(0, _slash().default)(test.path)
);
return {
hash: crypto()
.createHash('sha1')
.update(relativeTestPath)
.digest('hex'),
test
};
})
.sort((a, b) => (a.hash < b.hash ? -1 : a.hash > b.hash ? 1 : 0))
.slice(shardStart, shardEnd)
.map(result => result.test);
}
/**
* Sort test to determine order of execution
* Sorting is applied after sharding
* @param tests
*
* ```typescript
* class CustomSequencer extends Sequencer {
* sort(tests) {
* const copyTests = Array.from(tests);
* return [...tests].sort((a, b) => (a.path > b.path ? 1 : -1));
* }
* }
* ```
*/
sort(tests) {
/**
* Sorting tests is very important because it has a great impact on the
* user-perceived responsiveness and speed of the test run.
*
* If such information is on cache, tests are sorted based on:
* -> Has it failed during the last run ?
* Since it's important to provide the most expected feedback as quickly
* as possible.
* -> How long it took to run ?
* Because running long tests first is an effort to minimize worker idle
* time at the end of a long test run.
* And if that information is not available they are sorted based on file size
* since big test files usually take longer to complete.
*
* Note that a possible improvement would be to analyse other information
* from the file other than its size.
*
*/
const stats = {};
const fileSize = ({path, context: {hasteFS}}) =>
stats[path] || (stats[path] = hasteFS.getSize(path) || 0);
const hasFailed = (cache, test) =>
cache[test.path] && cache[test.path][0] === FAIL;
const time = (cache, test) => cache[test.path] && cache[test.path][1];
tests.forEach(test => (test.duration = time(this._getCache(test), test)));
return tests.sort((testA, testB) => {
const cacheA = this._getCache(testA);
const cacheB = this._getCache(testB);
const failedA = hasFailed(cacheA, testA);
const failedB = hasFailed(cacheB, testB);
const hasTimeA = testA.duration != null;
if (failedA !== failedB) {
return failedA ? -1 : 1;
} else if (hasTimeA != (testB.duration != null)) {
// If only one of two tests has timing information, run it last
return hasTimeA ? 1 : -1;
} else if (testA.duration != null && testB.duration != null) {
return testA.duration < testB.duration ? 1 : -1;
} else {
return fileSize(testA) < fileSize(testB) ? 1 : -1;
}
});
}
allFailedTests(tests) {
const hasFailed = (cache, test) => {
var _cache$test$path;
return (
((_cache$test$path = cache[test.path]) === null ||
_cache$test$path === void 0
? void 0
: _cache$test$path[0]) === FAIL
);
};
return this.sort(
tests.filter(test => hasFailed(this._getCache(test), test))
);
}
cacheResults(tests, results) {
const map = Object.create(null);
tests.forEach(test => (map[test.path] = test));
results.testResults.forEach(testResult => {
if (testResult && map[testResult.testFilePath] && !testResult.skipped) {
const cache = this._getCache(map[testResult.testFilePath]);
const perf = testResult.perfStats;
cache[testResult.testFilePath] = [
testResult.numFailingTests ? FAIL : SUCCESS,
perf.runtime || 0
];
}
});
this._cache.forEach((cache, context) =>
fs().writeFileSync(this._getCachePath(context), JSON.stringify(cache))
);
}
}
```

So it does cache results - both whether the test passed or failed and the time it took the test to run. However, this information is only used to sort the tests, not to shard them, and sharding is done before sorting (so the sorting is within individual shards, or runners in our case). This makes sense though. The sharding has to be completely deterministic, because it gets calculated separately by each shard. The test run time, or even whether the test fails or not, can change every time you run a test - it's not reliable, and incorporating it into the sharding decision would run the risk of tests being included in multiple shards or excluded entirely. The shards even have the potential to affect each other in this case if they're not run entirely in parallel.

I think the only improvement we could make here is to incorporate file size into the sharding decision if we wanted that (I've run a few tests and it doesn't look like the default sharding takes file size into account). I don't think there's a simple way to incorporate knapsacking into the jest sharding system - sharding just isn't designed to work that way. Rather than increasing efficiency by knapsacking the tests, it increases efficiency by sorting the tests after they've already been sharded.
I'm not sure if the parallel keyword can be used for child pipelines, so some experimentation needs to be done here.
Using the child pipelines works for splitting the tests and running them; however, we don't see the coverage or Junit test reports: https://gitlab.com/MatrixAI/open-source/typescript-demo-lib/-/pipelines/573529050/builds. I hadn't noticed this before inside Polykey, but the same is true there. I think I remember reading somewhere that child pipelines can't store artifacts, so this is probably why.
It is working, if you look at the child pipeline panel: the child pipeline has a different panel compared to the parent pipeline. That's the main issue, so the coverage and junit reports don't appear for the parent pipeline. There's an upstream issue about showing child pipeline results in the parent pipeline page, but it hasn't been resolved yet: https://gitlab.com/gitlab-org/gitlab/-/issues/212894. So this is fine, and it is definitely working. The only remaining issue here is that we do need to stitch together the coverage reports from sharding and combine them to be useful. Note that https://docs.gitlab.com/ee/ci/testing/test_coverage_visualization.html#coverage-report-from-child-pipeline describes what is required for coverage reports coming from a child pipeline.
There's an upstream issue here https://gitlab.com/gitlab-org/gitlab/-/issues/280818 to track; it might be fixed soon. Have a look. If we look at https://medium.com/@mfreundlich1/speed-up-your-jest-tests-with-shards-776e9f02f637, it says you can merge coverage reports, but it's done outside of jest using a different tool. The merged coverage report can be uploaded as an artifact and we can use that information. At the end of the day, I expect to use the MatrixAIBot to do the heavy lifting and report that coverage to code climate or to our PR comment.
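For reference, stitching per-shard reports together could look something like the following sketch, which uses istanbul's libraries directly; the `coverage/shard-<n>/coverage-final.json` layout is an assumption for illustration, not what our CI currently produces:

```js
// Sketch only: merge per-shard coverage-final.json files into one report.
const fs = require('fs');
const path = require('path');
const { createCoverageMap } = require('istanbul-lib-coverage');
const { createContext } = require('istanbul-lib-report');
const reports = require('istanbul-reports');

const coverageMap = createCoverageMap({});
const shardDirs = fs
  .readdirSync('coverage')
  .filter((name) => name.startsWith('shard-'));

for (const dir of shardDirs) {
  const json = path.join('coverage', dir, 'coverage-final.json');
  coverageMap.merge(JSON.parse(fs.readFileSync(json, 'utf8')));
}

const context = createContext({ dir: 'coverage/merged', coverageMap });

// lcov for the merged artifact; text-summary so the coverage regex in the
// GitLab job settings still has a percentage to match in the job log.
reports.create('lcov').execute(context);
reports.create('text-summary').execute(context);
```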
So for now, we can stick with manually setting the shard count. We can figure out a bin-packing algorithm that makes use of the duration time (which you will need to acquire from each prior worker) and auto-calculate both the shard count and shard allocation, optimising against worker count and time to completion. Can you comment on what you found out regarding how jest does sharding and what the default sharding and sorting code ends up doing? Also, this means we don't need our own custom test sequencer, right? Can you also compare our solution with what gitlab did for their own tests?
Also, what algorithm does this project use: https://github.com/kamilkisiela/split-tests?
Note that the coverage percentage is extracted through a regular expression on the stdout/stderr log of the coverage report. So if it says 100%, that's because the regex matched 100%. So uncovered lines are not relevant here.
Jest's default sharding algorithm doesn't optimise the shards in any way - it just randomly (and uniformly) distributes the test files into shards deterministically. It doesn't incorporate things like test time, since these can change on every test run, so if shards are not run completely separately from each other they can conflict (this doesn't affect us though). Note that it only separates test files, so individual tests in the same file will never be in different shards (which makes sense, because otherwise any setup/teardown in the test file would have to be repeated in every shard that runs part of it).

Once the tests are sharded, jest then sorts the test files within each shard. This is designed to optimise test run times on machines with multiple cores, so again, their default algorithm isn't the most suitable for our purposes. By default, jest sorts the test files by:

1. Whether the test file failed in the previous run (failures run first);
2. The duration of the previous run (longest first);
3. File size (largest first), when no cached timing information is available.

This is in an effort to start long-running tests first so that they can run while shorter-running tests run in the remaining cores. Since the CI/CD only has one core, we may want to override the default sorting and have shorter tests run first in order to get a better gauge of the test progress.
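If we did want shorter tests first, a minimal sketch (assuming jest >= 28 and `@jest/test-sequencer` as the base class; this is not the sequencer used in this PR) could look like:

```js
// Sketch: keep jest's default sharding, but order tests shortest-first,
// which may give better progress feedback on a single-core CI runner.
const Sequencer = require('@jest/test-sequencer').default;

class ShortestFirstSequencer extends Sequencer {
  sort(tests) {
    // The default sort populates test.duration from jest's timing cache.
    const sorted = super.sort(Array.from(tests));
    // Shortest cached duration first; tests with no cached timing go last.
    const key = (test) =>
      test.duration != null ? test.duration : Number.MAX_SAFE_INTEGER;
    return [...sorted].sort((a, b) => key(a) - key(b));
  }
}

module.exports = ShortestFirstSequencer;
```

It would be wired in via the `testSequencer` option in the jest config.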
Correct - if we use jest's default sharding and sorting algorithms then we don't need our own custom test sequencer. But if we want to change the default behaviour then we do need a custom test sequencer to override the default implementation.
It looks like in that MR they chose to use jest's default sharding behaviour, but they overrode the default sorting to sort alphabetically. It seems they did this to make it easier to combine and cache metadata from the parallel runners (e.g. coverage info): https://gitlab.com/gitlab-org/gitlab/-/merge_requests/25104#note_290010970. This is their script for merging the coverage reports: https://gitlab.com/gitlab-org/gitlab/-/blob/53ac64da21eee6f32cc4c9154155f1c06994cd54/scripts/frontend/merge_coverage_frontend.js
It looks like it actually creates a "queue" of sorts - it sorts the test files by previous run time and then allocates each file to whichever shard currently has the shortest total runtime. E.g. with two shards and files taking 10, 8, 5 and 3 minutes, it assigns 10 to shard 1, 8 to shard 2, 5 to shard 2 (totals 10 vs 13), then 3 to shard 1 (totals 13 vs 13).
So it's not super efficient but it would likely be better than random assignment. Note that it doesn't have any algorithm for choosing the shard count - that still needs to be chosen manually.
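A minimal sketch of that greedy allocation, assuming per-file timings from a previous run are available (file names and durations below are made up, and this is not split-tests' actual code):

```js
// Greedy "longest job to the least-loaded shard" allocation.
function allocate(files, shardCount) {
  const shards = Array.from({ length: shardCount }, () => ({
    total: 0,
    files: [],
  }));
  // Longest tests first...
  const sorted = [...files].sort((a, b) => b.duration - a.duration);
  for (const file of sorted) {
    // ...each goes to whichever shard currently has the least total time.
    const target = shards.reduce((min, s) => (s.total < min.total ? s : min));
    target.files.push(file.path);
    target.total += file.duration;
  }
  return shards;
}

// Example: two shards, four files (durations in minutes).
allocate(
  [
    { path: 'a.test.ts', duration: 10 },
    { path: 'b.test.ts', duration: 8 },
    { path: 'c.test.ts', duration: 5 },
    { path: 'd.test.ts', duration: 3 },
  ],
  2
);
// => shard 1: a + d (total 13), shard 2: b + c (total 13)
```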
I realised another issue we may have: some of our tests block other tests through global file locking, that is, the usage of the global PK agent. I remember that after the changes to our crypto mocking we don't entirely need it for performance. I'm wondering, can you check which tests get blocked on the global agent, whether that could cause problems, and whether we can potentially remove it in favour of individual agents that just have mocked keys?
Also, when you're testing, can you see what happens when we switch to the console handler and run a failing test, and see if we are able to see it in our junit report on the gitlab interface?
Would this be a problem though? I think each child pipeline would have its own global agent. But pk already splits out the bin tests in the CI/CD into separate child pipelines, of which many use the global agent, and it hasn't been an issue there yet.
TS demo lib doesn't actually import js-logger, but I can import it just to test quickly.
It doesn't seem to show up in the junit reports. What's more, if a test doesn't fail then the log won't even show up in the terminal output. I modified these two tests:

```ts
// In the beforeEach():
logger = new Logger('main', LogLevel.INFO, [new ConsoleHandler()]);
test('no input', async () => {
logger.info('This test should fail');
const mockLog = mockProcessStdout();
await main([]);
const tmpMockLog = mockLog.mock.calls.join('');
expect(tmpMockLog).toContain('[]\n');
expect(tmpMockLog).toContain('new library\n');
expect(tmpMockLog).toMatch(uuidRegex);
expect(tmpMockLog).toContain('0 + 0 = 1\n');
mockLog.mockRestore();
});
test('adds 0 + 0', async () => {
logger.info('This test should pass');
const mockLog = mockProcessStdout();
await main(['', '', '0', '0', dataDir]);
const tmpMockLog = mockLog.mock.calls.join('');
expect(tmpMockLog).toContain('[0,0]\n');
expect(tmpMockLog).toContain('new library\n');
expect(tmpMockLog).toMatch(uuidRegex);
expect(tmpMockLog).toContain('0 + 0 = 0\n');
mockLog.mockRestore();
});
```

And this is the terminal output (both locally and in the CI/CD): *(output not captured)*
All that gets shown in the junit report is this: *(report screenshot not captured)*
There's actually an option we can set for junit to include console output, but it only works for stdout. The option is `includeConsoleOutput`.
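For reference, it would be enabled through the jest-junit reporter options, e.g. in `jest.config.js` (a sketch, not necessarily our current config):

```js
module.exports = {
  reporters: [
    'default',
    // jest-junit only includes stdout console output via this option.
    ['jest-junit', { includeConsoleOutput: 'true' }],
  ],
};
```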
I've added an upstream issue: jest-community/jest-junit#218 (you should do this next time you find something that would be nice to have in an upstream project).
That feature seems to use `suite.console`:

```js
// Write stdout console output if available
if (options.includeConsoleOutput === 'true' && suite.console && suite.console.length) {
// Stringify the entire console object
// Easier this way because formatting in a readable way is tough with XML
// And this can be parsed more easily
let testSuiteConsole = {
'system-out': {
_cdata: JSON.stringify(suite.console, null, 2)
}
};
testSuite.testsuite.push(testSuiteConsole);
}
```

Can you check this? And if it doesn't, then maybe we can make a PR addition to jest-junit to also capture stderr.
And see this comment jestjs/jest#6718 (comment) for why jest doesn't capture stderr.
If it comes to it, we can always add another console handler, so one can choose to use stdout or stderr.
It would be a global agent per child pipeline test job, not per child pipeline, and that global agent would be used across all test modules that are running concurrently. Yea, I suppose it should still continue to work, but I remember realising something about this when we were looking at our test timeouts. I think as soon as we have replaced our crypto with webcrypto (and also use our crypto mocking), then we should be able to eliminate the cross-module global agent (by removing our file locking on the global agent), and instead do a per-module global agent in the `beforeAll` of each test module.
I see, I must have been confused about this: MatrixAI/Polykey#380 (comment). Did you see that comment? Basically I originally thought jest was sharding on file size, but yes, this is basically done through a deterministic uniform distribution using a sha hash. This means jest's sharding system doesn't do any kind of bin-packing. I would imagine that on first-time runs, bin packing should be done based on file size, and then on subsequent runs it should be based on timing data. Ok, so we can create a future issue to add a bin-packing algo to the sharding method of our custom test sequencer. Yes it's NP-complete, but there are heuristics that can be used. I'd like to see how long it would take to execute...
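As a starting point for that future issue, a rough sketch of a `shard()` override that bin-packs on file size (deterministic across runners, so every shard computes the same allocation; this is a sketch, not something implemented in this PR):

```js
const Sequencer = require('@jest/test-sequencer').default;
const fs = require('fs');

class BinPackingSequencer extends Sequencer {
  shard(tests, { shardIndex, shardCount }) {
    const bins = Array.from({ length: shardCount }, () => ({
      total: 0,
      tests: [],
    }));
    // Largest files first, ties broken by path so the ordering is stable.
    const sized = [...tests]
      .map((test) => ({ test, size: fs.statSync(test.path).size }))
      .sort(
        (a, b) => b.size - a.size || (a.test.path < b.test.path ? -1 : 1)
      );
    // Greedily assign each test file to the currently smallest bin.
    for (const { test, size } of sized) {
      const bin = bins.reduce((min, b) => (b.total < min.total ? b : min));
      bin.tests.push(test);
      bin.total += size;
    }
    // shardIndex is 1-based.
    return bins[shardIndex - 1].tests;
  }
}

module.exports = BinPackingSequencer;
```

Using cached timings instead of file size would require sharing the timing cache between runners first, otherwise the allocation stops being deterministic.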
When we get around to optimising the shard count and such, note that generating the YAML file is currently done with just a shell script.
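A sketch of how that generation step could compute the shard count (the file paths, job name, and the 50-files-per-shard heuristic are all made up for illustration):

```js
// Sketch: generate the child pipeline YAML with a computed parallel count.
const fs = require('fs');
const glob = require('glob'); // classic glob package API

const testFiles = glob.sync('tests/**/*.test.ts');
// Hypothetical heuristic: roughly 50 test files per shard, at least 1 shard.
const shardCount = Math.max(1, Math.ceil(testFiles.length / 50));

const childPipeline = `
check:test:
  parallel: ${shardCount}
  script:
    - >
      nix-shell --run '
      npm test -- --ci --coverage --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL;
      '
`;

fs.writeFileSync('tmp/test-pipeline.yml', childPipeline);
```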
Oh, if the child pipeline needs to pull in dependencies from the parent pipeline, refer to this: https://stackoverflow.com/questions/70663367/artifacts-are-not-pulled-in-a-child-pipeline. And since our CI is now a DAG, our child pipeline jobs should also be updated to use DAG-like features; see the second answer. Any leaf job should have `needs` set appropriately.
Ahh yeah, I think that article they wrote is talking about their sorting algorithm rather than their sharding algorithm.
There are some upstream issues preventing us from properly including logger output in our junit reports on gitlab (see the jest-junit and jest issues linked above).
I'll create a new issue here for reporting the console output of tests once that's available (and to watch the issues blocking it), and will create issues in Polykey for test load balancing (including bin packing). We don't want to fully port over test load balancing to PK due to not being able to capture stderr/stdout, but we can use it for our MacOS and Windows tests (keeping the existing child pipeline setup for linux tests so we can do debugging there).
We can use jest's `shard` option to implement test load balancing. This required updating to jest v28, as sharding was introduced in this version. For now the shard count is chosen manually and is statically set; however, this may be optimised in the future. Test load balancing is now utilised for both `test` and `build` (platform) pipelines.
Changed the options for junit reporting to better organise the tests. Tests will now be displayed in Gitlab with the top-level describe, test name, and file path.
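A sketch of jest-junit reporter options along those lines (the exact values used in this PR may differ):

```js
// jest.config.js (sketch) - jest-junit options for how tests show up in GitLab.
module.exports = {
  reporters: [
    'default',
    [
      'jest-junit',
      {
        classNameTemplate: '{classname}', // top-level describe
        titleTemplate: '{title}',         // test name
        addFileAttribute: 'true',         // file path attribute, used by GitLab
        ancestorSeparator: ' > ',
      },
    ],
  ],
};
```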
- Removed `rules` from child pipelines and other jobs with dependencies
- Separated value for `parallel` into a variable for easier editing
- Uploading of the dist separated into its own job so multiple parallel jobs aren't uploading it
- Removed `--runInBand` since we want to make use of multiple cores

Relates to #65
Description
Jest 28.0.0 introduced the `--shard` option, which we can use to implement test load balancing: https://jestjs.io/docs/cli#--shard. This option requires a custom test sequencer, which defines how to shard the tests. Ideally, we would like this to be based on test times, but test size can also be a good approximation of this.

Once the test sequencer is working, we can create multiple test runners in the CI/CD (one per shard per platform) in order to more efficiently run our tests across all platforms (especially for larger projects such as Polykey).
Issues Fixed
Tasks
- [ ] 2. Determine the most efficient number of runners for test sharding (we want the fastest test runtime with the fewest runners) - TBD later; this is more relevant to pk so can be done there. Will involve creating a bin packing algorithm to calculate the most efficient number of shards based on caches of previous test times.
- [x] 3. Create multiple test runners in the CI/CD (using the `parallel` keyword)
- [x] 4. Apply the `--shard` option to CI/CD runners to balance the test load
- [ ] 5. Fix junit reporting to include logger output - pushed back to "Include stdout/stderr in Junit test reports on Gitlab" #66 due to upstream issues.

Final checklist