add support for running tests in parallel

> (this PR depends on most other PRs linked to #4198, so they should be merged first; documentation will be in another PR)

This PR adds support for running test files in parallel via `--parallel`. For many cases, this should "just work."

When the `--parallel` flag is supplied, Mocha will swap out the default `Runner` (`lib/runner.js`) for `BufferedRunner` (`lib/buffered-runner.js`). `BufferedRunner` _extends_ `Runner`. `BufferedRunner#run()` is the main point of extension. Instead of executing the tests in serial, it will create a pool of worker processes (not worker _threads_) based on the maximum job count (`--jobs`; defaults to `<number of CPU cores> - 1`). Both `BufferedRunner` and the `worker` module consume the abstraction layer, [workerpool](https://npm.im/workerpool).

`BufferedRunner#run()` does _not_ load the test files, unlike `Runner#run()`. Instead, it has a list of test files, and puts these into an async queue. The `EVENT_RUN_BEGIN` event is then emitted. As files enter the queue, `BufferedRunner#run()` tells `workerpool` to execute the `run()` function of the pool. `workerpool` then launches as many worker processes as are needed--up to the maximum--and executes the `run()` function with a single filepath and any options for a `Mocha` instance.

The first time `lib/worker.js` is invoked, it will "bootstrap" itself by handling `--require`'d modules and validating the UI. Note that _reporter validation_ does not occur. Once bootstrapped, it instantiates `Mocha`, adds the single file, swaps any reporter out for the `Buffered` reporter (`lib/reporters/buffered.js`), then executes `Mocha#run()`, which invokes `Runner#run()`.

The `Buffered` reporter listens for events emitted from the `Runner` instance, like a reporter usually does. But instead of outputting to the console, it buffers the events in a queue. Once the file has completed running, the queue is drained: the events collected are (trivially) serialized for transmission back to the main process.
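The buffer-then-replay flow described above can be sketched with plain Node.js `EventEmitter`s. This is only an illustration, not the code in `lib/reporters/buffered.js` or `lib/buffered-runner.js`: the helper names `bufferEvents` and `replay` and the `EVENTS` list are hypothetical, and the two emitters stand in for the worker-side and main-process runners. The `{eventName, data, error}` record shape mirrors what `BufferedRunner#run()` re-emits in the diff below.

```js
'use strict';

const {EventEmitter} = require('events');

// Illustrative event names only; Mocha's real names live in `Runner.constants`.
const EVENTS = ['suite', 'test', 'pass', 'fail', 'suite end'];

/**
 * Buffers events from `runner` instead of printing them.
 * Returns a `drain()` function which empties the buffer and returns its contents.
 */
function bufferEvents(runner) {
  let queue = [];
  EVENTS.forEach(eventName => {
    runner.on(eventName, (data, error) => {
      queue.push({eventName, data, error});
    });
  });
  return function drain() {
    const drained = queue;
    queue = [];
    return drained;
  };
}

/** Re-emits previously buffered events, in their original order, on `target`. */
function replay(events, target) {
  events.forEach(({eventName, data, error}) => {
    target.emit(eventName, data, error);
  });
}

// "Worker" side: events are captured while the file runs.
const workerRunner = new EventEmitter();
const drain = bufferEvents(workerRunner);
workerRunner.emit('test', {title: 'adds numbers'});
workerRunner.emit('pass', {title: 'adds numbers'});

// "Main process" side: a reporter-like listener sees the replayed events.
const mainRunner = new EventEmitter();
const seen = [];
mainRunner.on('pass', test => seen.push(test.title));
replay(drain(), mainRunner);
```

Because the listener on `mainRunner` only ever sees ordinary events, it cannot tell whether they came from a live run or from a replayed buffer, which is the property the real reporters rely on.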
`BufferedRunner#run()` receives the list of events, trivially _deserializes_ them, and re-emits the events to whatever the chosen reporter is (e.g., the `spec` reporter). In this way, the reporters don't know that the tests were run in parallel. Practically, the user will see reporter output in "chunks" instead of the "stream" of results they usually expect. This method ensures that while the test files run in a nondeterministic order, the reporter output will be deterministic for any given test file.

Once the result (the queue of events) has been returned to the main process, the worker process stays open, but waits for further instruction. If there are more files in `BufferedRunner#run()`'s queue, `workerpool` will instruct the worker to take the next file from the list, and so on. When all files have executed, the pool terminates, the `EVENT_RUN_END` event is emitted, and the reporter handles it.

> (this section is pasted from the documentation with minimal edits)

Due to the nature of the following reporters, they cannot work when running tests in parallel:

- `markdown`
- `progress`
- `json-stream`

These reporters expect Mocha to know _how many tests it plans to run_ before execution. This information is unavailable in parallel mode, as test files are loaded only when they are about to be run.

In serial mode, test results will "stream" as they occur. In parallel mode, reporter output is _buffered_; reporting will occur after each file is completed. In practice, the reporter output will appear in "chunks" (but will otherwise be identical).

In parallel mode, we have no guarantees about the order in which test files will be run--or what process runs them--as it depends on the execution times of the test files.
Because of this, the following options _cannot be used_ in parallel mode:

- `--file`
- `--sort`
- `--delay`

Because running tests in parallel mode uses more system resources at once, the OS may take extra time to schedule and complete some operations. For this reason, test timeouts may need to be increased, either globally or for specific suites or tests.

When used with `--bail` (or `this.bail()`) to exit after the first failure, it's likely other tests will be running at the same time. Mocha must shut down its worker processes before exiting.

Likewise, subprocesses may throw uncaught exceptions. When used with `--allow-uncaught`, Mocha will "bubble" this exception to the main process, but still must shut down its processes.

> _NOTE: This only applies to test files run in parallel mode_.

A root-level hook is a hook in a test file which is _not defined_ within a suite. An example using the `bdd` interface:

```js
// test/setup.js
beforeEach(function() {
  doMySetup();
});

afterEach(function() {
  doMyTeardown();
});
```

When run (in the default "serial" mode) via `mocha --file "./test/setup.js" "./test/**/*.spec.js"`, `setup.js` will be executed _first_, and install the two hooks shown above for every test found in `./test/**/*.spec.js`.

**When Mocha runs in parallel mode, test files do not share the same process.** Consequently, a root-level hook defined in test file _A_ won't be present in test file _B_.

There are (at minimum) two workarounds for this:

1. `require('./setup.js')` or `import './setup.js'` at the top of every test file. Best avoided for those averse to boilerplate.
1. _Recommended_: Define root-level hooks in a required file, using the new (also as of VERSION) Root Hook Plugin system.

Parallel mode is only available in Node.js.

If you find your tests don't work properly when run with `--parallel`, either shrug and move on, or use this handy-dandy checklist to get things working:

- ✅ Ensure you are using a supported reporter.
- ✅ Ensure you are not using other unsupported flags.
- ✅ Double-check your config file; options set in config files will be merged with any command-line option.
- ✅ Look for root-level hooks in your tests. Move them into a root hook plugin.
- ✅ Do any assertion, mock, or other test libraries you're consuming use root hooks? They may need to be migrated for compatibility with parallel mode.
- ✅ If tests are unexpectedly timing out, you may need to increase the default test timeout (via `--timeout`).
- ✅ Ensure your tests do not depend on being run in a specific order.
- ✅ Ensure your tests clean up after themselves; remove temp files, handles, sockets, etc. Don't try to share state or resources between test files.

Some types of tests are _not_ so well-suited to run in parallel. For example, extremely timing-sensitive tests, or tests which make I/O requests to a limited pool of resources (such as opening ports, automating browser windows, or hitting a test DB or remote server).

Free-tier cloud CI services may not provide a suitable multi-core container or VM for their build agents. Regarding expected performance gains in CI: your mileage may vary. It may help to use a conditional in a `.mocharc.js` to check for `process.env.CI`, and adjust the job count as appropriate.

It's unlikely (but not impossible) to see a performance gain from a job count _greater than_ the number of available CPU cores. That said, _play around with the job count_--there's no one-size-fits-all, and the unique characteristics of your tests will determine the optimal number of jobs; it may even be that fewer is faster!

- updated signal handling in `bin/mocha` to a) better work with Windows, and b) work properly with `--parallel` to avoid leaving zombie workers
- docstrings in `lib/cli/collect-files.js`
- refactors in `lib/cli/run-helpers.js` and `lib/cli/watch-run.js`.
We now have four methods:
  - `watchRun()` - serial + watch
  - `singleRun()` - serial
  - `parallelWatchRun()` - parallel + watch
  - `parallelRun()` - parallel
- `lib/cli/run.js` and `lib/cli/run-option-metadata.js`: additions for new options and checks for incompatibility
- add `lib/reporters/buffered.js` (`Buffered`); this reporter is _not_ re-exported in `Mocha.reporters`, since it should only be invoked internally.
- tweak `landing` reporter to avoid referencing `Runner#total`, which is incompatible with parallel mode. It didn't need to do so in the first place!
- the `tap` reporter now outputs the plan at the _end_ instead of at the beginning (avoiding a call to `Runner#grepTotal()`, which is incompatible with parallel mode). This is within spec, so should not be a breaking change.
- add `lib/buffered-runner.js` (`BufferedRunner`); subclass of `Runner` which overrides the `run()` method.
  - There's a little custom finite state machine in here. Didn't want to pull in a third-party module, but we should consider doing so if we use FSMs elsewhere.
  - when `DEBUG=mocha:parallel*` is in the env, this module will output statistics about the worker pool every 5s
  - the `run()` method looks a little weird because I wanted to use `async/await`, but the method it is overriding (`Runner#run`) is _not_ `async`
  - traps `SIGINT` to gracefully terminate the pool
  - pulls in [promise.allsettled](https://npm.im/promise.allsettled) polyfill to handle workers that may have rejected with uncaught exceptions
  - "bail" support is best-effort.
  - the `ABORTING` state is only for interruption via `SIGINT` or if `allowUncaught` is true and we get an uncaught exception
- `Hook`, `Suite`, `Test`: add a `serialize()` method. This pulls out the most relevant information about the object for transmission over IPC. It's called by worker processes for each event received by its `Runner`; event arguments (e.g., `test` or `suite`) are serialized in this manner.
Note that this _limits what reporters have access to_, which may break compatibility with third-party reporters that may use information that is missing from the serialized object. As those cases arise, we can add more information to the serialized objects (in some cases). The `$$` convention tells the _deserializer_ to turn the property into a function which returns the passed value, e.g., `test.fullTitle()`.
- `lib/mocha.js`:
  - refactor `Mocha#reporter` for nicer parameter & variable names
  - rename `loadAsync` to `lazyLoadFiles`, which is more descriptive, IMO. It's a private property, so should not be a breaking change.
  - Constructor will dynamically choose the appropriate `Runner`
- `lib/runner.js`: `BufferedRunner` needs the options from `Mocha#options`, so I updated the parent method to define the parameter. It is unused here.
- add `lib/serializer.js`: on the worker process side, manages event queue serialization; manages deserialization of the event queue in the main process.
  - I spent a long time trying to get this working. We need to account for things like `Error` instances, with their stack traces, since those can be event arguments (e.g., `EVENT_TEST_FAIL` sends both a `Test` and the `Error`). It's impossible to serialize circular (self-referential) objects, so we need to account for those as well.
  - Not super happy with the deserialization algorithm, since it's recursive, but it shouldn't be too much of an issue because the serializer won't output circular structures.
  - Attempted to avoid prototype pollution issues
  - Much of this works by mutating objects, mostly because it can be more performant. The code can be changed to be "more immutable", as that's likely to be easier to understand, if it doesn't impact performance too much. We're serializing potentially very large arrays of stuff.
  - The `__type` prop is a hint for the deserializer. This convention allows us to re-expand plain objects back into `Error` instances, for example.
You can't send an `Error` instance over IPC!
- add `lib/worker.js`:
  - registers its `run()` function with `workerpool` to be called by main process
  - if `DEBUG=mocha:parallel*` is set, will output information (on an interval) about long-running test files
  - afaik the only way `run()` can reject is if `allowUncaught` is true or serialization fails
  - any user-supplied `reporter` value is replaced with the `Buffered` reporter. Thus, reporters are not validated.
  - the worker uses `Runner`, like usual.
- tests:
  - see `test/integration/options/parallel.spec.js` for the interesting stuff
  - upgrade `unexpected` for "to have readonly property" assertion
  - upgrade `unexpected-eventemitter` for async function support
  - integration test helpers allow Mocha's developers to use `--bail` and `--parallel`, but will default to `--no-bail` and `--no-parallel`.
- etc:
  - update `.eslintrc.yml` for new Node-only files
  - increase default timeout to `1000` (also seen in another PR) and use `parallel` mode by default in `.mocharc.yml`
  - run node unit tests _in serial_ as sort of a smoke test, as otherwise all our tests would be run in parallel
  - karma, browserify: ignore files for parallel support
  - force color output in CI. This is nice on travis, but ugly on appveyor. Either way, it's easier to read than having no color

Ref: #4198
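
To illustrate the `__type` hint and the `$$` convention from the `lib/serializer.js` notes above, here is a minimal sketch. It is _not_ the code in `lib/serializer.js`: the function names (`serializeError`, `serializeTest`, `deserialize`) are hypothetical, JSON is used as a stand-in for IPC, and circular references and prototype-pollution guards are deliberately left out.

```js
'use strict';

/** Flattens an `Error` into a plain object tagged with a `__type` hint. */
function serializeError(err) {
  return {
    __type: 'Error',
    name: err.name,
    message: err.message,
    stack: err.stack
  };
}

/** Picks a few fields from a test-like object (hypothetical shape). */
function serializeTest(test) {
  return {
    __type: 'Test',
    title: test.title,
    // `$$` prefix: the method was called eagerly and its result stored
    $$fullTitle: test.fullTitle()
  };
}

/** Rebuilds `Error` instances and turns `$$foo` values back into `foo()`. */
function deserialize(obj) {
  if (obj.__type === 'Error') {
    const err = new Error(obj.message);
    err.name = obj.name;
    err.stack = obj.stack;
    return err;
  }
  const result = {};
  Object.keys(obj).forEach(key => {
    if (key.startsWith('$$')) {
      const value = obj[key];
      result[key.slice(2)] = () => value;
    } else {
      result[key] = obj[key];
    }
  });
  return result;
}

// Round-trip through JSON to mimic sending data between processes.
const roundTrip = value => deserialize(JSON.parse(JSON.stringify(value)));

const test = roundTrip(
  serializeTest({title: 'works', fullTitle: () => 'my suite works'})
);
const error = roundTrip(serializeError(new TypeError('boom')));
```

After the round trip, a reporter can still call `test.fullTitle()` and check `error instanceof Error`, even though only plain data crossed the process boundary.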
mochajs · May 20, 2020 · 7843f1f
1 parent cb5eb8e
commit 7843f1f
Showing 47 changed files with 4,324 additions and 187 deletions.
diff --git a/.eslintrc.yml b/.eslintrc.yml
@@ -21,22 +21,28 @@ rules:
       property: 'assign'
 overrides:
   - files:
-      - docs/js/**/*.js
+      - 'docs/js/**/*.js'
     env:
       node: false
   - files:
-      - scripts/**/*.js
-      - package-scripts.js
-      - karma.conf.js
-      - .wallaby.js
-      - .eleventy.js
-      - bin/*
-      - lib/cli/**/*.js
-      - test/node-unit/**/*.js
-      - test/integration/options/watch.spec.js
-      - test/integration/helpers.js
-      - lib/growl.js
-      - docs/_data/**/*.js
+      - 'scripts/**/*.js'
+      - 'package-scripts.js'
+      - 'karma.conf.js'
+      - '.wallaby.js'
+      - '.eleventy.js'
+      - 'bin/*'
+      - 'lib/cli/**/*.js'
+      - 'test/node-unit/**/*.js'
+      - 'test/integration/options/watch.spec.js'
+      - 'test/integration/helpers.js'
+      - 'lib/growl.js'
+      - 'lib/buffered-runner.js'
+      - 'lib/worker.js'
+      - 'lib/reporters/buffered.js'
+      - 'lib/serializer.js'
+      - 'lib/pool.js'
+      - 'test/reporters/buffered.spec.js'
+      - 'docs/_data/**/*.js'
     parserOptions:
       ecmaVersion: 2018
     env:

diff --git a/.mocharc.yml b/.mocharc.yml
@@ -5,6 +5,7 @@ global:
   - 'okGlobalC'
   - 'callback*'
 timeout: 1000
+parallel: true
 watch-ignore:
   - '.*'
   - 'docs/_dist/**'

diff --git a/.travis.yml b/.travis.yml
@@ -39,7 +39,8 @@ jobs:
     - script: COVERAGE=1 npm start test.node
       after_success: npm start coveralls
       name: 'Latest Node.js (with coverage)'
-
+    - script: MOCHA_PARALLEL=0 npm start test.node.unit
+      name: 'Latest Node.js (unit tests in serial mode)'
     - &node
       script: npm start test.node
       node_js: '13'

diff --git a/bin/mocha b/bin/mocha
@@ -130,8 +130,23 @@ if (Object.keys(nodeArgs).length) {
 
   // terminate children.
   process.on('SIGINT', () => {
-    proc.kill('SIGINT'); // calls runner.abort()
-    proc.kill('SIGTERM'); // if that didn't work, we're probably in an infinite loop, so make it die.
+    // XXX: a previous comment said this would abort the runner, but I can't see that it does
+    // anything with the default runner.
+    debug('main process caught SIGINT');
+    proc.kill('SIGINT');
+    // if running in parallel mode, we will have a proper SIGINT handler, so the below won't
+    // be needed.
+    if (!args.parallel || args.jobs < 2) {
+      // win32 does not support SIGTERM, so use next best thing.
+      if (require('os').platform() === 'win32') {
+        proc.kill('SIGKILL');
+      } else {
+        // using SIGKILL won't cleanly close the output streams, which can result
+        // in cut-off text or a befouled terminal.
+        debug('sending SIGTERM to child process');
+        proc.kill('SIGTERM');
+      }
+    }
   });
 } else {
   debug('running Mocha in-process');

diff --git a/karma.conf.js b/karma.conf.js
@@ -37,6 +37,11 @@ module.exports = config => {
           .ignore('./lib/esm-utils.js')
           .ignore('path')
           .ignore('supports-color')
+          .ignore('./lib/buffered-runner.js')
+          .ignore('./lib/reporters/buffered.js')
+          .ignore('./lib/serializer.js')
+          .ignore('./lib/worker.js')
+          .ignore('./lib/pool.js')
           .on('bundled', (err, content) => {
             if (err) {
               throw err;

diff --git a/lib/buffered-runner.js b/lib/buffered-runner.js
@@ -0,0 +1,247 @@
+'use strict';
+
+const allSettled = require('promise.allsettled');
+const Runner = require('./runner');
+const {EVENT_RUN_BEGIN, EVENT_RUN_END} = Runner.constants;
+const debug = require('debug')('mocha:parallel:buffered-runner');
+const {WorkerPool} = require('./pool');
+const {setInterval, clearInterval} = global;
+const {createMap} = require('./utils');
+
+/**
+ * Outputs a debug statement with worker stats
+ * @param {WorkerPool} pool - Worker pool
+ */
+const debugStats = pool => {
+  const {totalWorkers, busyWorkers, idleWorkers, pendingTasks} = pool.stats();
+  debug(
+    '%d/%d busy workers; %d idle; %d tasks queued',
+    busyWorkers,
+    totalWorkers,
+    idleWorkers,
+    pendingTasks
+  );
+};
+
+/**
+ * The interval at which we will display stats for worker processes in debug mode
+ */
+const DEBUG_STATS_INTERVAL = 5000;
+
+const ABORTED = 'ABORTED';
+const IDLE = 'IDLE';
+const ABORTING = 'ABORTING';
+const RUNNING = 'RUNNING';
+const BAILING = 'BAILING';
+const BAILED = 'BAILED';
+const COMPLETE = 'COMPLETE';
+
+const states = createMap({
+  [IDLE]: new Set([RUNNING, ABORTING]),
+  [RUNNING]: new Set([COMPLETE, BAILING, ABORTING]),
+  [COMPLETE]: new Set(),
+  [ABORTED]: new Set(),
+  [ABORTING]: new Set([ABORTED]),
+  [BAILING]: new Set([BAILED, ABORTING]),
+  [BAILED]: new Set([COMPLETE, ABORTING])
+});
+
+/**
+ * This `Runner` delegates test runs to worker processes.  Does not execute any
+ * {@link Runnable}s by itself!
+ */
+class BufferedRunner extends Runner {
+  constructor(...args) {
+    super(...args);
+
+    let state = IDLE;
+    Object.defineProperty(this, '_state', {
+      get() {
+        return state;
+      },
+      set(newState) {
+        if (states[state].has(newState)) {
+          state = newState;
+        } else {
+          throw new Error(`invalid state transition: ${state} => ${newState}`);
+        }
+      }
+    });
+
+    this.once(EVENT_RUN_END, () => {
+      this._state = COMPLETE;
+    });
+  }
+
+  /**
+   * Runs Mocha tests by creating a pool of worker processes, then delegating
+   * work to them.
+   *
+   * Each worker receives one file, and as workers become available, they take a
+   * file from the queue and run it. The worker process execution is treated like
+   * an RPC--it returns a `Promise` containing serialized information about the
+   * run.  The information is processed as it's received, and emitted to a
+   * {@link Reporter}, which is likely listening for these events.
+   *
+   * @param {Function} callback - Called with an exit code corresponding to
+   * number of test failures.
+   * @param {{files: string[], options: Options}} opts - Files to run and
+   * command-line options, respectively.
+   */
+  run(callback, {files, options} = {}) {
+    /**
+     * Listener on `Process.SIGINT` which tries to cleanly terminate the worker pool.
+     */
+    let sigIntListener;
+    // This function should _not_ return a `Promise`; its parent (`Runner#run`)
+    // returns this instance, so this should do the same. However, we want to make
+    // use of `async`/`await`, so we use this IIFE.
+
+    (async () => {
+      /**
+       * This is an interval that outputs stats about the worker pool every so often
+       */
+      let debugInterval;
+
+      /**
+       * @type {WorkerPool}
+       */
+      let pool;
+
+      try {
+        pool = WorkerPool.create({maxWorkers: options.jobs});
+
+        sigIntListener = async () => {
+          if (this._state !== ABORTING) {
+            debug('run(): caught a SIGINT');
+            this._state = ABORTING;
+
+            try {
+              debug('run(): force-terminating worker pool');
+              await pool.terminate(true);
+            } catch (err) {
+              console.error(
+                `Error while attempting to force-terminate worker pool: ${err}`
+              );
+            } finally {
+              process.nextTick(() => {
+                debug('run(): imminent death');
+                this._state = ABORTED;
+                process.kill(process.pid, 'SIGINT');
+              });
+            }
+          }
+        };
+
+        process.once('SIGINT', sigIntListener);
+
+        debugInterval = setInterval(
+          () => debugStats(pool),
+          DEBUG_STATS_INTERVAL
+        ).unref();
+
+        // this is set for uncaught exception handling in `Runner#uncaught`
+        this.started = true;
+        this._state = RUNNING;
+
+        this.emit(EVENT_RUN_BEGIN);
+
+        const results = await allSettled(
+          files.map(async file => {
+            debug('run(): enqueueing test file %s', file);
+            try {
+              const {failureCount, events} = await pool.run(file, options);
+              if (this._state === BAILED) {
+                // short-circuit after a graceful bail
+                return;
+              }
+              debug(
+                'run(): completed run of file %s; %d failures / %d events',
+                file,
+                failureCount,
+                events.length
+              );
+              this.failures += failureCount; // can this ever be non-numeric?
+              /**
+               * Emit each buffered event in order. If we encounter a "bail"
+               * flag alongside a failure, we enter the `BAILING` state and
+               * terminate the pool once all events have been emitted.
+               */
+              let event = events.shift();
+              while (event) {
+                this.emit(event.eventName, event.data, event.error);
+                if (
+                  this._state !== BAILING &&
+                  event.data &&
+                  event.data._bail &&
+                  (failureCount || event.error)
+                ) {
+                  debug('run(): nonzero failure count & found bail flag');
+                  // we need to let the events complete for this file, as the worker
+                  // should run any cleanup hooks
+                  this._state = BAILING;
+                }
+                event = events.shift();
+              }
+              if (this._state === BAILING) {
+                debug('run(): terminating pool due to "bail" flag');
+                this._state = BAILED;
+                await pool.terminate();
+              }
+            } catch (err) {
+              if (this._state === BAILED || this._state === ABORTING) {
+                debug(
+                  'run(): worker pool terminated with intent; skipping file %s',
+                  file
+                );
+              } else {
+                // this is an uncaught exception
+                debug('run(): encountered uncaught exception: %O', err);
+                if (this.allowUncaught) {
+                  // still have to clean up
+                  this._state = ABORTING;
+                  await pool.terminate(true);
+                }
+                throw err;
+              }
+            } finally {
+              debug('run(): done running file %s', file);
+            }
+          })
+        );
+
+        // note that pool may already be terminated due to --bail
+        await pool.terminate();
+
+        results
+          .filter(({status}) => status === 'rejected')
+          .forEach(({reason}) => {
+            if (this.allowUncaught) {
+              // yep, just the first one.
+              throw reason;
+            }
+            // "rejected" will correspond to uncaught exceptions.
+            // unlike the serial runner, the parallel runner can always recover.
+            this.uncaught(reason);
+          });
+
+        if (this._state === ABORTING) {
+          return;
+        }
+        this.emit(EVENT_RUN_END);
+        debug('run(): completing with failure count %d', this.failures);
+        callback(this.failures);
+      } catch (err) {
+        process.nextTick(() => {
+          debug('run(): throwing uncaught exception');
+          throw err;
+        });
+      } finally {
+        clearInterval(debugInterval);
+        process.removeListener('SIGINT', sigIntListener);
+      }
+    })();
+    return this;
+  }
+}
+
+module.exports = BufferedRunner;
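
The guarded `_state` setter in `lib/buffered-runner.js` above can be reduced to a small standalone sketch. This is a simplified illustration with only three states and a hypothetical `createMachine` helper; the real module uses seven states and `createMap` from `lib/utils`.

```js
'use strict';

// Allowed transitions: each state maps to the set of states it may move to.
const TRANSITIONS = {
  IDLE: new Set(['RUNNING']),
  RUNNING: new Set(['COMPLETE']),
  COMPLETE: new Set()
};

/** Creates an object whose `state` can only change along allowed transitions. */
function createMachine(initial) {
  let state = initial;
  return {
    get state() {
      return state;
    },
    set state(next) {
      if (!TRANSITIONS[state].has(next)) {
        throw new Error(`invalid state transition: ${state} => ${next}`);
      }
      state = next;
    }
  };
}

const machine = createMachine('IDLE');
machine.state = 'RUNNING';
machine.state = 'COMPLETE';

// `COMPLETE` is terminal, so any further transition is rejected.
let rejected = null;
try {
  machine.state = 'RUNNING';
} catch (err) {
  rejected = err.message;
}
```

Throwing on an illegal transition surfaces logic errors (for example, trying to resume after completion) immediately instead of leaving the runner in an inconsistent state.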
diff --git a/lib/cli/collect-files.js b/lib/cli/collect-files.js
@@ -17,13 +17,7 @@ const {NO_FILES_MATCH_PATTERN} = require('../errors').constants;
 
 /**
  * Smash together an array of test files in the correct order
- * @param {Object} opts - Options
- * @param {string[]} opts.extension - File extensions to use
- * @param {string[]} opts.spec - Files, dirs, globs to run
- * @param {string[]} opts.ignore - Files, dirs, globs to ignore
- * @param {string[]} opts.file - List of additional files to include
- * @param {boolean} opts.recursive - Find files recursively
- * @param {boolean} opts.sort - Sort test files
+ * @param {FileCollectionOptions} [opts] - Options
  * @returns {string[]} List of files to test
  * @private
  */
@@ -84,3 +78,14 @@ module.exports = ({ignore, extension, file, recursive, sort, spec} = {}) => {
 
   return files;
 };
+
+/**
+ * An object to configure how Mocha gathers test files
+ * @typedef {Object} FileCollectionOptions
+ * @property {string[]} extension - File extensions to use
+ * @property {string[]} spec - Files, dirs, globs to run
+ * @property {string[]} ignore - Files, dirs, globs to ignore
+ * @property {string[]} file - List of additional files to include
+ * @property {boolean} recursive - Find files recursively
+ * @property {boolean} sort - Sort test files
+ */