Split tests for parallelization #947

Open
trentmwillis opened this issue Mar 3, 2016 · 23 comments
Labels
Category: API Type: Enhancement New idea or feature request. Type: Meta Seek input from maintainers and contributors.

Comments

@trentmwillis
Member

Recently, I've seen a lot of interest in and discussion on running JS tests in parallel. Most of these approaches rely on faking concurrency through async behavior. While that is definitely an interesting approach, I am not convinced it is the proper solution for browser-based tests, which often rely on the document and other global/shared state. That said, I think finding a way to parallelize tests would be hugely beneficial, especially for larger code bases.

I'd like to propose that QUnit bake in support for splitting tests into groups at runtime. This would then allow users to parallelize test runs by using multiple, independent instances of their test page.

I imagine this could be based on url params like so:

/tests.html?batches=10&batch=1

This would split the tests into 10 different batches and then run the first batch. You can then adjust the batch parameter to run the other groups in other browser instances/tabs.
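A minimal sketch of the equal-size split, assuming every page instance registers its tests in the same order (so contiguous slices line up across tabs). `getBatch` and `batchFromUrl` are hypothetical helpers, not QUnit APIs; only the `batches`/`batch` parameter names come from the proposal.

```javascript
// Hypothetical helper: split an ordered list of test ids into `batches`
// contiguous groups of (nearly) equal size and return group `batch` (1-based).
function getBatch(testIds, batches, batch) {
  if (!Number.isInteger(batches) || batches < 1) {
    throw new RangeError("batches must be a positive integer");
  }
  if (!Number.isInteger(batch) || batch < 1 || batch > batches) {
    throw new RangeError("batch must be between 1 and batches");
  }
  const size = Math.floor(testIds.length / batches);
  const remainder = testIds.length % batches;
  // The first `remainder` batches each take one extra test.
  const start = (batch - 1) * size + Math.min(batch - 1, remainder);
  const end = start + size + (batch <= remainder ? 1 : 0);
  return testIds.slice(start, end);
}

// Read ?batches=N&batch=K from a test page URL (URL exists in browsers and Node 10+).
function batchFromUrl(href, testIds) {
  const params = new URL(href).searchParams;
  const batches = Number(params.get("batches") || 1);
  const batch = Number(params.get("batch") || 1);
  return getBatch(testIds, batches, batch);
}
```

If tests are shuffled (for example with QUnit's `seed` option), every instance would need to use the same seed; otherwise the slices could overlap or miss tests.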

As an initial implementation, this could be super simple and just break the tests up into equally sized batches. Then, as an enhancement, we could standardize a test output that can be fed back into QUnit to allow weighted splitting to ensure the batches run in approximately equal amounts of time.
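For the weighted enhancement, one option (an assumption here, not something the proposal specifies) is the greedy "longest processing time first" heuristic: sort tests by recorded duration and always hand the next one to the currently lightest batch. The `durations` object stands in for the hypothetical standardized output of a previous run.

```javascript
// Hypothetical sketch: balance batches by summed duration (ms) using the
// greedy "longest processing time first" heuristic.
function weightedBatches(durations, batches) {
  const groups = Array.from({ length: batches }, () => ({ total: 0, testIds: [] }));
  // Longest tests first; tie-break on id so every instance gets the same result.
  const sorted = Object.entries(durations).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
  for (const [testId, ms] of sorted) {
    // Put the next-longest test into the currently lightest group.
    let lightest = groups[0];
    for (const group of groups) {
      if (group.total < lightest.total) lightest = group;
    }
    lightest.testIds.push(testId);
    lightest.total += ms;
  }
  return groups;
}
```

Because the result depends only on the durations input, each browser instance can compute the full assignment locally and then run its own group.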

The primary benefit of this would be to allow parallelization, but in doing so it would also help identify non-atomic tests in the same way as the recently introduced test randomization feature does.

For full disclosure, this would benefit a recently implemented Testem feature to support running multiple test pages in parallel.

@leobalter
Member

This is very interesting, but at the same time it seems to account mostly for browsers only. I would like to see if it's possible to extend it to Node environments, using some sort of async mechanism instead of browser tabs.

On this idea of browser tabs: would it be bad to use Web Workers? With the advent of the next major version (2.0), we will be able to limit our browser support, and most of the remaining browsers support this API.

@trentmwillis
Member Author

It is definitely targeted at browsers, though it could be used in other environments. Since the core concept is just to split tests, you could theoretically do that in different processes of any sort, not just browsers (though there may be better solutions in those cases).

I've considered Web Workers; the primary issue there, however, is that they have no DOM access (amongst other things).

I could see this being a plugin for QUnit, since it doesn't necessarily apply to all use cases, but I think to properly support it we would need to expose some lower-level constructs, such as how tests get queued.

@gibson042
Member

I think the worker model is a good one to shoot for, implementation agnostic with respect to Web Worker or setTimeout or sibling window or iframe window or Node.js child process or whatever. The single shared DOM as common test fixture is holding QUnit back, and while I don't see it going away, we can and should enable suites that abandon it to gain the commensurate performance benefits.

To that end, I'd rather see a parallelization count (e.g., ?workers=4) than a batch count.

@leobalter
Member

This window/iframe parallelization is probably already covered on qunit-composite. For tests that need the DOM API, we might end up with something similar. I am not sure, but maybe ember-cli projects also run different test HTML files in parallel?

@gibson042
Member

This window/iframe parallelization is probably already covered on qunit-composite

The core of it maybe, but the worker model I want is generic and inherently self-balancing, rather than manually pre-partitioned.

@trentmwillis
Member Author

I think the worker model is a good one to shoot for, implementation agnostic with respect to Web Worker or setTimeout or sibling window or iframe window or Node.js child process or whatever.


the worker model I want is generic and inherently self-balancing

Not sure I completely follow this "worker model" you're thinking of. Would this simply amount to scheduling the tests in worker groups? The "worker" which then handles that group is non-specific (e.g., web worker, child process, etc.)?

Or is it more along the lines of: you have a number of workers and those handle individual tests as they become available to accept a new test?

rather than manually pre-partitioned.

This is the big thing I am looking for as well with this idea.

@gibson042
Member

Or is it more along the lines of: you have a number of workers and those handle individual tests as they become available to accept a new test?

Yes. Once we have it in place, non-browser parallelism (or thread-leveraging, or tab-distributed, or …) is just a matter of implementing the appropriate workers.
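A minimal sketch of this pull-based model, assuming a transport-agnostic `runOnWorker(workerId, testId)` function that returns a promise (a hypothetical stand-in for an iframe, Web Worker, or child-process worker). Each worker asks for a new test only when it is free, so one long test does not hold up the rest of the queue.

```javascript
// Hypothetical sketch: N workers drain one shared queue of test ids.
// `runOnWorker` is an assumed transport function, not a QUnit API.
async function runWithWorkers(testIds, workerCount, runOnWorker) {
  const queue = testIds.slice();
  const log = []; // execution order: which worker ran which test

  async function workerLoop(workerId) {
    // JS is single-threaded, so shift() between awaits cannot race.
    while (queue.length > 0) {
      const testId = queue.shift();
      log.push({ workerId, testId });
      await runOnWorker(workerId, testId);
    }
  }

  const loops = [];
  for (let id = 0; id < workerCount; id++) loops.push(workerLoop(id));
  await Promise.all(loops);
  return log;
}
```

With `workerCount` set to 1, the same code degenerates to plain sequential execution, and the returned log is one possible input for reproducing a run.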

@trentmwillis
Member Author

I think this would make sense for a lot of use cases. However, I think any use case involving DOM would require the iframe approach (which may not necessarily be a bad thing). I can't think of a good solution for allowing QUnit to communicate across multiple tabs or browser instances.

That said, my primary concern with this approach is determinism. Since we won't really have control over the order in which the workers become available, reproducing failures of non-atomic tests might become difficult.

A secondary concern with this is that it would preclude sharing state across tests. In general, I know that sharing state is bad, but I also know that there are certain patterns which could benefit from it. For instance, in the Ember community there has been talk of using a single instance of an application across tests for performance reasons.

@gibson042
Member

However, I think any use case involving DOM would require the iframe approach (which may not necessarily be a bad thing).

Rare indeed is the test that requires a complete DOM as opposed to a container div, but any such examples are free to forgo parallelism, implement synchronization (e.g., a document-level semaphore), or utilize multiple windows as you suggest.
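The "document-level semaphore" option could look roughly like this: a small promise-based semaphore that DOM-heavy tests acquire before touching shared document state, while other tests keep running. `Semaphore` and `documentLock` are hypothetical names.

```javascript
// Hypothetical sketch of a promise-based semaphore for shared DOM access.
class Semaphore {
  constructor(permits = 1) {
    this.permits = permits;
    this.waiting = [];
  }

  acquire() {
    if (this.permits > 0) {
      this.permits--;
      return Promise.resolve();
    }
    return new Promise(resolve => this.waiting.push(resolve));
  }

  release() {
    const next = this.waiting.shift();
    if (next) {
      next(); // hand the permit directly to the next waiter
    } else {
      this.permits++;
    }
  }

  // Run `fn` while holding a permit, releasing it even if `fn` throws.
  async use(fn) {
    await this.acquire();
    try {
      return await fn();
    } finally {
      this.release();
    }
  }
}

// One permit: at most one test touches the whole document at a time.
const documentLock = new Semaphore(1);
```

Since a QUnit test callback may return a promise, a DOM-heavy test could wrap its body as `QUnit.test("name", () => documentLock.use(async () => { /* DOM work */ }))`.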

I can't think of a good solution for allowing QUnit to communicate across multiple tabs or browser instances.

postMessage is the obvious choice, but this has already been solved in qunit-composite.

Since we won't really have control over the order in which the workers become available, reproducing failures of non-atomic tests might become difficult.

Right, opting in to parallel execution requires more discipline in test definition. But the sequential mode (i.e., single-worker) will always remain available.

A secondary concern with this is that it would preclude sharing state across tests. In general, I know that sharing state is bad, but I also know that there are certain patterns which could benefit from it. For instance, in the Ember community there has been talk of using a single instance of an application across tests for performance reasons.

I see no reason why that would be precluded.

@mariokostelac

Right, opting in to parallel execution requires more discipline in test definition. But the sequential mode (i.e., single-worker) will always remain available.

That's definitely right, but I am not sure how practical it is. There will always be cases where the test suite fails, and reproducing the failure is the only way I am aware of to deterministically get closer to fixing the problem. Implementing a worker model that drains a shared queue would require some way of reproducing the same ordering (and possibly concurrency), which seems like an unnecessarily difficult problem to tackle.

Implementing the original proposal would yield worse utilization (since not all tests take the same time to execute), but it would be a very simple model, possibly extensible with test weights to gain better efficiency.
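To make the utilization trade-off concrete, this sketch computes the finishing time (makespan) of both models for a list of made-up test durations, ignoring communication overhead. Both functions are illustrative only.

```javascript
// Static model: equal-count contiguous batches, one per worker.
function staticMakespan(durations, workers) {
  const size = Math.ceil(durations.length / workers);
  let worst = 0;
  for (let i = 0; i < durations.length; i += size) {
    const batch = durations.slice(i, i + size);
    worst = Math.max(worst, batch.reduce((sum, d) => sum + d, 0));
  }
  return worst;
}

// Dynamic model: the earliest-free worker takes the next test in queue order.
function dynamicMakespan(durations, workers) {
  const freeAt = new Array(workers).fill(0); // time each worker becomes free
  for (const d of durations) {
    const i = freeAt.indexOf(Math.min(...freeAt));
    freeAt[i] += d;
  }
  return Math.max(...freeAt);
}
```

With durations `[90, 80, 10, 10, 10, 10]` and two workers, the static split finishes after 180 ms (the first batch gets both slow tests) while the shared queue finishes after 110 ms; weighting the static batches by duration would narrow that gap.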

I think that the same model works well for Node projects too (I don't really develop production Node systems, so please correct me here). Multi-container CI setups could run different batches in different containers. There is definitely some extra work here, because somebody has to make sure that all processes running different batches pass in order for the whole suite to pass, but CI providers do that anyway (and it is trivial to implement).

@trentmwillis
Member Author

Finally circling back to this. After rereading the discussion and thinking about it, I'm on board with the worker model.

In order to deal with the reproducibility problem, we can simply track the order in which tests are executed by the various workers. We'll then report that in the runEnd data, and we could provide an API to feed that data back into QUnit to reproduce a run. This should be relatively straightforward since we already assign tests unique ids.
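A minimal sketch of the record-and-replay idea, assuming each log entry pairs a worker id with the unique test id it ran. The log is plain JSON, so it could be attached to the runEnd data; `groupByWorker` and `replay` are hypothetical helpers, and `runOnWorker` is an assumed transport function. Replaying per-worker sequences reproduces what each worker saw, but not the exact interleaving between workers.

```javascript
// Rebuild each worker's recorded sequence from a flat execution log.
function groupByWorker(log) {
  const perWorker = new Map();
  for (const { workerId, testId } of log) {
    if (!perWorker.has(workerId)) perWorker.set(workerId, []);
    perWorker.get(workerId).push(testId);
  }
  return perWorker;
}

// Replay: each worker runs its recorded tests in order, instead of
// pulling from a shared queue.
async function replay(log, runOnWorker) {
  const perWorker = groupByWorker(log);
  await Promise.all([...perWorker].map(async ([workerId, testIds]) => {
    for (const testId of testIds) {
      await runOnWorker(workerId, testId);
    }
  }));
}
```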

Additionally, we have now refactored the test queue in QUnit to make it much cleaner, which should make it far easier to implement more complex scheduling algorithms.

The primary remaining hurdle I see is to define an API for actually defining tests that can run in parallel and an API for defining "workers" that run the tests.

@gibson042 I know it's been a looong time since the last discussion on this, but if you had any ideas at the time about a potential API, I'd love to hear them.

@mariokostelac

In order to deal with the reproducibility problem, we can simply track the order in which tests are executed by the various workers. We'll then report that in the runEnd data, and we could provide an API to feed that data back into QUnit to reproduce a run. This should be relatively straightforward since we already assign tests unique ids.

That would work 😊

@trentmwillis
Member Author

I've put together a gist with a proposal for the potential API here: https://gist.github.com/trentmwillis/c8c9a8e1dcf85b9afa8fbfc4d8a4c5b1

Take a look and feel free to either leave comments on the gist or here.

@mariokostelac

Parallelization of already loaded JavaScript (#1) is not really feasible
since transfer of runtime objects between "workers" often involves structured
cloning which doesn't support functions (amongst some other JS constructs).

Is it possible to load javascript, not execute tests and communicate just test examples, not js objects (let's say we have some unique identifier for each test)? That way the testing process and all workers can each have their own independent copy of objects, we can split with finer granularity (per test example instead of per file, for example), and we avoid the hard problem of transferring JS objects around.
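A sketch of why only identifiers would need to cross the boundary: if ids are derived from module and test names, every context that loaded the same test files computes the same ids and keeps its own callbacks. `testIdFor` and `buildRegistry` are hypothetical; the 32-bit string hash is only similar in spirit to the ids QUnit already assigns, and like any short hash it can collide.

```javascript
// Hypothetical: derive a stable id from module and test names, so the
// controller and each worker compute identical ids independently.
function testIdFor(moduleName, testName) {
  const str = moduleName + "\x1C" + testName; // separator unlikely in names
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash + str.charCodeAt(i)) | 0; // 32-bit string hash
  }
  return (hash >>> 0).toString(16).padStart(8, "0"); // 8 hex digits
}

// Each context keeps its own id -> callback map; only ids are sent around.
function buildRegistry(definitions) {
  const registry = new Map();
  for (const def of definitions) {
    registry.set(testIdFor(def.module, def.name), def.callback);
  }
  return registry;
}
```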

(It is possible I am getting something completely wrong :))

@gibson042
Member

Thank you for mentioning me. I have not intentionally abandoned QUnit, I just ran out of spare time to work on it (there are about 100 QUnit notifications in my inbox, waiting for me to get to "someday"). So as a result, I really don't know how it has progressed over the past year. Regardless, though, my general position on this point hasn't changed.

I always imagined that there would be a single queue of tests instead of files, though I hadn't considered the challenges of sharing host objects. Still, I think it's possible... imagine something like this iframeWorker implementation, where the worker presented to QUnit exists in the correct realm and uses postMessage to manage the "true" worker (which loads QUnit just like the parent window, but then turns it off and retrieves/runs tests directly):

// QUnit.registerWorkerFactory, QUnit.convertToWorker, and QUnit.getTest are proposed
// (bikesheddable) interfaces; setUrl, runTest, and getAssertProxy are assumed helpers.
let workerKey = QUnit.urlParams.iframeWorkerKey;
let controller = workerKey && window.parent;
if ( !controller ) {
    // Controller (parent) window.
    // bikesheddable name and interface (callback vs. promise)
    QUnit.registerWorkerFactory(() => new Promise(function(ready, fail) {
        const testPromises = new Map();
        let workerWindow;
        const workerKey = String(Math.random()).replace(0, window.performance.now());
        const worker = {
            runTest(testId, assert) {
                const testPromise = new Promise(function(resolve, reject) {
                    testPromises.set(testId, { resolve, reject, assert });
                });
                const done = assert.async();
                workerWindow.postMessage({ key: workerKey, testId }, "*");
                return testPromise.then( v => { done(); return v; }, v => { done(); throw v; } );
            }
        };
        const iframe = document.createElement("iframe");
        iframe.src = setUrl({ iframeWorkerKey: workerKey });
        window.addEventListener("message", function( evt ) {
            // Only process messages from our iframe.
            if ( !evt.data || typeof evt.data !== "object" || evt.data.key !== workerKey ) {
                return;
            }
            // The first message communicates worker initialization.
            if ( !workerWindow ) {
                workerWindow = evt.source;
                ready(worker);
                return;
            }
            const pending = testPromises.get(evt.data.testId);
            if ( !pending ) {
                return;
            }
            // Check for errors.
            if ( evt.data.error ) {
                testPromises.delete(evt.data.testId);
                pending.reject(new Error(evt.data.error));
                return;
            }
            // Proxy assertions ({ result, actual, expected, message }) onto the real test.
            if ( evt.data.assertion ) {
                pending.assert.pushResult(evt.data.assertion);
                return;
            }
            // Conclude: promiseAction is "resolve" or "reject".
            testPromises.delete(evt.data.testId);
            pending[evt.data.promiseAction](evt.data.result);
        });
        // The iframe only loads (and registers itself) once it is in the document.
        document.body.appendChild(iframe);
    }));
} else {
    // Worker iframe.
    // Take advantage of a new (bikesheddable) interface to control QUnit in the worker iframe.
    QUnit.convertToWorker();
    window.addEventListener("message", function( evt ) {
        // Only process messages from our parent.
        if ( !evt.data || typeof evt.data !== "object" || evt.data.key !== workerKey ) {
            return;
        }
        // Take advantage of a new (bikesheddable) interface to get and run tests.
        const testId = evt.data.testId;
        const test = QUnit.getTest(testId);
        if ( !test ) {
            controller.postMessage({ key: workerKey, testId, error: "test not found" }, "*");
            return;
        }
        runTest(test, getAssertProxy(controller, workerKey, testId));
    });

    // Register ourselves; the key lets the parent match this iframe.
    controller.postMessage({ key: workerKey }, "*");
}
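The worker branch above calls `getAssertProxy` without defining it. One possible shape, sketched here as an assumption, forwards each assertion to the controller window as a plain `{ result, actual, expected, message }` object, which the controller could turn back into an assertion on the real test (for example with `assert.pushResult`). Only `ok` and `strictEqual` are proxied, and `actual`/`expected` must be structured-cloneable to survive `postMessage`.

```javascript
// Hypothetical worker-side assert proxy: every assertion becomes a message
// to the controller window, tagged with the worker key and test id.
function getAssertProxy(controller, workerKey, testId) {
  function pushResult(assertion) {
    controller.postMessage({ key: workerKey, testId, assertion }, "*");
  }
  return {
    pushResult,
    ok(value, message) {
      pushResult({ result: !!value, actual: value, expected: true, message });
    },
    strictEqual(actual, expected, message) {
      pushResult({ result: actual === expected, actual, expected, message });
    }
  };
}
```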

@trentmwillis
Member Author

Is it possible to load javascript, not execute tests and communicate just test examples, not js objects (let's say we have some unique identifier for each test)?

@mariokostelac yes. I hadn't really considered that approach since it potentially means a large amount of overhead for large test suites, but it could be possible. It also aligns with the example @gibson042 just provided (thanks for that!).

If we don't have to recreate state/context and just duplicate it by loading it into each worker, then the test-by-test parallelization should work.

However, I see two additional hurdles:

  1. How would other worker implementations know which assets need to be loaded? The iframe approach works because we simply reuse the test page's url. If you have a WebWorker implementation, then how does it know which scripts it should load? Or in Node, how would the child_process know which files to load?

  2. What if you want your main process to be of a different "type" than the worker process? For instance, what if you want the main process to be a web page but run your tests in Web Workers? You likely wouldn't be able to load all the tests into the main thread.

The first hurdle I think is solvable, but I don't really see a path forward for the second point. Let me know what y'all think!

@gibson042
Member

I think the answer is for worker plugin authors (i.e., "us") to scrape the relevant information by whatever means makes sense for the master–worker pair (maybe URL for page–page as in my example, DOM selection for page–WebWorker, CLI args for Node–Node, etc.). And it won't work for every single test, but that's OK because this is just an optimization... we will always have the host runner, and if necessary can add worker selection metadata to avoid sending certain tests into the parallel queue.

P.S. As for the client interface, I'd like to avoid new surface area like QUnit.parallel. Because the default worker count is one, opt-in should be necessary for forcing serial execution instead of the other way around.
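A minimal sketch of that opt-in, assuming a made-up per-test `serial: true` flag as the "worker selection metadata": tests are parallel-eligible by default, and flagged tests run one at a time on the host runner after the parallel queue drains. `partitionTests`, `runSuite`, `runParallel`, and `runOnHost` are all hypothetical.

```javascript
// Split tests by the hypothetical `serial` flag.
function partitionTests(tests) {
  const parallel = [];
  const serial = [];
  for (const test of tests) {
    (test.serial ? serial : parallel).push(test.id);
  }
  return { parallel, serial };
}

// Run parallel-eligible tests on the worker pool, then serial ones on the host.
async function runSuite(tests, workerCount, runParallel, runOnHost) {
  const { parallel, serial } = partitionTests(tests);
  await runParallel(parallel, workerCount);
  for (const id of serial) {
    await runOnHost(id);
  }
}
```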

@trentmwillis
Member Author

Great points. We'll have to note any possible limitations and edge cases, but I think the approach outlined here will provide a better user experience than my suggestion above. I believe this gives us a good starting point for an initial implementation.

@gabrielcsapo
Contributor

gabrielcsapo commented Feb 26, 2020

Wanted to follow up, as this is something my team is interested in adding to QUnit. Is QUnit still interested in adding this as core functionality? Opting in via some flag seems like the most straightforward way to add this without unintended consequences.

@trentmwillis
Member Author

I think there is still interest in this. I prototyped it out at one point, but never got around to tidying it up.

In my mind, if we do add this, then the default should essentially be running in parallel with 1 worker. I don't think there should be two different sets of logic.

@gabrielcsapo
Contributor

@trentmwillis makes sense to me. Would you be free to go over an RFC I have for adding this feature?

@trentmwillis
Member Author

@gabrielcsapo feel free to post anything here. We don't really have a formal RFC process.

@Krinkle Krinkle added the Type: Meta Seek input from maintainers and contributors. label Jun 25, 2020
@Krinkle
Member

Krinkle commented Sep 15, 2020

A rough idea for how a "qunit parallel node" plugin could work. (I'd prefer this be a plugin, but we will most likely need to add a few things in core to make it work, which we track together under this ticket.)

As end-user you'd use qunit --require <our plugin> (or later .qunitrc.js with plugins: [ <our plugin> ]).

In the main process, the plugin would use a new core hook to override the way tests are spawned.

The signature of this hook could be that it is given something to run and returns an EventEmitter. The default implementation would do what we do today, in-process. The only difference would be that we'd formalise the use of an EventEmitter between this and the main EventEmitter used by the current reporter(s).
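A minimal sketch of that hook shape, assuming a runner function that takes a task and returns an EventEmitter, with the main process forwarding its events to the emitter the reporters listen on. `setTestRunner`, `run`, and the task shape are hypothetical; `testEnd` and `runEnd` mirror QUnit's reporter event names, but the payloads here are simplified.

```javascript
const { EventEmitter } = require("events");

// Default runner: execute the task's tests in-process.
let testRunner = function runInProcess(task) {
  const emitter = new EventEmitter();
  // Defer so the caller can attach listeners before events fire.
  setImmediate(() => {
    for (const test of task.tests) {
      let status = "passed";
      try {
        test.fn();
      } catch (e) {
        status = "failed";
      }
      emitter.emit("testEnd", { name: test.name, status });
    }
    emitter.emit("runEnd", { testCount: task.tests.length });
  });
  return emitter;
};

// A plugin could swap in a runner that spawns a subprocess instead.
function setTestRunner(fn) {
  testRunner = fn;
}

// Forward whatever the runner emits to the main (reporter-facing) emitter.
function run(task, mainEmitter) {
  const child = testRunner(task);
  for (const name of ["testEnd", "runEnd"]) {
    child.on(name, data => mainEmitter.emit(name, data));
  }
  return child;
}
```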

The plugin could then use the hook to instead spawn a subprocess with the same Node.js and QUnit paths and make that run the test(s) in question. The plugin could also add a special environment variable to the subprocess so that its copy in the child process understands that it is a child process, and thus sets up an IPC channel with the main one to send the expected event emitter data.
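A sketch of the subprocess side, assuming a made-up `QUNIT_PARALLEL_CHILD` environment variable and a `{ type, data }` message shape. `isChildProcess`, `spawnWorker`, and `forwardEvent` are hypothetical; `child_process.fork` is the real Node.js API, and it opens the IPC channel that `process.send` uses in the child.

```javascript
const { fork } = require("child_process");

// Hypothetical env variable marking a forked worker process.
const CHILD_FLAG = "QUNIT_PARALLEL_CHILD";

function isChildProcess(env = process.env) {
  return env[CHILD_FLAG] === "1";
}

// Parent side: re-run the same Node script (e.g. the CLI entry point) with
// only the given files, marked as a child, and relay its IPC messages.
function spawnWorker(files, onEvent) {
  const child = fork(process.argv[1], files, {
    env: { ...process.env, [CHILD_FLAG]: "1" }
  });
  child.on("message", msg => onEvent(msg.type, msg.data));
  return child;
}

// Child side: forward a reporter event to the parent; no-op without IPC.
function forwardEvent(type, data) {
  if (typeof process.send === "function") {
    process.send({ type, data });
  }
}
```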

In terms of granularity, I'm not sure what the right answer is: By file, or by moduleId/testId?

By file seems the simplest. Doing so might not achieve optimal speed, though, if some modules are much larger than others. On the other hand, it might actually be faster in practice, because each worker would only need to process one test file, instead of executing all test files only to have the filter run a small subset of them.

If we go for supporting splitting by file, then the core hook would be placed before the CLI loads files, and would default to just requiring the file. The plugin would then spawn the subprocess the same as the parent, but with an environment variable to tell its child that it is a child, and the files argument replaced with just the given one.

If we go for supporting splitting by moduleId or testId, then the hook would be much later in the CLI process, around the time where we would normally start executing tests. The hook would then allow the plugin to decide whether to run the tests, e.g. based on random sampling, or hashing, or based on having organised the full set of registered tests into some buckets and determining whether or not the current process should run a given one. Whatever it wants to do. The plugin would then need to communicate its decision on what to run to its child, e.g. by sharing the hash through an env variable, or by sending a JSON list of module IDs or test IDs to the child process early on. (The child process's requiring of the plugin would e.g. read this from stdin or some such and block until done, and then let the QUnit CLI do the rest, forwarding the events back up through IPC.)
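A sketch of the hashing option: every process hashes a test id the same way, so each child can decide locally whether a test belongs to it, and only the worker index and count need to be passed down (for example through environment variables). The 32-bit FNV-1a hash is an illustrative choice, and `bucketOf`/`shouldRun` are hypothetical; buckets come out roughly, not exactly, equal in size.

```javascript
// 32-bit FNV-1a hash of the test id, reduced to a bucket index.
function bucketOf(testId, workerCount) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < testId.length; i++) {
    hash ^= testId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
  }
  return hash % workerCount;
}

// Each child runs exactly the tests that hash into its own bucket.
function shouldRun(testId, workerIndex, workerCount) {
  return bucketOf(testId, workerCount) === workerIndex;
}
```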

Thoughts?

Development

No branches or pull requests

6 participants