
Run tests in parallel #234

Open · wants to merge 21 commits into main

Conversation

alysbrooks
Member

@alysbrooks alysbrooks commented Jul 21, 2021

Very much a work in progress! Mostly opening this PR so I have a place to take and share notes.

@alysbrooks
Member Author

A few notes:
- Somewhat unusually, this is currently rebased off of Arne's cleanup branch (#233) rather than main, because I thought it might benefit from some of the cleanup work and the use of Portal.
- The basic strategy is as follows: for kaocha.type/ns tests (basic Clojure tests), fire off run-tests in a new thread using future, then return the futures as normal. At the top level (the test suite), recursively search for futures and deref them. That way we deref at the last possible moment: after all the tests have been kicked off, but before any external code that's expecting actual test results.
- One complication is that we don't want to execute in a new thread if we're already in a worker thread. I'm not sure whether nested futures are supposed to work, but it doesn't seem to work in practice, and I'm not sure it would benefit us. The approach I've taken is to modify the configuration to remove :parallel.
- I've benchmarked this on contrived tests that favor a parallel approach (very simple tests with Thread/sleep calls). This is mostly to sniff out bugs (a lot of attempts early this afternoon resulted in no speedup because Kaocha was waiting for each thread to finish before running the next one); real-world tests probably won't be this parallelizable.
- I cannot run the whole Kaocha test suite with :parallel enabled. Ideally, you'd be able to enable this option and Kaocha would just ignore it for tests that don't support it, but ensuring it can handle all the corner cases in our test suite may be too ambitious for the initial PR.
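The "fire off futures, deref at the last possible moment" strategy can be sketched roughly like this (hypothetical helper names; not the PR's actual code):

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical stand-in for running one namespace's tests.
(defn run-tests-in-ns [testable]
  {:ns (:ns testable) :pass 1 :fail 0})

(defn run-ns-testable [testable]
  ;; Kick off the run in a new thread; return the future in place of
  ;; the result.
  (future (run-tests-in-ns testable)))

(defn resolve-futures [results]
  ;; At the top level, recursively deref any futures, so we block at
  ;; the last possible moment, after all tests have been kicked off.
  (walk/postwalk #(if (future? %) (deref %) %) results))

(def results (mapv run-ns-testable [{:ns 'a.test} {:ns 'b.test}]))
(resolve-futures results)
```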

@alysbrooks alysbrooks changed the title Parallelize WIP: Run tests in parallel Jul 21, 2021
@alysbrooks
Member Author

I did a little bit more testing and realized what I have doesn't work consistently. There seems to be a nondeterministic bug with loading certain specs or vars. I don't really understand it; my intuition is that all the loading would happen before we kick off any futures, so this kind of bug shouldn't really happen.

We do some fancy dynamic loading (I think jit falls into this category, too), and it's possible some of these techniques aren't thread safe.

@alysbrooks
Member Author

Another thing to look for is places we're touching atoms or dynamic vars in code that's called within the futures. Modifying a var or atom in a secondary thread isn't inherently problematic, but it's possible we've accidentally used them in a way that makes them reliant on being modified by a single thread. (For example, two atoms won't stay in sync; you should either use one atom or add some sort of other abstraction to keep them in sync.)
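A minimal illustration of the two-atoms pitfall described above:

```clojure
;; Two separate atoms cannot be updated atomically together: another
;; thread can observe one updated and the other not.
(def pass-count (atom 0))
(def total-count (atom 0))

(defn record-pass-racy []
  (swap! pass-count inc)
  ;; <- a reader here sees pass-count ahead of total-count
  (swap! total-count inc))

;; Keeping related counters in one atom makes each update atomic.
(def counts (atom {:pass 0 :total 0}))

(defn record-pass []
  (swap! counts #(-> % (update :pass inc) (update :total inc))))

(dotimes [_ 10] (record-pass))
@counts ; => {:pass 10, :total 10}
```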

@alysbrooks
Member Author

Now it seems to be consistently giving me an error about not finding the spec for :kaocha.type/var.

@alysbrooks
Member Author

I added some retry code. I think what's happening is that there's a race condition when running require. I suppose we could create a thread-safe version of require. It turns out that serialize-require is pretty easy to replicate, so I did that.
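For reference, a simple replication of Clojure 1.10's serialized-require looks roughly like this (safe-require is a hypothetical name; this isn't necessarily the PR's exact code):

```clojure
;; require isn't thread-safe when two threads load the same namespace
;; at once. Serializing all requires behind one lock avoids the race.
(defonce ^:private require-lock (Object.))

(defn safe-require [& args]
  (locking require-lock
    (apply require args)))

;; Safe to call concurrently from multiple test threads:
(safe-require 'clojure.set)
```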

@alysbrooks
Member Author

alysbrooks commented Aug 27, 2021

Did a slightly more realistic benchmark with 1,000 simple I/O tests (simply slurping README.md).

with :parallel:

Executed in    9.39 secs   fish           external 
   usr time   23.17 secs    0.00 micros   23.17 secs 
   sys time    0.57 secs  418.00 micros    0.57 secs 

without :parallel:

Executed in   67.13 secs   fish           external 
   usr time   81.38 secs  569.00 micros   81.38 secs 
   sys time    0.87 secs   94.00 micros    0.87 secs 

And another that grabs random Wikipedia articles. (Only 25 tests, so as not to be rude.)

with :parallel:

Executed in    5.78 secs   fish           external 
   usr time   14.03 secs  494.00 micros   14.03 secs 
   sys time    0.42 secs   75.00 micros    0.42 secs 

without :parallel:

Executed in   20.25 secs   fish           external 
   usr time   15.82 secs  596.00 micros   15.81 secs 
   sys time    0.66 secs   83.00 micros    0.66 secs 

On the one hand, these numbers aren't very surprising if you've seen people talk about the advantages of async and/or multithreading. On the other hand, reading files isn't an uncommon task in integration testing, so it's plausible you could see a significant improvement on that part of your test suite.
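A hypothetical reconstruction of that benchmark suite: generate n trivial I/O tests that each slurp the same file. (The file name and macro name are stand-ins; the real suite isn't shown in this thread.)

```clojure
(require '[clojure.test :refer [deftest is run-tests]])

(spit "readme-fixture.md" "hello")   ; stand-in for README.md

(defmacro gen-io-tests [n]
  ;; Expands into n deftests, each performing one slurp.
  `(do ~@(for [i (range n)]
           `(deftest ~(symbol (str "io-test-" i))
              (is (= "hello" (slurp "readme-fixture.md")))))))

(gen-io-tests 10)
(run-tests)
```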

@alysbrooks
Member Author

If you have a test suite that would benefit from parallelization, I'd be interested to hear from you. What tests do you have? When it's a little more polished, would you be willing to test or benchmark parallelized test runs?

It's easy enough to document how to turn on parallelized test runs (set :parallel to true in tests.edn), but for this feature to be useful, the documentation will also have to cover what kinds of tests benefit from it and what kinds of common tests aren't thread-safe. I'm hoping to hear from Kaocha users to help research this part of the documentation.
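As described, turning it on is a one-line change to tests.edn (a sketch; :parallel is the key discussed in this PR):

```clojure
;; tests.edn
#kaocha/v1
{:parallel true}
```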

Having real-world tests will also let me determine whether parallelizing at the namespace level is enough (current approach) or if we need to consider working at the test level.

@nwjsmith

I'll be able to test and benchmark in a few days. The primary reason I'd like to see test parallelization is to speed up my suite of test.check tests. Parallelization at the per-test level would have the largest impact on these, but even per-namespace should provide a speed bump. Really excited to see your work on this, thank you!

@alysbrooks
Member Author

@nwjsmith You're welcome! I somehow blanked on test.check as a use case, even though I'm a big fan of property-based testing and we use it on some of our projects, e.g., Regal. I believe test.check tests should work already (because they're gathered into namespace testables, which are parallelized), but enabling test.check tests at the test level would be another step because they're a different type.
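For example, a test.check property (assuming test.check is on the classpath); Kaocha sees the whole property as one test, even though it runs the body 100 times with random inputs:

```clojure
(require '[clojure.test.check.clojure-test :refer [defspec]]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; One defspec = one test from the runner's perspective, regardless of
;; how many random inputs it checks.
(defspec sort-is-idempotent 100
  (prop/for-all [v (gen/vector gen/small-integer)]
    (= (sort v) (sort (sort v)))))
```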

@alysbrooks
Member Author

alysbrooks commented Oct 19, 2021

Two updates:

You can now use the :parallel-children-exclude key to ensure that any children will not be further parallelized.

Setting a type of test whose children shouldn't be parallelized probably isn't the configuration we ultimately want, but it's easier to implement because currently we decide whether to parallelize something when we're given a bunch of testables. However, it does let you choose between parallelizing at the namespace level (add :kaocha.type/ns) and at the test level (leave it blank).
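Based on that description, namespace-level parallelization would look something like this in tests.edn (the exact shape of the :parallel-children-exclude value is a guess):

```clojure
;; tests.edn — parallelize namespaces, but not the tests within them
#kaocha/v1
{:parallel true
 :parallel-children-exclude [:kaocha.type/ns]}
```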

Regal's test suite passes in parallel.

Our first real-world performance test:

↪ hyperfine 'clojure -A:test -m kaocha.runner clj' 'clojure -A:test-local -m kaocha.runner clj'
Benchmark #1: clojure -A:test -m kaocha.runner clj
  Time (mean ± σ):     43.235 s ±  6.081 s    [User: 76.913 s, System: 2.508 s]
  Range (min … max):   32.442 s … 54.122 s    10 runs
 
Benchmark #2: clojure -A:test-local -m kaocha.runner clj
  Time (mean ± σ):     40.678 s ± 11.089 s    [User: 83.026 s, System: 2.449 s]
  Range (min … max):   27.970 s … 57.834 s    10 runs
 
Summary
  'clojure -A:test-local -m kaocha.runner clj' ran
    1.06 ± 0.33 times faster than 'clojure -A:test -m kaocha.runner clj'

(test-local uses a local copy of Kaocha with the parallelize branch checked out.)

@alysbrooks
Member Author

alysbrooks commented Oct 20, 2021

The previous test had high variance, so I quit some resource-intensive programs and set the tests to use a fixed seed.

With the variance addressed, we get a 10 percent speedup:

↪ hyperfine 'clojure -A:test -m kaocha.runner clj' 'clojure -A:test-local -m kaocha.runner clj'
Benchmark #1: clojure -A:test -m kaocha.runner clj
  Time (mean ± σ):     43.979 s ±  0.905 s    [User: 80.840 s, System: 2.393 s]
  Range (min … max):   42.434 s … 45.172 s    10 runs
 
Benchmark #2: clojure -A:test-local -m kaocha.runner clj
  Time (mean ± σ):     39.900 s ±  1.060 s    [User: 84.105 s, System: 2.453 s]
  Range (min … max):   37.989 s … 41.194 s    10 runs
 
Summary
  'clojure -A:test-local -m kaocha.runner clj' ran
    1.10 ± 0.04 times faster than 'clojure -A:test -m kaocha.runner clj'

Regal isn't the ideal example for this. Even though it has a time-consuming test suite, the majority of its time is taken up by a single test. That test is a property-based test that really consists of running the same simple check over a number of random inputs, but from Kaocha's perspective, it's a single test.

@alysbrooks
Member Author

I did some more tests, and parallelization's running fine! 🎉 Tests consistently show parallelization is faster (although sometimes the confidence interval overlaps with it being slower), if only a little. This is actually really good news: it means we are not creating so much overhead that it is likely to slow down tests. I'm still not planning on turning it on by default, but it means that people don't necessarily have to do a ton of benchmarking to ensure it won't make things worse for their test suite.

@alysbrooks
Member Author

I noticed some cases where parallelization does indeed add overhead (I think tests ran 5 to 10 percent slower?), so I spent some time investigating. Last week I did some benchmarking with Criterium, and today I did some profiling/sampling with VisualVM to see whether part of our implementation is adding overhead and could be optimized. Profiling was a dead end (too slow), but sampling revealed that one of the methods with the most time spent in it was java.lang.Thread.<init>. A lot of time was spent in java.util.concurrent.AbstractExecutorService.submit(), too.

I think I should put this on hold because it could easily become a rabbit hole.

@AndreaCrotti
Contributor

I wonder if one possible alternative solution would be to support something like https://www.juxt.pro/blog/parallel-kaocha
CircleCI lets you run tests in parallel by running a different subset of tests in each container.
The trickier thing CircleCI does is merge the test results back and present them as if the tests had been run normally.

Maybe supporting something like this in Kaocha directly would be easier and wouldn't have the same threading performance issues?

@alysbrooks
Member Author

@AndreaCrotti That's a clever solution. What changes were you thinking to make? Adding the circleci-parallel namespace code to Kaocha itself?

While I am interested in that solution, I plan to go ahead with the parallel option.

@AndreaCrotti
Contributor

So @alysbrooks, I guess there are three things required for this to work:

  1. some way to group tests so that each group takes more or less the same time (which CircleCI does for you using past timing information)
  2. some code to run multiple Kaochas in parallel with the various groups of namespaces
  3. some code to gather the results from all the Kaochas and give a unified output.

Maybe only 3 could be done directly in Kaocha, but ideally the whole thing could probably be done with a plugin.
For example, you could:

  • run all your tests without parallelization and record the timings
  • group and run all the kaochas and report everything.

One issue with the approach in this PR is that every test suite with database (or similar) writes will probably need quite a bit of adjusting to run in parallel.
The approach I'm suggesting has the same issue in theory, but if you parameterize your test resource configuration, you could use a totally different test infrastructure for each Kaocha run and avoid the problem entirely (it would also avoid the threading overhead issue).

@AndreaCrotti
Contributor

In short, I guess it would be great to have this PR merged, but maybe I can investigate the other way to parallelize tests as a separate plugin, as a possible alternative strategy.

@alysbrooks
Member Author

@AndreaCrotti Interesting. I could see 2 being done with a Babashka script that we ship, and I think 3 could be a plugin, in theory. It would stretch the idea of plugins, because the instance of Kaocha doing the combining wouldn't necessarily run any tests; it would just grab the test results and insert them so the reporters could summarize them. Actually, a plugin that takes saved test results and feeds them back into Kaocha as though those tests just ran could be pretty interesting.

@alysbrooks
Member Author

Threw together a simple graph of the Regal results:

(Image: "20211_01_11 Regal Benchmarking" graph)

@alysbrooks alysbrooks marked this pull request as ready for review March 8, 2022 22:20
@alysbrooks alysbrooks changed the title WIP: Run tests in parallel Run tests in parallel Mar 8, 2022
@stig
Contributor

stig commented Apr 14, 2022

I'm interested in trying this in our CI. Is a snapshot published for this branch?

Contributor

@stig stig left a comment


I'm excited about this PR :-)

I took the liberty to add a few stylistic nitpick suggestions.

(throw+ {:kaocha/early-exit 0}))
(when (:parallel config)
(output/warn (str "Parallelization enabled. This feature is in "
"beta If you encounter errors, try "

This comment was marked as outdated.

Comment on lines +1 to +5

(ns benchmark
  (:require [criterium.core :as c])
  (:import [java.util.concurrent Executors]))

This comment was marked as outdated.

@@ -34,6 +35,7 @@
[nil "--[no-]fail-fast" "Stop testing after the first failure."]
[nil "--[no-]color" "Enable/disable ANSI color codes in output. Defaults to true."]
[nil "--[no-]watch" "Watch filesystem for changes and re-run tests."]
[nil "--[no-]parallel" "Run tests in parallel. Warning: This feature is beta."]

This comment was marked as outdated.

@@ -4,10 +4,10 @@

(defn run [testable test-plan]
(t/do-report {:type :begin-test-suite})
(let [results (testable/run-testables (:kaocha.test-plan/tests testable) test-plan)
(let [results (testable/run-testables (:kaocha.test-plan/tests testable) test-plan)

This comment was marked as outdated.

@alysbrooks
Member Author

I added a profiling plugin you can run by adding :kaocha.plugin.alpha/parallel-profiling to :plugins. I don't want to add too much to this PR, but it's a plugin, so the risk is relatively low.

@jumarko

jumarko commented Jul 31, 2023

Are there any plans to wrap up support for parallel test runs soon or should I resort to eftest if I need such a test runner?

@AndreaCrotti
Contributor

Are there any plans to wrap up support for parallel test runs soon or should I resort to eftest if I need such a test runner?

In case it helps you can still get some parallelism with circleci and kaocha for example https://www.juxt.pro/blog/parallel-kaocha/

Comment on lines +260 to +264
futures (doall
(map (fn [t]
(future
(run-testable (assoc t ::thread-info (current-thread-info)) test-plan)))
testables))]
(doall (map deref futures))))


Here we are creating an unbounded number of threads. If the test plan contains too many tests (hundreds or thousands), this will probably be a problem.

OS threads are expensive to create, and when the number of running threads is significantly greater than the number of available processors, we also incur a lot of context-switching cost.

This probably explains what you observed in your previous comment, because for each future invocation we're creating a new thread to be executed (via ExecutorService.submit):

I noticed some cases where parallelization does indeed add overhead (I think tests ran 5 to 10 percent slower?)

one of the methods with the most time spent in it was java.lang.Thread.<init>. A lot of time was spent in java.util.concurrent.AbstractExecutorService.submit()

One alternative we could explore instead is to create a fixed thread pool ExecutorService limited to the number of available cores.

We can then map each test to a function that runs it, and pass that collection to invokeAll (this works because a Clojure function implements Callable). invokeAll returns a list of Futures, so we can run (doall (map deref futures)) on it the same way we're already doing.

We're still going to create a Callable and a Future per task, but we'll limit the number of threads created and the parallelism of execution to the available number of processors.

This should avoid the overhead of too much parallelism and hopefully result in parallel tests always running faster than serial tests.

What do you think?


Something like

(import '(java.util.concurrent ExecutorService Executors))

(defn task [num]
  (fn []
    (.println System/out num)
    (Thread/sleep 500)
    (* num num)))

(let [num-cpus (.availableProcessors (Runtime/getRuntime))
      executor (Executors/newFixedThreadPool num-cpus)
      data (range 100)
      tasks (map task data)
      futures (.invokeAll executor tasks)
      results (doall (map deref futures))]
  (println results))

Contributor


I do something similar in Splint:

(defn pmap*
  "Efficient version of pmap which avoids the overhead of lazy-seq.

  (doall (pmap (fn [_] (Thread/sleep 100)) coll)) => 3.34 secs
  (pmap* (fn [_] (Thread/sleep 100)) coll) => 202 ms"
  [f coll]
  (let [executor (Executors/newCachedThreadPool)
        futures (mapv #(.submit executor (reify Callable (call [_] (f %)))) coll)
        ret (mapv #(.get ^Future %) futures)]
    (.shutdownNow executor)
    ret))

Member Author


I couldn't remember the default executor for future; it appears to be a cached thread pool (like newCachedThreadPool), so it saves some overhead for short-lived threads, but probably creates too many threads in other situations.
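For illustration, the pool that future uses (Clojure's send-off executor) can be swapped process-wide for a bounded one; this is a sketch, and note it affects send-off as well:

```clojure
(import '(java.util.concurrent Executors))

;; future runs tasks on clojure.lang.Agent/soloExecutor, a cached,
;; unbounded pool. Replace it with a fixed pool sized like Clojure's
;; own fixed agent pool (processors + 2).
(let [n (+ 2 (.availableProcessors (Runtime/getRuntime)))]
  (set-agent-send-off-executor! (Executors/newFixedThreadPool n)))

@(future (+ 1 2)) ; futures still work, now on a bounded pool
```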

Member Author


For future reference, Clojure core's fixed thread pool uses the number of processors plus 2: https://github.com/clojure/clojure/blob/08a2d9bdd013143a87e50fa82e9740e2ab4ee2c2/src/jvm/clojure/lang/Agent.java#L49C1-L51C77

alysbrooks and others added 21 commits November 22, 2023 00:26
Add some information about the thread each test ran on. May want to move
this to something not specific to parallelization, otherwise
non-parallel tests won't have any thread info.
Internal refers to whether there were delays between running tests in
the same thread. It doesn't measure waiting for I/O or many kinds of
contention.

External refers to how much of the overall time the thread spent.

Why do I think this is useful? Well, if you've parallelized your test
suite and you end up with a thread that runs from time 0 to time 100 and
another that runs from time 1 to time 10, the burden is heavily on the
first thread, suggesting the load is not well balanced. Both may have
good internal utilization, but the second has poor external
utilization.

Part of the reason this is WIP is that there may be better (ideally
standard) terms for "internal" and "external".

There are some calculation issues and hacks:
* Some testables are missing the ::start and ::end for some reason. I
  don't think this should happen?
* I don't think this version counts the number of threads accurately.
Clean up the changes around parallelization, make opting-in from test types
explicit, allow config on testable and metadata level, simplify future handling.
Status: 🏗 In progress
9 participants