
Speed up green task spawning #12172

Merged: bors merged 4 commits into rust-lang:master from alexcrichton:green-improvements on Feb 14, 2014

Conversation

alexcrichton
Member

These commits pick off some low-hanging fruit that was slowing down green thread spawning. The major speedup comes from fixing a bug in stack caching where we never used any cached stacks!

The program I used to benchmark is at the end. It was compiled with `rustc --opt-level=3 bench.rs --test` and run as `RUST_THREADS=1 ./bench --bench`. I chose to use `RUST_THREADS=1` due to #11730 as the profiles I was getting interfered too much when all the schedulers were in play (and shouldn't be after #11730 is fixed). All of the units below are in ns/iter as reported by `--bench` (lower is better).

|               | green | native | raw    |
| ------------- | ----- | ------ | ------ |
| osx before    | 12699 | 24030  | 19734  |
| linux before  | 10223 | 125983 | 122647 |
| osx after     |  3847 | 25771  | 20835  |
| linux after   |  2631 | 135398 | 122765 |

Note that this is *not* a benchmark of spawning green tasks vs native tasks. I put in the native numbers just to get a ballpark of where green tasks are. This benchmark is *clearly* benefiting from stack caching. Also, OSX is clearly not 5x faster than linux; I think my Linux VM is just much slower.

All in all, this ended up being a nice 4x speedup for spawning a green task when you're using a cached stack.

```rust
extern mod extra;
extern mod native;
use std::rt::thread::Thread;

#[bench]
fn green(bh: &mut extra::test::BenchHarness) {
    let (p, c) = SharedChan::new();
    bh.iter(|| {
        let c = c.clone();
        spawn(proc() {
            c.send(());
        });
        p.recv();
    });
}

#[bench]
fn native(bh: &mut extra::test::BenchHarness) {
    let (p, c) = SharedChan::new();
    bh.iter(|| {
        let c = c.clone();
        native::task::spawn(proc() {
            c.send(());
        });
        p.recv();
    });
}

#[bench]
fn raw(bh: &mut extra::test::BenchHarness) {
    bh.iter(|| {
        Thread::start(proc() {}).join()
    });
}
```

@alexcrichton
Member Author

I also think that there are only 4 allocations remaining:

* `proc()` for `task::spawn` - I think this is required no matter what
* `~Task` - we may be able to put the `Task` at the top of the stack rather than allocating it on the heap (see the sketch after this list). This would be difficult.
* `~GreenTask` - I think this could follow the same strategy as `Task` by allocating it at the top of the stack, but again, very difficult and dubiously worth it.
* `~Registers` - apparently this is needed to align the register save area to 16 bytes. A heap allocation should not be necessary for that at all.
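
A purely hypothetical sketch of the "put it at the top of the stack" idea, assuming a stack segment has already been mapped: carve an aligned header out of the top of that segment instead of boxing it separately. `TaskHeader`, `saved_registers`, and `STACK_SIZE` are illustrative names of mine, not anything from libgreen.

```rust
use std::mem::{align_of, size_of};

// Illustrative stand-in for the per-task state that would otherwise be boxed.
#[repr(C, align(16))] // 16-byte alignment mirrors the `Registers` requirement
struct TaskHeader {
    saved_registers: [u64; 4],
    finished: bool,
}

fn main() {
    const STACK_SIZE: usize = 64 * 1024;
    // Stand-in for an mmap'd stack segment.
    let mut stack = vec![0u8; STACK_SIZE];
    let base = stack.as_mut_ptr();

    // Reserve space for the header at the very top of the segment, rounding
    // the address down to the header's alignment.
    let top = base as usize + STACK_SIZE;
    let header_addr = (top - size_of::<TaskHeader>()) & !(align_of::<TaskHeader>() - 1);
    let header = unsafe { base.add(header_addr - base as usize) } as *mut TaskHeader;

    unsafe {
        header.write(TaskHeader { saved_registers: [0; 4], finished: false });
        // The task's usable stack now ends just below `header_addr`; no
        // separate heap allocation was needed for the header itself.
        println!("usable stack: {} bytes", header_addr - base as usize);
    }
}
```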

@brson
Contributor

brson commented Feb 11, 2014

Very promising results.

@pcwalton
Contributor

🤘 I'd love to benchmark this against raw pthread stack spawning.
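
For reference, a rough sketch of that comparison, assuming the `libc` crate is available; this is not part of the PR, just one way to time bare `pthread_create`/`pthread_join` pairs with an empty start routine (`ITERS` and `empty` are names of mine).

```rust
use std::time::Instant;

// Empty pthread start routine.
extern "C" fn empty(_arg: *mut libc::c_void) -> *mut libc::c_void {
    std::ptr::null_mut()
}

fn main() {
    const ITERS: u32 = 10_000;
    let start = Instant::now();
    for _ in 0..ITERS {
        unsafe {
            let mut tid: libc::pthread_t = std::mem::zeroed();
            // Spawn and immediately join a thread with default attributes.
            libc::pthread_create(&mut tid, std::ptr::null(), empty, std::ptr::null_mut());
            libc::pthread_join(tid, std::ptr::null_mut());
        }
    }
    println!("{} ns/iter", start.elapsed().as_nanos() / ITERS as u128);
}
```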

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 12, 2014
Currently, a scheduler will hit epoll() or kqueue() at the end of *every task*.
The reason is that the scheduler will context switch back to the scheduler task,
terminate the previous task, and then return from run_sched_once. In doing so,
the scheduler will poll for any active I/O.

This shows up painfully in benchmarks that have no I/O at all. For example, this
benchmark:

    for _ in range(0, 1000000) {
        spawn(proc() {});
    }

In this benchmark, the scheduler is currently wasting a good chunk of its time
hitting epoll() when there's always active work to be done (run with
RUST_THREADS=1).

This patch uses the previous two commits to alter the scheduler's behavior to
only return from run_sched_once if no work could be found when trying really
really hard. If there is active I/O, this commit will perform the same as
before, falling back to epoll() to check for I/O completion (to not starve I/O
tasks).

In the benchmark above, I got the following numbers:

    12.554s on today's master
    3.861s  with rust-lang#12172 applied
    2.261s  with both this and rust-lang#12172 applied

cc rust-lang#8341
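
In pseudocode, the behavior described above looks roughly like the following; this is a sketch of the idea, not the actual libgreen scheduler (`Scheduler` and `poll_event_loop` are illustrative names).

```rust
use std::collections::VecDeque;

struct Task;

struct Scheduler {
    run_queue: VecDeque<Task>,
    io_active: bool,
}

impl Scheduler {
    fn run_sched_once(&mut self) {
        // Try hard to find runnable work before touching the event loop.
        while let Some(task) = self.run_queue.pop_front() {
            self.run(task);
        }
        // Only once no work can be found do we fall back to polling for I/O,
        // so I/O-bound tasks are still not starved.
        if self.io_active {
            self.poll_event_loop(); // epoll()/kqueue()
        }
    }

    fn run(&mut self, _task: Task) {
        // Context switch into the task and run it to its next yield point.
    }

    fn poll_event_loop(&mut self) {
        // Block in epoll()/kqueue() waiting for I/O completion.
    }
}
```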
The condition was the wrong direction and it also didn't take equality into
account. Tests were added for both cases.

For the small benchmark of `task::try(proc() {}).unwrap()`, this takes the
iteration time on OSX from 15119 ns/iter to 6179 ns/iter (timed with
RUST_THREADS=1)

cc rust-lang#11389
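
A reconstruction of the condition being fixed here, as a sketch of the idea rather than the actual stack-pool code: a cached stack should be reused whenever it is at least as large as the requested size.

```rust
struct Stack {
    size: usize,
    // ... the mapped memory itself would live here ...
}

struct StackPool {
    stacks: Vec<Stack>,
}

impl StackPool {
    /// Reuse a cached stack if one is big enough. The bug described above was
    /// that this comparison pointed the wrong way and excluded equality, so
    /// nothing ever matched and every spawn mapped a fresh stack.
    fn take(&mut self, min_size: usize) -> Option<Stack> {
        let idx = self.stacks.iter().position(|s| s.size >= min_size)?;
        Some(self.stacks.swap_remove(idx))
    }

    /// Return a stack to the cache when a task exits.
    fn give(&mut self, stack: Stack) {
        self.stacks.push(stack);
    }
}
```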
One of these is allocated for every task; this is an attempt to cut down on per-task allocations.

cc rust-lang#11389

Instead, use an enum to allow either running a procedure or sending the task
result over a channel. I expect the common case to be sending on a channel
(e.g. task::try), so don't require an extra allocation in that case.

cc rust-lang#11389
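
A hypothetical sketch of that enum; the names (`OnExit`, `finish`) are mine, not the actual libstd ones. The common channel case carries only a sender, while the general case still boxes a closure.

```rust
use std::sync::mpsc::Sender;

type TaskResult = Result<(), Box<dyn std::any::Any + Send>>;

enum OnExit {
    /// Rare case: run an arbitrary closure (still needs a box).
    Run(Box<dyn FnOnce(TaskResult) + Send>),
    /// Common case (e.g. task::try): just send the result, no allocation.
    Send(Sender<TaskResult>),
}

fn finish(action: OnExit, result: TaskResult) {
    match action {
        OnExit::Run(f) => f(result),
        OnExit::Send(tx) => { let _ = tx.send(result); }
    }
}
```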
Two unfortunate allocations were wrapping a proc() in a proc() with
GreenTask::build_start_wrapper, and then boxing this proc in a ~proc() inside of
Context::new(). Both of these allocations were a direct result of two
conditions:

1. The Context::new() function has a nice API of taking a procedure argument to
   start up a new context with. This inherently required an allocation by
   build_start_wrapper because extra code needed to be run around the edges of a
   user-provided proc() for a new task.

2. The initial bootstrap code only understood how to pass one argument to the
   next function. By modifying the assembly and entry points to understand more
   than one argument, more information is passed through in registers instead of
   allocating a pointer-sized context.

This is sadly where I end up throwing mips under a bus because I have no idea
what's going on in the mips context switching code and don't know how to modify
it.

Closes rust-lang#7767
cc rust-lang#11389
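
To illustrate the second point, a sketch of the shape of the change rather than the real entry code (`bootstrap_boxed` and `bootstrap_args` are names of mine): passing the pieces as separate word-sized arguments lets them travel in registers, where a single boxed closure would have forced a heap allocation per spawn.

```rust
struct GreenTask {
    // ... per-task state ...
}

// Before: the new context receives exactly one pointer, so anything extra has
// to be captured in a boxed closure allocated for every spawn.
fn bootstrap_boxed(start: Box<dyn FnOnce()>) {
    start();
}

// After: the entry point accepts several word-sized arguments directly, so a
// task pointer and its start function can be handed over in registers.
fn bootstrap_args(task: *mut GreenTask, entry: fn(*mut GreenTask)) {
    entry(task);
}
```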
bors added a commit that referenced this pull request Feb 14, 2014
@bors closed this Feb 14, 2014
@bors merged commit 301ff0c into rust-lang:master Feb 14, 2014
@alexcrichton deleted the green-improvements branch February 14, 2014 15:43