
Speed up green task spawning #12172

Merged: bors merged 4 commits into rust-lang:master from alexcrichton:green-improvements on Feb 14, 2014

Conversation

alexcrichton
Member

These commits pick off some low-hanging fruit that was slowing down green thread spawning. The major speedup comes from fixing a bug in stack caching where we never used any cached stacks!

The program I used to benchmark is at the end. It was compiled with `rustc --opt-level=3 bench.rs --test` and run as `RUST_THREADS=1 ./bench --bench`. I chose to use `RUST_THREADS=1` due to #11730 as the profiles I was getting interfered too much when all the schedulers were in play (and shouldn't be after #11730 is fixed). All of the units below are in ns/iter as reported by `--bench` (lower is better).

|               | green | native | raw    |
| ------------- | ----- | ------ | ------ |
| osx before    | 12699 | 24030  | 19734  |
| linux before  | 10223 | 125983 | 122647 |
| osx after     |  3847 | 25771  | 20835  |
| linux after   |  2631 | 135398 | 122765 |

Note that this is *not* a benchmark of spawning green tasks vs native tasks. I put in the native numbers just to get a ballpark of where green tasks are. This benchmark is *clearly* benefiting from stack caching. Also, OSX is clearly not 5x faster than linux; I think my Linux VM is just much slower.

All in all, this ended up being a nice 4x speedup for spawning a green task when you're using a cached stack.

```rust
extern mod extra;
extern mod native;
use std::rt::thread::Thread;

#[bench]
fn green(bh: &mut extra::test::BenchHarness) {
    let (p, c) = SharedChan::new();
    bh.iter(|| {
        let c = c.clone();
        spawn(proc() {
            c.send(());
        });
        p.recv();
    });
}

#[bench]
fn native(bh: &mut extra::test::BenchHarness) {
    let (p, c) = SharedChan::new();
    bh.iter(|| {
        let c = c.clone();
        native::task::spawn(proc() {
            c.send(());
        });
        p.recv();
    });
}

#[bench]
fn raw(bh: &mut extra::test::BenchHarness) {
    bh.iter(|| {
        Thread::start(proc() {}).join()
    });
}
```

@alexcrichton
Member Author

I also think that there are only 4 allocations remaining:

* `proc()` for `task::spawn` - I think this is required no matter what
* `~Task` - we may be able to put the `Task` at the top of the stack rather than allocating it on the heap (see the sketch after this list). This would be difficult.
* `~GreenTask` - I think this could follow the same strategy as `Task` by allocating it at the top of the stack, but again, very difficult and dubiously worth it.
* `~Registers` - apparently this is needed to align the register save area to 16 bytes. A heap allocation should not be necessary for that at all.
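
A purely hypothetical sketch of the "put it at the top of the stack" idea, assuming a stack segment has already been mapped: carve an aligned header out of the top of that segment instead of boxing it separately. `TaskHeader`, `saved_registers`, and `STACK_SIZE` are illustrative names of mine, not anything from libgreen.

```rust
use std::mem::{align_of, size_of};

// Illustrative stand-in for the per-task state that would otherwise be boxed.
#[repr(C, align(16))] // 16-byte alignment mirrors the `Registers` requirement
struct TaskHeader {
    saved_registers: [u64; 4],
    finished: bool,
}

fn main() {
    const STACK_SIZE: usize = 64 * 1024;
    // Stand-in for an mmap'd stack segment.
    let mut stack = vec![0u8; STACK_SIZE];
    let base = stack.as_mut_ptr();

    // Reserve space for the header at the very top of the segment, rounding
    // the address down to the header's alignment.
    let top = base as usize + STACK_SIZE;
    let header_addr = (top - size_of::<TaskHeader>()) & !(align_of::<TaskHeader>() - 1);
    let header = unsafe { base.add(header_addr - base as usize) } as *mut TaskHeader;

    unsafe {
        header.write(TaskHeader { saved_registers: [0; 4], finished: false });
        // The task's usable stack now ends just below `header_addr`; no
        // separate heap allocation was needed for the header itself.
        println!("usable stack: {} bytes", header_addr - base as usize);
    }
}
```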

@brson
Contributor

brson commented Feb 11, 2014

Very promising results.

@pcwalton
Contributor

🤘 I'd love to benchmark this against raw pthread stack spawning.
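
For reference, a rough sketch of that comparison, assuming the `libc` crate is available; this is not part of the PR, just one way to time bare `pthread_create`/`pthread_join` pairs with an empty start routine (`ITERS` and `empty` are names of mine).

```rust
use std::time::Instant;

// Empty pthread start routine.
extern "C" fn empty(_arg: *mut libc::c_void) -> *mut libc::c_void {
    std::ptr::null_mut()
}

fn main() {
    const ITERS: u32 = 10_000;
    let start = Instant::now();
    for _ in 0..ITERS {
        unsafe {
            let mut tid: libc::pthread_t = std::mem::zeroed();
            // Spawn and immediately join a thread with default attributes.
            libc::pthread_create(&mut tid, std::ptr::null(), empty, std::ptr::null_mut());
            libc::pthread_join(tid, std::ptr::null_mut());
        }
    }
    println!("{} ns/iter", start.elapsed().as_nanos() / ITERS as u128);
}
```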

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 12, 2014
Currently, a scheduler will hit epoll() or kqueue() at the end of *every task*.
The reason is that the scheduler will context switch back to the scheduler task,
terminate the previous task, and then return from run_sched_once. In doing so,
the scheduler will poll for any active I/O.

This shows up painfully in benchmarks that have no I/O at all. For example, this
benchmark:

    for _ in range(0, 1000000) {
        spawn(proc() {});
    }

In this benchmark, the scheduler is currently wasting a good chunk of its time
hitting epoll() when there's always active work to be done (run with
RUST_THREADS=1).

This patch uses the previous two commits to alter the scheduler's behavior to
only return from run_sched_once if no work could be found when trying really
really hard. If there is active I/O, this commit will perform the same as
before, falling back to epoll() to check for I/O completion (to not starve I/O
tasks).

In the benchmark above, I got the following numbers:

    12.554s on today's master
    3.861s  with rust-lang#12172 applied
    2.261s  with both this and rust-lang#12172 applied

cc rust-lang#8341
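
In pseudocode, the behavior described above looks roughly like the following; this is a sketch of the idea, not the actual libgreen scheduler (`Scheduler` and `poll_event_loop` are illustrative names).

```rust
use std::collections::VecDeque;

struct Task;

struct Scheduler {
    run_queue: VecDeque<Task>,
    io_active: bool,
}

impl Scheduler {
    fn run_sched_once(&mut self) {
        // Try hard to find runnable work before touching the event loop.
        while let Some(task) = self.run_queue.pop_front() {
            self.run(task);
        }
        // Only once no work can be found do we fall back to polling for I/O,
        // so I/O-bound tasks are still not starved.
        if self.io_active {
            self.poll_event_loop(); // epoll()/kqueue()
        }
    }

    fn run(&mut self, _task: Task) {
        // Context switch into the task and run it to its next yield point.
    }

    fn poll_event_loop(&mut self) {
        // Block in epoll()/kqueue() waiting for I/O completion.
    }
}
```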
The condition was the wrong direction and it also didn't take equality into
account. Tests were added for both cases.

For the small benchmark of `task::try(proc() {}).unwrap()`, this takes the
iteration time on OSX from 15119 ns/iter to 6179 ns/iter (timed with
RUST_THREADS=1)

cc rust-lang#11389
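
A reconstruction of the condition being fixed here, as a sketch of the idea rather than the actual stack-pool code: a cached stack should be reused whenever it is at least as large as the requested size.

```rust
struct Stack {
    size: usize,
    // ... the mapped memory itself would live here ...
}

struct StackPool {
    stacks: Vec<Stack>,
}

impl StackPool {
    /// Reuse a cached stack if one is big enough. The bug described above was
    /// that this comparison pointed the wrong way and excluded equality, so
    /// nothing ever matched and every spawn mapped a fresh stack.
    fn take(&mut self, min_size: usize) -> Option<Stack> {
        let idx = self.stacks.iter().position(|s| s.size >= min_size)?;
        Some(self.stacks.swap_remove(idx))
    }

    /// Return a stack to the cache when a task exits.
    fn give(&mut self, stack: Stack) {
        self.stacks.push(stack);
    }
}
```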
One of these is allocated for every task; this is an attempt to cut down on per-task allocations.

cc rust-lang#11389

Instead, use an enum to allow either running a procedure or sending the task
result over a channel. I expect the common case to be sending on a channel
(e.g. task::try), so don't require an extra allocation in that case.

cc rust-lang#11389
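
A hypothetical sketch of that enum; the names (`OnExit`, `finish`) are mine, not the actual libstd ones. The common channel case carries only a sender, while the general case still boxes a closure.

```rust
use std::sync::mpsc::Sender;

type TaskResult = Result<(), Box<dyn std::any::Any + Send>>;

enum OnExit {
    /// Rare case: run an arbitrary closure (still needs a box).
    Run(Box<dyn FnOnce(TaskResult) + Send>),
    /// Common case (e.g. task::try): just send the result, no allocation.
    Send(Sender<TaskResult>),
}

fn finish(action: OnExit, result: TaskResult) {
    match action {
        OnExit::Run(f) => f(result),
        OnExit::Send(tx) => { let _ = tx.send(result); }
    }
}
```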
Two unfortunate allocations were wrapping a proc() in a proc() with
GreenTask::build_start_wrapper, and then boxing this proc in a ~proc() inside of
Context::new(). Both of these allocations were a direct result of two
conditions:

1. The Context::new() function has a nice API of taking a procedure argument to
   start up a new context with. This inherently required an allocation by
   build_start_wrapper because extra code needed to be run around the edges of a
   user-provided proc() for a new task.

2. The initial bootstrap code only understood how to pass one argument to the
   next function. By modifying the assembly and entry points to understand more
   than one argument, more information is passed through in registers instead of
   allocating a pointer-sized context.

This is sadly where I end up throwing mips under a bus because I have no idea
what's going on in the mips context switching code and don't know how to modify
it.

Closes rust-lang#7767
cc rust-lang#11389
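
To illustrate the second point, a sketch of the shape of the change rather than the real entry code (`bootstrap_boxed` and `bootstrap_args` are names of mine): passing the pieces as separate word-sized arguments lets them travel in registers, where a single boxed closure would have forced a heap allocation per spawn.

```rust
struct GreenTask {
    // ... per-task state ...
}

// Before: the new context receives exactly one pointer, so anything extra has
// to be captured in a boxed closure allocated for every spawn.
fn bootstrap_boxed(start: Box<dyn FnOnce()>) {
    start();
}

// After: the entry point accepts several word-sized arguments directly, so a
// task pointer and its start function can be handed over in registers.
fn bootstrap_args(task: *mut GreenTask, entry: fn(*mut GreenTask)) {
    entry(task);
}
```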
bors added a commit that referenced this pull request Feb 14, 2014
@bors closed this Feb 14, 2014
@bors merged commit 301ff0c into rust-lang:master Feb 14, 2014
@alexcrichton deleted the green-improvements branch February 14, 2014 15:43