add #[thread_local] attribute #10312

Merged
merged 3 commits into from
Nov 26, 2013

thestinger
Contributor

This provides a building block for fast thread-local storage. It does
not change the safety semantics of static mut.

Closes #10310
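
For readers who want to see what per-thread statics look like in practice, here is a minimal sketch. It is not code from this pull request: as far as I know the `#[thread_local]` attribute itself is still feature-gated, so the sketch swaps in the stable `thread_local!` macro from the standard library. The names `COUNTER`, `bump`, and `bump_in_new_thread` are made up for illustration.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical example variable: every thread gets its own copy,
    // initialized to 0 the first time that thread touches it.
    static COUNTER: Cell<u32> = Cell::new(0);
}

/// Increment the calling thread's counter `n` times and return its value.
fn bump(n: u32) -> u32 {
    COUNTER.with(|c| {
        for _ in 0..n {
            c.set(c.get() + 1);
        }
        c.get()
    })
}

/// Run `bump(n)` on a freshly spawned thread, which starts from its own zero.
fn bump_in_new_thread(n: u32) -> u32 {
    thread::spawn(move || bump(n)).join().unwrap()
}

fn main() {
    let here = bump(3);
    let there = bump_in_new_thread(5);
    println!("main thread: {}, spawned thread: {}", here, there);
    // The spawned thread's writes never touched the main thread's copy.
    println!("main thread again: {}", bump(0));
}
```

Unlike a bare `static mut`, the macro hands out the value through `with`, so no `unsafe` is needed here; the attribute proposed in this PR deliberately leaves the `static mut` safety rules unchanged.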

@huonw
Member

huonw commented Nov 6, 2013

Is it possible to test this? Something with spawn_sched, maybe?

@thestinger
Contributor Author

@huonw: I'm building/testing the runtime ported to it right now, I think that should be adequate if it works :).

@thestinger
Contributor Author

It seems this is at least twice as fast as pthread_getspecific, but Rust's local_data API is bottlenecked elsewhere. A micro-benchmark getting/setting from TLS spends 2% of the time in __tls_get_addr with this or 4% in pthread_getspecific without it. It's spending nearly all of the time doing allocations.
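
A rough sketch of what a get/set micro-benchmark of this kind could look like. This is not the benchmark that produced the numbers above (that one went through `local_data` and was profiled externally); it assumes the stable `thread_local!` macro, and `SLOT` and `tls_get_set_loop` are hypothetical names.

```rust
use std::cell::Cell;
use std::hint::black_box;
use std::time::{Duration, Instant};

thread_local! {
    // Hypothetical slot that the loop below reads and writes repeatedly.
    static SLOT: Cell<u64> = const { Cell::new(0) };
}

/// Do `iters` thread-local get/set round trips.
/// Returns the final slot value and the elapsed wall-clock time.
fn tls_get_set_loop(iters: u64) -> (u64, Duration) {
    SLOT.with(|s| s.set(0));
    let start = Instant::now();
    for i in 0..iters {
        // black_box discourages the optimizer from folding the loop away.
        SLOT.with(|s| s.set(black_box(s.get().wrapping_add(i))));
    }
    let elapsed = start.elapsed();
    (SLOT.with(|s| s.get()), elapsed)
}

fn main() {
    let (sum, elapsed) = tls_get_set_loop(10_000_000);
    println!("final value {} after {:?}", sum, elapsed);
}
```

Wall-clock time alone will not show where the time goes; a profiler is still needed to see how much of it lands in `__tls_get_addr` or in allocation, as described above.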

@thestinger
Contributor Author

#10315 is open about getting this working on Windows by switching to mingw-w64

@thestinger
Contributor Author

cc @brson

@brson
Contributor

brson commented Nov 6, 2013

Let's wait for further discussion on this.

@brson
Contributor

brson commented Nov 6, 2013

What mechanism does LLVM use for TLS that makes it faster than pthread_getspecific? Is it just avoiding a function call, or is it a completely different implementation?

@brson
Contributor

brson commented Nov 6, 2013

Why doesn't this work on Windows?

@thestinger
Contributor Author

@brson: It will work anywhere with support for C11/C++11 (they have thread_local int x = 5) or full support for the legacy __thread annotation. The support in MinGW has been historically broken based on what I could dig up on it, but it works in MinGW-w64. There's a mention of it being fixed recently in this post about LDC.

@brson
Contributor

brson commented Nov 6, 2013

So this adds a new runtime dependency on C11, specifically __tls_get_addr.

@thestinger
Contributor Author

@brson: It existed before C11, because there was previously a __thread annotation. In a statically linked executable, it doesn't need the function call because the linker doesn't have to be involved.

@thestinger
Contributor Author

http://www.akkadia.org/drepper/tls.pdf

Therefore statically linked code always has exactly one TLS block. And since only
one module is ever used there is also no question about the variable offsets. Since all
thread-local variables must be contained in this one TLS block the offset is also known
at link-time.

The linker will always be able to fill in the module ID and offset and perform code
relaxations. There is no work for the startup code to except setting up the TLS block
for the initial thread. The thread library will have to do the same for newly created
threads. This is a simple task since there is exactly one initialization image.

@thestinger
Contributor Author

At the assembly level on x86-64 Linux, this simply compiles to a memory access relative to the FS segment register (GS on 32-bit x86), which Linux uses to locate thread-local data. The linker is responsible for taking care of all the details.
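
One way to observe the "one TLS block per thread" model from user code is to compare the address of the same thread-local in two live threads. This is only a sketch under the assumption that the stable `thread_local!` macro is an acceptable stand-in for the attribute; `MARKER` and `marker_addr` are hypothetical names.

```rust
use std::thread;

thread_local! {
    // Hypothetical marker byte; only its per-thread address matters here.
    static MARKER: u8 = const { 0 };
}

/// Address of the calling thread's copy of `MARKER`.
fn marker_addr() -> usize {
    MARKER.with(|m| m as *const u8 as usize)
}

fn main() {
    let main_addr = marker_addr();
    // The main thread is still alive while the spawned thread runs,
    // so the two copies must occupy different addresses.
    let other_addr = thread::spawn(marker_addr).join().unwrap();
    println!("main:    {:#x}", main_addr);
    println!("spawned: {:#x}", other_addr);
    println!("distinct copies: {}", main_addr != other_addr);
}
```

The source code names a single variable, but each thread resolves it relative to its own thread pointer, which is what the segment-register access above amounts to.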

@brson
Contributor

brson commented Nov 6, 2013

This seems to automatically solve the problem of globally initializing TLS keys, which is nice. I'm pretty concerned about adding a language feature that doesn't work on all platforms though. Also, though I see this has some potential benefits over using platform TLS libraries, I don't want to get in the habit of simply exposing every LLVM feature through Rust just because it's easy. Hopefully others will offer opinions about this.
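
To illustrate the initialization point: with a declared thread-local, the initial value is part of the declaration, so there is no key that has to be created globally before the first thread uses it. The sketch below mirrors the C11 `thread_local int x = 5` example mentioned earlier, again assuming the stable `thread_local!` macro rather than the attribute from this PR; `X` and `read_then_write` are illustrative names.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Statically initialized to 5 in every thread; no setup call is needed.
    static X: Cell<i32> = const { Cell::new(5) };
}

/// Read the calling thread's initial value, overwrite it, and return both.
fn read_then_write(new_value: i32) -> (i32, i32) {
    X.with(|x| {
        let initial = x.get();
        x.set(new_value);
        (initial, x.get())
    })
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|i| thread::spawn(move || read_then_write(i)))
        .collect();
    for handle in handles {
        let (initial, updated) = handle.join().unwrap();
        // Every thread starts from the declared initializer.
        println!("initial = {}, updated = {}", initial, updated);
    }
}
```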

@alexcrichton
Member

From what I could tell, it looked like LLVM pretty much fully supported this on all platforms that LLVM supports. We'll never be able to support a platform that LLVM doesn't, so my own worries about this not necessarily working everywhere were alleviated after learning this.

I personally really like this because it solves the static initialization problem, and primarily for that reason. I suppose that this could get a little weird if you're using thread_local in an environment without a dynamic linker, but in those environments I think that people are mostly aware enough to not use thread local things.

I'm still a little wary of this not being supported on Windows yet. We're not only exporting this for the runtime, but also for the language as a whole. I don't think that we have a roadmap for migrating to MinGW-w64 soon, so I'm not sure that we should commit to this until we have a plan for getting this to work on Windows as well.

I believe that this is a lot faster than the current scheme, but I'm not entirely convinced that it's necessary to speed up these lookups. If we do see a huge improvement in runtime-related benchmarks, then there may be more of an impetus for this, and perhaps in that situation we shouldn't have a thread_local annotation, but rather a this_is_the_runtime_tls_key annotation.

@thestinger
Contributor Author

I think this does work on every platform we'd potentially want to support (x86, x86_64, x32, ARM, SPARC, AArch64, SystemZ, MIPS, PowerPC, etc.) and it does work on Windows with either MSVC++ or MinGW-w64. Our floating point support is broken on Windows due to MinGW too (#8663, #8755).

I don't really see another way to have solid support for thread-local data in freestanding Rust. If it can speed up task-local data in libstd, that's a bonus.

@thestinger
Contributor Author

@alexcrichton: It completely works in a statically linked executable. It's just much faster without dynamic linking, because the access is then an ordinary memory access at an offset known at link time rather than something resolved through the dynamic linker.

@alexcrichton
Member

I've added this to the minutes of the next meeting.

@thestinger
Contributor Author

If it's decided that this isn't wanted, it would be nice if it could still be added behind a feature flag because I need it for rust-core's threading API.

@emberian
Member

I think this is useful. GCC has had this for quite some time. We can warn/error if ever we port to a platform this isn't supported on. Or, just add support to LLVM.

@thestinger
Contributor Author

@cmr: as a standard C11/C++11 feature I think it will be more portable than Rust (it's just that MinGW is awful; MinGW-w64, on the other hand, has this fully working)

@pcwalton
Contributor

I am fine with this. I'll bring it up at the meeting.

@alexcrichton
Member

We decided today that we should add this, but usage of the attribute should be behind a feature gate. Would you mind adding a feature gate for this as well? (It may involve a stage0 dance.)

bors added a commit that referenced this pull request Nov 26, 2013
@bors bors closed this Nov 26, 2013
@bors bors merged commit a5af479 into rust-lang:master Nov 26, 2013
@thestinger thestinger deleted the thread_local branch December 7, 2013 00:10
Successfully merging this pull request may close these issues.

add language-level support for thread-local data