-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
thread_id_value tracking issue #67939
Comments
Shouldn't this at least guarantee that a valid id is never 0? |
Why would you expect such a guarantee? We could clearly make it, but it seems a bit odd to me for us to do so... |
Such a guarantee would allow storing the equivalent of I have a case where I need to avoid some recursive operation: So, a thread starts the operation (by executing an The current implementation stores the |
I think I would personally prefer to avoid using NonZero in this API. I'd suggest manually doing so in the consumer of the API, perhaps assigning u64::MAX instead of 0 as the sentinel for None. |
Without any guarantee that a thread id cannot be |
Well, sure, I guess, though realistically it's the sort of thing that would be fine to do IMO and panic on -- you're almost certainly not going to hit that. I'm not opposed to accepting this, I guess -- would you be willing to file a PR changing the API as you've suggested? |
I always prefer a static guarantee (
Sure, I can do that. But I still don't quite understand yet why you don't like it the idea. |
It's not really that I don't like it -- I think you've convinced me that it can be useful and is pretty minimal -- but I was uncertain about making the guarantee going forward. But thinking about it more, 64 bits should be more than enough that we shouldn't need to worry about the zero being lost to us if we make the guarantee. |
As discussed in rust-lang#67939, this allows turning Option<ThreadId> into Option<NonZeroU64> which can then be stored inside an AtomicU64.
Return NonZeroU64 from ThreadId::as_u64. As discussed in rust-lang#67939, this allows turning Option<ThreadId> into Option<NonZeroU64> which can then be stored inside an AtomicU64.
Return NonZeroU64 from ThreadId::as_u64. As discussed in rust-lang#67939, this allows turning Option<ThreadId> into Option<NonZeroU64> which can then be stored inside an AtomicU64.
This API should return a |
@Amanieu We currently use a u64 I think because (a sufficiently long lived) process could spawn more than 2^32 threads, not simultaneously. And currently we guarantee that the index is unique, i.e. will not wrap. I am unopposed to changing this, but we should do so in ThreadId::new and make a decision about behavior on wrapping. |
There's value in not reusing the IDs, such as if you use them as unique identifiers for log files. EDIT: I no longer think this is a good idea. Reuse is fine, and the tradeoffs of avoiding it are worse. |
When we detect a possibly closed loop in the end user's program, we need to determine whether the current thread is the one tracing it or not. We previously allocated a chunk of memory when we started tracing and used its address as a temporary thread ID. Ideally we would use the system provided thread ID to determine this, but that turns out to be a non-portable minefield. pthreads provides the pthread_t ID type but that's notionally opaque (on OpenBSD, as far as I can tell from a quick search, it's really an opaque struct about which no guarantees are provided, though some programs such as Chrome do seem to rely on it being equivalent to a usize). Some platforms (at least Linux and OpenBSD) use the PID ID type pid_t for threads too and that's guaranteed to be a signed integer though not it seems of a guaranteed size (though on both platforms it's actually a C int, so 32 bits in practise). Rust provides a ThreadID type and an unstable "as_64" function (rust-lang/rust#67939) but that provides no guarantees about how much of the 64-bit integer space is used. The challenge we have is that whatever ID we use it must fit into (numbits(usize) - numbits(PHASE_TAGS)) i.e. we have 62 bits for the ID on a 64-bit system. If we rely on the system thread IDs it feels like we're storing up a portability time bomb that might explode one day in the dim and distant future. At least for now, the (minimal) extra performance we might get from that doesn't seem worth the danger. This commit is thus a relatively simple change. Rather than allocating a malloc'd block every time we start tracing a thread, which means we have to be quite careful about freeing memory, we allocate it on thread construction. This simplifies the code slightly and (since it's an aligned address) we can feel fairly safe that it plays nicely with PHASE_TAGS. We could, if we wanted, allocate this block lazily the first time we trace but that feels like an unnecessary optimisation at this point.
When we detect a possibly closed loop in the end user's program, we need to determine whether the current thread is the one tracing it or not. We previously allocated a chunk of memory when we started tracing and used its address as a temporary thread ID. Ideally we would use the system provided thread ID to determine this, but that turns out to be a non-portable minefield. pthreads provides the pthread_t ID type but that's notionally opaque (on OpenBSD, as far as I can tell from a quick search, it's really an opaque struct about which no guarantees are provided, though some programs such as Chrome do seem to rely on it being equivalent to a usize). Some platforms (at least Linux and OpenBSD) use the PID ID type pid_t for threads too and that's guaranteed to be a signed integer though not it seems of a guaranteed size (though on both platforms it's actually a C int, so 32 bits in practise). Rust provides a ThreadID type and an unstable "as_64" function (rust-lang/rust#67939) but that provides no guarantees about how much of the 64-bit integer space is used. The challenge we have is that whatever ID we use it must fit into (numbits(usize) - numbits(PHASE_TAGS)) i.e. we have 62 bits for the ID on a 64-bit system. If we rely on the system thread IDs it feels like we're storing up a portability time bomb that might explode one day in the dim and distant future. At least for now, the (minimal) extra performance we might get from that doesn't seem worth the danger. This commit is thus a relatively simple change. Rather than allocating a malloc'd block every time we start tracing a thread, which means we have to be quite careful about freeing memory, we allocate it on thread construction. This simplifies the code slightly and (since it's an aligned address) we can feel fairly safe that it plays nicely with PHASE_TAGS. We could, if we wanted, allocate this block lazily the first time we trace but that feels like an unnecessary optimisation at this point.
168: Don't create a malloc'd block each time we start tracing. r=vext01 a=ltratt When we detect a possibly closed loop in the end user's program, we need to determine whether the current thread is the one tracing it or not. We previously allocated a chunk of memory when we started tracing and used its address as a temporary thread ID. Ideally we would use the system provided thread ID to determine this, but that turns out to be a non-portable minefield. pthreads provides the pthread_t ID type but that's notionally opaque (on OpenBSD, as far as I can tell from a quick search, it's really an opaque struct about which no guarantees are provided, though some programs such as Chrome do seem to rely on it being equivalent to a usize). Some platforms (at least Linux and OpenBSD) use the PID ID type pid_t for threads too and that's guaranteed to be a signed integer though not it seems of a guaranteed size (though on both platforms it's actually a C int, so 32 bits in practise). Rust provides a ThreadID type and an unstable "as_64" function (rust-lang/rust#67939) but that provides no guarantees about how much of the 64-bit integer space is used. The challenge we have is that whatever ID we use it must fit into (numbits(usize) - numbits(PHASE_TAGS)) i.e. we have 62 bits for the ID on a 64-bit system. If we rely on the system thread IDs it feels like we're storing up a portability time bomb that might explode one day in the dim and distant future. At least for now, the (minimal) extra performance we might get from that doesn't seem worth the danger. This commit is thus a relatively simple change. Rather than allocating a malloc'd block every time we start tracing a thread, which means we have to be quite careful about freeing memory, we allocate it on thread construction. This simplifies the code slightly and (since it's an aligned address) we can feel fairly safe that it plays nicely with PHASE_TAGS. We could, if we wanted, allocate this block lazily the first time we trace but that feels like an unnecessary optimisation at this point. This is based on a suggestion from @vext01 and a comment from @bjorn3. Co-authored-by: Laurence Tratt <[email protected]>
This changes the `Debug` impl of: - `core::any::TypeId` - `core::mem::Discriminant` - `std::thread::ThreadId` - `std::time::Instant` All these types are explicitly described as "opaque" in the documentation, so they shouldn't print implementation details in their `Debug` implementation. The change for `ThreadId` can possibly be reverted in the future, once the method `thread_id_value` is stabilized: rust-lang#67939 The same goes for `Discriminant` once/if the `DiscriminantKind` trait gets stabilized. But until then, these are opaque types.
This seems to have trailed off. It sounds like there are two major approaches here: use u64 and make no guarantees, or use NonZeroU64 and guarantee that a thread ID will never be zero. I think we can reasonably guarantee that. |
I had an earlier post here I deleted b/c I was conflating two things.
I don't know how to do 2 on Linux. A) Rust could maintain its own global u64 count that it increments at thread creation, but it is only possible to hook this for threads created by std::thread. FFI libraries could create threads undetected. AFAIK there is no platform support for hooking thread creation. B) You could lazily count threads -- on request for an ID check a global C) You could have a thread local storing the current thread id, and when it's detected as not set it bumps the global count and returns it. This number won't be stable during thread teardown, b/c destruction order of thread locals isn't defined and on some platforms they get recreated if accessed after destruction. I'm not a pthreads expert by any means, mostly commenting here because if there is a way to do this it's a very nice property to have so I want to know how to pull it off ;) Could assist with implementing safe APIs that do checks to make sure you are consistently accessing an object from the same thread. |
Why use a global thread ID when we can just use the TLS base address instead?
As an example of something that specifically required a non-zero
|
My concern was whether it would be safe to commit to thread IDs being However, if there's a platform where that doesn't hold, we can always use our own, whether derived from TLS or something else. So |
I wonder if we can split this up a little bit, in the sense that it seems that one design decision dominates. IMHO that major design decision is: does Rust's std lib keep its own thread IDs? If it doesn't, then life is easy, but thread IDs are also of limited use. If it does, then we open up more use cases, but have to worry about things like: are thread IDs unique for the lifetime of a process? should they a non-zero representation? and so on. Personally I'm rather neutral, but I think the status quo is unappealing: the documentation is vague about guarantees, but if you look at the source code, you'll see that it's currently making a strong guarantee. That strikes me as dangerous: I'm pretty sure there will be code out there now making an assumption that [I must admit, I had totally forgotten this issue even existed -- I was surprised to see I'd referenced it a few months back! I raised https://github.com//pull/84083 as a result of my forgetting about this!] |
I would say thread id's are somewhat useful for approaches to thread specific storage techniques, where you want to store things in a common data structure, that is safely separated by individual thread IDs. In theory, you can do this via just passing in a uniquely incremented |
Are there any major outstanding issues now that the clarification PR #84083 was merged? |
After examining the discussion at #84083 and the associated commit it's still not obvious to me how the ID works for threads created outside Rust. @joshtriplett mentions that only being able to access the ID via |
I would like to be able to access the thread::current().id.0. IE: the private field in ThreadId. The reason for this is that I print the TID as a "{:04.4x}" as a fixed component of a trace output line for debugging multi-threaded programmes. ThreadId(1):5c1e0b80aa859:I: fs: Printing testing lists...
ThreadId(1):5c1e0b80c4e7d:D: fs: debug
ThreadId(1):5c1e0b80c7f6e:E: fs: Error what I desire is: 0001:5c1e0b80aa859:I: fs: Printing testing lists...
0001:5c1e0b80c4e7d:D: fs: debug
0001:5c1e0b80c7f6e:E: fs: Error The second field is n/s counter in hex. If I cannot access the ThreadId as u64, I will need to write my own hash and lookup for each and every trace line output. Not conducive to most efficient implementation. |
so |
It does not. The OSes may a thread's ID after it terminates (and also many OSes have multiple things that might be called the thread id). |
FWIW our documentation on this gives a guarantee that we never reuse these: https://doc.rust-lang.org/nightly/std/thread/struct.ThreadId.html
|
Would this numeric ID be a suitable basis for implementing |
A few things as a summary:
The documentation is pretty strong, which is kind of unfortunate because it paints us into a corner - the first two points together seem to say that Loose but interesting proposal that fits all of the above: maybe
The form is comes from the soon-to-be-finalized UUIDv8 spec (all data custom except for the 0b1000 version and 0b10 variant). So with something like this we get the following:
That all seems to work together almost perfectly. The downside is, of course, no atomic storage. |
I needed this the other day (specifically a nonzero id that fits in an atomic). Skimming these discussions, why not Then the representation of the full struct--where the guarantee of non-reuse is offered--can be whatever. Docs could say something like "This is intended for advanced concurrency algorithms where..." but I am terrible at writing docs which are generally useful for people not already at least somewhat familiar with the subject matter when it comes to this stuff. It seems to me that anyone who needs a non-reused integer thread id of a specific integer width knows enough to throw together the obvious solution with a static atomic counter and a thread local which matches their needs; the value of Rust stdlib in this context would be knowing how to get something as quickly as possible, since that is slow and/or branchy. |
I have been thinking about this for a while and come to a similar conclusion. It seems like applications needing uniqueness could bring their own, is there any reason this has to be in std? Maybe they need u64 uniqueness, maybe they need u128 uniqueness, maybe the IDs need to be cryptographically unpredictable for some reason, etc. If there is no reason that this couldn’t exist in a library then I don’t think we should try too hard to provide anything other than the OS thread ID. edit: I forgot we already made these guarantees |
The problem is that ThreadId is stable and offers the guarantee of process-lifetime uniqueness and that can't be relaxed without breaking anyone who, e.g., does tls themselves with a hashmap or something. The discussion there is probably dead not because it's necessarily a good idea, but because the behavior is stable for better or worse. Any relaxing there would need to be done via methods which relax the guarantee, not changing the core type. The other problem is that on a concrete level there's a case to be made in one of my current projects to use PI-futexes and it would be really nice if I didn't have to go outside stdlib just to get a Linux tid. None of my use cases for this have absolutely required it, but I'd say every 2 months or so I come back to needing a u64 (or other integer) that's unique per thread and doesn't need to be unique for the lifetime of the process. Then I give up and write my own one way or another and go well okay kinda sucks that I can only access that id from tls on the thread that generated it but I guess it's good enough o well. |
Using the stdlib doc search bar, it took me five seconds to find Please may I have a simple, standard way to get the native OS numeric thread id? I really don't think that's asking for a lot. I don't care if it's provided by this type, a plain function, a macro, whatever. I don't care about any uniqueness guarantees or promises about its value. I don't care if it's a I just want the TID. Please :-) |
That's different to what this issue is about. It would be a new API request. You may want to fill out an API Change Proposal (ACP) suggesting that: https://github.com/rust-lang/libs-team/issues/new/choose |
Unfortunately that isn't quite as straightforward as it would seem, see discussion starting at https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Printing.20thread.20ID.20on.20panic.20bikeshed/near/412207744 Still wouldn't hurt to have an ACP to get some direction there |
Oh, my apologies. I'd read in the documentation that
That discussion seems to be about getting the OS thread ID for an arbitrary thread from its pthread_t. All I am asking for is the OS TID of the current thread, i.e., gettid(), GetCurrentThreadId(). I need this to log for reference with external tools such as debuggers and profilers. But I have been directed to make a fresh proposal for that, so, sorry for the noise. |
it seems a word is missing there, was it supposed to be |
@correabuscar |
Just a heads up, for those that didn't know (like me), rust/library/std/src/thread/mod.rs Lines 1321 to 1349 in e8ada6a
it's that therefore You may try the following example(playground) just in case it changed in the future, see if it still segfaults: #![feature(thread_id_value)]
use std::alloc::{GlobalAlloc, Layout};
struct MyGlobalAllocator;
unsafe impl GlobalAlloc for MyGlobalAllocator {
unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
let _=std::thread::current();
//XXX: ^ infinite recursion because it's allocating, so Segmentation fault (core dumped)
//https://github.com/rust-lang/rust/blob/e8ada6ab253b510ac88edda131021d9878f2984f/library/std/src/thread/mod.rs#L1321-L1349
let ptr = std::alloc::System.alloc(layout);
ptr
}
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
std::alloc::System.dealloc(ptr, layout);
}
unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
let new_ptr = std::alloc::System.realloc(ptr, layout, new_size);
new_ptr
}
}
#[global_allocator]
static GLOBAL_ALLOCATOR: MyGlobalAllocator = MyGlobalAllocator;
fn main() {
} |
#124881 utilizes an atomic |
This is the tracking issue for the
ThreadId::as_u64
method, which casts a thread ID to the underlyingu64
.There are currently no known blockers to stabilization beyond making sure that the API indeed works for intended use cases.
Implemented originally in #67566.
The text was updated successfully, but these errors were encountered: