Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Thread spawn hook (inheriting thread locals) #3642

Merged
merged 9 commits into from
Nov 18, 2024

Conversation

m-ou-se
Copy link
Member

@m-ou-se m-ou-se commented May 22, 2024

@m-ou-se m-ou-se added T-libs-api Relevant to the library API team, which will review and decide on the RFC. A-threads Threading related proposals & ideas A-thread-local Proposals relating to thread local storage (TLS). I-libs-api-nominated Indicates that an issue has been nominated for prioritizing at the next libs-api team meeting. labels May 22, 2024
@m-ou-se
Copy link
Member Author

m-ou-se commented May 22, 2024

cc @epage - You said you wanted this for your new test framework. :)

@m-ou-se m-ou-se force-pushed the thread-spawn-hook branch 2 times, most recently from dfbcd39 to de2c8b3 Compare May 22, 2024 11:57
@m-ou-se m-ou-se changed the title [RFC] Thread spawn hook [RFC] Thread spawn hook (inheriting thread locals) May 22, 2024
@m-ou-se
Copy link
Member Author

m-ou-se commented May 22, 2024

Implementation, including use in libtest: rust-lang/rust#125405

@m-ou-se
Copy link
Member Author

m-ou-se commented May 22, 2024

Demonstration:

Code
#![feature(thread_spawn_hook)]

use std::cell::Cell;
use std::thread;

thread_local! {
    static ID: Cell<usize> = panic!("ID not set!");
}

fn main() {
    ID.set(123);

    thread::add_spawn_hook(|_| {
        let id = ID.get();
        Ok(move || ID.set(id))
    });

    thread::spawn(|| {
        println!("1:     {}", ID.get());
        thread::spawn(|| {
            println!("1.1:   {}", ID.get());
            thread::spawn(|| {
                println!("1.1.1: {}", ID.get());
            }).join().unwrap();
            thread::spawn(|| {
                println!("1.1.2: {}", ID.get());
            }).join().unwrap();
        }).join().unwrap();
        thread::spawn(|| {
            ID.set(456); // <-- change thread local `ID`
            println!("1.2:   {}", ID.get());
            thread::spawn(|| {
                println!("1.2.1: {}", ID.get());
            }).join().unwrap();
            thread::spawn(|| {
                println!("1.2.2: {}", ID.get());
            }).join().unwrap();
        }).join().unwrap();
    }).join().unwrap();

    thread::spawn(|| {
        println!("2:     {}", ID.get());
    }).join().unwrap();
}
Output
1:     123
1.1:   123
1.1.1: 123
1.1.2: 123
1.2:   456
1.2.1: 456
1.2.2: 456
2:     123

@the8472
Copy link
Member

the8472 commented May 22, 2024

Could this be made more controllable through thread::Builder? We could make it Clone, modify a global default instance and new() would then clone from the default, giving users the option to modify it further (such as modifying the set of hooks) before spawning additional threads.

@m-ou-se
Copy link
Member Author

m-ou-se commented May 22, 2024

Could this be made more controllable through thread::Builder? We could make it Clone, modify a global default instance and new() would then clone from the default, giving users the option to modify it further (such as modifying the set of hooks) before spawning additional threads.

That might make sense for the stack size, but not for the thread name. And those are (at least today) the only two settings that a Builder has.

I don't think it makes sense to allow 'modifying' the set of hooks. If you want to change/remove any hook, you need a way to identify them. Using a number or string as identifier has many issues, and using some kind of ThreadSpawnHookHandle for each of them seems a bit much, without clear use cases..

@the8472
Copy link
Member

the8472 commented May 22, 2024

The default thread name would be empty, as it is today.
And people have asked for more features in builders such as affinity or thread priorities (or hooks that can take care of those for specific threads): rust-lang/libs-team#195

I don't think it makes sense to allow 'modifying' the set of hooks. If you want to change/remove any hook, you need a way to identify them.

The current proposal provides add. With the builder we could at least have clear to opt-out for some subset of threads.
Since the closures have to be 'static anyway we could just do it based on typeID? The types could be made nameable via existential types.

@m-ou-se
Copy link
Member Author

m-ou-se commented May 22, 2024

The current proposal provides add. With the builder we could at least have clear to opt-out for some subset of threads. Since the closures have to be 'static anyway we could just do it based on typeID? The types could be made nameable via existential types.

Why would you want to fully clear the hooks? Without knowing which hooks are registered, you can't know which things you're opting out of, so you basically can't know what clear even means. E.g. would you expect clearing these hooks to break a #[test] with threads?

I strongly believe this should just work like atexit() (global, only registering, no unregistering).

(Even if we were to allow some way to skip the hooks for a specific builder/thread, it doesn't make sense to me to have a global "clear" function to clear all the registered hooks (from the 'global builder' as you propose), because you can't really know which other hooks (of other crates) you might be removing.)

Since the closures have to be 'static anyway we could just do it based on typeID?

TypeId is not enough. Two hooks might have the same type but a different value (and different behaviour), e.g. with different captures. (Also if you register already boxed Box<dyn Fn> as hooks, they will all just have the (same) TypeId of that box type.)

@the8472
Copy link
Member

the8472 commented May 22, 2024

Why would you want to fully clear the hooks? Without knowing which hooks are registered, you can't know which things you're opting out of, so you basically can't know what clear even means.

The RFC already states that threads can be in an unhooked state through pthread spawn. And I assume some weird linkage things (dlopening statically linked plugins?) you could end up with a lot more threads not executing hooks?
So unlike atexit it should already be treated as semi-optional behavior.

Additionally I'm somewhat concerned about a tool intended for automatic state inheritance being a global. In the Enterprise Java world such kind of implicit behavior/magic automatisms occasionally get abused to do terribly bloated things which means it's great when that bloat can be limited to a smaller scope or at least be opted out of.
For similar increasingly complex inheritable process properties problems linux systems are trying to occasionally shed this kind of state (e.g. sudo vs. systemd's run0, or process spawn servers instead of fork)

In the context of a test framework I can imagine tests running in sets and then clearing global state (or at least enumerating it to print information what didn't get cleaned up) to decouple tests from each other.

E.g. would you expect clearing these hooks to break a #[test] with threads?

"break" in the sense of not doing output capture, as it already is today? Yeah, sure, if the tests need a clean environment for whatever reason that might be a side-effect.

@m-ou-se
Copy link
Member Author

m-ou-se commented May 22, 2024

as it already is today?

Today, inheriting the output capturing is done unconditionally when std spawns a thread. There is no way to disable it or bypass it for specific threads (other than just using pthread or something directly). What I'm proposing here is a drop-in replacement for that exact feature, but more flexible so it can be used for other things than just output capturing. It will work in the exact same situations.

I'm not aware of anyone doing anything specifically to avoid inheriting libtest's output capturing (such as using pthread_create directly for that purpose).

Additionally I'm somewhat concerned about a tool intended for automatic state inheritance being a global.

The destructors for e.g. thread_local!{} are also registered globally (depending on the platform). atexit() is global. All these things start with a global building block.

I agree it'd be nice to have some sort of with_thread_local_context(|| { ... }) to create a scope in which all threads spawned will have a certain context, but the building block you need to make that feature is what is proposed in this RFC.

@joshtriplett
Copy link
Member

joshtriplett commented May 22, 2024

I do think we're likely to want deregistration as well, but I think we'd want to do that via a scoped function rather than naming. I don't think it has to be in the initial version in any case.

EDIT: On second thought, I'm increasingly concerned about not having the ability to remove these hooks.

@ChrisDenton
Copy link
Member

Are there use cases other than testing frameworks in mind? Do they have needs that differ from the testing framework case?

@joshtriplett
Copy link
Member

I think hooks are going to be quite prone to potential deadlocks. That's not a blocker for doing this, just something that'll need to be documented.

@tmccombs
Copy link

tmccombs commented Jun 2, 2024

Are there use cases other than testing frameworks in mind?

I can think of a few other use cases:

  • creating a new span in a tracing framework for the new thread that uses the thread local context of the parent thread as the parent span.
  • creating a new logger in TLS with configuration based on the local logger of the parent thread
  • inheriting a value for a thread local variable across spawning a new thread could have several use cases
  • possibly recording metrics on when threads are started, although that would probably be most useful if there was some way to hook into thread destruction as well.
  • Initialize a thread local random number generator seeded by the parent thread's random number generator. I'm not sure if that is advantageous over current approaches to thread local RNGs, but it would be possible with this.

@programmerjake
Copy link
Member

  • possibly recording metrics on when threads are started, although that would probably be most useful if there was some way to hook into thread destruction as well.

TLS drop implementations could likely be used for that.

text/3642-thread-spawn-hook.md Show resolved Hide resolved
text/3642-thread-spawn-hook.md Outdated Show resolved Hide resolved
text/3642-thread-spawn-hook.md Outdated Show resolved Hide resolved
@m-ou-se
Copy link
Member Author

m-ou-se commented Oct 24, 2024

I think hooks are going to be quite prone to potential deadlocks. That's not a blocker for doing this, just something that'll need to be documented.

It'd be quite unusual to do anything interesting with locks in a hook. They are mainly used with thread locals (e.g. a Cell) rather than globals (e.g. a Mutex).

@ian-h-chamberlain
Copy link

Are there use cases other than testing frameworks in mind? Do they have needs that differ from the testing framework case?

Just to advocate for one other use case, I think a proposal like rust-lang/libs-team#195 could be implemented in terms of a feature like this (which would be great!), but I am not sure it's possible in its current form?

Since the hook takes a &Thread, it doesn't seem like it would be possible to influence the creation of the thread using APIs like pthread_attr_setaffinity_np, pthread_attr_setschedparam, etc. — unless there were some way to influence the attributes themselves by extension APIs on std::thread::Thread.

For this reason, I'd argue in favor of something like the aforementioned ThreadSpawnHookHandle, even if it was just a thin wrapper around Thread for now, to be more extensible in the future for influencing OS-specific thread spawning behavior:

struct ThreadSpawnHookHandle {
	thread: Thread, // or &Thread ?
	// possible future extension, or a sys:: abstracted version of this:
	#[cfg(unix)]
	attrs: libc::pthread_attr_t,
}

Then it could be used with something like this:

std::thread::add_spawn_hook(|handle| {
	// Maybe something like JoinHandleExt:
	let attrs = handle.pthread_attr_t_mut();
	let my_custom_sched = todo!();
	unsafe { 
		libc::pthread_attr_setschedparam(attrs, &my_custom_sched);
	}
	|| {}
});

I think having the flexibility to add APIs that affect thread spawning in the future, whether in a standard or OS-specific way, would be worth the trouble of having the (relatively small) inconvenience of this wrapper type, if there are no other reasons &Thread is needed specifically.

@m-ou-se
Copy link
Member Author

m-ou-se commented Oct 24, 2024

@ian-h-chamberlain Yeah, that possibility is mentioned in one of the unresolved questions in the RFC:

  • Should the hook be able to access/configure more information about the child thread? E.g. set its stack size.

@m-ou-se
Copy link
Member Author

m-ou-se commented Nov 5, 2024

We discussed this in the libs-api meeting; this seems ready for FCP.

@rfcbot merge

@rfcbot
Copy link
Collaborator

rfcbot commented Nov 5, 2024

Team member @m-ou-se has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels Nov 5, 2024
@Amanieu
Copy link
Member

Amanieu commented Nov 5, 2024

We discussed this in the libs-api meeting today and we're happier with the design now that it doesn't involve global state any more.

@rfcbot fcp merge

@Amanieu Amanieu removed the I-libs-api-nominated Indicates that an issue has been nominated for prioritizing at the next libs-api team meeting. label Nov 5, 2024
@rfcbot rfcbot added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Nov 5, 2024
@rfcbot
Copy link
Collaborator

rfcbot commented Nov 5, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Nov 5, 2024
@the8472
Copy link
Member

the8472 commented Nov 7, 2024

Since the hook takes a &Thread, it doesn't seem like it would be possible to influence the creation of the thread using APIs like pthread_attr_setaffinity_np, pthread_attr_setschedparam, etc.

You could use pthread_self, pthread_setschedparam, pthread_setaffinity_np instead. Once they're set they're inherited by the OS anyway so they don't need inheritable hooks. So a one-shot hook in the thread builder would be sufficient for that. At least on platforms that have those, but the same applies to pthread_attr_setaffinity_np, hence the _np.

@ian-h-chamberlain
Copy link

You could use pthread_self, pthread_setschedparam, pthread_setaffinity_np instead. Once they're set they're inherited by the OS anyway so they don't need inheritable hooks. So a one-shot hook in the thread builder would be sufficient for that. At least on platforms that have those, but the same applies to pthread_attr_setaffinity_np, hence the _np.

Yeah, but one of the motivations for rust-lang/libs-team#195 was to support platforms that don't allow setting these kinds of parameters for the main thread /current thread (i.e. must be set on the attributes used to spawn the thread). My main concern with the proposed API as written is that there won't be any way to extend it to support those use cases in the future, without some invasive changes to how Thread spawning works in std today.

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. to-announce and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Nov 15, 2024
@rfcbot
Copy link
Collaborator

rfcbot commented Nov 15, 2024

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@m-ou-se m-ou-se merged commit 0929b1d into rust-lang:master Nov 18, 2024
@m-ou-se
Copy link
Member Author

m-ou-se commented Nov 18, 2024

Tracking issue: rust-lang/rust#132951

@m-ou-se m-ou-se deleted the thread-spawn-hook branch November 18, 2024 16:07
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Nov 19, 2024
…affleLapkin

Add std::thread::add_spawn_hook.

Implementation of rust-lang/rfcs#3642
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Nov 20, 2024
Rollup merge of rust-lang#125405 - m-ou-se:thread-add-spawn-hook, r=WaffleLapkin

Add std::thread::add_spawn_hook.

Implementation of rust-lang/rfcs#3642
@m-ou-se m-ou-se restored the thread-spawn-hook branch November 20, 2024 11:27
lnicola pushed a commit to lnicola/rust-analyzer that referenced this pull request Nov 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-thread-local Proposals relating to thread local storage (TLS). A-threads Threading related proposals & ideas disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. to-announce
Projects
None yet
Development

Successfully merging this pull request may close these issues.