Propagating context information to child-tasks and remote calls? #35757
Comments
C.f. #34543 |
Thanks for opening the issue. I'm actually against copying
const var1 = ContextVar(:var1, 42)  # with default
@contextvar var1 = 42  # possible sugar
const var2 = ContextVar{Lockable{Vector{Int}}}(:var2)  # without default, with type
@contextvar var2::Lockable{Vector{Int}}  # possible sugar
# Lockable from https://github.com/JuliaLang/julia/pull/34400
const var3 = ContextVar(:var3)  # equivalent to ContextVar{Any}(:var3)
@contextvar var3  # possible sugar
function f()
    @sync @async @show var1[]  # var1[] => 42
    var1[] = 0
    @sync @async @show var1[]  # var1[] => 0
    setting(var1 => 1) do
        @show var1[]  # var1[] => 1
    end
    @show var1[]  # var1[] => 0
end
In contrast to
We may want to use something like HAMT (as in PEP 567) for the context storage. A possible/reference implementation of
using UUIDs  # provides the UUID type

struct ContextVar{T}
    name::Symbol  # only for human readability
    key::UUID  # (for example)
    default::T  # or `Union{Some{T},Nothing}`?
    has_default::Bool
end
# Look up the value in the task's context, falling back to the default if there is one.
Base.getindex(var::ContextVar{T}) where {T} =
    if var.has_default
        get(task_local_storage(:CONTEXT), var.key, var.default)
    else
        task_local_storage(:CONTEXT)[var.key]
    end :: T
|
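To make the lookup scheme above concrete, here is a minimal runnable sketch. It is not the linked proof-of-concept or the API from the later PR: the `:CONTEXT` key, the copy-on-write `Dict`, and the helper names `current_context`, `setting`, and `spawn_inheriting` are assumptions made for illustration. Because plain task-local storage is not inherited by child tasks, the sketch propagates the context explicitly when spawning.

```julia
using UUIDs

# Simplified variant of the struct above: every variable has a default.
struct ContextVar{T}
    name::Symbol   # only for human readability
    key::UUID      # unique key, so names can never clash
    default::T
end
ContextVar(name::Symbol, default) = ContextVar(name, uuid4(), default)

const CONTEXT_KEY = :CONTEXT  # assumed slot in task-local storage

# The context of the current task; an empty mapping if none was installed.
current_context() =
    get(task_local_storage(), CONTEXT_KEY, Dict{UUID,Any}())::Dict{UUID,Any}

Base.getindex(var::ContextVar{T}) where {T} =
    get(current_context(), var.key, var.default)::T

# Run `f` with `var` bound to a new value; the old context is restored afterwards.
function setting(f, binding::Pair{<:ContextVar})
    ctx = copy(current_context())          # copy-on-write: never mutate a shared context
    ctx[binding.first.key] = binding.second
    return task_local_storage(f, CONTEXT_KEY, ctx)
end

# Hypothetical spawn wrapper: the child starts with the parent's context.
function spawn_inheriting(f)
    ctx = current_context()                # captured in the parent task
    return @async task_local_storage(f, CONTEXT_KEY, ctx)
end
```

A task started with plain `@async` sees only the defaults here, which is exactly the gap that runtime support for inherited context would close.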
Here is a proof-of-concept and its usage: https://github.com/tkf/ContextVariables.jl/blob/ebc2b550e2177c4ebc059b6974f1ea36309d7f4f/test/runtests.jl#L14-L29 |
Oh, nice! Will this work across remote-call boundaries? Maybe use |
Yeah,
I think it'd work with local context variables (i.e., captured in closures) but not with global const context variables. But I think it's possible in principle. We need to tweak Distributed (or you need to propagate it manually) though. Notes: Global const doesn't work because I'm using |
Yes we need something better here, thanks again for opening this issue! I feel an alternative name for this issue could be "taking dynamic scope seriously" ;-) Though perhaps "context variable" is a more straightforward name for this. If we somehow had efficient codegen for context variables, there's a lot of interesting language facilities which can be improved based on this. For example
At #35690 (comment) I wondered about one possible way to specialize code based on context variables. As noted in the OP, specializing a whole call stack based on the presence/type of a context variable is just really heavy handed. But on the other hand, I feel like specializing a leaf function (or a few innermost inlined frames) on a context var could be very powerful, and possibly of acceptable cost. Getting this idea to be sound and practical within inference seems quite tricky. I guess it would require a context calling convention to allow inference to reason about the innermost frames in a systematic way. If it actually panned out, I'm imagining the compiler could hoist context variable access out of innermost loops and across inlined function calls. Kotlin seems to have some interesting APIs around context and coroutines (and they have taken structured concurrency seriously!) so we might learn something from examining that closer https://kotlinlang.org/docs/reference/coroutines/coroutine-context-and-dispatchers.html |
I guess it would depend on how deep we want to take it. Context attached to tasks could be done with just a few small, lightweight changes. I think that would cover many use cases already. Context at a deeper level, propagating through all function calls, will come with a higher cost, I guess - but there are quite a few use cases that would need that, of course, though I think those are less concerned with scheduling and resource allocation questions. |
@c42f What do you think about the design of Also, I don't think it's possible to eliminate some dynamic typing. Since we need to allow arbitrary value types, and we can't specialize for the whole set of the context variables currently set (i.e., equivalent to using
I also thought about supporting the deterministic parallel RNG. It probably is possible via implementing some kind of hook system via |
Well, I like it! It's simple and can be implemented right away with only some small changes to the runtime. My comments above about effect systems, etc, are all pretty much pie-in-the-sky speculation at this stage, though it would be nice to have a rough feeling for where things might go. So anyway, if we provided |
That's actually something I already have in BAT.jl https://github.com/bat/BAT.jl/blob/master/src/rngs/rng_init.jl This provides reproducible random numbers for hierarchical parallel applications, via hierarchical partitioning of counter-based RNGs . I'll make that more widely available via ParallelProcessingTools.jl. It's just in BAT.jl currently because I needed it fast and didn't have time to do a nice API at the time. I'll get on it. |
Context variables would be super-useful to propagate those RNG partitions! |
I think the core problem is the cost of the key lookup. I believe it would be much too slow for something like RNG. |
Well, I guess in many use cases, the task would get its RNG once, and then pass it on to the functions it calls as an explicit parameter. And then pass partitions of that RNG to sub-tasks, when they spawn. I really wasn't trying to go for a full resource-injection solution with this issue, just something on the task-level for low/medium-frequency access. The RNG-story is a bit different from the worker-availability story, though. Which RNG is used for which part of the computation shouldn't depend on the number of workers/threads/tasks, for reproducibility, so that often may need to be handled explicitly. |
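The reproducibility requirement described here can be sketched as follows. This is not the BAT.jl implementation: it uses hash-derived seeds with `Xoshiro` (Julia 1.7 or later) as a simple stand-in for partitioning a counter-based RNG, and `child_seed` and `sum_of_draws` are hypothetical names. The point is only that each child's stream depends on its position in the hierarchy, not on how tasks are scheduled.

```julia
using Random

# Derive a child's seed from the parent's seed and the child's index.
# Assumption: hash mixing stands in for real counter-based RNG partitioning
# and makes no claim about statistical quality.
child_seed(parent::UInt64, index::Integer) = hash(index, parent)

function sum_of_draws(seed::UInt64, nchildren::Integer)
    # Each child builds its own RNG from its partition of the seed space.
    tasks = [Threads.@spawn rand(Xoshiro(child_seed(seed, i))) for i in 1:nchildren]
    # Results are combined in index order, so the sum does not depend on scheduling.
    return sum(fetch.(tasks))
end
```

Calling `sum_of_draws(UInt64(1), 4)` twice gives the same value regardless of the number of threads, because no RNG state is shared between tasks.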
@c42f Yeah, I do like pie-in-the-sky speculations and am very interested in effect systems! Also, I think a context variable API helps with implementing POC effect handlers, even though it won't be as efficient as we'd want. Some effect handlers don't require much optimization when the overhead of the handler itself is large (e.g., a process+thread pool abstracting Distributed and Threads). I think it'd be nice to have a building block for playing with effect handler interfaces and assessing the programming experience with it, before we start thinking about optimizing the hell out of it (though it's also important to think about a compiler-friendly interface at the same time). |
@oschulz It's awesome that you already have counter-based RNGs integrated into a parallel program! But don't you need to do something at
If so, don't you need to create custom |
Yeah. To make this fast enough I guess we need to be able to hoist the load of the |
Not for the RNG, no, because all functions I use take an RNG as an explicit argument. Since this is pretty much a standard in most of the Julia ecosystem, it's easy to do that consistently, also with the third-party packages I use. In the beginning, I also had explicit function arguments for resources like threads in BAT.jl, but that was unwieldy and I got rid of it when partr came around. Like I wrote, RNG distribution/partitioning can't always be automatic if the computation should be reproducible independent of the parallel execution strategy. But it would still be great to have the option of propagating the RNG via context - users don't always like to have to pass the RNG explicitly - but at points in between, algorithms that distribute computation will need to do some explicit RNG handling/partitioning. |
Hmm.... I'm not sure if I understand. Let's consider this snippet:
@contextvar state = 0
function demo()
    @sync begin
        @async println("at task1: ", state[])
        @async println("at task2: ", state[])
        state[] += 1
        println("at task0: ", state[])
    end
end
Then calling
in some order. If
|
I think there are two design questions: (1) Should Distributed handle it automatically? That is to say, should context variable values be restricted to serializable values? It may be handy to put non-serializable states like files and locks in it so I don't think we should add this restriction. But this would make it impossible to implement automatic handling by Distributed. (Note that users still can implement context propagation by wrapping Distributed API and manually propagating known-to-be-safe context variables.) A semi-automatic way may be to add an overloadable function API (2) Should it be possible to list context variables currently set? With the current design (only storing a mapping My preference is to add the context variable API without these features and see if we can get away without implementing them. |
OK so here is a full set of API I'd propose https://tkf.github.io/ContextVariables.jl/dev/. It includes a quick tutorial. |
Oh, yes - we will of course, in general, need variants of With my partitioned RNGs, the situation is a bit different, though. An RNGPartition can be passed on as it is, though, because the receiving end will instantiate the actual RNG (e.g. via However, parts of the calculation that are not aware of RNGs (e.g. some 3rd party package), should pass on the RNG or RNG partition unchanged automatically, if possible.
I think yes. The idea would be that a hierarchical calculation will, at certain points, need to control how resources (workers, RNG, etc.) are to be partitioned |
I haven't had time to read all the implementation yet, but the API looks nice. I particularly like the section on data races. I think you've nailed the correct semantics there: the runtime must ensure that the ContextVar get/set is data race free between Tasks, but the use of any context var values will need to be made threadsafe by the user. For implementation of storage we could add an extra
This seems like it can be worked around by the application author but it's going to be ugly; in principle they can make a big list of It's a tricky problem because in general remote calls can't know which of the context vars will actually be used. So should it be an error to have a contextvar installed which can't be serialized, but which won't be used on the remote side? I would think not. Perhaps it could be made to work by allowing the remote call and sending all context vars, but with the value of any non-serializable context var poisoned so that any use of it on the remote side will result in an exception.
I agree we might get away without this. The proposed API seems to be one where module authors are likely to create a On the other hand, it would be really useful to be able to list the attached |
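The "poisoning" idea from this comment could look roughly like the sketch below. Everything in it is hypothetical: `Poisoned`, `poison`, and `lookup` are illustrative names, the context is modelled as a plain `Dict{Symbol,Any}`, and the caller supplies a `transferable` predicate because there is no general way to ask whether a value can be serialized.

```julia
# Marker stored in place of a value that could not be sent to the remote side.
struct Poisoned
    name::Symbol
end

# Build the context to send: values that fail the predicate are replaced by markers.
poison(ctx::Dict{Symbol,Any}, transferable) =
    Dict{Symbol,Any}(k => (transferable(v) ? v : Poisoned(k)) for (k, v) in ctx)

# Remote-side lookup: only an actual use of a poisoned variable throws.
function lookup(ctx::Dict{Symbol,Any}, name::Symbol)
    value = ctx[name]
    value isa Poisoned && error("context variable $name could not be sent to this process")
    return value
end
```

With this shape, a remote call that never touches the non-transferable variable runs normally, and one that does touch it fails with a clear error at the point of use.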
@oschulz Yeah, I agree it'd be nice to be automatic for these cases. But, as I mentioned, it has some undesirable consequences. For example, some objects like files and locks can't cross process boundaries. It'd be problematic if a user accidentally put huge arrays in the context variable. (Another way may be to use
@c42f Yeah, I actually have already implemented it like this :) https://github.com/tkf/julia/commits/ctxvars
One solution may be to make
Yeah, having an optional lookup table for debugging sounds like a good idea. |
I opened PR #35833 to add this API so that it'd be easier to comment on implementation/design specific to the API I'm proposing. |
I'm not so sure that we should directly inject variables into the scope of the child-tasks. There may be name clashes and it's a potential security issue too - mainly I'm worried about name conflicts though. I think it would be better if context variables were retrieved with an explicit mechanism. |
Hm, I would hope those problems can be overcome somehow - if only by the user being reasonable. However, for the original scope of this issue - propagation of available workers - we'd most certainly want fully automatic propagation, so the information is not lost when using packages that don't use the context-API to spawn local/remote tasks. |
Thanks for having a look into the API docs.
Perhaps the documentation should be fixed to emphasize this but there will be no name clashes, by design. (Unless you can invoke a collision of UUIDs.)
The set of available workers is very dynamic information and I don't think propagating it via a "static" mechanism like a context variable is a good idea. It would mean that a function doesn't get any update after it is called via This is why @c42f and I are discussing the effect system here. Task scheduler is a special case of the effect handler (#33248 (comment)) and we need dynamic scoping to implement this (or rather a small subset of effect handler that is still enough for task scheduler). Context variable just provides dynamic scoping and some effect handlers can be implemented on top of it. [*1] I guess it would need to implement something like the work-stealing approach on top of Threads and Distributed. |
There's actually precedent for tasks inheriting information from each other: Tasks do inherit |
Sure, of course! I was just wondering if we should split this into two different issues, or if all use cases can be covered by a common mechanism. I think that would be possible (and preferable) if we do propagate across task and remote-call boundaries. |
We could always check if the information is "too big" to be forwarded via remote call. |
It's not possible and not preferable because:
I have proposed various solutions to this problem. I think the discussion would be more productive if you explain why they don't work. |
Uh, maybe we misunderstood each other: I'm with you for rejecting things that can't be forwarded, requiring big objects to be wrapped in a Maybe this was a misunderstanding on my side, I had the impression that you didn't want non-explicit propagation to tasks and remotes at all anymore, because you wrote "I don't think automatic propagation to remote workers is reasonable". And I was wondering how many use cases would still work if context had to be propagated explicitly - since the code that distributes work to tasks and remotes (say, Transducers :-) ) will often not know about the semantics of the whole context. I would assume that in the future, we'll have more and more automatic/transparent multi-threaded and also multi-process code execution. So the code that "declares" the context, and the code that "consumes" the context will often not be aware of the task/remote-call barrier in between, and may not share stack. But the code that does the parallelization (say, a multi-threaded broadcast implementation) will not know/care about what's in the context - except for the parts of the context that control parallelization. So that's why I think context must, in principle, always be automatically forwarded to spawned tasks and to remote calls. But of course we can reject/filter certain types of content, resp. require them to be wrapped appropriately - context should, in my opinion, not be abused as a data store for substantial amounts of data, and that should be discouraged. |
I guess a clean way around that is to only allow context to refer to immutable values (we do have an immutable array type somewhere, don't we?). If we do that, copies can be made as necessary, transparently, without affecting semantics. |
Yes this is exactly the reason that some portion of the context needs to be propagated automatically
In general (2) is not the end user's top level application code and there's a good chance that (1) might also not be. So they definitely need to be decoupled. I think this is why @tkf was suggesting the |
I guess in most (sane) cases, the entries of the context would be fairly small and immutable structures anyhow. I guess we can be fairly rigorous and filter everything out that can't be propagated automatically. Maybe we should actually reject everything we don't "like" during context creation/assignment already, to avoid surprises to the user later on? |
@oschulz Thanks for the clarification. Indeed, it looks like we have a different notion of "automatic", "explicit", etc. To clarify, I've been using "explicit" to mean that the user does something beyond the standard context variable declaration. This is maybe some kind of declarative API to tell Distributed.jl to propagate certain context variables for every remote-call (opt-in). Or, maybe just using a low-level API to reset the remote context (manual). If what you mean by "automatic" is what I mean by "opt-in", yes, we are actually on the same page. Perhaps I should have said "unconditional propagation is not reasonable" instead of "automatic propagation is not reasonable." Concretely, I think it is reasonable to have an API something like
@contextvar PROCESS_LOCAL_CONTEXT = 1  # not forwarded to remote
@shared_contextvar GLOBAL_CONTEXT = 1  # automatically forwarded to remote
(I think it kind of makes sense to call them
This is the point I'm still strongly against as I explained in the last comment #35757 (comment). We should make API explicit and easy to understand and manipulate. Something implicit will cause trouble.
@c42f Yeah, I think that's possible. It can be an option to |
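The opt-in distinction drawn in this comment can be sketched without touching Distributed itself. The names below (`CtxVar`, its `shared` flag, `snapshot`, `roundtrip`) are hypothetical, variables are keyed by name rather than UUID to keep the sketch short, and a serialize/deserialize round trip stands in for the actual transfer to a worker.

```julia
using Serialization

# A declared variable; `shared = true` plays the role of `@shared_contextvar`.
struct CtxVar{T}
    name::Symbol
    default::T
    shared::Bool
end

# Collect only the opted-in variables from the current context mapping.
snapshot(ctx::Dict{Symbol,Any}, vars) =
    Dict{Symbol,Any}(v.name => get(ctx, v.name, v.default) for v in vars if v.shared)

# Stand-in for sending the snapshot to another process.
function roundtrip(x)
    io = IOBuffer()
    serialize(io, x)
    seekstart(io)
    return deserialize(io)
end
```

Process-local state such as locks or open files never enters the snapshot, so it is never serialized, while small shared values travel with every remote call that uses this wrapper.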
I fully agree. I think it's perfectly fine to expect the user to declare a context variable that is to be propagated automatically in a certain way, and to restrict it to certain types of content. It's certainly good to give the user control over what should be restricted to the current process, and what should propagate beyond. The "local" |
Nice to know that we are on the same page!
I commented this in the other issue #35833 (comment) |
Nice, I think this is the way to go. My inclination is to have options to |
Right, it makes sense.
I'm glad that you are shooting for this! It'd make it possible to implement something like parallel RNG completely in the user space.
BTW, regarding the lookup overhead, we can store |
Speaking of parallel RNG, it'd require something like
@contextvar RNG_STATE = RNGState(...)
where
@contextvar IMMUTABLE_POINTING_TO_MUTABLE = Ref(thing)
How does it interact with the optimization you have in mind? Is it that it's important to make the default get-only? |
I think the point here is that context should behave like implicit arguments to child functions from the point of view of the compiler. Then the user has a choice to make context immutable or not as necessary, and the compiler can reason about the values of these variables locally in the same way as normal function arguments. For context like |
Thanks for the explanation. It makes sense. |
Yes exactly! I feel like whatever we come up with here should be able to support both #34852 and logger context in user space with excellent efficiency. If not, we won't really have solved two of the key use cases. Regarding mutable context, it basically has to be cloned (at the very least) in the parent task prior to |
Right, I think it's also important to mutate the state in the current context upon t1 = @spawn ...
t2 = @spawn ... we have identical RNG state in |
RNGs have actually been on my mind as a potentially very important use case for contexts. While we often forward RNG via an explicit parameter (and should), sometimes (e.g. for a likelihood function that happens to need an RNG internally, but just takes the model parameters as its input) it would be great to pass it on via context. For parallel applications, I usually use a counter-based RNG, so that I can use a common seed and partition the random space in a hierarchical fashion - that does require semantic knowledge, but should be easy to do using the proposed |
I think that's the ideal way to define it, semantically. |
EDIT: Oh, except I've just realized that this thread is about communication across distributed tasks, not necessarily about communication across multithreaded tasks? Or is it covering both? We've wanted something like this for a while too! :) Thanks for opening the issue and discussing it! 👍 Since nothing like this exists right now, we've been toying with the (dirty) idea of (ab)using the logger to get a context that passes to child tasks, given that we currently do pass the logger through to child tasks. For example, we were considering a thread-aware tracing framework that does something like this, even though it's clearly terrible:
using Logging
struct TraceLogger <: AbstractLogger
    span_name::String
    parent_logger::AbstractLogger
end
Logging.min_enabled_level(tl::TraceLogger) = Logging.min_enabled_level(tl.parent_logger)
Logging.shouldlog(tl::TraceLogger,args...) = Logging.shouldlog(tl.parent_logger, args...)
Logging.handle_message(tl::TraceLogger,args...) = Logging.handle_message(tl.parent_logger, args...)
ancestor_trace(::Any) = ""
ancestor_trace(tl::TraceLogger) = "$(tl.span_name), $(ancestor_trace(tl.parent_logger))"
function with_span(f, name)
    with_logger(TraceLogger(name, Logging.current_logger())) do
        info = @timed f()
        @info "Finished $(ancestor_trace(Logging.current_logger())): $(info.time)"
        return info.value
    end
end

julia> with_span("a") do
           @sync begin
               @async begin
                   with_span("b") do
                       @info "HI"
                   end
               end
               @async begin
                   with_span("C") do
                       @info "BYE"
                   end
               end
           end
       end
[ Info: HI
[ Info: BYE
[ Info: Finished b, a, : 0.052175791
[ Info: Finished C, a, : 0.000226425
[ Info: Finished a, : 0.101278193
Task (done) @0x00000001136d58d0
:) Anyway, yeah, sign us up as another interested party! |
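For comparison, the same span tracking can be written without a custom logger once dynamically scoped values are available. The sketch below assumes Julia 1.11 or later, where `Base.ScopedValues` provides `ScopedValue` and `with`; the names `SPAN` and `current_trace` are illustrative, and this version only tracks span names rather than timing them.

```julia
using Base.ScopedValues  # assumption: Julia 1.11+

# Stack of enclosing span names, innermost first; empty outside any span.
const SPAN = ScopedValue(String[])

# Run `f` with `name` pushed onto the span stack for its dynamic extent.
with_span(f, name) = with(f, SPAN => [name; SPAN[]])

# Render the current span and its ancestors, e.g. "b, a".
current_trace() = join(SPAN[], ", ")
```

Because a task captures the scope that is active when it is created, a span opened inside `@async` still sees its parent span, which is the behavior the logger trick was borrowing.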
Definitely both, also across local tasks. And since my original proposal, @tkf, @c42f and @JeffBezanson have taken this idea even further than I had envisioned originally - something that could possibly also be used in time-critical code. I think this could potentially become an extremely powerful mechanism. |
In #50958 I made the semi-intentional decision not to address the remote-call part of this proposal. I have started a small prototype for snapshotting in vchuravy/ScopedValues.jl#6, but I won't be including this into the Base proposal for now. The implementation thereof can live in a package. (we might also be able to implement a "RemoteScope" on top of local scopes, but I haven't thought too hard about this). |
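As a rough illustration of the local-task behavior that scoped values provide, the following sketch assumes Julia 1.11 or later, where the `Base.ScopedValues` module is available. The names `level` and `read_in_child` are made up for the example. Unlike the `var1[] = 0` assignment in the earlier proposal, a scoped value can only be rebound for a dynamic extent via `with`.

```julia
using Base.ScopedValues  # assumption: Julia 1.11+

const level = ScopedValue(0)  # 0 is the default value

# Read the value from a freshly spawned task, which inherits the
# scope of the task that created it.
read_in_child() = fetch(Threads.@spawn level[])

outer = read_in_child()       # default, no binding active
inner = with(level => 1) do   # binding is visible to tasks spawned inside
    read_in_child()
end
after = level[]               # binding has ended, back to the default
```

Nothing here crosses a process boundary; propagation over remote calls is the part left to packages.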
Would be great to have "remote-enabled" scopes eventually, I think (if possible), to avoid hard-to-predict behavior in cases where remote operation is transparent to the user. |
The issue is that copying a scope is a very heavy operation. It's something I decidedly wouldn't want to do on every rpc. Now a framework should be able to define that it wants to propagate scope with its rpc and snapshot only relevant pieces. I certainly wouldn't want to send CuContext or CuDevice automatically across the wire. |
Hm, yes, there is that ... hard to control how much people will put in there. |
Seems covered by #50958 for the main case above of |
@tkf and I have been discussing ways to propagate information about available workers (or resources in general) in distributed hierarchical computations:
https://discourse.julialang.org/t/propagation-of-available-assigned-worker-ids-in-hierarchical-computations
Adding resource arguments to every function call would be impractical, and using Cassette & friends to add context by rewriting the whole computation would be very heavy-handed, since it might touch large code stacks (and may also not be a complete solution when remote calls are involved?).
Could we add something like a `context` field to `Task`, in addition to the `storage` field - with the difference that `context` is automatically propagated to spawned tasks and remote calls? Adding the possibility to propagate a context through a computation in this fashion could also be useful for other use cases, I think.