when_all/split: operation state might get prematurely destroyed when child completes synchronously inside its stop callback #300
Comments
The scenario seems plausible. I think a nice way to work around the early destruction could be to increment the number of expected completions before iterating: the iteration over the children is conceptually an outstanding task. Once that is done the count is decremented and the appropriate completion is triggered if all outstanding work is completed.
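
A minimal sketch of the counting idea described above, using a toy model rather than the library's real when_all state. The names toy_when_all_state, child_stops and run_counting_demo are hypothetical and exist only for this illustration. The point it tries to show is that the extra count keeps the state from completing while the children's stop callbacks are still being iterated.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical toy model (assumption): NOT the library's when_all state.
struct toy_when_all_state {
    std::atomic<int> pending;                        // outstanding completions
    std::vector<std::function<void()>> child_stops;  // stand-ins for the children's stop callbacks
    bool iterating = false;                          // true while request_stop() walks child_stops
    bool completed = false;
    bool completed_mid_iteration = false;            // the hazard this pattern is meant to rule out

    explicit toy_when_all_state(int children) : pending(children) {}

    // Called once per finished child, and once more by request_stop().
    void arrive() {
        if (pending.fetch_sub(1) == 1) {
            completed_mid_iteration = iterating;
            completed = true;  // real code would complete the receiver here
        }
    }

    void request_stop() {
        pending.fetch_add(1);  // the iteration itself counts as outstanding work
        iterating = true;
        for (auto& stop : child_stops) stop();  // children may call arrive() synchronously
        iterating = false;
        arrive();              // drop the extra count; completion can happen here at the earliest
    }
};

// Every child completes synchronously from inside its stop callback.
inline bool run_counting_demo() {
    toy_when_all_state st{2};
    st.child_stops.push_back([&st] { st.arrive(); });
    st.child_stops.push_back([&st] { st.arrive(); });
    st.request_stop();
    return st.completed && !st.completed_mid_iteration;
}
```

In this toy, both children arrive during the iteration, yet the final completion only fires from the arrive() call at the end of request_stop(), after the loop has finished.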
This seems eerily similar to an issue reported in libunifex some time back:
The standard doesn't have that problem [yet?]: it sets up a callback using
This is an excellent suggestion, thanks! Now, looking at libunifex's implementation, this is how they fixed it as well.
Yep, seems to be the exact same issue. Thanks for the pointer! In the comments they mention
Reading P3409R0 earlier, I had hopes that maybe
inline bool single_inplace_stop_source::request_stop() noexcept {
    ...
        callback->execute(callback);
        state_.store(stop_requested_callback_done_state(), memory_order_release); // <<< this might access the opstate after its lifetime ended
        state_.notify_one();
    }
    return true;
}
It is mentioned in [exec.snd.expos]: https://eel.is/c++draft/exec#snd.expos-16
I didn't look there! Thanks for pointing that out. The implication is, of course, that the standard does have the problem. It may be reasonable to factor out the counting behavior and the stop callback handling into a separate entity used in relevant places:
Here is one possible outline of a fix for the standard. It is a bit fiddly because of the "basic sender" infrastructure, but not too bad I feel.
template <class State>
struct on-stop-request {
    State& state;
    void operator()() noexcept { state.request_stop(); }
};
The idea is that every operation state can customize how it wants stop requests to be handled.
template<class Sndr, class Rcvr>
struct basic-state {                                  // exposition only
    basic-state(Sndr&& sndr, Rcvr&& rcvr) noexcept(see below)
        : rcvr(std::move(rcvr))
        , state(impls-for<tag_of_t<Sndr>>::get-state(std::forward<Sndr>(sndr), rcvr)) { }
    Rcvr rcvr;                                        // exposition only
    state-type<Sndr, Rcvr> state;                     // exposition only
    void request_stop() noexcept
        requires requires { state.request_stop(rcvr); }
    {
        state.request_stop(rcvr);
    }
};
...so
void start() & noexcept
{
    inner_ops.apply(
        [this](auto&... ops) { impls_for<tag_t>::start(*static_cast<basic_state<Sndr, Rcvr>*>(this), ops...); });
}
static constexpr auto start = []<class... Ops>(auto& basic_state, Ops&... ops) noexcept -> void {
    auto& [rcvr, state] = basic_state;
    state.on_stop.emplace(ex::get_stop_token(ex::get_env(rcvr)), ex::on_stop_request{basic_state});
void request_stop(Rcvr& rcvr) noexcept
{
    if (this->count++ == 0) {
        return;
    }
    this->stop_src.request_stop();
    this->arrive(rcvr);
}
void request_stop(Rcvr&) noexcept { sh_state->request_stop(); }
...and add a
void request_stop() noexcept
{
    this->inc_ref();
    this->stop_src.request_stop();
    this->dec_ref();
}
There is a simpler way that only involves the "local" state class, but then
Here is a standalone example/test demonstrating the problem for
namespace tst {
template <class State>
class on_stop_request {
public:
    void operator()() const noexcept { state_->request_stop(); }

public: // NOLINT enable aggregate initialization
    State* state_;
};

class wait_forever_sender {
public:
    using sender_concept = ex::sender_t;
    using completion_signatures = ex::completion_signatures<ex::set_stopped_t()>;

    template <typename Rcvr>
    ex::operation_state auto connect(Rcvr&& rcvr) noexcept(
        std::is_nothrow_constructible_v<std::remove_cvref_t<Rcvr>, Rcvr>)
    {
        return opstate{std::forward<Rcvr>(rcvr)};
    }

private:
    template <typename Rcvr>
    class opstate {
    public:
        using operation_state_concept = ex::operation_state_t;

        void start() & noexcept
        {
            on_stop_.emplace(ex::get_stop_token(ex::get_env(rcvr_)), tst::on_stop_request{this});
        }

        void request_stop()
        {
            on_stop_.reset();
            ex::set_stopped(std::move(rcvr_));
        }

    private:
        using stop_callback =
            ex::stop_callback_for_t<ex::stop_token_of_t<ex::env_of_t<Rcvr>>, tst::on_stop_request<opstate>>;

    public: // NOLINT enable aggregate initialization
        Rcvr rcvr_;
        std::optional<stop_callback> on_stop_ = {};
    };
};

ex::sender auto wait_forever()
{
    return wait_forever_sender{};
}

struct call_function_on_stop_receiver {
public:
    using receiver_concept = ex::receiver_t;

    explicit call_function_on_stop_receiver(std::function<void()>* f, ex::inplace_stop_token stoken)
        : f_{f}, stoken_{stoken} { };
    call_function_on_stop_receiver(call_function_on_stop_receiver&& other) noexcept
        : f_{std::move(other.f_)}, stoken_{std::move(other.stoken_)}, called_{other.called_.exchange(true)} { };
    call_function_on_stop_receiver& operator=(call_function_on_stop_receiver&& other) = delete;
    ~call_function_on_stop_receiver() { assert(called_); }

    void set_value() && noexcept { std::terminate(); }
    void set_error(std::exception_ptr) && noexcept { std::terminate(); }
    void set_stopped() && noexcept
    {
        called_ = true;
        (*f_)();
    }

    auto get_env() const noexcept { return ex::prop{ex::get_stop_token, stoken_}; }

private:
    std::function<void()>* f_;
    ex::inplace_stop_token stoken_;
    std::atomic<bool> called_ = false;
};
} // namespace tst

int main()
{
    ex::inplace_stop_source ssource{};
    std::function<void()> f;
    auto* op = new auto(
        ex::connect(ex::when_all(tst::wait_forever()), tst::call_function_on_stop_receiver{&f, ssource.get_token()}));
    f = [&] { delete op; }; // NOLINT(cppcoreguidelines-owning-memory)
    ex::start(*op);
    ssource.request_stop();
}
I have implemented something akin to the proposed fix here. I plan to write that up as an LWG issue together with a corresponding proposed fix. I'll post the issue here once I have created it.
I may have stumbled across a nasty lifetime issue in the handling of stop callbacks in the when_all/split algorithms. But this would apply to any algorithm using an inplace_stop_source inside its operation state.
when_all has an inplace_stop_source inside its operation state, and then a stop callback, like this:
In on_stop, a stop callback is registered, which calls stop_src.request_stop() when there is a stop request on the receiver's stop token. All child senders of the when_all are registered on the stop_src. This propagates the stop request from the receiver to all children of the when_all sender.
Now, what I observed is the following chain of events:
1. on_stop callback calls stop_src.request_stop()
2. stop_src now iterates over its list of registered stop callbacks (those are the ones from the children) (*)
3. A child calls set_stopped synchronously from inside its stop callback. In my case it's from deregistering a sleep timer from a self-written epoll-based "io context", something like this:
4. arrive() of the when_all opstate gets called
5. complete() of the when_all opstate gets called
6. ex::set_stopped is called, satisfying the receiver contract of the when_all sender
7. when_all opstate is synchronously (!) destroyed from inside the set_stopped callback by some follow-up work.
-> UB, since we are still iterating over the list of registered stop callbacks of stop_src! (the line marked with "*" above)
I've managed to "hack around" it by checking thread IDs and deferring the completion to the stop callback if I detect that the completion is called synchronously from inside a stop callback. So something like this:
...and then using stop_callback_with_thread_id instead of optional<stop_callback> inside when_all's opstate, and having a complete() like this:
This all feels very hacky to me, though.
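
As a rough illustration of the thread-id workaround described above, here is a toy sketch. It is not the code from the issue: toy_opstate, in_callback_on and run_thread_id_demo are hypothetical names, and the propagate callable merely stands in for stop_src.request_stop(). It only covers the case where the completion arrives on the same thread that is running the stop callback.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical toy operation state (assumption): only models the deferral logic.
struct toy_opstate {
    // Thread currently running our stop callback; a default id means "none".
    std::atomic<std::thread::id> in_callback_on{std::thread::id{}};
    bool deferred = false;  // completion was requested from inside the callback
    int completions = 0;    // how often the receiver would have been completed

    void complete() {
        if (in_callback_on.load() == std::this_thread::get_id()) {
            deferred = true;  // called synchronously from our own stop callback: defer
            return;
        }
        ++completions;        // real code would call set_stopped on the receiver here
    }

    // Body of the stop callback; `propagate` stands in for stop_src.request_stop().
    void on_stop(const std::function<void()>& propagate) {
        in_callback_on.store(std::this_thread::get_id());
        propagate();                                // children may call complete() here
        in_callback_on.store(std::thread::id{});
        if (deferred) {                             // iteration is over, now it is safe
            deferred = false;
            ++completions;
        }
    }
};

inline bool run_thread_id_demo() {
    toy_opstate op;
    bool completed_inside = false;
    op.on_stop([&] {
        op.complete();                              // synchronous completion from the callback
        completed_inside = (op.completions != 0);
    });
    return !completed_inside && op.completions == 1;
}
```

In the toy, the completion requested during propagation is postponed until the callback has finished forwarding the stop request, which mirrors the deferral the paragraph describes.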
I haven't deeply investigated split yet, but I think the solution could be a bit simpler there, by using a stop callback like this instead of on-stop-request:
...i.e. just wrapping the request_stop() between inc_ref/dec_ref to ensure the opstate object stays alive long enough.
I do wonder if there is a more elegant way to solve this issue. I don't think synchronous completions from stop callbacks should be outlawed -- it seems "natural" to me to do the set_stopped right inside the stop callback if possible. Or maybe synchronous destruction from inside the set_stopped completion of when_all is the problem? I've thought that you have to assume the lifetime of the opstate may end when calling the completion, though.
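
The inc_ref/dec_ref idea mentioned for split can be sketched with a toy reference-counted state. This is only an illustration under assumptions: toy_shared_state, demo_log and run_keep_alive_demo are hypothetical names, and child_stop_ stands in for the stop source whose callbacks may release the last outside reference.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Records what happened, so the outcome can be checked after the state is gone.
struct demo_log {
    bool alive_after_callbacks = false;
    bool destroyed = false;
};

// Hypothetical toy shared state (assumption): NOT the library's split state.
class toy_shared_state {
public:
    explicit toy_shared_state(demo_log& log) : log_(log) {}
    ~toy_shared_state() { log_.destroyed = true; }

    void inc_ref() noexcept { ++refs_; }
    void dec_ref() noexcept {
        if (--refs_ == 0) delete this;  // last reference gone
    }

    void set_child_stop(std::function<void()> f) { child_stop_ = std::move(f); }

    void request_stop() noexcept {
        inc_ref();                      // keep *this alive across the callbacks
        child_stop_();                  // stands in for stop_src.request_stop()
        log_.alive_after_callbacks = !log_.destroyed;
        dec_ref();                      // may delete *this; nothing touches it afterwards
    }

private:
    int refs_ = 1;                      // the owner's reference
    demo_log& log_;
    std::function<void()> child_stop_;
};

inline demo_log run_keep_alive_demo() {
    demo_log log;
    auto* st = new toy_shared_state{log};
    // The child completes synchronously and drops the owner's reference.
    st->set_child_stop([st] { st->dec_ref(); });
    st->request_stop();
    return log;
}
```

Without the inc_ref() at the top of request_stop(), the child's dec_ref() would delete the state while request_stop() is still running on it.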