Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exchange context accesses delegate despite delegate being free'ed #10344

Open
mrjerryjohns opened this issue Oct 8, 2021 · 15 comments
Open

Exchange context accesses delegate despite delegate being free'ed #10344

mrjerryjohns opened this issue Oct 8, 2021 · 15 comments
Labels

Comments

@mrjerryjohns
Copy link
Contributor

mrjerryjohns commented Oct 8, 2021

Problem

In the latest PR that re-organizes the lifetime and allocation semantics of the CommandSender and CommandHandler objects, there is a subtle bug that it has exposed that was caught by TSAN as a use-after-free.

The PR makes it such that both objects are destroyed after they have completed processing their last message on their relevant exchanges.

Specifically for CommandSender, the following sequence ensues when handling the last message in the exchange (an invoke response):

  1. CommandSender is allocated on the heap, and a message is sent on an exchange.
  2. When a message is received on the exchange, and the OnMessageReceived function is called on the registered ExchangeDelegate. This eventually routes to the CommandSender's OnMessageReceived function.
  3. It then proceeds to 'close' the exchange by setting it to null.
  4. It calls OnDone , a callback to the application to indicate that the CommandSender object has finished doing its work and is safe to destruct the object, which the application does by eventually calling free.
  5. Unwinds from the OnMessageReceived, and at some point, because there are no more messages to be sent on this exchange, in MessageHandled, it closes the exchange.
  6. That eventually invokes the OnExchangeClosing method on the delegate, which is the CommandSender object which wait, was previously free'ed!

This unfortunately throws a wrench in the 'auto' EC closing management logic that allowed protocol logic to not have to actively close the EC, but rather, have that happen automatically by the EC management logic upon detection of a lack of further work happening on the EC.

One solution is to have the CommandSender 'null' out the delegate pointer in the EC before setting it to null in step 2 above.

The other is to perhaps entertain the idea of 'exchange handles' that wrap a bare EC pointer that can be owned by an object like the CommandSender, that will automatically decrement the ref-count as well as null out the delegate pointer on destruction of the parent object. This is kinda radical though to institute at this point.

@mrjerryjohns mrjerryjohns changed the title Exchange logic accesses delegate despite owning object being free'ed Exchange context accesses delegate despite delegate being free'ed Oct 8, 2021
@mrjerryjohns
Copy link
Contributor Author

mrjerryjohns commented Oct 8, 2021

I think this is the first time an object that 'owns' an exchange has had a shorter lifetime than the exchange itself, and was allocated on heap.

There are pool-allocated objects that do have similar 'shorter' lifetimes, but their allocation never came from the heap (i.e all of the IM pool objects are statically allocated), and, they never got constructed/destructed each and every-time (just a flag got set on them).

@mrjerryjohns
Copy link
Contributor Author

Perhaps it's worthwhile asking if invoking the delegate in the OnClose() method is necessary in such a circumstance, where the delegate itself has awareness of the fact that it just processed the last message in the exchange and has no further need for a notification of exchange closure.

@msandstedt
Copy link
Contributor

Is this a dup of #9380?

mrjerryjohns added a commit to erjiaqing/connectedhomeip that referenced this issue Oct 8, 2021
@kghost
Copy link
Contributor

kghost commented Oct 10, 2021

@msandstedt I think the mDispatcher and mDelegate are different objects.

@mrjerryjohns Instead of reset the mDelegate to nullptr, is it possible to have a do nothing delegate ? Or put the exchange into an error state to deny all future message ?

@bzbarsky-apple
Copy link
Contributor

Setting delegate to null in fact puts it into that state: it will still ack things, but do nothing else.

@msandstedt
Copy link
Contributor

I think this is the first time an object that 'owns' an exchange has had a shorter lifetime than the exchange itself, and was allocated on heap.

There are pool-allocated objects that do have similar 'shorter' lifetimes, but their allocation never came from the heap (i.e all of the IM pool objects are statically allocated), and, they never got constructed/destructed each and every-time (just a flag got set on them).

Yes, heap allocation makes it more likely sanitizers will show this bug. But even where we aren't heap allocating, we have a lot of awkwardness. For instance this:

// Defer the release for the pending Ack to be sent

In this code, OperationalDelegateProxy must wait to reuse the CASESession object until one more time around the horn with the event loop. Waiting for reuse; waiting to free: they are both the same type of problem.

It would be far better if when the sdk signaled done, it was REALLY done with the caller's memory. This stuff is very difficult to deal with otherwise. What is the utility of a 'done' callback if the sdk isn't actually done?

Setting delegate to null in fact puts it into that state: it will still ack things, but do nothing else.

My experience in #13237 is that this isn't the case. The sdk tries to ack. But if the exchange delegate has been set null, attempting to ack will hit a SIGABRT. OperationalDeviceProxy.cpp#L295 appears to admit this is currently expected. That's not great.

@bzbarsky-apple
Copy link
Contributor

bzbarsky-apple commented Jan 6, 2022

My experience in #13237 is that this isn't the case. The sdk tries to ack. But if the exchange delegate has been set null,
attempting to ack will hit a SIGABRT.

This is the part I don't understand. What exactly is aborting? Is the problem that the delegate is null or that the dispatch pointer is dangling? The stack in #13237 sure looks to me like a deallocated this inside ExchangeMessageDispatch::SendMessage, in which case the behavior changes from #12794 might help here....

@msandstedt
Copy link
Contributor

My experience in #13237 is that this isn't the case. The sdk tries to ack. But if the exchange delegate has been set null,
attempting to ack will hit a SIGABRT.

This is the part I don't understand. What exactly is aborting? Is the problem that the delegate is null or that the dispatch pointer is dangling? The stack in #13237 sure looks to me like a deallocated this inside ExchangeMessageDispatch::SendMessage, in which case the behavior changes from #12794 might help here....

The error was 'pure virtual method called' for SendMessage. So I think I agree that 'this' was probably already freed and the vtable was nonsense when SendMessage was called.

Question: what implements the ExchangeMessageDispatch interface? In #13237, PASESession and CASESession are the Exchange Context delegates and these were freed. But on what object is SendMessage supposed to be called?

Regardless, I will check with #12794 and see if this solves my problem. It would be great if it did.

@msandstedt
Copy link
Contributor

@bzbarsky-apple ,

You are correct. #12794 unblocks this problem. Thanks for pointing me to this.

I have opened #13357 to now allow CASESession and PASESession to be freed or reused immediately on execution of their completion callbacks.

@bzbarsky-apple
Copy link
Contributor

Question: what implements the ExchangeMessageDispatch interface?

We have two implementations: ApplicationExchangeDispatch and SessionEstablishmentExchangeDispatch.

@bzbarsky-apple
Copy link
Contributor

@mrjerryjohns @msandstedt Is there still an issue here?

@msandstedt
Copy link
Contributor

@bzbarsky-apple ,

I think #12794 pretty well addressed it. Or it at least, it addressed all of the problems I was seeing.

@bzbarsky-apple
Copy link
Contributor

Talked to @mrjerryjohns.

What's left to do here is to figure out the contract between the delegate and the exchange context in terms of when OnExchangeClosing calls happen and what a delegate that is going away should be doing. This affects Abort calls (possibly made by delegates) as well as "normal" closing of an exchange that happens as the stack unwinds from a call into the delegate, if that call frees the delegate.

@stale
Copy link

stale bot commented Dec 26, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale Stale issue or PR label Dec 26, 2022
@bzbarsky-apple bzbarsky-apple removed the stale Stale issue or PR label Jan 3, 2023
@stale
Copy link

stale bot commented Jul 11, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale Stale issue or PR label Jul 11, 2023
@bzbarsky-apple bzbarsky-apple removed the stale Stale issue or PR label Jul 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants