Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ota] Exchange leak #20565

Closed
Damian-Nordic opened this issue Jul 11, 2022 · 5 comments · Fixed by #20632
Closed

[ota] Exchange leak #20565

Damian-Nordic opened this issue Jul 11, 2022 · 5 comments · Fixed by #20632

Comments

@Damian-Nordic
Copy link
Contributor

Problem

When OTA is cancelled due inactivity timeout, the BDX transfer exchange is leaked. Apart from the possibility to run out of exchanges on constrained devices, it causes a Sleepy End Device never getting back to the idle mode, which may drain the battery.
Same thing may occur in different scenarios, such as passing an error to BDXDownloader::OnPreparedForDownload.

Proposed Solution

I tried to fix that in a stupid way to close an exchange each time we go back to the idle state, but I wasn't aware of various subtleties of the exchange management and it lead to crashes: #20563.
We need to somehow extend BDX APIs so that we can distinguish cancel events caused by the exchange layer or by external components and close the exchange in the latter cases only.

@bzbarsky-apple
Copy link
Contributor

@Damian-Nordic Where would cancel events "caused by the exchange layer" come from?

The current setup for closing exchanges is basically:

  1. Any time you close an exchange, you need to drop your reference to it.
  2. If you successfully send a message that expects no response, the exchange is closed.
  3. If you receive a message and don't tell the exchange you plan to respond, the exchange is closed.
  4. If an exchange times out, it will close unless you tell it otherwise.
  5. In any state other than while actively receiving or sending a message, it's safe to close an exchange.

In terms of BDX, I believe it always tells exchanges that it plans to respond, so item 3 is not even relevant. So this should all be a local decision in BDX code: you have to null out the exchange pointer under certain conditions....

@bzbarsky-apple
Copy link
Contributor

In particular, BDXMessenger is all sorts of broken; starting by fixing that is probably a good idea. I'll take a stab at it.

@Damian-Nordic
Copy link
Contributor Author

For example, if you take a look at BDXDownloader::EndDownload, it sends a status response and for that (and for BlockAckEOF) we don't set kExpectResponse, so in such cases we shouldn't close the exchange explicitly:

            if (!event.msgTypeData.HasMessageType(chip::bdx::MessageType::BlockAckEOF) &&
                !event.msgTypeData.HasMessageType(chip::Protocols::SecureChannel::MsgType::StatusReport))
            {
                sendFlags.Set(chip::Messaging::SendMessageFlags::kExpectResponse);
            }

But BDXDownloader::OnDownloadTimeout can be called by an external (to the exchange layer) timeout, so this should result in closing an exchange manually.

@Damian-Nordic
Copy link
Contributor Author

BDXDownloader::OnPreparedForDownload is another occurrence that may require manual closing of the exchange, because it is called before any message is sent using that exchange (I think). Also, BDXDownloader::StartDownload should probably use ExchangeHandle to release the exchange on error.

I don't guarantee there are no more places :)

@bzbarsky-apple
Copy link
Contributor

@Damian-Nordic I believe #20632 should address this..... It's your changes from #20304 plus a fix to BDXDownloader to handle the exchange lifetime correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants