Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle a lost standalone ack correctly. #9895

Merged
merged 2 commits into from
Sep 28, 2021

Conversation

bzbarsky-apple
Copy link
Contributor

If a standalone ack gets lost and then there is an application-level
reply to the message that triggered the standalone ack, we want to
piggyback an ack on that application-level reply as well. Otherwise
the other side can end up in a state where it has two outstanding
unacknowledged messages (the message we are replying to, and the reply
to our reply), which it can't handle.

The other change here is to fix ReliableMessageMgr to not crash if the
peer misbehaves and skips sending that piggyback ack described above.
Instead, we just error out from sending the response to the peer's
broken message.

Fixes #9796

Problem

See above.

Change overview

See above.

Testing

New unit test passes.

If a standalone ack gets lost and then there is an application-level
reply to the message that triggered the standalone ack, we want to
piggyback an ack on that application-level reply as well.  Otherwise
the other side can end up in a state where it has two outstanding
unacknowledged messages (the message we are replying to, and the reply
to our reply), which it can't handle.

The other change here is to fix ReliableMessageMgr to not crash if the
peer misbehaves and skips sending that piggyback ack described above.
Instead, we just error out from sending the response to the peer's
broken message.

Fixes project-chip#9796
@todo
Copy link

todo bot commented Sep 22, 2021

A test that we should have but can't write with the existing

* TODO: A test that we should have but can't write with the existing
* infrastructure we have:
*
* 1. A sends message 1 to B
* 2. B is slow to respond, A does a resend and the resend is delayed in the network.
* 3. B responds with message 2, which acks message 1.
* 4. A sends message 3 to B
* 5. B sends standalone ack to message 3, which is lost
* 6. The duplicate message from step 3 is delivered and triggers a standalone ack.
* 7. B responds with message 4, which should carry a piggyback ack for message 3
* (this is the part that needs testing!)


This comment was generated by todo based on a TODO comment in b31a9c9 in #9895. cc @bzbarsky-apple.

@github-actions
Copy link

Size increase report for "gn_qpg-example-build" from a62ee31

File Section File VM
chip-qpg6100-lighting-example.out .text 16 16
Full report output
BLOAT REPORT

Files found only in the build output:
    report.csv

Comparing ./master_artifact/chip-qpg6100-lighting-example.out.map and ./pull_artifact/chip-qpg6100-lighting-example.out.map:

BLOAT EXECUTION FAILED WITH CODE 1:
bloaty: unknown file type for file './pull_artifact/chip-qpg6100-lighting-example.out.map'

Comparing ./master_artifact/chip-qpg6100-lighting-example.out and ./pull_artifact/chip-qpg6100-lighting-example.out:

sections,vmsize,filesize
.debug_info,0,1837
.debug_loc,0,597
.debug_ranges,0,64
.debug_abbrev,0,55
.symtab,0,32
.debug_frame,0,16
.text,16,16
.debug_aranges,0,8
.shstrtab,0,-1
.debug_line,0,-16
[Unmapped],0,-16
.strtab,0,-87
.debug_str,0,-293


@github-actions
Copy link

Size increase report for "nrfconnect-example-build" from a62ee31

File Section File VM
chip-shell.elf text 72 72
chip-shell.elf rodata 56 56
chip-shell.elf device_handles -8 -8
chip-lock.elf text 72 72
chip-lock.elf rodata 56 56
chip-lock.elf device_handles -8 -8
Full report output
BLOAT REPORT

Files found only in the build output:
    report.csv

Comparing ./master_artifact/chip-shell.elf and ./pull_artifact/chip-shell.elf:

sections,vmsize,filesize
.debug_info,0,992
.debug_loc,0,712
.debug_line,0,361
.debug_ranges,0,96
text,72,72
rodata,56,56
.debug_abbrev,0,55
.symtab,0,32
.debug_frame,0,16
.debug_aranges,0,8
.shstrtab,0,1
device_handles,-8,-8
.strtab,0,-77
.debug_str,0,-248

Comparing ./master_artifact/chip-lock.elf and ./pull_artifact/chip-lock.elf:

sections,vmsize,filesize
.debug_info,0,2058
.debug_loc,0,750
.debug_line,0,123
.debug_ranges,0,96
text,72,72
rodata,56,56
.debug_abbrev,0,55
.symtab,0,32
.debug_frame,0,16
.debug_aranges,0,8
.shstrtab,0,-1
device_handles,-8,-8
.strtab,0,-87
.debug_str,0,-258


@github-actions
Copy link

Size increase report for "esp32-example-build" from a62ee31

File Section File VM
chip-all-clusters-app.elf .flash.text 84 84
chip-all-clusters-app.elf .flash.rodata 56 56
Full report output
BLOAT REPORT

Files found only in the build output:
    report.csv

Comparing ./master_artifact/chip-all-clusters-app.elf and ./pull_artifact/chip-all-clusters-app.elf:

sections,vmsize,filesize
.debug_info,0,2691
.debug_loc,0,121
.flash.text,84,84
.flash.rodata,56,56
.symtab,0,32
.debug_frame,0,24
.debug_aranges,0,8
.debug_abbrev,0,4
.shstrtab,0,1
.riscv.attributes,0,-1
.strtab,0,-9
.debug_line,0,-107
[Unmapped],0,-140
.debug_ranges,0,-160
.debug_str,0,-264


@kghost
Copy link
Contributor

kghost commented Sep 23, 2021

I feel that IsAckPending is indicating the active state of an exchange. It means that the exchange is busy generating an response, and it is unable to process incoming requests now.

HasPiggybackAckPending means that a ack is set, it actually means that the exchange have received a packet from peer. After this PR, all packets except the very beginning one will contains a valid ack.

So I just feel that these function names are a little bit confused.

@bzbarsky-apple
Copy link
Contributor Author

@kghost Please suggest what you'd prefer as names?

@bzbarsky-apple bzbarsky-apple merged commit fa43d00 into project-chip:master Sep 28, 2021
@bzbarsky-apple bzbarsky-apple deleted the fix-lost-ack-crash branch September 28, 2021 15:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reliable messaging crashes if a standalone ack is lost
6 participants