Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

current MRP IDLE retry interval default value (5000ms) does not meet the latest SPEC definition (300ms) #22167

Closed
AlanLCollins opened this issue Aug 25, 2022 · 8 comments
Assignees
Labels
p1 priority 1 work spec Mismatch between spec and implementation V1.0

Comments

@AlanLCollins
Copy link

AlanLCollins commented Aug 25, 2022

Problem

Accessory devices are sharing MRP configuration during mDNS query and PASE/CASE creation. Idle retry interval = 5000ms
image
Thread network heavily relies on Application retries for congestion environments, this creates a bad customer experience because any application retry would take >5s.

The default setting needs to be 300ms so all Thread REED, Leader, Routers, FTD CHILDs will use it.
Any change for sleepy devices should be extra modification product-specific.

Proposed Solution

Change the default to match spec default = 300ms.

src/messaging/ReliableMessageProtocolConfig.h

#ifndef CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL
#define CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL (300_ms32)
#endif // CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL

@bzbarsky-apple bzbarsky-apple added spec Mismatch between spec and implementation V1.0 labels Aug 25, 2022
@bzbarsky-apple
Copy link
Contributor

@turon

turon added a commit to turon/connectedhomeip that referenced this issue Aug 28, 2022
@turon turon self-assigned this Aug 28, 2022
turon added a commit to turon/connectedhomeip that referenced this issue Aug 28, 2022
@turon
Copy link
Contributor

turon commented Aug 31, 2022

CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL is actually setting the value a node will advertise for itself. This, by design, can be a value appropriate for the node. Interestingly, this value varies widely by platform currently:

$ git grep -e define.*MRP.*INTERVAL
examples/chef/efr32/include/CHIPProjectConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
examples/light-switch-app/efr32/include/CHIPProjectConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
examples/lighting-app/efr32/include/CHIPProjectConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
examples/lock-app/efr32/include/CHIPProjectConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
examples/ota-provider-app/linux/include/CHIPProjectAppConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
examples/thermostat/efr32/include/CHIPProjectConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
examples/window-app/efr32/include/CHIPProjectConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)
src/messaging/ReliableMessageProtocolConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (300_ms32)
src/messaging/ReliableMessageProtocolConfig.h:#define CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL (5000_ms32)
src/platform/Darwin/CHIPPlatformConfig.h:#define CHIP_CONFIG_MRP_DEFAULT_INITIAL_RETRY_INTERVAL (15000)
src/platform/Darwin/CHIPPlatformConfig.h:#define CHIP_CONFIG_MRP_LOCAL_ACTIVE_RETRY_INTERVAL (2000_ms32)

The default value a sender will use to communicate with a node when the parameters are omitted from dns-sd / case / pase discovery are set as follows:

ReliableMessageProtocolConfig GetDefaultMRPConfig()
{
    // Default MRP intervals are defined in spec <2.11.3. Parameters and Constants>
    static constexpr const System::Clock::Milliseconds32 idleRetransTimeout   = 4000_ms32;
    static constexpr const System::Clock::Milliseconds32 activeRetransTimeout = 300_ms32;
    return ReliableMessageProtocolConfig(idleRetransTimeout, activeRetransTimeout);
}

Because a sender may choose a default timeout that exceeds the specification, we plan to NOT FIX this for v1.0.
The current timeout is accounting for delays to the initial message beyond MRP:

  • Platforms that send the standalone ack later than the mandated 200ms (due to imperfect concurrency implementations or slow SoC)
  • Cases where DNS-SD address discovery is required before the initial message can be sent.

turon added a commit to turon/connectedhomeip that referenced this issue Aug 31, 2022
turon added a commit to turon/connectedhomeip that referenced this issue Aug 31, 2022
turon added a commit to turon/connectedhomeip that referenced this issue Sep 1, 2022
turon added a commit to turon/connectedhomeip that referenced this issue Sep 5, 2022
turon added a commit to turon/connectedhomeip that referenced this issue Sep 6, 2022
turon added a commit to turon/connectedhomeip that referenced this issue Sep 9, 2022
@AlanLCollins
Copy link
Author

AlanLCollins commented Oct 4, 2022

@turon , I agree, the CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL is the value that the device advertises for itself int he mDNS service as SII. This is the value that other devices will use prior to trigger a MRP retry. So consider the the following scenario:

  • matter controller is paired to a light-bulb. Both have a valid CASE session. light-bulb is advertising SII=5000
  • light-bulb gets power-cycled by the user (an easy mistake), the CASE session is erased in the light-bulb.
  • matter controller which still has a valid CASE session, attempts to toggle the light-bulb. The 3 retries will fail ~15 to 20 seconds. Now the controller knows the CASE session is broken, and attempt to fix it.
  • over all latency will be >20s. which is unacceptable for user experience.

We need MRP SII = 300 as default for matter 1.0 targeting all mains-powered (non-sleepy) devices.

CC: @chrisdecenzo

@AlanLCollins
Copy link
Author

what would be the problem to use different default values for PASE, CASE and mDNS-SD ?

  • if PASE happens off-network (over BLE), the accessory device is in commissioning mode, which is a non-sleepy mode.
  • mDNS-SD is not an issue because Thread devices advertise using the BR as proxy.
  • why is CASE session using different setting than SII ? at the time CASE session is required, the device is in-network and we discovered the IP using mDNS-sd, which includes SII.

@AlanLCollins
Copy link
Author

non-sleepy devices (e.g. light-bulb, smart-plugs,..) should advertise SII = 300 as described in the spec.

#if CHIP_DEVICE_CONFIG_ENABLE_SED
#define CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL <polling period>
#else
#define CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL 300
#endif

@turon , do you support this change?

@turon
Copy link
Contributor

turon commented Oct 11, 2022

@AlanLCollins the CHIP_CONFIG_MRP_LOCAL_IDLE_RETRY_INTERVAL is a product-specific decision. I agree that SEDs should use their product-specific polling interval, and light bulbs should use a low latency value such as 300. As such, the default in the SDK shouldn't matter much because each product will tune that value to one that makes sense for them.

@AlanLCollins
Copy link
Author

@turon , sdk default matters because developers that are not using SED are not paying attention to changing the value. The SDK should either use low latency default value or force developers to change it.

turon added a commit to turon/connectedhomeip that referenced this issue Oct 12, 2022
turon added a commit to turon/connectedhomeip that referenced this issue Oct 13, 2022
turon added a commit to turon/connectedhomeip that referenced this issue Oct 14, 2022
turon added a commit to turon/connectedhomeip that referenced this issue Oct 17, 2022
@franck-apple franck-apple added the p1 priority 1 work label Oct 24, 2022
@bzbarsky-apple
Copy link
Contributor

This was fixed in #23344

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p1 priority 1 work spec Mismatch between spec and implementation V1.0
Projects
None yet
4 participants