Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operational advertisement on startup is broken on Linux for on-network pairing (and various other cases) #7911

Closed
bzbarsky-apple opened this issue Jun 25, 2021 · 1 comment

Comments

@bzbarsky-apple
Copy link
Contributor

Problem

Mdns::StartServer only does AdvertiseOperational() if IsFullyProvisioned() tests true.

IsFullyProvisioned only tests true if every networking technology we are compiled with (as hardcoded by config ifdefs) is provisioned, like so:

    return
#if CHIP_DEVICE_CONFIG_ENABLE_WIFI_STATION
        ConnectivityMgr().IsWiFiStationProvisioned() &&
#endif
#if CHIP_DEVICE_CONFIG_ENABLE_THREAD
        ConnectivityMgr().IsThreadProvisioned() &&
#endif
[etc]

CHIP_DEVICE_CONFIG_ENABLE_WIFI_STATION is set to 1 for Linux. That means that if a Linux device is on the network already but not using wifi (e.g. using ethernet, or in the case of our tests just running on localhost), calling StartServer() will not start operational advertisement.

Similarly, if a device supported both Wi-Fi and Thread (will such things exist?) it would need both to be provisioned to start operational advertisement.

Proposed Solution

Figure out what should actually control whether we start operational advertisement. Fundamentally, it seems to me that as long as we have at least one opcert and any sort of network connectivity we should be advertising operational. Looking at the spec's commissioning flows, all the steps between "add an opcert" and "connect on the operational network" are optional for the commissioner to perform, so the commissionee really needs to try to do operational advertisement as soon as it has an opcert...

@tcarmelveilleux @cecille @andy31415 @lochan-apple

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jun 25, 2021
There are three changes here:

1) When doing onnetwork commissioning, make sure to run the part of
the commissioning state machine that is meant to run on the
operational network, starting with the mdns resolution of the device
operational address.  This ensures that we put ourselves into a state
where we know the operational cert is fully provisioned on the device
and we can talk to it via CASE.

2) Fix the fabric id in the script's pairing command to match the
fabric id all-clusters-app advertises, so our operational discovery
looks for the right thing.

3) Ensure that we actually start operational advertisement when we get
a new opcert.  Just restarting our mdns server does not do that, due
to project-chip#7911
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jun 25, 2021
There are four changes here:

1) When doing onnetwork commissioning, make sure to run the part of
the commissioning state machine that is meant to run on the
operational network, starting with the mdns resolution of the device
operational address.  This ensures that we put ourselves into a state
where we know the operational cert is fully provisioned on the device
and we can talk to it via CASE.

2) Fix the fabric id in the script's pairing command to match the
fabric id all-clusters-app advertises, so our operational discovery
looks for the right thing.

3) Ensure that we actually start operational advertisement when we get
a new opcert.  Just restarting our mdns server does not do that, due
to project-chip#7911

4) Don't fail out on UpdateAddress attempts when there is no
connection; that's not an error condition and should not be
communicated as such to consumers.
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Jun 25, 2021
There are four changes here:

1) When doing onnetwork commissioning, make sure to run the part of
the commissioning state machine that is meant to run on the
operational network, starting with the mdns resolution of the device
operational address.  This ensures that we put ourselves into a state
where we know the operational cert is fully provisioned on the device
and we can talk to it via CASE.

2) Fix the fabric id in the script's pairing command to match the
fabric id all-clusters-app advertises, so our operational discovery
looks for the right thing.

3) Ensure that we actually start operational advertisement when we get
a new opcert.  Just restarting our mdns server does not do that, due
to project-chip#7911

4) Don't fail out on UpdateAddress attempts when there is no
connection; that's not an error condition and should not be
communicated as such to consumers.
bzbarsky-apple added a commit that referenced this issue Jun 26, 2021
There are four changes here:

1) When doing onnetwork commissioning, make sure to run the part of
the commissioning state machine that is meant to run on the
operational network, starting with the mdns resolution of the device
operational address.  This ensures that we put ourselves into a state
where we know the operational cert is fully provisioned on the device
and we can talk to it via CASE.

2) Fix the fabric id in the script's pairing command to match the
fabric id all-clusters-app advertises, so our operational discovery
looks for the right thing.

3) Ensure that we actually start operational advertisement when we get
a new opcert.  Just restarting our mdns server does not do that, due
to #7911

4) Don't fail out on UpdateAddress attempts when there is no
connection; that's not an error condition and should not be
communicated as such to consumers.
nikita-s-wrk pushed a commit to nikita-s-wrk/connectedhomeip that referenced this issue Sep 23, 2021
There are four changes here:

1) When doing onnetwork commissioning, make sure to run the part of
the commissioning state machine that is meant to run on the
operational network, starting with the mdns resolution of the device
operational address.  This ensures that we put ourselves into a state
where we know the operational cert is fully provisioned on the device
and we can talk to it via CASE.

2) Fix the fabric id in the script's pairing command to match the
fabric id all-clusters-app advertises, so our operational discovery
looks for the right thing.

3) Ensure that we actually start operational advertisement when we get
a new opcert.  Just restarting our mdns server does not do that, due
to project-chip#7911

4) Don't fail out on UpdateAddress attempts when there is no
connection; that's not an error condition and should not be
communicated as such to consumers.
@bzbarsky-apple
Copy link
Contributor Author

Fixed at some point. We now do this right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant