[feat] Identify dead devices #4207

Open
The-Deep-Sea opened this issue Feb 3, 2022 · 15 comments
Labels
enhancement New feature or request

Comments

@The-Deep-Sea

Is your feature request related to a problem? Please describe.
Hi. I'm new to Z-Wave, but have more experience with Zigbee.
Above all, I miss the fact that devices that do not report regularly are marked as dead.

Describe the solution you'd like
Zigbee2mqtt solves this e.g. like this:

  • Sleeping nodes are marked as dead after a period of inactivity (default: 25h).
  • Alive nodes are pinged after a period of inactivity (default: 10min). If they don't react to this, they are also marked as dead.

See https://www.zigbee2mqtt.io/guide/configuration/device-availability.html for more details.
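
For illustration, the two rules above could be sketched roughly like this in TypeScript. The names (`DeviceKind`, `nextAction`, `afterPing`) are hypothetical and not part of Zigbee2mqtt or Z-Wave JS; only the 10 min and 25 h defaults come from the description above.

```typescript
// Hypothetical sketch, not an existing API.
type DeviceKind = "active" | "passive";
type Action = "ok" | "ping" | "mark-dead";

const ACTIVE_TIMEOUT_MS = 10 * 60 * 1000; // 10 min, default for alive (mains) nodes
const PASSIVE_TIMEOUT_MS = 25 * 60 * 60 * 1000; // 25 h, default for sleeping nodes

// Decide what to do with a device based on how long it has been silent.
function nextAction(kind: DeviceKind, lastSeenMs: number, nowMs: number): Action {
  const idleMs = nowMs - lastSeenMs;
  if (kind === "active") {
    // Alive nodes are not marked dead right away; they get pinged first.
    return idleMs > ACTIVE_TIMEOUT_MS ? "ping" : "ok";
  }
  // Sleeping nodes cannot be pinged, so silence alone decides.
  return idleMs > PASSIVE_TIMEOUT_MS ? "mark-dead" : "ok";
}

// Outcome of the ping that was sent to an alive node.
function afterPing(responded: boolean): Action {
  return responded ? "ok" : "mark-dead";
}
```

A mains powered node that has been silent for 11 minutes would yield `"ping"`, and only a failed ping would turn that into `"mark-dead"`.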

Describe alternatives you've considered
Is this the right approach? Or is there already a solution to the problem that I'm not aware of?
How do you identify a dead passive device such as a smoke detector?

Additional context
Wouldn't it be correct to set the entities of dead nodes to "unavailable" when working with Home Assistant? Except possibly the node status.

@ghost ghost assigned robertsLando Feb 3, 2022
@jmgiaever

Hi. Which controller do you have?

Can you also set the Z-Wave JS log level to debug and log to a file, so we can see what happens when nodes die?

@The-Deep-Sea
Author

I use a Z-Wave.Me UZB.

Yes, I can do that. I just removed the battery from a smoke sensor.
But I suspect that nothing will happen. At least the status is not changed, I had already tested that. And that is also the reason for this issue.

@The-Deep-Sea
Author

It has been 48 hours since I removed the battery, and there is not a single entry for the node in the log. The status is still asleep.
As I said: How do you identify a dead passive device such as a smoke detector?
I can't believe I'm the only one facing this question. Nor can I believe that everyone else manually checks the last activity of their nodes every day.

@robertsLando
Member

@AlCalzone ?

@jmgiaever

Sorry, I read this post too quickly. There have been so many posts about 700-series sticks stalling and nodes ending up dead.

I can somewhat see your worry in situations where one fire alarm wakes up another (e.g. if it supports FLiRS) when it triggers, but would you rely on that?

I would rather trust association groups, or built-in communication protocols if the devices have any.

I'm not sure how you would expect a node to wake up if it doesn't support the Wake Up CC. It has to wake up to receive a ping and respond to it, either automatically or manually.

Maybe the controller can presume the device is dead if it hasn't reported alive within the wake-up interval x 2? (The time in which it should wake up by itself, plus time to respond to a ping if the first one failed.)

I don't know.

@The-Deep-Sea
Author

My question wasn't aimed at waking up devices.
I only used the smoke detector as an example. Especially because it is safety critical.
Because if it is really defective, then it can no longer send any information via association groups. And probably no longer trigger the "classic" acoustic alarm.
And there are rooms in my house that I'm rarely in, so I wouldn't even notice if the light, which most smoke detectors have, stopped flashing.

Also, for example, a temperature sensor that no longer works because the battery is empty can be annoying.
I've had the case (with Zigbee) where a radiator heated a room all day because of this, even though the room was already extremely warm. The sensor was dead, and this was not recognized at the time, as described at the beginning.

That's right, with Z-Wave you have the wakeup interval. As far as I know, there is no such thing with Zigbee. In this respect, you don't have to assume a fixed 25h, but the wakeup interval + x%. At least for passive/battery powered devices.
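
As a rough sketch of that arithmetic: the deadline is the last wake-up plus the wake-up interval plus a margin. The function names and the 50% default margin are assumptions for illustration only. The wake-up interval is taken in seconds, as the Wake Up CC reports it. A margin of 100% gives the "wake-up interval x 2" variant suggested earlier in this thread.

```typescript
// Hypothetical helpers; nothing here is an existing Z-Wave JS API.

// Latest point in time (ms) at which a sleeping node should have woken up again.
function wakeUpDeadlineMs(
  lastWakeUpMs: number,
  wakeUpIntervalSec: number,
  marginPercent: number = 50, // assumed default, the thread only says "x%"
): number {
  const allowedMs = wakeUpIntervalSec * 1000 * (1 + marginPercent / 100);
  return lastWakeUpMs + allowedMs;
}

// True once the node has missed its deadline.
function isOverdue(
  lastWakeUpMs: number,
  wakeUpIntervalSec: number,
  nowMs: number,
  marginPercent: number = 50,
): boolean {
  return nowMs > wakeUpDeadlineMs(lastWakeUpMs, wakeUpIntervalSec, marginPercent);
}
```

With a 6 h wake-up interval and a 50% margin, a node would count as overdue after 9 h of silence.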

(I read about the problems with the 700 series. Luckily before I bought one.)

@AlCalzone AlCalzone transferred this issue from zwave-js/zwave-js-ui Feb 8, 2022
@AlCalzone AlCalzone added the enhancement New feature or request label Feb 8, 2022
@AlCalzone
Member

The use case behind this issue makes sense, but I haven't decided yet what the best way to do this is.

@jmgiaever

> My question wasn't aimed at waking up devices.

No, but to be able to ping a device, it needs to support being woken up. That's why I thought it could maybe be marked as «presumed dead» when

  1. the device hasn't reported alive within the wake-up interval since zwjs was started, and
  2. it hasn't responded, within the wake-up interval, to the ping that was sent because of 1

@raman325
Contributor

raman325 commented Feb 8, 2022

> Additional context
> Wouldn't it be correct to set the entities of dead nodes to "unavailable" when working with Home Assistant? Except possibly the node status.

Right now, all entities except the node status sensor are marked as unavailable when the driver marks a device as dead. We were actually just having a separate discussion about whether this is the right behavior. There are cases where the driver thinks the device is dead, but the next action against the device would make it respond and bring the node back to life. That creates a catch-22: the entity is unavailable in HA, so you can't perform normal actions against the device to revive it (you can ping it, but there's no UI component for that, so recovery isn't as easy).

In your example, the false alarm above would be after the fire alarm was marked dead but you put the battery back in - something needs to happen to tell the driver that the node is alive, otherwise it's still dead even though it's functioning again.

A polling mechanism reduces the false positives in either direction (we think it's dead but it's alive or we think it's alive but it's dead) but comes at a cost of battery life and extra processing cycles. I don't know what the optimal solution is here either but these two discussions are somewhat related and I will be keeping an eye on this accordingly.

@The-Deep-Sea
Author

> No, but to be able to ping a device it needs to support to be woken up.

Okay, that's correct. However, in the case of Zigbee2mqtt, sleeping devices are not pinged either.
Only devices that are, or should be, alive are pinged (usually devices that are connected to the mains).


> Right now all entities but the node status sensor get marked as unavailable when a device is marked dead by the driver.

I tested it: you are right. I don't know why I remembered it differently.

> something needs to happen to tell the driver that the node is alive, otherwise it's still dead even though it's functioning again.

At least the devices I tested report back automatically as soon as they have power.
Otherwise, passive (battery powered) devices should do it at the latest with the next wakeup interval and active (mains powered) devices are actively pinged if you implement it as with Zigbee2mqtt.

> A polling mechanism reduces the false positives in either direction [...] but comes at a cost of battery life and extra processing cycles.

If it only pings mains powered devices, it will not affect a battery.
I can't speak for FLiRS devices, though; I don't know enough about them (yet). But if it is configurable, every user can decide for themselves whether they prefer a longer battery life or more "security", possibly also per device.
Basically, this (a dead device) should be the exception anyway.

@AlCalzone
Member

AlCalzone commented Feb 9, 2022

> At least the devices I tested report back automatically as soon as they have power.

I have at least one that doesn't. Only when you physically interact with it, or try to control it via Z-Wave.

To reiterate, we need to solve these points:

  • detect when a battery powered (sleeping) device hasn't woken up for too long
  • detect when a mains powered device no longer reacts to pings
  • detect when a FLiRS device no longer reacts, but at a reduced frequency compared to mains powered or you'll drain the battery
  • detect when all of the above are reachable again

And at the same time we need to

  • avoid false-positives for fragile meshes
  • avoid introducing unnecessary delays by pinging unreachable devices:
    a) because the timeout can take several seconds
    b) because the most used stick (sometimes?) has a firmware bug where attempting communication with a dead device can delay the serial API response that should come immediately and throw the driver and stick out of sync
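
One possible shape for these requirements, as a minimal sketch only: ping at a different rate per node type, and only presume a node dead after several consecutive failures so that a fragile mesh doesn't trigger false positives. Every name and number below (`NodeKind`, `PING_INTERVAL_MS`, `HealthTracker`, the intervals, the failure limit) is an assumption made for illustration and not an existing Z-Wave JS API or a decided design.

```typescript
// Hypothetical sketch; not part of Z-Wave JS.
type NodeKind = "mains" | "flirs" | "sleeping";
type Health = "alive" | "suspect" | "presumed-dead";

// Assumed ping intervals. FLiRS nodes are pinged far less often to spare the battery.
// Sleeping nodes are never pinged; for them a missed wake-up is the signal instead.
const PING_INTERVAL_MS: Record<NodeKind, number | null> = {
  mains: 10 * 60 * 1000,
  flirs: 6 * 60 * 60 * 1000,
  sleeping: null,
};

class HealthTracker {
  private failures = 0;
  private readonly maxFailures: number;

  constructor(maxFailures: number = 3) {
    this.maxFailures = maxFailures;
  }

  // Record one ping result and return the resulting health state.
  recordPing(success: boolean): Health {
    if (success) {
      // Any successful response makes the node reachable again.
      this.failures = 0;
      return "alive";
    }
    this.failures += 1;
    return this.failures >= this.maxFailures ? "presumed-dead" : "suspect";
  }
}
```

This does not address the delay and firmware concerns in a) and b); a real implementation would also have to limit how often unreachable nodes are contacted.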

@bagobones

bagobones commented Oct 3, 2023

Some thoughts:

  1. The Z-Wave JS web UI exposes a last active timestamp, but this does not appear to be exposed in Home Assistant via the native HA API (I haven't tried MQTT). Even just having this would let me create an alert for devices that have not checked in for more than 12 hours (way longer than any of my battery devices should go without a check-in).
  2. As already mentioned, the zigbee2mqtt availability settings have two time-based settings: a short one, defaulting to 10 min, for active devices (routers/mains powered), and another for passive devices (mostly battery powered), which is 1500 min, i.e. 25 hours. Short of handheld remotes, lightbulbs and maybe an alarm/siren (special class), I would suspect MOST devices check in at least once in a while. You might even be able to get an idea by looking at larger existing installs and using those last active timestamps to generate some useful info.
  3. Do dead devices not just turn alive again on their next check-in? Is the fear primarily that marking something dead will prevent it from being available for use?

As there is a fear of false positives, why not implement a broad active/passive approach, but let users opt into it as a beta feature and provide feedback? It may be a good idea to allow excluding device classes and individual devices if it becomes a problem.

@gruntpartystyle

gruntpartystyle commented Nov 7, 2024

Just to make sure I understand this... if a battery powered device fails to check in with the controller, it will stay "asleep" in perpetuity? Isn't that in direct contrast with Z-Wave's principle of guaranteeing reliability?
I trust that my battery operated devices report in every 6h (or whatever I set), so I expect my controller to throw a tantrum if a device doesn't check in after a while... I'd rather have a warning about a suspected dead node that failed to check in after x missed intervals than a sensor that has been dead for weeks with no indication at all.
AFAIK, battery powered devices have a wake-up parameter/interval, and the controller is aware of it because it can read this parameter. So why not at least flag the device as AWOL after it has missed its expected check-in?


  1. Make it a global parameter that x missed check-ins means trouble, ...
  2. ...have device specific setting to override the global one for certain devices / deactivate the check.
  3. If a device doesn't have a wake-up interval, use a system default 24h or whatever
  4. ...which can also be customized per device if required.
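
A rough sketch of how these four points could fit together, with a global default that individual devices can override or switch off. All names (`CheckInSettings`, `resolveSettings`, `isAwol`) are hypothetical, and the value of 3 missed check-ins is an assumed placeholder for "x"; only the 24 h fallback comes from the list above.

```typescript
// Hypothetical sketch; not an existing Z-Wave JS or Home Assistant setting.
interface CheckInSettings {
  enabled: boolean; // per-device switch to deactivate the check
  maxMissedCheckIns: number; // "x missed check-ins means trouble"
  fallbackIntervalSec: number; // used when the device has no wake-up interval
}

const GLOBAL_DEFAULTS: CheckInSettings = {
  enabled: true,
  maxMissedCheckIns: 3, // assumed value
  fallbackIntervalSec: 24 * 60 * 60, // 24 h system default
};

// Per-device overrides win over the global defaults.
function resolveSettings(overrides: Partial<CheckInSettings> = {}): CheckInSettings {
  return { ...GLOBAL_DEFAULTS, ...overrides };
}

// True when the device has been silent for more than the allowed number of intervals.
function isAwol(
  silentForSec: number,
  wakeUpIntervalSec: number | undefined,
  overrides: Partial<CheckInSettings> = {},
): boolean {
  const settings = resolveSettings(overrides);
  if (!settings.enabled) return false;
  const intervalSec = wakeUpIntervalSec ?? settings.fallbackIntervalSec;
  return silentForSec > intervalSec * settings.maxMissedCheckIns;
}
```

With a 6 h wake-up interval and the assumed default of 3, a device would be flagged after more than 18 h of silence, unless its override disables the check.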

@AlCalzone
Member

> So why not at least flag the device as AWOL after it has missed its expected check-in?

Because this issue is still open and a solution has not been implemented yet.

@VVVVVVVip

I think working reports/notifications for device battery levels would go a long way towards solving at least some of this.

None of my battery devices report any change in battery level, and I find this quite annoying, especially since the battery sensor entity is discovered.
I moved from a Vera Edge to HA with an Aeotec Z‐Stick Gen5 USB Controller ZW090 and the JS add-on, and it took me a while to realize that there are no reports from the devices.
For devices with temp sensors, I now monitor whether the temp hasn't changed for 24h and send a notification to check the device battery.
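
The logic of that workaround, sketched in TypeScript under the assumption that each temperature report is fed into a small tracker. `UnchangedValueWatchdog` is a hypothetical name, and this is not how the monitoring is actually configured in Home Assistant.

```typescript
// Hypothetical sketch of the "value hasn't changed for 24h" check.
class UnchangedValueWatchdog {
  private lastValue: number | undefined = undefined;
  private lastChangeMs = 0;
  private readonly maxUnchangedMs: number;

  constructor(maxUnchangedMs: number = 24 * 60 * 60 * 1000) {
    this.maxUnchangedMs = maxUnchangedMs;
  }

  // Feed in every reported value; only a changed value resets the timer.
  record(value: number, nowMs: number): void {
    if (value !== this.lastValue) {
      this.lastValue = value;
      this.lastChangeMs = nowMs;
    }
  }

  // True when a notification to check the battery would be due.
  isStale(nowMs: number): boolean {
    if (this.lastValue === undefined) return false; // nothing reported yet
    return nowMs - this.lastChangeMs >= this.maxUnchangedMs;
  }
}
```

Calling `isStale` on a timer rather than on each report matters here, because a dead device sends no reports at all.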

8 participants