[feat] Identify dead devices #4207

The-Deep-Sea · 2022-02-03T21:21:06Z

Is your feature request related to a problem? Please describe.
Hi. I'm new to Z-Wave, but have more experience with Zigbee.
What I miss most is having devices that stop reporting regularly marked as dead.

Describe the solution you'd like
Zigbee2mqtt solves this e.g. like this:

Sleeping nodes are marked as dead after a period of inactivity (default: 25h).
Alive nodes are pinged after a period of inactivity (default: 10min). If they don't react to this, they are also marked as dead.

See https://www.zigbee2mqtt.io/guide/configuration/device-availability.html for more details.
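
The two rules above can be sketched roughly as follows. This is only an illustration in Python: `Device`, `check_availability` and the `ping` callback are hypothetical names, not Zigbee2mqtt's or Z-Wave JS's actual code. Only the 25h and 10min defaults are taken from the description above.

```python
from dataclasses import dataclass
from typing import Callable

PASSIVE_TIMEOUT_S = 25 * 60 * 60  # default for sleeping devices: 25 h
ACTIVE_TIMEOUT_S = 10 * 60        # default for alive devices: 10 min


@dataclass
class Device:
    name: str
    sleeping: bool    # True for battery-powered devices that sleep
    last_seen: float  # Unix timestamp of the last received message


def check_availability(dev: Device, now: float,
                       ping: Callable[[Device], bool]) -> str:
    """Return "alive" or "dead" for one device (hypothetical helper)."""
    idle = now - dev.last_seen
    if dev.sleeping:
        # Sleeping devices are never pinged; only their silence is measured.
        return "dead" if idle > PASSIVE_TIMEOUT_S else "alive"
    if idle <= ACTIVE_TIMEOUT_S:
        return "alive"
    # Alive devices are pinged once the inactivity period has passed.
    return "alive" if ping(dev) else "dead"
```

For example, a sleeping device that was last seen 26 hours ago would be reported as "dead" without any ping being sent.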

Describe alternatives you've considered
Is this the right approach? Or is there already a solution to the problem that I'm not aware of?
How do you identify a dead passive device such as a smoke detector?

Additional context
Wouldn't it be correct to set the entities of dead nodes to "unavailable" when working with Home Assistant? Except possibly the node status.

jmgiaever · 2022-02-03T22:10:15Z

Hi. Which controller do you have?

Can you also set the zwavejs log level to debug and save the log to a file, so we can see what's happening when things are dying?

The-Deep-Sea · 2022-02-03T22:44:55Z

I use a Z-Wave.Me UZB.

Yes, I can do that. I just removed the battery from a smoke sensor.
But I suspect that nothing will happen. At least the status is not changed, I had already tested that. And that is also the reason for this issue.

The-Deep-Sea · 2022-02-05T22:47:33Z

48 hours after removing the battery and there is not a single entry for the node in the log. And the status is still asleep.
As I said: How do you identify a dead passive device such as a smoke detector?
I can't believe I'm the only one facing this question, or that everyone else manually checks the last activity of their nodes every day.

robertsLando · 2022-02-07T08:29:52Z

@AlCalzone ?

jmgiaever · 2022-02-08T00:28:09Z

Sorry, I read this post too quickly. There have been so many posts about 700-series sticks stalling and nodes ending up dead.

I can somewhat see your worries in situations where you wake up another fire alarm (e.g. if it supports FLiRS) when one triggers, but would you rely on it?

I would rather trust association groups, or built-in communication protocols if the devices have them.

Not sure how you would expect a node to wake up if it doesn't support the Wake Up CC? It has to wake up to receive it and respond to it, either automatically or manually.

Maybe the controller can presume the device is dead if it hasn't reported alive within the wakeup interval x 2? (The time it should wake up by itself + respond to a ping if the first one failed.)

I don't know.

The-Deep-Sea · 2022-02-08T18:50:21Z

My question wasn't aimed at waking up devices.
I only used the smoke detector as an example. Especially because it is safety critical.
Because if it is really defective, then it can no longer send any information via association groups. And probably no longer trigger the "classic" acoustic alarm.
And there are rooms in my house that I'm rarely in, so I wouldn't even notice if the light, which most smoke detectors have, stopped flashing.

Also, for example, a temperature sensor that no longer works because the battery is empty can be annoying.
I've had the case (with Zigbee) that a radiator was heating up a room all day because of this, even though the room was already extremely warm. But the sensor was dead and this was not recognized at the time as described at the beginning.

That's right, with Z-Wave you have the wakeup interval. As far as I know, there is no such thing with Zigbee. So you don't have to assume a fixed 25h, but can use the wakeup interval + x% instead. At least for passive/battery powered devices.
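
As a rough sketch of the "wakeup interval + x%" idea (Python, with hypothetical function names; the 20% default is only a placeholder, since no value for x is given here):

```python
def wakeup_deadline(last_wakeup: float, wakeup_interval_s: float,
                    tolerance_pct: float = 20.0) -> float:
    """Latest time (Unix timestamp) by which the next wake-up is expected."""
    return last_wakeup + wakeup_interval_s * (1 + tolerance_pct / 100)


def is_overdue(now: float, last_wakeup: float, wakeup_interval_s: float,
               tolerance_pct: float = 20.0) -> bool:
    """True if the device has been silent past its deadline."""
    return now > wakeup_deadline(last_wakeup, wakeup_interval_s, tolerance_pct)
```

With `tolerance_pct=100` this becomes the "wakeup interval x 2" rule suggested earlier in the thread.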

(I read about the problems with the 700 series. Luckily before I bought one.)

AlCalzone · 2022-02-08T20:21:40Z

The use case behind this issue makes sense, but I haven't decided yet what the best way to do this is.

jmgiaever · 2022-02-08T20:27:40Z

My question wasn't aimed at waking up devices.

No, but to be able to ping a device it needs to support to be woken up. That's why I thought we could maybe mark it as «presumed dead» when

1. the device hasn't reported alive within the wake-up interval since zwjs was started
2. it hasn't responded within the wake-up interval to the ping that was sent if 1 was the case

raman325 · 2022-02-08T22:42:05Z

Additional context Wouldn't it be correct to set the entities of dead nodes to "unavailable" when working with Home Assistant? Except possibly the node status.

Right now all entities but the node status sensor get marked as unavailable when a device is marked dead by the driver. We were actually just having a separate discussion about whether this is the right behavior, because there are cases where the driver may think the device is dead but the next action against the device would cause it to respond and make the node alive again. There's a catch-22, though: the entity is unavailable in HA, so you can't perform normal actions against the device to revive it. (You can ping it, but there's no UI component to do that, so it's not as easy of a recovery setup.)

In your example, the false alarm above would be after the fire alarm was marked dead but you put the battery back in - something needs to happen to tell the driver that the node is alive, otherwise it's still dead even though it's functioning again.

A polling mechanism reduces the false positives in either direction (we think it's dead but it's alive or we think it's alive but it's dead) but comes at a cost of battery life and extra processing cycles. I don't know what the optimal solution is here either but these two discussions are somewhat related and I will be keeping an eye on this accordingly.

The-Deep-Sea · 2022-02-09T21:02:27Z

No, but to be able to ping a device it needs to support to be woken up.

Okay, that's correct. However, in the case of Zigbee2mqtt, sleeping devices are not pinged either.
Only devices that are, or should be, alive are pinged (usually devices that are connected to the mains).

Right now all entities but the node status sensor get marked as unavailable when a device is marked dead by the driver.

I tested it: you are right. I don't know why I remembered it differently.

something needs to happen to tell the driver that the node is alive, otherwise it's still dead even though it's functioning again.

At least the devices I tested report back automatically as soon as they have power.
Otherwise, passive (battery powered) devices should do it at the latest with the next wakeup interval and active (mains powered) devices are actively pinged if you implement it as with Zigbee2mqtt.

A polling mechanism reduces the false positives in either direction [...] but comes at a cost of battery life and extra processing cycles.

If it only pings mains powered devices, it will not affect a battery.
I can't say how this applies to FLiRS devices; I don't know enough about them (yet). But if it is configurable, every user can decide for themselves whether they prefer a longer battery life or more "security", possibly also per device.
Basically, this (a dead device) should be the exception anyway.

AlCalzone · 2022-02-09T21:12:24Z

At least the devices I tested report back automatically as soon as they have power.

I have at least one that doesn't. It only reports when you physically interact with it, or try to control it via Z-Wave.

To reiterate, we need to solve these points:

detect when a battery powered (sleeping) device hasn't woken up for too long
detect when a mains powered device no longer reacts to pings
detect when a FLiRS device no longer reacts, but at a reduced frequency compared to mains powered or you'll drain the battery
detect when all of the above are reachable again

And at the same time we need to

avoid false-positives for fragile meshes
avoid introducing unnecessary delays by pinging unreachable devices:
a) because the timeout can take several seconds
b) because the most used stick (sometimes?) has a firmware bug where attempting communication with a dead device can delay the serial API response that should come immediately and throw the driver and stick out of sync
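
One possible way to approach two of these points (fewer false positives on fragile meshes, and detecting when a node is reachable again) is to require several consecutive failed pings before declaring a node dead. The sketch below is only an assumption for illustration: `PingTracker` and the threshold of 3 are hypothetical and not part of Z-Wave JS, and it does not address the delay concerns in a) and b).

```python
from dataclasses import dataclass


@dataclass
class PingTracker:
    max_failures: int = 3  # assumed threshold, not taken from this thread
    failures: int = 0
    status: str = "alive"

    def record(self, ping_ok: bool) -> str:
        """Record one ping result and return the resulting status."""
        if ping_ok:
            # Any successful ping revives the node immediately.
            self.failures = 0
            self.status = "alive"
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.status = "dead"
        return self.status
```

For FLiRS devices, the same tracker could simply be fed with pings at a lower frequency than for mains powered devices.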

bagobones · 2023-10-03T06:45:56Z

Some thoughts:

The Z-Wave JS web UI exposes a last active timestamp; however, this does not appear to be exposed in Home Assistant via the native HA API (haven't tried MQTT). Even just having this would let me create an alert for devices that have not checked in for more than 12 hours (way longer than any of my battery devices should go without a check-in).
As already mentioned, the zigbee2mqtt availability settings have two time-based settings: a short one defaulting to 10 min for active devices (routers/mains powered), and another for passive devices (mostly battery powered), which is 1500 min, AKA 25 hours. Short of handheld remotes, lightbulbs and maybe an alarm/siren (special class), I would suspect MOST devices check in at least once in a while. You might even be able to get an idea by looking at the last active timestamps of larger existing installs to generate some useful info.
Do dead devices not just turn alive again on their next check-in? Is the fear primarily that marking something dead will prevent it from being available for use?

As there is a fear of false positives, why not implement a broad Active / Passive approach but let users opt into it as a beta feature and provide feedback? It may be a good idea to allow excluding device classes and individual devices if it becomes a problem.
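
The alert from the first point could look roughly like this, assuming the last active timestamps were available as a plain mapping of node ID to time. The `stale_nodes` helper and the input format are hypothetical; this is not an existing Home Assistant or Z-Wave JS API.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_nodes(last_active: Dict[int, datetime], now: datetime,
                max_silence: timedelta = timedelta(hours=12)) -> List[int]:
    """Return the IDs of nodes that have been silent for longer than max_silence."""
    return sorted(
        node_id for node_id, seen in last_active.items()
        if now - seen > max_silence
    )
```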

gruntpartystyle · 2024-11-07T08:21:01Z

Just to make sure I understand this... if a battery powered device fails to check in with the controller, it will stay "asleep" in perpetuity? Isn't that in direct contrast with the principles of zwave guaranteeing reliability?
I trust that my battery operated devices report in every 6h (or whatever I set), so I trust my controller will throw a tantrum if the device doesn't check in after a while... I'd rather have a warning of a suspected dead node which failed to check in after x missed intervals than a sensor which has been dead for weeks with no indication at all.
AFAIK, battery powered devices have a wake-up parameter/interval. The controller is aware of this parameter as it can read this parameter? So why not at least flag the device as AWOL after it has missed its expected check-in?

Make it a global parameter that x missed check-ins means trouble, ...
...have device specific setting to override the global one for certain devices / deactivate the check.
If a device doesn't have a wake-up interval, use a system default 24h or whatever
...which can also be customized per device if required.
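
A small sketch of how such settings might be resolved per device. Everything here is hypothetical: the names, the global value of 3 missed check-ins, and the convention that an override of `None` deactivates the check. Only the 24h fallback comes from the list above.

```python
from typing import Dict, Optional

DEFAULT_INTERVAL_S = 24 * 60 * 60  # fallback when a device has no wake-up interval
GLOBAL_MISSED_CHECKINS = 3         # assumed global setting ("x missed check-ins")


def silence_limit_s(node_id: int, wakeup_interval_s: Optional[float],
                    overrides: Dict[int, Optional[int]]) -> Optional[float]:
    """Seconds of silence allowed for a node, or None if its check is deactivated."""
    # A per-device override wins over the global setting.
    missed = overrides.get(node_id, GLOBAL_MISSED_CHECKINS)
    if missed is None:
        return None
    # A missing (or zero) wake-up interval falls back to the system default.
    interval = wakeup_interval_s if wakeup_interval_s else DEFAULT_INTERVAL_S
    return missed * interval
```

For a device with a 6h wake-up interval and no override, this allows 18 hours of silence before the node would be flagged.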

AlCalzone · 2024-11-07T11:44:50Z

So why not at least flag the device as AWOL after it has missed its expected check-in?

Because this issue is still open and a solution has not been implemented yet.

VVVVVVVip · 2025-01-14T23:06:49Z

I think a working report/notification for device battery levels would go a long way here to solve some of this at least.

None of my battery devices report any change in battery level, and I found this quite annoying, especially since the battery sensor entity is discovered.
I moved from a Vera Edge to HA with an Aeotec Z-Stick Gen5 USB Controller ZW090 / JS Add-on and it took me a while to realize that there are no reports from the devices.
For devices with temp sensors I now monitor if the temp hasn't changed for 24h and send a notification to check the device battery.
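
That workaround could be sketched like this. `StaleValueMonitor` is a hypothetical helper written for illustration, not a Home Assistant API: it remembers when the reading last changed and reports the sensor as stale after 24h without a change.

```python
from typing import Optional


class StaleValueMonitor:
    def __init__(self, max_unchanged_s: float = 24 * 60 * 60):
        self.max_unchanged_s = max_unchanged_s
        self._last_value: Optional[float] = None
        self._last_change: Optional[float] = None

    def update(self, value: float, now: float) -> None:
        """Record a reading; the timer restarts only when the value changes."""
        if self._last_change is None or value != self._last_value:
            self._last_value = value
            self._last_change = now

    def is_stale(self, now: float) -> bool:
        """True if the value has not changed for longer than the limit."""
        return (self._last_change is not None
                and now - self._last_change > self.max_unchanged_s)
```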

ghost assigned robertsLando Feb 3, 2022

AlCalzone transferred this issue from zwave-js/zwave-js-ui Feb 8, 2022

AlCalzone unassigned robertsLando Feb 8, 2022

AlCalzone added the enhancement New feature or request label Feb 8, 2022

raman325 mentioned this issue Feb 9, 2022

Z-wave devices go to status dead after running scene home-assistant/core#66182

Closed

AlCalzone mentioned this issue Mar 3, 2022

📋 Roadmap 2024 #4312

Open

AlCalzone added this to the Easier troubleshooting milestone Apr 21, 2022

AlCalzone removed this from the Easier troubleshooting milestone Apr 21, 2022

kpine mentioned this issue Nov 26, 2022

[feature] Mark battery devices as status=dead if wakeup is missed zwave-js/zwave-js-ui#2821

Closed

kpine mentioned this issue Apr 8, 2023

Battery based devices don't get marked as dead if they haven't checked in recently #5638

Closed

11 tasks

kpine mentioned this issue Oct 3, 2023

Node status shows alive but it has been unplugged for more than a day zwave-js/zwave-js-ui#3321

Closed

3 tasks
