All devices lost after Raspbee II reboot #2915

Closed
andreasscherbaum opened this issue Jun 10, 2020 · 39 comments

Comments

@andreasscherbaum
Contributor

andreasscherbaum commented Jun 10, 2020

After I reboot the Raspberry Pi (B+) that runs the RaspBee II and everything has come up again, the RaspBee has lost all connected devices. I have to re-learn each device manually.

  • Raspberry Pi B+
  • Raspbian (all updates installed)
  • RaspBee II
  • deCONZ Version: 2.5.77 (version is up to date)

Currently (and because it's work) I'm just trying this with a Xiaomi Door Sensor. After reboot the Status is "Not reachable", and I have to delete and re-add the sensor.

@SwoopX
Collaborator

SwoopX commented Jun 10, 2020

Can you be a bit more precise on the "not reachable" regarding your Xiaomi door sensor? It can take up to 1 h for the device to be marked as reachable, and sensors typically need 24 h before they are marked as unreachable if they really drop out.

@Mimiix
Collaborator

Mimiix commented Jun 10, 2020

In addition to what @SwoopX said, Zigbee works best if there are multiple devices. Having only one sensor really limits its capabilities. If you have any way of adding a device that can act as a router, use that to see what happens.

Just having one device really makes debugging hard as the sensor itself can be the problem too.

@andreasscherbaum
Contributor Author

pi@home-zigbee:~ $ uptime 
 14:03:50 up 4 days, 20:09,  1 user,  load average: 0,96, 0,81, 0,83

That's when I rebooted the device the last time. I did not re-learn the sensor this time, because at this point I decided to give up and wait to see whether a future firmware upgrade fixes the problem.

The "not reachable" is simply what the web interface says (I'm running the Raspbee headless).

@Mimiix
Collaborator

Mimiix commented Jun 10, 2020

Ah! But that is Phoscon. What happens if you query the REST API?

@andreasscherbaum
Contributor Author

Here is the REST API query result of this specific sensor:

  "6": {
    "config": {
      "battery": 100,
      "on": true,
      "reachable": true,
      "temperature": 0
    },
    "ep": 1,
    "etag": "3aef81afd3c3e62772fd65effa45ab83",
    "lastseen": "2020-06-10T12:08:37.424",
    "manufacturername": "LUMI",
    "modelid": "lumi.sensor_magnet",
    "name": "Test sensor",
    "state": {
      "lastupdated": "2020-06-04T11:26:49.428",
      "open": true
    },
    "type": "ZHAOpenClose",
    "uniqueid": "00:15:8d:00:02:8b:63:55-01-0006"
  }

The sensor is in front of me, about 2 meters away from the Raspbee. There is no status change when I open/close the magnet.

@Mimiix
Collaborator

Mimiix commented Jun 10, 2020

It is interesting that the "lastseen": "2020-06-10T12:08:37.424" is from today, but the open/close state is not transferred. @SwoopX can you elaborate?

@andreasscherbaum
Contributor Author

The "lastseen" is updated once every couple minutes. But not the actual status, or the last change of the status. This only happens once I re-learn the device.
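
The symptom described here (a fresh "lastseen" next to an old "state.lastupdated") can be checked mechanically. The following is only a minimal sketch: it works on the two fields from the REST API result above, the function name `is_stale` and the one-hour threshold are made up for illustration, and a large gap is only suspicious if the sensor was actually triggered in the meantime.

```python
from datetime import datetime, timedelta


def is_stale(sensor: dict, max_gap: timedelta = timedelta(hours=1)) -> bool:
    """Return True if the device was heard from much later than its state last changed.

    The threshold is an arbitrary assumption, not a deCONZ setting.
    """
    last_seen = datetime.fromisoformat(sensor["lastseen"])
    last_updated = datetime.fromisoformat(sensor["state"]["lastupdated"])
    return last_seen - last_updated > max_gap


# Only the two relevant fields from the sensor JSON quoted in this thread.
sensor = {
    "lastseen": "2020-06-10T12:08:37.424",
    "state": {"lastupdated": "2020-06-04T11:26:49.428", "open": True},
}

print(is_stale(sensor))
```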

@Mimiix
Collaborator

Mimiix commented Jun 10, 2020

I do not have this knowledge :( I need to wait on Swoop for this.

@SwoopX
Collaborator

SwoopX commented Jun 10, 2020

Sorry guys, work first. However, for the moment, just a quick one: Last seen is a good indication that communication is still up. It might be that there's an issue with the attributes somewhere. Anyway, we need to gain a good understanding of what's happening and how any potentially odd thing could happen.

I also have one door sensor which drives me crazy; the other one is rock stable. I never checked the last seen for the one that dropped out, however. Also, there's one thing to note: the "older" Xiaomi devices are not exactly compliant with Zigbee.

EDIT:
Just noted that you got the Mijia and not the Aqara version. Sorry to say that, but those are really crappy. Sold all of them. My humble opinion/experience.

@Mimiix
Collaborator

Mimiix commented Jun 10, 2020

@andreasscherbaum Can you provide me the Product number of the switch?

@andreasscherbaum
Contributor Author

Does that help? Otherwise how can I identify the number?
20200610_164533
20200610_164618

@SwoopX How exactly can I debug the communication between switch and Raspbee?
I've seen a couple of debugging options for the deCONZ software flying around in postings, but I couldn't find any good documentation on what each switch does and how exactly to use it.

@Mimiix
Collaborator

Mimiix commented Jun 10, 2020

I actually meant the product number of the contact sensor :)

@andreasscherbaum
Contributor Author

20200610_172257

@SwoopX
Collaborator

SwoopX commented Jun 10, 2020

Ok, so let's try to have a look. Run deconz with --dbg-info=2 > dbg to write debug output to a file. Let it run, maybe overnight, or start the debug run when the sensor becomes unreachable but the last seen still updates. It somehow feels as if data is coming in but is not processed correctly. However, we've made good progress in decoding the Xiaomi-specific communication over the last months and maybe there's more to discover.

EDIT
You may also check https://github.com/dresden-elektronik/deconz-rest-plugin/wiki/Network-lost-and-configuration-restore-does-not-help

@andreasscherbaum
Contributor Author

Ok, attached is the log from a run. What I did: stopped the headless deconz service I'm usually using (the sensor was still working then), logged in to the Pi with X forwarding enabled, then started "deconz" and let X forward the window to my desktop. Rebooted the Pi, and did the same login/forward again. The log is from the 2nd login.

Couple points I see there:

When deconz starts, it loads the sensor from the internal database:

11:42:46:842 Sqlite sensors: sid = 2
11:42:46:843 Sqlite sensors: name = Test sensor
11:42:46:843 Sqlite sensors: type = ZHAOpenClose
11:42:46:843 Sqlite sensors: modelid = lumi.sensor_magnet
11:42:46:843 Sqlite sensors: manufacturername = LUMI
11:42:46:858 Sqlite sensors: uniqueid = 00:15:8d:00:02:8b:63:55-01-0006
11:42:46:859 Sqlite sensors: state = {"lastupdated":"2020-06-04T01:00:53.627","open":true}
11:42:46:859 Sqlite sensors: config = {"battery":100,"on":true,"reachable":true,"temperature":0}
11:42:46:859 Sqlite sensors: fingerprint = {"d":65535,"ep":1,"in":[6],"p":260}
11:42:46:861 Sqlite sensors: deletedState = deleted
11:42:46:861 ~Resource() /sensors 0xbec64da8

11:42:46:862 Sqlite sensors: sid = 5
11:42:46:862 Sqlite sensors: name = Test sensor
11:42:46:862 Sqlite sensors: type = ZHAOpenClose
11:42:46:863 Sqlite sensors: modelid = lumi.sensor_magnet
11:42:46:863 Sqlite sensors: manufacturername = LUMI
11:42:46:863 Sqlite sensors: uniqueid = 00:15:8d:00:02:8b:63:55-01-0006
11:42:46:864 Sqlite sensors: state = {"lastupdated":"2020-06-04T11:15:31.672","open":true}
11:42:46:864 Sqlite sensors: config = {"battery":100,"on":true,"reachable":true,"temperature":0}
11:42:46:864 Sqlite sensors: fingerprint = {"d":65535,"ep":1,"in":[6],"p":260}
11:42:46:865 Sqlite sensors: deletedState = deleted
11:42:46:865 ~Resource() /sensors 0xbec64da8

11:42:46:900 Sqlite sensors: sid = 6
11:42:46:900 Sqlite sensors: name = Test sensor
11:42:46:901 Sqlite sensors: type = ZHAOpenClose
11:42:46:901 Sqlite sensors: modelid = lumi.sensor_magnet
11:42:46:901 Sqlite sensors: manufacturername = LUMI
11:42:46:902 Sqlite sensors: uniqueid = 00:15:8d:00:02:8b:63:55-01-0006
11:42:46:902 Sqlite sensors: state = {"lastupdated":"2020-06-15T00:03:06.169","open":true}
11:42:46:902 Sqlite sensors: config = {"battery":100,"on":true,"reachable":true,"temperature":0}
11:42:46:902 Sqlite sensors: fingerprint = {"d":65535,"ep":1,"in":[6],"p":260}
11:42:46:903 Sqlite sensors: deletedState = normal
11:42:46:903 Sqlite sensors: mode = 1
11:42:46:904 DB found sensor Test sensor 6
11:42:46:910 ~Resource() /sensors 0xbec64da8

Same "uniqueid", but two of them seem to be deleted (can be because I have to delete the sensor and re-learn it every time). That matches with what I've seen: even though I delete a sensor, when I learn it the next time, deconz seems to recognize the sensor again, but gives it the original name, and I have to rename it manually.
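
To see at a glance which rows share a "uniqueid" and which of them are marked deleted, the "Sqlite sensors:" lines can be grouped per "sid". This is a rough sketch that assumes the line format shown in the excerpts above; the helper names are hypothetical and it is not part of deCONZ.

```python
import re
from collections import defaultdict

# Matches lines such as "11:42:46:842 Sqlite sensors: sid = 2".
LINE = re.compile(r"^\S+ Sqlite sensors: (\w+) = (.*)$")


def parse_sensor_rows(log_text: str) -> list:
    """Collect key/value pairs into one dict per sensor row; a new row starts at 'sid'."""
    rows, current = [], None
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        key, value = match.groups()
        if key == "sid":
            current = {}
            rows.append(current)
        if current is not None:
            current[key] = value
    return rows


def states_by_uniqueid(rows: list) -> dict:
    """Map each uniqueid to the (sid, deletedState) pairs found for it."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.get("uniqueid")].append((row.get("sid"), row.get("deletedState")))
    return dict(grouped)


# Abbreviated copy of the log lines quoted above.
sample = """\
11:42:46:842 Sqlite sensors: sid = 2
11:42:46:858 Sqlite sensors: uniqueid = 00:15:8d:00:02:8b:63:55-01-0006
11:42:46:861 Sqlite sensors: deletedState = deleted
11:42:46:862 Sqlite sensors: sid = 5
11:42:46:863 Sqlite sensors: uniqueid = 00:15:8d:00:02:8b:63:55-01-0006
11:42:46:865 Sqlite sensors: deletedState = deleted
11:42:46:900 Sqlite sensors: sid = 6
11:42:46:902 Sqlite sensors: uniqueid = 00:15:8d:00:02:8b:63:55-01-0006
11:42:46:903 Sqlite sensors: deletedState = normal
"""

result = states_by_uniqueid(parse_sensor_rows(sample))
for uniqueid, entries in result.items():
    print(uniqueid, entries)
```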

Then this:

11:42:48:892 API error 3, /sensors/2, resource, /sensors/2, not available
11:42:48:992 discard sensor state push for 1: state/dark (already pushed)
11:42:49:113 discard sensor state push for 1: state/status (already pushed)
11:42:49:206 discard sensor state push for 1: state/lastupdated (already pushed)
11:43:15:521 APS-DATA.indication from unknown node 0x00158D00028B6355
11:43:15:522 no button map for: lumi.sensor_magnet ep: 0x01 cl: 0x0006 cmd: 0x0A pl[0]: 000
11:43:15:522 ZCL attribute report 0x00158D00028B6355 for cluster: 0x0006, ep: 0x01, frame control: 0x18, mfcode: 0x0000 
11:43:15:522 	payload: 00001000
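
For reference, the payload of that attribute report can be decoded by hand. The sketch below assumes the usual ZCL "Report Attributes" record layout (2-byte little-endian attribute ID, 1-byte data type, then the value) and handles only the boolean type 0x10; the function name is made up for illustration.

```python
def decode_boolean_report(payload_hex: str) -> dict:
    """Decode a single-record ZCL attribute report that carries a boolean value."""
    data = bytes.fromhex(payload_hex)
    attribute_id = int.from_bytes(data[0:2], "little")
    data_type = data[2]
    if data_type != 0x10:  # 0x10 is the ZCL boolean type; other types are not handled here
        raise ValueError(f"unsupported data type 0x{data_type:02x}")
    return {"attribute": attribute_id, "type": data_type, "value": bool(data[3])}


# Payload taken from the log line above (cluster 0x0006, On/Off).
report = decode_boolean_report("00001000")
print(report)
```

So the report itself looks well-formed: attribute 0x0000 of the On/Off cluster with a boolean value, sent by a node that deCONZ logs as "unknown".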

And when the GUI starts and connects to the RaspBee, the "Test Sensor" is no longer in the list. Somewhere during the start of deconz the sensor gets lost.

2020-06-15_01.log.zip
decong

@SwoopX Let me know if you need more debugging information, or what exactly I can do to provide more data.

@andreasscherbaum
Contributor Author

Just somewhat related: I can't find any documentation about the "--dbg-info" switch. What do this switch and the other available switches do? Is there a link somewhere?

@SwoopX
Collaborator

SwoopX commented Jun 15, 2020

@andreasscherbaum It's in the wiki now, https://github.com/dresden-elektronik/deconz-rest-plugin/wiki/deCONZ-debug-switches

I'll check out the log in the evening. Just to confirm I got it right: the sensor got lost after restarting deconz (seems so)? If yes, I only saw that once when the database was "corrupted". Some fields in the tables were missing due to whatever reason. You may check it out here #2882 (comment)

@andreasscherbaum
Contributor Author

Compared my tables in the SQLite database to the ones listed in #2882, and I don't see any differences. That's not the issue.

Can you provide a full list of tables and their schema, so I can compare everything?

@SwoopX
Collaborator

SwoopX commented Jun 15, 2020

Do a .tables to get all tables. The schemas should already be available in the thread mentioned. In case one's missing, let me know.

@andreasscherbaum
Contributor Author

I see the following tables in my SQLite:

  • auth
  • config2
  • device_descriptors
  • device_gui
  • devices
  • gateways
  • groups
  • nodes
  • resourcelinks
  • rules
  • scenes
  • schedules
  • sensors
  • userparameter
  • zbconf
  • zcl_values

#2882 only lists the following tables:

  • gateways
  • groups
  • nodes
  • rules
  • scenes
  • schedules
  • userparameter
  • zcl_values

I compared these 8, but there are 8 more where the schema is not listed.
Hence my question for a full schema dump.

Dunno if a corrupt database is the cause for the issues, but to rule that out I need the schema for all tables. And it's easier to compare them from one source than to grab the details from a couple different comments.
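
A single-source schema dump like the one asked for here can be pulled from any SQLite file with a few lines of Python. This is just a sketch: it runs against an in-memory stand-in with made-up columns, and the path to the real deCONZ database is left as a placeholder because it is an assumption on my side.

```python
import sqlite3


def dump_schema(conn: sqlite3.Connection) -> dict:
    """Return {table name: CREATE TABLE statement} for every table in the database."""
    cursor = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return {name: sql for name, sql in cursor}


# In-memory stand-in so the sketch runs anywhere; the columns are made up.
# For the real database, open the file instead, e.g.
# sqlite3.connect("/path/to/deconz/database")  (placeholder path, not verified).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensors (sid TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE nodes (mac TEXT PRIMARY KEY, id TEXT)")

schema = dump_schema(conn)
for table, statement in schema.items():
    print(table, "->", statement)
```

Dumping both databases this way and diffing the output avoids collecting the schemas from several comments.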

@SwoopX
Collaborator

SwoopX commented Jun 15, 2020

I compared these 8, but there are 8 more where the schema is not listed.

Ah, right. However, the others can't have anything to do with stuff disappearing. The functions upgradeDbToUserVersion1(), etc. should hold the schemas. They can be found in database.cpp.

@andreasscherbaum
Contributor Author

Ok, I went over all the create functions in database.cpp, and my database schema matches what's supposed to be created there. My version is "6" (latest).

Any other ideas?

@SwoopX
Collaborator

SwoopX commented Jun 15, 2020

Not without another debug output.

However, I took a look at the one you previously provided. It appears the sensor is happily sending data, including open/close data. I don't see any websocket notifications though... And I see you don't seem to run the latest firmware.

@andreasscherbaum
Contributor Author

ok, please tell me how to debug that.

Also:
image

@SwoopX
Collaborator

SwoopX commented Jun 15, 2020

@cwildfoerster

cwildfoerster commented Jun 15, 2020

Lights/switches did not connect after the update to 26610700. I needed to re-learn them once. They now connect again, even after a reboot.

@Mimiix
Collaborator

Mimiix commented Jun 16, 2020

Hi @cwildfoerster ,

Are you able to create your own bug report for this 😃 ?

Thanks!

Kind regards, Dennis

@SwoopX
Collaborator

SwoopX commented Jun 16, 2020

@andreasscherbaum For the sensor not showing any events, you could redo the binding once more.

How to manually bind and enable attribute reporting, using the current position of window covering devices as the example (use the on/off cluster in your case):

Manually bind a cluster

  • In the deCONZ GUI, select Panels and then Bind Dropbox.
  • For your device as well as the coordinator (blue node), press the rightmost bullet to expand the available clusters.
  • For the coordinator: Drag and drop 01 Home Automation Endpoint as the destination.
  • For the device: Select the window covering cluster as the source and drag and drop it as well. Note that this must be a server cluster (colored blue).
  • Press Bind.

@andreasscherbaum
Contributor Author

Did not forget about this Issue, but have to deal with other issues first. Will come around to debug this soon.

@andreasscherbaum
Contributor Author

Ok, I updated the Raspbee to firmware 26610700, and tried a couple reboots. Everything is still unstable.

The sensor I'm using for the tests stayed on in the beginning, but after I rebooted the Pi the information is no longer transported to openHAB. The Phoscon webapp shows the sensor and shows that it is opening and closing. For good measure I deleted the sensor and tried to add it again; that failed spectacularly.

Stopped the headless deCONZ, and started the GUI version. When I press the reset button on the sensor, the device shows up:
deCONZ_2020-07-04_01
However it is not shown in the webapp - there it still shows "searching":
deCONZ_2020-07-04_02
Only when I also open/close the sensor a couple times, the device finally shows up with a different name in the GUI, and only then it is recognized in the webapp:
deCONZ_2020-07-04_03

And another change: with the old firmware version a sensor kept the same ID even when deleted and re-added. Now the sensor is getting a new ID every time I re-add it. That's annoying, as every openHAB "Thing" is rendered invalid.

And for the reboots: every time I reboot the bridge, the Things are still not available in openHAB. Sometimes it works when I just touch the file with the Things definition, once the bridge is connected again. Then the sensors start transporting information. But it can't be that I have to monitor the bridge status and then update the Things just to get this working after every reboot.

@SwoopX
Collaborator

SwoopX commented Jul 4, 2020

Only when I also open/close the sensor a couple times, the device finally shows up with a different name in the GUI, and only then it is recognized in the webapp

To cut a long story short: The entry in the DB was still present, therefore the search for new sensors alone doesn't help here. That entry gets "revived" upon triggering the device. So, as a rule of thumb: When you pair any battery-powered device, it's highly recommended to also trigger it during the sensor search, either by waving your hands, moving the magnet, warming it, etc. or just a short press on the physical button. Learned something new 😉

And another change: with the old firmware version a sensor kept the same ID even when deleted and re-added. Now the sensor is getting a new ID every time I re-add it. That's annoying, as every openHAB "Thing" is rendered invalid.

I definitely cannot confirm that. In fact, that's something that I really enjoy (meaning the ID from deconz does NOT change, with one exception). Whenever a sensor gets screwed up for whatever reason on my end, even with a full reset, it comes back with the same ID. I lose one of my door sensors every now and then and this is what I could always count on. An exception, as I recall, was when a new device was paired in between.

And for the reboots: every time I reboot the bridge, the Things are still not available in openHAB. Sometimes it works when I just touch the file with the Things definition, once the bridge is connected again. Then the sensors start transporting information. But it can't be that I have to monitor the bridge status and then update the Things just to get this working after every reboot.

That feels more like an openHAB issue. So nothing comes in when the device is definitely triggered? Just noticed it's the Mijia sensor, so the old one. You know they have a bad reputation?

@andreasscherbaum
Contributor Author

Only when I also open/close the sensor a couple times, the device finally shows up with a different name in the GUI, and only then it is recognized in the webapp

To cut a long story short: The entry in the DB was still present, therefore the search for new sensors alone doesn't help here. That entry gets "revived" upon triggering the device. So, as a rule of thumb: When you pair any battery-powered device, it's highly recommended to also trigger it during the sensor search, either by waving your hands, moving the magnet, warming it, etc. or just a short press on the physical button. Learned something new 😉

And another change: with the old firmware version a sensor kept the same ID even when deleted and re-added. Now the sensor is getting a new ID every time I re-add it. That's annoying, as every openHAB "Thing" is rendered invalid.

I definitely cannot confirm that. In fact, that's something that I really enjoy (meaning the ID from deconz does NOT change, with one exception). Whenever a sensor gets screwed up for whatever reason on my end, even with a full reset, it comes back with the same ID. I lose one of my door sensors every now and then and this is what I could always count on. An exception, as I recall, was when a new device was paired in between.

Well, that's the first time I have a sensor ID 6 on this, and it's the same sensor which previously was ID 5. And 5 is not used right now.

So, yes, I can confirm that this happened, but only after I upgraded the firmware.
Dunno if that is related though.

And for the reboots: every time I reboot the bridge, the Things are still not available in openHAB. Sometimes it works when I just touch the file with the Things definition, once the bridge is connected again. Then the sensors start transporting information. But it can't be that I have to monitor the bridge status and then update the Things just to get this working after every reboot.

That feels more like an openHAB issue. So nothing comes in when the device is definitely triggered? Just noticed it's the Mijia sensor, so the old one. You know they have a bad reputation?

That might be the case; I'm just adding everything I observe.
It also adds a bit to the annoyance when I see that things are not working properly.
Yes, it is a Mijia sensor. However, deCONZ sees the sensor and sees it open/close, but this is not transported to openHAB.

@SwoopX
Collaborator

SwoopX commented Jul 4, 2020

Regarding that unavailability in openHAB after reboot: I just had some minutes of headache in my dev environment. I set up a play instance of FHEM and was wondering why I didn't receive anything either. A FHEM restart brought it back to life. The point I want to make is: if you reboot, things don't work, and you then restart openHAB, is it all good? If yes, you should defer the start of openHAB on your OS.

@andreasscherbaum
Contributor Author

I (intentionally) have RaspBee/deCONZ and openHAB running on different Raspberry Pis. This way I can reboot one of them without bringing down my entire home automation system. Or I can test something on one system without disturbing service on the other. All systems are set up using Ansible, so I can easily re-deploy them.

When deCONZ starts, that's something openHAB might detect (reconnect of the bridge/thing), but it still needs some action on the openHAB side (touching the files, for example). I don't know why this is required to make the bridge work and transport data.

@SwoopX
Collaborator

SwoopX commented Jul 4, 2020

Ok, let's approach that differently. When you run deconz with --dbg-info=2 (either you let it run or redirect it to a file, depending on your preference) and you trigger/do something, it should trigger a websocket notification towards openHAB, which you can see there. If that's not picked up by openHAB, then it's not an issue with deconz. Maybe don't use the Mijia sensor for that ;)

@andreasscherbaum
Contributor Author

Ok, I used the Mijia sensor and an Aqara motion sensor. Setup:

  • openHAB runs on 192.168.0.35
  • deCONZ runs on 192.168.0.30
  • workstation runs on 192.168.0.20

I stopped the headless deCONZ on the .30, and used ssh X forwarding to bring the application to my workstation, in order to see the display. From the log:

03:30:08:690 New websocket 192.168.0.20:57942 (state: 3) 

Why is this opening a websocket to my workstation, instead of the openHAB system?

tcp        0      0 192.168.0.20:57942      192.168.0.30:443        ESTABLISHED 2838/(squid-1)      

No web socket to the .35 though.

And then when something changed (motion sensor):

03:30:46:696 Websocket 192.168.0.20:57942 send message: {"e":"changed","id":"9","r":"sensors","state":{"dark":false,"daylight":false,"lastupdated":"2020-07-07T01:30:46.629","lightlevel":16435,"lux":44},"t":"event","uniqueid":"00:15:8d:00:04:51:f6:c7-01-0400"} (ret = -1095554056)
03:30:46:771 Websocket 192.168.0.20:57942 send message: {"e":"changed","id":"8","r":"sensors","state":{"lastupdated":"2020-07-07T01:30:46.712","presence":true},"t":"event","uniqueid":"00:15:8d:00:04:51:f6:c7-01-0406"} (ret = -1095554056)
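
Lines like these can be filtered out of a long debug log to see which client each event went to. The following is a small sketch that assumes the "Websocket <ip:port> send message: {...} (ret = ...)" format shown above; the regular expression and the helper name are my own, not something deCONZ provides.

```python
import json
import re

# Captures the client address and the JSON payload of a "send message" log line.
WS_LINE = re.compile(r"Websocket (\S+) send message: (\{.*\}) \(ret = -?\d+\)")


def parse_ws_event(line: str):
    """Return (client, event dict) for a websocket send line, or None for other lines."""
    match = WS_LINE.search(line)
    if not match:
        return None
    client, payload = match.groups()
    return client, json.loads(payload)


# Log line copied from the excerpt above.
line = (
    '03:30:46:771 Websocket 192.168.0.20:57942 send message: '
    '{"e":"changed","id":"8","r":"sensors","state":{"lastupdated":"2020-07-07T01:30:46.712",'
    '"presence":true},"t":"event","uniqueid":"00:15:8d:00:04:51:f6:c7-01-0406"} (ret = -1095554056)'
)

client, event = parse_ws_event(line)
print(client, event["r"], event["id"], event["state"])
```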

Looks like the reason why openHAB never gets notifications is because deCONZ never opens a websocket into that direction. The question is: why?

@SwoopX
Collaborator

SwoopX commented Jul 7, 2020

I suppose you had Phoscon open on your workstation at that time?

Looks like the reason why openHAB never gets notifications is because deCONZ never opens a websocket into that direction. The question is: why?

No, that's not the question. It's exactly the other way around. The client has to initiate the connection, not the server. Btw, Phoscon is just another REST API client for deconz.

@andreasscherbaum
Contributor Author

I suppose you had Phoscon open on your workstation at that time?

Crap, forgot about this one, but had to monitor the sensors and see if they are recognized.
Ok, this explains the connection to .20 then.

Looks like I need to see why openHAB is not even trying to open a connection.
Will debug this more.

@andreasscherbaum
Contributor Author

I will not have time for the next couple of days, but if openHAB does not open a websocket, that is no longer really a problem for this issue.

4 participants