Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[velux] hub discovery; representation properties; socket lock up issues #8777

Merged
merged 74 commits into from
Dec 1, 2020

Conversation

andrewfg
Copy link
Contributor

@andrewfg andrewfg commented Oct 17, 2020

This PR implements a number of improvements to the Velux binding.

1. The binding could not auto- discover Velux hubs

A new VeluxBridgeFinder class has been added to discover Velux KLF200 hubs. Note: the Velux mDNS hub discovery is "flaky" under JmDNS, and there are issues with MDNSDiscoveryService, so this enhancement implements its own custom mDNS service resolver.

The ReadMe.md file has been updated to say that auto- discovery is supported on hubs.

2. The binding had missing or wrong representation property names

The wrong representation property names meant that re-discovered Things were not properly being matched up with already discovered Things in the Inbox.

These errors and omissions have been corrected.

3. The binding suffered from lock-ups in its TCP communication to the hub

If the binding went offline via a catastrophic failure, the Velux hub might be left with a large amount of un-sent (or un-acknowledged) data in its write buffer, and its side of the socket might get into a "zombie" state. This would prevent the binding from re-connecting again when the binding was restarted. Note: This seems to be a bug in the hub, which I have reported to Velux, but so far without response from them..

The following has been implemented:

  • Applied a connect time out when the socket is created
  • Applied a socket read timeout
  • Applied a socket keepalive
  • Implemented the Closeable interface in a number of places, so shutdown code gets called if the class instance gets destroyed.

Note: These changes ameliorate the problem, but possibly do not completely eliminate it, since the underlying cause lies within the hub itself.

4. The binding would block on read calls

The binding used blocking socket read calls to fetch data from the hub. This meant that the whole binding could sit in a blocked state for tens of seconds or until incoming data arrived.

The following has been implemented:

  • Implemented read time outs (see item 3. above)
  • Implemented an asynchronous message queue to decouple the binding upper levels from the lower level communication stack. There is now a separate thread that polls the socket and pushes messages onto the queue. And therefore the binding can now asynchronously pop the messages from the queue in its own separate thread(s).
  • The use of an asynchronous queue allows a Thread.sleep(500) call to be removed. Thus improving response time by 0.5 seconds per message.

5. The binding wrongly used the available() method on its SSL socket

A little-known "feature" of SSL sockets is that available()always returns 0 on them. So the binding's "receive-only" calls were never actually processing incoming messages, and the data would therefore backlog in the read buffer (see item 3. above) until the next "send" call could be made.

The asynchronous message queue (see item 4.) also solves this problem, since available() can now check for entries in the queue rather than (not) checking for data in the socket buffer. This could shorten response times by 10-50 seconds.

6. The bridgeDirectCommunicate() method was not transparent

While writing, testing and debugging the above changes, I felt the bridgeDirectCommunicate() method to be written in a hard to understand form, having two nested while() loops, labelled break statements, and multiple levels of continue and break statements. Furthermore the logger messages were hard for an outsider (like me) to understand.

The bridgeDirectCommunicate() method has been refactored to flatten the loop structure, simplify the functioning of break and continue statements, and clarify the logger messages.

7. The task scheduler was not executing in the right order

The task scheduler was based on a multi- thread pool, but since the tasks were anyway constrained though the single IO socket pipeline, it meant that tasks would sometimes execute in the wrong order.

The task scheduler has been changed from a multi- thread pool to a single thread executor.

8. The OH UI was not being updated fast enough

The UI would display the old position of actuators until the actuator had finally stopped moving in a new position.

The UI is now updated immediately to display the target position as soon as the actuator starts to move in that direction.

9. User request: the remote reboot feature should be implemented

See this issue

A ThingAction for remote reboot has been implemented.

10. User request: the binding should indicate if windows are manually opened

Users on the community forum requested that when windows are manually opened, the binding should inform them. Furthermore, see this issue

The window position is now displayed as UNDEF when manually opened. And in the case that position is invalid, yet target position is valid, the window position is taken from the target position.

11. If the user changed Configuration Parameters via PaperUI they were not saved

Changes to Configuration Parameters are now saved.

12. The GW_SET_NODE_VELOCITY_REQ/CFM command set has been removed from the API

The GW_SET_NODE_VELOCITY_REQ/CFM command set has been removed from the KLF200 API specification.

The code in the binding that would have called this command set, is not called. And the respective classes have been marked as deprecated. Nevertheless these classes have NOT been deleted from the binding 'just in case' the function might be added back again in a future API release.

13. User request: Somfy actuators wrongly display undefined position

See this issue

Wherever possible if the current position is undefined, the actuator will display the target position instead. Furthermore the issue and a possible user work around are explained in the read me file.

@andrewfg andrewfg added bug An unexpected problem or unintended behavior of an add-on enhancement An enhancement or new feature for an existing add-on help wanted rebuild Triggers Jenkins PR build labels Oct 17, 2020
@TravisBuddy
Copy link

Travis tests were successful

Hey @andrewfg,
we found no major flaws with your code. Still you might want to look at this logfile, as we usually suggest some optional improvements.

@andrewfg
Copy link
Contributor Author

@gs4711 feedback appreciated

@andrewfg andrewfg added the work in progress A PR that is not yet ready to be merged label Oct 18, 2020
@andrewfg
Copy link
Contributor Author

Provisional Solution: for the Velux hubs, the ServiceInfo argument passed in by the OH core in its call to createResult() provides an empty getAddresses() list, so I cannot get the hub's IP address directly. Therefore I had to implement a workaround (below) that uses InetAddress.getByName() to resolve the mDNS textual host name into an IP address. I am not sure if using InetAddress.getByName() is the best way to do this. It seems to work on Windows and on OpenHabian. But I would appreciate feedback..

I discovered (no pun intended) that the Velux hub discovery is very "flaky" .. sometimes it works as it should, and sometimes not .. so this commit implements the "correct" way of getting the ipv4 address, and only if that fails does it fall back to my prior "work around" solution; this works under (almost) all circumstances.

@andrewfg andrewfg added rebuild Triggers Jenkins PR build and removed rebuild Triggers Jenkins PR build help wanted work in progress A PR that is not yet ready to be merged labels Oct 20, 2020
@andrewfg
Copy link
Contributor Author

Note that Jenkins is failing, but it seems to be unrelated to the code in this PR

image

image

@andrewfg andrewfg added the additional testing preferred The change works for the pull request author. A test from someone else is preferred though. label Oct 28, 2020
@gs4711
Copy link
Contributor

gs4711 commented Oct 29, 2020

I added a new MDNSDiscoveryParticipant derived class to discover Velux KLF200 hubs. Note: the Velux hub discovery is "flaky" so this enhancement has to use both mDNS and regular DNS (as a fallback) to resolve the IP address of the hub.

I also updated ReadMe.md to say that auto- discovery is supported on hubs.

Hello Andrew, thanks for pushing it further!

Currently starting from a greenfield setup, I do not see any discovery happening:

10:54:22.278 [INFO ] [ig.discovery.internal.PersistentInbox] - Added new thing 'velux:binding:org_openhab_binding_velux' to inbox.
10:54:22.279 [INFO ] [smarthome.event.InboxAddedEvent      ] - Discovery Result with UID 'velux:binding:org_openhab_binding_velux' has been added.
10:56:33.743 [INFO ] [smarthome.event.InboxRemovedEvent    ] - Discovery Result with UID 'velux:binding:org_openhab_binding_velux' has been removed.
10:56:33.779 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'velux:binding:org_openhab_binding_velux' changed from UNINITIALIZED to INITIALIZING
10:56:33.790 [INFO ] [.internal.handler.VeluxBindingHandler] - Initializing VeluxBindingHandler for 'velux:binding:org_openhab_binding_velux'.
10:56:33.805 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'velux:binding:org_openhab_binding_velux' changed from INITIALIZING to UNKNOWN
10:56:33.812 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'velux:binding:org_openhab_binding_velux' changed from UNKNOWN to ONLINE
10:56:33.816 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'velux:binding:org_openhab_binding_velux' changed from ONLINE to ONLINE (CONFIGURATION_PENDING): So far no bridge is defined. Please add a thing of type "Velux KLF200" to establish a connection to the gateway, which provides the prerequisite for further commissioning.
10:56:33.867 [INFO ] [smarthome.event.ItemStateChangedEvent] - velux_binding_org_openhab_binding_velux_information changed from NULL to So far no bridge is defined. Please add a thing of type "Velux KLF200" to establish a connection to the gateway, which provides the prerequisite for further commissioning.
10:56:33.920 [INFO ] [thome.event.ItemChannelLinkAddedEvent] - Link 'velux_binding_org_openhab_binding_velux_information-velux:binding:org_openhab_binding_velux:information' has been added.
2020-10-29 10:59:22.233 [TRACE] [rnal.discovery.VeluxDiscoveryService] - stopScan() called.
10:59:22.233 [TRACE] [ernal.discovery.VeluxDiscoveryService] - stopScan() called.
2020-10-29 10:59:22.239 [TRACE] [rnal.discovery.VeluxDiscoveryService] - stopScan() done.
10:59:22.239 [TRACE] [ernal.discovery.VeluxDiscoveryService] - stopScan() done.

Is there anything, I have to declare before?

@andrewfg
Copy link
Contributor Author

andrewfg commented Oct 29, 2020

Currently starting from a greenfield setup, I do not see any discovery happening:
Is there anything, I have to declare before?

Guenther, you don't need to do anything special .. but it often takes some time to discover the hub(s).

The KLF advertises itself runni9ng a service under _http._tcp. and obviously you need to have mDNS available on you LAN (your router or switch might need to have "mDNS pass through, or mDNS caching" enabled).

You can check if the KLF is adverising itself by means of an mDNS browser app (sometimes known as Bonjour browser) -- see below a couple of screenshots from apps on Apple and Android devices.

apple

android

@gs4711
Copy link
Contributor

gs4711 commented Oct 29, 2020

Sorry for the inconvenience, but I'm recognizing completely loss of connectivity ...

2020-10-29 19:45:07.261 [TRACE] [internal.handler.utils.ThingProperty] - setValue() connectionAttempt set to Thu Oct 29 19:44:35 CET 2020.
2020-10-29 19:45:07.263 [TRACE] [internal.handler.utils.ThingProperty] - setValue() connectionSuccess set to Thu Oct 29 13:26:14 CET 2020.

The latest successful steps are (with some lines of failure):

2020-10-29 13:26:13.580 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate(BCP: Get Bridge Device Status,authenticated) called.
2020-10-29 13:26:13.581 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate(Get Bridge Device Status, authenticated) on 192.168.45.9 called.
2020-10-29 13:26:13.582 [DEBUG] [ternal.bridge.slip.SCgetDeviceStatus] - getRequestCommand() returns GW_GET_STATE_REQ (0xc).
2020-10-29 13:26:13.582 [TRACE] [ternal.bridge.slip.SCgetDeviceStatus] - getRequestDataAsArrayOfBytes() returns data.
2020-10-29 13:26:13.584 [DEBUG] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate(GW_GET_STATE_REQ, authenticated) on 192.168.45.9 initiated by Thread[OH-velux-1,5,main].
2020-10-29 13:26:13.585 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: working on request GW_GET_STATE_REQ with 0 bytes of data.
2020-10-29 13:26:13.586 [TRACE] [ernal.bridge.slip.utils.SlipEncoding] - SlipEncoding(constructor) for command 0xc with data size 0 called.
2020-10-29 13:26:13.587 [TRACE] [ernal.bridge.slip.utils.SlipEncoding] - SlipEncoding(constructor) successfully initialized, storing bytes: 00 03 00 0C 0F.
2020-10-29 13:26:13.588 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: transportEncoding=00 03 00 0C 0F.
2020-10-29 13:26:13.589 [TRACE] [ternal.bridge.slip.utils.SlipRFC1055] - encode() for data size 5 called.
2020-10-29 13:26:13.590 [TRACE] [ternal.bridge.slip.utils.SlipRFC1055] - encode() provides transfer encoding: C0 00 03 00 0C 0F C0.
2020-10-29 13:26:13.591 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: sending 7 bytes.
2020-10-29 13:26:13.591 [INFO ] [internal.bridge.slip.SlipVeluxBridge] - Sending command GW_GET_STATE_REQ.
2020-10-29 13:26:13.592 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: called.
2020-10-29 13:26:13.593 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: sending packet with 7 bytes: C0 00 03 00 0C 0F C0
2020-10-29 13:26:13.594 [TRACE] [nternal.bridge.slip.io.SSLconnection] - send() called, writing 7 bytes.
2020-10-29 13:26:13.595 [TRACE] [nternal.bridge.slip.io.SSLconnection] - flushReadBuffer() called.
2020-10-29 13:26:13.596 [TRACE] [nternal.bridge.slip.io.SSLconnection] - send() finished after having send 7 bytes: C0 00 03 00 0C 0F C0 
2020-10-29 13:26:13.597 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 500 msecs.
2020-10-29 13:26:14.098 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: receiving bytes.
2020-10-29 13:26:14.099 [TRACE] [nternal.bridge.slip.io.SSLconnection] - receive() called.
2020-10-29 13:26:14.476 [TRACE] [nternal.bridge.slip.io.SSLconnection] - receive() finished after having read 13 bytes: C0 00 09 00 0D 02 00 00 00 00 00 06 C0 
2020-10-29 13:26:14.478 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: received packet with 13 bytes: C0 00 09 00 0D 02 00 00 00 00 00 06 C0
2020-10-29 13:26:14.480 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: finished.
2020-10-29 13:26:14.482 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: received packet C0 00 09 00 0D 02 00 00 00 00 00 06 C0.
2020-10-29 13:26:14.484 [TRACE] [ternal.bridge.slip.utils.SlipRFC1055] - decode() for packet size 13 called.
2020-10-29 13:26:14.486 [TRACE] [ternal.bridge.slip.utils.SlipRFC1055] - decode() provides payload: 00 09 00 0D 02 00 00 00 00 00 06.
2020-10-29 13:26:14.488 [TRACE] [ernal.bridge.slip.utils.SlipEncoding] - SlipEncoding(constructor) called for decoding a packet with size 11.
2020-10-29 13:26:14.489 [TRACE] [ernal.bridge.slip.utils.SlipEncoding] - getCommand() returns 0x0D .
2020-10-29 13:26:14.491 [TRACE] [ernal.bridge.slip.utils.SlipEncoding] - getData() returns 6 bytes: 02 00 00 00 00 00.
2020-10-29 13:26:14.493 [TRACE] [ernal.bridge.slip.utils.SlipEncoding] - SlipEncoding(constructor) successfully initialized with command 0x0D and data 02 00 00 00 00 00.
2020-10-29 13:26:14.495 [TRACE] [ernal.bridge.slip.utils.SlipEncoding] - getCommand() returns 0x0D .
2020-10-29 13:26:14.497 [TRACE] [ernal.bridge.slip.utils.SlipEncoding] - getData() returns 6 bytes: 02 00 00 00 00 00.
2020-10-29 13:26:14.499 [DEBUG] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: working on response GW_GET_STATE_CFM with 6 bytes of data.
2020-10-29 13:26:14.501 [INFO ] [internal.bridge.slip.SlipVeluxBridge] - Received answer GW_GET_STATE_CFM.
2020-10-29 13:26:14.503 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: passes back command 0xd and data 02 00 00 00 00 00.
2020-10-29 13:26:14.504 [DEBUG] [ternal.bridge.slip.SCgetDeviceStatus] - setResponse(GW_GET_STATE_CFM with 6 bytes of data) called.
2020-10-29 13:26:14.506 [TRACE] [ternal.bridge.slip.SCgetDeviceStatus] - setResponse(): handling response GW_GET_STATE_CFM (0xd).
2020-10-29 13:26:14.508 [TRACE] [ternal.bridge.slip.SCgetDeviceStatus] - isLengthValid() called for GW_GET_STATE_CFM (0xd) with 6 bytes of data.
2020-10-29 13:26:14.510 [TRACE] [ternal.bridge.slip.SCgetDeviceStatus] - isLengthValid() returns true.
2020-10-29 13:26:14.513 [TRACE] [ternal.bridge.slip.SCgetDeviceStatus] - setResponse(): finished=true,success=true.
2020-10-29 13:26:14.515 [DEBUG] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate(GW_GET_STATE_REQ) on 192.168.45.9: returns success.
2020-10-29 13:26:14.517 [TRACE] [g.velux.internal.things.VeluxGwState] - VeluxGwState() created.
2020-10-29 13:26:14.519 [TRACE] [ternal.bridge.slip.SCgetDeviceStatus] - getState() returns GW_S_GWM/GW_SS_IDLE (Gateway mode, with one or more actuator nodes in the system table., Idle state.).
2020-10-29 13:26:14.521 [TRACE] [ernal.bridge.VeluxBridgeDeviceStatus] - retrieve() finished successfully with result Gateway mode, with one or more actuator nodes in the system table., Idle state..
2020-10-29 13:26:14.523 [TRACE] [internal.handler.ChannelBridgeStatus] - handleRefresh() returns GW_S_GWM/GW_SS_IDLE.
2020-10-29 13:26:14.525 [DEBUG] [.internal.handler.VeluxBridgeHandler] - handleCommandScheduled(): updating channel velux:klf200:49e8997e:status to GW_S_GWM/GW_SS_IDLE.
2020-10-29 13:26:14.528 [TRACE] [internal.handler.utils.ThingProperty] - setValue() connectionAttempt set to Thu Oct 29 13:26:14 CET 2020.
2020-10-29 13:26:14.531 [TRACE] [internal.handler.utils.ThingProperty] - setValue() connectionSuccess set to Thu Oct 29 13:26:14 CET 2020.
2020-10-29 13:26:14.533 [TRACE] [.internal.handler.VeluxBridgeHandler] - handleCommandScheduled(Thread[OH-velux-1,5,main]) done.
2020-10-29 13:26:14.535 [TRACE] [.internal.handler.VeluxBridgeHandler] - handleCommand.scheduled(Thread[OH-velux-1,5,main]) done.
2020-10-29 13:26:14.535 [TRACE] [.internal.handler.VeluxBridgeHandler] - handleCommandScheduled(Thread[OH-velux-3,5,main]): command REFRESH on channel velux:rollershutter:49e8997e:DG_O_Shutter:position.
2020-10-29 13:26:14.537 [DEBUG] [.internal.handler.VeluxBridgeHandler] - handleCommandScheduled(velux:rollershutter:49e8997e:DG_O_Shutter:position,REFRESH) called.
2020-10-29 13:26:14.542 [TRACE] [binding.velux.internal.VeluxItemType] - getByThingAndChannel(velux:rollershutter,position) returns enum velux:rollershutter/position.
2020-10-29 13:26:14.544 [TRACE] [nternal.things.VeluxExistingProducts] - isDirty() returns false.
2020-10-29 13:26:14.547 [TRACE] [.internal.handler.VeluxBridgeHandler] - handleCommandScheduled(): work on refresh.
2020-10-29 13:26:14.549 [TRACE] [.internal.handler.VeluxBridgeHandler] - handleCommandScheduled(): refreshing item velux:rollershutter:49e8997e:DG_O_Shutter:position (type velux:rollershutter/position).
2020-10-29 13:26:14.551 [DEBUG] [rnal.handler.ChannelActuatorPosition] - handleRefresh(velux:rollershutter:49e8997e:DG_O_Shutter:position,position,org.openhab.binding.velux.internal.handler.VeluxBridgeHandler@16b5145) called.
2020-10-29 13:26:14.553 [TRACE] [nternal.things.VeluxExistingProducts] - getNoMembers() returns 5.
2020-10-29 13:26:14.555 [TRACE] [rnal.handler.ChannelActuatorPosition] - handleRefresh(): there are some existing products.
2020-10-29 13:26:14.558 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeAPI() called.
2020-10-29 13:26:14.560 [TRACE] [ux.internal.bridge.slip.SCgetProduct] - setProductId(1) called.
2020-10-29 13:26:14.562 [TRACE] [ng.velux.internal.bridge.VeluxBridge] - bridgeCommunicate(Retrieve Product) called.
2020-10-29 13:26:14.564 [TRACE] [ng.velux.internal.bridge.VeluxBridge] - isAuthenticated() returns true.
2020-10-29 13:26:14.565 [TRACE] [ng.velux.internal.bridge.VeluxBridge] - bridgeCommunicate(Retrieve Product,authenticated) called.
2020-10-29 13:26:14.566 [TRACE] [ng.velux.internal.bridge.VeluxBridge] - isAuthenticated() returns true.
2020-10-29 13:26:14.567 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate(BCP: Retrieve Product,authenticated) called.
2020-10-29 13:26:14.568 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate(Retrieve Product, authenticated) on 192.168.45.9 called.
2020-10-29 13:26:14.569 [DEBUG] [ux.internal.bridge.slip.SCgetProduct] - getRequestCommand() returns GW_GET_NODE_INFORMATION_REQ (0x200).
2020-10-29 13:26:14.570 [TRACE] [ux.internal.bridge.slip.SCgetProduct] - getRequestDataAsArrayOfBytes() returns data for retrieving node with id 1.
2020-10-29 13:26:14.572 [DEBUG] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate(GW_GET_NODE_INFORMATION_REQ, authenticated) on 192.168.45.9 initiated by Thread[OH-velux-3,5,main].
2020-10-29 13:26:14.573 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: working on request GW_GET_NODE_INFORMATION_REQ with 1 bytes of data.
2020-10-29 13:26:14.574 [TRACE] [ernal.bridge.slip.utils.SlipEncoding] - SlipEncoding(constructor) for command 0x200 with data size 1 called.
2020-10-29 13:26:14.575 [TRACE] [ernal.bridge.slip.utils.SlipEncoding] - SlipEncoding(constructor) successfully initialized, storing bytes: 00 04 02 00 01 07.
2020-10-29 13:26:14.576 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: transportEncoding=00 04 02 00 01 07.
2020-10-29 13:26:14.578 [TRACE] [ternal.bridge.slip.utils.SlipRFC1055] - encode() for data size 6 called.
2020-10-29 13:26:14.579 [TRACE] [ternal.bridge.slip.utils.SlipRFC1055] - encode() provides transfer encoding: C0 00 04 02 00 01 07 C0.
2020-10-29 13:26:14.580 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: sending 8 bytes.
2020-10-29 13:26:14.581 [INFO ] [internal.bridge.slip.SlipVeluxBridge] - Sending command GW_GET_NODE_INFORMATION_REQ.
2020-10-29 13:26:14.582 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: called.
2020-10-29 13:26:14.583 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: sending packet with 8 bytes: C0 00 04 02 00 01 07 C0
2020-10-29 13:26:14.584 [TRACE] [nternal.bridge.slip.io.SSLconnection] - send() called, writing 8 bytes.
2020-10-29 13:26:14.585 [TRACE] [nternal.bridge.slip.io.SSLconnection] - flushReadBuffer() called.
2020-10-29 13:26:14.586 [TRACE] [nternal.bridge.slip.io.SSLconnection] - send() finished after having send 8 bytes: C0 00 04 02 00 01 07 C0 
2020-10-29 13:26:14.588 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 500 msecs.
2020-10-29 13:26:15.089 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: receiving bytes.
2020-10-29 13:26:15.090 [TRACE] [nternal.bridge.slip.io.SSLconnection] - receive() called.
2020-10-29 13:26:15.592 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: Exception occurred during I/O: read timeout.
2020-10-29 13:26:15.594 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 500 msecs.
2020-10-29 13:26:16.097 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: connecting to port 51200
2020-10-29 13:26:16.099 [DEBUG] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(192.168.45.9,51200) called.
2020-10-29 13:26:16.101 [INFO ] [nternal.bridge.slip.io.SSLconnection] - Starting velux bridge connection.
2020-10-29 13:26:16.104 [TRACE] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(): creating socket...
2020-10-29 13:26:16.110 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: Exception occurred during I/O: raised a non-recoverable error during connection setup: Connection refused (Connection refused).
2020-10-29 13:26:16.113 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 1000 msecs.
2020-10-29 13:26:17.115 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: connecting to port 51200
2020-10-29 13:26:17.117 [DEBUG] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(192.168.45.9,51200) called.
2020-10-29 13:26:17.119 [INFO ] [nternal.bridge.slip.io.SSLconnection] - Starting velux bridge connection.
2020-10-29 13:26:17.122 [TRACE] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(): creating socket...
2020-10-29 13:26:17.233 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: Exception occurred during I/O: raised a non-recoverable error during connection setup: Connection refused (Connection refused).
2020-10-29 13:26:17.235 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 2000 msecs.
2020-10-29 13:26:19.238 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: connecting to port 51200
2020-10-29 13:26:19.240 [DEBUG] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(192.168.45.9,51200) called.
2020-10-29 13:26:19.242 [INFO ] [nternal.bridge.slip.io.SSLconnection] - Starting velux bridge connection.
2020-10-29 13:26:19.245 [TRACE] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(): creating socket...
2020-10-29 13:26:19.993 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: Exception occurred during I/O: raised a non-recoverable error during connection setup: Connection refused (Connection refused).
2020-10-29 13:26:19.995 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 4000 msecs.
2020-10-29 13:26:23.998 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: connecting to port 51200
2020-10-29 13:26:23.999 [DEBUG] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(192.168.45.9,51200) called.
2020-10-29 13:26:24.000 [INFO ] [nternal.bridge.slip.io.SSLconnection] - Starting velux bridge connection.
2020-10-29 13:26:24.001 [TRACE] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(): creating socket...
2020-10-29 13:26:24.660 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: Exception occurred during I/O: raised a non-recoverable error during connection setup: Connection refused (Connection refused).
2020-10-29 13:26:24.662 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 8000 msecs.
2020-10-29 13:26:32.664 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: connecting to port 51200
2020-10-29 13:26:32.667 [DEBUG] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(192.168.45.9,51200) called.
2020-10-29 13:26:32.669 [INFO ] [nternal.bridge.slip.io.SSLconnection] - Starting velux bridge connection.
2020-10-29 13:26:32.671 [TRACE] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(): creating socket...
2020-10-29 13:26:33.310 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: Exception occurred during I/O: raised a non-recoverable error during connection setup: Connection refused (Connection refused).
2020-10-29 13:26:33.312 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 16000 msecs.
2020-10-29 13:26:49.315 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: socket I/O failed 5 times.
2020-10-29 13:26:49.552 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: shutting down connection.
2020-10-29 13:26:49.553 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: finishes with failure by throwing exception.
2020-10-29 13:26:49.555 [WARN ] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: connection.io returns raised a non-recoverable error during connection setup: Connection refused (Connection refused)
2020-10-29 13:26:49.558 [DEBUG] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate(GW_GET_NODE_INFORMATION_REQ) on 192.168.45.9: returns failure.
2020-10-29 13:26:49.560 [TRACE] [rnal.handler.ChannelActuatorPosition] - handleRefresh() returns null.
2020-10-29 13:26:49.562 [INFO ] [.internal.handler.VeluxBridgeHandler] - handleCommandScheduled(velux:rollershutter:49e8997e:DG_O_Shutter:position,REFRESH): updating of item velux:rollershutter:49e8997e:DG_O_Shutter:position (type velux:rollershutter/position) failed.
2020-10-29 13:26:49.565 [TRACE] [internal.handler.utils.ThingProperty] - setValue() connectionAttempt set to Thu Oct 29 13:26:14 CET 2020.
2020-10-29 13:26:49.567 [TRACE] [internal.handler.utils.ThingProperty] - setValue() connectionSuccess set to Thu Oct 29 13:26:14 CET 2020.
2020-10-29 13:26:49.569 [TRACE] [.internal.handler.VeluxBridgeHandler] - handleCommandScheduled(Thread[OH-velux-3,5,main]) done.
2020-10-29 13:26:49.571 [TRACE] [.internal.handler.VeluxBridgeHandler] - handleCommand.scheduled(Thread[OH-velux-3,5,main]) done.
2020-10-29 13:26:49.571 [TRACE] [.internal.handler.VeluxBridgeHandler] - handleCommandScheduled(Thread[OH-velux-5,5,main]): command REFRESH on channel velux:rollershutter:49e8997e:DG_M_Shutter:position.
2020-10-29 13:26:49.573 [DEBUG] [.internal.handler.VeluxBridgeHandler] - handleCommandScheduled(velux:rollershutter:49e8997e:DG_M_Shutter:position,REFRESH) called.
2020-10-29 13:26:49.577 [TRACE] [binding.velux.internal.VeluxItemType] - getByThingAndChannel(velux:rollershutter,position) returns enum velux:rollershutter/position.
2020-10-29 13:26:49.579 [TRACE] [nternal.things.VeluxExistingProducts] - isDirty() returns false.
2020-10-29 13:26:49.581 [TRACE] [.internal.handler.VeluxBridgeHandler] - handleCommandScheduled(): work on refresh.
2020-10-29 13:26:49.582 [TRACE] [.internal.handler.VeluxBridgeHandler] - handleCommandScheduled(): refreshing item velux:rollershutter:49e8997e:DG_M_Shutter:position (type velux:rollershutter/position).
2020-10-29 13:26:49.584 [DEBUG] [rnal.handler.ChannelActuatorPosition] - handleRefresh(velux:rollershutter:49e8997e:DG_M_Shutter:position,position,org.openhab.binding.velux.internal.handler.VeluxBridgeHandler@16b5145) called.
2020-10-29 13:26:49.586 [TRACE] [nternal.things.VeluxExistingProducts] - getNoMembers() returns 5.
2020-10-29 13:26:49.588 [TRACE] [rnal.handler.ChannelActuatorPosition] - handleRefresh(): there are some existing products.
2020-10-29 13:26:49.589 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeAPI() called.
2020-10-29 13:26:49.591 [TRACE] [ux.internal.bridge.slip.SCgetProduct] - setProductId(0) called.
2020-10-29 13:26:49.596 [TRACE] [ng.velux.internal.bridge.VeluxBridge] - bridgeCommunicate(Retrieve Product) called.
2020-10-29 13:26:49.598 [TRACE] [ng.velux.internal.bridge.VeluxBridge] - isAuthenticated() returns true.
2020-10-29 13:26:49.600 [TRACE] [ng.velux.internal.bridge.VeluxBridge] - bridgeCommunicate(Retrieve Product,authenticated) called.
2020-10-29 13:26:49.601 [TRACE] [ng.velux.internal.bridge.VeluxBridge] - isAuthenticated() returns true.
2020-10-29 13:26:49.603 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate(BCP: Retrieve Product,authenticated) called.
2020-10-29 13:26:49.605 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate(Retrieve Product, authenticated) on 192.168.45.9 called.
2020-10-29 13:26:49.607 [DEBUG] [ux.internal.bridge.slip.SCgetProduct] - getRequestCommand() returns GW_GET_NODE_INFORMATION_REQ (0x200).
2020-10-29 13:26:49.609 [TRACE] [ux.internal.bridge.slip.SCgetProduct] - getRequestDataAsArrayOfBytes() returns data for retrieving node with id 0.
2020-10-29 13:26:49.611 [DEBUG] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate(GW_GET_NODE_INFORMATION_REQ, authenticated) on 192.168.45.9 initiated by Thread[OH-velux-5,5,main].
2020-10-29 13:26:49.613 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: working on request GW_GET_NODE_INFORMATION_REQ with 1 bytes of data.
2020-10-29 13:26:49.615 [TRACE] [ernal.bridge.slip.utils.SlipEncoding] - SlipEncoding(constructor) for command 0x200 with data size 1 called.
2020-10-29 13:26:49.617 [TRACE] [ernal.bridge.slip.utils.SlipEncoding] - SlipEncoding(constructor) successfully initialized, storing bytes: 00 04 02 00 00 06.
2020-10-29 13:26:49.619 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: transportEncoding=00 04 02 00 00 06.
2020-10-29 13:26:49.621 [TRACE] [ternal.bridge.slip.utils.SlipRFC1055] - encode() for data size 6 called.
2020-10-29 13:26:49.623 [TRACE] [ternal.bridge.slip.utils.SlipRFC1055] - encode() provides transfer encoding: C0 00 04 02 00 00 06 C0.
2020-10-29 13:26:49.624 [TRACE] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: sending 8 bytes.
2020-10-29 13:26:49.626 [INFO ] [internal.bridge.slip.SlipVeluxBridge] - Sending command GW_GET_NODE_INFORMATION_REQ.
2020-10-29 13:26:49.628 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: called.
2020-10-29 13:26:49.630 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: connecting to port 51200
2020-10-29 13:26:49.632 [DEBUG] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(192.168.45.9,51200) called.
2020-10-29 13:26:49.633 [INFO ] [nternal.bridge.slip.io.SSLconnection] - Starting velux bridge connection.
2020-10-29 13:26:49.636 [TRACE] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(): creating socket...
2020-10-29 13:26:49.749 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: Exception occurred during I/O: raised a non-recoverable error during connection setup: Connection refused (Connection refused).
2020-10-29 13:26:49.751 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 500 msecs.
2020-10-29 13:26:50.253 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: connecting to port 51200
2020-10-29 13:26:50.255 [DEBUG] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(192.168.45.9,51200) called.
2020-10-29 13:26:50.257 [INFO ] [nternal.bridge.slip.io.SSLconnection] - Starting velux bridge connection.
2020-10-29 13:26:50.259 [TRACE] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(): creating socket...
2020-10-29 13:26:50.635 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: Exception occurred during I/O: raised a non-recoverable error during connection setup: Connection refused (Connection refused).
2020-10-29 13:26:50.637 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 1000 msecs.
2020-10-29 13:26:51.639 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: connecting to port 51200
2020-10-29 13:26:51.641 [DEBUG] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(192.168.45.9,51200) called.
2020-10-29 13:26:51.643 [INFO ] [nternal.bridge.slip.io.SSLconnection] - Starting velux bridge connection.
2020-10-29 13:26:51.645 [TRACE] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(): creating socket...
2020-10-29 13:26:51.895 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: Exception occurred during I/O: raised a non-recoverable error during connection setup: Connection refused (Connection refused).
2020-10-29 13:26:51.897 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 2000 msecs.
2020-10-29 13:26:53.899 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: connecting to port 51200
2020-10-29 13:26:53.901 [DEBUG] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(192.168.45.9,51200) called.
2020-10-29 13:26:53.903 [INFO ] [nternal.bridge.slip.io.SSLconnection] - Starting velux bridge connection.
2020-10-29 13:26:53.906 [TRACE] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(): creating socket...
2020-10-29 13:26:54.392 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: Exception occurred during I/O: raised a non-recoverable error during connection setup: Connection refused (Connection refused).
2020-10-29 13:26:54.394 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 4000 msecs.
2020-10-29 13:26:58.396 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: connecting to port 51200
2020-10-29 13:26:58.398 [DEBUG] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(192.168.45.9,51200) called.
2020-10-29 13:26:58.400 [INFO ] [nternal.bridge.slip.io.SSLconnection] - Starting velux bridge connection.
2020-10-29 13:26:58.403 [TRACE] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(): creating socket...
2020-10-29 13:26:58.987 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: Exception occurred during I/O: raised a non-recoverable error during connection setup: Connection refused (Connection refused).
2020-10-29 13:26:58.989 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 8000 msecs.
2020-10-29 13:27:06.991 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: connecting to port 51200
2020-10-29 13:27:06.993 [DEBUG] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(192.168.45.9,51200) called.
2020-10-29 13:27:07.000 [INFO ] [nternal.bridge.slip.io.SSLconnection] - Starting velux bridge connection.
2020-10-29 13:27:07.003 [TRACE] [nternal.bridge.slip.io.SSLconnection] - SSLconnection(): creating socket...
2020-10-29 13:27:07.027 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: Exception occurred during I/O: raised a non-recoverable error during connection setup: Connection refused (Connection refused).
2020-10-29 13:27:07.029 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: wait time 16000 msecs.
2020-10-29 13:27:23.031 [INFO ] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: socket I/O failed 5 times.
2020-10-29 13:27:23.033 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: shutting down connection.
2020-10-29 13:27:23.035 [TRACE] [x.internal.bridge.slip.io.Connection] - io() on 192.168.45.9: finishes with failure by throwing exception.
2020-10-29 13:27:23.037 [WARN ] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate() on 192.168.45.9: connection.io returns raised a non-recoverable error during connection setup: Connection refused (Connection refused)
2020-10-29 13:27:23.039 [DEBUG] [internal.bridge.slip.SlipVeluxBridge] - bridgeDirectCommunicate(GW_GET_NODE_INFORMATION_REQ) on 192.168.45.9: returns failure.
2020-10-29 13:27:23.040 [TRACE] [rnal.handler.ChannelActuatorPosition] - handleRefresh() returns null.
2020-10-29 13:27:23.042 [INFO ] [.internal.handler.VeluxBridgeHandler] - handleCommandScheduled(velux:rollershutter:49e8997e:DG_M_Shutter:position,REFRESH): updating of item velux:rollershutter:49e8997e:DG_M_Shutter:position (type velux:rollershutter/position) failed.
2020-10-29 13:27:23.044 [TRACE] [internal.handler.utils.ThingProperty] - setValue() connectionAttempt set to Thu Oct 29 13:26:49 CET 2020.
2020-10-29 13:27:23.046 [TRACE] [internal.handler.utils.ThingProperty] - setValue() connectionSuccess set to Thu Oct 29 13:26:14 CET 2020.

The bridge is still working but no longer responding on the API.

@gs4711
Copy link
Contributor

gs4711 commented Oct 29, 2020

I forgot: After restarting the binding, everything is fine, again. Therefore, there is still bug within the communication of the binding.

@andrewfg
Copy link
Contributor Author

andrewfg commented Oct 29, 2020

Hi @gs4711 from your logs, it looks like you are testing an earlier build.

Depending on which version of OH you are testing on, the latest jar files are here. Note that the v2.5.x version is called org.openhab.binding.velux-2.5.10-SNAPSHOT but this is a later SNAPSHOT than the one officially released from openhab.org.

@gs4711
Copy link
Contributor

gs4711 commented Oct 30, 2020

the latest jar files are here

Thanks. With the new jar-file it seems to work very smoothly. Digging into the advertisments, I can recognize the KLF, but it is not recognized by the binding:

Bildschirmfoto 2020-10-30 um 19 31 01

Any idea?

@andrewfg
Copy link
Contributor Author

andrewfg commented Oct 30, 2020

Digging into the advertisments, I can recognize the KLF, but it is not recognized by the binding:
Any idea?

Guenther, I am not 100% sure what is going wrong, but as far as I can tell, the KLF is not fully compliant with the mDNS specifications: it seems that sometimes its mDNS response contains both a host name VELUX_KLF_LAN_XXX and an IP address 192.168.x.xxx, but sometimes it seems to only return the hostname. In the case that it does not return an IP address, I am also asking regular DNS if it has a lookup for the IP address of the respective host name. However if both mDNS and regular DNS are unable to resolve the host name to an IP address then the discovery fails. => Can you please check if your mDNS browser resolves an IP address (your screen shot did not show anthing)?

PS I think it may also depend on your network router etc. -- some routers and some clients on the network, implement an mDNS cache and mDNS forwarding. If that is the case then the forwarder may resolve the IP address from its cache, even if the KLF is currently not showing its IP address. Or something like that.

Another idea, is that instead of putting the IP address in the discovered hub's Configuration Parameter, we could perhaps put the host name in there. That would postpone the IP address resolution problem from the time of discovery to the time of connection. But I am not sure if that helps very much. => What do you think?

@gs4711
Copy link
Contributor

gs4711 commented Oct 31, 2020 via email

@andrewfg
Copy link
Contributor Author

andrewfg commented Oct 31, 2020

from my point of view the announcements of the KLF200 looks good:

Guenther, sorry but I did not really understand you: do you mean a) it looks good in OH, or b) it looks good in your mDNS browser but not in OH?

image

@gs4711
Copy link
Contributor

gs4711 commented Oct 31, 2020 via email

@andrewfg
Copy link
Contributor Author

The initial discovery seems to work well … but the ip resolution fails within java code - whereas other tools provide the correct address as well.

Correct.

The OH MDNS discovery service is based on the JMDNS Java library (partly written by @kaikreuzer). So my conclusion is that either a) the KLF is not compliant to the mDNS specification, or b) the the JMDNS library is not compliant to the specification. But -- just to complicate things further -- the two things do sometime work together and resolve the IP address. It is very strange.

PS I was thinking about writing my own mDNS resolver, or alternatively modifying the the JMDNS library. But that would be a lot of work, and it goes way beyond this PR. :)

Copy link
Contributor

@gs4711 gs4711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect extension. Minor issues with discovery (probably either device malfunctions or common library misimplementation).

@andrewfg
Copy link
Contributor Author

andrewfg commented Nov 1, 2020

I am going to try to fix the mDNS discovery issues. So I will set the WIP flag for a couple more days.

@andrewfg
Copy link
Contributor Author

andrewfg commented Nov 1, 2020

@cpmeister
Copy link
Contributor

No you have a real build error.

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile (default-compile) on project org.openhab.binding.velux: Compilation failure
[ERROR] /home/jenkins/jenkins-agent1/workspace/PR-openHAB-Addons@2/bundles/org.openhab.binding.velux/src/main/java/org/openhab/binding/velux/internal/bridge/slip/io/DataInputStreamWithTimeout.java:[113,37] Null type mismatch (type annotations): required 'java.lang.@NonNull String' but this expression has type 'java.lang.@Nullable String'

@andrewfg
Copy link
Contributor Author

andrewfg commented Dec 1, 2020

DataInputStreamWithTimeout.java:[113,37] Null type mismatch (type annotations): required 'java.lang.@nonnull String' but this expression has type 'java.lang.@nullable String'

That's weird. The error is on IOException.getMessage() which I think should always return NonNull. Furthermore it compiles without error locally on my machine both in Eclipse and MVN. Does your system rely on a different dependency version of IOException ??

image

@cpmeister
Copy link
Contributor

There have been recent changes to the null checker that has implicitly added the annotations to the standard library. In general you would be correct, that getMessage would be non null. But looking strictly at the standard library that field is allow to be null, hence the build error. All you need to do is change the code to take that into account.

Your local build probably succeeds because you haven't merged the main branch recently. I don't suggest you do since it would be more of a headache than it is worth atm, but just make the null checks to please jenkins for now.

Signed-off-by: Andrew Fiddian-Green <[email protected]>
@andrewfg andrewfg added rebuild Triggers Jenkins PR build and removed rebuild Triggers Jenkins PR build labels Dec 1, 2020
@andrewfg
Copy link
Contributor Author

andrewfg commented Dec 1, 2020

Ok the build worked this time.

@cpmeister cpmeister merged commit 072113f into openhab:main Dec 1, 2020
@cpmeister cpmeister added this to the 3.0.0.M5 milestone Dec 1, 2020
@andrewfg
Copy link
Contributor Author

andrewfg commented Dec 1, 2020 via email

@EjvindHald
Copy link

EjvindHald commented Dec 20, 2020

Hi Andrew
You wrote
If the binding went offline via a catastrophic failure, the Velux hub might be left with a large amount of un-sent (or un-acknowledged) data in its write buffer, and its side of the socket might get into a "zombie" state. This would prevent the binding from re-connecting again when the binding was restarted. Note: This seems to be a bug in the hub, which I have reported to Velux, but so far without response from them..

I have just experienced that after the binding has been running without problems for one month. As I could not recennect, the option for reboot did not work.
If you talk to Velux supprt, then let them know, that there are more people having this problem. Thx.

@andrewfg
Copy link
Contributor Author

@EjvindHald thank you for the feedback. By contrast my binding has been running perfectly. :)

@andrewfg
Copy link
Contributor Author

andrewfg commented Dec 20, 2020

PS are you using the v3 or v2.5 build? The v3.0.0 of the binding had some further TCP socket improvements beyond what I put in the last v2.5 build.

@EjvindHald
Copy link

Yes, your stuff is working very good. I am using 2.5.

Do you know if this is accumulating problem with the hub, which could be solved by a weekly automatic reboot?

@andrewfg
Copy link
Contributor Author

Do you know if this is accumulating problem with the hub

I have seen no evidence of that.

How does your KLF200 get its IP address? Is it static, or via DHCP? If the latter, I wonder if there might be an issue with DHCP renewals? Perhaps setting the IP address as ‘reserved’ in your DHCP server would help? Just an idea...

@EjvindHald
Copy link

It has a static IP address.

@andrewfg
Copy link
Contributor Author

Ok. I have no more ideas. Let us see how OH3 goes..

@EjvindHald
Copy link

So the KLF200 died again after only after 2 days, so it is not an accumulating thing. Instead, it happens after a reboot of OH 2.5.10, which I am using. Same pattern as last time. When OH comes up again after reboot it cannot access the KLF200. Power recycle of KLF200 is the only solution, when this happens.

I hope to be on OH 3 in a few days. I am currently testing the compability.

@andrewfg
Copy link
Contributor Author

andrewfg commented Dec 22, 2020

@EjvindHald a plain reboot is a "hard" reboot, that does not let OH shutdown cleanly, so it is very likely to leave the sockets in a zombie state. You need to isssue a stop command to the openhab service first, and let that complete before you issue the reboot command.

@EjvindHald
Copy link

I am simply restarting the Docker container which is running OH. I would expect that to be the same as restarting the OH service as you write and not a reboot of an OS.

@andrewfg
Copy link
Contributor Author

andrewfg commented Dec 23, 2020

I suggest the following

OH2

sudo systemctl stop openhab2.service
sudo reboot

OH3

sudo systemctl stop openhab.service
sudo reboot

would expect that to be the same as restarting the OH service

PS I am interested to know what makes you expect that?

@EjvindHald
Copy link

EjvindHald commented Dec 25, 2020

Docker is a way of virtualize a service. So normally a service like openhab equals one container. Therefore, a restart of a container would as far as I can see be the same as your "systemctl stop openhab.service" which is used for stopping an ordinary service.

When I stop the container, I get this in the openhab log:

2020-12-25 16:44:38.925 [ERROR] [me.core.internal.events.EventHandler] - Dispatching/filtering event for subscriber 'org.eclipse.smarthome.core.events.EventSubscriber' failed: null

java.lang.NullPointerException: null

at org.openhab.core.events.internal.EventBridge.receive(EventBridge.java:139) ~[?:?]

at org.eclipse.smarthome.core.internal.events.EventHandler.lambda$0(EventHandler.java:155) ~[?:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_275]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_275]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]

2020-12-25 16:44:38.926 [INFO ] [io.openhabcloud.internal.CloudClient] - Shutting down openHAB Cloud service connection

2020-12-25 16:44:38.935 [ERROR] [me.core.internal.events.EventHandler] - Dispatching/filtering event for subscriber 'org.eclipse.smarthome.core.events.EventSubscriber' failed: null

java.lang.NullPointerException: null

at org.openhab.core.events.internal.EventBridge.receive(EventBridge.java:139) ~[?:?]

at org.eclipse.smarthome.core.internal.events.EventHandler.lambda$0(EventHandler.java:155) ~[?:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_275]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_275]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]

2020-12-25 16:44:38.938 [ERROR] [me.core.internal.events.EventHandler] - Dispatching/filtering event for subscriber 'org.eclipse.smarthome.core.events.EventSubscriber' failed: null

java.lang.NullPointerException: null

at org.openhab.core.events.internal.EventBridge.receive(EventBridge.java:147) ~[?:?]

at org.eclipse.smarthome.core.internal.events.EventHandler.lambda$0(EventHandler.java:155) ~[?:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_275]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_275]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_275]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_275]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_275]

2020-12-25 16:44:38.954 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = 1f7d7507-b5e1-4552-9aec-174da4e81ec0, base URL = http://localhost:8079)

2020-12-25 16:44:39.027 [INFO ] [basic.internal.servlet.WebAppServlet] - Stopped Basic UI

I do not know why I get the Java errors for EventHandler. But I do not see any message saying Velux is shutdown or similar. This is openhab 2.5.11 - I still do not know the behaviour in 3.0

EDIT: Here is the messages from stop of OH3 container:
2020-12-25 17:11:25.349 [WARN ] [g.mqtt.handler.AbstractBrokerHandler] - Tried to unsubscribe org.openhab.binding.mqtt.homeassistant.internal.discovery.HomeAssistantDiscovery@69b555af from discovery topic +/+/$homie on broker mqtt:broker:ejvind but topic not registered at all. Check discovery logic!
2020-12-25 17:11:25.350 [WARN ] [g.mqtt.handler.AbstractBrokerHandler] - Tried to unsubscribe org.openhab.binding.mqtt.homeassistant.internal.discovery.HomeAssistantDiscovery@69b555af from discovery topic homeassistant/# on broker mqtt:broker:ejvind but topic not registered at all. Check discovery logic!
2020-12-25 17:11:25.351 [WARN ] [g.mqtt.handler.AbstractBrokerHandler] - Tried to unsubscribe org.openhab.binding.mqtt.homeassistant.internal.discovery.HomeAssistantDiscovery@69b555af from discovery topic +/+/$homie on broker mqtt:broker:ejvind but topic not registered at all. Check discovery logic!
2020-12-25 17:11:27.419 [INFO ] [ab.ui.habpanel.internal.HABPanelTile] - Stopped HABPanel
2020-12-25 17:11:27.500 [INFO ] [basic.internal.servlet.WebAppServlet] - Stopped Basic UI
2020-12-25 17:11:30.471 [INFO ] [org.openhab.ui.internal.UIService ] - Stopped UI

So still no Velux friendly sign off message....

EDIT2: There is a Velux graceful shutdown - it is working!

@andrewfg
Copy link
Contributor Author

@EjvindHald I don’t know anything about Docker, so I can’t really help you. But my gut feeling is that a thing like that has no way it could know how to shut down OH in a proper manner. And I have already given you my own suggestion above. But if you want more expert advice on the proper way with Docker, then I think you need to open a thread elsewhere where the Docker experts are likely to see it.

@EjvindHald
Copy link

@andrewfg : With your high expertise, you are probably right. I will try to figure out, if the openhab guys maybe can create the Docker openhab image differently, so it caters for this. The link is here and here.
If you can provide example openhab.log entries via an ordinary shutdown, it would be great. That is a "systemctl stop openhab.service" when openhab is installed in the traditional way.
Thanks for all your support!

@andrewfg
Copy link
Contributor Author

provide example openhab.log entries via an ordinary shutdown
@EjvindHald ..

2020-12-26 13:42:10.938 [INFO ] [.internal.handler.VeluxBridgeHandler] - Shutting down Velux Bridge 'velux:klf200:xxxx'.
2020-12-26 13:42:10.941 [INFO ] [nternal.bridge.slip.io.SSLconnection] - Shutting down Velux bridge connection.

@EjvindHald
Copy link

@andrewfg : Thanks for all your contribution and support.
It IS actually working correctly with a graceful shutdown of Velux binding in openhab 3.0 running in Docker. I had not installed the Velux binding in the example above, because I am in the process of moving to 3.0, so it was missing.
But the Velux binding is installed now, and working like a charm in Docker in OH3.

boehan pushed a commit to boehan/openhab-addons that referenced this pull request Apr 12, 2021
…es (openhab#8777)

* [velux] set explicit timeouts & keepalives on socket
* [velux] implement mdns service
* [velux] fix representation property names
* [velux] fix representation properties
* [velux] finalize mdns
* [velux] spotless
* [velux] use both mDNS and regular DNS to resolve ip addresses
* [velux] complete class rewrite using asynchronous polling thread
* [velux] refactor bridgeDirectCommunicate to simplify looping
* [velux] asynchronous polling means Thread.sleep no longer needed
* [velux] faster synch of actuator changes
* [velux] use single thread executor instead of thread pool
* [velux] faster synch of actuator changes
* [velux] shut down task executor

Signed-off-by: Andrew Fiddian-Green <[email protected]>
marcfischerboschio pushed a commit to bosch-io/openhab-addons that referenced this pull request May 5, 2022
…es (openhab#8777)

* [velux] set explicit timeouts & keepalives on socket
* [velux] implement mdns service
* [velux] fix representation property names
* [velux] fix representation properties
* [velux] finalize mdns
* [velux] spotless
* [velux] use both mDNS and regular DNS to resolve ip addresses
* [velux] complete class rewrite using asynchronous polling thread
* [velux] refactor bridgeDirectCommunicate to simplify looping
* [velux] asynchronous polling means Thread.sleep no longer needed
* [velux] faster synch of actuator changes
* [velux] use single thread executor instead of thread pool
* [velux] faster synch of actuator changes
* [velux] shut down task executor

Signed-off-by: Andrew Fiddian-Green <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on enhancement An enhancement or new feature for an existing add-on
Projects
None yet
6 participants