-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
INSTEON: duplicated messages are processed #169
Comments
The longish discussion around this bug is here: http://misterhouse.10964.n7.nabble.com/INSTEON-mesh-network-tt17890.html After I posted my last comment there I started wondering, why don't I see two duplicate messages on my network? Everything is sent with max hops 1 to start with so I should see one duplicate message when the message is repeated. I just did a test and got this: 04/29/2013 08:36:31 AM [Insteon_PLM] DEBUG3: Received PLM raw data: 0250211195000001cb1300 04/29/2013 08:36:31 AM [Insteon_PLM] DEBUG4: Milliseconds 690.85 PLM Command: (0250) insteon_received From Address: 21:11:95 To Address: 00:00:01 Message Flags: cb Message Type: (110) All-Link Broadcast Message Message Length: (0) Standard Length Hops Left: 2 Max Hops: 3 Insteon Message: 1300 Cmd 1: (13) ALL-Link Alias 1 Low Cmd 2: 00 04/29/2013 08:36:31 AM [Insteon::BaseInterface] DEBUG2: Message received with 2 hops left, delaying next transmit to avoid collisions with remaining hops. Very odd that the first message I see has max hops 3 and that's the only time I see that message. It really seems like my PLM is filtering out retransmissions, so maybe Eloy's PLM is defective after all, or maybe some PLM models don't filter dupes? Mine is a 2413U v1.A dated late 2012 with firmware 9b. Eloy, what is your PLM model/date/firmware? You should see if there is newer firmware that can be installed on it. |
Hi Chris, On 04/29/2013 11:44 AM, cdragon wrote:
My PLM is a 2413U bought in July 2010. Firmware version 92. I would Note that I do not see (apparently) duplicate messages for my Cheers, Eloy Paris.- |
The Switchlinc above that isn't showing duplicate messages at the PLM is a late 2012 dual-band switchlinc. Unfortunately I don't have any dual-band kpls to see if they exhibit the behavior you're seeing. SmartLinc 2.0 does have field-upgradable firmware, but I'm not sure about the USB PLMs. You should search the internets to see! =) Note that I didn't see duplicate messages when using the SmartLinc either (also firmware 9b). |
My two cents and a few corrections.
** I suspect that this is the real cause of Eloy's problems. I will look at my network tonight and I imagine I will see the same thing. |
But even with no ACK, the broadcast should be sent a total of 4 times with max hops 3. Page 23 example 3 shows that happening. And even if that particular case can somehow be explained, I've never seen a duplicate message received in all the logs I've looked at, so something is filtering them. |
Going through my log this morning I found these:
I could probably find more. Although I appear to have fewer than Eloy does. That said, I do have a dual band device right next to the PLM, and I am unable to find duplicate messages from that device in my log. I can hammer it tonight and see if I can get any. My PLM is also rather new firmware = 9B. Moreover, like I said, I know I had a tremendous problem with duplicate messages when I worked on the hop_count problem. They would often cause errors in the ALDB link database stored by MH. These would result in bytes being repeated twice, for example an address 20.f5.10 would appear as 20.f5.f5. Many people reported seeing these as well, so I am relatively confident duplicate messages were a wide spread problem. The PLM may have some filtering capability, but if it does, it isn't perfect. |
I think you're right that the PLM tries to filter duplicates but doesn't always succeed. Even in your examples that show duplicates, there's a number of duplicates that got filtered, like where you see Hops left: 2 then Hops left: 0. The Hops left: 1 message isn't reported to MH but the PLM must have received it since all devices that have heard the message in the past will repeat the message again in each time slot with hops left - 1 until they reach hops left 0. Your second example is extremely strange since both copies of the message have hops of 3/1. Since the messages are potentially one second apart according to the time stamp, the second one would likely be a completely new broadcast of the same message starting with hops 3/3, 3/2, and the PLM only reports the 3/1 version. But since it's a broadcast message, the original device shouldn't have reason to transmit it again unless maybe the button on the device was double-pressed quickly. |
It looks like the Insteon White Paper was updated recently, it may only be a cosmetic thing, i certainly see a few new diagrams. Maybe a more astute observer can determine what is new: As to multiple messages arriving with the same hop counts. We have seen this before in working on the ALDB scan. Our assumption is that the bridge between the wired and rf signal is the cause of this. It is believed that at least one of the directions wired->rf or rf->wired does not cause a decrease in the hop count. Additionally, Insteon relies on the powerline carrier frequency for message timing and synchronization. Since most homes have two-phase electricity, rf repeating and phase bridging creates the very real possibility of two different message paths forming. UPDATE: The likelihood for duplicate messages paths increases as the hop count increases. My only concern with adding a filter function, is that the message timing in MH has never matched what Insteon claims it should be. By this I mean that the Insteon whitepaper quotes timings for how long it may take to receive an acknowledgement based on the number of hop counts. Unfortunately, this doesn't match real world experience. Possibly a delay in the PLM, the serial connection, MH, or in the end device itself causes some level of variability in how long it takes for messages to be acknowledged. This can be upwards of 3 or more seconds. |
On 04/30/2013 02:55 PM, krkeegan wrote: [...]
Perhaps this is the reason the PLM cannot successfully filter all Anyway, if trying to follow the theoretical INSTEON timing as a means to
Of course, it's possible for a device to generate two identical messages It'd be a lot easier if the protocol designers had included an ID, Now for the differentiators... the number of hops left, of course, and If we treat as a duplicate a message that meets criteria (1) through (6) Cheers, Eloy Paris.- P.S. Sorry, all this is probably obvious to the INSTEON experts here; I |
What if you put in a configuration param that defaults to 1000ms and have MH ignore messages that are identical in all values except the hops remaining and the D14 CRC byte that appear within that timeout value? For the rare installations where a dupe message takes 3 seconds to appear, they could set the value to 3000ms, but I think 1000ms would take care of the majority of installations and wouldn't generally interfere with people who actually want to say turn a light on twice to make it do something different (they would just have to wait at least a second between the two times they turned it on). Are there any Insteon devices that actually send two valid identical messages in short succession where it wouldn't hurt to drop one of them? Maybe dim/brighten? Of course you could exclude those special cases. |
I would be willing to take a stab at it. As I mentioned duplicate messages are still sometimes the cause of some aldb corruption, so eliminating them has some real benefits. My idea would be to make this really simple.
As I think about it, my 3 second scenario is way off, it deals with a full round trip. We shouldn't see that much variation here as the duplication is happening in transit (at least we think it is). This should be less than a second. I see chris beat me to it with pretty much the same design ideas. |
On second thought, D14 CRC is based on Cmd1, Cmd2, and D1-13, so it shouldn't change as flags change and thus it should be identical in two duplicate messages. I think you should also only ignore bits 2-3 (hops remaining) in the flags because a duplicate message would have the same max hops value, NAK/ACK, etc that the other bits describe. |
We should add a small fudge factor for the PLM. Theoretically half a second should be more than enough to catch duplicate messages. Real world testing may vary. Chris: Good call on the CRC. As to the other flags. I was going to be lazy and just ignore the other bits (ACK, Type...) since it just makes life easier (I can just drop a byte rather than get into bitwise calculations). In this short of time, what are the odds that the rest of the message would be otherwise identical? ** As a side note, my two phase theory may no be correct, since messages are sent at zero crossings. But at this point I am really exceeding my knowledge. |
Kevin, I'm not sure either if the zero crossings on each of the two phases of the powerline are always identicaly timed. I think they're meant to be but in the real world maybe not in all installations.. |
OK, its done, and it seems to work:
I originally tried just hard coding 500 milliseconds as the "delay." However, that caught legitimate duplicate messages. For example my PLM id is 20.f5.f5, a number of legitimate instances of the second f5 were caught by this. So it had to become a little more sophisticated, and now the delay is based on the number of hops left received, plus some extra wiggle room. So far I have no more false positives. There is one small glitch with this design. If a messages becomes stacked up in the receiving queue, the received time is calculated from when this sub is run, not when the message is actually received by the PLM. This could permit some duplicates to sneak through. I will push the commit in a second, Eloy in particular, please test it and let me know if it catches your KPL problem. |
This creates a log of recently received messages with a timestamp. Subsequent messages are check to see if they match. If a match is found and if they message was received within the time permitted by the number of remaining hops plus a little wiggle room, the message is dropped. Fixes issue hollie#169
OK, the branch is: https://github.com/krkeegan/misterhouse/tree/fix_issue_169 It was forked off of the merged_i2 branch. Eloy if you need me to make you a merged branch for you to test just let me know what needs to be in the merge. |
Hi Kevin, On 04/30/2013 08:59 PM, krkeegan wrote:
[...]
Wow, that was fast; thank you! I am running with fix_issue_169, and so far I have not seen a device I'll keep running this and report back if I notice any side effects. I'm Cheers, Eloy Paris.- |
Hi Kevin, On 04/30/2013 09:06 PM, krkeegan wrote:
I'm good; thanks! I just applied fix_issue_169 (plus fix_issue_17, Cheers, Eloy Paris.- |
This is still working fine for me. It has caught a small handful of legitimately duplicate messages in the past 6 days. It has had zero false positives. I have not notices any duplicate messages that have crept through, but they are harder to find. Eloy, since you were having the issue, how is it working for you? |
I started getting duplicate message problems either after I turned on the wireless thermostat or maybe as a result of switching to merge_i2_aldb_support. So I installed this fix but it didn't eliminate the duplicate messages. I noticed the 0xF3 bitmask wasn't being applied correctly so it was effectively setting the whole flags byte to 0, but I don't think that had to do with not detecting duplicates. I found an example of a standard message with 2 hops remaining followed by the same message with 0 hops remaining 250ms later which is longer than the 200ms timeout allowed. That might have to do with additional delays caused by running on Raspberry Pi - a slow mini computer. After adding 500ms to the timeout and fixing the 0xF3 flags bitmask, I also found a case where I got a message with 2 max hops followed by the same message with 3 max hops, so I changed the flags bitmask to 0xF0 to ignore max hops as well. Here is the modified code: # $key will be set to $message_data with max hops and hops left set to 0 my $key = $message_data; substr($key,12,2) = unpack("H2",chr(hex(substr($key,12,2)) & 0xF0)); #How long does it take to transmit each hop of this message my $message_time = (length($message_data) > 18) ? 108 : 50; #Get hops left my $hops_left = hex(substr($message_data,13,1)) >> 2; #Add additional delay to catch PLM delays or out of sync messages my $delay = ($message_time * ($hops_left+2)) + 500; #Clean hash of outdated entries for (keys $$self{received_commands}){ if ($$self{received_commands}{$_} < $curr_milli){ delete($$self{received_commands}{$_}); } } #Check if the message exists if (exists($$self{received_commands}{$key})){ $is_duplicate = 1; ::print_log("[Insteon::BaseInterface] WARN! Dropped duplicate incoming message " . $message_data . " at $curr_milli") if $main::Debug{insteon}; } else { #Message was not in hash, so add it main::print_log("Adding dupe message check key '$key' at $curr_milli + $delay = " . ($curr_milli + $delay) ) if $main::Debug{insteon}; $$self{received_commands}{$key} = $curr_milli + $delay; } return $is_duplicate; Here's a log of duplicate message with hops left - 2 after 250 ms: 05/03/2013 06:03:48 PM [Insteon_PLM] DEBUG3: Received PLM raw data: 025022ae3a000001cb1100 05/03/2013 06:03:48 PM Adding dupe message check key '22ae3a000001c31100' at 1367629428749 + 200 = 1367629428949 05/03/2013 06:03:48 PM [Insteon::BaseInterface] DEBUG2: Message received with 2 hops left, delaying next transmit to avoid collisions with remaining hops. 05/03/2013 06:03:48 PM [Insteon::BaseInterface] Received message from: $sl_WorkroomBack; command: on; type: alllink; group: 01 05/03/2013 06:03:48 PM [Insteon::BaseObject] DEBUG3: Adding hop count of 1 to hop_array of $sl_WorkroomBack 05/03/2013 06:03:48 PM [Insteon::BaseObject] $sl_WorkroomBack::set(on, $sl_WorkroomBack) 05/03/2013 06:03:48 PM [Insteon_PLM] x10 sending code: B1 as insteon msg: e600 05/03/2013 06:03:48 PM [Insteon_PLM] x10 sending code: BK as insteon msg: e380 05/03/2013 06:03:48 PM [Insteon_PLM] Queueing message 'e380'. Command stack size: 1, active message 'none' 05/03/2013 06:03:48 PM transmit_in_progress returning xmit_in_progress=0 || not _check_timeout('xmit')=1 05/03/2013 06:03:48 PM Aborting process_queue because transmit_in_progress 05/03/2013 06:03:48 PM *** Workroom back switch turned on when on, so toggle lamps, $workroomLamps=0 (1=WorkroomSewingLamp, 2=WorkroomBackLamp) 05/03/2013 06:03:48 PM [Insteon_PLM] DEBUG3: Received PLM raw data: 025022ae3a000001c31100 # This message received at 1367629429001 and duplicate copy received 250ms earlier at 1367629428749 05/03/2013 06:03:48 PM Adding dupe message check key '22ae3a000001c31100' at 1367629429001 + 100 = 1367629429101 And a log with max hops 2 then 3: 05/06/2013 05:31:37 PM [Insteon_PLM] x10 sending code: B1 as insteon msg: e600 05/06/2013 05:31:37 PM [Insteon_PLM] x10 sending code: BK as insteon msg: e380 05/06/2013 05:31:37 PM process_queue: Attempting to send 'e600' 05/06/2013 05:31:37 PM [Insteon_PLM] DEBUG2: Sending e600 incurred delay of 0.01 seconds; starting hop-count: ? 05/06/2013 05:31:37 PM [Insteon_PLM] DEBUG3: Sending PLM raw data: 0263e600 05/06/2013 05:31:37 PM *** Workroom back switch turned on when on, so toggle lamps, $workroomLamps=0 (1=WorkroomSewingLamp, 2=WorkroomBackLamp) 05/06/2013 05:31:37 PM [Insteon_PLM] DEBUG3: Received PLM raw data: 0263e60006 05/06/2013 05:31:37 PM [Insteon_PLM] DEBUG3: Received PLM acknowledge: e600 05/06/2013 05:31:38 PM process_queue: Attempting to send 'e380' 05/06/2013 05:31:38 PM [Insteon_PLM] DEBUG2: Sending e380 incurred delay of 0.55 seconds; starting hop-count: ? 05/06/2013 05:31:38 PM [Insteon_PLM] DEBUG3: Sending PLM raw data: 0263e380 05/06/2013 05:31:38 PM [Insteon_PLM] DEBUG3: Received PLM raw data: 1515 05/06/2013 05:31:38 PM [Insteon_PLM] DEBUG3: Interface extremely busy. Resending command after delaying for 1 second # Max hops 2 cleanup message: 05/06/2013 05:31:38 PM [Insteon_PLM] DEBUG3: Received PLM raw data: 025022ae3a20f4a94b1101 05/06/2013 05:31:38 PM Adding dupe message check key '22ae3a20f4a9431101' at 1367886698682 + 700 = 1367886699382 05/06/2013 05:31:38 PM [Insteon::BaseInterface] DEBUG2: Message received with 2 hops left, delaying next transmit to avoid collisions with remaining hops. 05/06/2013 05:31:38 PM [Insteon::BaseInterface] Received message from: $sl_WorkroomBack; command: on; type: cleanup; group: 01 05/06/2013 05:31:38 PM [Insteon::BaseObject] DEBUG3: Adding hop count of 1 to hop_array of $sl_WorkroomBack 05/06/2013 05:31:38 PM [Insteon::BaseObject] Ignoring Received Direct AllLink Cleanup Message for $sl_WorkroomBack since AllLink Broadcast Message was Received. 05/06/2013 05:31:38 PM process_queue: Attempting to send 'e380' 05/06/2013 05:31:38 PM [Insteon::BaseMessage] WARN: now resending e380 after 1 attempts. 05/06/2013 05:31:38 PM [Insteon_PLM] DEBUG2: Sending e380 incurred delay of 1.09 seconds; starting hop-count: ? 05/06/2013 05:31:38 PM [Insteon_PLM] DEBUG3: Sending PLM raw data: 0263e380 05/06/2013 05:31:39 PM [Insteon_PLM] DEBUG3: Received PLM raw data: 0263e38006 05/06/2013 05:31:39 PM [Insteon_PLM] DEBUG3: Received PLM acknowledge: e380 05/06/2013 05:31:39 PM process_queue: No pending message. Clear timer, return 0 05/06/2013 05:32:39 PM [Insteon_PLM] DEBUG3: Received PLM raw data: 025022ae3a000001c71100 05/06/2013 05:32:39 PM Adding dupe message check key '22ae3a000001c31100' at 1367886759099 + 650 = 1367886759749 05/06/2013 05:32:39 PM [Insteon::BaseInterface] DEBUG2: Message received with 1 hops left, delaying next transmit to avoid collisions with remaining hops. 05/06/2013 05:32:39 PM [Insteon::BaseInterface] Received message from: $sl_WorkroomBack; command: on; type: alllink; group: 01 05/06/2013 05:32:39 PM [Insteon::BaseObject] DEBUG3: Adding hop count of 2 to hop_array of $sl_WorkroomBack 05/06/2013 05:32:39 PM [Insteon::BaseObject] $sl_WorkroomBack::set(on, $sl_WorkroomBack) 05/06/2013 05:32:39 PM [Insteon_PLM] DEBUG3: Received PLM raw data: 025022ae3a000001cb1100 05/06/2013 05:32:39 PM [Insteon::BaseInterface] WARN! Dropped duplicate incoming message 22ae3a000001cb1100 at 1367886759231 05/06/2013 05:32:39 PM process_queue: No pending message. Clear timer, return 0 05/06/2013 05:32:39 PM [Insteon_PLM] x10 sending code: B1 as insteon msg: e600 05/06/2013 05:32:39 PM [Insteon_PLM] x10 sending code: BJ as insteon msg: e280 05/06/2013 05:32:39 PM process_queue: Attempting to send 'e600' 05/06/2013 05:32:39 PM [Insteon_PLM] DEBUG2: Sending e600 incurred delay of 0.00 seconds; starting hop-count: ? 05/06/2013 05:32:39 PM [Insteon_PLM] DEBUG3: Sending PLM raw data: 0263e600 05/06/2013 05:32:39 PM *** Workroom back switch turned on when on, so toggle lamps, $workroomLamps=1 (1=WorkroomSewingLamp, 2=WorkroomBackLamp) 05/06/2013 05:32:39 PM [Insteon_PLM] DEBUG3: Received PLM raw data: 0263e60006 05/06/2013 05:32:39 PM [Insteon_PLM] DEBUG3: Received PLM acknowledge: e600 05/06/2013 05:32:39 PM process_queue: Attempting to send 'e280' 05/06/2013 05:32:39 PM [Insteon_PLM] DEBUG2: Sending e280 incurred delay of 0.55 seconds; starting hop-count: ? 05/06/2013 05:32:39 PM [Insteon_PLM] DEBUG3: Sending PLM raw data: 0263e280 # Got cleanup with max hops 2 above, then same cleanup with max hops 3 below 05/06/2013 05:32:39 PM [Insteon_PLM] DEBUG3: Received PLM raw data: 025022ae3a20f4a94611010263e28006 05/06/2013 05:32:39 PM [Insteon_PLM] DEBUG3: Received PLM acknowledge: e280 05/06/2013 05:32:39 PM Adding dupe message check key '22ae3a20f4a9421101' at 1367886759959 + 650 = 1367886760609 05/06/2013 05:32:39 PM [Insteon::BaseInterface] DEBUG2: Message received with 1 hops left, delaying next transmit to avoid collisions with remaining hops. 05/06/2013 05:32:39 PM [Insteon::BaseInterface] Received message from: $sl_WorkroomBack; command: on; type: cleanup; group: 01 |
On 05/06/2013 09:44 PM, cdragon wrote:
Heh, sorry to hear that :-( Were the duplicate messages causing you some trouble? In my case they
[...] Did the modified code improve things? I am running with Kevin's original Cheers, Eloy Paris.- |
Hi Kevin, On 05/06/2013 07:16 PM, krkeegan wrote:
It is working pretty well. I noticed yesterday that one duplicate made Is there anything I should capture when I notice what looks like a Cheers, Eloy Paris.- |
Eloy, No problems so far with the above fix. Kevin said originally that he tried a 500ms delay and it was causing legitimate duplicate messages to be discarded, so my fix may cause that problem as well. I didn't understand what kind of legitimate messages should be sent twice in under 500ms though. I'm still worried about bright/dim commands but I don't have any dimming devices to test that on. Your occasional problems are likely due to what I was seeing with a duplicate message appearing in 250ms which the original timeout was not large enough to catch. I have some code I have that toggles an X10 relay when you press On on an insteon fixture that's already on. Without duplicate message filtering, the X10 relay is told to toggle on, then off, then on, etc for each duplicate insteon "on" message received. |
On 05/07/2013 12:11 AM, cdragon wrote:
Okay, cool. Let's see what Kevin things about your modifications.
Yup, that's the exact same problem I was having/am having very Cheers, Eloy Paris.- |
I am fine with zeroing out the max hops as well. There are likely instances in which the PLM ACK does not reach a device in time causing a retry with an increased hop count. I think it is reasonable that we suppress those duplicates as well. The increased "delay" causes major issues for me though, depending on the message the total allowed delay is upwards of 800 milliseconds now. When I ran scan all link tables it gobbled up the following legitimate messages:
But the worst case was when I factory reset a device and rewrote all of the links. The device had 8 links, and it caused 7 false positives. In this case the PLM sends an extended length command with the full ALDB record, but the device only responds back with a standard length 2F00 command and nothing else. Since the ACK is identical every time it was caught as a duplicate every time. A few months back I was dealing with corruption in the ALDB records, and I have to say that is the worst possible scenario in MH. You can get caught in an infinite delete_orphans sync_links loop and never get a clean record of the links. So I would rather risk a duplicate message slipping through, rather than create the potential for ALDB corruption. I think the best solution is to add in a user configurable setting in private.mh.ini to adjust the additional "padding" beyond the default hop_count message_time calculation. |
I think the best solution would be to not include certain messages in the duplicate filter. Or even better might be to ONLY INCLUDE certain messages in the filter - namely "command: on or off; type: cleanup or alllink;". If other duplicate messages are found to cause problems they can always be added later. |
Oh, good call. Only using the filter on a defined list of message types would work well. |
OK, I pushed an update to the branch that achieves what Chris mentioned above. However, I am not totally happy with the result. The code as is is a very low level process but it is being limited to a rather high level function of preventing multiple set calls in a short period of time. I think the better option is to split these two features. The is_duplicate process is still valuable if the duplicate window is cut back to the insteon spec. MH already waits to send a message if it sees an incoming message with hops_left, it makes sense that it should also ignore identical messages in this same time frame. Then to solve Eloy's issue, I will add a check in the set routine to prevent re-setting a device to the same state in less than 1 second. |
Hi Kevin, On 05/07/2013 09:49 PM, krkeegan wrote:
Should I wait until you add this check or you'd like me to upgrade to Cheers, Eloy Paris.- |
It duplicated chris's changes, so there shouldn't be much need to testing On Tue, May 7, 2013 at 7:08 PM, Eloy Paris [email protected] wrote:
|
OK, I made some extensive changes. I forked the functions as I discussed before. So now there is a check in BaseObject that prevents a device from being set to the same state more frequently than once a second. I reworked the message window calculation in is_duplicate. I tried to have some basis in the specification for the design. Based on the design the message window is much longer for messages that require an ACK such as a set command from a device. But the message window is much shorter for incoming ACKs similar to what comes in during an ALDB scan. I have used it quite a bit tonight and it seems to be working well. It has caught a number of legitimately duplicative messages without any false positives. It has been working so well that it caught a lot more duplicative messages in the ALDB scan than I realized were occurring. It also caught the duplicate set commands from the device without needing to rely on the fallback code which stops duplicate set commands in 1 second. The fix_issue_169 tree has all of the updates. |
Hi Kevin, On 05/09/2013 12:27 AM, krkeegan wrote:
Thanks for the improvements. I'm now running this latest code to help Cheers, Eloy Paris.- |
Likewise. Thanks! |
This
is still working well for me. So far it has only had one false negative
which was the ACK of a status request, not a big deal that it was
missed.
I have not had any false
positives. Nor have I had any instances in which
the set within 1 second logic was needed. |
I've been running this the last couple days and haven't seen any problems so far. I only had time to do a couple simple tests as I'm going out of town tomorrow. None of the regular things I use have malfunctioned and I did a couple of ALDB scans successfully. |
Eloy's recent work uncovered a bug that has been duplicated throughout the MH code and which I duplicated here. The bug is in the code I added to BaseObject which prevents the calling of set_receive more than once per second. I allow set_receive to be called more frequently if the state set does not match the current state of the device. This doesn't work properly when multiple set calls are made in the same loop. To fix this I just need to add a check to state_final. |
Resolves one of the three loopholes discovered by @peloy which permitted duplicate messages received in the same pass to be processed. Hopefully the final fix for hollie#169
After pressing once a button of a KeypadLinc, MH sees what seem to be multiple duplicate messages from the KeypadLinc, and reacts (sets state) for each received message. The log excerpt below shows the issue; see how set(off) is being called 3 times.
I don't know the protocol well enough but it seems to me like we should look at the hopsleft field and inter-message arrival times to try to determine if a message received from a specific device is a duplicate that reached us through some other devices in the mesh network that just repeated (relayed) the original message from the sending message. The discussion with subject "INSTEON mesh network" in the misterhouse-users mailing list has good analysis by Chris Dragon on this topic.
Avoiding setting state unless the new state is different is probably not a good solution because KeypadLinc buttons can be configured for toggle mode (send 'on' one time the button is pressed, then send 'off' the next time), 'always off' mode (send 'off' every time the button is pressed), and 'always on' mode (send 'on' every time the button is pressed). So, if we do not set the state unless the state changes then it will not be possible to use the 'always on' and 'always off' modes.
04/29/13 10:10:45 AM [Insteon::BaseInterface] DEBUG2: Message received with 2 hops left, delaying next transmit to avoid collisions with remaining hops.
04/29/13 10:10:45 AM [Insteon::BaseInterface] Received message from: $kpl_garage_button_a; command: off; type: alllink; group: 03
04/29/13 10:10:45 AM [Insteon::BaseObject] $kpl_garage_button_a::set(off, $kpl_garage_button_a)
04/29/13 10:10:45 AM [Insteon::BaseInterface] DEBUG2: Message received with 1 hops left, delaying next transmit to avoid collisions with remaining hops.
04/29/13 10:10:45 AM [Insteon::BaseInterface] Received message from: $kpl_garage_button_a; command: off; type: alllink; group: 03
04/29/13 10:10:45 AM [Insteon::BaseObject] $kpl_garage_button_a::set(off, $kpl_garage_button_a)
04/29/13 10:10:46 AM [Insteon::BaseInterface] Received message from: $kpl_garage_button_a; command: off; type: alllink; group: 03
04/29/13 10:10:46 AM [Insteon::BaseObject] $kpl_garage_button_a::set(off, $kpl_garage_button_a)
04/29/13 10:10:46 AM [Insteon::BaseInterface] Received message from: $kpl_garage_button_a; command: off; type: cleanup; group: 03
04/29/13 10:10:46 AM [Insteon::BaseObject] Ignoring Received Direct AllLink Cleanup Message for $kpl_garage_button_a since AllLink Broadcast Message was Received.
04/29/13 10:10:46 AM [Insteon::BaseInterface] DEBUG2: Message received with 2 hops left, delaying next transmit to avoid collisions with remaining hops.
04/29/13 10:10:46 AM [Insteon::BaseInterface] Received message from: $kpl_garage_button_a; command: link_cleanup_report; type: alllink; group: 03
04/29/13 10:10:46 AM [Insteon::BaseObject] $kpl_garage_button_a::set(link_cleanup_report, $kpl_garage_button_a)
I am experiencing this problem with a very recent (i2CS, probably very recent firmware as of April 2013) dual-band KeypadLinc. I do not see this problem with ICON switchs in my network. RemoteLincs and motion sensors apparently send the same message twice in short succession, but there is code in the stack to prevent re-setting the state in these situations, as the following log shows:
04/29/13 11:11:20 AM [Insteon::BaseInterface] DEBUG2: Message received with 2 hops left, delaying next transmit to avoid collisions with remaining hops.
04/29/13 11:11:20 AM [Insteon::BaseInterface] Received message from: $remote_button_4; command: on; type: alllink; group: 04
04/29/13 11:11:20 AM [Insteon::RemoteLinc] $remote_button_4::set_receive(on, $remote_button_4)
04/29/13 11:11:20 AM [Insteon::BaseInterface] DEBUG2: Message received with 2 hops left, delaying next transmit to avoid collisions with remaining hops.
04/29/13 11:11:20 AM [Insteon::BaseInterface] Received message from: $remote_button_4; command: on; type: alllink; group: 04
04/29/13 11:11:20 AM [Insteon::RemoteLinc] $remote_button_4::set_receive(on, $remote_button_4) deferred due to repeat within 1 second
04/29/13 11:11:20 AM [Insteon::BaseInterface] Received message from: $remote_button_4; command: on; type: cleanup; group: 04
04/29/13 11:11:20 AM [Insteon::BaseObject] Ignoring Received Direct AllLink Cleanup Message for $remote_button_4 since AllLink Broadcast Message was Received.
The text was updated successfully, but these errors were encountered: