Protocol / Queue handling issue #151

susisstrolch · 2019-07-16T18:06:07Z

Today my Logic Analyzer arrived, so I found the time to look at what's going on on the EMS bus.
Environment:

RC35 (DeviceID:0x10 ProductID:86 Version:01.11)
MC10 (DeviceID:0x08 ProductID:72 Version:02.07)
EMS-ESP version 1.8.1b17
- external power supply
- tx_mode = 0

Issue 1 - EMS-ESP continues sending, even without being polled again

1. MC10:	08 0B 16 00 FF 5A 64 00 06 FA 0A 01 0F 64 64 02 
	        08 F8 0F 0F 0F 0F 1E 05 04 09 09 00 6D 00
2. ESP:	0B 00
3. MC10:	   0B 00 89 00 85
4. ESP:		  0B 88 14 00 20 E4 00
<200ms pause>

MC10 sends a MC10Parameter message to us.
We acknowledge with "0B <BRK>" -> ok, fini
MC10 echoes our acknowledge???
The <BRK> from MC10 looks fine, it's approx. 1.035ms
We continue to poll MC10, which is simply ignored by the busmaster by not sending an echo.

Issue 2 - pretty similar to Issue 1

1. RC35:	10 0B 3E 00 00 00 00 7D 00 00 00 00 00 00 00 00
	        00 11 05 00 E9 00
2. ESP:	0B 00
3. RC35:	   0B 00
4. ESP:		  0B 90 3D 00 20 80 00
5. RC35:		     0B 90 7A
<200ms pause>

RC35 sends a HK1MonitorMessage to us
We respond with "0B <BRK>" -> ok, fini
RC35 echoes our acknowledge
We start sending w/o request
Busmaster stops echo after 2nd byte and stays silent for 200ms.


proddy · 2019-07-16T18:51:01Z

interesting, if you use tx_mode=2 (your implementation) does it start to clear up a little?

susisstrolch · 2019-07-16T19:19:39Z

That will be the next step.. first, I have to create triggers for the LA - especially on Rx / Tx BRK.
digitalWrite doesn't work in emsuart - that leads to reboot loops from watchdog.

proddy · 2019-07-17T07:20:24Z

getting digitalWrite to work in an ISR is tricky. although it is atomic, it's slow. I've managed to do this before in earlier versions, using the LED to show BRKs with

WRITE_PERI_REG(PERIPHS_GPIO_BASEADDR + (state ? 4 : 8), (1 << EMSESP_Status.led_gpio)); // 4 is on, 8 is off

susisstrolch · 2019-07-17T11:00:07Z

Jep.. it was more a "which one to use and how/where to mark the actions".
Anyway...
Added two markers to visualize what's going on. One is related to Rx, the other to Tx.
Short Tx mark is set when we fill the Tx FIFO. Long Tx mark is set when we generate a BRK, after waiting for Tx-FIFO empty.
Rx mark (long) is set as soon as we detect a BRK on Rx.

Now, let's have a look at the protocol problem (still in tx_mode 0):

MC10 sends a MC10Parameter message to us.

At the blue marker you see that ESP acknowledges with a 0B <BRK> - which is imho wrong, because the command isn't a poll request. So we should answer with ACK or NACK.
EMS acknowledges with 0B <BRK> - so far ok - and, because the <BRK> terminates a communication, EMS assumes that the bus is free.
ESP however continues to send - which is politely ignored by EMS.

So, I suggest the following extensions to the protocol handler:
If we are targeted by the message AND bit7 isn't set, we have a direct telegram, which should be ACKed with 01, followed by sending a 0B <BRK>.
In addition, ESP should wait for the next poll request before sending the next telegram.
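
A minimal sketch of the two suggestions above, not the actual EMS-ESP handler. It assumes a telegram starts with [src, dest, ...], that "bit7" refers to the destination byte, and that our ID is 0x0B as in the traces. `replyFor` and `PollGate` are hypothetical names, and whether a <BRK> separates the 01 from the 0B is left open here.

```cpp
#include <cstdint>
#include <vector>

constexpr uint8_t kOwnId = 0x0B;  // our bus ID, as seen in the traces
constexpr uint8_t kAck   = 0x01;  // ACK byte proposed in the comment above

// Bytes to answer a received telegram with; empty means "stay silent".
// Assumption: rx = [src, dest, ...] and bit7 is tested on the dest byte.
std::vector<uint8_t> replyFor(const std::vector<uint8_t>& rx) {
    if (rx.size() < 2) return {};  // too short to carry src + dest
    const uint8_t dest    = rx[1];
    const bool    forUs   = (dest & 0x7F) == kOwnId;
    const bool    bit7Set = (dest & 0x80) != 0;
    if (forUs && !bit7Set) return {kAck, kOwnId};  // direct telegram: 01, then 0B
    return {};
}

// Lets exactly one queued telegram out per poll request.
class PollGate {
  public:
    void onPoll() { polled_ = true; }

    // Returns true once per poll; later calls fail until the next poll arrives.
    bool takeTurn() {
        const bool ok = polled_;
        polled_ = false;
        return ok;
    }

  private:
    bool polled_ = false;
};
```

With a gate like this, the Tx path would call `takeTurn()` before sending and simply leave the telegram in the queue when it returns false.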

The tx_mode 2 is my next target...

susisstrolch · 2019-07-17T11:01:10Z

Uups - seems my JPEG rendering is a bit too conservative...

susisstrolch · 2019-07-18T21:10:43Z

Ok, here's the next one...
Switching to alternate Tx/Rx happens much too late after reboot. The EMS bus stays pulled low too long.
Green Marker: echo from reboot
Blue Marker: Tx moved over to alternate Tx
As you can see the EMS is pulled down for >4s after reboot...

Zoomed in before blue marker:

Imho switching over to D8/D7 should occur before WIFI and other stuff...

proddy · 2019-07-18T21:22:43Z

the uart being initialized and the interrupts enabled at the last minute was by design as to not interfere with anything else that was happening during bootup (serial transmissions, wifi, mqtt, websockets etc). It seemed the sensible thing to do at the time as I didn't want to fill up the Rx buffers. The logic is in line 1569 of ems-esp.cpp. It should be safe to move it forward as you suggested although I'm not sure what advantage it will have?

proddy · 2019-07-18T21:28:13Z

btw how are you finding the dslogic? I have one too from https://www.dreamsourcelab.com/product/dslogic-plus but found it quite cumbersome to use. I often switch over to a 4 EUR USB logic analyzer and use Saleae's free software

susisstrolch · 2019-07-18T21:36:58Z

Advantage is not to block the EMS bus - there's nothing on the line during this time...

proddy · 2019-07-18T21:40:56Z

I see. I would have expected it wouldn't pull down the line as the D7/D8 pins are not activated though ?

susisstrolch · 2019-07-18T21:48:42Z

Next one... found in tx_mode 2...
Between receiving a "ITS_TO_ME" telegram and processing we could be interrupted / delayed by WIFI/MQTT transmission.
Currently, the tx doesn't care about the Rx status. This will lead to overlapping / misleading operations on the EMS bus.
Green marker: EMS is sending a poll request
Dark green marker: Poll request is processed
Overview:
EMS sends a poll request which is handled ~130ms after the request. Meanwhile, the next device is sending, which gets disturbed by us.

Request details:

Response disturbing bus:

IMHO, Tx should honour the Rx-Status - simply fail if Rx != IDLE...
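
A rough sketch of that idea, assuming the Rx side exposes some kind of status flag. `RxStatus`, `TxResult`, `Bus` and `txBuffer` are made-up names for illustration, and the byte counter only stands in for writing to the UART FIFO.

```cpp
#include <cstddef>
#include <cstdint>

enum class RxStatus { Idle, Busy };
enum class TxResult { Sent, RxBusy };

// Hypothetical stand-in for the UART state shared between Rx and Tx.
struct Bus {
    RxStatus    rx        = RxStatus::Idle;
    std::size_t bytesSent = 0;
};

// Refuse to transmit while a reception is in progress, so we never
// talk over another device that has started sending in the meantime.
TxResult txBuffer(Bus& bus, const uint8_t* buf, std::size_t len) {
    if (bus.rx != RxStatus::Idle) return TxResult::RxBusy;
    (void)buf;              // a real driver would push these bytes to the FIFO
    bus.bytesSent += len;
    return TxResult::Sent;
}
```

The caller can then decide whether to drop the telegram or keep it queued for the next poll.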

susisstrolch · 2019-07-18T21:51:50Z

btw how you finding the dslogic

It's a really great tool - it needs a bit of time to find out how it works, but after that you find anything you like!!

proddy · 2019-07-18T22:01:50Z

jealous !

susisstrolch · 2019-07-18T22:02:24Z

I see. I would have expected it wouldn't pull down the line as the D7/D8 pins are not activated though ?

I also don't understand why they are pulled down. My first thought was that it happens in the Arduino core - at least the traces look alike...

susisstrolch · 2019-07-18T22:06:45Z

jealous !

Playground :) accepted by Susi :))
(won't talk about the 3D printer in background )

susisstrolch · 2019-07-18T22:23:42Z

Somehow we must handle the due date of the received packet.
Maybe by a timestamp: if delta > 100ms we won't process the Tx anymore...
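
Something along these lines could do it. This is only a sketch: `pollStillValid` and `kMaxPollAgeMs` are hypothetical names, and the timestamps are assumed to come from a millis()-style 32-bit millisecond counter.

```cpp
#include <cstdint>

constexpr uint32_t kMaxPollAgeMs = 100;  // the 100ms limit suggested above

// True if the poll received at pollRxMs is still fresh enough to answer at nowMs.
// Unsigned subtraction keeps the delta correct across a counter wraparound.
bool pollStillValid(uint32_t pollRxMs, uint32_t nowMs) {
    return static_cast<uint32_t>(nowMs - pollRxMs) <= kMaxPollAgeMs;
}
```

With the ~130ms delay from the trace above, `pollStillValid(t, t + 130)` is false, so that late response would be skipped.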

proddy · 2019-07-18T22:34:40Z

jealous !

Playground :) accepted by Susi :))
(won't talk about the 3D printer in background )

3D printer's on my wish list too. Velleman make some nice kits.

susisstrolch · 2019-07-18T22:40:01Z

Yep... but it needs some tweaks until it's really usable...
F.e. the sliders are suboptimal...
Simply ping me if you go to K8800...

susisstrolch · 2019-07-18T22:43:19Z

Totally off-topic:
iPhone8p holder for the bicycle.. printed in PLA+Carbon...

[fixed horrible typo]

bbqkees · 2019-07-25T08:31:56Z

I see. I would have expected it wouldn't pull down the line as the D7/D8 pins are not activated though ?

I also don't understand why they are pulled down. My first thought was that it happens in the Arduino core - at least the traces looks alike...

Is the EMS bus itself actually blocked?
If I recall correctly when the TX input is floating or 0 the TX transistor is active (so the current is drawn through the resistors).
Could be the I/O of the ESP is pulled low at boot so the analyzer sees a logic 0 at those pins but the EMS bus itself might be just fine.

susisstrolch · 2019-07-25T08:43:55Z

It's simply because GPIO15 has a pulldown resistor connected. Otherwise (GPIO15 open) the ESP tries to boot from SD-Card...

proddy · 2019-07-25T08:44:32Z

GPIO15 (D8) which we use for Tx is automatically pulled high by the ESP8266 during boot up, which causes the block on the EMS bus. In the code it's forced down so there's only a very short window now (3 bytes I think).

susisstrolch · 2019-07-25T09:42:37Z

GPIO15 (D8) which we use for Tx is automatically pulled high by the ESP8266 during boot up

No... GPIO15 has a pull down (10k) on (nearly) all boards (Wemos, NodeMCU).
The adapter board itself is inverting the Tx signal - pulling current from EMS-Bus when Tx (GPIO15) is low.
So, during boot the EMS-Bus was pulled low for (my impression) much too long.

bbqkees · 2019-07-25T10:03:20Z

Yes indeed.
But you only block slave TX for a short moment, the EMS master can still send uninterrupted.

susisstrolch · 2019-07-26T12:13:44Z

@proddy we have a protocol handling issue when sending to EMS bus. It happens with tx_mode 0 and also with tx_mode 2.
Look at those traces:

(00:09:16.948) ems_parseTelegram: 00 01: 8B
(00:09:16.948) emsuart_tx_buffer: 00 01: 0B 90 3E 00 20 8C
(00:09:16.964) ems_parseTelegram: 00 02: 0B 90 3E 00 20 8C
(00:09:16.964) 0B 90 3E 00 20 8C
(00:09:16.964) echo:telegram: 0B 90 3E 00 20 (CRC=8C) #data=1
(00:09:17.078) ems_parseTelegram: 00 02: 10 0B 3E 00 00 00 00 7D 00 00 00 00 00 00 00 00 00 00 05 00 AD
(00:09:17.078) 10 0B 3E 00 00 00 00 7D 00 00 00 00 00 00 00 00 00 00 05 00 AD
(00:09:17.080) emsuart_tx_buffer: 00 01: 0B
(00:09:17.085) ems_parseTelegram: 00 01: 0B
(00:09:17.085) emsuart_tx_buffer: 00 01: 0B 90 3D 00 20 80
[557087] ** error sending buffer: BRK
(00:09:17.297) ems_parseTelegram: 00 01: 89 00 86 90

(00:09:16.948) ems_parseTelegram: 00 01: 8B  <- Bus master sends poll request
(00:09:16.948) emsuart_tx_buffer: 00 01: 0B 90 3E 00 20 8C  <- We send a query for 0x10
(00:09:16.964) ems_parseTelegram: 00 02: 0B 90 3E 00 20 8C  <- We get the echo
(00:09:17.078) ems_parseTelegram: 00 02: 10 0B ....  <- We process the data
(00:09:17.080) emsuart_tx_buffer: 00 01: 0B  <- We tell Busmaster we're ready
(00:09:17.085) ems_parseTelegram: 00 01: 0B  <- Busmaster echo
(00:09:17.085) emsuart_tx_buffer: 00 01: 0B 90 3D 00 20 80  <- We send again, w/o invitation/poll
[557087] ** error sending buffer: BRK  <- Bus master signals "shut up"

The issue is that we interpret the echo as a poll request because of masking out the "poll bit" from Target ID.
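
To illustrate with the bytes from the trace (0x8B is the poll, 0x0B is our own echo): a small sketch, assuming our ID is 0x0B. `buggyIsPoll` and `classifyOneByte` are hypothetical names and not the functions in ems.cpp.

```cpp
#include <cstdint>

constexpr uint8_t kOwnId   = 0x0B;  // our bus ID, as seen in the traces
constexpr uint8_t kPollBit = 0x80;  // bit7 marks a poll request

// The behaviour described above: masking out the poll bit makes the
// echo of our own 0B indistinguishable from a real poll (8B).
bool buggyIsPoll(uint8_t b) {
    return (b & 0x7F) == kOwnId;
}

enum class OneByte { Poll, OwnEcho, Other };

// Compare the full byte instead, so only 8B counts as an invitation to send.
OneByte classifyOneByte(uint8_t b) {
    if (b == (kOwnId | kPollBit)) return OneByte::Poll;
    if (b == kOwnId) return OneByte::OwnEcho;
    return OneByte::Other;
}
```

Treating `OwnEcho` as "nothing to do" would break the echo -> poll -> send loop visible at 00:09:17.085.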

proddy · 2019-07-26T13:21:07Z

funny, I noticed that too this morning when I was trying to debug why the 0x19 MonitorSlow messages were getting blocked (buffer of 32 was too small). I saw many telegrams with a length of 1 and a value of 0x19 which was a recursive echo->poll

susisstrolch · 2019-07-26T16:15:40Z

Seems I have a more stable version at the moment. Far fewer Rx-idle warnings and no Tx Brk or Tx Timeout during the last hour. Still have to clean up a bit and
- add log debug with more Info (Junkers)
- show reboot reason when connecting

proddy · 2019-07-26T16:25:47Z

great, thanks Juergen. Amazing how you can code in these extreme heats. Shall I take a look at fixing the echo/poll issue or have you already squished that bug?

proddy · 2019-07-29T18:09:19Z

wonderful! I'll test tonight and reach out to the HT3 guys to see if they can test

proddy · 2019-07-30T07:47:47Z

Been running the txmode2 branch for 12hrs now with no crashes and hardly any corrupt telegrams. Looking good!

susisstrolch · 2019-07-30T08:27:01Z

Great! Now we need the results from the Junkers users. About the reboots - I suspect MQTT or telnet. Maybe we should add a watchdog feed before calling them.

proddy · 2019-07-30T08:58:44Z

with 1.8.1 I would get reboots every 10-15 mins with tx_mode 2, but none with tx_mode 0 so I don't expect it's telnet/mqtt related. you can switch all that off (set mqtt_ip and set publish_time 0) and it still happens. Anyway seems better in 1.9

susisstrolch · 2019-07-30T14:52:46Z

We should really try the watchdog feed before each time-consuming function call.
Because it's the software watchdog which gets triggered, we don't get a stacktrace.
So we must iterate by trial and error.

proddy · 2019-07-30T15:05:41Z

yes, we need to keep the code in ISRs non-blocking and highly optimized. emsuart_tx_buffer() has grown in complexity quite significantly since 1.7 with many loops and race conditions.

proddy · 2019-07-30T15:18:44Z

Or disable wdt before the Tx is called from ems.cpp with ESP.wdtDisable() and enable it after the acknowledgement poll has been received with ESP.wdtEnable(0)

susisstrolch · 2019-07-30T15:45:37Z

I‘ll upload a log in jabber mode - there you can see that Tx/Rx aren‘t the bottlenecks.

susisstrolch · 2019-07-30T16:57:24Z

And here's an interesting article about the soft WDT:

https://www.sigmdel.ca/michel/program/esp8266/arduino/watchdogs_en.html#ESP8266_WDT_TIMEOUT

susisstrolch · 2019-07-31T09:23:11Z

pushed a new release of txmode2 branch which injects wdtfeed() in MyESP.loop.

proddy · 2019-07-31T11:23:03Z

In all versions up to 1.8.0 I had this line in MyESP.loop():

yield(); // ...and breath

which somehow got lost after 1.8.1. I think it does the same as the wdtfeed() no?

susisstrolch · 2019-07-31T18:29:56Z

It's still in. But yield does more than calming the WDT - it also cares about WIFI stuff. wdtFeed only restarts/resets the HW and SW watchdog and doesn't have any further overhead.

bbqkees · 2019-08-01T07:18:00Z

So I've been running the latest build from yesterday evening in the txmode2 branch.
At first I got lots of reboots (every 2 minutes or so) but now it's been running for 10h without hiccups (jack powered, not bus powered).

proddy · 2019-08-01T07:38:33Z

It's still in. But yield does more than calming the WDT - it also cares about WIFI stuff. wdtFeed only restarts/resets the HW and SW watchdog and doesn't have any further overhead.

There is also a delay(1) in the ems-esp.cpp loop I used to calm down the wifi after seeing how ESPhome and other projects did this because of Arduino 2.5.0. It might no longer be necessary?

susisstrolch · 2019-08-01T09:09:31Z

delay() is also doing the WIFI and SWDT handling, so it shouldn't hurt at all.
But I can try to remove it...

susisstrolch · 2019-08-02T06:00:15Z

Found that one: letscontrolit/ESPEasy#2477

proddy · 2019-08-02T06:08:41Z

Found that one: letscontrolit/ESPEasy#2477

nice, didn't know you could do that. Let's add that too as it'll help us find the root cause for the WDT resets.

bbqkees · 2019-08-02T10:14:23Z

Possibly unrelated but ran txmode2 firmware for 20+ hours (with an open but idle Telnet session) without problems.
However, after doing 'log v' it rebooted after a few minutes.

proddy · 2019-08-02T14:00:12Z

that's good news for @susisstrolch's new tx code. the logv does a lot of string manipulation (as I avoid using the String library and sprintf() ) so most probably it's a memory error I need to look into.

proddy · 2019-08-02T18:53:03Z

@bbqkees @susisstrolch also unrelated - I uploaded my latest web version under the newweb branch if you want to play with it. Still need to refine a few things but I think it's stable. Look carefully at the CHANGELOG on how to build it because the build scripts have also changed. I'm off now and will pick things up when I'm back in a week.

bbqkees · 2019-08-05T07:44:32Z

Ok will try.
The txmode2 build from last week is still running here uninterrupted at 3 days (bus powered).
Telnet session still active.

susisstrolch · 2019-08-05T08:28:17Z

I have 2 days 6 hrs in parasitic mode.

bbqkees · 2019-08-08T12:27:31Z

@proddy Took me some time because of lots of build errors but I was able to build the firmware in the end.

The newweb branch needs two additional libraries: ESPAsyncUDP and ESPAsyncWebServer. (PIO Home->Find libraries->type name-> install)
Maybe it's Windows or just my particular setup, but the gulp build did not work with the 'debug' parameter.
So I ran the node gulp command in the correct folder and after that it went through fine: the compacted web code files were added to the 'webh' folder and the build in pio went OK.

The new web interface looks great, will test it over the weekend.

proddy · 2019-08-08T12:53:48Z

Did you use the latest platformio.ini example file? I would have expected pio to download the libraries automatically.

bbqkees · 2019-08-08T12:58:42Z

Yes, the new one with the 'pre:scripts/buildweb.py' etc.

I had the same issue when you added f.i. OneWire initially.

proddy · 2019-08-08T13:08:58Z

That's strange. In theory if you remove the whole .pio folder/directory it should go fetch all the libs. I'll try on a fresh win install this weekend

proddy · 2019-08-11T14:56:29Z

@bbqkees tested on a fresh win install with platformio 4 and you shouldn't need to download anything manually. As soon as the platformio.ini file is there it will automatically fetch the latest libraries. We can look at your config next week.

@susisstrolch I merged the txmode2 branch into the newweb branch and it's been running fine for the last 8hrs. Eventually I'll move all this into dev which is 1.9.0 so let me know if you're planning any further changes. We still need to test with Junkers.

proddy · 2019-08-27T10:24:53Z

closing for now. txmode2 merged into dev.

susisstrolch added the bug label Jul 16, 2019

susisstrolch mentioned this issue Jul 21, 2019

Brkdetect #154

Merged

proddy closed this as completed Aug 27, 2019

nhmariend mentioned this issue Jun 1, 2020

Bosch Compress 6000AW Missing Heating circuits via KM200 gateway #390

Closed
