Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If optimize for performance in menuconfig, modbus failed on incomplete message (IDFGH-2371) #4499

Closed
keelung-yang opened this issue Dec 17, 2019 · 65 comments
Assignees

Comments

@keelung-yang
Copy link

keelung-yang commented Dec 17, 2019

On latest master.
Optimize for size works, but if set to Optimize for performance, I got:

Modbus Error: [Input/Output] Modbus Error: [Invalid Message] Incomplete message received, expected at least 2 bytes (0 received)

Just change sdkconfig, no source code modified.

I'm using baudrate 230400, and I tried 115200 but still failed on the same error.
ModbusSerialClient(strict=False or True) also have no effect.

Here is my test script in Python3(>=3.6):

import time

import serial

from pymodbus.client.sync import ModbusSerialClient


def reset(port, wait=1):
        with serial.Serial(port) as ser:
                ser.dtr = False
                ser.rts = False
        time.sleep(wait)


port = 'COM32'
baudrate=230400

reset(port)

client = ModbusSerialClient(
        method="rtu", strict=False,
        port=port, baudrate=baudrate, bytesize=8, stopbits=1, parity='N')
client.connect()

unit = 1
res = client.write_registers(0, [500], unit=unit)
if res.isError():
        print(f'{res}')
else:
        print('OK')

client.close()

sdkconfig and esp32 debug log also attached:
SDKConfig & Log.zip

@github-actions github-actions bot changed the title If optimize for performance in menuconfig, modbus failed on incomplete message If optimize for performance in menuconfig, modbus failed on incomplete message (IDFGH-2371) Dec 17, 2019
@keelung-yang
Copy link
Author

After compare log, I found that ESP32 reboot +1 then optimize for size.
This should be the reason of modbus failed

@alisitsyn
Copy link
Collaborator

Hi @keelung-yang,
Thank you for the issue. I am trying to catch what is happened with Modbus in your example but do not see any messages that say about Modbus started. I even do not see any messages related to Guru meditation error in both logs. I don't see your code and without log messages it is hard to find the reason for this issue. Could you provide more information to clarify your issue? I think core dump log file in elf format would be very good artifact for this. I continue trying to reproduce it but could you trigger the coredump in your code to make it more clear? Thank you.

@alisitsyn
Copy link
Collaborator

alisitsyn commented Dec 17, 2019

How the Modbus UART pins configured in your project? I see that your console is configured to use GPIO 18 and 23 which are used for Modbus UART by default.

@keelung-yang
Copy link
Author

keelung-yang commented Dec 17, 2019

@alisitsyn
Core dump not works if CONFIG_ESP32_ENABLE_COREDUMP_TO_UART=y:
There is no ================= CORE DUMP START ================= in console.
The log say: E (370) esp_core_dump_common: No core dump partition found!
I've set dump to uart, so no dump partition needed. Is there something I missed?

Here is pin configure & relevant code:
image

#define MB_PORT_NUM     (UART_NUM_0)
#define MB_DEV_ADDR     (1)
#define MB_DEV_SPEED    (230400)
#define MB_UART_TXD     (1)
#define MB_UART_RXD     (3)

void dmodbus_start()
{
    mbc_slave_init(MB_PORT_SERIAL_SLAVE, &mbc_slave_handler);

    mb_communication_info_t comm_info;
    ESP_ERROR_CHECK(uart_set_pin(MB_PORT_NUM, MB_UART_TXD, MB_UART_RXD,
                                 UART_PIN_NO_CHANGE, UART_PIN_NO_CHANGE));
    
    comm_info.mode = MB_MODE_RTU;
    comm_info.slave_addr = MB_DEV_ADDR;
    comm_info.port = MB_PORT_NUM;
    comm_info.baudrate = MB_DEV_SPEED;
    comm_info.parity = MB_PARITY_NONE;
    ESP_ERROR_CHECK(mbc_slave_setup((void*)&comm_info));

    mb_register_area_descriptor_t reg_area;
    reg_area.type = MB_PARAM_HOLDING;
    reg_area.start_offset = MB_DIAG_MB_START;
    reg_area.address = (void*)&diag_mb;
    reg_area.size = sizeof(diag_mb);
    ESP_ERROR_CHECK(mbc_slave_set_descriptor(reg_area));

    ESP_ERROR_CHECK(mbc_slave_start());
    ESP_ERROR_CHECK(uart_set_mode(MB_PORT_NUM, UART_MODE_UART));
}

void dmodbus_print_regs()
{
    uint32_t can = (uint32_t)((uint8_t*)diag_mb.can);
    uint32_t ecu = (uint32_t)((uint8_t*)diag_mb.ecu);
    uint32_t frame = (uint32_t)((uint8_t*)diag_mb.frame);
    uint32_t ftx = (uint32_t)((uint8_t*)&diag_mb.ftx);
    uint32_t trace = (uint32_t)((uint8_t*)&diag_mb.trace);
    uint32_t sim = (uint32_t)((uint8_t*)&diag_mb.sim);

    uint32_t count = \
        MB_DIAG_CAN_COUNT + \
        MB_DIAG_ECU_COUNT + \
        MB_DIAG_FRAME_COUNT + \
        1 + 1 + 1;

    printf(\
        "\nModbus Register Address:\n" \
        "--------------------------------------\n" \
        "\tcan:\t%04X\t%4d\t%4d\n"    \
        "\tecu:\t%04X\t%4d\t%4d\n"    \
        "\tdiag:\t%04X\t%4d\t%4d\n"  \
        "\tftx:\t%04X\t%4d\t%4d\n"   \
        "\ttrace:\t%04X\t%4d\t%4d\n"   \
        "    +   sim:\t%04X\t%4d\t%4d\n"   \
        "--------------------------------------\n" \
        "\t\t\t%4d\t%4d\n\n"   \
        "Available memory: %18d\n\n",   \
        (can   - can) / 2,  MB_DIAG_CAN_COUNT,      MB_DIAG_CAN_COUNT   * sizeof(diag_can_t),
        (ecu   - can) / 2,  MB_DIAG_ECU_COUNT,      MB_DIAG_ECU_COUNT   * sizeof(diag_ecu_t),
        (frame - can) / 2,  MB_DIAG_FRAME_COUNT,    MB_DIAG_FRAME_COUNT * sizeof(diag_frame_t),
        (ftx   - can) / 2,  1, sizeof(diag_ftx_t),
        (trace - can) / 2,  1, sizeof(diag_trace_t),
        (sim   - can) / 2,  1, sizeof(diag_sim_t),
        count, sizeof(diag_mb), esp_get_free_heap_size()
    );
}

void dmodbus_loop(bool cantp_started)
{
    dmodbus_print_regs();

    while (true) {
        mb_event_group_t ev_filter = MB_EVENT_HOLDING_REG_RD | MB_EVENT_HOLDING_REG_WR;
        mb_event_group_t event = mbc_slave_check_event(ev_filter);

        mb_param_info_t reg_info;
        ESP_ERROR_CHECK(mbc_slave_get_param_info(&reg_info, MB_PAR_INFO_GET_TIMEOUT));
        dmodbus_log_register(event, &reg_info);

        if (event & MB_EVENT_HOLDING_REG_RD) {
            continue;
        }
        else if (event & MB_EVENT_HOLDING_REG_WR) {
            if (reg_info.address == (uint8_t*)&diag_mb.frame) {
                dmodbus_frame(&diag_mb.frame[0]);
            }
            else if (reg_info.address == (uint8_t*)&diag_mb.can) {
                if (cantp_started)
                    cantp_stop();
                dmodbus_can(&diag_mb.can[0]);
                cantp_started = cantp_start(diag_mb.ecu[0].rxid) == 0;
            }
           ...


void app_main()
{
    esp_log_level_set(LOG_TAG, ESP_LOG_INFO);

    led_setup();
    led_blink(LED_MD_SYS, LED_FREQ_RUNNING);
    led_on(LED_MD_DIAG);

    bool cantp_started = false;
    bool sd_ready = sd_connect() == 0;

    if (sd_ready)
        cantp_started = sd_download();
    else
        ESP_LOGW(LOG_TAG, "SD Card is not ready");

    led_on(LED_MD_SYS);

    dmodbus_start();
    dmodbus_loop(cantp_started);
    dmodbus_stop();
}

@alisitsyn
Copy link
Collaborator

alisitsyn commented Dec 17, 2019

@keelung-yang,
As I understand you are using USB-bridge on the PICO board to communicate through the Modbus. You reconfigured console to be on different pins. This port is used for flashing firmware and has DTR and RTS pins which triggers bootloader and resets chip after flashing. These pins are the reason for the restart.
You can reconfigure them using #client.socket.setRTS(False), client.socket.setDTR(True) but it will not solve all the issues because it depends on USB driver behavior on your host machine and CP2102 can make short pulse which resets ESP32 once you call client.connect() and then close the port. The Modbus component was not designed to use the Flash port for communication due to this behavior.

@keelung-yang
Copy link
Author

@alisitsyn
Thanks for your quickly reply! I'll try tomorrow.
But if that's the reason, why Optimize for size works, but Optimize for performance failed?

@alisitsyn
Copy link
Collaborator

alisitsyn commented Dec 17, 2019

@keelung-yang,
I did not check this case completely because could not reproduce this behavior. I have just some guess for now. However in both logs I see that target was reset after printing of registers (once and twice).

@keelung-yang
Copy link
Author

@alisitsyn
I've remove all components of our application, only simplest modbus reserved for reproducting this issue.
mb_demo.zip
image image

@keelung-yang
Copy link
Author

keelung-yang commented Dec 19, 2019

@alisitsyn
There is another way to cause ESP32 reboot: write 57 registers to modbus, but 56 works sometime, and 55 always ok.
image

I rewrite the test script and modified some c source posted in yesterday.
Here is the new one: test_modbus.zip

@alisitsyn
Copy link
Collaborator

@keelung-yang,
Thank you for this information. Could you also send your log output from UART1 connected to pins 18, 23 when device restarts after failure? Unfortunately can't check it right now because of holidays. The log should include the core dump output.
Thanks.

@keelung-yang
Copy link
Author

@alisitsyn Sorry. I cann't generate core dump: I set core dump to UART in menuconfig, but got an error: No core dump partition found at boot time. Even running test.py -n 57, ESP32 rebooted but not dumped

@alisitsyn
Copy link
Collaborator

@keelung-yang,
Sorry for the late response.
The error "No core dump partition found at boot time" is not critical in this case because you selected core dump output to UART and can be ignored. The coredump code tries to find the coredump partition which is absent in your table. Your example does not cause any exception (GuruMeditation error) and as a result does not trigger core dump. I think resets caused by bootloader reset logic but behavior can vary on different machines. I was able to reproduce the issue with 56 registers being read or write through modbus using your script. However, I can read 56 and more registers using other modbus client (example: ModbusPoll). It is able to read data just fine. I think this error is related to pymodbus component (sync.py:ModbusSerialClient:_recv(self, size)). Could you try to use other client instead of pymodbus and let me know result?
Thanks.

@keelung-yang
Copy link
Author

keelung-yang commented Jan 7, 2020

@alisitsyn
I tested minimalmodbus on latest esp-idf, updated yesterday:

import minimalmodbus as minimd

client = minimd.Instrument('COM32', 1)  # port name, slave address
client.serial.baudrate = 230400

addr = 0
count = 56     # 55 works, 56 failed
print('Read: ', client.read_registers(addr, count))
client.write_registers(addr, [0x1234] * count)
print('Read: ', client.read_registers(addr, count))
Read:  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Traceback (most recent call last):
  File "C:\Users\jyang37\Desktop\test_minimd.py", line 9, in <module>
    client.write_registers(0, [0x1234] * count)
  File "C:\Python\lib\site-packages\minimalmodbus.py", line 895, in write_registers
    self._generic_command(
  File "C:\Python\lib\site-packages\minimalmodbus.py", line 1170, in _generic_command
    payload_from_slave = self._perform_command(functioncode, payload_to_slave)
  File "C:\Python\lib\site-packages\minimalmodbus.py", line 1240, in _perform_command
    response = self._communicate(request, number_of_bytes_to_read)
  File "C:\Python\lib\site-packages\minimalmodbus.py", line 1406, in _communicate
    raise NoResponseError("No communication with the instrument (no answer)")
minimalmodbus.NoResponseError: No communication with the instrument (no answer)

Please note, both pymodbus and minimalmodbus are using pyserial as serial port backend. I also tried pyserial v3.4 and master branch v3.4.1, both of them failed too. I checked modbus-tk and umodbus, they also use pyserial.

Now, there are some doubtful points:

  1. CP210x driver
  2. ESP32 modbus module
  3. PySerial
  4. PyModbus/minimalmodbus

For CP210x driver, there are some official ESP32 boards using FTDI USB chip, but I've no such board.
For PySerial, it's widely used in many python packages, and this issue existed more then 10 months at least.

@keelung-yang
Copy link
Author

keelung-yang commented Jan 7, 2020

@alisitsyn
I found a tool QModMaster(v0.5.2-2) which use libmodbus(v3.1.4) instead of pymodbus: https://sourceforge.net/projects/qmodmaster
read always OK, but write FAILED:
image
image
Please note, I've changed baudrate to 9600, since 230400 is not supported by QModMaster

@alisitsyn
Copy link
Collaborator

@keelung-yang,
Thank you for your update. I can reproduce the issue with the writing of 57 registers and confirm it.
The freemodbus stack has the maximum limit of registers for one read/write operations but this limit is bigger than 57 registers. It should generate modbus exception when it exceeds the limit. In this case the reason can be different. I will be able to verify this issue next week and will update you then. For now, I would propose you to divide your write requests to contain 55 or less registers.

@alisitsyn
Copy link
Collaborator

alisitsyn commented Jan 20, 2020

Hi @keelung-yang,

Sorry for late response. I investigated the issue and have some results. Let me explain finding step by step.

  1. Writing 56 and more registers:
    It was the issue in the Modbus serial module.
    The portserial.c:vUartTask incorrectly processed the packet longer than 120 bytes because UART driver was triggering two UART_DATA events per one long packet in this case (UART_INTR_RXFIFO_FULL, then UART_INTR_RXFIFO_TOUT).
    UART driver and Modbus serial event handler is fixed to handle receive timeout event correctly and patch is attached. please check it on your side.
    0001-freemodbus-fix-long-buffer-failure-no-temp-frame-buf.zip

  2. The second issue with resets by the script and unreliable reads:
    If the packet received right after reset the Modbus stack is in INIT stage and can be able to process it after timeout (~10ms).
    The host resets board when it opens port and closes. The script is able to write and read more than 56 registers but there is predefined limit in Modbus stack = 120 registers.
    Below is the picture which describes read/write process. These are physical resets through the EN (RESET) pin of ESP32. However, I cannot explain when on your side the script reboots board twice with optimization for size and just once with debug option.
    In my case it resets always twice as showed on the picture.
    test_py_execution_timings

Please let me know if you have any questions.

@keelung-yang
Copy link
Author

@alisitsyn Thanks for your digging! I'm in Chinese New Year holidays now, and will try it once back into office after 10 days.

@keelung-yang
Copy link
Author

keelung-yang commented Mar 3, 2020

@alisitsyn Sorry for delayed validation, my holidays extended by coronavirus outbreak.

The patch works now, and improved our ECU download speed from 2.8 to 3.3KB/s.
I will close this issue once the patch merged into master.

THANKS for your digging again!

@alisitsyn
Copy link
Collaborator

@keelung-yang,
I glad to hear that your issue solved and project is going. It will take some time to merge it.

@keelung-yang
Copy link
Author

@alisitsyn Seems we back to starting point in this issue's title:

If optimize for performance in menuconfig, modbus failed on incomplete message

I noticed freemodbus long buffer issue has been fixed and merged into master , and I tested it on master branch today, it works now.

But the optimize for performance issue still exists, optimize for size works.

C:\dev\esp\bootloader\test\modbus>test.py -p COM4
Namespace(addr=0, baudrate=230400, data=4660, n=56, port='COM4', strict=False, unit=1)
Write Error: Modbus Error: [Input/Output] Modbus Error: [Invalid Message] Incomplete message received, expected at least 2 bytes (0 received)
Read Error: Modbus Error: [Input/Output] No Response received from the remote unit/Unable to decode response

@alisitsyn
Copy link
Collaborator

@keelung-yang,
I was not able to reproduce the issue with optimize for performance option. Could you send your logs with debug option enabled? See below.
log_debug
I need read/write log when you start script. Thanks.

@alisitsyn
Copy link
Collaborator

alisitsyn commented Apr 1, 2020

Could you also try to reproduce the issue on your side with original examples (do not modify console output pins)? If it exists, please send me extended modbus logs.

@keelung-yang
Copy link
Author

@alisitsyn If set log output level to Debug, my board reboot again and again, even if I changed uart pins from tx(1) / rx(3) to tx(25) / rx(26).

Here are log outputs and updated sdkconfig(enable static allocation) since #5046 cause compiling problem.

I'll try original example ASAP.

sdkconfig.txt
putty_log_level_debug_tx1_rx3.log
putty_log_level_debug_tx26_rx26.log

@alisitsyn
Copy link
Collaborator

alisitsyn commented Apr 1, 2020

There is critical exception when your boards start. The only coredump output in the log that can be useful.
You can decode the coredump output saved into coredump.b64 file using this command:
python ~/esp/esp-idf/components/espcoredump/espcoredump.py info_corefile -t b64 -c coredump.b64 build/your_proj_binary.elf -m

In order to solve the #5046, please regenerate the new sdkconfig by default. Your solution has to work too.

@keelung-yang
Copy link
Author

keelung-yang commented Apr 1, 2020

@alisitsyn Here is coredump output for my app: coredump.txt

For original example, if no any change and disconnect all devices except USB cable from my board, idf -p COM4 flash monitor seems OK.
image
But if set log to debug level, it reboot again and again.

And BTW, I'm using ESP32-PICO-KIT V4.1, if you cann't reproduce it, trying same board please.

@alisitsyn
Copy link
Collaborator

I have exactly the same board and do not have any issues when debugging is enabled.
Please try to remove all build artifacts including sdkconfig and rebuild example then start.
It is strange behavior on your side. The original slave example works just fine with debug messages.

@keelung-yang
Copy link
Author

keelung-yang commented Apr 9, 2020

@alisitsyn

But unstable in #5050 if blocksize == 1000 and optimize for performance, after applying this patch.

It's weird. I'm trying back to status in last week, no error output in #5050. so I reset this patch, and optimize for size in menuconfig, but still get error randomly.

10	blocksize=1000	baudrate=115200	bps=78058.70	upto=67.76%	9.8 KB/s
11	Error
12	Error
20	blocksize=1000	baudrate=115200	bps=78267.54	upto=67.94%	9.8 KB/s
21	Error
22	Error
23	Error
30	blocksize=1000	baudrate=115200	bps=78424.12	upto=68.08%	9.8 KB/s
40	blocksize=1000	baudrate=115200	bps=78359.66	upto=68.02%	9.8 KB/s
50	blocksize=1000	baudrate=115200	bps=78602.23	upto=68.23%	9.8 KB/s
51	Error
52	Error

@alisitsyn
Copy link
Collaborator

alisitsyn commented Apr 9, 2020

@keelung-yang,
The existing uart driver is very heavy and it may be a reason for performance and errors.
I did not investigate the issue enough and do not start tests yet, but can advice some points to check for improvement:

  1. In your script to check performance it would be useful to log read bytes in case of error. This would allow to guess about reason of error (if there continuous duplicated bytes read from FIFO for example).
  2. The Kconfig option FMB_TIMER_ISR_IN_IRAM can be used to place timer and UART interrupt handler into IRAM. This could allow to avoid delays for interrupt handling in some cases. The option for UART driver is UART_ISR_IN_IRAM (already set in dmodbus KConfig, but check it in echo_test).
  3. It is possible to modify Modbus port serial driver to use custom interrupt handler. This is the original way for freemodbus to implement serial r/w. This can allow to avoid some overhead of original ESP_IDF UART driver.
  4. There is other fix in the UART driver that reads UART FIFO through APB instead of DPORT as in last patch (may cause side effects). This is also thing to check.

@keelung-yang
Copy link
Author

@alisitsyn
I've enabled both FMB_TIMER_ISR_IN_IRAM and UART_ISR_IN_IRAM in uart_echo example, but no improving performance. BTW, I'm using updated master branch today.

I'll check the fourth if I have enough time.

@alisitsyn
Copy link
Collaborator

@keelung-yang,

I checked the fix as per option four with optimization for performance. It works for some time then starts failing:

10      100     115200  38553.38        33.47%  4.8 KB/s
20      100     115200  39361.50        34.17%  4.9 KB/s
30      100     115200  39596.45        34.37%  4.9 KB/s
40      100     115200  39753.20        34.51%  5.0 KB/s
50      100     115200  39805.55        34.55%  5.0 KB/s
60      100     115200  39834.80        34.58%  5.0 KB/s
70      100     115200  39855.38        34.60%  5.0 KB/s
80      100     115200  39876.91        34.62%  5.0 KB/s
90      100     115200  39894.54        34.63%  5.0 KB/s
100     100     115200  39907.27        34.64%  5.0 KB/s
110     100     115200  39922.53        34.65%  5.0 KB/s
120     100     115200  39949.10        34.68%  5.0 KB/s
130     100     115200  39956.04        34.68%  5.0 KB/s
....
3050    100     115200  40095.52        34.81%  5.0 KB/s
3060    100     115200  40095.56        34.81%  5.0 KB/s
3070    100     115200  40096.04        34.81%  5.0 KB/s
3080    100     115200  40095.94        34.81%  5.0 KB/s
3090    100     115200  40096.44        34.81%  5.0 KB/s
3100    100     115200  40096.52        34.81%  5.0 KB/s
3108 Error:     45:(5e,5d) 46:(5e,58) 47:(5d,30) 48:(58,93) 49:(30,d9) 50:(93,80) 51:(d9,15) 52:(80,d1) 53:(15,d3) 54:(d1,15) 55:(d3,1f) 56:(15,fc) 57:(1f,72) 58:(fc,01) 59:(72,0c) 60:(01,30) 61:(0c,d9) 62:(30,3f) 63:(d9,08) 64:(3f,6b) 65:(08,db) 66:(6b,d3) 67:(db,ff) 68:(d3,7b) 69:(ff,87) 70:(7b,a3) 71:(87,99) 72:(a3,29) 73:(99,dc) 74:(29,2f) 75:(dc,de) 76:(2f,57) 78:(57,fc) 79:(de,5c) 80:(fc,10) 81:(5c,9f) 82:(10,15) 83:(9f,1a) 84:(15,13) 85:(1a,2c) 86:(13,41) 87:(2c,ad) 88:(41,de) 89:(ad,13) 90:(de,d0) 91:(13,74) 92:(d0,0b) 93:(74,56) 94:(0b,20) 95:(56,48) 96:(20,f0) 97:(48,5f) 98:(f0,07) 99:(5f,2d)
340     100     115200  40034.67        34.75%  5.0 KB/s
350     100     115200  40032.62        34.75%  5.0 KB/s
360     100     115200  40031.36        34.75%  5.0 KB/s
370     100     115200  40030.05        34.75%  5.0 KB/s
380     100     115200  40040.59        34.76%  5.0 KB/s
390     100     115200  40040.99        34.76%  5.0 KB/s
391 Error:      31:(c3,1b) 32:(1b,4c) 33:(4c,86) 34:(86,57) 35:(57,09) 36:(09,e7) 37:(e7,4b) 38:(4b,66) 39:(66,15) 40:(15,97) 41:(97,6f) 42:(6f,35) 43:(35,c1) 44:(c1,03) 45:(03,83) 46:(83,87) 47:(87,d7) 48:(d7,61) 49:(61,3a) 50:(3a,21) 51:(21,77) 52:(77,e2) 53:(e2,1e) 54:(1e,16) 55:(16,33) 56:(33,09) 57:(09,17) 58:(17,10) 59:(10,dd) 60:(dd,bd) 61:(bd,43) 62:(43,41) 63:(41,1c) 64:(1c,c0) 65:(c0,e2) 66:(e2,00) 67:(00,c7) 68:(c7,65) 69:(65,7d) 70:(7d,f0) 71:(f0,6d) 72:(6d,99) 73:(99,05) 74:(05,ed) 75:(ed,83) 76:(83,a5) 77:(a5,8f) 78:(8f,be) 79:(be,7c) 80:(7c,46) 81:(46,7b) 82:(7b,a1) 83:(a1,7d) 84:(7d,56) 85:(56,68) 86:(68,18) 87:(18,24) 88:(24,c6) 89:(c6,5c) 90:(5c,23) 92:(23,6f) 93:(6f,3c) 94:(3c,98) 95:(98,49) 96:(49,97) 97:(97,9a) 98:(9a,36) 99:(36,d1)

31:(c3,1b) - means two different bytes at index 31 of data array when compare sent and received data of echo example. Data from 0 up to 30 is the same.

For this fix the bytes (one or two) are inserted in the middle of packet that looks like known hardware issue with reading the RXFIFO through the AHB address. I saw the same behavior before appying the fix but the duplicated byte was inserted in FIFO at index 0 when read FIFO using DPORT. The patch is attached:
0001-freemodbus_fix_opt_perf_comm_fail_apb_read.zip

@keelung-yang
Copy link
Author

keelung-yang commented Apr 12, 2020

@alisitsyn Maybe there are something about Windows 10 version:

On my home computer win10 1909, I got almost same results, 5.0x KB/s, before and after applying your patch.

But on my work computer, before applying your patch, there is 4.0 KB/s. I'll check it's OS version tomorrow, it's older then 1909.

Update

It's win10 1809 on my work computer.
But is not about windows versions, it's about PINs.
I've updated test script and results on my work computer in #5050

@alisitsyn
Copy link
Collaborator

@keelung-yang,

It can be dependent of Windows serial driver when you use communication over USB bridge. I think it is more important to solve Errors during communication. I will let you know when have new information related to this topic but its cause is rare enough.

Thanks for update.

@keelung-yang
Copy link
Author

@alisitsyn I bought a new ESP32 board ESP-WROVER-KIT V4.1, which using FTDI FT2232HL chip as USB bridge. And benchmarked it as in #5050 once it arrived today. But no obvious improvement. So maybe it's not about windows serial driver since wrover board is using a different USB bridge chip.

I agree solving errors is more important then improving performance, whether developing drivers or apps. But our aplication is aware of performance: we use esp32 to download ECU by CAN bus on vehicle, the battery in both notebook computer and vehicle are limited. For example, need about 10 minutes to download a 5MB S-Record file, only 3~4KB/s on 500kbps CAN bus, and UART communication takes too more time.

@alisitsyn
Copy link
Collaborator

@keelung-yang,
Ok, I can agree that it may be UART driver performance issue. I tested the UART with the patch I sent you before (0001-freemodbus_fix_opt_perf_comm_fail.zip) and it works just fine for several hours with echo example over USB bridge (packet = 100 and 1000 bytes). According to this I think the issue UART communication is solved. Could you try your tests again with the above patch? For Modbus (serial) performance standpoint I think the option with custom serial interrupt handler would work. UART driver performance will be addressed by #5050. Please let me know about your results/ideas.

6730    1000    115200  80044.64        69.48%  10.0 KB/s
6740    1000    115200  80044.70        69.48%  10.0 KB/s
6750    1000    115200  80043.68        69.48%  10.0 KB/s
6760    1000    115200  80044.66        69.48%  10.0 KB/s
6770    1000    115200  80043.72        69.48%  10.0 KB/s
6780    1000    115200  80043.75        69.48%  10.0 KB/s
6790    1000    115200  80044.76        69.48%  10.0 KB/s
6800    1000    115200  80044.78        69.48%  10.0 KB/s
6810    1000    115200  80043.85        69.48%  10.0 KB/s
6820    1000    115200  80044.80        69.48%  10.0 KB/s
6830    1000    115200  80044.84        69.48%  10.0 KB/s
6840    1000    115200  80043.88        69.48%  10.0 KB/s
6850    1000    115200  80043.89        69.48%  10.0 KB/s
6860    1000    115200  80043.88        69.48%  10.0 KB/s
6870    1000    115200  80044.79        69.48%  10.0 KB/s
6880    1000    115200  80043.89        69.48%  10.0 KB/s
6890    1000    115200  80044.89        69.48%  10.0 KB/s

test_results

@keelung-yang
Copy link
Author

keelung-yang commented Apr 25, 2020

@alisitsyn I don't think it's solved. UART_ISR_IN_IRAM cann't enable on latest master. And there are still some failed cases without UART_ISR_IN_IRAM.

If you doubt Windows driver, you can bypass ESP32 board by shorting Tx/Rx pins, and then run test script without any modifying.

Hear are my test results:

  • Host OS: Windows 10 64bit v10.0.18363.815
  • Host UART: FTDI FT232RL, USB to TTL
  • ESP-IDF: master, updated on 2020-04-24, 741960d
  • ESP32 Board: ESP32-PICO-KIT V4.1
  • ESP32 Board CPU frequency: 240 MHz
  • ESP32 Board UART Driver: 10.1.8.2466
  • Base Baudrate: 115200 bps
  • Test Cycle: 1000
Patched Optimize IRAM Baudrate BlockSize Theory Real Failed
No Size Disable x 1 100 14.4 KB/s 4.05 KB/s 0
No Size Disable x 1 1000 14.4 KB/s 9.97 KB/s 0
No Size Disable x 5 100 72 KB/s 6.56 KB/s 0
No Size Disable x 5 1000 72 KB/s 28.77 KB/s 0
No Size Disable x 10 100 144 KB/s 6.64 KB/s 0
No Size Disable x 10 1000 144 KB/s 40.24 KB/s 0
No Size Enable x 1 100 14.4 KB/s ALL
No Size Enable x 1 1000 14.4 KB/s ALL
No Size Enable x 5 100 72 KB/s ALL
No Size Enable x 5 1000 72 KB/s ALL
No Size Enable x 10 100 144 KB/s ALL
No Size Enable x 10 1000 144 KB/s ALL
Yes Perf Enable x 1 100 14.4 KB/s ALL
Yes Perf Enable x 1 1000 14.4 KB/s ALL
Yes Perf Enable x 5 100 72 KB/s ALL
Yes Perf Enable x 5 1000 72 KB/s ALL
Yes Perf Enable x 10 100 144 KB/s ALL
Yes Perf Enable x 10 1000 144 KB/s ALL
Yes Perf Disable x 1 100 14.4 KB/s 4.01 KB/s 0
Yes Perf Disable x 1 1000 14.4 KB/s 27.84 KB/s 0
Yes Perf Disable x 5 100 72 KB/s 6.59 KB/s 0
Yes Perf Disable x 5 1000 72 KB/s 28.73 KB/s 3~5
Yes Perf Disable x 10 100 144 KB/s 6.63 KB/s 0
Yes Perf Disable x 10 1000 144 KB/s 40.12 KB/s 2~5

@alisitsyn
Copy link
Collaborator

alisitsyn commented Apr 28, 2020

@keelung-yang,

UART_ISR_IN_IRAM cann't enable on latest master.

Do you have issues with the uart_echo example crashes when CONFIG_UART_ISR_IN_IRAM is set? In my case it crashes and increasing the stack usStackDepth for the echo_task helps. Please clarify what kind of issue you have with the setting of CONFIG_UART_ISR_IN_IRAM. On my side it can be set and works without failures after solving the above issue.

@keelung-yang
Copy link
Author

@alisitsyn Yes. It's crashing. Here are my settings & log
sdkconfig.txt
putty.log

@keelung-yang
Copy link
Author

Is there any update? This reoccurable issue have been fired almost half a year.

@alisitsyn
Copy link
Collaborator

@keelung-yang,

This issue is closed. In order to track progress appropriately please open new separate issue for this.

Thanks.

@keelung-yang
Copy link
Author

@alisitsyn But why? It's not solved.

@alisitsyn
Copy link
Collaborator

@keelung-yang,

The issue you mentioned is about uart-echo example and uart driver in master branch. It is not about freemodbus component itself and I propose to open new issue.

@keelung-yang
Copy link
Author

@alisitsyn OK then. I'll split it into uart and freemodbus issues, and rewrite reoccurable scripts/steps/setting ... on master branch, once it updated in this week.

@alisitsyn
Copy link
Collaborator

@keelung-yang,

Ok. Thanks.

@alisitsyn
Copy link
Collaborator

Hi @keelung-yang,
The merge of the fixes we discussed will take more time than expected because of some dependencies. I will let you know when it is ready for merge.

@alisitsyn
Copy link
Collaborator

Hi @keelung-yang,

The fixes for issues discussed here have been merged. Could you check it on your side? In case it still exists please reopen separate issues against UART performance, freemodbus or something else. Thanks.

@keelung-yang
Copy link
Author

keelung-yang commented Aug 19, 2020

@alisitsyn Optimizing for performance setting has no infinite reboot loop now on my board. But sending modbus message still get incomplete message in PyModbus (board rebooted I guess). I'm not sure whether the cause is in my app or esp-idf.

I think I need to reproduce it on official examples before firing a new issue, but I'm on other projects in full-time. I'll dig this once released from other projects. If reproducable issuse still exists then I'll fire a new issue.

Thank you very very much!

@alisitsyn
Copy link
Collaborator

@keelung-yang,

I will try to reproduce your issue on my side little bit later. I will be waiting for your test results as well.

Thank you for your feedback and feel free to open new issues.

@keelung-yang
Copy link
Author

keelung-yang commented Aug 22, 2020

@alisitsyn I tried on my simplified modbus test app. It still failed on optimizing for performance setting.

C:\dev\esp\PicoBox\test\modbus>test.py -p COM5
Namespace(addr=0, baudrate=230400, data=4660, n=56, port='COM5', strict=False, unit=1)
Write Error: Modbus Error: [Input/Output] Modbus Error: [Invalid Message] Incomplete message received, expected at least 2 bytes (0 received)
Read Error: Modbus Error: [Input/Output] No Response received from the remote unit/Unable to decode response

As shown in output of test.py and the caption of this issue, we are going back to the beginning.
Here is the source code to reproduct it and the console log:
test_modbus.zip
putty.log

@alisitsyn
Copy link
Collaborator

alisitsyn commented Aug 25, 2020

@keelung-yang,

Thank you for information. I was able to reproduce this. The issue is related to side effects when access FIFOs. The required fix for this was dropped from my MR because it was implemented in updated version of 2020r2 toolchain (compiler inserts memw instrruction as it was in patch above).
I am working on this and will let you know as soon as possible.

@alisitsyn
Copy link
Collaborator

Update for the issue:
I was able to confirm this bug using official uart_events example.
The actual reason for the failure is that uart driver uart_hal_read_rxfifo() reads the UART FIFO using DPORT address.
Actual cause:
The packet sent: 0x01 0x03 0x00 0x00 0x00 0x01 0x84 0x0A
The packet received from FIFO: 0x01 0x01 0x03 0x00 0x00 0x00 0x01 0x84 0x00
One duplicated byte is inserted at first position when reading the FIFO. This causes the Modbus driver to skip the packet as incorrect. The additional "memw" instruction fixes the issue as below:
components/soc/esp32/include/hal/uart_ll.h:

static inline void uart_ll_read_rxfifo(uart_dev_t *hw, uint8_t *buf, uint32_t rd_len)
{
    //Get the UART APB fifo addr. Read fifo, we use APB address
    uint32_t fifo_addr = (hw == &UART0) ? UART_FIFO_REG(0) : (hw == &UART1) ? UART_FIFO_REG(1) : UART_FIFO_REG(2);
    for(int i = 0; i < rd_len; i++) {
        __asm__ __volatile__("memw");
        buf[i] = READ_PERI_REG(fifo_addr);
    }
}

The bug happens only when CONFIG_ESP32_DEFAULT_CPU_FREQ_240=y and CONFIG_COMPILER_OPTIMIZATION_PERF=y, otherwise the UART works just fine.
I think this is access desynchronization issue between the DPORT which uses CPU frequency and AHB bus (uses different frequency). The additional "memw" instruction adds additional cycles to synchronize access.
Please try the above fix and try to set the CPU frequency to 160Mhz. Any of these options should solve the issue.
Please let me know the result. Thanks.

@keelung-yang
Copy link
Author

@alisitsyn I've test __asm__ __volatile__("memw"); both in demo app test_modbus.zip posted here, and a more complicated application in a real world, downloading ECU by ISO14229-3 UDSonCAN by modbus from PC to ESP32 to ECU.

This one line patch works! THANKS!

@alisitsyn
Copy link
Collaborator

@keelung-yang,
Thank you for the test results. I am glad that it works for you. The issue with UART FIFO will be investigated by digital team and documented later. As a temporary fix the nop.n instructions will be added to add additional delay cycles. Please feel free to open new issues in separate areas once you find them.

@AxelLin
Copy link
Contributor

AxelLin commented Feb 26, 2022

The uart fix is in 3a51584, maybe this issue should be closed?

@Alvin1Zhang
Copy link
Collaborator

Thanks for reporting and sharing the notes @AxelLin , feel free to reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants