W5500 ethernet examples don't work unless compiler is set to Debug (-Og) (IDFGH-4776) #6579

ntwallace · 2021-02-19T17:36:21Z

Environment

Module or chip used: ESP32-WROVER-E
IDF version: v4.4-dev
Build System: CMake
Compiler version: xtensa-esp32-elf-gcc (crosstool-NG esp-2020r3) 8.4.0
Operating System: macOS
Using an IDE?: VSCode
Power Supply: USB & External

Problem Description

The existing W5500 ethernet implementation does not work unless COMPILER_OPTIMIZATION = COMPILER_OPTIMIZATION_DEFAULT. Setting compiler flags to any other level results in WDT panics from the TCP/IP task. This is true of all examples and other code. I have tested with IDF 4.2-release, 4.3-dev, and 4.4-dev.

This appears similar to #6233, where the emac is stuck inside a while loop at the end of the transmit function, but in this case it is stuck inside the do while loop in w5500_get_rx_received_size

static esp_err_t w5500_get_rx_received_size(emac_w5500_t *emac, uint16_t *size)
{
    esp_err_t ret = ESP_OK;
    uint16_t received0, received1 = 0;
    do {
        MAC_CHECK(w5500_read(emac, W5500_REG_SOCK_RX_RSR(0), &received0, sizeof(received0)) == ESP_OK, "read RX RSR failed", err, ESP_FAIL);
        MAC_CHECK(w5500_read(emac, W5500_REG_SOCK_RX_RSR(0), &received1, sizeof(received1)) == ESP_OK, "read RX RSR failed", err, ESP_FAIL);
        ESP_LOGI(TAG, "In get_rx_received_size loop");
    } while (received0 != received1);  **// stuck here**
    *size = __builtin_bswap16(received0);

err:
    return ret;
}

Expected Behavior

W5500 implementation works at all compile levels.

Actual Behavior

WDT panic, I have added LOGIs inside all the while loops in esp_eth_mac_w5500.c:

I (1522) esp_netif_objects: esp_netif_add_to_list 0x3ffe1440
I (1527) esp_netif_objects: esp_netif_add_to_list netif added successfully (total netifs: 1)
I (1554) w5500-mac: EMAC setup parent functions
I (1554) w5500-mac: EMAC created SPI lock
I (1554) w5500-mac: EMAC created task
I (1559) w5500-mac: version=4
I (1573) esp_eth.netif.glue: 02:00:00:12:34:56
I (1573) esp_eth.netif.glue: ethernet attached to netif
I (3584) w5500-phy: Updating link status
I (5584) w5500-phy: Updating link status
I (7584) w5500-phy: Updating link status
I (9584) w5500-phy: Updating link status
I (11584) w5500-phy: Updating link status
I (13584) w5500-phy: Updating link status
I (13584) w5500-mac: Link 100M
I (13584) w5500-mac: Full duplex
I (13585) w5500-mac: link is up
I (13589) w5500-mac: In get_tx_free_size loop
I (13782) w5500-mac: In get_rx_received_size loop
I (13783) w5500-mac: In get_rx_received_size loop
I (13783) w5500-mac: In get_rx_received_size loop
I (13787) w5500-mac: In get_rx_received_size loop
I (13793) w5500-mac: In get_rx_received_size loop
I (13798) w5500-mac: In get_rx_received_size loop
I (13804) w5500-mac: In get_rx_received_size loop
I (13809) w5500-mac: In get_rx_received_size loop

...

E (18781) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (18781) task_wdt:  - IDLE (CPU 0)
E (18781) task_wdt: Tasks currently running:
E (18781) task_wdt: CPU 0: w5500_tsk
E (18781) task_wdt: CPU 1: IDLE
E (18781) task_wdt: Aborting.

Steps to reproduce

Set COMPILER_OPTIMIZATION to level other than DEBUG (-Og) in menuconfig for ethernet examples
Flash and get WDT reset

The text was updated successfully, but these errors were encountered:

ntwallace · 2021-02-19T19:56:41Z

Quick fix based on similar solution in #6233

static esp_err_t w5500_get_rx_received_size(emac_w5500_t *emac, uint16_t *size)
{
    esp_err_t ret = ESP_OK;
    uint16_t received0, received1 = 0;
    volatile uint16_t timeout = 0;

    do {
        MAC_CHECK(w5500_read(emac, W5500_REG_SOCK_RX_RSR(0), &received0, sizeof(received0)) == ESP_OK, "read RX RSR failed", err, ESP_FAIL);
        MAC_CHECK(w5500_read(emac, W5500_REG_SOCK_RX_RSR(0), &received1, sizeof(received1)) == ESP_OK, "read RX RSR failed", err, ESP_FAIL);
    } while (received0 != received1 && (timeout++ < 5000));

    if (timeout >= 5000 && (received0 != received1)) {
        ESP_LOGE(TAG, "Timeout exceeded: %d, received0 = %d, received1 = %d\n", timeout, received0, received1); 
        ret = ESP_FAIL;
        goto err;
    }

    *size = __builtin_bswap16(received0);

err:
    return ret;
}

david-cermak · 2021-04-08T08:47:10Z

Hi @ntwallace

Thanks for reporting this bug. The reason for this infinite looping is different from the linked issue, though (we should read the register multiple times according to the datasheet) The problem here is that the underlying SPI communication uses 4-byte boundary operations and thus rewriting the other register during the second read (registers are 2 byte size, but usually placed 4 bytes apart, esp. for -Og)

The below patch should fix the issue:
esp_eth-Fix-w5500-to-correctly-read-registers-on-Os.patch.txt

KaeLL · 2021-04-15T15:37:54Z

@david-cermak would you care to elaborate a bit more about this issue? I didn't understand.

david-cermak · 2021-04-17T16:33:37Z

@KaeLL The main trouble was in reading w5500 registers with SPI driver using DMA, which implies 1) data must be aligned to 4 byte boundary 2) data size must min 4 bytes.
If you look at the code above #6579 (comment), we're reading 16bit registers directly to a stack memory with no restrictions about any 4 byte alignment or padding, so it is only up to the compiler to decide where and how the two uint16_t will be placed. And, here, the xtensa-esp32-elf-gcc(8.4.0) (with -Os) decided to place these two stack variables exactly next to each other (in a packed way), so every read operation wrote 4 bytes on stack corrupting the other register to never exit this loop (Curiously, the same version of xtensa-esp32s2-elf-gcc kept the variables padded, so the issue wasn't reproducible on S2)

Fixed by using direct reads to the SPI internal structure (which correctly defines the alignment and padding requirements) for all registers of size smaller than 4 bytes.

An alternatively we could promote all register declarations to int, e.g.

    uint32_t received0, received1 = 0;

I'm also looking at other drivers which could suffer from the same symptoms.

Reading SPI data may come in 4-byte units and thus result in unwanted overwrites if smaller size registers read, especially if multiple placed one after another. Fixed by using direct reads to `trans` structure for sizes smaller or equal to 4. Closes #6579

github-actions bot changed the title ~~W5500 ethernet examples don't work unless compiler is set to Debug (-Og)~~ W5500 ethernet examples don't work unless compiler is set to Debug (-Og) (IDFGH-4776) Feb 19, 2021

david-cermak mentioned this issue Apr 9, 2021

W5500 Ethernet driver bug (IDFGH-4400) #6233

Closed

espressif-bot added the Status: Done Issue is done internally label Apr 12, 2021

espressif-bot closed this as completed in dfe9580 Apr 15, 2021

This was referenced May 5, 2021

Ethernet Basic Example not working with W5500 (IDFGH-4672) #6484

Closed

UDP freezes after a few seconds with W5500, spi transmit fails. (IDFGH-4769) #6572

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

W5500 ethernet examples don't work unless compiler is set to Debug (-Og) (IDFGH-4776) #6579

W5500 ethernet examples don't work unless compiler is set to Debug (-Og) (IDFGH-4776) #6579

ntwallace commented Feb 19, 2021 •

edited

Loading

ntwallace commented Feb 19, 2021 •

edited

Loading

david-cermak commented Apr 8, 2021

KaeLL commented Apr 15, 2021

david-cermak commented Apr 17, 2021

W5500 ethernet examples don't work unless compiler is set to Debug (-Og) (IDFGH-4776) #6579

W5500 ethernet examples don't work unless compiler is set to Debug (-Og) (IDFGH-4776) #6579

Comments

ntwallace commented Feb 19, 2021 • edited Loading

Environment

Problem Description

Expected Behavior

Actual Behavior

Steps to reproduce

ntwallace commented Feb 19, 2021 • edited Loading

david-cermak commented Apr 8, 2021

KaeLL commented Apr 15, 2021

david-cermak commented Apr 17, 2021

ntwallace commented Feb 19, 2021 •

edited

Loading

ntwallace commented Feb 19, 2021 •

edited

Loading