Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

W5500 ethernet examples don't work unless compiler is set to Debug (-Og) (IDFGH-4776) #6579

Closed
ntwallace opened this issue Feb 19, 2021 · 4 comments
Labels
Status: Done Issue is done internally

Comments

@ntwallace
Copy link

ntwallace commented Feb 19, 2021

Environment

  • Module or chip used: ESP32-WROVER-E
  • IDF version: v4.4-dev
  • Build System: CMake
  • Compiler version: xtensa-esp32-elf-gcc (crosstool-NG esp-2020r3) 8.4.0
  • Operating System: macOS
  • Using an IDE?: VSCode
  • Power Supply: USB & External

Problem Description

The existing W5500 ethernet implementation does not work unless COMPILER_OPTIMIZATION = COMPILER_OPTIMIZATION_DEFAULT. Setting compiler flags to any other level results in WDT panics from the TCP/IP task. This is true of all examples and other code. I have tested with IDF 4.2-release, 4.3-dev, and 4.4-dev.

This appears similar to #6233, where the emac is stuck inside a while loop at the end of the transmit function, but in this case it is stuck inside the do while loop in w5500_get_rx_received_size

static esp_err_t w5500_get_rx_received_size(emac_w5500_t *emac, uint16_t *size)
{
    esp_err_t ret = ESP_OK;
    uint16_t received0, received1 = 0;
    do {
        MAC_CHECK(w5500_read(emac, W5500_REG_SOCK_RX_RSR(0), &received0, sizeof(received0)) == ESP_OK, "read RX RSR failed", err, ESP_FAIL);
        MAC_CHECK(w5500_read(emac, W5500_REG_SOCK_RX_RSR(0), &received1, sizeof(received1)) == ESP_OK, "read RX RSR failed", err, ESP_FAIL);
        ESP_LOGI(TAG, "In get_rx_received_size loop");
    } while (received0 != received1);  **// stuck here**
    *size = __builtin_bswap16(received0);

err:
    return ret;
}

Expected Behavior

W5500 implementation works at all compile levels.

Actual Behavior

WDT panic, I have added LOGIs inside all the while loops in esp_eth_mac_w5500.c:

I (1522) esp_netif_objects: esp_netif_add_to_list 0x3ffe1440
I (1527) esp_netif_objects: esp_netif_add_to_list netif added successfully (total netifs: 1)
I (1554) w5500-mac: EMAC setup parent functions
I (1554) w5500-mac: EMAC created SPI lock
I (1554) w5500-mac: EMAC created task
I (1559) w5500-mac: version=4
I (1573) esp_eth.netif.glue: 02:00:00:12:34:56
I (1573) esp_eth.netif.glue: ethernet attached to netif
I (3584) w5500-phy: Updating link status
I (5584) w5500-phy: Updating link status
I (7584) w5500-phy: Updating link status
I (9584) w5500-phy: Updating link status
I (11584) w5500-phy: Updating link status
I (13584) w5500-phy: Updating link status
I (13584) w5500-mac: Link 100M
I (13584) w5500-mac: Full duplex
I (13585) w5500-mac: link is up
I (13589) w5500-mac: In get_tx_free_size loop
I (13782) w5500-mac: In get_rx_received_size loop
I (13783) w5500-mac: In get_rx_received_size loop
I (13783) w5500-mac: In get_rx_received_size loop
I (13787) w5500-mac: In get_rx_received_size loop
I (13793) w5500-mac: In get_rx_received_size loop
I (13798) w5500-mac: In get_rx_received_size loop
I (13804) w5500-mac: In get_rx_received_size loop
I (13809) w5500-mac: In get_rx_received_size loop

...

E (18781) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (18781) task_wdt:  - IDLE (CPU 0)
E (18781) task_wdt: Tasks currently running:
E (18781) task_wdt: CPU 0: w5500_tsk
E (18781) task_wdt: CPU 1: IDLE
E (18781) task_wdt: Aborting.

Steps to reproduce

  1. Set COMPILER_OPTIMIZATION to level other than DEBUG (-Og) in menuconfig for ethernet examples
  2. Flash and get WDT reset
@github-actions github-actions bot changed the title W5500 ethernet examples don't work unless compiler is set to Debug (-Og) W5500 ethernet examples don't work unless compiler is set to Debug (-Og) (IDFGH-4776) Feb 19, 2021
@ntwallace
Copy link
Author

ntwallace commented Feb 19, 2021

Quick fix based on similar solution in #6233

static esp_err_t w5500_get_rx_received_size(emac_w5500_t *emac, uint16_t *size)
{
    esp_err_t ret = ESP_OK;
    uint16_t received0, received1 = 0;
    volatile uint16_t timeout = 0;

    do {
        MAC_CHECK(w5500_read(emac, W5500_REG_SOCK_RX_RSR(0), &received0, sizeof(received0)) == ESP_OK, "read RX RSR failed", err, ESP_FAIL);
        MAC_CHECK(w5500_read(emac, W5500_REG_SOCK_RX_RSR(0), &received1, sizeof(received1)) == ESP_OK, "read RX RSR failed", err, ESP_FAIL);
    } while (received0 != received1 && (timeout++ < 5000));

    if (timeout >= 5000 && (received0 != received1)) {
        ESP_LOGE(TAG, "Timeout exceeded: %d, received0 = %d, received1 = %d\n", timeout, received0, received1); 
        ret = ESP_FAIL;
        goto err;
    }

    *size = __builtin_bswap16(received0);

err:
    return ret;
}

@david-cermak
Copy link
Collaborator

Hi @ntwallace

Thanks for reporting this bug. The reason for this infinite looping is different from the linked issue, though (we should read the register multiple times according to the datasheet) The problem here is that the underlying SPI communication uses 4-byte boundary operations and thus rewriting the other register during the second read (registers are 2 byte size, but usually placed 4 bytes apart, esp. for -Og)

The below patch should fix the issue:
esp_eth-Fix-w5500-to-correctly-read-registers-on-Os.patch.txt

@KaeLL
Copy link
Contributor

KaeLL commented Apr 15, 2021

@david-cermak would you care to elaborate a bit more about this issue? I didn't understand.

@david-cermak
Copy link
Collaborator

@KaeLL The main trouble was in reading w5500 registers with SPI driver using DMA, which implies 1) data must be aligned to 4 byte boundary 2) data size must min 4 bytes.
If you look at the code above #6579 (comment), we're reading 16bit registers directly to a stack memory with no restrictions about any 4 byte alignment or padding, so it is only up to the compiler to decide where and how the two uint16_t will be placed. And, here, the xtensa-esp32-elf-gcc(8.4.0) (with -Os) decided to place these two stack variables exactly next to each other (in a packed way), so every read operation wrote 4 bytes on stack corrupting the other register to never exit this loop (Curiously, the same version of xtensa-esp32s2-elf-gcc kept the variables padded, so the issue wasn't reproducible on S2)

Fixed by using direct reads to the SPI internal structure (which correctly defines the alignment and padding requirements) for all registers of size smaller than 4 bytes.

An alternatively we could promote all register declarations to int, e.g.

    uint32_t received0, received1 = 0;

I'm also looking at other drivers which could suffer from the same symptoms.

projectgus pushed a commit that referenced this issue Jun 30, 2021
Reading SPI data may come in 4-byte units and thus result in unwanted
overwrites if smaller size registers read, especially if multiple placed
one after another. Fixed by using direct reads to `trans` structure for
sizes smaller or equal to 4.

Closes #6579
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

4 participants