Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UART RX inserts extra byte when Clock is 240MHz and Optimize is set for performance (IDFGH-3934) #5824

Closed
davidthings opened this issue Sep 3, 2020 · 4 comments

Comments

@davidthings
Copy link

This is actually noted and solved in #4499, but the context (modbus) might not lead people to the correct solution.

Per #4499 (comment) , an extra line is needed in the uart hal internals)

static inline void uart_ll_read_rxfifo(uart_dev_t *hw, uint8_t *buf, uint32_t rd_len)
{
    //Get the UART APB fifo addr. Read fifo, we use APB address
    uint32_t fifo_addr = (hw == &UART0) ? UART_FIFO_REG(0) : (hw == &UART1) ? UART_FIFO_REG(1) : UART_FIFO_REG(2);
    for(int i = 0; i < rd_len; i++) {
        __asm__ __volatile__("memw");  // <<<<<<< Add this
        buf[i] = READ_PERI_REG(fifo_addr);
    }
}

At least on my set up, this does fix the bug. Since I've spent some time describing the problem before I stumbled on the fix, here it is!

Environment

  • Development Kit: ESP32-DevKitC
  • Kit version: v4
  • Module or chip used: ESP32-WROOM-32D
  • IDF version: v4.2-dev-2678-g6c17e3a64-dirty
  • Build System: idf.py
  • Compiler version :xtensa-esp32-elf-gcc (crosstool-NG esp-2020r2) 8.2.0
  • Operating System: Linux
  • Using an IDE?: No
  • Power Supply: USB

Problem Description

When reading from the UART at 240MHz with optimize for performance, the first byte of a message is duplicated (and the last lost).

The problem appears independent of FreeRTOS tick frequency, or baud rate.

The test set up is a loopback from pin 17 -> pin 16, and run the following:

#include <esp_pthread.h>
#include <freertos/FreeRTOS.h>
#include <freertos/task.h>
#include <esp_log.h>
#include <esp_task_wdt.h>
#include "esp_system.h"
#include "driver/uart.h"
#include "driver/gpio.h"

#define UART_RX_BUFFER_SIZE 1024 * 2
#define UART_TX_BUFFER_SIZE 1024 * 2

void app_main() {

    int uart_num = 2;
    int tx_pin = 17;
    int rx_pin = 16;
    int baud = 115200;
    // int baud = 1000000;

    const uart_config_t uart_config = {
        .baud_rate = baud,
        .data_bits = UART_DATA_8_BITS,
        .parity = UART_PARITY_DISABLE,
        .stop_bits = UART_STOP_BITS_1,
        .flow_ctrl = UART_HW_FLOWCTRL_DISABLE,
        .rx_flow_ctrl_thresh = 0,
        .source_clk = UART_SCLK_APB,
    };

    uart_param_config(uart_num, &uart_config);
    uart_driver_install(uart_num, UART_RX_BUFFER_SIZE, UART_TX_BUFFER_SIZE, 0, NULL, 0);
    uart_set_pin(uart_num, tx_pin, rx_pin, UART_PIN_NO_CHANGE, UART_PIN_NO_CHANGE);

    int size = 8;

    uint8_t out_buffer[ size ];
    for ( int i = 0; i < size; i++ )
        out_buffer[ i ] = i + 10;

    for ( int trial = 0; trial < 4; trial++ ) {

        printf( "Trial %d\n", trial );

        // Write the buffer out
        printf( "    Write %d\n", size );
        uart_write_bytes(uart_num, (const char*)out_buffer, size );

        int total = 0;
        int count = 0;

        // Attempt to read the buffer back in
        uint8_t in_buffer[ 2 * size ];

        while ( ( count < 10000 ) && ( total < size ) ) {
            int s = uart_read_bytes(uart_num, &in_buffer[total], size-total, 0 );
            printf( "    Read %d\n", s );
            if ( s > 0 )
                total += s;
            if ( total < size ) {
                vTaskDelay( 1 );
            }
            count++;
        }

        // Wait around for any extra chars
        vTaskDelay( 10 );

        int s = uart_read_bytes(uart_num, &in_buffer[total], size-total, 0 );
        printf( "    Extra Read %d\n", s );


        if ( total != size ) {
            printf( "   Failed Size %d\n", total );
        } else {
            printf( "    Correct Length %d (Loops %d) \n", total, count );

            int cc = 0;
            for ( int i = 0; i < total; i++ ) {
                if ( in_buffer[ i ] == out_buffer[ i ] ) {
                    cc++;
                }
            }

            if ( cc != total ) {
                printf( "    Read Failure (%d/%d correct) \n", cc, size );
                for ( int i = 0; i < total; i++ ) {
                    printf( "        %02d: O:%d I:%d \n", i, (int)out_buffer[ i ], (int)in_buffer[ i ] );
                }

            }
        }
    }
}

Expected Behavior

When either running slower than 240MHz or not optimizing for performance this is the output

Trial 0
    Write 8
    Read 0
    Read 8
    Extra Read 0
    Correct Length 8 (Loops 2) 
Trial 1
    Write 8
    Read 0
    Read 0
    Read 8
    Extra Read 0
    Correct Length 8 (Loops 3) 
Trial 2
    Write 8
    Read 0
    Read 0
    Read 8
    Extra Read 0
    Correct Length 8 (Loops 3) 
Trial 3
    Write 8
    Read 0
    Read 0
    Read 8
    Extra Read 0
    Correct Length 8 (Loops 3) 

Actual Behavior

Whereas - at 240MHz and Optimizing for performance (o2), this is the result:

Trial 0
    Write 8
    Read 0
    Read 8
    Extra Read 0
    Correct Length 8 (Loops 2) 
    Read Failure (1/8 correct) 
        00: O:10 I:10 
        01: O:11 I:10 
        02: O:12 I:11 
        03: O:13 I:12 
        04: O:14 I:13 
        05: O:15 I:14 
        06: O:16 I:15 
        07: O:17 I:16 
Trial 1
    Write 8
    Read 0
    Read 8
    Extra Read 0
    Correct Length 8 (Loops 2) 
    Read Failure (1/8 correct) 
        00: O:10 I:10 
        01: O:11 I:10 
        02: O:12 I:11 
        03: O:13 I:12 
        04: O:14 I:13 
        05: O:15 I:14 
        06: O:16 I:15 
        07: O:17 I:16 
Trial 2
    Write 8
    Read 0
    Read 0
    Read 8
    Extra Read 0
    Correct Length 8 (Loops 3) 
    Read Failure (1/8 correct) 
        00: O:10 I:10 
        01: O:11 I:10 
        02: O:12 I:11 
        03: O:13 I:12 
        04: O:14 I:13 
        05: O:15 I:14 
        06: O:16 I:15 
        07: O:17 I:16 
Trial 3
    Write 8
    Read 0
    Read 0
    Read 8
    Extra Read 0
    Correct Length 8 (Loops 3) 
    Read Failure (1/8 correct) 
        00: O:10 I:10 
        01: O:11 I:10 
        02: O:12 I:11 
        03: O:13 I:12 
        04: O:14 I:13 
        05: O:15 I:14 
        06: O:16 I:15 
        07: O:17 I:16 

Steps to reproduce

Build with and without 240MHz & O2 optimization.

@github-actions github-actions bot changed the title UART RX inserts extra byte when Clock is 240MHz and Optimize is set for performance UART RX inserts extra byte when Clock is 240MHz and Optimize is set for performance (IDFGH-3934) Sep 3, 2020
@alisitsyn
Copy link
Collaborator

Hi @davidthings,
Thank you for this issue. This is a desynchronization issue between DPORT and AHB when reading FIFO. I initiated an investigation involving the digital team in order to find the correct solution. As a temporary solution propose to add delay ticks into the loop when reading the FIFO. However, I understand it can be dependent on some factors and might not solve the issue reliably. I propose to wait for a response from the digital team.

Current solution similar to what I proposed in #4499 and it works on my side is:
(the memw barrier is inserted by compiler 2020r2)

static inline void uart_ll_read_rxfifo(uart_dev_t *hw, uint8_t *buf, uint32_t rd_len)
{
    //Get the UART APB fifo addr. Read fifo, we use APB address
    uint32_t fifo_addr = (hw == &UART0) ? UART_FIFO_REG(0) : (hw == &UART1) ? UART_FIFO_REG(1) : UART_FIFO_REG(2);
    for(int i = 0; i < rd_len; i++) {
        buf[i] = READ_PERI_REG(fifo_addr);
#ifdef CONFIG_COMPILER_OPTIMIZATION_PERF
        __asm__ __volatile__("nop"); // add additional delays to synchronize access
        __asm__ __volatile__("nop");
#endif
    }
}

I will let you know as soon as I get new information or results.

@alisitsyn
Copy link
Collaborator

Hi @davidthings,

Below is the information about this issue:
3.16. There are restrictions on the CPU access to the two address spaces 0x3FF0_0000 ~ 0x3FF1_EFFF and 0x3FF4_0000 ~ 0x3FF7_FFFF.
description:

  1. The CPU read operations that fall in these two address spaces are speculative. Speculative read operations will cause the behavior of the program description to be inconsistent with the actual behavior of the hardware.
  2. If two CPUs continuously access the contents of the address space 0x3FF0_0000 ~ 0x3FF1_EFFF at the same time, some of the accesses may be lost.
  3. When the CPU reads the FIFO through the 0x3FF4_0000 ~ 0x3FF7_0000 address space, the FIFO read pointer will be updated with a delay. After the CPU frequency increases, the time interval between two consecutive FIFO reads initiated by the CPU is shortened. When a new read FIFO request comes, the FIFO read pointer has not been updated, which will cause the CPU to read the value of the previous read FIFO.

Solution:

  1. The CPU accesses falling in these two address spaces need to add the "MEMW" instruction before the corresponding operation. That is, in C/C++, software needs to add the "volatile" attribute when accessing the registers in these two addresses.
  2. When the CPU frequency is 160 MHz, add 6 “nop” instructions between two consecutive FIFO reads. When the CPU frequency is 240 MHz, add 7 "nop" instructions between two consecutive FIFO reads.

The preliminary fix for driver to read rx FIFO is already merged. The documentation update is in progress.

@alisitsyn
Copy link
Collaborator

The issue has been fixed (Commit: 3a51584) and ECO document has been updated: www.espressif.com/sites/default/files/documentation/eco_and_workarounds_for_bugs_in_esp32_en.pdf

The issue can be closed.

@Alvin1Zhang
Copy link
Collaborator

Thanks for reporting, feel free to reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants