From a9d0825b1b55eb5d15866754b276b12d6de32ae1 Mon Sep 17 00:00:00 2001 From: Li Xiaoyi Date: Fri, 19 May 2023 18:38:22 +0800 Subject: [PATCH] Docs: add CN translation for api-guides/performance/speed.rst --- docs/doxygen/Doxyfile | 1 + docs/en/api-guides/performance/speed.rst | 151 +++++------ docs/zh_CN/api-guides/performance/speed.rst | 262 +++++++++++++++++++- 3 files changed, 339 insertions(+), 75 deletions(-) diff --git a/docs/doxygen/Doxyfile b/docs/doxygen/Doxyfile index e76ff563b01c..c4be92f90b96 100644 --- a/docs/doxygen/Doxyfile +++ b/docs/doxygen/Doxyfile @@ -177,6 +177,7 @@ INPUT = \ $(PROJECT_PATH)/components/esp_system/include/esp_system.h \ $(PROJECT_PATH)/components/esp_system/include/esp_systick_etm.h \ $(PROJECT_PATH)/components/esp_system/include/esp_task_wdt.h \ + $(PROJECT_PATH)/components/esp_system/include/esp_task.h \ $(PROJECT_PATH)/components/esp_timer/include/esp_timer.h \ $(PROJECT_PATH)/components/esp_wifi/include/esp_mesh.h \ $(PROJECT_PATH)/components/esp_wifi/include/esp_now.h \ diff --git a/docs/en/api-guides/performance/speed.rst b/docs/en/api-guides/performance/speed.rst index b168b003db04..62be87270030 100644 --- a/docs/en/api-guides/performance/speed.rst +++ b/docs/en/api-guides/performance/speed.rst @@ -1,20 +1,22 @@ -Maximizing Execution Speed -========================== +Speed Optimization +================== + +:link_to_translation:`zh_CN:[中文]` {IDF_TARGET_CONTROLLER_CORE_CONFIG:default="CONFIG_BT_CTRL_PINNED_TO_CORE", esp32="CONFIG_BTDM_CTRL_PINNED_TO_CORE_CHOICE", esp32s3="CONFIG_BT_CTRL_PINNED_TO_CORE_CHOICE"} -{IDF_TARGET_RF_TYPE:default="Wi-Fi/BT", esp32s2="Wi-Fi", esp32c6="Wi-Fi/BT/802.15.4", esp32h2="BT/802.15.4"} +{IDF_TARGET_RF_TYPE:default="Wi-Fi/Bluetooth", esp32s2="Wi-Fi", esp32c6="Wi-Fi/Bluetooth/802.15.4", esp32h2="Bluetooth/802.15.4"} Overview -------- -Optimizing execution speed is a key element of software performance. 
Code that executes faster can also have other positive effects, like reducing overall power consumption. However, improving execution speed may have trade-offs with other aspects of performance such as :doc:`size`. +Optimizing execution speed is a key element of software performance. Code that executes faster can also have other positive effects, e.g., reducing overall power consumption. However, improving execution speed may have trade-offs with other aspects of performance such as :doc:`size`. -Choose What To Optimize +Choose What to Optimize ----------------------- If a function in the application firmware is executed once per week in the background, it may not matter if that function takes 10 ms or 100 ms to execute. If a function is executed constantly at 10 Hz, it matters greatly if it takes 10 ms or 100 ms to execute. -Most application firmwares will only have a small set of functions which require optimal performance. Perhaps those functions are executed very often, or have to meet some application requirements for latency or throughput. Optimization efforts should be targeted at these particular functions. +Most kinds of application firmware only have a small set of functions that require optimal performance. Perhaps those functions are executed very often, or have to meet some application requirements for latency or throughput. Optimization efforts should be targeted at these particular functions. Measuring Performance --------------------- @@ -24,7 +26,7 @@ The first step to improving something is to measure it. Basic Performance Measurements ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -If measuring performance relative to an external interaction with the world, you may be able to measure this directly (for example see the examples :example:`wifi/iperf` and :example:`ethernet/iperf` for measuring general network performance, or you can use an oscilloscope or logic analyzer to measure timing of an interaction with a device peripheral.) 
+You may be able to measure directly the performance relative to an external interaction with the world, e.g., see the examples :example:`wifi/iperf` and :example:`ethernet/iperf` for measuring general network performance. Or you can use an oscilloscope or logic analyzer to measure the timing of an interaction with a device peripheral. Otherwise, one way to measure performance is to augment the code to take timing measurements: @@ -46,17 +48,17 @@ Otherwise, one way to measure performance is to augment the code to take timing MEASUREMENTS, (end - start)/1000, (end - start)/MEASUREMENTS); } -Executing the target multiple times can help average out factors like RTOS context switches, overhead of measurements, etc. +Executing the target multiple times can help average out factors, e.g., RTOS context switches, overhead of measurements, etc. - Using :cpp:func:`esp_timer_get_time` generates "wall clock" timestamps with microsecond precision, but has moderate overhead each time the timing functions are called. -- It's also possible to use the standard Unix ``gettimeofday()`` and ``utime()`` functions, although the overhead is slightly higher. -- Otherwise, including ``hal/cpu_hal.h`` and calling the HAL function ``cpu_hal_get_cycle_count()`` will return the number of CPU cycles executed. This function has lower overhead than the others. It is good for measuring very short execution times with high precision. +- It is also possible to use the standard Unix ``gettimeofday()`` and ``utime()`` functions, although the overhead is slightly higher. +- Otherwise, including ``hal/cpu_hal.h`` and calling the HAL function ``cpu_hal_get_cycle_count()`` returns the number of CPU cycles executed. This function has lower overhead than the others, which is good for measuring very short execution times with high precision. .. only:: not CONFIG_FREERTOS_UNICORE The CPU cycles are counted per-core, so only use this method from an interrupt handler, or a task that is pinned to a single core. 
-- While performing "microbenchmarks" (i.e. benchmarking only a very small routine of code that runs in less than 1-2 milliseconds), the flash cache performance can sometimes cause big variations in timing measurements depending on the binary. This happens because binary layout can cause different patterns of cache misses in a particular sequence of execution. If the test code is larger then this effect usually averages out. Executing a small function multiple times when benchmarking can help reduce the impact of flash cache misses. Alternatively, move this code to IRAM (see :ref:`speed-targeted-optimizations`). +- While performing "microbenchmarks" (i.e., benchmarking only a very small routine of code that runs in less than 1-2 milliseconds), the flash cache performance can sometimes cause big variations in timing measurements depending on the binary. This happens because binary layout can cause different patterns of cache misses in a particular sequence of execution. If the test code is larger, then this effect usually averages out. Executing a small function multiple times when benchmarking can help reduce the impact of flash cache misses. Alternatively, move this code to IRAM (see :ref:`speed-targeted-optimizations`). External Tracing ^^^^^^^^^^^^^^^^ @@ -66,58 +68,58 @@ The :doc:`/api-guides/app_trace` allows measuring code execution with minimal im Tasks ^^^^^ -If the option :ref:`CONFIG_FREERTOS_GENERATE_RUN_TIME_STATS` is enabled then the FreeRTOS API :cpp:func:`vTaskGetRunTimeStats` can be used to retrieve runtime information about the processor time used by each FreeRTOS task. +If the option :ref:`CONFIG_FREERTOS_GENERATE_RUN_TIME_STATS` is enabled, then the FreeRTOS API :cpp:func:`vTaskGetRunTimeStats` can be used to retrieve runtime information about the processor time used by each FreeRTOS task. :ref:`SEGGER SystemView ` is an excellent tool for visualizing task execution and looking for performance issues or improvements in the system as a whole. 
Improving Overall Speed ----------------------- -The following optimizations will improve the execution of nearly all code - including boot times, throughput, latency, etc: +The following optimizations improve the execution of nearly all code, including boot times, throughput, latency, etc: .. list:: - :esp32: - Set :ref:`CONFIG_ESPTOOLPY_FLASHFREQ` to 80 MHz. This is double the 40 MHz default value and will double the speed at which code is loaded or executed from flash. You should verify that the board or module that connects the {IDF_TARGET_NAME} to the flash chip is rated for 80 MHz operation at the relevant temperature ranges, before changing this setting. The hardware datasheet(s) will have this information. - - Set :ref:`CONFIG_ESPTOOLPY_FLASHMODE` to QIO or QOUT mode (Quad I/O). Both will almost double the speed at which code is loaded or executed from flash compared to the default DIO mode. QIO is slightly faster than QOUT if both are supported. Note that both the flash chip model and the electrical connections between the {IDF_TARGET_NAME} and the flash chip must support quad I/O modes or the SoC will not work correctly. - - Set :ref:`CONFIG_COMPILER_OPTIMIZATION` to "Optimize for performance (-O2)". This may slightly increase binary size compared to the default setting, but will almost certainly increase performance of some code. Note that if your code contains C or C++ Undefined Behaviour then increasing the compiler optimization level may expose bugs that otherwise are not seen. - :esp32: - If the application uses PSRAM and is based on ESP32 rev. 3 (ECO3), setting :ref:`CONFIG_ESP32_REV_MIN` to ``3`` will disable PSRAM bug workarounds, reducing the code size and improving overall performance. - :SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic (``float``). Even though {IDF_TARGET_NAME} has a single precision hardware floating point unit, floating point calculations are always slower than integer calculations. 
If possible then use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point. - :not SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic (``float``). On {IDF_TARGET_NAME} these calculations are emulated in software and are very slow. If possible then use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point. - - Avoid using double precision floating point arithmetic (``double``). These calculations are emulated in software and are very slow. If possible then use an integer-based representation, or single-precision floating point. + :esp32: - Set :ref:`CONFIG_ESPTOOLPY_FLASHFREQ` to 80 MHz. This is double the 40 MHz default value and doubles the speed at which code is loaded or executed from flash. You should verify that the board or module that connects the {IDF_TARGET_NAME} to the flash chip is rated for 80 MHz operation at the relevant temperature ranges before changing this setting. This information is contained in the hardware datasheet(s). + - Set :ref:`CONFIG_ESPTOOLPY_FLASHMODE` to QIO or QOUT mode (Quad I/O). Both almost double the speed at which code is loaded or executed from flash compared to the default DIO mode. QIO is slightly faster than QOUT if both are supported. Note that both the flash chip model and the electrical connections between the {IDF_TARGET_NAME} and the flash chip must support quad I/O modes or the SoC will not work correctly. + - Set :ref:`CONFIG_COMPILER_OPTIMIZATION` to ``Optimize for performance (-O2)``. This may slightly increase binary size compared to the default setting, but almost certainly increases the performance of some code. Note that if your code contains C or C++ Undefined Behavior, then increasing the compiler optimization level may expose bugs that otherwise are not seen. 
+ :esp32: - If the application uses PSRAM and is based on ESP32 rev. 3 (ECO3), setting :ref:`CONFIG_ESP32_REV_MIN` to ``3`` disables PSRAM bug workarounds, reducing the code size and improving overall performance. + :SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic (``float``). Even though {IDF_TARGET_NAME} has a single precision hardware floating point unit, floating point calculations are always slower than integer calculations. If possible, use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point. + :not SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic (``float``). On {IDF_TARGET_NAME} these calculations are emulated in software and are very slow. If possible, use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point. + - Avoid using double precision floating point arithmetic (``double``). These calculations are emulated in software and are very slow. If possible, use an integer-based representation, or single-precision floating point. Reduce Logging Overhead ^^^^^^^^^^^^^^^^^^^^^^^ -Although standard output is buffered, it's possible for an application to be limited by the rate at which it can print data to log output once buffers are full. This is particularly relevant for startup time if a lot of output is logged, but can happen at other times as well. There are multiple ways to solve this problem: +Although standard output is buffered, it is possible for an application to be limited by the rate at which it can print data to log output once buffers are full. This is particularly relevant for startup time if a lot of output is logged, but such a problem can happen at other times as well. There are multiple ways to solve this problem: .. 
list:: - Reduce the volume of log output by lowering the app :ref:`CONFIG_LOG_DEFAULT_LEVEL` (the equivalent bootloader setting is :ref:`CONFIG_BOOTLOADER_LOG_LEVEL`). This also reduces the binary size, and saves some CPU time spent on string formatting. - :not SOC_USB_OTG_SUPPORTED: - Increase the speed of logging output by increasing the :ref:`CONFIG_ESP_CONSOLE_UART_BAUDRATE` - :SOC_USB_OTG_SUPPORTED: - Increase the speed of logging output by increasing the :ref:`CONFIG_ESP_CONSOLE_UART_BAUDRATE`. (Unless using internal USB-CDC for serial console, in which case the serial throughput doesn't depend on the configured baud rate.) + :not SOC_USB_OTG_SUPPORTED: - Increase the speed of logging output by increasing the :ref:`CONFIG_ESP_CONSOLE_UART_BAUDRATE`. + :SOC_USB_OTG_SUPPORTED: - Increase the speed of logging output by increasing the :ref:`CONFIG_ESP_CONSOLE_UART_BAUDRATE`. However, if you are using internal USB-CDC, the serial throughput is not dependent on the configured baud rate. Not Recommended ^^^^^^^^^^^^^^^ -The following options will also increase execution speed, but are not recommended as they also reduce the debuggability of the firmware application and may increase the severity of any bugs. +The following options also increase execution speed, but are not recommended as they also reduce the debuggability of the firmware application and may increase the severity of any bugs. .. list:: - - Set :ref:`CONFIG_COMPILER_OPTIMIZATION_ASSERTION_LEVEL` to disabled. This also reduces firmware binary size by a small amount. However, it may increase the severity of bugs in the firmware including security-related bugs. If necessary to do this to optimize a particular function, consider adding ``#define NDEBUG`` in the top of that single source file instead. + - Set :ref:`CONFIG_COMPILER_OPTIMIZATION_ASSERTION_LEVEL` to disabled. This also reduces firmware binary size by a small amount. 
However, it may increase the severity of bugs in the firmware including security-related bugs. If it is necessary to do this to optimize a particular function, consider adding ``#define NDEBUG`` at the top of that single source file instead. .. _speed-targeted-optimizations: Targeted Optimizations ---------------------- -The following changes will increase the speed of a chosen part of the firmware application: +The following changes increase the speed of a chosen part of the firmware application: .. list:: - - Move frequently executed code to IRAM. By default, all code in the app is executed from flash cache. This means that it's possible for the CPU to have to wait on a "cache miss" while the next instructions are loaded from flash. Functions which are copied into IRAM are loaded once at boot time, and then will always execute at full speed. + - Move frequently executed code to IRAM. By default, all code in the app is executed from flash cache. This means that it is possible for the CPU to have to wait on a "cache miss" while the next instructions are loaded from flash. Functions which are copied into IRAM are loaded once at boot time, and then always execute at full speed. IRAM is a limited resource, and using more IRAM may reduce available DRAM, so a strategic approach is needed when moving code to IRAM. See :ref:`iram` for more information. - - Jump table optimizations can be re-enabled for individual source files that don't need to be placed in IRAM. For hot paths in large switch cases this will improve performance. For instructions on how to add the -fjump-tables -ftree-switch-conversion options when compiling individual source files, see :ref:`component_build_control` + - Jump table optimizations can be re-enabled for individual source files that do not need to be placed in IRAM. For hot paths in large ``switch cases``, this improves performance. 
For instructions on how to add the ``-fjump-tables`` and ``-ftree-switch-conversion`` options when compiling individual source files, see :ref:`component_build_control` Improving Startup Time ---------------------- @@ -126,28 +128,28 @@ In addition to the overall performance improvements shown above, the following o .. list:: - - Minimizing the :ref:`CONFIG_LOG_DEFAULT_LEVEL` and :ref:`CONFIG_BOOTLOADER_LOG_LEVEL` has a large impact on startup time. To enable more logging after the app starts up, set the :ref:`CONFIG_LOG_MAXIMUM_LEVEL` as well and then call :cpp:func:`esp_log_level_set` to restore higher level logs. The :example:`system/startup_time` main function shows how to do this. - :SOC_RTC_FAST_MEM_SUPPORTED: - If using deep sleep, setting :ref:`CONFIG_BOOTLOADER_SKIP_VALIDATE_IN_DEEP_SLEEP` allows a faster wake from sleep. Note that if using Secure Boot this represents a security compromise, as Secure Boot validation will not be performed on wake. - - Setting :ref:`CONFIG_BOOTLOADER_SKIP_VALIDATE_ON_POWER_ON` will skip verifying the binary on every boot from power-on reset. How much time this saves depends on the binary size and the flash settings. Note that this setting carries some risk if the flash becomes corrupt unexpectedly. Read the help text of the :ref:`config item ` for an explanation and recommendations if using this option. - - It's possible to save a small amount of time during boot by disabling RTC slow clock calibration. To do so, set :ref:`CONFIG_RTC_CLK_CAL_CYCLES` to 0. Any part of the firmware that uses RTC slow clock as a timing source will be less accurate as a result. + - Minimizing the :ref:`CONFIG_LOG_DEFAULT_LEVEL` and :ref:`CONFIG_BOOTLOADER_LOG_LEVEL` has a large impact on startup time. To enable more logging after the app starts up, set the :ref:`CONFIG_LOG_MAXIMUM_LEVEL` as well, and then call :cpp:func:`esp_log_level_set` to restore higher level logs. The :example:`system/startup_time` main function shows how to do this. 
+ :SOC_RTC_FAST_MEM_SUPPORTED: - If using Deep-sleep mode, setting :ref:`CONFIG_BOOTLOADER_SKIP_VALIDATE_IN_DEEP_SLEEP` allows a faster wake from sleep. Note that if using Secure Boot, this represents a security compromise, as Secure Boot validation is not performed on wake. + - Setting :ref:`CONFIG_BOOTLOADER_SKIP_VALIDATE_ON_POWER_ON` skips verifying the binary on every boot from the power-on reset. How much time this saves depends on the binary size and the flash settings. Note that this setting carries some risk if the flash becomes corrupt unexpectedly. Read the help text of the :ref:`config item ` for an explanation and recommendations if using this option. + - It is possible to save a small amount of time during boot by disabling RTC slow clock calibration. To do so, set :ref:`CONFIG_RTC_CLK_CAL_CYCLES` to 0. Any part of the firmware that uses RTC slow clock as a timing source will be less accurate as a result. The example project :example:`system/startup_time` is pre-configured to optimize startup time. The file :example_file:`system/startup_time/sdkconfig.defaults` contain all of these settings. You can append these to the end of your project's own ``sdkconfig`` file to merge the settings, but please read the documentation for each setting first. Task Priorities --------------- -As ESP-IDF FreeRTOS is a real-time operating system, it's necessary to ensure that high throughput or low latency tasks are granted a high priority in order to run immediately. Priority is set when calling :cpp:func:`xTaskCreate` or :cpp:func:`xTaskCreatePinnedToCore` and can be changed at runtime by calling :cpp:func:`vTaskPrioritySet`. +As ESP-IDF FreeRTOS is a real-time operating system, it is necessary to ensure that high-throughput or low-latency tasks are granted a high priority in order to run immediately. Priority is set when calling :cpp:func:`xTaskCreate` or :cpp:func:`xTaskCreatePinnedToCore` and can be changed at runtime by calling :cpp:func:`vTaskPrioritySet`. 
-It's also necessary to ensure that tasks yield CPU (by calling :cpp:func:`vTaskDelay`, ``sleep()``, or by blocking on semaphores, queues, task notifications, etc) in order to not starve lower priority tasks and cause problems for the overall system. The :ref:`task-watchdog-timer` provides a mechanism to automatically detect if task starvation happens, however note that a Task WDT timeout does not always indicate a problem (sometimes the correct operation of the firmware requires some long-running computation). In these cases tweaking the Task WDT timeout or even disabling the Task WDT may be necessary. +It is also necessary to ensure that tasks yield CPU (by calling :cpp:func:`vTaskDelay`, ``sleep()``, or by blocking on semaphores, queues, task notifications, etc) in order to not starve lower-priority tasks and cause problems for the overall system. The :ref:`task-watchdog-timer` provides a mechanism to automatically detect if task starvation happens. However, note that a TWDT timeout does not always indicate a problem, because sometimes the correct operation of the firmware requires some long-running computation. In these cases, tweaking the TWDT timeout or even disabling the TWDT may be necessary. .. _built-in-task-priorities: -Built-In Task Priorities +Built-in Task Priorities ^^^^^^^^^^^^^^^^^^^^^^^^ -ESP-IDF starts a number of system tasks at fixed priority levels. Some are automatically started during the boot process, some are started only if the application firmware initializes a particular feature. To optimize performance, structure application task priorities so that they are not delayed by system tasks, while also not starving system tasks and impacting other functions of the system. +ESP-IDF starts a number of system tasks at fixed priority levels. Some are automatically started during the boot process, while some are started only if the application firmware initializes a particular feature. 
To optimize performance, structure the task priorities of your application properly to ensure the tasks are not delayed by the system tasks, while also not starving system tasks and impacting other functions of the system. -This may require splitting up a particular task. For example, perform a time-critical operation in a high priority task or an interrupt handler and do the non-time-critical part in a lower priority task. +This may require splitting up a particular task. For example, perform a time-critical operation in a high-priority task or an interrupt handler and do the non-time-critical part in a lower-priority task. Header :idf_file:`components/esp_system/include/esp_task.h` contains macros for the priority levels used for built-in ESP-IDF tasks system. See :ref:`freertos_system_tasks` for more details about the system tasks. @@ -159,78 +161,80 @@ Common priorities are: .. list:: - - :ref:`Main task that executes app_main function ` has minimum priority (1). + - :ref:`app-main-task` that executes app_main function has minimum priority (1). - :doc:`/api-reference/system/esp_timer` system task to manage timer events and execute callbacks has high priority (22, ``ESP_TASK_TIMER_PRIO``) - FreeRTOS Timer Task to handle FreeRTOS timer callbacks is created when the scheduler initializes and has minimum task priority (1, :ref:`configurable `). - - :doc:`/api-reference/system/esp_event` system task to manage the default system event loop and execute callbacks has high priority (20, ``ESP_TASK_EVENT_PRIO``). This configuration is only used if the application calls :cpp:func:`esp_event_loop_create_default`, it's possible to call :cpp:func:`esp_event_loop_create` with a custom task configuration instead. + - :doc:`/api-reference/system/esp_event` system task to manage the default system event loop and execute callbacks has high priority (20, ``ESP_TASK_EVENT_PRIO``). This configuration is only used if the application calls :cpp:func:`esp_event_loop_create_default`. 
It is possible to call :cpp:func:`esp_event_loop_create` with a custom task configuration instead. - :doc:`/api-guides/lwip` TCP/IP task has high priority (18, ``ESP_TASK_TCPIP_PRIO``). - :SOC_WIFI_SUPPORTED: - :doc:`Wi-Fi Driver ` task has high priority (23). + :SOC_WIFI_SUPPORTED: - :doc:`/api-guides/wifi` task has high priority (23). :SOC_WIFI_SUPPORTED: - Wi-Fi wpa_supplicant component may create dedicated tasks while the Wi-Fi Protected Setup (WPS), WPA2 EAP-TLS, Device Provisioning Protocol (DPP) or BSS Transition Management (BTM) features are in use. These tasks all have low priority (2). - :SOC_BT_SUPPORTED: - :doc:`Bluetooth Controller ` task has high priority (23, ``ESP_TASK_BT_CONTROLLER_PRIO``). The Bluetooth Controller needs to respond to requests with low latency, so it should always be close to the highest priority task in the system. - :SOC_BT_SUPPORTED: - :doc:`NimBLE Bluetooth Host ` host task has high priority (21). + :SOC_BT_SUPPORTED: - :doc:`/api-reference/bluetooth/controller_vhci` task has high priority (23, ``ESP_TASK_BT_CONTROLLER_PRIO``). The Bluetooth Controller needs to respond to requests with low latency, so it should always be among the highest priority tasks in the system. + :SOC_BT_SUPPORTED: - :doc:`/api-reference/bluetooth/nimble/index` task has high priority (21). - The Ethernet driver creates a task for the MAC to receive Ethernet frames. If using the default config ``ETH_MAC_DEFAULT_CONFIG`` then the priority is medium-high (15). This setting can be changed by passing a custom :cpp:class:`eth_mac_config_t` struct when initializing the Ethernet MAC. - - If using the :doc:`MQTT ` component, it creates a task with default priority 5 (:ref:`configurable`, depends on :ref:`CONFIG_MQTT_USE_CUSTOM_CONFIG` (also configurable runtime by ``task_prio`` field in the :cpp:class:`esp_mqtt_client_config_t`) - - To see what is the task priority for ``mDNS`` service, please check `Performance Optimization `__. 
+ - If using the :doc:`/api-reference/protocols/mqtt` component, it creates a task with default priority 5 (:ref:`configurable`, depending on :ref:`CONFIG_MQTT_USE_CUSTOM_CONFIG`, and also configurable at runtime by the ``task_prio`` field in :cpp:class:`esp_mqtt_client_config_t`). + - To see what is the task priority for ``mDNS`` service, please check `Performance Optimization `__. .. only :: not CONFIG_FREERTOS_UNICORE .. list:: - - :ref:`Main task that executes app_main function ` has minimum priority (1). This task is pinned to Core 0 by default (:ref:`configurable`). + - :ref:`app-main-task` that executes app_main function has minimum priority (1). This task is pinned to Core 0 by default (:ref:`configurable`). - :doc:`/api-reference/system/esp_timer` system task to manage high precision timer events and execute callbacks has high priority (22, ``ESP_TASK_TIMER_PRIO``). This task is pinned to Core 0. - FreeRTOS Timer Task to handle FreeRTOS timer callbacks is created when the scheduler initializes and has minimum task priority (1, :ref:`configurable `). This task is pinned to Core 0. - - :doc:`/api-reference/system/esp_event` system task to manage the default system event loop and execute callbacks has high priority (20, ``ESP_TASK_EVENT_PRIO``) and pinned to Core 0. This configuration is only used if the application calls :cpp:func:`esp_event_loop_create_default`, it's possible to call :cpp:func:`esp_event_loop_create` with a custom task configuration instead. + - :doc:`/api-reference/system/esp_event` system task to manage the default system event loop and execute callbacks has high priority (20, ``ESP_TASK_EVENT_PRIO``) and is pinned to Core 0. This configuration is only used if the application calls :cpp:func:`esp_event_loop_create_default`. It is possible to call :cpp:func:`esp_event_loop_create` with a custom task configuration instead. 
- :doc:`/api-guides/lwip` TCP/IP task has high priority (18, ``ESP_TASK_TCPIP_PRIO``) and is not pinned to any core (:ref:`configurable`). - :SOC_WIFI_SUPPORTED: - :doc:`Wi-Fi Driver ` task has high priority (23) and is pinned to Core 0 by default (:ref:`configurable`). + :SOC_WIFI_SUPPORTED: - :doc:`/api-guides/wifi` task has high priority (23) and is pinned to Core 0 by default (:ref:`configurable`). :SOC_WIFI_SUPPORTED: - Wi-Fi wpa_supplicant component may create dedicated tasks while the Wi-Fi Protected Setup (WPS), WPA2 EAP-TLS, Device Provisioning Protocol (DPP) or BSS Transition Management (BTM) features are in use. These tasks all have low priority (2) and are not pinned to any core. - :SOC_BT_SUPPORTED: - :doc:`Bluetooth Controller ` task has high priority (23, ``ESP_TASK_BT_CONTROLLER_PRIO``) and is pinned to Core 0 by default (:ref:`configurable <{IDF_TARGET_CONTROLLER_CORE_CONFIG}>`). The Bluetooth Controller needs to respond to requests with low latency, so it should always be close to the highest priority task assigned to a single CPU. - :SOC_BT_SUPPORTED: - :doc:`NimBLE Bluetooth Host ` host task has high priority (21) and is pinned to Core 0 by default (:ref:`configurable `). - :esp32: - :doc:`Bluedroid Bluetooth Host ` creates multiple tasks when used: + :SOC_BT_SUPPORTED: - :doc:`/api-reference/bluetooth/controller_vhci` task has high priority (23, ``ESP_TASK_BT_CONTROLLER_PRIO``) and is pinned to Core 0 by default (:ref:`configurable <{IDF_TARGET_CONTROLLER_CORE_CONFIG}>`). The Bluetooth Controller needs to respond to requests with low latency, so it should always be among the highest priority tasks assigned to a single CPU. + :SOC_BT_SUPPORTED: - :doc:`/api-reference/bluetooth/nimble/index` task has high priority (21) and is pinned to Core 0 by default (:ref:`configurable `). + :esp32: - :doc:`/api-reference/bluetooth/index` creates multiple tasks when used: - Stack event callback task ("BTC") has high priority (19). 
- Stack BTU layer task has high priority (20). - Host HCI host task has high priority (22). All Bluedroid Tasks are pinned to the same core, which is Core 0 by default (:ref:`configurable `). + - The Ethernet driver creates a task for the MAC to receive Ethernet frames. If using the default config ``ETH_MAC_DEFAULT_CONFIG`` then the priority is medium-high (15) and the task is not pinned to any core. These settings can be changed by passing a custom :cpp:class:`eth_mac_config_t` struct when initializing the Ethernet MAC. - - If using the :doc:`MQTT ` component, it creates a task with default priority 5 (:ref:`configurable `, depends on :ref:`CONFIG_MQTT_USE_CUSTOM_CONFIG`) and not pinned to any core (:ref:`configurable `). - - To see what is the task priority for ``mDNS`` service, please check `Performance Optimization `__. + - If using the :doc:`/api-reference/protocols/mqtt` component, it creates a task with default priority 5 (:ref:`configurable `, depending on :ref:`CONFIG_MQTT_USE_CUSTOM_CONFIG`) and not pinned to any core (:ref:`configurable `). + - To see what is the task priority for ``mDNS`` service, please check `Performance Optimization `__. -Choosing application task priorities -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Choosing Task Priorities of the Application +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. only:: CONFIG_FREERTOS_UNICORE - In general, it's not recommended to set task priorities higher than the built-in {IDF_TARGET_RF_TYPE} operations as starving them of CPU may make the system unstable. For very short timing-critical operations that don't use the network, use an ISR or a very restricted task (very short bursts of runtime only) at highest priority (24). Choosing priority 19 will allow lower layer {IDF_TARGET_RF_TYPE} functionality to run without delays, but still preempts the lwIP TCP/IP stack and other less time-critical internal functionality - this is the best option for time-critical tasks that don't perform network operations. 
Any task that does TCP/IP network operations should run at lower priority than the lwIP TCP/IP task (18) to avoid priority inversion issues. + In general, it is not recommended to set task priorities higher than the built-in {IDF_TARGET_RF_TYPE} operations as starving them of CPU may make the system unstable. For very short timing-critical operations that do not use the network, use an ISR or a very restricted task (with very short bursts of runtime only) at the highest priority (24). Choosing priority 19 allows lower-layer {IDF_TARGET_RF_TYPE} functionality to run without delays, but still preempts the lwIP TCP/IP stack and other less time-critical internal functionality - this is the best option for time-critical tasks that do not perform network operations. Any task that does TCP/IP network operations should run at a lower priority than the lwIP TCP/IP task (18) to avoid priority-inversion issues. .. only:: not CONFIG_FREERTOS_UNICORE - With a few exceptions (most importantly the lwIP TCP/IP task), in the default configuration most built-in tasks are pinned to Core 0. This makes it quite easy for the application to place high priority tasks on Core 1. Using priority 19 or higher will guarantee an application task can run on Core 1 without being preempted by any built-in task. To further isolate the tasks running on each CPU, configure the :ref:`lwIP task ` to only run on Core 0 instead of either core (this may reduce total TCP/IP throughput depending on what other tasks are running). + With a few exceptions, most importantly the lwIP TCP/IP task, in the default configuration most built-in tasks are pinned to Core 0. This makes it quite easy for the application to place high priority tasks on Core 1. Using priority 19 or higher guarantees that an application task can run on Core 1 without being preempted by any built-in task. 
To further isolate the tasks running on each CPU, configure the :ref:`lwIP task ` to only run on Core 0 instead of either core, which may reduce total TCP/IP throughput depending on what other tasks are running. - In general, it's not recommended to set task priorities on Core 0 higher than the built-in {IDF_TARGET_RF_TYPE} operations as starving them of CPU may make the system unstable. Choosing priority 19 and Core 0 will allow lower layer {IDF_TARGET_RF_TYPE} functionality to run without delays, but still pre-empts the lwIP TCP/IP stack and other less time-critical internal functionality - this is an option for time-critical tasks that don't perform network operations. Any task that does TCP/IP network operations should run at lower priority than the lwIP TCP/IP task (18) to avoid priority inversion issues. + In general, it is not recommended to set task priorities on Core 0 higher than the built-in {IDF_TARGET_RF_TYPE} operations as starving them of CPU may make the system unstable. Choosing priority 19 and Core 0 allows lower-layer {IDF_TARGET_RF_TYPE} functionality to run without delays, but still pre-empts the lwIP TCP/IP stack and other less time-critical internal functionality. This is an option for time-critical tasks that do not perform network operations. Any task that does TCP/IP network operations should run at lower priority than the lwIP TCP/IP task (18) to avoid priority-inversion issues. .. note:: - Setting a task to always run in preference to built-in ESP-IDF tasks does not require pinning to Core 1. The task can be left unpinned - at priority 17 or lower - to optionally run on Core 0 as well, if no higher priority built-in task is running there. Using unpinned tasks can improve the overall CPU utilization, however it makes reasoning about task scheduling more complex. + Setting a task to always run in preference to built-in ESP-IDF tasks does not require pinning the task to Core 1. 
Instead, the task can be left unpinned and assigned a priority of 17 or lower. This allows the task to optionally run on Core 0 if there are no higher-priority built-in tasks running on that core. Using unpinned tasks can improve the overall CPU utilization; however, it makes reasoning about task scheduling more complex. .. note:: - Task execution is always completely suspended when writing to the built-in SPI flash chip. Only :ref:`iram-safe-interrupt-handlers` will continue executing. + Task execution is always completely suspended when writing to the built-in SPI flash chip. Only :ref:`iram-safe-interrupt-handlers` continue executing. Improving Interrupt Performance ------------------------------- -ESP-IDF supports dynamic :doc:`/api-reference/system/intr_alloc` with interrupt preemption. Each interrupt in the system has a priority, and higher priority interrupts will preempt lower priority ones. +ESP-IDF supports dynamic :doc:`/api-reference/system/intr_alloc` with interrupt preemption. Each interrupt in the system has a priority, and higher-priority interrupts preempt lower-priority ones. -Interrupt handlers will execute in preference to any task (provided the task is not inside a critical section). For this reason, it's important to minimize the amount of time spent executing in an interrupt handler. +Interrupt handlers execute in preference to any task, provided the task is not inside a critical section. For this reason, it is important to minimize the amount of time spent executing an interrupt handler. To obtain the best performance for a particular interrupt handler: .. list:: - Assign more important interrupts a higher priority using a flag such as ``ESP_INTR_FLAG_LEVEL2`` or ``ESP_INTR_FLAG_LEVEL3`` when calling :cpp:func:`esp_intr_alloc`. - :not CONFIG_FREERTOS_UNICORE: - Assign the interrupt on a CPU where built-in {IDF_TARGET_RF_TYPE} tasks are not configured to run (this means assigning on Core 1 by default, see :ref:`built-in-task-priorities`).
Interrupts are assigned on the same CPU where the :cpp:func:`esp_intr_alloc` function call is made. - - If you're sure the entire interrupt handler can run from IRAM (see :ref:`iram-safe-interrupt-handlers`) then set the ``ESP_INTR_FLAG_IRAM`` flag when calling :cpp:func:`esp_intr_alloc` to assign the interrupt. This prevents it being temporarily disabled if the application firmware writes to the internal SPI flash. - - Even if the interrupt handler is not IRAM safe, if it is going to be executed frequently then consider moving the handler function to IRAM anyhow. This minimizes the chance of a flash cache miss when the interrupt code is executed (see :ref:`speed-targeted-optimizations`). It's possible to do this without adding the ``ESP_INTR_FLAG_IRAM`` flag to mark the interrupt as IRAM-safe, if only part of the handler is guaranteed to be in IRAM. + :not CONFIG_FREERTOS_UNICORE: - Assign the interrupt on a CPU where built-in {IDF_TARGET_RF_TYPE} tasks are not configured to run, which means assigning the interrupt on Core 1 by default, see :ref:`built-in-task-priorities`. Interrupts are assigned on the same CPU where the :cpp:func:`esp_intr_alloc` function call is made. + - If you are sure the entire interrupt handler can run from IRAM (see :ref:`iram-safe-interrupt-handlers`) then set the ``ESP_INTR_FLAG_IRAM`` flag when calling :cpp:func:`esp_intr_alloc` to assign the interrupt. This prevents it being temporarily disabled if the application firmware writes to the internal SPI flash. + - Even if the interrupt handler is not IRAM-safe, if it is going to be executed frequently then consider moving the handler function to IRAM anyhow. This minimizes the chance of a flash cache miss when the interrupt code is executed (see :ref:`speed-targeted-optimizations`). It is possible to do this without adding the ``ESP_INTR_FLAG_IRAM`` flag to mark the interrupt as IRAM-safe, if only part of the handler is guaranteed to be in IRAM. 
Improving Network Speed ----------------------- @@ -239,21 +243,20 @@ Improving Network Speed :SOC_WIFI_SUPPORTED: * For Wi-Fi, see :ref:`How-to-improve-Wi-Fi-performance` and :ref:`wifi-buffer-usage` * For lwIP TCP/IP (Wi-Fi and Ethernet), see :ref:`lwip-performance` - :SOC_WIFI_SUPPORTED: * The :example:`wifi/iperf` example contains a configuration that is heavily optimized for Wi-Fi TCP/IP throughput. Append the contents of the files :example_file:`wifi/iperf/sdkconfig.defaults`, :example_file:`wifi/iperf/sdkconfig.defaults.{IDF_TARGET_PATH_NAME}` and :example_file:`wifi/iperf/sdkconfig.ci.99` to your project ``sdkconfig`` file in order to add all of these options. Note that some of these options may have trade-offs in terms of reduced debuggability, increased firmware size, increased memory usage, or reduced performance of other features. To get the best result, read the documentation pages linked above and use this information to determine exactly which options are best suited for your app. + :SOC_WIFI_SUPPORTED: * The :example:`wifi/iperf` example contains a configuration that is heavily optimized for Wi-Fi TCP/IP throughput. Append the contents of the files :example_file:`wifi/iperf/sdkconfig.defaults`, :example_file:`wifi/iperf/sdkconfig.defaults.{IDF_TARGET_PATH_NAME}` and :example_file:`wifi/iperf/sdkconfig.ci.99` to the ``sdkconfig`` file in your project in order to add all of these options. Note that some of these options may have trade-offs in terms of reduced debuggability, increased firmware size, increased memory usage, or reduced performance of other features. To get the best result, read the documentation pages linked above and use related information to determine exactly which options are best suited for your app. 
-Improving I/O performance +Improving I/O Performance ------------------------- -Using standard C library functions like ``fread`` and ``fwrite`` instead of platform specific unbuffered syscalls such as ``read`` and ``write`` can be slow. -These functions are designed to be portable, so they are not necessarily optimized for speed, have a certain overhead and are buffered. +Using standard C library functions like ``fread`` and ``fwrite`` instead of platform-specific unbuffered syscalls such as ``read`` and ``write`` can be slow. These functions are designed to be portable, so they are not necessarily optimized for speed, have a certain overhead and are buffered. -:doc:`FatFS ` specific information and tips: +:doc:`/api-reference/storage/fatfs` specific information and tips: .. list:: - - Maximum size of the R/W request == FatFS cluster size (allocation unit size) - - Use ``read`` and ``write`` instead of ``fread`` and ``fwrite`` - - To increase speed of buffered reading functions like ``fread`` and ``fgets``, you can increase a size of the file buffer (Newlib's default is 128 bytes) to a higher number like 4096, 8192 or 16384. This can be done locally via ``setvbuf`` function used on a certain file pointer or globally applied to all files via modifying :ref:`CONFIG_FATFS_VFS_FSTAT_BLKSIZE`. - + - Maximum size of the R/W request = FatFS cluster size (allocation unit size). + - Use ``read`` and ``write`` instead of ``fread`` and ``fwrite``. + - To increase the speed of buffered reading functions like ``fread`` and ``fgets``, you can increase the size of the file buffer (Newlib's default is 128 bytes) to a higher number like 4096, 8192 or 16384. This can be done locally via the ``setvbuf`` function used on a certain file pointer or globally applied to all files by modifying :ref:`CONFIG_FATFS_VFS_FSTAT_BLKSIZE`. + .. note:: - Setting a bigger buffer size will also increase the heap memory usage. + Setting a bigger buffer size also increases the heap memory usage.
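The per-file ``setvbuf`` tip in the hunk above could be sketched as follows. This is a minimal illustration using only standard C, not code from the patch: the helper name ``open_with_large_buffer`` and the 4096-byte size are assumptions chosen for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed size for illustration; the text suggests 4096, 8192 or 16384. */
#define FILE_BUF_SIZE 4096

/* Hypothetical helper: opens `path` for reading and attaches a larger,
 * fully buffered heap buffer to the stream.
 * Returns NULL on failure. On success the caller must fclose() the stream
 * first and then free(*buf_out). */
static FILE *open_with_large_buffer(const char *path, char **buf_out)
{
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return NULL;
    }

    char *buf = malloc(FILE_BUF_SIZE);
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }

    /* setvbuf must be called before any other operation on the stream. */
    if (setvbuf(f, buf, _IOFBF, FILE_BUF_SIZE) != 0) {
        fclose(f);
        free(buf);
        return NULL;
    }

    *buf_out = buf;
    return f;
}
```

The buffer is allocated from the heap here, which is the memory cost mentioned in the note at the end of the hunk.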
diff --git a/docs/zh_CN/api-guides/performance/speed.rst b/docs/zh_CN/api-guides/performance/speed.rst index a30be244a036..5d4e71b37699 100644 --- a/docs/zh_CN/api-guides/performance/speed.rst +++ b/docs/zh_CN/api-guides/performance/speed.rst @@ -1,2 +1,262 @@ -.. include:: ../../../en/api-guides/performance/speed.rst +速度优化 +========= +:link_to_translation:`en:[English]` + +{IDF_TARGET_CONTROLLER_CORE_CONFIG:default="CONFIG_BT_CTRL_PINNED_TO_CORE", esp32="CONFIG_BTDM_CTRL_PINNED_TO_CORE_CHOICE", esp32s3="CONFIG_BT_CTRL_PINNED_TO_CORE_CHOICE"} +{IDF_TARGET_RF_TYPE:default="Wi-Fi/蓝牙", esp32s2="Wi-Fi", esp32c6="Wi-Fi/蓝牙/802.15.4", esp32h2="蓝牙/802.15.4"} + +概述 +----------- + +提高代码执行速度是提升软件性能的关键要素,该优化也可能带来其他积极影响,比如降低总体功耗。然而,提高代码执行速度可能会牺牲其他性能,如 :doc:`size` 。 + +决定优化目标 +----------------------------- + +如果应用程序固件中的某个函数仅每周在后台执行一次,其执行时间是 10 ms 还是 100 ms 对整体性能的影响或可忽略不计。但如果某个函数以 10 Hz 的频率持续执行,其执行时间是 10 ms 还是 100 ms 就会对系统性能产生显著影响。 + +大多数应用程序固件中,只有一小部分函数需要优化性能,例如频繁执行的函数,或者必须满足应用程序对延迟或吞吐量的要求的函数。应针对这些特定函数优化其执行速度。 + +测量性能 +--------------------- + +想要提升某方面性能,首先要对其进行测量。 + +基本性能测量方法 +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +可以直接测量与外部交互的性能,例如,测量一般网络性能可以参考 :example:`wifi/iperf` 和 :example:`ethernet/iperf` ,或者使用示波器或逻辑分析仪来测量与设备外设的交互时间。 + +此外,另一种测量性能的方法是在代码中添加计时测量: + +.. 
code-block:: c + + #include "esp_timer.h" + + void measure_important_function(void) { + const unsigned MEASUREMENTS = 5000; + uint64_t start = esp_timer_get_time(); + + for (int retries = 0; retries < MEASUREMENTS; retries++) { + important_function(); // 需要测量的代码 + } + + uint64_t end = esp_timer_get_time(); + + printf("%u iterations took %llu milliseconds (%llu microseconds per invocation)\n", + MEASUREMENTS, (end - start)/1000, (end - start)/MEASUREMENTS); + } + +通过多次执行目标代码可以降低其他因素的影响,例如实时操作系统 (RTOS) 的上下文切换、测量的开销等。 + +- 使用 :cpp:func:`esp_timer_get_time` 可以生成微秒级精度的“墙钟”时间戳,但每次调用计时函数都会产生适量开销。 +- 也可以使用标准 Unix 函数 ``gettimeofday()`` 和 ``utime()`` 来进行计时测量,尽管其开销略高一些。 +- 此外,代码中包含 ``hal/cpu_hal.h`` 头文件,并调用 HAL 函数 ``cpu_hal_get_cycle_count()`` 可以返回已执行的 CPU 循环数。该函数开销较低,适用于高精度测量执行时间极短的代码。 + + .. only:: not CONFIG_FREERTOS_UNICORE + + CPU 周期是各核心独立计数的,因此本方法仅适用于测量中断处理程序或固定在单个核心上的任务。 + +- 在执行“微基准测试”时(即仅对运行时间不到 1-2 ms 的小代码段进行基准测试),二进制文件会影响 flash 缓存的性能,进而可能会导致计时测量出现较大差异。这是因为二进制布局可能会导致在特定的执行顺序中产生不同模式的缓存缺失。执行较大测试代码通常可以抵消这种影响。在基准测试时多次执行一个小函数可以减少 flash 缓存缺失的影响。另外,将该代码移到 IRAM 中(参见 :ref:`speed-targeted-optimizations` )也可以解决这个问题。 + +外部跟踪 +^^^^^^^^^^^^^^^^^^^^ + +:doc:`/api-guides/app_trace` 可以在几乎不影响代码执行的情况下测量其执行速度。 + +任务 +^^^^^^^ + +如果启用了选项 :ref:`CONFIG_FREERTOS_GENERATE_RUN_TIME_STATS` ,则可以使用 FreeRTOS API :cpp:func:`vTaskGetRunTimeStats` 来获取各个 FreeRTOS 任务运行时占用处理器的时间。 + +:ref:`SEGGER SystemView ` 是一款出色的工具,可将任务执行情况可视化,也可用于排查系统整体的性能问题或改进方向。 + +提高整体速度 +----------------------------- + +以下优化措施将提高几乎所有代码的执行效果,包括启动时间、吞吐量、延迟等: + +.. 
list:: + + :esp32: - 设置 :ref:`CONFIG_ESPTOOLPY_FLASHFREQ` 为 80 MHz。该值为默认值 40 MHz 的两倍,这意味着从 flash 加载或执行代码的速度也将翻倍。在更改此设置之前,应事先确认连接 {IDF_TARGET_NAME} 和 flash 的板或模块在温度限制范围内支持 80 MHz 的操作。相关信息参见硬件数据手册。 + - 设置 :ref:`CONFIG_ESPTOOLPY_FLASHMODE` 为 QIO 或 QOUT 模式(四线 I/O 模式)。相较于默认的 DIO 模式,在这两种模式下,从 flash 加载或执行代码的速度几乎翻倍。如果两种模式都支持,QIO 会稍微快于 QOUT。请注意,flash 芯片以及 {IDF_TARGET_NAME} 与 flash 芯片之间的电气连接都必须支持四线 I/O 模式,否则 SoC 将无法正常工作。 + - 设置 :ref:`CONFIG_COMPILER_OPTIMIZATION` 为 ``Optimize for performance (-O2)`` 。相较于默认设置,这可能会略微增加二进制文件大小,但几乎必然会提高某些代码的性能。请注意,如果代码包含 C 或 C++ 的未定义行为,提高编译器优化级别可能会暴露出原本未发现的错误。 + :esp32: - 如果应用程序是基于 ESP32 rev. 3 (ECO3) 的项目并且使用 PSRAM,设置 :ref:`CONFIG_ESP32_REV_MIN` 为 ``3`` 将禁用 PSRAM 的错误修复工作,可以减小代码大小并提高整体性能。 + :SOC_CPU_HAS_FPU: - 避免使用浮点运算 ``float``。尽管 {IDF_TARGET_NAME} 具备单精度浮点运算器,但是浮点运算总是慢于整数运算。因此可以考虑使用不同的整数表示方法进行运算,如定点表示法,或者将部分计算用整数运算后再切换为浮点运算。 + :not SOC_CPU_HAS_FPU: - 避免使用浮点运算 ``float``。{IDF_TARGET_NAME} 通过软件模拟进行浮点运算,因此速度非常慢。可以考虑使用不同的整数表示方法进行运算,如定点表示法,或者将部分计算用整数运算后再切换为浮点运算。 + - 避免使用双精度浮点运算 ``double``。{IDF_TARGET_NAME} 通过软件模拟进行双精度浮点运算,因此速度非常慢。可以考虑使用基于整数的表示方法或单精度浮点数。 + +减少日志开销 +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +尽管标准输出会先存储在缓冲区中,但缓冲区缺少可用空间时,应用程序将数据输出到日志的速度可能会受限。这点在程序启动并输出大量日志时尤为明显,但也可能随时发生。为解决这一问题,可以采取以下几种方法: + +.. list:: + + - 通过调低应用日志默认等级 :ref:`CONFIG_LOG_DEFAULT_LEVEL` (引导加载程序日志等级的相应配置为 :ref:`CONFIG_BOOTLOADER_LOG_LEVEL`)来减少日志输出量。这样做不仅可以减小二进制文件大小,还可以节省一些 CPU 用于格式化字符串的时间。 + :not SOC_USB_OTG_SUPPORTED: - 增加 :ref:`CONFIG_ESP_CONSOLE_UART_BAUDRATE` ,可以提高日志输出速度。 + :SOC_USB_OTG_SUPPORTED: - 增加 :ref:`CONFIG_ESP_CONSOLE_UART_BAUDRATE` ,可以提高日志输出速度。如果使用内置 USB-CDC 作为串口控制台,那么串口传输速率不会受配置的波特率影响。 + +不建议的选项 +^^^^^^^^^^^^^^^^^^ + +以下选项也可以提高执行速度,但不建议使用,因为它们会降低固件应用程序的可调试性,并可能导致出现更严重的 bug。 + +.. list:: + + - 禁用 :ref:`CONFIG_COMPILER_OPTIMIZATION_ASSERTION_LEVEL` 。这也会略微减小固件二进制文件大小。然而,它可能导致出现更严重的 bug,甚至出现安全性 bug。如果为了优化特定函数而必须禁用该选项,可以考虑在该源文件的顶部单独添加 ``#define NDEBUG`` 。 + +.. _speed-targeted-optimizations: + +针对性优化 +--------------------------- + +以下更改将提高固件应用程序特定部分的速度: + +.. 
list:: + + - 将频繁执行的代码移至 IRAM。应用程序中的所有代码都默认从 flash 中执行。这意味着缓存缺失时,CPU 需要等待从 flash 加载后续指令。如果将函数复制到 IRAM 中,则仅需要在启动时加载一次,然后始终以全速执行。 + + IRAM 资源有限,使用更多的 IRAM 可能会减少可用的 DRAM。因此,将代码移动到 IRAM 需要有所取舍。更多信息参见 :ref:`iram` 。 + + - 针对不需要放置在 IRAM 中的单个源文件,可以重新启用跳转表优化。这将提高大型 ``switch cases`` 代码中的热路径性能。关于如何在编译单个源文件时添加 -fjump-tables -ftree-switch-conversion 选项,参见 :ref:`component_build_control` 。 + +减少启动时间 +---------------------------- + +除了上述提高整体性能的方法外,还可以微调以下选项来专门减少启动时间: + +.. list:: + + - 最小化 :ref:`CONFIG_LOG_DEFAULT_LEVEL` 和 :ref:`CONFIG_BOOTLOADER_LOG_LEVEL` 可以大幅减少启动时间。如要在应用程序启动后获取更多日志,可以设置 :ref:`CONFIG_LOG_MAXIMUM_LEVEL`,然后调用 :cpp:func:`esp_log_level_set` 来恢复更高级别的日志输出。示例 :example:`system/startup_time` 的主函数展示了如何实现这一点。 + :SOC_RTC_FAST_MEM_SUPPORTED: - 如果使用 Deep-sleep 模式,启用 :ref:`CONFIG_BOOTLOADER_SKIP_VALIDATE_IN_DEEP_SLEEP` 可以加快从睡眠中唤醒的速度。请注意,启用该选项后在唤醒时将不会执行安全启动验证,需要考量安全风险。 + - 设置 :ref:`CONFIG_BOOTLOADER_SKIP_VALIDATE_ON_POWER_ON` 可以在每次上电复位启动时跳过二进制文件验证,节省的时间取决于二进制文件大小和 flash 设置。请注意,如果 flash 意外损坏,此设置将有一定风险。更多关于使用该选项的解释和建议,参见 :ref:`项目配置 ` 。 + - 禁用 RTC 慢速时钟校准可以节省一小部分启动时间。设置 :ref:`CONFIG_RTC_CLK_CAL_CYCLES` 为 0 可以实现该操作。设置后,以 RTC 慢速时钟为时钟源的固件部分精确度将降低。 + +示例项目 :example:`system/startup_time` 预配了优化启动时间的设置,文件 :example_file:`system/startup_time/sdkconfig.defaults` 包含了所有相关设置。可以将这些设置追加到项目中 ``sdkconfig`` 文件的末尾并合并,但请事先阅读每个设置的相关说明。 + +任务优先级 +-------------------- + +ESP-IDF FreeRTOS 是实时操作系统,因此需确保高吞吐量或低延迟的任务获得更高优先级,以便立即运行。调用 :cpp:func:`xTaskCreate` 或 :cpp:func:`xTaskCreatePinnedToCore` 会设定优先级,并且可以在运行时调用 :cpp:func:`vTaskPrioritySet` 进行更改。 + +此外,还需确保任务适时释放 CPU(通过调用 :cpp:func:`vTaskDelay` 或 ``sleep()`` ,或在信号量、队列、任务通知等方面进行阻塞),以避免低优先级任务饥饿并造成系统性问题。 :ref:`task-watchdog-timer` 提供任务饥饿自动检测机制,但请注意,正确的固件操作有时需要长时间运算,因此任务看门狗定时器超时并不总意味着存在问题。在这些情况下,可能需要微调超时时限,甚至禁用任务看门狗定时器。 + +.. 
_built-in-task-priorities: + +内置任务优先级 +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +ESP-IDF 启动的系统任务预设了固定优先级。启动时,一些任务会自动启动,而另一些仅在应用程序固件初始化特定功能时启动。为优化性能,请合理设置应用程序任务优先级,以确保它们不会被系统任务阻塞,同时需确保系统任务不会饥饿进而影响其他系统功能。 + +为此,可能需要分解特定任务。例如,可以在高优先级任务或中断处理程序中执行实时操作,并在较低优先级任务中处理非实时操作。 + +头文件 :idf_file:`components/esp_system/include/esp_task.h` 包含了用于设置 ESP-IDF 内置任务系统优先级的宏定义。更多系统任务详情,参见 :ref:`freertos_system_tasks` 。 + +常见优先级包括: + +.. Note: 以下两个列表应保持一致,但第二个列表还展示了 CPU 亲和性。 + +.. only:: CONFIG_FREERTOS_UNICORE + + .. list:: + + - :ref:`app-main-task` 中执行 app_main 函数的主任务优先级最低 (1)。 + - 系统任务 :doc:`/api-reference/system/esp_timer` 用于管理定时器事件并执行回调函数,优先级较高 (22, ``ESP_TASK_TIMER_PRIO``)。 + - FreeRTOS 初始化调度器时会创建定时器任务,用于处理 FreeRTOS 定时器的回调函数,优先级最低(1, :ref:`可配置 ` )。 + - 系统任务 :doc:`/api-reference/system/esp_event` 用于管理默认的系统事件循环并执行回调函数,优先级较高 (20, ``ESP_TASK_EVENT_PRIO``)。仅在应用程序调用 :cpp:func:`esp_event_loop_create_default` 时使用此配置。可以调用 :cpp:func:`esp_event_loop_create` 添加自定义任务配置。 + - :doc:`/api-guides/lwip` TCP/IP 任务优先级较高 (18, ``ESP_TASK_TCPIP_PRIO``)。 + :SOC_WIFI_SUPPORTED: - :doc:`/api-guides/wifi` 任务优先级较高 (23). + :SOC_WIFI_SUPPORTED: - 使用 Wi-Fi Protected Setup (WPS)、WPA2 EAP-TLS、Device Provisioning Protocol (DPP) 或 BSS Transition Management (BTM) 等功能时,Wi-Fi wpa_supplicant 组件可能会创建优先级较低的专用任务 (2)。 + :SOC_BT_SUPPORTED: - :doc:`/api-reference/bluetooth/controller_vhci` 任务优先级较高 (23, ``ESP_TASK_BT_CONTROLLER_PRIO``)。蓝牙控制器需要以低延迟响应请求,因此其任务应始终为系统最高优先级的任务之一。 + :SOC_BT_SUPPORTED: - :doc:`/api-reference/bluetooth/nimble/index` 任务优先级较高 (21). + - 以太网驱动程序会创建一个 MAC 任务,用于接收以太网帧。如果使用默认配置 ``ETH_MAC_DEFAULT_CONFIG`` ,则该任务为中高优先级 (15)。可以在以太网 MAC 初始化时输入自定义 :cpp:class:`eth_mac_config_t` 结构体来更改此设置。 + - 如果使用 :doc:`/api-reference/protocols/mqtt` 组件,它会创建优先级默认为 5 的任务( :ref:`可配置 ` ),可通过 :ref:`CONFIG_MQTT_USE_CUSTOM_CONFIG` 调整,也可以在运行时通过 :cpp:class:`esp_mqtt_client_config_t` 结构体中的 ``task_prio`` 字段调整。 + - 关于 ``mDNS`` 服务的任务优先级,参见 `性能优化 `__ 。 + +.. only :: not CONFIG_FREERTOS_UNICORE + + .. 
list:: + + - :ref:`app-main-task` 中执行 app_main 函数的主任务优先级最低 (1) 且默认固定在核心 0 上执行( :ref:`可配置 ` )。 + - 系统任务 :doc:`/api-reference/system/esp_timer` 用于管理定时器事件并执行回调函数,优先级较高 (22, ``ESP_TASK_TIMER_PRIO``) 且固定在核心 0 上执行。 + - FreeRTOS 初始化调度器时会创建定时器任务,用于处理 FreeRTOS 定时器的回调函数,优先级最低(1, :ref:`可配置 ` )且固定在核心 0 上执行。 + - 系统任务 :doc:`/api-reference/system/esp_event` 用于管理默认的系统事件循环并执行回调函数,优先级较高 (20, ``ESP_TASK_EVENT_PRIO``) 且固定在核心 0 上执行。此配置仅在应用程序调用 :cpp:func:`esp_event_loop_create_default` 时使用。可以调用 :cpp:func:`esp_event_loop_create` 添加自定义任务配置。 + - :doc:`/api-guides/lwip` TCP/IP 任务优先级较高 (18, ``ESP_TASK_TCPIP_PRIO``) 且并未固定在特定核心上执行( :ref:`可配置 ` )。 + :SOC_WIFI_SUPPORTED: - :doc:`/api-guides/wifi` 任务优先级较高 (23) 且默认固定在核心 0 上执行( :ref:`可配置 ` )。 + :SOC_WIFI_SUPPORTED: - 使用 Wi-Fi Protected Setup (WPS)、WPA2 EAP-TLS、Device Provisioning Protocol (DPP) 或 BSS Transition Management (BTM) 等功能时,Wi-Fi wpa_supplicant 组件可能会创建优先级较低的专用任务 (2),这些任务并未固定在特定核心上执行。 + :SOC_BT_SUPPORTED: - :doc:`/api-reference/bluetooth/controller_vhci` 任务优先级较高 (23, ``ESP_TASK_BT_CONTROLLER_PRIO``) 且默认固定在核心 0 上执行( :ref:`可配置 <{IDF_TARGET_CONTROLLER_CORE_CONFIG}>` )。蓝牙控制器需要以低延迟响应请求,因此其任务应始终为最高优先级的任务之一并分配给单个 CPU 执行。 + :SOC_BT_SUPPORTED: - :doc:`/api-reference/bluetooth/nimble/index` 任务优先级较高 (21) 且默认固定在核心 0 上执行( :ref:`可配置 ` ). + :esp32: - 使用 :doc:`/api-reference/bluetooth/index` 时会创建多个任务: + - 堆栈事件回调任务 ("BTC") 优先级较高 (19)。 + - 堆栈 BTU 层任务优先级较高 (20)。 + - Host HCI 主任务优先级较高 (22)。 + + 所有 Bluedroid 任务默认固定在同一个核心上执行,即核心 0( :ref:`可配置 ` )。 + + - 以太网驱动程序会创建一个 MAC 任务,用于接收以太网帧。如果使用默认配置 ``ETH_MAC_DEFAULT_CONFIG`` ,则该任务为中高优先级 (15) 且并未固定在特定核心上执行。可以在以太网 MAC 初始化时输入自定义 :cpp:class:`eth_mac_config_t` 结构体来更改此设置。 + - 如果使用 :doc:`/api-reference/protocols/mqtt` 组件,它会创建优先级默认为 5 的任务( :ref:`可配置 ` ,也可通过 :ref:`CONFIG_MQTT_USE_CUSTOM_CONFIG` 调整)。该任务未固定在特定核心上执行( :ref:`可配置 ` )。 + - 关于 ``mDNS`` 服务的任务优先级,参见 `性能优化 `__ 。 + + +设定应用程序任务优先级 +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. 
only:: CONFIG_FREERTOS_UNICORE + + 由于 {IDF_TARGET_RF_TYPE} 操作饥饿可能导致系统不稳定,通常不建议让特定任务的优先级高于 {IDF_TARGET_RF_TYPE} 操作的内置优先级。对于非常短且无需网络的实时操作,可以使用中断服务程序或极受限的任务(仅运行极短时间)并设置为最高优先级 (24)。将特定任务优先级设为 19 不会妨碍较低层级的 {IDF_TARGET_RF_TYPE} 功能无延迟运行,但仍然会抢占 lwIP TCP/IP 堆栈以及其他非实时内部功能,这对于不执行网络操作的实时任务而言是最佳选项。lwIP TCP/IP 任务优先级 (18) 应高于所有执行 TCP/IP 网络操作的任务,以保证任务正常执行。 + +.. only:: not CONFIG_FREERTOS_UNICORE + + 默认配置下,除了个别例外,尤其是 lwIP TCP/IP 任务,大多数内置任务都固定在核心 0 上执行。因此,应用程序可以方便地将高优先级任务放置在核心 1 上执行。优先级大于等于 19 的应用程序任务在核心 1 上运行时可以确保不会被任何内置任务抢占。为了进一步隔离各个 CPU 上运行的任务,配置 :ref:`lwIP 任务 ` ,可以使 lwIP 任务仅在核心 0 上运行,而非上述任一核心,这可能会根据其他任务的运行情况减少总 TCP/IP 吞吐量。 + + 由于 {IDF_TARGET_RF_TYPE} 操作饥饿可能导致系统不稳定,通常不建议让核心 0 上特定任务的优先级高于 {IDF_TARGET_RF_TYPE} 操作的内置优先级。将特定任务优先级设置为 19 并在核心 0 上运行,不会妨碍较低层级的 {IDF_TARGET_RF_TYPE} 功能无延迟运行,但仍然会抢占 lwIP TCP/IP 堆栈以及其他非实时内部功能,该选项适用于不执行网络操作的实时任务。lwIP TCP/IP 任务优先级 (18) 应高于所有执行 TCP/IP 网络操作的任务,以保证任务正常执行。 + + .. note:: + + 如果要让特定任务始终先于 ESP-IDF 内置任务运行,并不需要将其固定在核心 1 上。将该任务优先级设置为小于等于 17,则无需与核心绑定,那么核心 0 上没有执行较高优先级的内置任务时,该任务也可以选择在核心 0 上执行。使用未固定的任务可以提高整体 CPU 利用率,但这会增加任务调度的复杂性。 + +.. note:: + + 对内置 SPI flash 芯片进行写入操作时,任务会完全暂停执行。只有 :ref:`iram-safe-interrupt-handlers` 会继续执行。 + +提高中断性能 +------------------------------------- + +ESP-IDF 支持动态 :doc:`/api-reference/system/intr_alloc` 和中断抢占。系统中每个中断都有相应优先级,较高优先级的中断将优先执行。 + +只要其他任务不在临界区内,中断处理程序将优先于所有其他任务执行。因此,尽量减少中断处理程序的执行时间十分重要。 + +要以最佳性能执行特定中断处理程序,可以考虑: + +.. 
list:: + + - 调用 :cpp:func:`esp_intr_alloc` 时使用 ``ESP_INTR_FLAG_LEVEL2`` 或 ``ESP_INTR_FLAG_LEVEL3`` 等标志,可以为更重要的中断设定更高优先级。 + :not CONFIG_FREERTOS_UNICORE: - 将中断分配到不运行内置 {IDF_TARGET_RF_TYPE} 任务的 CPU 上执行,即默认情况下,将中断分配到核心 1 上执行,参见 :ref:`built-in-task-priorities` 。中断会分配到调用 :cpp:func:`esp_intr_alloc` 函数时所在的 CPU 上。 + - 如果确定整个中断处理程序可以在 IRAM 中运行(参见 :ref:`iram-safe-interrupt-handlers` ),那么在调用 :cpp:func:`esp_intr_alloc` 分配中断时,请设置 ``ESP_INTR_FLAG_IRAM`` 标志,这样可以防止在应用程序固件写入内置 SPI flash 时临时禁用中断。 + - 即使是非 IRAM 安全的中断处理程序,如果需要频繁执行,可以考虑将处理程序的函数移到 IRAM 中,从而尽可能规避执行中断代码时发生 flash 缓存缺失的可能性(参见 :ref:`speed-targeted-optimizations` )。如果可以确保只有部分处理程序位于 IRAM 中,则无需添加 ``ESP_INTR_FLAG_IRAM`` 标志将程序标记为 IRAM 安全。 + +提高网络速度 +----------------------------- + +.. list:: + + :SOC_WIFI_SUPPORTED: * 关于提高 Wi-Fi 网速,参见 :ref:`How-to-improve-Wi-Fi-performance` 和 :ref:`wifi-buffer-usage` 。 + * 关于提高 lwIP TCP/IP(Wi-Fi 和以太网)网速,参见 :ref:`lwip-performance` 。 + :SOC_WIFI_SUPPORTED: * 示例 :example:`wifi/iperf` 包含了一种针对 Wi-Fi TCP/IP 吞吐量进行了大量优化的配置。将文件 :example_file:`wifi/iperf/sdkconfig.defaults` 、 :example_file:`wifi/iperf/sdkconfig.defaults.{IDF_TARGET_PATH_NAME}` 和 :example_file:`wifi/iperf/sdkconfig.ci.99` 的内容追加到项目的 ``sdkconfig`` 文件中,即可添加所有相关选项。请注意,部分选项可能会导致可调试性降低、固件大小增加、内存使用增加或其他功能的性能降低等影响。为了获得最佳结果,请阅读上述链接文档,并据此确定哪些选项最适合当前应用程序。 + +提高 I/O 性能 +---------------------------------- + +使用 ``fread`` 和 ``fwrite`` 等标准 C 库函数时,相较于使用 ``read`` 和 ``write`` 等平台特定的不带缓冲系统调用,I/O 性能可能更慢。标准 C 库函数是为可移植性而设计的,它们在执行时会引入一定开销和缓冲延迟,因此并不适用于需要较高执行速度的场景。 + +:doc:`/api-reference/storage/fatfs` 具体信息和提示如下: + +.. list:: + + - 读取/写入请求的最大大小等于 FatFS 簇大小(分配单元大小)。 + - 使用 ``read`` 和 ``write`` 而非 ``fread`` 和 ``fwrite`` 可以提高性能。 + - 要提高诸如 ``fread`` 和 ``fgets`` 等缓冲读取函数的执行速度,可以增加文件缓冲区的大小(Newlib 的默认值为 128 字节),例如 4096、8192 或 16384 字节。为此,可以在特定文件的指针上使用 ``setvbuf`` 函数进行局部更改,或者修改 :ref:`CONFIG_FATFS_VFS_FSTAT_BLKSIZE` 实现全局应用。 + + .. note:: + 增加缓冲区的大小会增加堆内存的使用量。
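The FatFS tips that close this file (prefer ``read`` over ``fread``, and keep each request within the cluster size) could be illustrated roughly as below. This is a sketch under stated assumptions, not code from the patch: ``read_in_chunks`` is a hypothetical helper, and passing the FatFS allocation unit size as ``chunk`` is one possible way to apply the tip.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads up to `len` bytes from `path` into `dst` using
 * unbuffered read() calls, asking for at most `chunk` bytes per call.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_in_chunks(const char *path, void *dst, size_t len, size_t chunk)
{
    if (chunk == 0) {
        return -1;
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }

    size_t total = 0;
    while (total < len) {
        size_t want = len - total;
        if (want > chunk) {
            want = chunk;           /* cap each request at the chunk size */
        }
        ssize_t n = read(fd, (char *)dst + total, want);
        if (n < 0) {
            close(fd);
            return -1;              /* read error */
        }
        if (n == 0) {
            break;                  /* end of file */
        }
        total += (size_t)n;
    }

    close(fd);
    return (ssize_t)total;
}
```

Because ``read`` bypasses the stdio buffer, no extra heap buffer is allocated; the caller's destination buffer is filled directly.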