FPU registers not properly switched for ESP32-S3 (IDFGH-10439) #11690
Comments
Wrapping the affected code in a pair of vTaskSuspendAll() and xTaskResumeAll() calls fixed the issue immediately, so it is definitely related to a context switch.
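For readers who want to repeat this diagnostic, the following is a minimal sketch of bracketing floating-point work with the scheduler suspended. The function name dot_product_guarded and the host-side stand-ins in the #else branch are assumptions made for illustration and are not code from this thread; on target, only the real FreeRTOS calls are used.

```c
#include <assert.h>
#include <stddef.h>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#else
/* Host-side stand-ins (assumption, for off-target builds only):
 * they just track nesting so the sketch compiles without ESP-IDF. */
static int suspend_depth = 0;
static void vTaskSuspendAll(void) { suspend_depth++; }
static int xTaskResumeAll(void) { suspend_depth--; return 0; }
#endif

/* Hypothetical FP routine: dot product computed with the scheduler
 * suspended, so no task switch can occur in the middle of the loop. */
float dot_product_guarded(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);

    float acc = 0.0f;
    vTaskSuspendAll();              /* no task switches from here ... */
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    (void)xTaskResumeAll();         /* ... until here */
    return acc;
}
```

Suspending the scheduler delays other tasks for the duration of the guarded section, so this is better suited to confirming the diagnosis than to production use.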
Since the context switch is caused by an interrupt, I suspect
I think I'm running into this issue with my own project as well. It manifests mainly as assertion failures in the library I'm using (Codec2). An example of such a failure is below:
For me, this didn't start happening until I switched from ESP-IDF 5.0 to 5.1. Additionally, I seem to be able to work around this by pinning the Codec2-using task to core 1 and all other tasks that use the FPU to core 0 (though I could easily be getting incorrect calculations in the core 0 tasks instead; that is just harder to tell because no assertions are thrown there). Anyway, I can try backing out the changes I made and recompiling with ESP-IDF master to see if that helps. It sounds like it might, per #11225.
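The pinning workaround described above could look roughly like the sketch below. The task names, stack sizes, priorities, and the start_fp_tasks helper are all hypothetical, and the #else branch contains host-side stand-ins (an assumption, so the sketch builds without ESP-IDF); only xTaskCreatePinnedToCore and vTaskDelay are real FreeRTOS calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef ESP_PLATFORM
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#else
/* Host-side stand-ins (assumption, off-target builds only): they record
 * which core each task was requested on instead of creating a task. */
typedef int BaseType_t;
typedef unsigned int UBaseType_t;
typedef void (*TaskFunction_t)(void *);
typedef void *TaskHandle_t;
#define pdPASS 1
static int created_core[4];
static int created_count = 0;
static BaseType_t xTaskCreatePinnedToCore(TaskFunction_t fn, const char *name,
                                          uint32_t stack_depth, void *arg,
                                          UBaseType_t priority,
                                          TaskHandle_t *handle, BaseType_t core)
{
    (void)fn; (void)name; (void)stack_depth;
    (void)arg; (void)priority; (void)handle;
    assert(created_count < 4);
    created_core[created_count++] = core;
    return pdPASS;
}
static void vTaskDelay(unsigned ticks) { (void)ticks; }
#endif

#define CODEC_CORE    1   /* the only FPU-using task on core 1 */
#define OTHER_FP_CORE 0   /* other FPU-using work stays on core 0 */

static void codec_task(void *arg)
{
    (void)arg;
    for (;;) {
        /* Codec2 encode/decode work would go here. */
        vTaskDelay(1);
    }
}

static void other_fp_task(void *arg)
{
    (void)arg;
    for (;;) {
        /* Any other floating-point work would go here. */
        vTaskDelay(1);
    }
}

/* Creates both tasks; returns 1 if both were created, 0 otherwise. */
int start_fp_tasks(void)
{
    BaseType_t ok_codec = xTaskCreatePinnedToCore(
        codec_task, "codec", 8192, NULL, 5, NULL, CODEC_CORE);
    BaseType_t ok_other = xTaskCreatePinnedToCore(
        other_fp_task, "other_fp", 4096, NULL, 5, NULL, OTHER_FP_CORE);
    return (ok_codec == pdPASS) && (ok_other == pdPASS);
}
```

This only sidesteps the problem when each core really has a single FPU-using task; several FPU users sharing core 0 could still be affected, as the comment above points out.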
@tmiw In the meantime you can try
@igrr Sorry to disturb you again, but this is a pretty serious problem (computation results are silently corrupted instead of failing).
I actually tried this on the Codec2 task and, while it did work, it introduced unacceptable latency in the resulting network traffic (the output from the library I mentioned basically gets sent to a radio on the same LAN as the ESP32-S3). However, ensuring that only one task per core uses FP operations seems to be working so far. I'll probably try
Hello @NaivelyWritten, this issue is currently under investigation on our side. We will come back to you as soon as we have more information.
Can you please try the following patch and tell me if it fixes your issues? You need to be able to modify your IDF
If it does fix your issue, it will first be merged internally before making it to the next release.
@o-marshmallow, I was looking at the patch and wanted to confirm that
Same question as @tmiw, it seems they are not paired.
The register numbers passed to this macro are scratch registers used within the macro. The actual spinlock variable is the third argument. So the patch is fine, please give it a try!
Glad to see someone managed to pin down the root cause of this issue. I ran into this in #11225 but didn't dig much because I was unable to reproduce it quickly. Thanks for the effort!
I'm currently running that patch here. So far, so good, although the original issue was pretty intermittent for me before. I'll keep running with it over the long weekend and see how it goes.
Is this in 5.1.1 now? I didn't see it in the release notes, so it may be worth checking.
Hello @ProfFan, this fix is part of release v5.1.1. In the release notes, look for the line: xtensa: Fixed a bug that altered Tasks' FPU registers (#11690) (cadf80e)
Hello, I have a similar problem using an ESP32 WROOM with an earlier IDF (arduino-esp32 2.1.7 / IDF 4.4).
Answers checklist.
IDF version.
5.1-rc1
Operating System used.
macOS
How did you build your project?
Command line with idf.py
If you are using Windows, please specify command line type.
None
Development Kit.
ESP32-S3-MINI1N8
Power Supply used.
USB
What is the expected behavior?
FPU registers are properly saved and restored at each context switch.
What is the actual behavior?
A race condition in which the FPU registers are sometimes left unchanged across a context switch.
Steps to reproduce.
This behavior depends heavily on internal code but I can describe the problem:
I recently hit a very similar issue: an assertion in fmtlib that checks the sign of a float value fails, even though an earlier function had set that value to a non-negative number. Further checking revealed that the negative value is the register content of another thread. This indicates that the FPU context switch is indeed not done correctly (in IDF v5.1-rc).
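One way to look for this kind of leak on hardware is to give each task its own sign and check that no result ever carries the other task's sign. The helper below is a hypothetical sketch (its name and arithmetic are illustrative, not taken from the affected project). It assumes every magnitude is strictly positive, and whether intermediate values actually stay in FPU registers across a preemption depends on the compiler and optimization level.

```c
#include <stddef.h>

/* Counts results whose sign disagrees with the sign this task owns.
 * `sign` is expected to be +1.0f or -1.0f; `magnitudes` must all be > 0.
 * On a correctly working system the return value is always 0. */
int count_sign_violations(float sign, const float *magnitudes, size_t n)
{
    int violations = 0;
    for (size_t i = 0; i < n; i++) {
        float v = sign * magnitudes[i];
        /* A few dependent FP operations, so a value is live in the FPU
         * for a short window in which a preemption can happen. */
        float mixed = v * 0.5f + v * 0.25f;
        if ((mixed < 0.0f) != (sign < 0.0f)) {
            violations++;
        }
    }
    return violations;
}
```

Running this in a tight loop from two tasks on the same core, one with sign +1.0f and one with -1.0f, and logging any nonzero count would be one way to observe the corruption described above.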
Debug Logs.
More Information.
Similar to #11225