Long Running Futures Eventually Stall #3
Comments
Aww, this sounds frightening :( . I do hope it is not really Perhaps it is related to our queues. Btw - unlike async-executor, Anything you could share about that problem w.r.t. minimal reproducible case would be greatly appreciated. If your employer is OK with that, sharing your own Local executor might also be interesting. |
Just for testing... what happens if you |
I already tested it with the edge-executor = { version = "0.4", features = ["unbounded"] } I talked to my supervisor and he is okay with providing a minimal reproducible example and also with sharing my own executor implementation. The only question is how to handle the latter part. Maybe, for starters, I could push it into a private GitHub repo and give you access? Then we could discuss whether we want to merge it into edge-executor and how. For now, I will try to build a minimal example for the bug. |
Yes - a private GH repo for the executor would be ideal, many thanks! |
Also, the following question: have you tried with |
Interesting... I cannot reproduce the issue in my minimal example. My best guess would be that this is a compiler bug again. I've seen them before in my firmware with stuff like:
I don't know what's happening. But maybe it's not a bug with |
I don't know whether this is even an option, but how about flashing (a subset of) your firmware on a RISCV MCU and trying that out instead? The compiler backend for RISCV is upstreamed in LLVM and I guess it is significantly more stable than We are currently having this miscompilation issue, which seems to be triggered by a very large alignment (padding) in Coincidentally, Chances this to be the culprit are slim, but trying out with RISCV is one option to eliminate this hypothesis; another would be to enable A third option is to try your full firmware with |
Also: are you using any async drivers from |
I think you might be getting paranoid :-) The linker would erase all traces of tokio if you don't call into it.
As I said - maybe trying it out with RISCV is an option. Definitely sounds weird though, but without at least seeing a portion of the code, I'm running out of hypotheses. |
A short update for today: I finally can reproduce the issue with my minimal example. It happens with both Whatever, it will take me until tomorrow to give you a clean report and access to my private executor. |
Hmmmmm. This might be related. Are you confining the PSRAM to just one of the two CPUs, or are you happily using |
@fko-kuptec Is this running on the stock ESP32, or on ESP32S3? Asking because - at least in theory - the ESP32S3 should not have the atomics-in-PSRAM issue I mentioned in my previous comment. (Also - and to my understanding - no RISCV chip should be affected by that.) |
I am running the tests on ESP32-S3 :) |
Ok, so either the s3 is also affected, or there is something else going on with PSRAM. In any case, your reproducible example, as well as your infallible executor would help greatly in figuring out what is going on. Especially given that you have confirmed that |
Also and if you have them handy - can you paste here other stack crashes you have recorded? |
Finally, you can find the reproduction of the bug in this repo. The README includes instructions, information on what works and what doesn't, and a few stack traces. I have also given you private access to the |
Thanks! One thing I noticed (still shooting in the dark) is that you are running both |
I tried it with
|
Note, though, that I am also running my own executor with the ESP-IDF |
Oh, and regarding RISC-V: Unfortunately, we don't have any normal ESP32 RISC-V devkits here. The only thing we have is the brand new P4 evaluation kit, but that's a different kind of beast... |
I was able to reproduce your crashes two or three times with a stock esp32 + psram (I don't have any s3 with psram). I have not been able to reproduce them without psram so far, but honestly I did not try hard enough. Even with psram, reproducing it after the first crash was a bit difficult; I have to wait quite a bit (sometimes 15+ minutes) until it happens.
Neither with nor without psram am I observing any delays or stalls. Might be an s3-specific behavior of the same problem.
It is true that opening a 3rd, 4th, 6th chrome window becomes slower and slower (the esp accepting a 7th connection does not even work for me!), but this happens with or without psram, and I think it is just how many sockets the poor little esp can handle simultaneously, or something.
I still suspect the problem is related to atomics not working OK when they are in psram, but I can't prove it yet (noticed your own executor does not use any atomics). Will continue working on this next week, just keep in mind it might take a bit of time (will be away on Monday). |
Thank you for taking a look :)
Without PSRAM I couldn't find any crashes either (see the table in the repo's README). The issue only seems to occur when PSRAM is enabled.
It's quite random, that's true. For me, however, the stalls or crashes usually start to show up within one or two minutes.
Yeah, the poor thing is not really made for that. But becoming slower is not really my concern; it's just that it stalls for me under the described circumstances.
Oh, my executor is using atomics, just the ones from
I am using my own executor in our firmware, which seems to run rock-solid now. So no pressure :) |
I am currently developing firmware for the ESP that needs to serve the camera feed as an HTTP stream. For testing purposes, I reduced that further down to serving a simple JPEG animation. I initially used edge-executor::LocalExecutor to run my custom async HTTP server, handling every connection in a separate future spawned to the same thread-local executor. One stream to one client is therefore one (possibly infinitely) running async task.
Unfortunately, I ran into some weird issues: When running multiple streams in parallel, any stream might freeze randomly and never come back to life. Sometimes, the whole ESP crashes with some unhandled exception. It does not seem to be the same exception every time, but here is one:
At some point I tried using a different executor. After replacing edge-executor by futures-executor::LocalPool, I could not reproduce these issues anymore. The streams seem to continue running without problems. Then, I wrote my own version of LocalExecutor from scratch without using any third-party crates, and this also seems to work fine.
This makes me believe that edge-executor has a bug... somewhere. I don't really have a clue where it is. I just found this open issue in async-task talking about tasks not getting rescheduled randomly. That would fit my observations, at least.
Sorry for not providing sample code. I am developing the firmware as an employee and therefore cannot just share our product's firmware. If you are interested, I can however try to build a minimal example when I find the time.
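For anyone who wants something concrete to experiment with, here is a minimal sketch of a single-threaded executor built only on std, running several long-lived tasks on one thread. It is not the executor mentioned above (that code is not part of this thread), and it makes no claim about where the actual bug is. All names (LocalExecutor, TaskWaker, YieldNow, demo) are hypothetical and chosen for illustration only. The one detail worth noting is that the "queued" flag is cleared before the task is polled, so a wake-up that arrives during the poll re-queues the task instead of being lost; a lost wake-up is one generic way a task can stall forever.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Waker for one task: pushes the task id onto the shared run queue.
struct TaskWaker {
    id: usize,
    queue: Arc<Mutex<VecDeque<usize>>>,
    /// True while the id is already sitting in the run queue.
    queued: AtomicBool,
}

impl Wake for TaskWaker {
    fn wake(self: Arc<Self>) {
        self.wake_by_ref();
    }
    fn wake_by_ref(self: &Arc<Self>) {
        // Enqueue at most once per pending wake-up.
        if !self.queued.swap(true, Ordering::AcqRel) {
            self.queue.lock().unwrap().push_back(self.id);
        }
    }
}

struct Task<'a> {
    future: Pin<Box<dyn Future<Output = ()> + 'a>>,
    waker: Arc<TaskWaker>,
}

/// Hypothetical std-only local executor (illustration, not edge-executor).
pub struct LocalExecutor<'a> {
    tasks: Vec<Option<Task<'a>>>,
    queue: Arc<Mutex<VecDeque<usize>>>,
}

impl<'a> LocalExecutor<'a> {
    pub fn new() -> Self {
        Self { tasks: Vec::new(), queue: Arc::new(Mutex::new(VecDeque::new())) }
    }

    /// Spawns a (possibly !Send) future and schedules its first poll.
    pub fn spawn(&mut self, fut: impl Future<Output = ()> + 'a) {
        let id = self.tasks.len();
        let waker = Arc::new(TaskWaker {
            id,
            queue: self.queue.clone(),
            queued: AtomicBool::new(true),
        });
        self.queue.lock().unwrap().push_back(id);
        self.tasks.push(Some(Task { future: Box::pin(fut), waker }));
    }

    /// Polls woken tasks until the run queue is empty.
    /// Returns how many tasks are still pending (i.e. waiting for a wake-up).
    pub fn run_until_stalled(&mut self) -> usize {
        loop {
            let next = self.queue.lock().unwrap().pop_front();
            let Some(id) = next else { break };
            let Some(task) = self.tasks[id].as_mut() else { continue };
            // Clear the flag BEFORE polling so a wake during poll is not lost.
            task.waker.queued.store(false, Ordering::Release);
            let waker = Waker::from(task.waker.clone());
            let mut cx = Context::from_waker(&waker);
            let done = task.future.as_mut().poll(&mut cx).is_ready();
            if done {
                self.tasks[id] = None;
            }
        }
        self.tasks.iter().filter(|t| t.is_some()).count()
    }
}

/// Future that yields to the executor exactly once.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Simulates `streams` connections, each "sending" `frames` frames
/// and yielding after every frame. Returns frames sent per stream.
pub fn demo(streams: usize, frames: usize) -> Vec<usize> {
    let sent = Rc::new(RefCell::new(vec![0usize; streams]));
    let mut ex = LocalExecutor::new();
    for i in 0..streams {
        let sent = sent.clone();
        ex.spawn(async move {
            for _ in 0..frames {
                sent.borrow_mut()[i] += 1;
                YieldNow(false).await;
            }
        });
    }
    let pending = ex.run_until_stalled();
    assert_eq!(pending, 0, "a task stalled without a pending wake-up");
    let result = sent.borrow().clone();
    result
}
```

Calling demo(3, 5) drives three interleaved tasks to completion. A non-zero return value from run_until_stalled means some task is parked with nothing scheduled to wake it, which is the kind of state a frozen stream would be in, whatever the underlying cause.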