-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simultaneous playback and capture will cause IPC timed out #266
Comments
Comments by Jakub: You can find fw build I used attached in zip (it has clearer ipc triger presentation in fw logs so counting is easier). " |
Comment to missing traces - @keqiaozhang did not see this issue and provided fw traces. Kernel side log on the other hand states that ipc replay was not sent in time. (inline) [ 92.037887] sof-audio sof-audio: pcm: trigger stream 1 dir 1 cmd 1 |
I can still see this issue with latest codes. |
@dabekjakub from the logs you pasted, looks FW received the trigger start msg, but handle/schedule too late? |
Interesting bug. |
I investigated the FW logs and every trigger was handled in the same time @keyonjie and there were no indications of timeout on FW side that is why i need someone from kernel to look at it. I am 99.99% sure that we write response correctly and in time. |
it looks there is extra delay on some part: [ 92.344758] sof-audio sof-audio: error: rx list empty but received 0x0 fw response recieved for the ipc that timed out the response is too late, longer than 300ms. |
it is not easy for me to reduplicate it now, with SOF glk-004-drop-stable and linux topic/cros-sof-v4.14-rebase, I tried the step #1 for more than 100 times, but no ipc timeout or xrun happens. |
As i said before - i know that there is a timeout in log that points to fw. But when it comes to FW ipc handling i saw no delay in response in fw traces. So FW handled ipc (we handle it as IRQ) as fast as it was possible it was not even close to 300ms. That is why i ask for kernel dev to check it out. Maybe the response is already there but system has some kind of hickup on reading the memory window. So simply trace time between sending and reading response. If You try to read it many times during this 300ms then it means that fw either didn't recive or respond to in time and i can continiue. |
we miss an interrupt here on driver side, under investigation about why this happens. |
|
with the latest code base, I can't find interrupt missing issue. Here summarize what I found:
I believe CoE friends can do similar slim driver scripts to duplicate the same issue(if not, please contact with me). So, to resolve this kind of issues thoroughly, IMO, we need to:
@dabekjakub @lgirdwood @plbossart @mmaka1 any comments? |
I haven't seen any xruns in fw when debugging this issue. However , @slawblauciak has uncovered reason for xruns and proposed a fix thesofproject/sof#645. We should recheck this issue after proposed fix will be implemented. |
@keyonjie can you please check this issue with thesofproject/sof#692 |
@ranj063 so for this kind of ipc timeout, we need to confirm what stage we actually failed: |
I found that sometimes scheduler crash before Xrun happens(looks like copy task is not scheduled then xrun happens), there might be scheduler bug. |
I have done several optimizations, now it can pass 200 (2playback+1capture+1dma-trace) iterations. Still hit several corner cases after long enough try, e.g. 300 or 500 iterations, one classical one as below:
|
as this issue mostly happens in FW side now(e.g. FW hang, FW panic, FW can't get IPC interrupts, ...), let me duplicate this in sof issue. |
@mengdonglin is this still open or can we close? |
last call @mengdonglin I will close at the end of the week if there is no update. |
Confirmed that this issue is fixed on GLK. |
…om-kernel-result rust/kernel: move from_kernel_result! macro to error.rs
Add various tests to check maximum number of supported programs being attached: # ./vmtest.sh -- ./test_progs -t tc_opts [...] ./test_progs -t tc_opts [ 1.185325] bpf_testmod: loading out-of-tree module taints kernel. [ 1.186826] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel [ 1.270123] tsc: Refined TSC clocksource calibration: 3407.988 MHz [ 1.272428] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fc932722, max_idle_ns: 440795381586 ns [ 1.276408] clocksource: Switched to clocksource tsc #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK <--- (new test) #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_replace:OK #269 tc_opts_revision:OK Summary: 18/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
Add a new test case which performs double query of the bpf_mprog through libbpf API, but also via raw bpf(2) syscall. This is testing to gather first the count and then in a subsequent probe the full information with the program array without clearing passed structs in between. # ./vmtest.sh -- ./test_progs -t tc_opts [...] ./test_progs -t tc_opts [ 1.398818] tsc: Refined TSC clocksource calibration: 3407.999 MHz [ 1.400263] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fd336761, max_idle_ns: 440795243819 ns [ 1.402734] clocksource: Switched to clocksource tsc [ 1.426639] bpf_testmod: loading out-of-tree module taints kernel. [ 1.428112] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_query:OK <--- (new test) #269 tc_opts_replace:OK #270 tc_opts_revision:OK Summary: 19/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
Add a new test case to query on an empty bpf_mprog and pass the revision directly into expected_revision for attachment to assert that this does succeed. ./test_progs -t tc_opts [ 1.406778] tsc: Refined TSC clocksource calibration: 3407.990 MHz [ 1.408863] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcaf6eb0, max_idle_ns: 440795321766 ns [ 1.412419] clocksource: Switched to clocksource tsc [ 1.428671] bpf_testmod: loading out-of-tree module taints kernel. [ 1.430260] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #252 tc_opts_after:OK #253 tc_opts_append:OK #254 tc_opts_basic:OK #255 tc_opts_before:OK #256 tc_opts_chain_classic:OK #257 tc_opts_chain_mixed:OK #258 tc_opts_delete_empty:OK #259 tc_opts_demixed:OK #260 tc_opts_detach:OK #261 tc_opts_detach_after:OK #262 tc_opts_detach_before:OK #263 tc_opts_dev_cleanup:OK #264 tc_opts_invalid:OK #265 tc_opts_max:OK #266 tc_opts_mixed:OK #267 tc_opts_prepend:OK #268 tc_opts_query:OK #269 tc_opts_query_attach:OK <--- (new test) #270 tc_opts_replace:OK #271 tc_opts_revision:OK Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
This issue happens when running playback and capture at same time. Possibility: 5%~10%.
How to reproduce:
Error log:
chronos@localhost ~ $ aplay -Dhw:0,1 /root/100sine.wav -d 5 & arecord -Dhw:0,1 -f dat -c 2 test9.wav -d 5
[1] 2489
Recording WAVE 'test9.wav' : Signed 16 bit Little Endian, Playing WAVE '/root/100sine.wav' : Rate 48000 Hz, Signed 16 bit Little Endian, StereoRate 48000 Hz,
Stereo
overrun!!! (at least 0.111 ms long)
aplay: pcm_write:2051: write error: Input/output error
[1]+ Exit 1 aplay -Dhw:0,1 /root/100sine.wav -d 5
The text was updated successfully, but these errors were encountered: