[BUG] Plug/unplug on headset stress breaks hw_params #4970

Closed
cujomalainey opened this issue Nov 8, 2021 · 19 comments
Assignees
Labels
bug Something isn't working as expected P1 Blocker bugs or important features

Comments

@cujomalainey
Member

Describe the bug
Toggling between outputs/inputs breaks SOF. This is another bug found after applying a workaround for #4769.

To Reproduce

cras_test_client --listen /dev/null &
cras_test_client --capture_f /dev/null &
while true; do cras_test_client --plug 11:0:1 && cras_test_client --plug 8:0:1 && sleep 0.1 && cras_test_client --plug 11:0:0 && cras_test_client --plug 8:0:0 && sleep 0.1; done

Run this on a Chromebook loaded with NC and Hotword to really stress the DSP.
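
For longer or unattended runs, the plug/unplug loop above can be bounded instead of running forever. The following is a minimal Python sketch, not part of the original report: the function names `plug_cycle_commands` and `stress` are hypothetical, and the node IDs are simply the ones used in the one-liner above, so they are specific to that device. The two background `cras_test_client` streams still need to be started separately.

```python
import subprocess
import time

# Node IDs copied from the one-liner above; they are specific to that device.
NODES = ("11:0", "8:0")


def plug_cycle_commands(nodes=NODES):
    """Build the argv lists for one plug-all phase and one unplug-all phase."""
    plug = [["cras_test_client", "--plug", f"{n}:1"] for n in nodes]
    unplug = [["cras_test_client", "--plug", f"{n}:0"] for n in nodes]
    return plug, unplug


def stress(iterations, delay=0.1, run=subprocess.run):
    """Run a bounded number of cycles and return how many completed fully.

    Like the && chain in the original loop, a failing command skips the
    rest of its cycle, and the next cycle starts from the beginning.
    """
    completed = 0
    for _ in range(iterations):
        ok = True
        for phase in plug_cycle_commands():
            for argv in phase:
                if run(argv).returncode != 0:
                    ok = False
                    break
            if not ok:
                break
            time.sleep(delay)
        if ok:
            completed += 1
    return completed
```

Calling `stress(1000)` would run a thousand cycles; the `run` parameter exists only so the sequence can be exercised without the real client.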

Reproduction Rate
Indeterminate

Expected behavior
DSP would stay alive

Impact
showstopper

Environment

  1. Branch name and commit hash of the 2 repositories: sof (firmware/topology) and linux (kernel driver).
    • Kernel: Chromeos-5.4 tree
    • SOF: TGL-013 drop stable + some external patches to enable NC
  2. Name of the topology file
  3. Name of the platform(s) on which the bug is observed.
    • Platform: multiple TGL files

Screenshots or console output
You will see signs of stress such as
[ 356.374910] sof-audio-pci 0000:00:1f.3: error: hda_dsp_stream_trigger: cmd 0: timeout on STREAM_SD_OFFSET read

and

[  371.388866] sof-audio-pci 0000:00:1f.3: error: ipc error for 0x60050000 size 12
[  371.389477] sof-audio-pci 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_trigger on 0000:00:1f.3: -62
[  371.389482]  DMIC: ASoC: trigger FE cmd: 0 failed: -62

But the actual failure is when hw_params starts failing:

[ 4094.534023] sof-audio-pci 0000:00:1f.3: error: ipc error for 0x60010000 size 20
[ 4094.534029] sof-audio-pci 0000:00:1f.3: error: hw params ipc failed for stream 2
[ 4094.534031] sof-audio-pci 0000:00:1f.3: ASoC: error at snd_soc_pcm_component_hw_params on 0000:00:1f.3: -19
[ 4094.534035]  Headset: ASoC: hw_params FE failed -19

This failure requires a reboot to recover from.

It is also not specific to headset hw_params; it can break DMIC as well.
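
When scanning a long dmesg capture, it helps to separate the stress signs from the fatal hw_params failure described above. The following is a minimal Python sketch, not part of the original report: the pattern lists are copied from the log excerpts in this issue, the function names `classify` and `first_fatal` are hypothetical, and the errno names are the Linux meanings of -19 and -62, hard-coded so the sketch behaves the same on any host.

```python
import re

# Linux errno values seen in the logs above (hard-coded on purpose).
ERRNO_NAMES = {19: "ENODEV", 62: "ETIME"}

# Substrings copied from the log excerpts in this report.
STRESS_PATTERNS = (
    "timeout on STREAM_SD_OFFSET read",
    "snd_soc_pcm_component_trigger",
    "trigger FE cmd",
)
FATAL_PATTERNS = (
    "hw params ipc failed",
    "snd_soc_pcm_component_hw_params",
    "hw_params FE failed",
)

# A trailing negative number is the kernel error code, e.g. ": -19".
_ERR_CODE = re.compile(r"(-\d+)\s*$")


def classify(line):
    """Return (severity, errno_name) for one dmesg line.

    severity is "fatal", "stress", or None. errno_name is None when the
    line has no trailing error code or the code is not in ERRNO_NAMES.
    """
    if any(p in line for p in FATAL_PATTERNS):
        severity = "fatal"
    elif any(p in line for p in STRESS_PATTERNS):
        severity = "stress"
    else:
        severity = None
    match = _ERR_CODE.search(line)
    name = ERRNO_NAMES.get(-int(match.group(1))) if match else None
    return severity, name


def first_fatal(log_text):
    """Return the first fatal line in a dmesg dump, or None if there is none."""
    for line in log_text.splitlines():
        if classify(line)[0] == "fatal":
            return line.strip()
    return None
```

The generic "ipc error for 0x..." lines are deliberately left unclassified, since they appear in both the stress and the fatal excerpts.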

@cujomalainey cujomalainey added bug Something isn't working as expected P1 Blocker bugs or important features labels Nov 8, 2021
@johnylin76
Contributor

The firmware we use is built from TGL-013 drop stable (including PR #4833). The topology file is sof-tgl-max98373-rt5682-igonr.tplg

@cujomalainey
Member Author

Here is a FW log of the crash: fw_log.txt. From what I can tell, the crash starts at line 186676.

@lgirdwood
Member

> You will see signs of stress such as
> [ 356.374910] sof-audio-pci 0000:00:1f.3: error: hda_dsp_stream_trigger: cmd 0: timeout on STREAM_SD_OFFSET read

This has been fixed in recent kernel and FW. It causes the host DMA to get stuck and requires a reboot.
@bkokoszx do you have all the info you need to backport the fix? It involves a change to the IPC flow in order to shut down the DMA in the correct host/FW sequence.
@sathya-nujella this also needs some kernel patches backported (I think they are now merged upstream).
@plbossart @ranj063 fyi

@cujomalainey
Member Author

This would be needed on TGL-013 or a new TGL branch

@sathya-nujella
Contributor

> You will see signs of stress such as
> [ 356.374910] sof-audio-pci 0000:00:1f.3: error: hda_dsp_stream_trigger: cmd 0: timeout on STREAM_SD_OFFSET read

I saw these prints in my local tests, but they were not fatal in my case. Audio worked even though I saw these prints.
Not sure if I am missing something.


> This has been fixed in recent kernel and FW. It causes the host DMA to get stuck and requires a reboot. @bkokoszx do you have all the info you need to backport the fix? It involves a change to the IPC flow in order to shut down the DMA in the correct host/FW sequence. @sathya-nujella this also needs some kernel patches backported (I think they are now merged upstream). @plbossart @ranj063 fyi

Sure Liam, thank you.

@bkokoszx, can you please provide a test FW branch based on tgl-13 that includes the fixes @lgirdwood mentioned, so we can cross-check this particular issue?

@plbossart
Member

> I saw these prints in my local tests, but they were not fatal in my case. Audio worked even though I saw these prints.

If you allow the device to suspend, then it's a recoverable error, but in stress tests or in cases where the device doesn't suspend there's no way to recover.

@bkokoszx
Collaborator

bkokoszx commented Nov 16, 2021

Hi @lgirdwood
Do you mean backporting the changes in #4922 on the firmware side?

@lgirdwood
Member

@bkokoszx yes, exactly those. Btw, there are kernel updates needed too, as the IPC flow has changed. @sathya-nujella I guess you can see the kernel patches to backport.
@mwasko fyi.

@cujomalainey
Member Author

ADL-003 would also need to be patched as I suspect this is a load issue related to NC algorithms.

@sathya-nujella
Contributor

With this combination:
tgl-13 private branch with NC patches
+
#5006

I am not able to reproduce: "sof-audio-pci 0000:00:1f.3: error: hda_dsp_stream_trigger: cmd 0: timeout on STREAM_SD_OFFSET read". Thank you to @ranj063 for sharing these minimal changes to avoid the issue.

Hi @cujomalainey , @johnylin76,
I ran the script with the above firmware and no longer see the reported IPC issue. Could you help cross-check? Thank you.

@ranj063
Collaborator

ranj063 commented Nov 19, 2021

> I am not able to reproduce: "sof-audio-pci 0000:00:1f.3: error: hda_dsp_stream_trigger: cmd 0: timeout on STREAM_SD_OFFSET read". Thank you to @ranj063 for sharing these minimal changes to avoid the issue.

@sathya-nujella just to clarify, this is only meant as a hot fix for TGL. For ADL, we should backport both the kernel and FW fixes fully.

@bkokoszx
Collaborator

@sathya-nujella
Contributor

@ranj063 @sathya-nujella @cujomalainey @lgirdwood The recently created https://github.com/thesofproject/sof/tree/adl-004-drop-stable branch is based on https://github.com/thesofproject/sof/tree/stable-v1.9, so it already contains the "HDA DMA sequence refining" changes.

Adding @sathyap-chrome for ADL visibility.

@cujomalainey
Member Author

@johnylin76 can you produce a FW build and share internally with NC so we can stress test it? Thanks @sathya-nujella and @bkokoszx

@johnylin76
Contributor

I have verified on my Drobit device by running the script above with #5006. The issue is no longer reproducible.
I also generated the image for full verification by our QA team. We are able to pass the manual stress test on both Drobit and Delbin with #5006.

@sathya-nujella
Contributor

> I have verified on my Drobit device by running the script above with #5006. The issue is no longer reproducible. I also generated the image for full verification by our QA team. We are able to pass the manual stress test on both Drobit and Delbin with #5006.

Thank you @johnylin76 for sharing the test observations. Based on this, I have removed the TEST label from the PR and requested the team to review and merge #5006.

@johnylin76
Contributor

Uploaded the firmware and ldc file with #5006
sof-tgl-fw-pr5006.tar.gz

@lgirdwood
Member

@bkokoszx @mwasko fwiw #5006 LGTM, please merge if you agree.

@cujomalainey
Member Author

#5006 fixed this according to QA, thanks all

7 participants