[BUG] Zephyr panic with pause-resume test on LNL/nocodec #9245

Closed
kv2019i opened this issue Jun 20, 2024 · 10 comments
Labels: bug (Something isn't working as expected), LNL (Applies to Lunar Lake platform), P1 (Blocker bugs or important features)

Comments

@kv2019i
Collaborator

kv2019i commented Jun 20, 2024

Describe the bug
CI test started failing with:

https://sof-ci.01.org/sofpr/PR9220/build5593/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-pause-resume-capture-100

To Reproduce
See https://sof-ci.01.org/sofpr/PR9220/build5593/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-pause-resume-capture-100

Reproduction Rate
100% in a PR CI test plan that runs pause-resume test for 100 iterations

Expected behavior
No Zephyr panics

Impact

Environment
See https://sof-ci.01.org/sofpr/PR9220/build5593/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-pause-resume-capture-100

Screenshots or console output

[  910.023140] <inf> copier: copier_prepare: comp:0 0x4 copier_prepare()
[  910.023153] <inf> pipe: pipeline_trigger: pipe:0 0x0 pipe trigger cmd 7
[  910.023160] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0119900 0x20210U priority 0 flags 0x0
[  910.023551] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x30004 no bytes to copy, available samples: 0, free_samples: 384
[  910.023561] <inf> dai_intel_ssp: dai_ssp_get_properties: SSP0: fifo 164112, handshake 1, init delay 0
[  910.023570] <inf> dai_intel_ssp: dai_ssp_early_start: SSP0 RX
[  910.023576] <err> os: print_fatal_exception:  ** FATAL EXCEPTION
[  910.023583] <err> os: print_fatal_exception:  ** CPU 0 EXCCAUSE 13 (load/store PIF data error)
[  910.023586] <err> os: print_fatal_exception:  **  PC 0xa003ae6a VADDR 0x28108
[  910.023590] <err> os: print_fatal_exception:  **  PS 0x60720
[  910.023593] <err> os: print_fatal_exception:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2)
[  910.023596] <err> os: xtensa_dump_stack:  **  A0 0xa0062902  SP 0xa00f53e0  A2 0x4010ce1c  A3 0x400f0c30
[  910.023600] <err> os: xtensa_dump_stack:  **  A4 0x60d20  A5 0x400f0c30  A6 0x28100  A7 0xa00f53e0
[  910.023603] <err> os: xtensa_dump_stack:  **  A8 0xa003ae60  A9 0xa00f5370 A10 0xa00ebf88 A11 0x20c0
[  910.023606] <err> os: xtensa_dump_stack:  ** A12 0xfff001ff A13 0x10 A14 0x4010a4f0 A15 0xa00f5370
[  910.023610] <err> os: xtensa_dump_stack:  ** LBEG 0xa0037305 LEND 0xa0037314 LCOUNT 0xa005fe27
[  910.023613] <err> os: xtensa_dump_stack:  ** SAR 0x3
[  910.023616] <err> os: xtensa_dump_stack:  **  THREADPTR (nil)

Backtrace:0xa003ae67:0xa00f53e0 0xa00628ff:0xa00f5410 0xa0071cb2:0xa00f5460 0xa0071ada:0xa00f5490 0xa007aa05:0xa00f54c0 0xa007a0a3:0xa00f54e0 0xa00749d0:0xa00f5580 0xa004285d:0xa00f55a0 0xa00423a2:0xa00f55f0 0xa004294e:0xa00f5620 0xa00423a2:0xa00f5670 0xa004294e:0xa00f56a0 0xa006a868:0xa00f56f0 0xa006a521:0xa00f5760 0xa006a497:0xa00f5790 0xa003f3ee:0xa00f57d0 0xa00680c0:0xa00f5800 0xa006ac93:0xa00f5820 0xa005f547:0xa00f5870 

[  910.023751] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[  910.023756] <err> os: z_fatal_error: Current thread: 0x4010d3b8 (unknown)
[  910.025443] <err> zephyr: k_sys_fatal_error_handler: Halting syste
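For anyone reading the dump above, here is a minimal sketch of how the `PS` value and the `Backtrace:` line can be decoded offline. The helper names `decode_ps` and `parse_backtrace` are made up for illustration and are not part of Zephyr or SOF. The bit layout is the Xtensa PS layout for a windowed-register configuration as I understand it; it does reproduce the `INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:7 CALLINC:2` line printed in the log.

```python
import re

# Xtensa PS register fields as (name, shift, width).
# Assumption: windowed-register configuration, layout taken from the Xtensa ISA.
PS_FIELDS = [
    ("INTLEVEL", 0, 4),
    ("EXCM", 4, 1),
    ("UM", 5, 1),
    ("RING", 6, 2),
    ("OWB", 8, 4),
    ("CALLINC", 16, 2),
    ("WOE", 18, 1),
]


def decode_ps(ps):
    """Split a raw PS value into its named bit fields."""
    return {name: (ps >> shift) & ((1 << width) - 1)
            for name, shift, width in PS_FIELDS}


def parse_backtrace(line):
    """Split a 'Backtrace:PC:SP PC:SP ...' line into (pc, sp) integer pairs."""
    pairs = re.findall(r"(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)", line)
    return [(int(pc, 16), int(sp, 16)) for pc, sp in pairs]


# PS value and the first three frames copied from the log above.
ps = decode_ps(0x60720)
frames = parse_backtrace(
    "Backtrace:0xa003ae67:0xa00f53e0 0xa00628ff:0xa00f5410 0xa0071cb2:0xa00f5460"
)

print(ps)
# PCs only, one per frame, ready to hand to a symbol lookup tool.
print(" ".join(f"{pc:#x}" for pc, _ in frames))
```

The PC list can then be resolved to function names with an addr2line-style tool from the toolchain, run against the firmware ELF of the failing build. The exact tool name and ELF path depend on the build and are not given in this issue.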
@kv2019i added the bug, P1, and LNL labels Jun 20, 2024
@kv2019i
Collaborator Author

kv2019i commented Jun 20, 2024

@dnikodem has already root-caused this and is working on a fix.

@dnikodem
Contributor

Potential fix: zephyrproject-rtos/zephyr#74621. I am waiting for CI and validation results.

@marc-hb marc-hb mentioned this issue Jun 20, 2024
@dnikodem
Contributor

The issue does not reproduce with zephyrproject-rtos/zephyr#74621. Please help review the fix. Thanks.

@bardliao
Collaborator

check-pause-resume-capture-100.sh and multiple-pause-resume-50.sh also failed in the MTL nocodec daily test. Not sure whether it is the same issue.

@marc-hb
Collaborator

marc-hb commented Jul 2, 2024

I just reviewed the daily test runs and I can confirm that these tests were always passing before June 19th and have always been panicking after June 19th.

Last daily test run passing ID 42709
SOF Commit: 5f5fdb6cf97c
Zephyr Commit: 0a3f2f0397a8

First daily test run crashing ID 42825
SOF Commit: 3da8e6474531
Zephyr Commit: a2386efbce18

Tentative fix zephyrproject-rtos/zephyr#74621 was merged as a7e9be60cfd7 a few days ago but sof/west.yml probably does not have it yet.

EDIT: indeed not, sof/west.yml is still at 97a97c744 for now.
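A rough sketch of how this kind of check could be scripted, in case it is useful. `project_revision` and `contains_fix` are hypothetical helper names, and the `SAMPLE` manifest only imitates the general shape of a west manifest (the `other-module` entry and its revision are placeholders, not real data). Only `97a97c744` and `a7e9be60cfd7` come from this thread.

```python
import subprocess


def project_revision(manifest_text, project):
    """Return the 'revision:' value of the named project in a west manifest.

    Simple line scan, assuming each project entry starts with '- name:'.
    Returns None if the project or its revision is not found.
    """
    in_project = False
    for raw in manifest_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("- name:"):
            in_project = line.split(":", 1)[1].strip() == project
        elif in_project and line.startswith("revision:"):
            return line.split(":", 1)[1].strip()
    return None


def contains_fix(zephyr_repo, fix, pinned):
    """True if commit `fix` is an ancestor of `pinned` in a local Zephyr clone.

    Any non-zero git exit code (including errors such as an unknown commit)
    is reported as False.
    """
    result = subprocess.run(
        ["git", "-C", zephyr_repo, "merge-base", "--is-ancestor", fix, pinned]
    )
    return result.returncode == 0


# Hypothetical manifest excerpt; only the zephyr revision is taken from this issue.
SAMPLE = """\
manifest:
  projects:
    - name: other-module   # placeholder entry
      revision: 0000000
    - name: zephyr
      repo-path: zephyr
      revision: 97a97c744
"""

pinned = project_revision(SAMPLE, "zephyr")
print(pinned)
```

With a local Zephyr checkout, `contains_fix("/path/to/zephyr", "a7e9be60cfd7", pinned)` would then tell whether the pinned revision already includes the fix. The path is a placeholder.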

We should probably wait for the MTL panic fix (#9268) and get both in one go.

Reproduction today:
https://sof-ci.01.org/softestpr/PR1213/build576/devicetest/index.html?model=LNLM_RVP_NOCODEC&testcase=check-pause-resume-capture-100

@ssavati

ssavati commented Jul 3, 2024

@kv2019i this issue had not been seen for a while, but yesterday's CI results have this failure: planresultdetail/43335

@dnikodem
Contributor

dnikodem commented Jul 3, 2024

The mentioned fix (zephyrproject-rtos/zephyr#74621) is merged in Zephyr, but sof/west.yml does not have it yet. We need to wait for the west file to be updated.

I have tested the fix with SOF CI and was not able to reproduce the problem. CI: #9251

@kv2019i
Collaborator Author

kv2019i commented Jul 4, 2024

Fix merged via #9278. Test results show the problem is fixed: https://sof-ci.01.org/sofpr/PR9174/build6212/devicetest/index.html (one failure, but no Zephyr OS panic seen in the logs, so a different issue). Closing.

@kv2019i closed this as completed Jul 4, 2024

6 participants