Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

app: zephyr: add CONFIG_ASSERT debug overlay #6530

Merged
merged 1 commit into from
Feb 21, 2023

Conversation

kv2019i
Copy link
Collaborator

@kv2019i kv2019i commented Nov 3, 2022

Enable CONFIG_ASSERT=y in SOF debug overlay.

Also add a commented out list of additional useful Zephyr debugging
tools to the overlay. These incur a higher runtime cost, so are left
disabled by default. These are all debug tools that have been tested to
work with SOF and found useful.

Dependencies for merging (SOF):

Dependencies for merging (Zephyr):

@kv2019i
Copy link
Collaborator Author

kv2019i commented Nov 3, 2022

FYI @nashif @teburd -- CONFIG_SPIN_LOCK_TIME_LIMIT is pretty cool. I had to set the default threshold quite high due to the busywaiting we do in zephyr/drivers/dai/intel/ssp/ (a known limitation it seems), but it was cool to see this picked up by SPIN_LOGK_TIME_LIMIT.

@teburd
Copy link
Contributor

teburd commented Nov 3, 2022

FYI @nashif @teburd -- CONFIG_SPIN_LOCK_TIME_LIMIT is pretty cool. I had to set the default threshold quite high due to the busywaiting we do in zephyr/drivers/dai/intel/ssp/ (a known limitation it seems), but it was cool to see this picked up by SPIN_LOGK_TIME_LIMIT.

Glad its working for you! Let me know if there's any ideas for improvement around this, happy to add them.

Copy link
Contributor

@teburd teburd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that my +1 counts here, but yes, this makes sense for debugging

@lgirdwood
Copy link
Member

Not that my +1 counts here, but yes, this makes sense for debugging

You count :)

@@ -1 +1,6 @@
CONFIG_DEBUG=y
CONFIG_ASSERT=y
CONFIG_STACK_SENTINEL=y
CONFIG_SYS_HEAP_VALIDATE=y
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for HEAP_VALIDATE, sof does not call it directly, no sure zephyr will call it or not.
My feeling is that this does not like stack, low rate to have problems for heap.
I mean this maybe treated as option.

Stack overflow is must.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@btian1 This has found issues in the past and it triggers an assert if corruption is hit.

@lgirdwood
Copy link
Member

@kv2019i seeing red on CI will rerun.

@lgirdwood
Copy link
Member

SOFCI TEST

1 similar comment
@kv2019i
Copy link
Collaborator Author

kv2019i commented Nov 9, 2022

SOFCI TEST

@kv2019i
Copy link
Collaborator Author

kv2019i commented Nov 9, 2022

Let me try once more. I couldn't make sense out of the failure. The tests failed because non-existing FW file, suggesting a build failure, but this overlay is not supposed to be used in CI runs. @keqiaozhang @greg-intel any insights? Let's not merge until this is clarified.

@kv2019i kv2019i marked this pull request as draft November 10, 2022 09:44
@kv2019i
Copy link
Collaborator Author

kv2019i commented Nov 10, 2022

Moving back to draft. Our CI uses CONFIG_DEBUG to turn on features like the IPC flood test. So we really can't put feature like these that do add overhead across the board, to the builds.

@kv2019i kv2019i requested a review from aborisovich February 21, 2023 11:45
@lgirdwood
Copy link
Member

@kv2019i some failing tests around pause resume, wonder if timing has changed here ?

@kv2019i
Copy link
Collaborator Author

kv2019i commented Feb 21, 2023

@wszypelt merge/build fails to "Unhandled Exception: System.BadImageFormatException" in IPC4 smoketest. Is this a failure in test runner, or actual failing case? I can't seem to find any errors related to test execution. @aborisovich you did a recent check with asserts enabled, right?

As for SOF test with Linux driver, I can't believe we hit an assert again:
https://sof-ci.01.org/sofpr/PR6530/build4058/devicetest/index.html

https://sof-ci.01.org/sofpr/PR6530/build4058/devicetest/index.html?model=ADLP_RVP_SDW_IPC4ZPH&testcase=multiple-pause-resume-5

ASSERTION FAIL [free_frames >= pending_frames] @ /srv/home/jenkins/workspace/sof_generic_build/sof/src/audio/mixin_mixout/mixin_mixout.c:368
[    6.343360] <err> os:  ** FATAL EXCEPTION
[    6.343370] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[    6.343378] <err> os:  **  PC 0xbe050f5a VADDR (nil)

This is not the same as previously fixed in #6900. This has been passing many times in last few weeks, so either this is a very recent regression (in other code), or has a low occurence rate.

So back to draft status :(

@kv2019i kv2019i changed the title app: zephyr: add CONFIG_ASSERT debug overlay [DNM] app: zephyr: add CONFIG_ASSERT debug overlay Feb 21, 2023
Enable CONFIG_ASSERT=y in SOF debug overlay.

Also add a commented out list of additional useful Zephyr debugging
tools to the overlay. These incur a higher runtime cost, so are left
disabled by default. These are all debug tools that have been tested to
work with SOF and found useful.

Signed-off-by: Kai Vehmanen <[email protected]>
@kv2019i kv2019i force-pushed the 202211-debug-overlay-addons branch from 216841d to ac7c228 Compare February 21, 2023 15:55
@aborisovich
Copy link
Contributor

@wszypelt merge/build fails to "Unhandled Exception: System.BadImageFormatException" in IPC4 smoketest. Is this a failure in test runner, or actual failing case? I can't seem to find any errors related to test execution. @aborisovich you did a recent check with asserts enabled, right?

As for SOF test with Linux driver, I can't believe we hit an assert again: https://sof-ci.01.org/sofpr/PR6530/build4058/devicetest/index.html

https://sof-ci.01.org/sofpr/PR6530/build4058/devicetest/index.html?model=ADLP_RVP_SDW_IPC4ZPH&testcase=multiple-pause-resume-5

ASSERTION FAIL [free_frames >= pending_frames] @ /srv/home/jenkins/workspace/sof_generic_build/sof/src/audio/mixin_mixout/mixin_mixout.c:368
[    6.343360] <err> os:  ** FATAL EXCEPTION
[    6.343370] <err> os:  ** CPU 0 EXCCAUSE 63 (zephyr exception)
[    6.343378] <err> os:  **  PC 0xbe050f5a VADDR (nil)

This is not the same as previously fixed in #6900. This has been passing many times in last few weeks, so either this is a very recent regression (in other code), or has a low occurence rate.

So back to draft status :(

Looks like a test runner issue.
I've executed full scope tests for MTL on 05.02.2023 so it was a while ago, maybe something new came up?

@kv2019i
Copy link
Collaborator Author

kv2019i commented Feb 21, 2023

Another update:

@kv2019i kv2019i changed the title [DNM] app: zephyr: add CONFIG_ASSERT debug overlay app: zephyr: add CONFIG_ASSERT debug overlay Feb 21, 2023
@kv2019i
Copy link
Collaborator Author

kv2019i commented Feb 21, 2023

@aborisovich Is this ok to you, you still have "changes requested" on this PR?

@kv2019i
Copy link
Collaborator Author

kv2019i commented Feb 21, 2023

After rebase, tests are now good so we are finally ready to go.

Copy link
Contributor

@aborisovich aborisovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go

@lgirdwood lgirdwood merged commit 4d67d2f into thesofproject:main Feb 21, 2023
@teburd
Copy link
Contributor

teburd commented Feb 21, 2023

🥳

@marc-hb
Copy link
Collaborator

marc-hb commented Mar 24, 2023

After a relatively painful git bisect I found that this has the unfortunate effect of making even the stripped ELF file non-reproducible :-(

gunzip -c ../build-sof-staging/sof-info/tgl/stripped-zephyr.elf.gz | strings now shows some absolute paths. It also shows a lot of relative paths so maybe there is hope?

cc: @lyakh , @andyross

@marc-hb
Copy link
Collaborator

marc-hb commented Mar 24, 2023

I just remembered I fixed this with -fmacro-prefix-map in Zephyr 4 years ago :

@marc-hb
Copy link
Collaborator

marc-hb commented Apr 10, 2023

After a relatively painful git bisect I found that this has the unfortunate effect of making even the stripped ELF file non-reproducible :-(

Good news and bad news.

Good news: mystery solved. This affects only old toolchains like the 2017 and gcc v4 based xt-xcc currently used for TGL.

https://reproducible-builds.org/docs/build-path/

-fmacro-prefix-map=OLD=NEW is similar to -fdebug-prefix-map, but addresses unreproducibility due to the use of FILE macros in assert calls for example. (available since GCC 8 and Clang 10)

Bad news: this will never be fixed for TGL, see issue #7114 why.

marc-hb added a commit to marc-hb/sof that referenced this pull request Apr 25, 2023
This makes sure Zephyr's -fmacro-prefix-map is working and keeps the
builds reproducible when using a recent enough toolchain.

As found in the CONFIG_ASSERT PR
thesofproject#6530 (comment)
this is not true for old Xtensa toolchains.

Signed-off-by: Marc Herbert <[email protected]>
kv2019i pushed a commit that referenced this pull request Apr 25, 2023
This makes sure Zephyr's -fmacro-prefix-map is working and keeps the
builds reproducible when using a recent enough toolchain.

As found in the CONFIG_ASSERT PR
#6530 (comment)
this is not true for old Xtensa toolchains.

Signed-off-by: Marc Herbert <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.