Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: Increase a couple of test stack sizes #58459

Merged
merged 1 commit into from
Jun 1, 2023

Conversation

keith-packard
Copy link
Collaborator

mem_protect and sprintf stacks both need to be slightly larger than currently defined in order to avoid stack overflow when using picolibc.

mem_protect and sprintf stacks both need to be slightly larger than
currently defined in order to avoid stack overflow when using picolibc.

Signed-off-by: Keith Packard <[email protected]>
Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpicky whine: do we have a handle on exactly where the additional space is being used? Zephyr used to be extremely concerned about stack depth (thus the tests still in the tree expecting to run in ~512 bytes!). It's becoming less so as the years progress, but it's still sort of a design concern.

Basically: is this something we can fix, or is picolibc printf (or whatever) just inherently more stack hungry?

@keith-packard
Copy link
Collaborator Author

keith-packard commented May 31, 2023

Nitpicky whine: do we have a handle on exactly where the additional space is being used? Zephyr used to be extremely concerned about stack depth (thus the tests still in the tree expecting to run in ~512 bytes!). It's becoming less so as the years progress, but it's still sort of a design concern.

Yeah, I haven't spent a lot of time looking at the issues here. Back in the early days of integrating picolibc into Zephyr, I discovered a pile of stack overflow issues that only manifested because of picolibc's use of TLS. All of the stacks would be allocated in adjacent chunks of memory. When a stack overflow occurred, it would often end up writing over the upper parts of another stack segment. Before TLS started storing critical data there, this was remarkably benign as quite a bit of the upper reaches of the stack were filled with stuff that didn't matter.

Makes me wonder if some of these newer overflows aren't caused by something other than the picolibc code itself. We've got several things that change when we switch from the minimal C library to picolibc:

  1. Different C library code. However, except for things like printf, that's a pretty minor user of stack space.
  2. Enables _FORTIFY_SOURCE=1. This is not supposed to affect generated code, but I'm not entirely convinced.
  3. Enables TLS. This uses a few more bytes in every stack for some thread-local storage bits.
  4. Disables -ffreestanding. This changes a lot of the generated code in pretty complicated ways.

Basically: is this something we can fix, or is picolibc printf (or whatever) just inherently more stack hungry?

Yes, when compiled with its full C standard conformance glory, picolibc's printf likely uses more stack space, especially when printing floating point numbers. One large consumer is likely to be the support for POSIX positional parameters ("%1$d"). That requires another copy of the va_list value so printf can rewind the argument list to go backwards if necessary. On x86, that's completely trivial. On aarch64 (and any arch using register param passing), it's a pile of stack memory.

It's also not a part of any C standard, and it's optional in the picolibc build, but currently enabled by default. We could disable this for the integer-only printf version in the SDK, and maybe disable it by default for the module.

Other than that, there are stack usage differences due to how various functions are implemented underneath the Zephyr API surface. picolibc adds a FILE structure to the stack when calling printk. If you call printf directly, you save that by using the shared stdout FILE instead. Similarly, the picolibc implementation of cbprintf uses a chunk of stack memory to pass the callback data through picolibc.

I'd suggest disabling the positional parameter stuff and then doing some stack usage measurements to see how things compare.

Comment on lines +1049 to +1051
#define KERNEL_ONLY_THREAD_STACK_SIZE (ROUND_UP(1024, CONFIG_MMU_PAGE_SIZE))
#else
#define KERNEL_ONLY_THREAD_STACK_SIZE (512)
#define KERNEL_ONLY_THREAD_STACK_SIZE (1024)
Copy link
Member

@dcpleung dcpleung May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... the thread is a very simple. It "shouldn't" need much stack space.

@keith-packard
Copy link
Collaborator Author

Ok, time for some data. The test which used more than 512 bytes of stack was on qemu_riscv32_smp.

https://github.com/keith-packard/zephyr/tree/measure-stack

I instrumented tests/kernel/mem_protect/userspace/src/main.c:kernel_only_thread_entry to measure stack usage by filling the unused stack with a sentinel value (0xbaadf00d) and then checking to see how much of that value remained in the stack after calling various functions. Here's how much stack was in use after each step:

picolibc from SDK 0.16.2-rc1-1-gdc0ab61

 68: function entry
184: sem_take
360: printf
408: printk

picolibc module

 68: function entry
184: sem_take
376: printf
424: printk

minimal C library

  68: function entry
 184: sem_take
 392: printf
 440: printk

Rather unexpectedly, picolibc ends up using less stack space than the minimal C library for this test, both when used from the SDK and when built as a module with the application. Looking a bit further at the printk call stack under picolibc all the way down to the bottom of the uart driver, we see:

400	arch_irq_lock
400	z_smp_cpu_mobile
400	z_spin_unlock_valid
368	k_spin_unlock
368	uart_ns16550_poll_out
336	z_impl_uart_poll_out
336	uart_poll_out
300	console_out
300	vfprintf
224	vprintk
224	vprintk
128	printk
 64	kernel_only_thread_entry
 48	z_thread_entry

So, the task isn't using too much stack. However, when it tries to invoke k_thread_user_mode_enter, it generates an exception and at that point the stack goes deep into the weeds. Here's where it first overflows the 512 byte stack:

#0  0x80007c08 in z_spin_unlock_valid (l=l@entry=0x8002682c <uart_ns16550_dev_data_0+8>)
    at /home/keithp/zephyrproject/zephyr/kernel/thread.c:872
#1  0x80005b00 in k_spin_unlock (key=..., l=0x8002682c <uart_ns16550_dev_data_0+8>)
    at /home/keithp/zephyrproject/zephyr/include/zephyr/spinlock.h:195
#2  uart_ns16550_poll_out (dev=0x80013344 <__device_dts_ord_23>, c=<optimized out>)
    at /home/keithp/zephyrproject/zephyr/drivers/serial/uart_ns16550.c:736
#3  0x80005792 in z_impl_uart_poll_out (dev=<optimized out>, out_char=69 'E')
    at /home/keithp/zephyrproject/zephyr/include/zephyr/drivers/uart.h:583
#4  uart_poll_out (dev=<optimized out>, out_char=69 'E')
    at /home/keithp/zephyrproject/zephyr/twister-out/qemu_riscv32_smp/tests/kernel/mem_protect/userspace/kernel.memory_protection.userspace/zephyr/include/generated/syscalls/uart.h:106
#5  console_out (c=69) at /home/keithp/zephyrproject/zephyr/drivers/console/uart_console.c:102
#6  0x80012f16 in __i_vfprintf ()
#7  0x80002272 in vprintk (ap=0x80021124 <_k_thread_stack_kernel_only_thread+292>, 
    fmt=0x80014ffc "%c:      a0: %08lx    t0: %08lx\n") at /home/keithp/zephyrproject/zephyr/lib/os/printk.c:146
#8  vprintk (fmt=fmt@entry=0x80014ffc "%c:      a0: %08lx    t0: %08lx\n", 
    ap=ap@entry=0x80021124 <_k_thread_stack_kernel_only_thread+292>) at /home/keithp/zephyrproject/zephyr/lib/os/printk.c:113
#9  0x8000377e in z_log_minimal_printk (fmt=fmt@entry=0x80014ffc "%c:      a0: %08lx    t0: %08lx\n")
    at /home/keithp/zephyrproject/zephyr/subsys/logging/log_minimal.c:19
#10 0x800038d8 in z_riscv_fatal_error (reason=4, esf=0x80021150 <_k_thread_stack_kernel_only_thread+336>)
    at /home/keithp/zephyrproject/zephyr/arch/riscv/core/fatal.c:35
#11 0x800000ac in _isr_wrapper () at /home/keithp/zephyrproject/zephyr/arch/riscv/core/isr.S:270

At this point, the stack pointer is 16 bytes below the bottom of the stack, and it still hasn't finished running. It ends up writing all over a bunch more memory.

I'm not sure how this test passes when running the minimal C library; it also overwrites memory, writing lots of garbage into k_mem_domain_default. However, whatever it has written there doesn't seem to upset the code much?

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good enough for me. FWIW if user_mode_enter() is a stack bomb like that we should probably file a bug to track it and maybe see if we can mitigate it somehow. This obviously is happening before memory protection is enabled for the region, and thus user stack sizing is likely to be stingy (though on MMU architectures it's generally rounded up to 4k pages) in the expectation that protection is available.

@keith-packard
Copy link
Collaborator Author

That's good enough for me. FWIW if user_mode_enter() is a stack bomb like that we should probably file a bug to track it and maybe see if we can mitigate it somehow. This obviously is happening before memory protection is enabled for the region, and thus user stack sizing is likely to be stingy (though on MMU architectures it's generally rounded up to 4k pages) in the expectation that protection is available.

yeah, whereever it took the fault was using quite a bit of stack already. Of course, taking a fault there was an error, not a normal condition, so it might not be as much of a problem with correct code?

In any case, we've exonerated picolibc's printf for overuse of the stack, which was a more positive outcome than I had expected. It might be good to see what this looks like on aarch64 where va_list is a lot heavier weight. Although, I guess it's just three pointers and two ints, so it might not be all that terrible. Somehow I had convinced myself that you'd end up copying all of the saved register values too, and that won't happen.

@nashif nashif merged commit 091948a into zephyrproject-rtos:main Jun 1, 2023
@keith-packard keith-packard deleted the test-stack-increase branch June 1, 2023 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants