pend() assertion can allow user threads to crash the kernel #22999

andrewboie · 2020-02-21T01:21:03Z

Describe the bug
There's an assertion deep in the kernel scheduling code which, when a thread is being pended on a wait queue, to raise an assertion of the timeout is negative and not equal to 0. For some reason, this is immediately followed by a runtime check to just set timeout to 0 if assertions aren't built in.

static void pend(struct k_thread *thread, _wait_q_t *wait_q, s32_t timeout)
{
	...

	if (timeout != K_FOREVER) {
		s32_t ticks;

		__ASSERT(timeout >= 0,
			"Only non-negative values are accepted.");

		if (timeout < 0) {
			timeout = 0;
		}

		ticks = _TICK_ALIGN + k_ms_to_ticks_ceil32(timeout);

		z_add_thread_timeout(thread, ticks);
	}
}

Unfortunately, this assertion is problematic, as any system call with a timeout parameter can cause the kernel to fail this assertion. A user thread can make a call like k_sem_take(&sem, -2) and cause a crash in kernel mode. This is particularly bad because callers of pend() are either holding some kind of spinlock or have locked interrupts; a kernel crash in a critical section is an unrecoverable condition.

User threads making system calls are not allowed to ever cause the kernel to fail an assertion or otherwise trigger a kernel panic due to the risk of kernel corruption beyond the scope of the calling thread. Fatal errors in system calls are only permitted in a controlled fashion using Z_OOPS() in syscall verification functions.

I can conceive of three ways of addressing this:

Delete the assertion. We have a runtime correction any way, what's the point?
Delete the assertion and modify pend() to propagate a return value all the way up to callers of z_pend_* APIs
Add bounds checks to the implementation functions of all system calls that accept a timeout parameter.

To Reproduce

diff --git a/tests/kernel/semaphore/semaphore/src/main.c b/tests/kernel/semaphore/semaphore/src/main.c
index 29ef44ac80..a2f8825bf5 100644
--- a/tests/kernel/semaphore/semaphore/src/main.c
+++ b/tests/kernel/semaphore/semaphore/src/main.c
@@ -432,7 +432,7 @@ void test_sem_take_timeout_forever(void)
 
        k_sem_reset(&simple_sem);
 
-       ret_value = k_sem_take(&simple_sem, K_FOREVER);
+       ret_value = k_sem_take(&simple_sem, -2);
        zassert_true(ret_value == 0, "k_sem_take failed");
        k_thread_abort(&sem_tid);

The text was updated successfully, but these errors were encountered:

andrewboie · 2020-02-21T01:24:15Z

@nordic-krch according to git blame you added this assertion, do you have time to fix this?

cc @pabigot @andyross

This assertion, if built in, allows users threads to crash the kernel in a critical section by passing a negative timeout value, creating a DoS attack vector. Remove this assertion, immediately below it there's a check which just resets it to 0 anyway. Fixes: zephyrproject-rtos#22999 Signed-off-by: Andrew Boie <[email protected]>

nordic-krch · 2020-02-21T07:27:53Z

@andrewboie it's been discussed #19591 when conversion functions changed signing and was giving different results than before for negative numbers. Because of that behavior changed when negative number was provided. Assert was added to detect cases when negative number is used and fix them.

We can remove assert but then we will not detect misuse and api states:

Non-negative waiting period to take the semaphore (in milliseconds), or one of the special values K_NO_WAIT and  K_FOREVER.

pabigot · 2020-02-21T13:45:30Z

The override of an invalid timeout to a timeout with defined behavior is there specifically to avoid unspecified behavior (probably a sleep for a very long time) when assertions are disabled, and was added because doing runtime validation of the parameter was rejected during review as being too expensive.

I'm ok with any of the proposed solutions. The one with least impact on existing behavior and code is deleting the assertion. I suspect the functional safety approved solution would be to validate the inputs at the system calls.

This assertion, if built in, allows users threads to crash the kernel in a critical section by passing a negative timeout value, creating a DoS attack vector. Remove this assertion, immediately below it there's a check which just resets it to 0 anyway. Fixes: #22999 Signed-off-by: Andrew Boie <[email protected]>

andrewboie added the bug The issue is a bug, or the PR is fixing a bug label Feb 21, 2020

andrewboie assigned nordic-krch Feb 21, 2020

andrewboie added the priority: medium Medium impact/importance bug label Feb 21, 2020

andrewboie changed the title ~~timeout assertion can allow user threads to crash the kernel~~ pend() assertion can allow user threads to crash the kernel Feb 21, 2020

andrewboie mentioned this issue Feb 21, 2020

kernel: remove problematic pend() assertion #23000

Merged

andrewboie closed this as completed in #23000 Feb 21, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pend() assertion can allow user threads to crash the kernel #22999

pend() assertion can allow user threads to crash the kernel #22999

andrewboie commented Feb 21, 2020 •

edited

Loading

andrewboie commented Feb 21, 2020

nordic-krch commented Feb 21, 2020

pabigot commented Feb 21, 2020

pend() assertion can allow user threads to crash the kernel #22999

pend() assertion can allow user threads to crash the kernel #22999

Comments

andrewboie commented Feb 21, 2020 • edited Loading

andrewboie commented Feb 21, 2020

nordic-krch commented Feb 21, 2020

pabigot commented Feb 21, 2020

andrewboie commented Feb 21, 2020 •

edited

Loading