Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vtimer_msg crashes after some time - unless there is a delay between msg_receive(&m); and accessing m #1811

Closed
benpicco opened this issue Oct 13, 2014 · 20 comments
Assignees
Labels
Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@benpicco
Copy link
Contributor

Now this one had me puzzled for quite a bit.
On ek-tm4c123gxl tests/vtimer_msg crashes after ~53s - that is, just to make debugging even more confusing, shortly after the hardware timer overflows for the first time.
It also crashes on native after 10s, but I know that native isn't quite as stable, so I disregarded that.

Now when adding a short delay after msg_receive(&m); in timer_thread, the crash goes away:

--- vtimer_msg/main.c   2014-10-10 20:23:59.876720869 +0200
+++ vtimer_msg/main.c   2014-10-13 23:14:48.024480356 +0200
@@ -26,6 +26,8 @@
 #include "thread.h"
 #include "msg.h"

+#include "board.h"
+
 char timer_stack[KERNEL_CONF_STACKSIZE_MAIN];
 char timer_stack_local[KERNEL_CONF_STACKSIZE_MAIN];

@@ -55,6 +57,11 @@
     while (1) {
         msg_t m;
         msg_receive(&m);
+
+        for (int i = 0; i < 10000; ++i) 
+          GREEN_LED_TOGGLE;
+
+//        puts("message received");
         struct timer_msg *tmsg = (struct timer_msg *) m.content.ptr;
         vtimer_now(&now);
         printf("now=%" PRIu32 ":%" PRIu32 " -> every %" PRIu32 ".%" PRIu32 "s: %s\n",
@benpicco benpicco added the Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) label Oct 13, 2014
@N8Fear
Copy link

N8Fear commented Oct 13, 2014

As stated on IRC: I've got a similar issue on the mega2560 - so either similar code (I think of you, hwtimer_arch.c) or something more general...
My impression from trying to debug it is that the mutex doesn't really work the way it should (something like a race condition).

@benpicco
Copy link
Contributor Author

Some debugging with turning LEDs on and off suggests the crash happens in _msg_receive at tcb_t *me = (tcb_t*) sched_threads[sched_active_pid];
What does that mean? And why does it work if this assignment is delayed?

@Kijewski
Copy link
Contributor

sched_threads[sched_active_pid] is nonsense anyways.
Can you please try again with *me = (tcb_t*) sched_active_thread?

@benpicco
Copy link
Contributor Author

Hm, that just moves the place where the fault happens around, looks like I've got some memory corruption going on

2014-10-14 14:04:12,869 - INFO # _msg_receive(): timer local: No msg in queue. Going blocked.
2014-10-14 14:04:13,379 - INFO # Hard fault happened in thread 1
2014-10-14 14:04:13,385 - INFO #    pid | name                 | state    Q | pri | location
2014-10-14 14:04:13,390 - INFO #      1 | idle                 | running  Q |  15 | 0x200000ec
2014-10-14 14:04:13,395 - INFO #      2 | main                 | bl mutex _ |   7 | 0x200001ec
2014-10-14 14:04:13,400 - INFO #    8192 | timer                | bl rx    _ |   6 | 0x20000c78
2014-10-14 14:04:13,405 - INFO #      4 | timer local          | bl rx    _ |   6 | 0x20001878

@OlegHahm
Copy link
Member

Try to build with DEVELHELP and put a thread_print_all() into the hard fault handler.

@LudwigKnuepfer LudwigKnuepfer changed the title vtimer_test crashes after some time - unless there is a delay between msg_receive(&m); and accessing m vtimer_msg crashes after some time - unless there is a delay between msg_receive(&m); and accessing m Oct 27, 2014
@LudwigKnuepfer
Copy link
Member

native does not crash on my laptop (Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz).

@LudwigKnuepfer
Copy link
Member

@N8Fear @benpicco can you retry with

diff --git a/core/hwtimer.c b/core/hwtimer.c
index e878ffb..92f634a 100644
--- a/core/hwtimer.c
+++ b/core/hwtimer.c
@@ -148,16 +148,12 @@ static int _hwtimer_set(unsigned long offset, void (*callback)(void*), void *ptr
 {
     DEBUG("_hwtimer_set: offset=%lu callback=%p ptr=%p absolute=%d\n", offset, callback, ptr, absolute);

-    if (!inISR()) {
-        dINT();
-    }
+    unsigned state = disableIRQ();

     int n = lifo_get(lifo);

     if (n == -1) {
-        if (!inISR()) {
-            eINT();
-        }
+        restoreIRQ(state);

         puts("No hwtimer left.");
         return -1;
@@ -177,9 +173,7 @@ static int _hwtimer_set(unsigned long offset, void (*callback)(void*), void *ptr

     lpm_prevent_sleep++;

-    if (!inISR()) {
-        eINT();
-    }
+    restoreIRQ(state);

     return n;
 }

@benpicco
Copy link
Contributor Author

native does not crash on my laptop (Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz).

I've just checked out the latest master branch, I get

RIOT native interrupts/signals initialized.
LED_GREEN_OFF
LED_RED_ON
RIOT native board initialized.
RIOT native hardware initialization complete.

kernel_init(): This is RIOT! (Version: 2014.05-1140-gb7fc)
kernel_init(): jumping into first task...
This is thread 3
sending 1st msg
now=0:323 -> every 2.0s: Hello World
timer_thread: set timer successfully
sending 2nd msg
now=0:353 -> every 5.0s: This is a Test
timer_thread: set timer successfully
This is thread 4
sec=1 min=0 hour=0
now=2:481 -> every 2.0s: Hello World
timer_thread: set timer successfully
now=2:522 -> every 5.0s: This is a Test
timer_thread: set timer successfully
sec=2 min=0 hour=0
sec=3 min=0 hour=0
now=4:671 -> every 5.0s: This is a Test
timer_thread: set timer successfully
now=4:715 -> every 5.0s: This is a Test
timer_thread: set timer successfully
sec=4 min=0 hour=0
sec=5 min=0 hour=0
sec=6 min=0 hour=0
sec=7 min=0 hour=0
sec=8 min=0 hour=0


then it hangs.

If I add usleep(1000); between msg_receive(&m); and struct timer_msg *tmsg = (struct timer_msg *) m.content.ptr; it keeps going till sec=12, but the 'Hello World' thread seems to have died in the process:

RIOT native interrupts/signals initialized.
LED_GREEN_OFF
LED_RED_ON
RIOT native board initialized.
RIOT native hardware initialization complete.

kernel_init(): This is RIOT! (Version: 2014.05-1140-gb7fc)
kernel_init(): jumping into first task...
This is thread 3
sending 1st msg
now=0:1313 -> every 2.0s: Hello World
timer_thread: set timer successfully
sending 2nd msg
now=0:2432 -> every 5.0s: This is a Test
timer_thread: set timer successfully
This is thread 4
sec=1 min=0 hour=0
now=2:2453 -> every 2.0s: Hello World
timer_thread: set timer successfully
sec=2 min=0 hour=0
sec=3 min=0 hour=0
now=4:3718 -> every 2.0s: Hello World
timer_thread: set timer successfully
now=4:4885 -> every 5.0s: This is a Test
timer_thread: set timer successfully
sec=4 min=0 hour=0
sec=5 min=0 hour=0
now=6:4872 -> every 2.0s: Hello World
timer_thread: set timer successfully
sec=6 min=0 hour=0
sec=7 min=0 hour=0
now=8:6170 -> every 5.0s: This is a Test
timer_thread: set timer successfully
now=8:7364 -> every 5.0s: This is a Test
timer_thread: set timer successfully
sec=8 min=0 hour=0
sec=9 min=0 hour=0
sec=10 min=0 hour=0
sec=11 min=0 hour=0
sec=12 min=0 hour=0

Intel(R) Core(TM) i5-4430 CPU @ 3.00GHz if that matters, gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2

@LudwigKnuepfer
Copy link
Member

@N8Fear @benpicco can you retry with
...

@kaspar030 just opened a PR for that: #1883

@N8Fear
Copy link

N8Fear commented Oct 27, 2014

now=52:7325 -> every 2.0s: Hello World
timer_thread: set timer successfully
now=52:7412 -> every 5.0s: This is a Test
timer_thread: set timer successfully
sec=52 min=0 hour=0
sec=53 min=0 hour=0
now=54:7608 -> every 2.0s: Hello World
timer_thread: set timer successfully
now=54:7655 -> every 5.0s: This is a Test
timer_thread: set timer successfully
sec=54 min=0 hour=0
sec=55 min=0 hour=0
now=56:7780 -> every 2.0s: Hello World
timer_thread: set timer successfully
now=56:7826 -> every 5.0s: This is a Test
timer_thread: set timer successfully

Just tested with #1883 on native - both timers run on ~2 sec. Haven't got my hardware at hand so I can't test on the mega2560 right now.

Edit: Note that this seems to be the same for current master on my laptop as well (with a single "Hello World" every ~18 sek).

@OlegHahm
Copy link
Member

Weird, I'm not even able to reproduce #547 (comment) It's not crashing, but the output is wrong anyway.

kernel_init(): jumping into first task...
This is thread 2
sending 1st msg
now=0:138 -> every 2.0s: Hello World
timer_thread: set timer successfully
sending 2nd msg
now=0:154 -> every 5.0s: This is a Test
timer_thread: set timer successfully
This is thread 3
sec=1 min=0 hour=0
now=2:218 -> every 2.0s: Hello World
timer_thread: set timer successfully
sec=2 min=0 hour=0
sec=3 min=0 hour=0
now=4:285 -> every 2.0s: Hello World
timer_thread: set timer successfully
sec=4 min=0 hour=0
now=5:200 -> every 5.0s: This is a Test
timer_thread: set timer successfully
sec=5 min=0 hour=0
now=6:316 -> every 2.0s: Hello World
timer_thread: set timer successfully
sec=6 min=0 hour=0
sec=7 min=0 hour=0
now=8:424 -> every 2.0s: Hello World
timer_thread: set timer successfully
now=8:447 -> every 5.0s: This is a Test
timer_thread: set timer successfully

Update: The output varies and it is crashing from time to time.

@LudwigKnuepfer
Copy link
Member

On my desktop computer (Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz, same software setup) I can achieve crashes in native as well, sadly not when running with Valgrind though.

@LudwigKnuepfer
Copy link
Member

And: #1883 does not change this.

@N8Fear
Copy link

N8Fear commented Oct 28, 2014

From my tries at debugging this (or something similar) on the mega2560 I guess it's the mutex in core/hwtimer.c's hwtimer_wait(...) function that somehow gets corrupted/raced/whatevered...

@LudwigKnuepfer
Copy link
Member

Changing core/msg.c irq API usage (#1887) does not help either.

@OlegHahm
Copy link
Member

Funny thing: in current master it misbehaves and crashes on my desktop, but runs correctly on my laptop...

@miri64
Copy link
Member

miri64 commented Mar 30, 2015

Is this issue fixed by now?

@benpicco
Copy link
Contributor Author

It's still broken on native - although in a different way now:

kernel_init(): This is RIOT! (Version: 2014.12-875-gdbd44)
kernel_init(): jumping into first task...
This is thread 3
sending 1st msg
now=0:116 -> every 2.0s: Hello World <-- this is a lie
sending 2nd msg
now=0:127 -> every 5.0s: This is a Test <-- this is a lie
This is thread 4
sec=1 min=0 hour=0
sec=2 min=0 hour=0
sec=3 min=0 hour=0
sec=4 min=0 hour=0
sec=5 min=0 hour=0
sec=6 min=0 hour=0
sec=7 min=0 hour=0
sec=8 min=0 hour=0
sec=9 min=0 hour=0
sec=10 min=0 hour=0

@janoskut
Copy link

janoskut commented Apr 1, 2015

Please note that the timer_thread in the main has changed, so that messages don't get re-sent. If you add
vtimer_set_msg(&tmsg->timer, tmsg->interval, thread_getpid(), MSG_TIMER, tmsg);
to the end of the while loop, the faulty behaviour will be restored. (Hang after 21 seconds on my machine, which is Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz and Ubuntu 14.04.2 LTS).

@kaspar030
Copy link
Contributor

I just checked native, it doesn't crash anymore. Also, as vtimer got kicked out, this is now using vtimer_compat and thus xtimer. So I'll close this as solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

No branches or pull requests

8 participants