Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpu/cc2538: timer broken? #6419

Closed
PeterKietzmann opened this issue Jan 19, 2017 · 25 comments · Fixed by #6692
Closed

cpu/cc2538: timer broken? #6419

PeterKietzmann opened this issue Jan 19, 2017 · 25 comments · Fixed by #6692
Assignees
Labels
Area: timers Area: timer subsystems Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)

Comments

@PeterKietzmann
Copy link
Member

On remote boards it appears that xtimer_usleep hangs. Didn't test on other cc2538 boards. Even tests/periph_timer acts weird as it can't initialise all timers:

2017-01-19 10:19:11,538 - INFO # Available timers: 4
2017-01-19 10:19:11,538 - INFO # 
2017-01-19 10:19:11,539 - INFO # Testing TIMER_0:
2017-01-19 10:19:11,540 - INFO # TIMER_0: initialization successful
2017-01-19 10:19:11,540 - INFO # TIMER_0: stopped
2017-01-19 10:19:11,541 - INFO # TIMER_0: set channel 0 to 5000
2017-01-19 10:19:11,542 - INFO # TIMER_0: set channel 1 to 10000
2017-01-19 10:19:11,542 - INFO # TIMER_0: starting
2017-01-19 10:19:11,621 - INFO # TIMER_0: channel 0 fired at SW count    12309 - init:    12309
2017-01-19 10:19:11,623 - INFO # TIMER_0: channel 1 fired at SW count    76926 - diff:    64617
2017-01-19 10:19:11,624 - INFO # 
2017-01-19 10:19:11,624 - INFO # Testing TIMER_1:
2017-01-19 10:19:11,625 - INFO # TIMER_1: ERROR on initialization - skipping
2017-01-19 10:19:11,626 - INFO # 
2017-01-19 10:19:11,626 - INFO # 
2017-01-19 10:19:11,627 - INFO # Testing TIMER_2:
2017-01-19 10:19:11,628 - INFO # TIMER_2: ERROR on initialization - skipping
2017-01-19 10:19:11,628 - INFO # 
2017-01-19 10:19:11,628 - INFO # 
2017-01-19 10:19:11,629 - INFO # Testing TIMER_3:
2017-01-19 10:19:11,629 - INFO # TIMER_3: ERROR on initialization - skipping
2017-01-19 10:19:11,630 - INFO # 
2017-01-19 10:19:11,630 - INFO # 
2017-01-19 10:19:11,630 - INFO # TEST FAILED
@PeterKietzmann PeterKietzmann added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: timers Area: timer subsystems labels Jan 19, 2017
@PeterKietzmann PeterKietzmann added this to the Release 2017.01 milestone Jan 19, 2017
@kaspar030
Copy link
Contributor

Please look into the timer_init() function to see in what cases it might fail.

@OlegHahm
Copy link
Member

Related to #6392 (comment) ?

@kYc0o
Copy link
Contributor

kYc0o commented Jan 19, 2017

Yes I noticed that yesterday and it's about the initialisation first, then the conversions used by xtimer...

@PeterKietzmann
Copy link
Member Author

@kYc0o you knowthe fix out of the box? Otherwise I will (must) go into details soon

@PeterKietzmann
Copy link
Member Author

Concerning the periph_timer issue we land here which is not a bug cause the test tries to initialize with 1MHz which is not the sysclk.

@kYc0o
Copy link
Contributor

kYc0o commented Jan 19, 2017

I was working on the fix yesterday but didn't finish. I'm still trying today but I'm busy with waspmote's timers... if you're already there I'll let you solve the bug I'd you find out what it is.

@kYc0o
Copy link
Contributor

kYc0o commented Jan 19, 2017

The timer should be initialised at 16MHz according to XTIMER_HZ.

@alignan
Copy link
Contributor

alignan commented Jan 19, 2017

I'm changing job/country now so probably I won't have time to look this anytime soon, but @PeterKietzmann is right-on spot as the expected frequency is different

@alignan alignan removed their assignment Jan 19, 2017
@PeterKietzmann
Copy link
Member Author

I have a strong assumption that this configuration is incorrect. git bisect pointed to this commit:
63e91151cd0b3dfde28bd7cc9b3ef23b140f3730 which is at least related to xtimer. Printing this line during xtimer initialization and checking the prescaler value shows the following:

  • old (ok)
2017-01-19 14:11:43,436 - INFO # timer_init(0, 1000000, 0x200641, 0)
2017-01-19 14:11:43,438 - INFO # PRESCALER 15
  • new (defect)
2017-01-19 14:10:17,644 - INFO # timer_init(0, 16000000, 0x2005e5, 0)
2017-01-19 14:10:17,645 - INFO # PRESCALER 0

@gebart, @kaspar030 I'm pretty inexperienced and uninformed about the xtimer in general, therefore I tried to give best insights about my initial debugging efforts. Can you please help?

@PeterKietzmann
Copy link
Member Author

PS: Applying proposal of #6427 does not solve the issue

@PeterKietzmann
Copy link
Member Author

@kYc0o IMO this issue has high priority. Recently it seemed you're into this. Any chance for a fix from you side?

@kYc0o
Copy link
Contributor

kYc0o commented Jan 25, 2017

I'm investigating this and the waspmote, I'll try to provide a fix this week

@PeterKietzmann
Copy link
Member Author

I still didn't fully understand where you see the relation between these two, but I trust you :-)! Looking forward for that!

@kYc0o
Copy link
Contributor

kYc0o commented Jan 25, 2017

It's just that both are getting bad initialisation prescalers... but the fixes will be different indeed.

@kYc0o
Copy link
Contributor

kYc0o commented Jan 30, 2017

Well after a deep investigation the culprit is #5608 , specifically 63e9115 .

I'm looking deeper so if someone can helps it would be faster maybe.

The insights are:

  • the remote boards (in remote-common) initialise the timer at a very high frequency of 16MHz, for a 16bit timer, which could overload the xtimer layer. Changing the frequency doesn't help.
  • The tick conversions shift a lot while using the default 16MHz (4 shifts are needed) so this can also maybe break something.
  • The xtimer works good at this point 199140e .

As for the periph_timer test, it needs to be reworked since it doesn't allow to dynamically set the timer initialisation frequency, which is different for the timers on cc2538 cpus.

@kaspar030
Copy link
Contributor

kaspar030 commented Jan 30, 2017 via email

@kYc0o
Copy link
Contributor

kYc0o commented Jan 30, 2017

Like, setting the frequency to 1MHz or even 250KHz via prescaler doesn't
help?

Yes, I tried several frequencies and the timer gets stuck, it's like if the overflow interrupt is not working.

@A-Paul
Copy link
Member

A-Paul commented Feb 16, 2017

I'm also trying to investigate this topic. I made a remarkable discovery that tests/xtimer_usleep is running to the end, when I connected with the jlink adapter via JTAG.
It is still running through after warm resets. After a cold reboot it gets stuck at the first xtimer_usleep call, as usual.

@drcref
Copy link

drcref commented Feb 19, 2017

I've also had issues with my applications getting stuck at xtimer_usleep calls. I'm developing on the Zolertia re-mote board (cc2538 cpu).

I've determined that when xtimer_usleep is called from the main thread or from auto_init routines before main is entered, it will sleep forever. However, it works as expected when called from within other threads that I create. My application involves reading from a sensor with long measurement time, and the driver I wrote use xtimer_usleep to wait for the sensor after initiating a measurement.

My current workaround is to start a new thread and only call the sensor's read routine from within that thread. Has anyone found anything about what's causing this?

@kYc0o
Copy link
Contributor

kYc0o commented Feb 28, 2017

@drcef Can you test your code without the workaround with the previous RIOT release? There are several modifications done for the current release that could have broken xtimer, but I cannot determine exactly which one.

@smlng
Copy link
Member

smlng commented Mar 1, 2017

I ran several tests but no solution to this so far. First, I thought that the 16Bit (vs. default 32Bit) timer might be a problem, but then I tested with an atmega328p Arduino (16Bit timer, too) and everything works. I created a gist with the outputs for boards remote-revb and arduino-duemilanove for tests/xtimer_hang here. Btw. PR #5608 introduces (part of) the issue ...

@kYc0o
Copy link
Contributor

kYc0o commented Mar 1, 2017

@smlng I spend quite longtime analysing on which way #5608 would introduce this problem, however I couldn't see anything harming the behaviour specifically for this platform. I also tried 16 and 32 bits timers with xtimer, and I have always the same behaviour.

My findings so far lead me to suspect that for some reason the interrupt is not being triggered, causing the timer being stuck somehow, then xtimer cannot catch the interrupt with its callback function.

However, looking at your gist, it seems that for your example the callback is actually triggered, so I can't see where it gets stuck.

Can you explain a bit more how did you test it?

@smlng
Copy link
Member

smlng commented Mar 1, 2017

@kYc0o I tested with tests/xtimer_hang and enabled debug in xtimer_core.c to get the output you see in the gists. I played around with different settings for the hardware timer: frequency 1MHz, 16MHz, even 250kHz of the arduino, I also set it to 32Bit mode, but nothing helped so far. Its confusing ... btw. tests/periph_timer works, at least with correct settings - so the hardware timer and driver are okay. Somehow the integration with xtimer is messed up.

@smlng
Copy link
Member

smlng commented Mar 1, 2017

I tested with tests/xtimer_periodic_wakeup which reports to large differences, for any viable setting for timer in cc2538, i.e. I tested with 16Bit timer using 1MHz and 16MHz, and also with 32Bit timer and 16 MHz in all cases the diff was around 2000

2017-03-01 16:36:56,296 - INFO #    7 diff=2083
2017-03-01 16:36:56,297 - INFO #    6 diff=2083
2017-03-01 16:36:56,297 - INFO #    5 diff=2083
2017-03-01 16:36:56,298 - INFO #    4 diff=2085
2017-03-01 16:36:56,298 - INFO #    3 diff=2087
2017-03-01 16:36:56,299 - INFO #    2 diff=2087
2017-03-01 16:36:56,299 - INFO #    1 diff=2087
2017-03-01 16:36:56,299 - INFO # 
2017-03-01 16:36:56,300 - INFO # min/avg/max error: 1833/1960/2087
2017-03-01 16:36:56,301 - INFO # too large difference.
2017-03-01 16:36:56,301 - INFO # 
2017-03-01 16:36:56,301 - INFO # Test Failed.
2017-03-01 16:36:56,301 - INFO # 

Basically the xtimer_periodic_wakeup function returns immediately cause target time already based passed. This seems to be the root cause, when trying to set a timer on the cc2538 in most cases the target time has already passed when trying to set it and return.

in the output from tests/xtimer_hang you [can see] that when it calls timer_set_absolute the target time is already long passed by, hence the callback function will never be called:

2017-03-01 16:07:19,405 - INFO # timer_init(1, 16000000, 0x2005f1, 0)
2017-03-01 16:07:19,407 - INFO # _irq_enable(1)
2017-03-01 16:07:19,407 - INFO # lltimer_set: set 4294967295
2017-03-01 16:07:19,408 - INFO # timer_set_absolute(1, 0, 4294967295)
2017-03-01 16:07:19,409 - INFO # channel 0
2017-03-01 16:07:19,412 - INFO # main(): This is RIOT! (Version: 2017.04-devel-302-gd9062-MacBook-Air.lan-cc2538/xtimer_regression)
2017-03-01 16:07:19,412 - INFO # _xtimer_tsleep
2017-03-01 16:07:19,413 - INFO # xt_set64() off=16000 loff=0
2017-03-01 16:07:19,414 - INFO # xt_set: off=16000 now=322444 (322474)
2017-03-01 16:07:19,415 - INFO # xt_set_absolute: now=375341 tgt=391293
2017-03-01 16:07:19,417 - INFO # xt_set_absolute: timer expires in this period.
2017-03-01 16:07:19,418 - INFO # _add_timer_to_list
2017-03-01 16:07:19,418 - INFO #  tgt=391293
2017-03-01 16:07:19,420 - INFO # xt_set_absolute: timer is new list head. update lltimer (now=537880).
2017-03-01 16:07:19,420 - INFO # lltimer_set: set 391273
2017-03-01 16:07:19,421 - INFO # timer_set_absolute(1, 0, 391273)
2017-03-01 16:07:19,422 - INFO # channel 0
2017-03-01 16:07:19,423 - INFO # xt_set_absolute: now=728337.
2017-03-01 16:07:19,425 - INFO # _xtimer_tsleep
2017-03-01 16:07:19,426 - INFO # xt_set64() off=17600 loff=0
2017-03-01 16:07:19,427 - INFO # xt_set: off=17600 now=828408 (828438)
2017-03-01 16:07:19,429 - INFO # xt_set_absolute: now=881303 tgt=898855
2017-03-01 16:07:19,431 - INFO # xt_set_absolute: timer expires in this period.
2017-03-01 16:07:19,431 - INFO # _add_timer_to_list
2017-03-01 16:07:19,432 - INFO #  tgt=391293
2017-03-01 16:07:19,432 - INFO #  tgt=898855
2017-03-01 16:07:19,433 - INFO # [START]
2017-03-01 16:07:19,433 - INFO # _xtimer_tsleep
2017-03-01 16:07:19,434 - INFO # xt_set64() off=1600000 loff=0
2017-03-01 16:07:19,435 - INFO # xt_set: off=1600000 now=1134207 (1134237)
2017-03-01 16:07:19,436 - INFO # xt_set_absolute: now=1192663 tgt=2792615
2017-03-01 16:07:19,437 - INFO # xt_set_absolute: timer expires in this period.
2017-03-01 16:07:19,437 - INFO # _add_timer_to_list
2017-03-01 16:07:19,438 - INFO #  tgt=391293
2017-03-01 16:07:19,438 - INFO #  tgt=898855
2017-03-01 16:07:19,439 - INFO #  tgt=2792615

@smlng
Copy link
Member

smlng commented Mar 3, 2017

@PeterKietzmann #6685 fixes the weird output in tests/periph_timer and #6692 should fix the xtimer issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: timers Area: timer subsystems Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants