System timer handling with low-frequency timers #9904
First, I would like to present some background information about this complex problem: Issue 1: Tick time calculation. In the nRF5x series such an approach leads to the following kernel settings (assuming CONFIG_SYS_CLOCK_TICKS_PER_SEC = 100):
Problem: CONFIG_SYS_CLOCK_TICKS_PER_SEC is no longer true. In fact we have 32768/327 ≈ 100.2 ticks per second. All code using CONFIG_SYS_CLOCK_TICKS_PER_SEC for timing calculations, directly or indirectly, produces invalid output because the fractional part is not taken into account. However, many tests still calculate timeouts based on this value and on the hidden assumption that CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC is divisible by CONFIG_SYS_CLOCK_TICKS_PER_SEC without remainder. Of course, we could try to mitigate this problem, for example by setting CONFIG_SYS_CLOCK_TICKS_PER_SEC to 128, but this will not eliminate the bigger issue described below.
Issue 2: Tick <-> ms conversion
Issue 3: Tickless mode
Summary:
BTW: We can also "nicely" hide all of these issues just by rounding up |
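To make Issue 1 concrete, here is a minimal, hedged sketch in plain C (not kernel code); the 327-cycle tick period is inferred from the 32768 Hz RTC and the 100 Hz configuration above:

```c
#include <stdio.h>

int main(void)
{
	const unsigned int hw_cycles_per_sec = 32768; /* nRF5x RTC frequency */
	const unsigned int ticks_per_sec = 100;       /* CONFIG_SYS_CLOCK_TICKS_PER_SEC */

	/* Integer cycles per tick: 327, instead of the ideal 327.68. */
	unsigned int cycles_per_tick = hw_cycles_per_sec / ticks_per_sec;

	/* Effective tick rate: 32768 / 327 ~= 100.2 Hz. */
	double real_ticks_per_sec = (double)hw_cycles_per_sec / cycles_per_tick;

	/* Drift accumulated over one hour by code assuming exactly 100 ticks/s. */
	double drift_s_per_hour =
		(real_ticks_per_sec - ticks_per_sec) / ticks_per_sec * 3600.0;

	printf("cycles/tick=%u real rate=%.3f Hz drift=%.1f s/h\n",
	       cycles_per_tick, real_ticks_per_sec, drift_s_per_hour);
	return 0;
}
```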
|
This is a bit confusing to me. The system clock is the tick-based "soft" clock (as opposed to the 32-bit HW clock). By "system clock cycles as timing unit", do you mean 32-bit HW clock cycles as the timing unit? |
Even in tickless mode, the system still is using "ticks" to manage time slices
Without eliminating the whole idea of ticks, we will always end up fighting with ms<->tick conversion issues. |
I agree with @carlescufi for the most part. There's an idea floating around locally here to try to make TICKLESS_KERNEL the default on all platforms that support it, which will go most of the way to what you want. Note that there's still a need for something like a "tick". There are two time units internal to the Zephyr kernel: the high speed one returned by k_cycle_get_32(), and a lower speed one that is needed for scheduling timeouts (because the cycle counter rolls over too fast) -- that's our "tick" right now, and it will still be around in any design that needs to use a 32 bit timeout counter, even if we don't call it a "tick". |
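A minimal sketch of observing both units around a sleep, written against the 1.14-era header paths and type names (these changed in later releases), purely as an illustration of the two time bases:

```c
#include <zephyr.h>
#include <misc/printk.h>

void time_bases_demo(void)
{
	/* Fast unit: raw hardware cycles, wraps quickly. */
	u32_t start_cyc = k_cycle_get_32();
	/* Slow unit: milliseconds derived from the kernel tick counter. */
	s64_t start_ms = k_uptime_get();

	k_sleep(K_MSEC(100));

	u32_t delta_cyc = k_cycle_get_32() - start_cyc;
	int delta_ms = (int)(k_uptime_get() - start_ms);

	printk("slept %d ms = %u hw cycles (at %u Hz)\n",
	       delta_ms, delta_cyc, (u32_t)CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC);
}
```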
@andyross I assume you meant that you agree with @pizi-nordic and his comments here and here |
Heh, indeed. Misread the attribution in your quote :) |
@pizi-nordic maybe a correction applied once per second (or once per n ticks), which would make one of the ticks shorter (or longer)? It should be relatively cheap. |
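A minimal sketch of that correction idea (illustrative only, not the actual nRF driver): a remainder accumulator makes most ticks 327 cycles and some 328, so the long-term average is exactly 100 Hz, at the cost of the per-tick jitter discussed just below:

```c
#include <stdint.h>

#define HW_CYCLES_PER_SEC 32768u
#define TICKS_PER_SEC     100u

static uint32_t acc; /* accumulated fractional remainder */

/* Number of hw cycles to program for the next tick period. */
static uint32_t next_tick_cycles(void)
{
	uint32_t cycles = HW_CYCLES_PER_SEC / TICKS_PER_SEC;  /* 327 */

	acc += HW_CYCLES_PER_SEC % TICKS_PER_SEC;             /* += 68 */
	if (acc >= TICKS_PER_SEC) {
		acc -= TICKS_PER_SEC;
		cycles++;                                      /* 328: a "longer" tick */
	}
	return cycles;
}
```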
In my opinion the sole place where we need a "low speed" timer is scheduler time slices (as we cannot swap threads every hw clock cycle). All other things (sleep times, timeouts, etc.) could be specified directly using the full hardware timer resolution. Right now we are converting everything into ticks, which gives us only coarse timing. |
@pdunaj: I thought about that, but this solution introduces jitter. |
@pizi-nordic, sure, but why not accept the jitter? I don't think ticks should be used for precise time accounting (as you mentioned above, sleep may take more than requested) and the biggest problem I see here is the time shift introduced over a long period. |
@pdunaj: Let's assume the following situation:
|
In my opinion making TICKLESS_KERNEL the default is only a half-way solution. If we keep the tick abstraction we will not gain precision, which is easily achievable in a tickless kernel. From the user perspective the lack of precision is the real problem: the hardware timer could schedule events using its full resolution, but the tick abstraction enforces a bigger time unit. If we add to that the error introduced by ms to tick conversion, this issue becomes really painful. Of course such an approach will not be easy. We will have to hold 64-bit timestamps, handle hardware timer wraps, etc. Fortunately the direction of the changes is already drawn in #8340 (please see this discussion: #8340 (comment)). If we follow the ideas presented there and create a generic abstraction of a 64-bit timer, then the kernel part will be easy. In my opinion the effort is justified in this case, as we could gain several advantages:
|
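As one illustration of the "handle hardware timer wraps" point above, a classic sketch that widens a free-running 32-bit counter into a 64-bit timestamp in software; hw_counter_read() is hypothetical and real code would need locking:

```c
#include <stdint.h>

extern uint32_t hw_counter_read(void); /* hypothetical platform counter read */

static uint32_t last_lo;
static uint64_t high_bits;

/* Must be called at least once per counter wrap period, or wraps are missed. */
uint64_t timestamp64(void)
{
	uint32_t now = hw_counter_read();

	if (now < last_lo) {
		/* the 32-bit counter wrapped since the previous call */
		high_bits += (uint64_t)1 << 32;
	}
	last_lo = now;

	return high_bits | now;
}
```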
The "tick" doesn't have to be a ~100 Hz number. It's reasonable to increase precision on a tickless system to whatever the underlying timer can support. So if you want a 1MHz tick rate, you're limited to a longest timeout of about an hour, etc... You don't have to go all the way to 64 bit to get a reasonable tradeoff, though specific numbers are likely going to have to be chosen per-app and not per-system. |
To be clear: I'm not opposed to the idea of a fully 64 bit timer subsystem. I just don't think the benefit is going to be that high vs. a well-chosen SYS_CLOCK_TICKS_PER_SEC value on an existing tickless kernel. |
Basically we would like to have SYS_CLOCK_TICKS_PER_SEC = SYS_CLOCK_HW_CYCLES_PER_SEC. I just thought about a more generic solution which enables all platforms to use the full hw clock precision. |
That doesn't work in all systems, though. On a PC, the TSC runs in GHz and will overflow the 32 bit counter in seconds, requiring either a fully 64 bit system or some other kind of "intermediate representation" (i.e. represent timeouts in "TSC >> 10" units or something) for timeout scheduling, at which point we've simply reinvented the tick. |
I agree that we have to represent timeouts longer than the hw counter overflow time (this is why I proposed 64-bit). I also agree that direct usage of the TSC value for event scheduling is pointless (we cannot efficiently handle timeouts separated by a few TSC cycles). In such a case an intermediate representation is unavoidable. However, we can hide this representation in the timer driver, virtually reducing the hardware timer frequency. |
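A minimal sketch of hiding such an intermediate representation inside the timer driver (names are hypothetical, not an existing Zephyr API): the raw counter is right-shifted so the timeout layer only ever sees a virtually slower clock:

```c
#include <stdint.h>

#define VIRT_SHIFT 10u /* e.g. divide a multi-GHz TSC down to MHz-range units */

extern uint64_t raw_counter_read(void); /* hypothetical raw hardware counter */

/* Counter value in the reduced "virtual cycle" unit used for scheduling. */
static inline uint64_t virt_counter_read(void)
{
	return raw_counter_read() >> VIRT_SHIFT;
}

/* Convert a virtual-cycle delay back to raw cycles when programming the
 * comparator/alarm hardware. */
static inline uint64_t virt_to_raw(uint64_t virt_cycles)
{
	return virt_cycles << VIRT_SHIFT;
}
```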
Conclusions from off-line meeting between @nashif, @andyross, @cvinayak, @carlescufi and @pizi-nordic:
|
Now that all the timer/timeout/tickless/driver work is merged, we should go back and retest all of these. I think all of these bugs should be addressed now, but don't have access to all the hardware to check. |
@andyross I think we cannot expect every nRF-based application to implement this by itself. This has been raised by several users both upstream and downstream so it is my impression that there is justification to solve this problem in a way that is reusable across applications. By that I am not suggesting any change in particular, just for timer accuracy with low power, low frequency timers to be part of the common codebase so that every user can benefit from it. |
@andyross, @carlescufi, @nashif: As you can see, the whole discussion about timers keeps repeating. Could we finally fix these issues? Just a short recap of the things that should be changed in my opinion (any comments welcome):
@kubiaj: Will these changes fix your issues? |
@kubiaj I agree with much of your first comment here, but from the second I don't believe Zephyr is internally perceived to be in its early stages. There's a core of Intel engineers who have been working on it for at least four years, and there's a lot of corporate investment and marketing that is inconsistent with accepting a description of it as immature. Which it is. I spent three months in late 2018 and early 2019 doing a full-time deep evaluation of Zephyr's suitability for applications like you describe on Nordic hardware. By the end I was unable to identify enough stable functionality to support a low-power multi-sensor environmental beacon application. The gaps were in timer/alarm/sleep/power features, GPIO/ADC/I2C, and sensor APIs. I didn't even get to the beacon part. My conclusion was that Zephyr is a too-big-to-fail initiative that is loved by the people who have invested in it, especially for applications that involve multi-core DSP engines and x86-64--class hardware, but perhaps not so much by people with other needs and perspectives. As @pizi-nordic notes these issues keep recurring and keep getting dismissed. I don't see a path to evolving Zephyr to be suitable for ultra-low-power sensor applications on Cortex-M--class hardware, but I wish everyone the very best of good luck in proving me wrong. |
Hi all, I am TDK (Kai) and I will take over this discussion from my colleague @kubiaj on behalf of our team, as he will be out of the office in the upcoming days. @pizi-nordic So far I can give these comments:
@pabigot The last comments posted by @kubiaj came from me. I think there was a misunderstanding, and apologies if it was offensive. What I meant by early stage is related to market adoption. I think Zephyr is still new to the community and it will need time to prove itself mature. We have been monitoring Zephyr since 2016 and late last year we decided to switch to it. I think it has all the ingredients to become the "Linux" of the RTOS and MCU world. The work which has been done is really amazing and it is great that there is finally a community project which will make a real difference and has real potential to become a standard. We have tested or used different OS/RTOS in the past (RTEMS, Contiki, FreeRTOS, eCos, RIOT, mynewt, ...) but I think none really has the same potential as Zephyr. I think and hope that it will become a standard in a couple of years. Our comments are not meant to undermine the work the Zephyr community has done. We really appreciate what you have done and want to contribute back. Not much, as we are a small team, but I hope it will be helpful. I would not be skeptical about ultra-low-power sensor applications on Cortex-M class hardware. We have been using eCos in our products. The eCos kernel was a great RTOS but I think it did not have the investment and marketing force to make it successful. We had to make a couple of changes in its kernel to achieve an ultra-low-power application. In fact we created a kind of tickless mode in eCos with a fixed time base, so Zephyr is a few steps further already. But the way the sleep timer in Zephyr is done at the moment is, I think, not correct. We can work toward that and hopefully in a couple of months we might prove you wrong ;). Let's see. |
Can someone point to an example of an OS that implements timing the way you want? I recognize that I've become sort of a whipping boy for issues of timing of all sorts, so maybe my input isn't really desired here. Some of the criticisms just don't make sense to me (in particular the ones about "tick abstractions" -- in the presence of pervasive tickless, ticks are just a scaling factor, you can set them to whatever rate you want!). All I can say at this point is just to repeat:
|
More broadly, I think I'm going to take myself out of the timer design discussions. The goals of the rewrite in 1.14 were to shrink the code size, simplify driver implementation, unify the kernel timeout handling, fix a bunch of existing bugs (especially on nRF), and make CONFIG_TICKLESS_KERNEL the pervasive default. And as I see it (and yeah, I may be the only one) that's been an outrageous success and I'm very pleased with the result. I'm absolutely on the hook for fixing bugs with that code. Issues about changing the underlying behavior aren't really in scope for that stuff, may require a third timing rewrite, and quite frankly may be beyond my capability. I genuinely don't see an easy path to getting everyone what they want given the breadth of the comments above. Timing is, y'know, hard. |
Is this really a bug? |
@carlescufi Does this remain a bug? |
Put another way: This is set to High priority, blocking the upcoming release of 1.14. What are your expectations for resolving this for 1.14, and of what has been discussed here, what falls under wishlist/feature request/enhancements as opposed to actual bugs? |
@andrewboie @rljordan this is an issue that has come up several times. The idea behind setting it as a high-priority bug was agreed with @nashif so that we would raise awareness about the fact that it impacts users of Nordic boards in general and to discuss a path forward. I am happy to talk this over tomorrow at the TSC and discuss what we can do about it in the short and mid term. |
Thanks for the feedback.
I am certainly not an expert in the area, so I will delegate to @pizi-nordic, @ioannisg and @leapslabs to discuss the fine technical details, but I would certainly disagree with the statement about your input. You are the author of a timer rewrite that you accurately describe as highly successful, and you have the overview of the different requirements of the myriad platforms supported by the kernel and its timer subsystem. Your input is certainly not only desired, but essential.
Does this mean that, as long as timeslicing is not enabled, the precision errors would disappear if we were to set the ticks to 32768 on our platforms? |
@andyross @andrewboie @rljordan @carlescufi I think we should separate the discussion into 2 issues. One is related to using k_sleep and the rounding errors. I think this one is a bug; it has been observed on nrf52_pca10040 when k_sleep is used. Regarding the second requirement of high resolution sleep, I agree it is too difficult to do at this stage. We can park it for now. Andy, perhaps you could point me to some documentation explaining the timer design in more detail so I could do a first PoC before we have further discussion about this. I hope this might be more helpful for you guys, since I understand you are very busy at the moment completing other high priority work. Thanks! |
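A hedged reproduction sketch for the k_sleep rounding observation above, written against the 1.14-era millisecond-based k_sleep() API (header paths and types differ in later releases):

```c
#include <zephyr.h>
#include <misc/printk.h>

void sleep_rounding_check(void)
{
	for (s32_t req = 1; req <= 5; req++) {
		s64_t before = k_uptime_get();

		k_sleep(req); /* request 'req' milliseconds */

		int actual = (int)(k_uptime_get() - before);

		/* With coarse ticks plus ms<->tick rounding, 'actual' can differ
		 * noticeably from 'req'. */
		printk("requested %d ms, slept %d ms\n", (int)req, actual);
	}
}
```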
Yes, we know this is difficult to fix in 1.14 and it will end up being transferred to the next release; however, it should not be "hidden" as a future item until we agree on next steps.
Moving to a future milestone: Nordic will provide an RFC and a fix for all the issues listed, after 1.14.
We plan on using a 32 kHz timer for the Microchip boards instead of SYSTICK, so we have an interest in helping resolve these issues as well.
@nashif can we scope this for 2.0? |
Needs to be pushed to the 2.1 release; there is no point in keeping it as a bug at this stage of the release cycle.
This issue is not really applicable as a roll-up anymore. |
Nordic ICs use a low-power, 32 kHz timer to drive the Zephyr kernel.
This results in several precision errors that have an effect on real-life application timer handling.
Sub-issues:
#12332 Undesirable behaviour of z_clock_announce() when ticks argument is greater by 1
#16195 k_uptime_delta(): Defective by design
Related closed issues:
#16238 k_cycle_get_32() API is useless in some Zephyr configurations
#12509 Fix rounding in _ms_to_ticks()
#12542 nrf timers unstable with ticks faster than 100 Hz
#9708 tests/kernel/tickless/tickless_concept fails on nRF5x with 1000msec boot delay
#8159 tests/kernel/fifo/fifo_timeout fails on nrf51_pca10028 and nrf52_pca10040
#9843 tests sched/deadline fails on nRF boards