Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timer rewrite phase 3: drivers #10556

Merged
merged 13 commits into from
Nov 13, 2018
Merged

Conversation

andyross
Copy link
Contributor

Just like the last (still active) PR #10303 this builds on top of the previous one and adds extra patches.

In this case, it takes advantage of the much simpler timer driver API to actually implement much simpler drivers for ARM SysTick and Intel HPET. Seriously, these are like 4x smaller now.

They support all the features the old ones did, and now work with tickless pervasively (and enable tickless by default, as there's essentially no downside anymore).

Please review. Also please test if you can, especially on SysTick hardware (I've only run tests on Qemu and the two boards I have on my desk).

I have similar rewrites in progress for Xtensa and RISC-V MTIME (which are very similar in structure to HPET) and will append those here as I finish them.

While this can be merged as a single unit, it's probably best to merge the older pull request first to smooth the transition. I'll rebase this one as soon as the older one goes in.

@codecov-io
Copy link

codecov-io commented Oct 12, 2018

Codecov Report

Merging #10556 into master will increase coverage by 2.69%.
The diff coverage is 81.48%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #10556      +/-   ##
==========================================
+ Coverage   48.34%   51.04%   +2.69%     
==========================================
  Files         264      227      -37     
  Lines       42164    28601   -13563     
  Branches    10135     7145    -2990     
==========================================
- Hits        20386    14599    -5787     
+ Misses      17700    11212    -6488     
+ Partials     4078     2790    -1288
Impacted Files Coverage Δ
drivers/timer/sys_clock_init.c 0% <0%> (ø) ⬆️
kernel/sched.c 92.49% <100%> (+0.05%) ⬆️
drivers/timer/nrf_rtc_timer.c 85.96% <86.66%> (-3.17%) ⬇️
include/bluetooth/buf.h 0% <0%> (-100%) ⬇️
include/drivers/bluetooth/hci_driver.h 0% <0%> (-100%) ⬇️
include/bluetooth/hci.h 0% <0%> (-66.67%) ⬇️
boards/posix/nrf52_bsim/bstests_entry.c 15.21% <0%> (-43.48%) ⬇️
include/misc/byteorder.h 56.81% <0%> (-40.91%) ⬇️
subsys/bluetooth/host/hci_core.c 2.19% <0%> (-35.85%) ⬇️
subsys/bluetooth/host/uuid.c 0% <0%> (-34.15%) ⬇️
... and 57 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2020290...ee8fe57. Read the comment docs.

@andyross andyross force-pushed the timer-drivers branch 3 times, most recently from ba7ce85 to 319033f Compare October 16, 2018 17:39
@andyross
Copy link
Contributor Author

Update with some fixes to HPET...

...and a reworked nrf_rtc_timer driver that really wants review from Nordic folks. It works for me on a nrf52_pca10056 at least, but needs validation on broader hardware. The size should make review easy at least.

@@ -1,513 +1,68 @@
/*
* Copyright (c) 2016-2017 Nordic Semiconductor ASA
* Copyright (c) 2018 Intel Corporation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pizi-nordic please review

@andyross
Copy link
Contributor Author

Rebased. Fixed an issue on nrf where the driver and kernel don't care about the symbol name of the ISR, but one test does.

Copy link
Collaborator

@pizi-nordic pizi-nordic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still afraid of using now() time in z_clock_set_timeout().
Even with the lock there is small probability of slipping over clock cycle during each clock programming operation. Could we use last announcement as reference here? In such case we will have to read the now() only when new timeout will be added to timeout queue, and all further operations will be error-free.

{
u32_t rtc_away, sys_away = 0;
#define COUNTER_MAX 0x00ffffff
#define CYC_PER_TICK (CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use sys_clock_hw_cycles_per_tick() instead?
IMHO we should have single function/macro for core calculations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to deprecate that as an API, actually. It's dangerous due to precision loss and has a weirdness (it has to look like a function call) just to deal with HPET (which has to query the frequency at runtime). The only code that uses it is in the drivers, and I'd prefer they do the math themselves.

sys_clock_hw_cycles_per_tick());
nrf_rtc_event_enable(SYS_CLOCK_RTC, RTC_EVTENSET_COMPARE0_Msk);
nrf_rtc_int_enable(SYS_CLOCK_RTC, RTC_INTENSET_COMPARE0_Msk);
RTC->PRESCALER = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use nrf_rtc_prescaler_set() here.


/* Clear the event flag and possible pending interrupt */
RTC_CC_EVENT = 0;
RTC->EVENTS_COMPARE[0] = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use nrf_rtc_event_clear() here.

{
unsigned int key;
#ifdef CONFIG_TICKLESS_KERNEL
ticks = ticks == K_FOREVER ? MAX_TICKS : ticks;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest parentheses around ticks == K_FOREVER.

unsigned int key;
#ifdef CONFIG_TICKLESS_KERNEL
ticks = ticks == K_FOREVER ? MAX_TICKS : ticks;
ticks = max(min(ticks - 1, (s32_t)MAX_TICKS), 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ticks - 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because our API is werid. Historically we've passed a "1" to set_timeout() to indicate "at the next tick" (and then had to play games with a _TICK_ALIGN symbol needlessly). But in the code here we want to compute a delta from current time rounded up to the next tick boundary, so we start from ticks-1.


irq_disable(NRF5_IRQ_RTC1_IRQn);
/* Round up to next tick boundary */
cyc = ticks * CYC_PER_TICK + counter_sub(t, last_count);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about potential overflow in these calculations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ticks gets clamped to [0,MAX_TICKS] in the code above this spot, which is calibrated to prevent that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: that's about preventing overflow beyond 24 bits of cycles in a single timeout.

There's no need to prevent overflow in the calculation of the next timeout value itself, the cycle counter is cyclic and just rolls over.


set_comparator(cyc);
Copy link
Collaborator

@pizi-nordic pizi-nordic Oct 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if while tick to cycle calculation will work correctly if we set tick equal to hw clock cycle. It looks that announce will be always scheduled 1 clock cycle earlier than expected in such case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the point. The cyc value here is the time of the last expiration (last_count) plus some integer number of CYC_PER_TICK increments. Unless the timer skips a cycle when the comparator fires, the ticks should be perfectly in-phase and evenly scheduled (modulo interrupt delivery slop, of course).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, I see what you're saying: in tickless, you HAVE to use the current time, because counting interrupts isn't sufficient. But the math is guaranteed not to underrun.

/* Calculate time to program into RTC */
rtc_away = (RTC_MASK - RTC_COUNTER);
rtc_away = rtc_away > RTC_HALF ? RTC_HALF : rtc_away;
#define MIN_DELAY 32
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this minimal delay?
In current implementation, this basically reduces timer resolution, which is a thing we would like to avoid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to have a minimum value greater than zero here, otherwise it's possible that the computation of the "next" value based on the "current" timer counter will compute a value that is in the "past" by the timer it arrives in the comparator register.

I picked 1/10th of a tick to be conservative, but given the frequencies here something much lower is probably fine. You guys are the experts :)

@andyross andyross force-pushed the timer-drivers branch 3 times, most recently from e05ed1c to 5ae7497 Compare October 18, 2018 18:20
@andyross
Copy link
Contributor Author

andyross commented Nov 9, 2018

APIC timer refactoring is coming as part of x86_64, FWIW. In fact the need for that is sort of the initial germ that got me started on this whole timer saga.

Andy Ross added 13 commits November 13, 2018 08:35
Newer, and much smaller driver written to the new timer API.  Supports
all the features the old one did (including shutting off the clock
when clock_always_on is disabled), should be faster in practice, and
should be significantly more accurate due to the "lost cycle" trick
applied in z_clock_set_timeout().

Signed-off-by: Andy Ross <[email protected]>
The SysTick driver is working with tickless now.  Make this our first
guinea pig.

Signed-off-by: Andy Ross <[email protected]>
Qemu doesn't like tickless.  By default[1] it tries to be realtime as
vied by the host CPU -- presenting read values from hardware cycle
counters and interrupt timings at the appropriate real world clock
times according to whatever the simulated counter frequency is.  But
when the host system is loaded, there is always the problem that the
qemu process might not see physical CPU time for large chunks of time
(i.e. a host OS scheduling quantum -- generally about the same size as
guest ticks!) leading to lost cycles.

When those timer interrupts are delivered by the emulated hardware at
fixed frequencies without software intervention, that's not so bad:
the work the guest has to do after the interrupt generally happens
synchronously (because the qemu process has just started running) and
nothing notices the dropout.

But with tickless, the interrupts need to be explicitly programmed by
guest software!  That means the driver needs to be sure it's going to
get some real CPU time within some small fraction of a Zephyr tick of
the right time, otherwise the computations get wonky.

The end result is that qemu tends to work with tickless well on an
unloaded/idle run, but not in situations (like sanitycheck) where it
needs to content with other processes for host CPU.

So, add a flag that drivers can use to "fake" tickless behavior when
run under qemu (only), and enable it (only!) for the small handful of
tests that are having trouble.

[1] There is an -icount feature to implement proper cycle counting at
the expense of real-world-time correspondence.  Maybe someday we might
get it to work for us.

Signed-off-by: Andy Ross <[email protected]>
These tests was written to try to eliminate timer interrupts, but the
mechanism chosen (setting TICKS_PER_SECOND to 1) is dangerously
susceptible to overflow conditions on systems with fast cycle counters
and high timeout durations.

Actually the tests pass just fine if you use a conventional tick rate
and use a tickless-capable driver (which eliminates interrupts too,
which is the whole point), but there's no easy way in kconfig to do an
"if" to select that condition for capable systems only.  Just disable
tickless to keep the same behavior.

Signed-off-by: Andy Ross <[email protected]>
When TICKLESS_KERNEL is enabled, the current time in ticks is based on
a hardware counter and not interrupt delivery (which is the whole
point of tickless), so irq-locking does not prevent time from
advancing.  Disable this test in that configuration.

Signed-off-by: Andy Ross <[email protected]>
This test was written with an outrageously long timeout of 25 seconds.
That blows right through the 32 bit cycle counter on qemu_cortex_m3[1]
and produces an essentially random delay instead of the desired
number, causing a hang with the new SysTick driver in tickless mode.

Push the number down so it doesn't overflow.  The root cause, though,
is that k_busy_wait() can take arguments it can't handle.  It ought to
have an outer loop or something so that it can spin for INT_MAX
milliseconds correctly.

[1] Which has a 12MHz clock rate.  Many hardware implementations are
much faster still.

Signed-off-by: Andy Ross <[email protected]>
Add a TICKLESS_CAPABLE kconfig variable which is used by the kernel to
select tickless mode's default automatically on drivers that support
it (rather than having to set the default per-board).  Select it from
the ARM SysTick and Intel HPET drivers.

Also remove the old qemu_cortex_m3 default settings which this
replaces.

Signed-off-by: Andy Ross <[email protected]>
Many drivers won't need to implement z_clock_idle_exit() or
sys_clock_disable(). Make those weak stubs too.

Signed-off-by: Andy Ross <[email protected]>
Rewritten along the lines of ARM SysTick.  Implements only the new,
simplified API.  MUCH smaller.  Works with tickless pervasively.  No
loss of functionality.

Signed-off-by: Andy Ross <[email protected]>
If this function is itself interrupted by a timeslice event, the
slicing state can be corrupted.  Just re-use the scheduler lock
instead of using a new spinlock; this is a low-latency function that
won't deadlock.  Found by inspection.

Signed-off-by: Andy Ross <[email protected]>
Reworked using the older hardware interface code, but with an
implementation of the new API only.  Much smaller & simpler.

As yet, tested (manually) only on a nrf52_pca10056 board.

Signed-off-by: Andy Ross <[email protected]>
Rewritten driver along the lines of all the other new drivers,
implementing the new timer API.  Structurally, the machine timer is an
up-counter with comparator, so it works broadly the same way HPET and
NRF do.  The quirk here is that it's a 64 bit counter, which needs a
little more care.

Unlike the other timer reworks, this driver has grown by a few lines
as it used to be very simple.  But in exchange, we get full tickless
support on the platform.

Fixes zephyrproject-rtos#10609 in the process (the 64 bit timer registers are unlatched
for sub-word transfers, so you have to use careful ordering).

Signed-off-by: Andy Ross <[email protected]>
Rewritten Xtensa CCOUNT driver along the lines of all the other new
drivers.  The new API permits much smaller code.

Notably: The Xtensa counter is a 32 bit up-counter with a comparator
register.  It's in some sense the archetype of this kind of timer as
it's the simplest of the bunch (everything else has quirks: NRF is
very slow and 24 bit, HPET has a runtime frequency detection, RISC-V
is 64 bit...).  I should have written this one first.

Note also that this includes a blacklist of the xtensa architecture on
the tests/driver/ipm test.  I'm getting spurious failures there where
a k_sem_take() call with a non-zero timeout is being made out of the
console output code in interrupt context.  This seems to have nothing
to do with the timer; I suspect it's because the old timer drivers
would (incorrectly!) call z_clock_announce() in non-interrupt context
in some contexts (e.g. "expiring really soon").  Apparently this test
(or something in the IPM or Xtensa console code) was somehow relying
on that on Xtensa.  But IPM is a Quark thing and there's no particular
reason to run this test there.

Signed-off-by: Andy Ross <[email protected]>
@andyross
Copy link
Contributor Author

Rebased and pushed a final version. All code is present, all tests are passing. All needed workaround (e.g. for nrf54_bsim's spin loops) are already in the tree. I think all criticism is addressed. Needs final review.

Copy link
Member

@carlescufi carlescufi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving in lieu of @pizi-nordic who is away this week

Copy link
Contributor

@nategraff-sifive nategraff-sifive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RISC-V LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants