Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

topic-counter: missing support for late-to-set alarm configurations #12626

Open
pabigot opened this issue Jan 21, 2019 · 12 comments
Open

topic-counter: missing support for late-to-set alarm configurations #12626

pabigot opened this issue Jan 21, 2019 · 12 comments
Assignees
Labels
area: API Changes to public APIs area: Counter RFC Request For Comments: want input from the community

Comments

@pabigot
Copy link
Collaborator

pabigot commented Jan 21, 2019

If there's any hope of using the counter API to implement a system timer (#2921, #12068, others), let alone correctly handling thresholds in an asynchronous event counter, there has to be some way of attempting to set an alarm and being informed that it's already been missed.

For example, I've got a counter with range 0..999 and I want to wake at value 100, but the counter is at 101 (or 100) at the point where the compare value is set. I need to know that, so I don't lose 1000 events waiting for the compare to fire after the counter wraps.

A natural way to test for this is to reject the set attempt if the signed difference between the current counter value and the requested absolute compare value is zero or negative. This would have to be done within the driver synchronous with setting the compare value.

Naturally, this feature must be optional. One way to support this is to extend counter_alarm_cfg to have a flags field that records information such as whether the count value is absolute or relative, and whether late-to-check rejection is required.

Also for systems like Nordic RTC where there's a minimum offset necessary to guarantee the compare event fires there should be a flag saying what to do if the compare can't be guaranteed to fire if set for the requested value.

@pabigot pabigot added area: Counter RFC Request For Comments: want input from the community area: API Changes to public APIs labels Jan 21, 2019
@nordic-krch
Copy link
Contributor

nordic-krch commented Jan 22, 2019

I agree with that.

counter_alarm_cfg already has absolute flag so it's possible to set relative values.

But what should we do if late setting occurs? Return error (e.g. -ETIMEOUT) or just trigger user callback from within the counter_set_ch_alarm call?
@MaureenHelm @pizi-nordic @anangl any opinions? Do we see any use case where knowledge about too late setting would be used?

@pizi-nordic
Copy link
Collaborator

I think that this should be handled in upper layer. The counter driver might return something to indicate if current count is less or more than the requested one, but IMHO the driver does not have knowledge to decide if such situation is an error (what if I requested delay close to counter wrap around time?).

@pabigot
Copy link
Collaborator Author

pabigot commented Jan 22, 2019

@pizi-nordic If the driver was told that late-to-check is an error, it's an error, otherwise it isn't. Only the driver can detect this situation, and the upper layer cannot be expected to be able to recover from the situation in all cases. Whether the specific compare value is close to wrap doesn't matter; compare values are absolute at the point the driver sets them (unless that's a variation point among vendor counter implementations, which would create a whole new mess).

As for a use case, I gave one: I set a compare alarm for the 100th asynchronous event too late (maybe because I was interrupted after my check). When I'm told it is too late, I'll set a new callback for 200 which is my next checkpoint. If I'm not told, I'll miss all the events until the counter wraps around. Unacceptable.

@pizi-nordic
Copy link
Collaborator

@pabigot: The Counter API is just HW abstraction. It does not know your use-case.

Example: I have a counter which wraps every second when in reach 65536. Current value is 32768. I am requesting event at ABSOLUTE 32760. Question: My request was delayed and should fire now or I want an event a shortly before one second from now? Only upper layer knows that.

Of course, if you schedule relative event (and in case of counter API reference point is current counter value), situation changes. The following situation should be handled inside the driver:

set_relative(int cnt) {
    alarm = current() + cnt;
    if (current() >= alarm) {
        /* This is error. We do not know if we missed alarm point. */
    }
}

I think that the problem you describe, should be solved in "System Timer <=> Counter Interface" translation layer proposed in #12068. This layer will have to use absolute alarms (in order to avoid errors introduced by the fact that code execution takes time) and can run into the problem you pointed. And there is no doubt that we will have to solve it.

@pabigot
Copy link
Collaborator Author

pabigot commented Jan 22, 2019

One way to support this is to extend counter_alarm_cfg to have a flags field that records information such as whether the count value is absolute or relative, and whether late-to-check is supported.

By "supported" I meant "required"; I've updated the original. So the driver does know my expectations.

I'm unable to see how you're going to make that translation layer work without assistance from the counter that amounts to what I'm proposing here.

(edit): and, the use case I provided does not involve a timer, so your approach wouldn't work.

@pizi-nordic
Copy link
Collaborator

I thought about something like this (the example does not take under account wrapping):

void set_alarm(int target)
{
   cnt = counter_read()
   if (cnt >= target) {
       /* Missed, handle error */
   }

   struct counter_alarm_cfg cfg = {
       .ticks = target,
       .absolute = true,
   };

   counter_set_ch_alarm(cfg);

   cnt = counter_read()
   if (cnt >= target) {
       /* Missed, handle error */
   }
}

I do not see other method to deal with the potential miss than the final (the one after setting up alarm) counter readout (if we assume that we can be interrupted or counted events are purely asynchronous).

On the other hand, you are right that the details about HW have to be handled in the counter API. And the limitations like MIN_DELAY have to be taken under account in API design.

@pabigot
Copy link
Collaborator Author

pabigot commented Jan 22, 2019

That doesn't work (as you may have meant, I didn't understand the "do not see other method" comment). I've annotated just two points where things break:

void set_alarm(int target)
{
   cnt = counter_read()
   // counter increments to target here: late not detected.
   if (cnt >= target) {
       /* Missed, handle error */
   }

   struct counter_alarm_cfg cfg = {
       .ticks = target,
       .absolute = true,
   };

   counter_set_ch_alarm(cfg);
   // count increments to target here: on-time misdiagnosed as late
   cnt = counter_read()
   if (cnt >= target) {
       /* Missed, handle error */
   }
}

I really think it needs to be atomic within the counter, allowing application code to do something like the following (assuming -ESTALE indicates the late set; other approaches might involve adding an output channel state parameter to the set function; see #12625 (comment)).

void reconfigure_alarm(struct counter_alarm_cfg *cp)
{
  int rc = 0;
  
  cp->error_on_late = true;
  do {
    if (rc != 0) {
      manual_callback_invoke();
    }
    cp->ticks = next_deadline();
    rc = counter_set_ch_alrm(ctr, cp);
  } while (-ESTALE == rc);
}

@pizi-nordic
Copy link
Collaborator

pizi-nordic commented Jan 22, 2019

But how you are going to detect the -ESTALE situation in the counter_set_ch_alarm()?
I think that there will be always race here (as we cannot stop the counter) and we will have to handle that race both when we are setting an alarm and when alarm fires (as you pointed, we might have an situation that handler fires and but we think that we missed the alarm).

For me the counter_read(), counter_set_ch_alarm(cfg) are a just wrappers over the actual register access. So if the algorithm does not use any special HW capabilities it could be implemented outside the counter module.

@nordic-krch
Copy link
Contributor

nordic-krch commented Jan 22, 2019

In nrf5 sdk we have RTC driver which supports detection of late setting, takes into account wrapping and latching of RTC interfaces. I've added gist: https://gist.github.com/nordic-krch/8b2543f9b0cb714b42964773754ef6b7

This algorithm was stress tested and seems to handle well but as you can see it is not simple. It takes into account 2 tick limitation from RTC HW and maximal latency (for how long given code can be interrupted). Based on that it can detect if we set too late. The limitation is that value to set cannot be bigger than 2^24-safe_window.

@pizi-nordic
Copy link
Collaborator

pizi-nordic commented Jan 22, 2019

Hmm... It looks that this needs a detailed knowledge about the hardware. This mean, that my idea cannot be used and we have to implement it in counter driver as @pabigot suggested. What about changing absolute to flags in counter_alarm_cfg?

@pabigot
Copy link
Collaborator Author

pabigot commented Jan 22, 2019

Yes, switching to flags is what I proposed in the original comment above.

That SDK algorithm is complicated and probably not unlike what I have in my C++-17 infrastructure for alarm management (I haven't looked closely). Sometimes you can make use of this idiom:

unsigned int ctr[2];
do {
  ctr[0] = RTC->COUNTER;
  // stuff here
  ctr[1] = RTC->COUNTER;
} while (ctr[0] != ctr[1]);

but it depends on whether stuff here has side effects that aren't idempotent. That's all well below the level of this issue, though, which is really about whether the feature is necessary (I think so). If so then we can determine whether/how it can be implemented, and only if it absolutely can't for some platform would we have to figure out what to do.

@pabigot
Copy link
Collaborator Author

pabigot commented May 15, 2019

See #15510 (comment) for a use case that requires an API change to fully support late-to-set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: Counter RFC Request For Comments: want input from the community
Projects
Status: No status
Development

No branches or pull requests

3 participants