sys/xtimer: fix xtimer_mutex_lock_timeout corner cases #6441
Can you provide a test application testing for the cases you drew out in #6428?
Will do. Please note that #6428 is still required.
Understood it as such ;-)
Rebase is required.
sys/include/xtimer.h
Outdated
 *
 * @return 0, when returned after mutex was locked
 * @return -1, when the timeout occurred
 * @return -1, mutex can't be locked right now (timeout <= XTIMER_BACKOFF)
 * @return -2, when the timeout occurred
Can you introduce an enum for the error values? This would make it easier to change this later on.
Alternatively, use errno.h. -EINVAL and -ETIMEDOUT seem fitting.
👍
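For illustration, the errno.h suggestion could look roughly like the sketch below. It assumes that the "can't be locked right now" case maps to -EINVAL and the timeout case to -ETIMEDOUT, which is only one reading of the comment above. `lock_result` and `EXAMPLE_BACKOFF` are hypothetical names that are not part of xtimer, and the backoff value is a placeholder.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for XTIMER_BACKOFF; the real value is
 * platform-specific and not taken from this PR. */
#define EXAMPLE_BACKOFF 30U

/* Hypothetical helper, not part of xtimer: maps the three outcomes
 * listed in the doc comment to errno-style return values. */
int lock_result(bool locked, uint32_t timeout_us)
{
    if (locked) {
        return 0;               /* mutex was locked */
    }
    if (timeout_us <= EXAMPLE_BACKOFF) {
        return -EINVAL;         /* timeout too short to set up a timer */
    }
    return -ETIMEDOUT;          /* timeout expired before locking */
}
```

With named errno values, callers can compare against -ETIMEDOUT instead of a bare -2, which is the maintainability point raised in the enum request.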
sys/xtimer/xtimer.c
Outdated
/* timeout lower than XTIMER_BACKOFF might cause the code to spin rather
 * than set it up for interrupt */
if (timeout <= XTIMER_BACKOFF) {
    return (mutex_trylock(mutex) - 1);
Let's say A has the mutex and B calls xtimer_mutex_lock_timeout() with timeout <= XTIMER_BACKOFF. Now A receives an interrupt that causes it to release the mutex. B could do its spin now and lock the mutex, right?
Yes, I have a sketch of that locally, but I'm still not fully happy with it... Will update it asap.
92f5932
to
2d8a05b
Compare
I still have to provide some tests for it but [I hope] all corner cases are addressed in this implementation. I removed the two error return values (-1 and -2) and now returns only 0 (success) and @miri64 using sizediffs: http://pastebin.com/bn6et69K |
One comment on premature optimization.
sys/xtimer/xtimer.c
Outdated
t.arg = (void *)((mutex_thread_t *)&mt);
_xtimer_set64(&t, timeout, timeout >> 32);
if (locked || (timeout == 0)) {
    return (locked - 1) * ETIMEDOUT;
This is an unnecessary optimization in my eyes. Make it a simple if branch and return -ETIMEDOUT if it is locked, otherwise 0.
The current code hurts readability.
I know it's ugly... however, with an if branch it generates larger code for most boards:
    if (locked) {
        return 0;
    }
    else if (timeout == 0) {
        return -ETIMEDOUT;
    }
Here is the size diff: http://pastebin.com/VVPjZQfK (+4 to +8 bytes for most boards, but -10 for avr).
Removing the -ETIMEDOUT and returning -1 instead reduces the code size by at least 4 bytes for most boards (for many by 10, and for some even by 16). Size diff here: http://pastebin.com/dnG7YCJ5
    if (locked || (timeout == 0)) {
        return (locked - 1);
    }
I would prefer this solution. What do you think?
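To make the readability trade-off in this thread concrete, here is a small sketch comparing the arithmetic form from the diff with a plain branch. It assumes `locked` is exactly 0 or 1 (as a mutex_trylock-style result would be); `result_arithmetic` and `result_branch` are hypothetical names used only for this comparison.

```c
#include <errno.h>

/* Arithmetic form from the diff: locked == 1 gives 0,
 * locked == 0 gives -ETIMEDOUT. */
int result_arithmetic(int locked)
{
    return (locked - 1) * ETIMEDOUT;
}

/* Branch form along the lines the reviewer asked for. */
int result_branch(int locked)
{
    if (locked) {
        return 0;
    }
    return -ETIMEDOUT;
}
```

Both forms agree for 0 and 1. They diverge for any other non-zero value of `locked` (for example, 2 yields a positive ETIMEDOUT in the arithmetic form), so the arithmetic variant silently depends on the 0/1 contract while the branch does not.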
ad9ae47
to
493c6f6
Compare
Example output of the test
|
For some reason when running on |
493c6f6
to
8501eee
Compare
8501eee
to
6c5b42d
Compare
Did anyone have a look at this? @OlegHahm are all your comments addressed?
On it now.
Please also provide a pexpect script for the test.
sys/include/xtimer.h
Outdated
 *
 * @note this requires core_thread_flags to be enabled
 * This will try to lock a mutex. If the mutex is not available immediately or
 * until a certain amount of time (timeout) the method will return -1
I would rephrase to:
Tries to lock the mutex for a maximum timespan of @p timeout microseconds.
I think your sentence is misleading in this case: it's not clear whether the mutex will be locked only during timeout or whether it will try to lock it during the timeout. I will use the @p in any case :-)
sys/include/xtimer.h
Outdated
 * @return 0, when returned after mutex was locked
 * @return -1, when the timeout occurred
 * @param[in] mutex mutex to lock
 * @param[in] timeout timeout in microseconds relative
s/relative//
It reads a bit weird and I think it's common sense that a timeout is specified in relative numbers.
👍
The test seems to fail.
Test application hangs for me on native, when I tried it first at |
After 5/6 pings in 8 months I dismiss the review of @OlegHahm. Can someone else please have a look? @miri64 @gebart @vincent-d?
postponed
@kaspar030 @gebart can you maybe have a look?
Ping @kaspar030 @gebart?
@lebrush I'm really sorry for what happened here. I was not really confident in reviewing this one at the time you opened it, and this stalled for no good reason. We encountered one of the issues that are solved here, and a colleague opened #10872. I've run a couple of tests, and when trying to reproduce it I encountered another issue which is also fixed by your PR. Would you mind rebasing this so that we could merge? If you don't have time, could someone else take this over?
Tested on nucleo-f207zg. Please rebase.
ACK
@lebrush ping?
/* a timeout lower than XTIMER_BACKOFF causes the xtimer to spin rather
 * than to set a timer for interrupt. Hence, we shall make the mutex_lock
 * call blocking only when the interrupt didn't occur yet. */
locked = _mutex_lock(mutex, (mt.timeout == NO_TIMEOUT));
While looking at the original code with @JulianHolzwarth, we noticed the issue with this one and looked for existing references. But actually this does not solve the concurrency issue completely.
Between the evaluation of == and the actual irq_disable in the function, the callback can be triggered.
He will provide a PR for the needed change in core to evaluate a volatile condition when the interrupts are disabled.
This would, however, maybe make using 'mt.timeout' not possible and may require another variable.
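The window described in this comment can be shown with a deterministic, single-threaded simulation. This is only a sketch under stated assumptions: every name below (`timeout_fired`, `sim_timer_irq`, `sim_irq_disable`, `sim_irq_restore`, `racy_should_block`, `safe_should_block`) is a hypothetical stand-in, not a RIOT API, and the `window` callback merely marks the point where a real timer interrupt could fire.

```c
#include <stdbool.h>

/* Set by the simulated timer callback; volatile as the comment suggests. */
volatile bool timeout_fired;
bool irq_enabled = true;

void sim_reset(void)
{
    timeout_fired = false;
    irq_enabled = true;
}

/* Simulated timer interrupt: only takes effect while interrupts are
 * enabled (real hardware would defer it instead of dropping it). */
void sim_timer_irq(void)
{
    if (irq_enabled) {
        timeout_fired = true;
    }
}

void sim_irq_disable(void) { irq_enabled = false; }
void sim_irq_restore(void) { irq_enabled = true; }

/* Racy: the condition is evaluated by the caller side, before
 * interrupts are disabled, so it can be stale when it is used. */
bool racy_should_block(void (*window)(void))
{
    bool blocking = !timeout_fired; /* evaluated with interrupts enabled */
    window();                       /* the interrupt can fire here */
    sim_irq_disable();
    /* a real lock function would now act on the stale `blocking` value */
    sim_irq_restore();
    return blocking;
}

/* Safer: the volatile condition is read only after interrupts are off. */
bool safe_should_block(void (*window)(void))
{
    window();                       /* same window as above */
    sim_irq_disable();
    bool blocking = !timeout_fired; /* evaluated with interrupts disabled */
    sim_irq_restore();
    return blocking;
}
```

Calling `racy_should_block(sim_timer_irq)` after `sim_reset()` returns true even though the timeout has already fired, whereas `safe_should_block(sim_timer_irq)` returns false. That is the behaviour the proposed change in core is meant to guarantee.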
Is this still being worked on? Is this fixed? @JulianHolzwarth @kaspar030 maybe you have some insight on this.
Then let's close this.
As discussed in #6428 (comment), if the given timeout is lower than XTIMER_BACKOFF, the timer might spin instead of setting an interrupt, and the mutex lock might block until the mutex is released rather than applying the timeout. This PR tries to solve these cases (and improves the documentation).