Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys/xtimer: fix xtimer_mutex_lock_timeout corner cases #6441

Closed
wants to merge 3 commits into from

Conversation

lebrush
Copy link
Member

@lebrush lebrush commented Jan 20, 2017

As discussed in #6428 (comment) if the timeout given is lower than XTIMER_BACKOFF the timer might spin instead of setting an interrupt and the mutex might lock until it's released rather than applying the timeout. This tries to solve this cases (and improves the documentation)

@miri64
Copy link
Member

miri64 commented Jan 20, 2017

Can you provide a test application testing for the cases you drew out in #6428?

@miri64 miri64 added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Area: timers Area: timer subsystems labels Jan 20, 2017
@miri64 miri64 added this to the Release 2017.01 milestone Jan 20, 2017
@lebrush
Copy link
Member Author

lebrush commented Jan 20, 2017

Will do. Please, note that #6428 is still required.

@miri64
Copy link
Member

miri64 commented Jan 20, 2017

Please, note that #6428 is still required.

Understood it as such ;-)

@OlegHahm
Copy link
Member

Rebase is required.

*
* @return 0, when returned after mutex was locked
* @return -1, when the timeout occcured
* @return -1, mutex can't be locked right now (timeout <= XTIMER_BACKOFF)
* @return -2, when the timeout occcured
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you introduce an enum for the error values? This would make it easier to change this later on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative use errno.h. -EINVAL and -ETIMEDOUT seem fitting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

/* timeout lower than XTIMER_BACKOFF might cause the code to spin rather
* than set it up for interrupt */
if (timeout <= XTIMER_BACKOFF) {
return (mutex_trylock(mutex) - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's say A has the mutex and B calls xtimer_mutex_lock_timeout() with timeout <= XTIMER_BACKOFF. Now A receives an interrupt that causes it to release the mutex. B could do its spin now and lock the mutex, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've a sketch of that locally, but still not fully happy with it... Will update it asap

@lebrush
Copy link
Member Author

lebrush commented Jan 23, 2017

I still have to provide some tests for it but [I hope] all corner cases are addressed in this implementation. I removed the two error return values (-1 and -2) and now returns only 0 (success) and -ETIMEDOUT (timed out). Besides if the timeout < XTIMER_BACKOFF it just spins (blocking) and tries to lock the mutex again afterwards ( @OlegHahm I guess that's what you meant). Documentation has been updated accordingly.

@miri64 using ETIMEDOUT requires 4 more bytes (in stm32) than returning just -1 due to the multiplication operation (L253).

sizediffs: http://pastebin.com/bn6et69K

Copy link
Member

@jnohlgard jnohlgard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment on premature optimization.

t.arg = (void *)((mutex_thread_t *)&mt);
_xtimer_set64(&t, timeout, timeout >> 32);
if (locked || (timeout == 0)) {
return (locked - 1) * ETIMEDOUT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unnecessary optimization in my eyes. Make it a simple if branch and return -ETIMEDOUT if it is locked, otherwise 0.
The current code hurts readability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's ugly... however with an if branch it generates larger code for most boards:

    if (locked) {
        return 0;
    }
    else if (timeout == 0) {
        return -ETIMEDOUT;
    }

here is the size diff: http://pastebin.com/VVPjZQfK (+4 +8 bytes larger for most boards but -10 for avr).

Removing the -ETIMEDOUT and returning -1 instead reduces the code size at least by 4 bytes for most boards (many 10 and even 16). Size diff here: http://pastebin.com/dnG7YCJ5)

    if (locked || (timeout == 0)) {
        return (locked - 1);
    }

I would prefer this solution. What do you think?

@lebrush lebrush force-pushed the fix/xtimer-mutex branch 2 times, most recently from ad9ae47 to 493c6f6 Compare January 24, 2017 07:58
@lebrush
Copy link
Member Author

lebrush commented Jan 24, 2017

Example output of the test

main: tests with unlocked mutex:
thread1 timeout: 0, mutex: unlocked --> OK
thread1 timeout: 0, mutex: locked (1) --> OK
main: locking mutex
main: tests with mutex locked once:
thread1 timeout: 0, mutex: locked (1) --> OK
thread1 timeout: 4, mutex: locked (1) --> OK
thread1 timeout: 5, mutex: locked (1) --> OK
thread1 timeout: 6, mutex: locked (1) --> OK
thread1 timeout: 9, mutex: locked (1) --> OK
thread1 timeout: 10, mutex: locked (1) --> OK
thread1 timeout: 1000000, mutex: locked (1) --> OK
main: test unlocking while waiting
thread1: waiting for mutex
main: unlocking...
thread1: got mutex! --> OK 
main: starting thread2
main: locking mutex
thread2: running
thread2 timeout: 0, mutex: locked (2+) --> OK
thread2 timeout: 4, mutex: locked (2+) --> OK
thread2 timeout: 5, mutex: locked (2+) --> OK
thread2 timeout: 6, mutex: locked (2+) --> OK
thread2 timeout: 9, mutex: locked (2+) --> OK
thread2 timeout: 10, mutex: locked (2+) --> OK
thread2 timeout: 1000000, mutex: locked (2+) --> OK
thread2: waiting for mutex
thread1: unlocking
main: got mutex, unlocking
main: donethread1: done
thread2: got mutex! --> OK 
thread2: done

@lebrush
Copy link
Member Author

lebrush commented Jan 24, 2017

For some reason when running on native once an xtimer is set and fired I can't set any xtimer anymore. Might be related to #6442 or be the same issue... but the code runs fine on nucleo-f103.

@lebrush lebrush changed the title sys/xtimer: fix mutex locking "before" corner case sys/xtimer: fix xtimer_mutex_lock_timeout corner cases Jan 24, 2017
@PeterKietzmann PeterKietzmann added the Process: needs backport Integration Process: The PR is required to be backported to a release or feature branch label Jan 26, 2017
@lebrush
Copy link
Member Author

lebrush commented Jan 30, 2017

did anyone have a look at this? @OlegHahm are all your comments addressed?

@OlegHahm
Copy link
Member

On it now.

Copy link
Member

@OlegHahm OlegHahm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also provide a pexpect script for the test.

*
* @note this requires core_thread_flags to be enabled
* This will try to lock a mutex. If the mutex is not available immediately or
* until a certain amount of time (timeout) the method will return -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rephrase to:

Tries to lock the mutex for a maximum timespan of @p timeout microseconds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your sentence is in this case misleading, it's not clear if the mutex will be locked only during timeout or if it will try to lock it during the timeout. I will use the @p in any case :-)

* @return 0, when returned after mutex was locked
* @return -1, when the timeout occcured
* @param[in] mutex mutex to lock
* @param[in] timeout timeout in microseconds relative
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/relative//

It reads a bit weird and I think it's common sense that a timeout is specified in relative numbers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Member

@OlegHahm OlegHahm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test seems to fail.

@miri64
Copy link
Member

miri64 commented Jan 31, 2017

Test application hangs for me on native, when I tried it first at thread1 timeout: 201, mutex: locked (1) --> , later at thread1 timeout: 1000000, mutex: locked (1) --> :-/

@PeterKietzmann PeterKietzmann modified the milestones: Release 2017.01, Release 2017.04 Feb 1, 2017
@lebrush
Copy link
Member Author

lebrush commented Oct 6, 2017

After 5/6 pings in 8 months I dismiss the review of @OlegHahm . Can someone else please have a look? @miri64 @gebart @vincent-d ?

@lebrush lebrush dismissed OlegHahm’s stale review October 6, 2017 08:42

Not responding to pings on update

@lebrush lebrush added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Oct 6, 2017
@smlng
Copy link
Member

smlng commented Jan 18, 2018

postponed

@smlng smlng added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Jan 18, 2018
@miri64
Copy link
Member

miri64 commented Apr 13, 2018

@kaspar030 @gebart can you maybe have a look?

@kaspar030 kaspar030 added CI: run tests If set, CI server will run tests on hardware for the labeled PR CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Apr 20, 2018
@kaspar030 kaspar030 removed this from the Release 2018.04 milestone Apr 22, 2018
@miri64
Copy link
Member

miri64 commented Jun 5, 2018

Ping @kaspar030 @gebart?

@vincent-d
Copy link
Member

@lebrush I'm really sorry for what happened here. I was not really confident in reviewing this one at the time you opened it, and this stalled for no good reason.

We encountered one of the issues which are solved here and a colleague opened #10872. I've run a couple of tests and when trying to reproduced I encountered another issue which is also fixed by your PR.

Would you mind rebasing this then we could merge? If you don't have time, could someone else take this over?

Copy link
Member

@vincent-d vincent-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on nucleo-f207zg. Please rebase.

ACK

@miri64
Copy link
Member

miri64 commented Feb 18, 2019

@lebrush ping?

/* a timeout lower than XTIMER_BACKOFF causes the xtimer to spin rather
* than to set a timer for interrupt. Hence, we shall make the mutex_lock
* call blocking only when the interrupt didn't occur yet. */
locked = _mutex_lock(mutex, (mt.timeout == NO_TIMEOUT));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While looking at the original code @JulianHolzwarth we noticed the issue with this one, and looked for existing references. But actually this does not solve the concurrency completely.
Between the evaluation of == and the actual irq_disable in the function, the callback can be triggered.

He will provide a PR for the needed change in core to evaluate a volatile condition when the interrupt are disabled.
This would however maybe make using 'mt.timeout' not possible and may require another variable.

@miri64
Copy link
Member

miri64 commented Jun 23, 2020

Is this still being worked on? Is this fixed? @JulianHolzwarth @kaspar030 maybe you have some insight on this.

@miri64 miri64 added the State: stale State: The issue / PR has no activity for >185 days label Jun 23, 2020
@JulianHolzwarth
Copy link
Contributor

JulianHolzwarth commented Jun 23, 2020

@miri64 this should be fixed. #11660

@stale stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Jun 23, 2020
@miri64
Copy link
Member

miri64 commented Jun 23, 2020

Then let's close this.

@miri64 miri64 closed this Jun 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: timers Area: timer subsystems CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR CI: run tests If set, CI server will run tests on hardware for the labeled PR Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.