Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: update wake_up while holding all the locks of sharded time wheels #6683

Merged

Conversation

wathenjiang
Copy link
Contributor

@wathenjiang wathenjiang commented Jul 12, 2024

Motivation

fixes #6682

We may encounter the following issue:

  • After we calculated that expiration_time is None, we did not update the next_wake immediately. The next_wake is now the old value of the previous round of park_internal (the value is not None). Then we droped the locks.
  • When other threads poll a Sleep, they find that when > next_wake (here next_wake is not None), so they will not execute unpark.unpark().

Solution

While we hold the locks, we calculate and update the next_wake.

@github-actions github-actions bot added the R-loom-time-driver Run loom time driver tests on this PR label Jul 12, 2024
@wathenjiang wathenjiang added A-tokio Area: The main tokio crate M-time Module: tokio/time labels Jul 12, 2024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a loom test for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to add a loom test for this one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an actual data race or just a race condition? It looks like it's just incorrect usage of locks, not any incorrect unsafe code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that I have confused them. I would like to modify the title to time: update wake_up while holding all the locks of sharded time wheels.

@wathenjiang wathenjiang changed the title time: fix the data race of next_wake time: update wake_up while holding all the locks of sharded time wheels Jul 13, 2024
Copy link
Contributor

@Darksonn Darksonn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are things that could be improved here, but I don't think it needs to block this PR. Let's get a fix out now, and we can improve things later.

Comment on lines +193 to +195
let locks = (0..rt_handle.time().inner.get_shard_size())
.map(|id| rt_handle.time().inner.lock_sharded_wheel(id))
.collect::<Vec<_>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to avoid allocating this vector every time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any ideas to achieve this? Can we use smallvec::SmallVec here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not going to introduce a new dependency for it, but maybe a solution like that could work. But I don't think it has to be solved before we publish a release containing this fix.

(See also chat on discord.)

@wathenjiang
Copy link
Contributor Author

There are things that could be improved here, but I don't think it needs to block this PR. Let's get a fix out now, and we can improve things later.

I'm sorry for the delay here, I've had a very busy weekend, maybe I'll get around to getting this done today.

@wathenjiang wathenjiang changed the base branch from master to tokio-1.38.x July 16, 2024 11:30
@wathenjiang wathenjiang force-pushed the update-next-wake-hold-locks branch from 5b233e6 to 35f94c1 Compare July 16, 2024 11:42
@wathenjiang wathenjiang force-pushed the update-next-wake-hold-locks branch from 35f94c1 to 1b818a3 Compare July 16, 2024 11:43
@Darksonn Darksonn merged commit 24344df into tokio-rs:tokio-1.38.x Jul 16, 2024
73 checks passed
kodiakhq bot pushed a commit to pdylanross/fatigue that referenced this pull request Jul 17, 2024
Bumps tokio from 1.38.0 to 1.38.1.

Release notes
Sourced from tokio's releases.

Tokio v1.38.1
1.38.1 (July 16th, 2024)
This release fixes the bug identified as (#6682), which caused timers not
to fire when they should.
Fixed

time: update wake_up while holding all the locks of sharded time wheels (#6683)

#6682: tokio-rs/tokio#6682
#6683: tokio-rs/tokio#6683



Commits

14b9f71 chore: release Tokio v1.38.1 (#6688)
24344df time: fix race condition leading to lost timers (#6683)
See full diff in compare view




Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot merge will merge this PR after your CI passes on it
@dependabot squash and merge will squash and merge this PR after your CI passes on it
@dependabot cancel merge will cancel a previously requested merge and block automerging
@dependabot reopen will reopen this PR if it is closed
@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-time Module: tokio/time R-loom-time-driver Run loom time driver tests on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The tokio::time::timeout() does not return after timeout.
2 participants