Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make System::Layer::mHandleSelectThread atomic. #7828

Merged

Conversation

bzbarsky-apple
Copy link
Contributor

mHandleSelectThread is used for an optimization: when it's set to the
current thread id, WakeIOThread knows that:

  1. We're in the middle of running Layer::HandleTimeout
  2. We're doing that on the same thread where WakeIOThread was called

and hence WakeIOThread doesn't need to do anything, because we're
already on the IO thread and it's already awake.

The read of this member in WakeIOThread is not synchronized in any
way, but as long as we guarantee that it correctly reads a value that
was actually assigned to the member (which std::atomic does), things
will work correctly. Either the value we read will be equal to the
current thread id, in which case we know we're on the one thread that
called Layer::HandleTimeout and are inside that function, or it will
not be equal to our thread id and then we need to do the actual wakeup
work, whatever that value is (including if it's null).

Fixes #7818

Problem

TSan shows that we have a data race on Linux if PlatformManager::PostEvent is called while Layer::HandleTimeout is running on another thread.

Change overview

Use std::atomic to guarantee that we don't have a data race. With that fixed, the overall logic does not seem to have races that lead to incorrect behavior.

Testing

Manually tested this with #7797 and changes to make CHIP-tool use PostEvent which failed CI without this change and pass with it.

mHandleSelectThread is used for an optimization: when it's set to the
current thread id, WakeIOThread knows that:

1) We're in the middle of running Layer::HandleTimeout
2) We're doing that on the same thread where WakeIOThread was called

and hence WakeIOThread doesn't need to do anything, because we're
already on the IO thread and it's already awake.

The read of this member in WakeIOThread is not synchronized in any
way, but as long as we guarantee that it correctly reads a value that
was actually assigned to the member (which std::atomic does), things
will work correctly.  Either the value we read will be equal to the
current thread id, in which case we know we're on the one thread that
called Layer::HandleTimeout and are inside that function, or it will
not be equal to our thread id and then we need to do the actual wakeup
work, whatever that value is (including if it's null).

Fixes project-chip#7818
@github-actions
Copy link

Size increase report for "nrfconnect-example-build" from 90592a7

File Section File VM
chip-lock.elf text 32 32
chip-shell.elf text 32 32
Full report output
BLOAT REPORT

Files found only in the build output:
    report.csv

Comparing ./master_artifact/chip-lock.elf and ./pull_artifact/chip-lock.elf:

sections,vmsize,filesize
.debug_info,0,427147
.debug_abbrev,0,8333
.debug_str,0,3196
.debug_loc,0,592
text,32,32
.debug_ranges,0,-8
.debug_line,0,-3356

Comparing ./master_artifact/chip-shell.elf and ./pull_artifact/chip-shell.elf:

sections,vmsize,filesize
.debug_info,0,174805
.debug_abbrev,0,4090
.debug_str,0,3196
.debug_loc,0,608
text,32,32
.debug_ranges,0,-8
.debug_line,0,-1779


@woody-apple woody-apple merged commit 49b98b0 into project-chip:master Jun 23, 2021
@bzbarsky-apple bzbarsky-apple deleted the fix-race-on-thread-id branch June 23, 2021 03:10
nikita-s-wrk pushed a commit to nikita-s-wrk/connectedhomeip that referenced this pull request Sep 23, 2021
mHandleSelectThread is used for an optimization: when it's set to the
current thread id, WakeIOThread knows that:

1) We're in the middle of running Layer::HandleTimeout
2) We're doing that on the same thread where WakeIOThread was called

and hence WakeIOThread doesn't need to do anything, because we're
already on the IO thread and it's already awake.

The read of this member in WakeIOThread is not synchronized in any
way, but as long as we guarantee that it correctly reads a value that
was actually assigned to the member (which std::atomic does), things
will work correctly.  Either the value we read will be equal to the
current thread id, in which case we know we're on the one thread that
called Layer::HandleTimeout and are inside that function, or it will
not be equal to our thread id and then we need to do the actual wakeup
work, whatever that value is (including if it's null).

Fixes project-chip#7818
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use of mHandleSelectThread in SystemLayer is data-racy
5 participants