Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Apply fix for use-after-free in Envoy ThreadLocal Slot. #111

Merged
merged 3 commits into from
Oct 10, 2019

Conversation

jplevyak
Copy link

@jplevyak jplevyak commented Oct 9, 2019

Signed-off-by: John Plevyak [email protected]

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Description: Import thread local changes from envoyproxy/master and apply to RDS in the same manner as was applied in envoyproxy/master in order to address a segfault.
Risk Level:
Testing: unit tests (e2e in future)
Docs Changes:
Release Notes:
[Optional Fixes #Issue] istio/istio#17699
[Optional Deprecated:]

@lizan
Copy link

lizan commented Oct 9, 2019

So is this a cherry-pick of envoyproxy#8290?

@lambdai
Copy link

lambdai commented Oct 9, 2019

Seems it's envoyproxy#8290 + envoyproxy#8135

@lizan
Copy link

lizan commented Oct 10, 2019

I'm OK as long as it's cherry-pick from upstream, @jplevyak can you include upstream references in commit message?

@lambdai
Copy link

lambdai commented Oct 10, 2019

Drive by comments: AFAIK the original fix is supposed to fix the use-after-free caused by srds. Istio doesn't adopt SRDS it's not fully convinced we are seeing the exact same use-after-free.
Plain RDS will also experience this use-after-free
But it's not hurting to give it a try.

@rlenglet
Copy link

If you'd like to include this into 1.3.3, this needs to get merged today, and the SHAs updated in istio/proxy and istio/istio.

@jplevyak
Copy link
Author

Jean-Rémy Bancel is going to test.

@jplevyak
Copy link
Author

RE: cherry-picks. We also need part of envoyproxy#7979, but it is not possible to cherry-pick all these because of changes to other bits of code which would require more cherry-picks, etc. Instead for this patch, I simply upgraded the thread_local code which is well encapsulated and the associated tests and then used the new API which is thread-safe for RDS. It is clear from the backtraces on istio/istio#17699 that RdsRouteConfigSubscription is the source of the problem.

@lizan are there other places I should apply the new thread-safe API?

…llThreads interface to Slot. (envoyproxy#8135)

See the issue in envoyproxy#7902, this PR is to make the SlotImpl detachable from its owner, by introducing a Booker object wraps around a SlotImpl, which bookkeeps all the on-the-fly update callbacks. And on its destruction, if there are still on-the-fly callbacks, move the SlotImpl to an deferred-delete queue, instead of destructing the SlotImpl which may cause an SEGV error.

More importantly, introduce a new runOnAllThreads(ThreadLocal::UpdateCb cb) API to Slot, which requests a Slot Owner to not assume that the Slot or its owner will out-live (in Main thread) the fired on-the-fly update callbacks, and should not capture the Slot or its owner in the update_cb.

Picked RDS and config-providers-framework as examples to demonstrate that this change works. {i.e., changed from the runOnAllThreads(Event::PostCb) to the new runOnAllThreads(TLS::UpdateCb) interface. }

Risk Level: Medium
Testing: unit test
Docs Changes: N/A
Release Notes: N/A
[Optional Fixes #Issue] envoyproxy#7902

Signed-off-by: Xin Zhuang <[email protected]>
Signed-off-by: John Plevyak <[email protected]>
@jplevyak
Copy link
Author

@lizan spoke too soon, I can get the cherry picks to merge.... stay tuned.

@jplevyak
Copy link
Author

Updated to include the cherry picks. The first required a patch, so they are not completely clean, but at least we have the history.

@JRBANCEL
Copy link

I tested it and I didn't see any segfault running the Knative E2E tests (it would consistently segfaults 3-4 times before the fix) 👍

@jplevyak jplevyak merged commit d24427d into istio:release-1.3 Oct 10, 2019
Copy link

@PiotrSikora PiotrSikora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fpesce
Copy link

fpesce commented Oct 10, 2019

Do we have cherrypicks for 1.2 and 1.1 on this ?

jplevyak added a commit to jplevyak/envoy that referenced this pull request Oct 10, 2019
Apply fix for use-after-free in Envoy ThreadLocal Slot.

Signed-off-by: John Plevyak <[email protected]>
jplevyak added a commit to jplevyak/envoy that referenced this pull request Oct 11, 2019
Apply fix for use-after-free in Envoy ThreadLocal Slot.

Signed-off-by: John Plevyak <[email protected]>
jplevyak added a commit that referenced this pull request Oct 14, 2019
Merge pull request #111 from jplevyak/release-1.3-use-after-free
fpesce pushed a commit that referenced this pull request Oct 18, 2019
Merge pull request #111 from jplevyak/release-1.3-use-after-free
Miss-you pushed a commit to Miss-you/envoy that referenced this pull request Nov 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants