Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: FakeUpstream threading fixes #14526

Merged
merged 18 commits into from
Jan 21, 2021
Merged
Changes from 1 commit
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
512610f
test: FakeUpstream threading fixes
antoniovicente Dec 24, 2020
19a63f7
address review comments
antoniovicente Dec 29, 2020
5f767cf
fix flakiness in hds_integration_test
antoniovicente Dec 29, 2020
1bfa958
Merge remote-tracking branch 'upstream/master' into fake_upstream_thr…
antoniovicente Dec 29, 2020
45b8864
Fix wait for disconnect. Previously disconnect condition happened un…
antoniovicente Jan 6, 2021
3b653f2
fix use after free due to connection possibly being deleted.
antoniovicente Jan 7, 2021
ee052d6
fix more use-after-free
antoniovicente Jan 7, 2021
e9f4597
fix flaky test
antoniovicente Jan 7, 2021
3c55b38
Merge remote-tracking branch 'upstream/master' into fake_upstream_thr…
antoniovicente Jan 8, 2021
495b19e
add logging to debug macos failure
antoniovicente Jan 9, 2021
6580555
Wait outside the upstream lock since holding that lock is not needed.
antoniovicente Jan 11, 2021
f90610f
Symbolize names of VERSIONED_GRPC_CLIENT_INTEGRATION_PARAMS test cases.
antoniovicente Jan 11, 2021
22211c9
also stringify arguments to other VERSIONED_GRPC_CLIENT_INTEGRATION_P…
antoniovicente Jan 11, 2021
0d4fc3d
Fix issue of timeouts waiting for disconnect on macos while also avoi…
antoniovicente Jan 13, 2021
d0565e5
address review comments
antoniovicente Jan 15, 2021
197362f
revert back to the original wait for a raw connection followed by che…
antoniovicente Jan 16, 2021
3aee6d7
add thread annotations to FakeConnectionBase::initialized_
antoniovicente Jan 19, 2021
8fccaf3
fix use-after-free in test framework
antoniovicente Jan 20, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 11 additions & 15 deletions test/integration/hds_integration_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -438,11 +438,9 @@ TEST_P(HdsIntegrationTest, SingleEndpointTimeoutHttp) {
hds_stream_->sendGrpcMessage(server_health_check_specifier_);
test_server_->waitForCounterGe("hds_delegate.requests", ++hds_requests_);

// Envoy sends a health check message to an endpoint
ASSERT_TRUE(host_upstream_->waitForRawConnection(host_fake_raw_connection_));

// Endpoint doesn't respond to the health check
ASSERT_TRUE(host_fake_raw_connection_->waitForDisconnect());
// Envoy sends a health check message to an endpoint, but the endpoint doesn't respond to the
// health check
ASSERT_TRUE(host_upstream_->waitForAndConsumeDisconnectedConnection());

// Receive updates until the one we expect arrives
waitForEndpointHealthResponse(envoy::config::core::v3::TIMEOUT);
Expand Down Expand Up @@ -515,11 +513,9 @@ TEST_P(HdsIntegrationTest, SingleEndpointTimeoutTcp) {
hds_stream_->sendGrpcMessage(server_health_check_specifier_);
test_server_->waitForCounterGe("hds_delegate.requests", ++hds_requests_);

// Envoys asks the endpoint if it's healthy
ASSERT_TRUE(host_upstream_->waitForRawConnection(host_fake_raw_connection_));

// No response from the endpoint
ASSERT_TRUE(host_fake_raw_connection_->waitForDisconnect());
// Envoy sends a health check message to an endpoint, but the endpoint doesn't respond to the
// health check
ASSERT_TRUE(host_upstream_->waitForAndConsumeDisconnectedConnection());

// Receive updates until the one we expect arrives
waitForEndpointHealthResponse(envoy::config::core::v3::TIMEOUT);
Expand Down Expand Up @@ -1031,6 +1027,8 @@ TEST_P(HdsIntegrationTest, SingleEndpointUnhealthyTlsMissingSocketMatch) {
tls_hosts_ = true;

initialize();
// Allow the fake upstreams to detect an error and disconnect during the TLS handshake.
host_upstream_->setReadDisableOnNewConnection(false);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The readDisable on the connection does not prevent the delivery of EPOLLOUT events to the connection. The delivery of a write event triggers the start of the SSL handshake operation which can perform reads and writes. The handling of the out event sometimes results in detection of the plain HTTP request sent by the proxy to the fake upstream, which can trigger termination of the upstream connection in the middle of the process of creating a raw connection, which results in a use-after-free crash.

By fully enabling events on the TLS upstream and waiting for disconnect we avoid flaky failures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add all this color to the comment for the next person? (or since you have the same fix below maybe have a well named helper function with a bunch of comments? Up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to go back to see if I can fix this properly.

I think that the original waitForRawConnection followed by waitForDisconnect API was more intuitive than the setReadDisableOnNewConnection+waitForAndConsumeDisconnectedConnection that I need to avoid use after free in the TLS handshake failure cases. I think the solution is to avoid creating the network connection object until later and thus preventing events from being handled when the filter chain and related objects are not fully setup

/wait

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to make tests work with the old API by allowing raw connection creation to succeed in cases where the network connection is already disconnected. Not having to call different methods in tests that expect disconnects feels easier.

PTAL.


// Server <--> Envoy
waitForHdsStream();
Expand All @@ -1046,11 +1044,9 @@ TEST_P(HdsIntegrationTest, SingleEndpointUnhealthyTlsMissingSocketMatch) {
hds_stream_->sendGrpcMessage(server_health_check_specifier_);
test_server_->waitForCounterGe("hds_delegate.requests", ++hds_requests_);

// Envoy sends a health check message to an endpoint
ASSERT_TRUE(host_upstream_->waitForRawConnection(host_fake_raw_connection_));

// Endpoint doesn't respond to the health check
ASSERT_TRUE(host_fake_raw_connection_->waitForDisconnect());
// Envoy sends a health check message to an endpoint, but the endpoint doesn't respond to the
// health check
ASSERT_TRUE(host_upstream_->waitForAndConsumeDisconnectedConnection());

// Receive updates until the one we expect arrives. This should be UNHEALTHY and not TIMEOUT,
// because TIMEOUT occurs in the situation where there is no response from the endpoint. In this
Expand Down