Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add setting for keep-alive duration for oidc back-channel #87773

Merged
merged 14 commits into from
Jun 20, 2022

Conversation

ywangd
Copy link
Member

@ywangd ywangd commented Jun 17, 2022

In some environment, the back-channel connection can be dropped
without sending a TCP RST to ES. When that happens, reusing the same
connection results into timeout error.

This PR adds a new http.connection_pool_ttl setting to control how long
a connection in the OIDC back-channel pool can be idle before it is
closed. This allows ES to more actively close idle connections to avoid
the timeout issue.

The new setting has a 3min default which means idle connections are
closed every 3 min if server response does not specify a shorter keep-alive.

Resolves: #75515

In some environment, the back-channel connection can be dropped
without sending a TCP RST to ES. When that happens, reusing the same
connection results into timeout error.

This PR adds a new http.connection_pool_ttl setting to control how long
a connection in the OIDC back-channel pool can be idle before it is
closed. This allows ES to more actively close idle connections to avoid
the timeout issue.

Resolves: elastic#75515
@ywangd ywangd added >enhancement :Security/Security Security issues without another label v8.4.0 labels Jun 17, 2022
@ywangd ywangd requested a review from tvernum June 17, 2022 02:34
@ywangd ywangd marked this pull request as ready for review June 17, 2022 02:35
@elasticmachine elasticmachine added the Team:Security Meta label for security team label Jun 17, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-security (Team:Security)

@elasticsearchmachine
Copy link
Collaborator

Hi @ywangd, I've created a changelog YAML for you.

@ywangd
Copy link
Member Author

ywangd commented Jun 17, 2022

@tvernum This might be worth to backport to earlier versions (including 7.17?) if it proves to work. For now, I just tagged v8.4.0.

docs/changelog/87773.yaml Outdated Show resolved Hide resolved
Comment on lines 1871 to 1873
Configure this setting to `-1` to let the server dictates this value using the `Keep-Alive` HTTP
response header. If the header is not set by the server, the time-to-live is infinite meaning
that connections never expire.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should reword this, but I'll come back with a suggestion later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reword'd this paragraph and also made it consistent to the new behaviour of handling keep-alive response header.

ywangd and others added 3 commits June 17, 2022 15:56
…ecurity/authc/oidc/OpenIdConnectAuthenticator.java

Co-authored-by: Tim Vernum <[email protected]>
Comment on lines 715 to 725
var serverKeepAlive = DefaultConnectionKeepAliveStrategy.INSTANCE.getKeepAliveDuration(response, context);
final long actualKeepAlive;
if (serverKeepAlive == -1) {
actualKeepAlive = userConfiguredKeepAlive;
} else if (userConfiguredKeepAlive == -1) {
actualKeepAlive = serverKeepAlive;
} else {
actualKeepAlive = Math.min(serverKeepAlive, userConfiguredKeepAlive);
}
LOGGER.debug("effective HTTP connection keep-alive: [{}]ms", actualKeepAlive);
return actualKeepAlive;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tweaked this part a bit more and added a debug logging.

@ywangd ywangd requested a review from tvernum June 19, 2022 23:13
Copy link
Contributor

@tvernum tvernum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything that's here LGTM, but we do need a plan for automated testing.

docs/reference/settings/security-settings.asciidoc Outdated Show resolved Hide resolved
docs/reference/settings/security-settings.asciidoc Outdated Show resolved Hide resolved
}
LOGGER.debug("effective HTTP connection keep-alive: [{}]ms", actualKeepAlive);
return actualKeepAlive;
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to come up with a test strategy for this.
We could do it with a mock HTTP Server and ensure that the connection is pulled down, but that seems like a lot of work.
Is it possible to check the value configured on the client itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I came up a way for testing the changes for both the configured TTL and the actual behaviour of "not reusing the same connection". Please see 058872f

Comment on lines 1030 to 1043
httpServer.createContext("/", exchange -> {
try {
final int currentPort = exchange.getRemoteAddress().getPort();
// Either set the first port number, otherwise the current (2nd) port number should be different from the 1st one
if (false == firstClientPort.compareAndSet(null, currentPort)) {
assertThat(currentPort, not(equalTo(firstClientPort.get())));
}
final byte[] bytes = randomByteArrayOfLength(2);
exchange.sendResponseHeaders(200, bytes.length);
exchange.getResponseBody().write(bytes);
} finally {
exchange.close();
}
});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For testing the behaviour that two requests with interval greater than the TTL (1s) should use different connections.

Comment on lines +1062 to +1069
appender.addExpectation(
new MockLogAppender.PatternSeenEventExpectation(
"log",
logger.getName(),
Level.DEBUG,
".*Connection .* can be kept alive for 1.0 seconds"
)
);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing for the configured TTL in action.

@ywangd ywangd requested a review from tvernum June 20, 2022 06:46
@ywangd ywangd added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jun 20, 2022
@ywangd ywangd removed the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jun 20, 2022
@ywangd ywangd merged commit f075d50 into elastic:master Jun 20, 2022
@jkakavas
Copy link
Member

🚀

ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jun 21, 2022
This PR adds a new setting to enable tcp keepalive probes for the
connections used by the oidc back-channel communication. It defaults to
false to keep the existing behaviour.

Relates: elastic#87773
elasticsearchmachine pushed a commit that referenced this pull request Jul 7, 2022
This PR adds a new setting to enable tcp keepalive probes for the
connections used by the oidc back-channel communication. It defaults to
true as tcp keepalive is generally useful for ES.

Relates: #87773
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jul 8, 2022
…7773)

In some environment, the back-channel connection can be dropped
without sending a TCP RST to ES. When that happens, reusing the same
connection results into timeout error.

This PR adds a new http.connection_pool_ttl setting to control how long
a connection in the OIDC back-channel pool can be idle before it is
closed. This allows ES to more actively close idle connections to avoid
the timeout issue.

NOTE: This is a "safe" backport of elastic#87773. The key difference here is
that the new setting is by default not configured, which means the PR
introduces *zero* behaviour change by default. Users need to actively
configure the new setting to enable the new behaviour ("automatically
closing idle connections).

Backport: elastic#87773
elasticsearchmachine pushed a commit that referenced this pull request Jul 8, 2022
…88363)

* New setting to close idle connections in OIDC back-channel (#87773)

In some environment, the back-channel connection can be dropped
without sending a TCP RST to ES. When that happens, reusing the same
connection results into timeout error.

This PR adds a new http.connection_pool_ttl setting to control how long
a connection in the OIDC back-channel pool can be idle before it is
closed. This allows ES to more actively close idle connections to avoid
the timeout issue.

NOTE: This is a "safe" backport of #87773. The key difference here is
that the new setting is by default not configured, which means the PR
introduces *zero* behaviour change by default. Users need to actively
configure the new setting to enable the new behaviour ("automatically
closing idle connections).

Backport: #87773

* Make it safe by keeping default behaviour

* tweak

* Update x-pack/plugin/security/src/test/java/org/elasticsearch/xpack/security/authc/oidc/OpenIdConnectAuthenticatorTests.java

Co-authored-by: Tim Vernum <[email protected]>

* address feedback

Co-authored-by: Tim Vernum <[email protected]>
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jul 11, 2022
…7773)

In some environment, the back-channel connection can be dropped
without sending a TCP RST to ES. When that happens, reusing the same
connection results into timeout error.

This PR adds a new http.connection_pool_ttl setting to control how long
a connection in the OIDC back-channel pool can be idle before it is
closed. This allows ES to more actively close idle connections to avoid
the timeout issue.

NOTE: This is a "safe" backport of elastic#87773. The key difference here is
that the new setting is by default not configured, which means the PR
introduces *zero* behaviour change by default. Users need to actively
configure the new setting to enable the new behaviour ("automatically
closing idle connections).

Backport: elastic#87773
ywangd added a commit that referenced this pull request Jul 11, 2022
…88412)

In some environment, the back-channel connection can be dropped
without sending a TCP RST to ES. When that happens, reusing the same
connection results into timeout error.

This PR adds a new http.connection_pool_ttl setting to control how long
a connection in the OIDC back-channel pool can be idle before it is
closed. This allows ES to more actively close idle connections to avoid
the timeout issue.

NOTE: This is a "safe" backport of #87773. The key difference here is
that the new setting is by default not configured, which means the PR
introduces *zero* behaviour change by default. Users need to actively
configure the new setting to enable the new behaviour ("automatically
closing idle connections).

Backport: #87773
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Security/Security Security issues without another label Team:Security Meta label for security team v7.17.6 v8.3.3 v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Custom keep-alive strategy for outgoing OIDC connections
5 participants