Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chart(add): Default ingress annotations for upstream keepalive, or disable HTTP/2 #2328

Merged
merged 1 commit into from
Jul 27, 2024

Conversation

VietND96
Copy link
Member

@VietND96 VietND96 commented Jul 27, 2024

User description

Thanks for contributing to the Docker-Selenium project!
A PR well described will help maintainers to quickly review and merge it

Before submitting your PR, please check our contributing guidelines, applied for this repository.
Avoid large PRs, help reviewers by making them as simple and short as possible.

Description

Ingress Configuration

List mapping of chart values and default annotation(s)

# `ingress.nginx.proxyTimeout` pass value to annotation(s)
nginx.ingress.kubernetes.io/upstream-keepalive-timeout

# `ingress.nginx.useHttp2` pass boolean value to enable/disable HTTP/2 in TLS termination in the ingress controller
nginx.ingress.kubernetes.io/use-http2: "true"

# `ingress.nginx.upstreamKeepalive` pass value to upstream keepalive
nginx.ingress.kubernetes.io/upstream-keepalive-connections: "10000"
nginx.ingress.kubernetes.io/upstream-keepalive-time: "1h"
nginx.ingress.kubernetes.io/upstream-keepalive-request: "10000"

Motivation and Context

TLS termination in the ingress controller, HTTP/2, and related troubleshooting

In case the Selenium Grid is deployed with the Ingress controller in front, and the Ingress controller has configured the secure connection with approach SSL termination to terminate the TLS connection, the backend components (mostly Hub/Router to process the request and return to the client) will receive the incoming in plain HTTP. In a few confirmations (also referred to ChatGPT)

When TLS termination is performed by an ingress controller, HTTP/2 is typically enabled by default. This is because many ingress controllers are designed to support modern web protocols to ensure better performance and efficiency. For example, popular ingress controllers like NGINX and HAProxy enable HTTP/2 by default when handling HTTPS traffic.

At that time, the Selenium Grid server returns the response in HTTP/1.1. However, this mismatch is not expected to cause any problems. Selenium Grid is using JDKHttpClient to communicate between components since the following OpenJDK docs mentioned that

The Java HTTP Client supports both HTTP/1.1 and HTTP/2. By default, the client will send requests using HTTP/2. Requests sent to servers that do not yet support HTTP/2 will automatically be downgraded to HTTP/1.1

A few reports mention the error java.io.IOException: HTTP/1.1 header parser received no bytes, java.io.IOException: /: GOAWAY received, or a timed-out issue with a stack trace containing jdk.internal.net.http.Http2Connection, or Http2ClientImpl when creating a RemoteWebDriver session.

What could be the issue around this? It could be due to different JDK versions used. Since JDK20, the default keepalive timeout has been adjusted; see docs on jdk.httpclient.keepalive.timeout (default to 30). Or it could be jdk.httpclient.maxstreams (default to 100) if Grid serves many client requests at the same time, it could reach the maximum stream limit.

In some scenarios, the issue might be resolved by setting ClientConfig with HTTP/1.1 when creating RemoteWebDriver. For example, in Java binding you can try this:

ClientConfig config = ClientConfig.defaultConfig().baseUrl(seleniumGridUrl)
                      .readTimeout(300)
                      .version(HttpClient.Version.HTTP_1_1.name());

driver = RemoteWebDriver.builder().oneOf(new ChromeOptions())
         .config(config).build();

With the workaround set http version via ClientConfig also there was a point mentioned that we can understand something like HTTP/1.1 header parser received no bytes, or GOAWAY is an IOException thrown by client HTTP/2, and when switching client to HTTP/1.1, it could go to a situation that would continue to get "random" IOExceptions with a different message from the server.

For example, in this case the issue could be due to HTTP/2 configs on Ingress controller. Refer to usage of Annotations ConfigMap settings in NGINX Ingress Controller.

  • use-http2 (default is true) - enable or disable HTTP/2 support in secure connection.
  • upstream-keepalive-timeout (default to 60) - timeout during which an idle keepalive connection to an upstream server will stay open.
  • upstream-keepalive-connections (default to 320) - maximum number of idle keepalive connections to upstream servers. When this number is exceeded, the least recently used connections are closed

The above notes are motivated by SeleniumHQ/selenium#14258. Kindly let us know if you have further troubleshooting on this.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

PR Type

Enhancement, Documentation


Description

  • Added new settings to control HTTP/2 and upstream keepalive in ingress configuration.
  • Updated Helm templates to include new annotations for HTTP/2 and upstream keepalive.
  • Enhanced test scripts to conditionally disable HTTP/2.
  • Updated documentation to explain new settings and provide troubleshooting tips for TLS termination and HTTP/2 issues.
  • Modified default Java options to include HTTP client settings for better compatibility.

Changes walkthrough 📝

Relevant files
Enhancement
chart_test.sh
Add condition to disable HTTP/2 in ingress tests                 

tests/charts/make/chart_test.sh

  • Added condition to disable HTTP/2 in ingress.
  • Updated Helm command to include useHttp2 setting.
  • +6/-0     
    _helpers.tpl
    Add annotations for HTTP/2 and upstream keepalive settings

    charts/selenium-grid/templates/_helpers.tpl

  • Added annotations for upstream keepalive settings.
  • Added annotation for enabling/disabling HTTP/2.
  • Updated proxy timeout annotations.
  • +15/-2   
    Makefile
    Update test target to disable HTTP/2 in secure ingress     

    Makefile

    • Updated test target to disable HTTP/2 in secure ingress.
    +1/-1     
    values.yaml
    Add HTTP/2 and upstream keepalive settings to values.yaml

    charts/selenium-grid/values.yaml

  • Added useHttp2 setting for ingress.
  • Added upstreamKeepalive settings for ingress.
  • Updated SE_JAVA_OPTS with HTTP client settings.
  • +8/-1     
    Documentation
    README.md
    Document HTTP/2 and upstream keepalive settings                   

    charts/selenium-grid/README.md

  • Added documentation for HTTP/2 and upstream keepalive settings.
  • Added troubleshooting section for TLS termination and HTTP/2.
  • +47/-2   

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    Copy link

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Script Logic
    The script conditionally adds the --set ingress.nginx.useHttp2=false based on the INGRESS_DISABLE_USE_HTTP2 environment variable. Ensure that this logic aligns with the intended behavior and that there are no side effects or conflicts with other settings.

    Annotation Consistency
    The new annotations for ingress configurations (nginx.ingress.kubernetes.io/proxy-stream-timeout, nginx.ingress.kubernetes.io/upstream-keepalive-timeout, nginx.ingress.kubernetes.io/ssl-session-timeout) are added. It's crucial to ensure these annotations are supported by the ingress controller version used and are consistent with the rest of the ingress configuration.

    Configuration Defaults
    New ingress settings (useHttp2, upstreamKeepalive) are introduced. Verify that the default values provided (useHttp2: true, upstreamKeepalive: {connections: 10000, time: 1h, requests: 10000}) are sensible defaults and consider the impact of these high values on system resources and performance.

    Copy link

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Possible bug
    Close the HELM_COMMAND_SET_IMAGES variable properly to prevent syntax errors

    Ensure that the HELM_COMMAND_SET_IMAGES variable is properly closed with a quote at
    the end of the conditional block to maintain consistency and prevent syntax errors.

    tests/charts/make/chart_test.sh [209-210]

     HELM_COMMAND_SET_IMAGES="${HELM_COMMAND_SET_IMAGES} \
    ---set ingress.nginx.useHttp2=false \
    -"
    +--set ingress.nginx.useHttp2=false"
     
    • Apply this suggestion
    Suggestion importance[1-10]: 10

    Why: Ensuring that the HELM_COMMAND_SET_IMAGES variable is properly closed prevents syntax errors, which is crucial for the script to run correctly.

    10
    Enhancement
    Add default values for upstreamKeepalive settings to enhance configuration robustness

    Consider adding a default value for upstreamKeepalive settings to ensure that there
    are fallback values if not explicitly set, enhancing the robustness of the
    configuration.

    charts/selenium-grid/values.yaml [130-132]

     upstreamKeepalive:
    -  connections: 10000
    -  time: 1h
    -  requests: 10000
    +  connections: {{ .Values.upstreamKeepalive.connections | default 1000 }}
    +  time: {{ .Values.upstreamKeepalive.time | default "30m" }}
    +  requests: {{ .Values.upstreamKeepalive.requests | default 1000 }}
     
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: Adding default values for upstreamKeepalive settings ensures that there are fallback values, enhancing the robustness and reliability of the configuration.

    9
    Best practice
    Add a conditional check to ensure the useHttp2 value is defined

    Consider adding a conditional check to ensure that the useHttp2 value is defined
    before using it in the template. This will prevent rendering issues if the value is
    not set in the values file.

    charts/selenium-grid/templates/_helpers.tpl [133]

    +{{- if .useHttp2 }}
     nginx.ingress.kubernetes.io/use-http2: {{ .useHttp2 | quote }}
    +{{- end }}
     
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Adding a conditional check for useHttp2 ensures that the template does not fail if the value is not set, which is a good practice for robustness.

    8
    Validate upstreamKeepalive settings to prevent resource exhaustion

    It's recommended to validate the upstreamKeepalive settings to ensure they are not
    set to excessively high values, which could lead to resource exhaustion.

    charts/selenium-grid/templates/_helpers.tpl [144-147]

    -nginx.ingress.kubernetes.io/upstream-keepalive-connections: {{ . | quote }}
    -nginx.ingress.kubernetes.io/upstream-keepalive-time: {{ . | quote }}
    -nginx.ingress.kubernetes.io/upstream-keepalive-request: {{ . | quote }}
    +nginx.ingress.kubernetes.io/upstream-keepalive-connections: {{ min 1000 . | quote }}
    +nginx.ingress.kubernetes.io/upstream-keepalive-time: {{ min "1h" . | quote }}
    +nginx.ingress.kubernetes.io/upstream-keepalive-request: {{ min 1000 . | quote }}
     
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Validating upstreamKeepalive settings to prevent excessively high values can help avoid potential resource exhaustion issues, improving system stability.

    7

    @VietND96 VietND96 merged commit 395a401 into SeleniumHQ:trunk Jul 27, 2024
    15 checks passed
    @VietND96 VietND96 added this to the 4.23.0 milestone Jul 27, 2024
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    1 participant