Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Timeout For TLS Passthrough #2504

Conversation

jay-rob
Copy link
Contributor

@jay-rob jay-rob commented May 13, 2018

What this PR does / why we need it:

Requests to the HTTPS port will hang indefintely under the following conditions:

  1. --tls-passthrough is enabled
  2. proxy protocol is enabled in the configmap
  3. The incoming request does not include the PROXY header and just sends a request to open a TCP connection. This can occur if you are using, say an AWS NLB to bring traffic to the controller and do not have proxy protocol enabled because the current proxy-protocol library used by this ingress controller does not support v2. https://github.com/kubernetes/ingress-nginx/blob/master/vendor/github.com/armon/go-proxyproto/README.md

Since enabling proxy protocol support is a global setting and cannot be changed for HTTP and HTTPs independently, this causes users of the NLB to need to disable proxy protocol for HTTPS to avoid errors on header processing if they have tls-passthrough enabled.

Which issue this PR fixes:

The go-proxyproto library specifically recommends settings a deadline if you call before call RemoteAddr() before calling Read() or it could cause the thread to sit in i/o wait if the proxy headers never come through. This can happen just due to network disruptions when a client is sending a request without the proxy headers and the rest of the request never makes it through.

This PR adds a configurable timeout/deadline value, defaulting 5 seconds to prevent this from occurring. I should also note that the /healthz does not currently check for the liveness of the proxy pass process, so when this bug does occur, it will not self heal just with the kube liveness probe.

Special notes for your reviewer:

The bug is reproduced by running the ingress controller in master with the following settings

cli:
--enable-tls-passthrough

configMap:
use-proxy-protocol: true

kubectl port-forward directly to a/the pod on port 443. Curls against the pod should return the default backend. To produce the bug. Telnet to the pod on 443 and do not put additional input. If you curl on a seperate terminal, might take a few attempts, you will see that the curl request will hang until clientside timeout. This is resolved by closing the telnet process or restarting the pod.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 13, 2018
@codecov-io
Copy link

codecov-io commented May 13, 2018

Codecov Report

Merging #2504 into master will increase coverage by 0.09%.
The diff coverage is 81.81%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2504      +/-   ##
==========================================
+ Coverage   40.58%   40.68%   +0.09%     
==========================================
  Files          75       75              
  Lines        5098     5108      +10     
==========================================
+ Hits         2069     2078       +9     
- Misses       2750     2751       +1     
  Partials      279      279
Impacted Files Coverage Δ
internal/ingress/controller/nginx.go 11.24% <0%> (-0.03%) ⬇️
internal/ingress/controller/config/config.go 98.27% <100%> (+0.03%) ⬆️
internal/ingress/controller/template/configmap.go 77.98% <100%> (+1.51%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update af5c379...d637a9b. Read the comment docs.

@jay-rob
Copy link
Contributor Author

jay-rob commented May 14, 2018

/assign @aledbf

Copy link
Contributor

@antoineco antoineco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One extra suggestion: the first line of the commit message should be shorter and self-explanatory. The verbose description can stay on the following lines of course.

@@ -346,6 +346,11 @@ type Configuration struct {
// https://www.nginx.com/resources/admin-guide/proxy-protocol/
UseProxyProtocol bool `json:"use-proxy-protocol,omitempty"`

// When use-proxy-protocol is enabled, sets the maximum time the connection handler will wait
// to receieve proxy headers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"receive" (typo)

@@ -722,7 +723,10 @@ func (n *NGINXController) setupSSLProxy() {
glog.Fatalf("%v", err)
}

proxyList := &proxyproto.Listener{Listener: listener}
//Set a deadline to prevent the connection from hanging/blocking if a non proxy-proto connection is received.
proxyDeadlineDuration, _ := time.ParseDuration(cfg.ProxyProtocolHeaderTimeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing the user input may fail, so the error should be handled.

@@ -722,7 +723,10 @@ func (n *NGINXController) setupSSLProxy() {
glog.Fatalf("%v", err)
}

proxyList := &proxyproto.Listener{Listener: listener}
//Set a deadline to prevent the connection from hanging/blocking if a non proxy-proto connection is received.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space after the comment mark.

@jay-rob
Copy link
Contributor Author

jay-rob commented May 14, 2018

Lets say warn and use the default value instead?

@antoineco
Copy link
Contributor

That's also fine yes. Can you also add a test for this?

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 20, 2018
@jay-rob
Copy link
Contributor Author

jay-rob commented May 20, 2018

I went ahead and moved configmap entry validation logic to the configmap template processors to stay consistent with where I see similar validations taking place. Also added a unit test to validate the input.

I think it would be great to have an e2e test, but that really means we need a e2e for TLS passthrough in general. I can do that, but I think it goes beyond the scope of this PR and it could be added in a separate PR.

@jay-rob
Copy link
Contributor Author

jay-rob commented May 23, 2018

Anything else needed on my part for this to move forward?

@antoineco
Copy link
Contributor

Sorry for the delay, I will look into these changes today. Thanks for addressing my comments!

Copy link
Contributor

@antoineco antoineco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor comments on my side. I think using a time.Duration type instead of a string would make a lot of sense for this particular scenario.

@@ -125,6 +127,17 @@ func ReadConfig(src map[string]string) config.Configuration {
}
}

//Verify that the configured timeout is parsable as a duration. if not, set the default value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space after //

@@ -722,7 +723,15 @@ func (n *NGINXController) setupSSLProxy() {
glog.Fatalf("%v", err)
}

proxyList := &proxyproto.Listener{Listener: listener}
// Set a deadline to prevent the connection from hanging/blocking if a non proxy-proto connection is received.
proxyDeadlineDuration, err := time.ParseDuration(cfg.ProxyProtocolHeaderTimeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making cfg.ProxyProtocolHeaderTimeout an actual time.Duration and benefit from the type safety? This way you would only have to care about errors when a user input is submitted.

@@ -31,6 +31,24 @@ func TestFilterErrors(t *testing.T) {
}
}

func TestProxytTimeoutParsing(t *testing.T) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leading newline.

if cfg.ProxyProtocolHeaderTimeout != "5s" {
t.Errorf("expected 5s but got %v", cfg.ProxyProtocolHeaderTimeout)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing newline.

@@ -31,6 +31,24 @@ func TestFilterErrors(t *testing.T) {
}
}

func TestProxytTimeoutParsing(t *testing.T) {

cfg := ReadConfig(map[string]string{"proxy-protocol-header-timeout": "35s"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use a map here in order to remain as consistent as possible with other tests. E.g.

	testCases := map[string]struct {
		input  string
		expect string // or time.Duration, see previous comment
	}{
		"valid duration":   {"35s", "35s" },  // or 35*time.Second
		"invalid duration": {"3zxs", "5s" },  // or 5*time.Second
	}

	for n, tc := range testCases {
		// ...
	}

@jay-rob
Copy link
Contributor Author

jay-rob commented May 30, 2018

Thanks for the feedback and guidance @antoineco

I agree that using a Duration is a better approach and have adjusted the config type accordingly. I also think that change to unit testing is much cleaner. Both changes have been committed along with the styling fixes.

Copy link
Contributor

@antoineco antoineco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jrthrawny, it looks good to me now! Just one suggestion to address regarding the use of time.Duration and I guess we're good to go.

@@ -528,6 +534,10 @@ func NewDefault() Configuration {
defIPCIDR = append(defIPCIDR, "0.0.0.0/0")
defNginxStatusIpv4Whitelist = append(defNginxStatusIpv4Whitelist, "127.0.0.1")
defNginxStatusIpv6Whitelist = append(defNginxStatusIpv6Whitelist, "::1")
defProxyDeadlineDuration, err := time.ParseDuration("5s")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could assign the variable to a time.Duration directly and remove any possibility of failing at run time due to an incorrect input. eg. time.Duration(5000000000) or (more readable) time.Duration(5) * time.Second.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This should have been my initial approach

func TestProxytTimeoutParsing(t *testing.T) {
testCases := map[string]struct {
input string
expect float64 // duration in seconds
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion, use time.Duration instead of assuming we use seconds and set the expect field to something like time.Duration(5) * time.Second in your test cases. This way a test case like { "100ms", time.Duration(100) * time.Millisecond } is both valid and readable.

@aledbf
Copy link
Member

aledbf commented May 30, 2018

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 30, 2018
@aledbf
Copy link
Member

aledbf commented May 30, 2018

@antoineco if you add lgtm we are ready to go

@aledbf
Copy link
Member

aledbf commented May 30, 2018

@jrthrawny thank you for the PR and apologies for the delay in merging this

@antoineco
Copy link
Contributor

antoineco commented May 30, 2018

Let's squash these 5 commits and use a descriptive commit title like "Configurable Proxy Protocol header timeout for TLS passthrough". The PR title is truncated.

The documentation is also missing in docs/user-guide/nginx-configuration/configmap.md.

@jay-rob jay-rob changed the title * Added a timeout to the TLS passthrough connection handler that prev… Add Timeout For TLS Passthrough Jun 2, 2018
@jay-rob jay-rob force-pushed the proxy-protocol-timeout-for-passthrough-pr branch from 1f10c6c to d637a9b Compare June 4, 2018 01:11
@antoineco
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 4, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aledbf, antoineco, jrthrawny

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit fa98236 into kubernetes:master Jun 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants