Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Status port using a socket #3684

Merged
merged 2 commits into from
Feb 6, 2019
Merged

Conversation

aledbf
Copy link
Member

@aledbf aledbf commented Jan 21, 2019

What this PR does / why we need it:

Which issue this PR fixes: fixes #3457

The new status server is configured like this:

	# default server, used for NGINX healthcheck and access to nginx stats
	server {
		listen unix:/tmp/nginx-status-server.sock;
		set $proxy_upstream_name "internal";
		
		keepalive_timeout 0;
		gzip off;
		
		access_log off;
		
		location /healthz {
			return 200;
		}
		
		location /is-dynamic-lb-initialized {
			content_by_lua_block {
				local configuration = require("configuration")
				local backend_data = configuration.get_backends_data()
				if not backend_data then
				ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
				return
				end
				
				ngx.say("OK")
				ngx.exit(ngx.HTTP_OK)
			}
		}
		
		location /nginx_status {
			stub_status on;
		}
		
		location /configuration {
			# this should be equals to configuration_data dict
			client_max_body_size                    10m;
			client_body_buffer_size                 10m;
			proxy_buffering                         off;
			
			content_by_lua_block {
				configuration.call()
			}
		}
		
		location / {
			content_by_lua_block {
				ngx.exit(ngx.HTTP_NOT_FOUND)
			}
		}
	}
}

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 21, 2019
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 21, 2019
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 22, 2019
@aledbf aledbf force-pushed the health branch 2 times, most recently from 107afcc to d637a05 Compare January 25, 2019 14:29
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 25, 2019
@aledbf aledbf changed the title WIP: Add log for health checks errors WIP: Replace Status port using a socket Jan 25, 2019
@aledbf aledbf force-pushed the health branch 3 times, most recently from ff6027f to a95213d Compare January 28, 2019 19:50
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 28, 2019
@ElvinEfendi
Copy link
Member

make sense 👍

But I don't see how it's going to fix #3457. That issue seems to be result of Nginx not having enough resources - so this PR will help a bit since it reduces overhead but when the server is under more load the same thing will happen.

@aledbf
Copy link
Member Author

aledbf commented Jan 30, 2019

But I don't see how it's going to fix #3457.

Please check comments after #3457 (comment)

For some reason, the health checks start failing without valid reason. If you enter in the container and run netstat -lnp nginx is listening in all the interfaces but the go client cannot reach 127.0.0.1:18080.
Replacing the port for a unix socket removes this issue, removes a port from the ingress controller and also encapsulates the server.

@ElvinEfendi
Copy link
Member

@aledbf I support switching to using UNIX socket. And this might indeed fix the issue but it also might not. It'd be nice to understand

nginx is listening in all the interfaces but the go client cannot reach 127.0.0.1:18080

--

Again, this PR is a right direction!

@aledbf
Copy link
Member Author

aledbf commented Jan 30, 2019

It'd be nice to understand
nginx is listening in all the interfaces but the go client cannot reach 127.0.0.1:18080

Agree. The issue here is that I cannot reproduce this issue and why I am waiting for feedback from #3457.

@aledbf aledbf force-pushed the health branch 3 times, most recently from 9fe173e to 897439d Compare February 6, 2019 16:27
@aledbf aledbf changed the title WIP: Replace Status port using a socket Replace Status port using a socket Feb 6, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 6, 2019
@@ -161,6 +160,7 @@ Feature backed by OpenResty Lua libraries. Requires that OCSP stapling is not en
)

flags.MarkDeprecated("sort-backends", "Feature removed because of the lua load balancer that removed the need of reloads for change in endpoints")
flags.MarkDeprecated("status-port", `Status port is a unix socket.`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also applies to sort-backends, but why are we marking these flags deprecated when we are removing them? Aren't they either deprecated or removed, but not both?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid breaking existing deployments using the flags. We can remove the flags in 0.24

Copy link
Contributor

@alexkursell alexkursell Feb 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When 0.22 was released, using --sort-backends generated an error (unknown flag: --sort-backends) because the flag was removed. MarkDeprecated doesn't do anything if the flag is removed. See also @ElvinEfendi comment here: #3655 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

ControllerPodsCount: pcfg.ControllerPodsCount,
})
if err != nil {
return err
}

if statusCode != http.StatusCreated {
return fmt.Errorf("unexpected error code: %d", statusCode)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why did we not do this before. Also where are you not doing the same above for backends?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check added to backends

@aledbf aledbf force-pushed the health branch 2 times, most recently from 18bb412 to 15f56c6 Compare February 6, 2019 17:56

// NewGetStatusRequest creates a new GET request to the internal NGINX status server
func NewGetStatusRequest(path string) (int, []byte, error) {
client := buildUnixSocketClient()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you build the client for every request? why not build it once and use the same client always?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

// StreamSocket defines the location of the unix socket used by NGINX for the NGINX stream configuration socket
var StreamSocket = "/tmp/ingress-stream.sock"

var nginxUnixLocation = "nginx-status"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: statusLocation probably makes more sense since this is an alias to StatusSocket

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

listen {{ $all.ListenPorts.Status }} default_server {{ if $all.Cfg.ReusePort }}reuseport{{ end }} backlog={{ $all.BacklogSize }};
{{ if $IsIPV6Enabled }}listen [::]:{{ $all.ListenPorts.Status }} default_server {{ if $all.Cfg.ReusePort }}reuseport{{ end }} backlog={{ $all.BacklogSize }};{{ end }}
set $proxy_upstream_name "-";
listen unix:/tmp/nginx-status-server.sock;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

{{ template "CUSTOM_ERRORS" (buildCustomErrorDeps $all.ProxySetHeaders $cfg.CustomHTTPErrors $all.EnableMetrics) }}
Copy link
Member

@ElvinEfendi ElvinEfendi Feb 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you deleting this now? Why did we even have this here, does not make sense

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you deleting this now?

Because before this change we publish the server to port 18080.

Why did we even have this here, does not make sense

Keep in mind we had more content before 0.16 (like vts pages)

@ElvinEfendi
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 6, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 6, 2019
@ElvinEfendi
Copy link
Member

Please don't modify history during review process, makes it hard to review new changes :)

@ElvinEfendi
Copy link
Member

/lgtm

I guess the change was to avoid PID hardcoding.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 6, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aledbf, ElvinEfendi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 17e788b into kubernetes:master Feb 6, 2019
@aledbf
Copy link
Member Author

aledbf commented Feb 6, 2019

Please don't modify history during review process, makes it hard to review new changes :)

Sorry about that but I didn't want a new commit just to fix a template in the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regular crash of ngnix ingress controller pods after upgrade to 0.20.0 image
4 participants