
Extended.[k8s.io] Services should be able to create a functioning NodePort service #13108

Closed · smarterclayton opened this issue on Feb 24, 2017 · 8 comments · Fixed by #18355
Labels: component/networking, kind/test-flake, priority/P1
@smarterclayton (Contributor) commented Feb 24, 2017:

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_gce/965/testReport/junit/(root)/Extended/_k8s_io__Services_should_be_able_to_create_a_functioning_NodePort_service/

I've never seen this flake before (in roughly 600 runs); it looks like the port never became available within 5 minutes. This is the OpenShift default SDN setup.

/tmp/openshift/tito/rpmbuild-originz5K5EQ/BUILD/origin-1.5.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/service.go:485
Feb 24 06:46:56.965: expected node port 30633 to be in use, stdout: . err: error running &{/data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/kubectl [kubectl --server=https://internal-api.prtest965.origin-ci-int-gce.dev.rhcloud.com:8443 --kubeconfig=/tmp/cluster-admin.kubeconfig exec --namespace=e2e-tests-services-wd35q hostexec -- /bin/sh -c for i in $(seq 1 300); do if ss -ant46 'sport = :30633' | grep ^LISTEN; then exit 0; fi; sleep 1; done; exit 1] []  <nil>   [] <nil> 0xc421bed7d0 exit status 1 <nil> <nil> true [0xc420a1a628 0xc420a1a640 0xc420a1a658] [0xc420a1a628 0xc420a1a640 0xc420a1a658] [0xc420a1a638 0xc420a1a650] [0x986eb0 0x986eb0] 0xc421bdaea0 <nil>}:
Command stdout:

stderr:

error:
exit status 1

/tmp/openshift/tito/rpmbuild-originz5K5EQ/BUILD/origin-1.5.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/service.go:483

Very weird
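
For context, the step that times out is only waiting for something on the node to show up in LISTEN state on the allocated NodePort; the test does this by running the ss loop shown above inside the hostexec pod for up to 300 seconds. A minimal Go sketch of the same wait-for-listener idea (this is not the e2e test code; it polls with a plain TCP dial from outside the node rather than ss on the node, and the address and port are just the values from the log above):

```go
// Sketch only, not the e2e test: poll until the NodePort accepts
// connections, giving up after the same five-minute budget the test uses.
package main

import (
	"fmt"
	"net"
	"time"
)

// waitForNodePort dials addr once per second until a connection succeeds
// or the timeout expires.
func waitForNodePort(addr string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
		if err == nil {
			conn.Close()
			return nil // something is listening and accepting connections
		}
		time.Sleep(time.Second)
	}
	return fmt.Errorf("nothing listening on %s within %s", addr, timeout)
}

func main() {
	// Placeholder address: the node IP and NodePort from the failure above.
	if err := waitForNodePort("104.154.147.57:30633", 5*time.Minute); err != nil {
		fmt.Println(err)
	}
}
```

In this flake the equivalent check never saw the port enter LISTEN on the node, even though kube-proxy normally holds the node port open to reserve it once the service exists.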

STEP: Building a namespace api object
Feb 24 06:41:33.468: INFO: About to run a Kube e2e test, ensuring namespace is privileged
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [k8s.io] Services
  /tmp/openshift/tito/rpmbuild-originz5K5EQ/BUILD/origin-1.5.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/service.go:90
[It] should be able to create a functioning NodePort service
  /tmp/openshift/tito/rpmbuild-originz5K5EQ/BUILD/origin-1.5.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/service.go:485
STEP: creating service nodeport-test with type=NodePort in namespace e2e-tests-services-wd35q
STEP: creating pod to be part of service nodeport-test
Feb 24 06:41:34.470: INFO: Waiting up to 2m0s for 1 pods to be created
Feb 24 06:41:34.522: INFO: Found 0/1 pods - will retry
Feb 24 06:41:36.562: INFO: Found all 1 pods
Feb 24 06:41:36.562: INFO: Waiting up to 2m0s for 1 pods to be running and ready: [nodeport-test-dd39k]
Feb 24 06:41:36.562: INFO: Waiting up to 2m0s for pod nodeport-test-dd39k status to be running and ready
Feb 24 06:41:36.595: INFO: Waiting for pod nodeport-test-dd39k in namespace 'e2e-tests-services-wd35q' status to be 'running and ready'(found phase: "Pending", readiness: false) (32.700389ms elapsed)
Feb 24 06:41:38.682: INFO: Waiting for pod nodeport-test-dd39k in namespace 'e2e-tests-services-wd35q' status to be 'running and ready'(found phase: "Pending", readiness: false) (2.11980505s elapsed)
Feb 24 06:41:40.716: INFO: Wanted all 1 pods to be running and ready. Result: true. Pods: [nodeport-test-dd39k]
STEP: hitting the pod through the service's NodePort
Feb 24 06:41:40.716: INFO: Testing HTTP reachability of http://104.154.147.57:30633/echo?msg=hello
Feb 24 06:41:45.716: INFO: Got error testing for reachability of http://104.154.147.57:30633/echo?msg=hello: Get http://104.154.147.57:30633/echo?msg=hello: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Feb 24 06:41:47.717: INFO: Testing HTTP reachability of http://104.154.147.57:30633/echo?msg=hello
STEP: verifying the node port is locked
Feb 24 06:41:51.599: INFO: Running '/data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/kubectl --server=https://internal-api.prtest965.origin-ci-int-gce.dev.rhcloud.com:8443 --kubeconfig=/tmp/cluster-admin.kubeconfig exec --namespace=e2e-tests-services-wd35q hostexec -- /bin/sh -c for i in $(seq 1 300); do if ss -ant46 'sport = :30633' | grep ^LISTEN; then exit 0; fi; sleep 1; done; exit 1'
Feb 24 06:46:56.965: INFO: rc: 127
Feb 24 06:46:56.965: INFO: expected node port 30633 to be in use, stdout: . err: error running &{/data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/kubectl [kubectl --server=https://internal-api.prtest965.origin-ci-int-gce.dev.rhcloud.com:8443 --kubeconfig=/tmp/cluster-admin.kubeconfig exec --namespace=e2e-tests-services-wd35q hostexec -- /bin/sh -c for i in $(seq 1 300); do if ss -ant46 'sport = :30633' | grep ^LISTEN; then exit 0; fi; sleep 1; done; exit 1] []  <nil>   [] <nil> 0xc421bed7d0 exit status 1 <nil> <nil> true [0xc420a1a628 0xc420a1a640 0xc420a1a658] [0xc420a1a628 0xc420a1a640 0xc420a1a658] [0xc420a1a638 0xc420a1a650] [0x986eb0 0x986eb0] 0xc421bdaea0 <nil>}:
Command stdout:

stderr:

error:
exit status 1

[AfterEach] [k8s.io] Services
  /tmp/openshift/tito/rpmbuild-originz5K5EQ/BUILD/origin-1.5.0/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:142
STEP: Collecting events from namespace "e2e-tests-services-wd35q".
STEP: Found 11 events.
Feb 24 06:46:56.997: INFO: At 2017-02-24 06:41:34 -0500 EST - event for nodeport-test: {replication-controller } SuccessfulCreate: Created pod: nodeport-test-dd39k
Feb 24 06:46:56.997: INFO: At 2017-02-24 06:41:34 -0500 EST - event for nodeport-test-dd39k: {default-scheduler } Scheduled: Successfully assigned nodeport-test-dd39k to ci-prtest965-ig-n-fsv9
Feb 24 06:46:56.997: INFO: At 2017-02-24 06:41:36 -0500 EST - event for nodeport-test-dd39k: {kubelet ci-prtest965-ig-n-fsv9} Created: Created container with docker id 25b24e578d03; Security:[seccomp=unconfined]
Feb 24 06:46:56.997: INFO: At 2017-02-24 06:41:36 -0500 EST - event for nodeport-test-dd39k: {kubelet ci-prtest965-ig-n-fsv9} Started: Started container with docker id 25b24e578d03
Feb 24 06:46:56.997: INFO: At 2017-02-24 06:41:36 -0500 EST - event for nodeport-test-dd39k: {kubelet ci-prtest965-ig-n-fsv9} Pulled: Container image "gcr.io/google_containers/netexec:1.7" already present on machine
Feb 24 06:46:56.997: INFO: At 2017-02-24 06:41:47 -0500 EST - event for hostexec: {default-scheduler } Scheduled: Successfully assigned hostexec to ci-prtest965-ig-n-fsv9
Feb 24 06:46:56.997: INFO: At 2017-02-24 06:41:49 -0500 EST - event for hostexec: {kubelet ci-prtest965-ig-n-fsv9} Pulled: Container image "gcr.io/google_containers/hostexec:1.2" already present on machine
Feb 24 06:46:56.997: INFO: At 2017-02-24 06:41:50 -0500 EST - event for hostexec: {kubelet ci-prtest965-ig-n-fsv9} Created: Created container with docker id 86e8055fe394; Security:[seccomp=unconfined]
Feb 24 06:46:56.997: INFO: At 2017-02-24 06:41:50 -0500 EST - event for hostexec: {kubelet ci-prtest965-ig-n-fsv9} Started: Started container with docker id 86e8055fe394
Feb 24 06:46:56.997: INFO: At 2017-02-24 06:41:59 -0500 EST - event for nodeport-test-dd39k: {kubelet ci-prtest965-ig-n-fsv9} Unhealthy: Readiness probe failed: Get http://172.16.4.40:80/hostName: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Feb 24 06:46:56.997: INFO: At 2017-02-24 06:45:10 -0500 EST - event for nodeport-test-dd39k: {kubelet ci-prtest965-ig-n-fsv9} Unhealthy: Readiness probe failed: Get http://172.16.4.40:80/hostName: dial tcp 172.16.4.40:80: getsockopt: no route to host
Feb 24 06:46:57.090: INFO: POD      
smarterclayton added the component/networking and kind/test-flake labels on Feb 24, 2017
@smarterclayton (Contributor, Author) commented:

This is not #11016 because 2m < 5m.

@smarterclayton (Contributor, Author) commented:

P1 until triaged

smarterclayton added this to the 1.5.0 milestone on Feb 24, 2017
smarterclayton modified the milestones: 1.5.0 → 1.6.0 on Mar 12, 2017
@bparees (Contributor) commented Mar 13, 2017:

bparees changed the title from "flake: NodePort test failed, unable to listen" to "Extended.[k8s.io] Services should be able to create a functioning NodePort service" on Mar 13, 2017
@smarterclayton (Contributor, Author) commented Mar 13, 2017 via email:

@bparees (Contributor) commented Mar 13, 2017:

Link updated, sorry.

@bparees (Contributor) commented Jun 14, 2017:

@bparees (Contributor) commented Jul 23, 2017:

@danwinship (Contributor) commented:

Hm... found an instance of this flake in a recent test_branch_origin_extended_conformance_gce:

STEP: verifying the node port is locked
Oct 25 16:44:20.632: INFO: Running '/data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/kubectl --server=https://internal-api.prtest-3c63a87-1798.origin-ci-int-gce.dev.rhcloud.com:8443 --kubeconfig=/tmp/cluster-admin.kubeconfig exec --namespace=e2e-tests-services-l7272 hostexec -- /bin/sh -c for i in $(seq 1 300); do if ss -ant46 'sport = :31798' | grep ^LISTEN; then exit 0; fi; sleep 1; done; ss -ant46; ip a; exit 1'
Oct 25 16:44:21.057: INFO: rc: 127
Oct 25 16:44:21.057: INFO: expected node port 31798 to be in use, stdout: . err: error running &{/data/src/github.com/openshift/origin/_output/local/bin/linux/amd64/kubectl [kubectl --server=https://internal-api.prtest-3c63a87-1798.origin-ci-int-gce.dev.rhcloud.com:8443 --kubeconfig=/tmp/cluster-admin.kubeconfig exec --namespace=e2e-tests-services-l7272 hostexec -- /bin/sh -c for i in $(seq 1 300); do if ss -ant46 'sport = :31798' | grep ^LISTEN; then exit 0; fi; sleep 1; done; ss -ant46; ip a; exit 1] []  <nil>  Error from server: error dialing backend: dial tcp 10.142.0.4:10250: getsockopt: connection refused [] <nil> 0xc420b951d0 exit status 1 <nil> <nil> true [0xc42000f938 0xc42000f950 0xc42000f968] [0xc42000f938 0xc42000f950 0xc42000f968] [0xc42000f948 0xc42000f960] [0x17c27d0 0x17c27d0] 0xc42134e240 <nil>}:

The important bit there is "dial tcp 10.142.0.4:10250: getsockopt: connection refused", which means the kubectl exec itself failed (it couldn't reach the kubelet on port 10250), not that the "verifying the node port is locked" check failed. I'm not sure whether that's what was failing before (and the logs are just clearer about it now) or whether this is a different problem.

openshift-merge-robot added a commit that referenced this issue Feb 2, 2018
Automatic merge from submit-queue (batch tested with PRs 18376, 18355).

Move pod-namespace calls out of process

As discussed in #15991, we need to move all operations in the pod's network namespace out of process, due to a golang issue that allows setns() calls in a locked thread to leak into other threads, causing random lossage as operations intended for the main network namespace end up running in other namespaces instead. (This is fixed in golang 1.10 but we need a fix before then.)

Fixes #15991
Fixes #14385
Fixes #13108
Fixes #18317
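
For readers not following #15991: these namespace operations pin a goroutine to an OS thread with runtime.LockOSThread() and then call setns(2), and before Go 1.10 a thread whose network namespace had been switched could be handed back to the scheduler and reused for unrelated goroutines. A minimal sketch of the in-process pattern that the commit moves out of process (this is not the actual origin code; the namespace path and helper name are hypothetical):

```go
// Sketch only (not the actual origin code): the in-process pattern for
// running a function inside a pod's network namespace. It must pin the
// goroutine to its OS thread around setns(2); before Go 1.10 that thread
// could still be reused by other goroutines afterwards, which is the
// leak described above and the reason the real code moved out of process.
package main

import (
	"fmt"
	"os"
	"runtime"

	"golang.org/x/sys/unix"
)

// inPodNetNS switches the calling OS thread into the network namespace at
// nsPath (e.g. /proc/<pid>/ns/net), runs fn, then switches back.
func inPodNetNS(nsPath string, fn func() error) error {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	// Remember the current (host) namespace so we can return to it.
	hostNS, err := os.Open("/proc/self/ns/net")
	if err != nil {
		return err
	}
	defer hostNS.Close()

	podNS, err := os.Open(nsPath)
	if err != nil {
		return err
	}
	defer podNS.Close()

	if err := unix.Setns(int(podNS.Fd()), unix.CLONE_NEWNET); err != nil {
		return fmt.Errorf("setns into %s: %v", nsPath, err)
	}
	// Best-effort switch back; if this is skipped or fails, the thread is
	// left in the pod namespace, which is exactly the leak being avoided.
	defer unix.Setns(int(hostNS.Fd()), unix.CLONE_NEWNET)

	return fn()
}

func main() {
	// Hypothetical usage with a placeholder PID.
	err := inPodNetNS("/proc/12345/ns/net", func() error {
		fmt.Println("running inside the pod network namespace")
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Running the same steps in a short-lived child process instead, as this commit does, means a leaked namespace switch can never affect the main process's threads.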