
Investigate using kind for e2e tests on Prow #103

Closed
adrcunha opened this issue Sep 5, 2018 · 22 comments

@adrcunha
Contributor

adrcunha commented Sep 5, 2018

kind (Kubernetes in Docker) might make e2e tests faster, since there's no need to create an external cluster. However, we need to take into consideration that we have to start a k8s cluster with a known, public version (e.g. 1.11.1).
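For illustration, pinning the cluster to a known version looks roughly like this (the cluster name and node image tag are assumptions; which tags are published depends on the kind release):

    # Minimal sketch: create a kind cluster pinned to a known k8s version.
    # The image tag is illustrative, not a guaranteed published tag.
    kind create cluster --name knative-test --image kindest/node:v1.11.1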

@adrcunha
Contributor Author

/cc @mattmoor @josephburnett FYI

This should help with the question brought up today during Joe's presentation about autoscaling.

@mattmoor
Member

I'm wondering about the resource requirements we'd need and its ability to handle our workloads (e.g. provisioning GCP LoadBalancers).

@chaodaiG
Contributor

I investigated using KIND in our e2e test flows by trying to make all the e2e tests in knative/serving pass on KIND. After a couple of PRs (the ones associated with this issue), KIND works for most of the tests, as shown below:

  • test/e2e-upgrade: 2/2 passed
  • test/e2e: 17/17 passed
  • test/scale: 0/2 passed (they would pass if given more time, e.g. 5 minutes to scale to 10 and 20 minutes to scale to 50; this is unlikely to be a resource limit issue, as KIND doesn't impose one, and these tests ran on a Linux box with 12 CPUs and 64 GB of memory)
  • test/conformance: 56/56 passed (1 test only passes with swap turned off)
  • test/e2e/build: 3/6 passed (the build-pipeline tests failed)

To make KIND work, the following steps were needed (a rough sketch in shell follows the list):

  • Build the node image from k8s.io/kubernetes for the specific Kubernetes version, as the default Kubernetes version is well ahead of what we test on GKE
  • Pass the cluster name and the special KIND kubeconfig file to every e2e test
  • Figure out the IP address from Istio and pass it to the e2e tests
  • Turn “swap” off
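Roughly, in shell (the cluster name, Kubernetes version, and Istio service/namespace are illustrative assumptions; exact kind flags vary by release, e.g. kind get kubeconfig-path only exists in the 0.x releases current at the time):

    # Sketch only; names and versions are illustrative, not prescriptive.
    # 1. Build a node image for the Kubernetes version under test
    #    (older kind releases build from a local k8s.io/kubernetes checkout).
    cd "$GOPATH/src/k8s.io/kubernetes" && git checkout v1.12.7
    kind build node-image
    # 2. Create the cluster and point the e2e tests at its kubeconfig.
    kind create cluster --name knative-e2e
    export KUBECONFIG="$(kind get kubeconfig-path --name knative-e2e)"
    # 3. Find the ingress address to hand to the tests; kind has no cloud
    #    LoadBalancer, so this is typically a node IP plus a NodePort.
    kubectl get svc istio-ingressgateway -n istio-system -o wide
    # 4. Turn swap off on the host running the tests.
    sudo swapoff -a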

Overall, KIND is not ready yet for running our e2e tests: the scaling test duration is not comparable to GKE clusters, and that outweighs the potential benefit it could bring us (shorter cluster creation time, from ~3 minutes to ~10-20 seconds). Also, KIND is not in a stable state yet (the master branch is currently broken, kubernetes-sigs/kind#509), and there is a breaking change in the upcoming 0.3 release branch.

@chaodaiG
Contributor

@adrcunha fyi

@BenTheElder

BenTheElder commented May 15, 2019

👋 kind dev here, would like to help if there's interest 🙃

test/scale: 0/2 passed (they would pass if given more time, e.g. 5 minutes to scale to 10 and 20 minutes to scale to 50; this is unlikely to be a resource limit issue, as KIND doesn't impose one, and these tests ran on a Linux box with 12 CPUs and 64 GB of memory)

Do you always need to run scale tests? These are necessarily going to be bound by the hardware they run on and might improve if run on bigger CI nodes; I'd guess your GKE clusters have more horsepower. Are you running scale tests in presubmit?

test/e2e/build: 3/6 passed (the build-pipeline tests failed)

I'd be curious what those failed tests are doing, if you know :-)

Build the node image from k8s.io/kubernetes for the specific Kubernetes version, as the default Kubernetes version is well ahead of what we test on GKE

What version do you need? kubernetes-sigs/kind#531

Does knative not test with the current Kubernetes release at all? This seems surprising.

Turn “swap” off

This should only be necessary if you need memory limits on your pods; unfortunately, there aren't many options there.

Also, KIND is not in a stable state yet (the master branch is currently broken, kubernetes-sigs/kind#509), and there is a breaking change in the upcoming 0.3 release branch.

To clarify, it is not broken; you must build with Go modules or use the Makefile.
As for stability, sure, but the Kubernetes master branch is not stable either. Using a particular kind release should be stable (e.g. 0.2.1). Please do not install from HEAD in your CI.

https://github.com/kubernetes-sigs/kind/releases has precompiled binaries to make this easier.
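For example, a CI install pinned to a release could look like this (the tag and asset name are assumptions; check the releases page for the exact names):

    # Hedged sketch: install a pinned kind release in CI instead of HEAD.
    # Tag and asset name are assumptions; verify them on the releases page.
    KIND_VERSION=0.2.1
    curl -Lo kind "https://github.com/kubernetes-sigs/kind/releases/download/${KIND_VERSION}/kind-linux-amd64"
    chmod +x kind && sudo mv kind /usr/local/bin/
    kind version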

0.3 will require new node images (but we're providing them) and will change some internal details of the node that were never supposed to be guaranteed (like which CRI we use). I wouldn't expect knative testing to depend on these details.

@chaodaiG
Contributor

Hi @BenTheElder, thanks for the prompt response and the clarification. Yes, we do run the scale tests in presubmit; the difference is 240-300 seconds on KIND vs. 10-20 seconds on GKE for scaling up to 10. Our e2e tests run on clusters of 4 n1-standard-4 nodes (4 vCPUs and 15 GB of memory each), which isn't much different from my Linux box (12 CPUs and 64 GB of memory).
In terms of Kubernetes version, we need the latest GKE version for consistency reasons.

@adrcunha
Contributor Author

Do you always need to run scale tests? These are necessarily going to be bound by the hardware they run on and might improve if run on bigger CI nodes; I'd guess your GKE clusters have more horsepower. Are you running scale tests in presubmit?

At this point we're running them on presubmit, especially because there's active development in the autoscaling area.

Does knative not test with the current Kubernetes release at all? This seems surprising.

No, we don't test against the latest k8s version for several reasons, including features and compatibility with other k8s providers (GCP, IBM, Azure, etc.). Running against the latest k8s isn't currently a concern.

@BenTheElder

BenTheElder commented May 15, 2019

Thanks @chaodaiG @adrcunha that makes sense to me.

Yes, we do run the scale tests in presubmit; the difference is 240-300 seconds on KIND vs. 10-20 seconds on GKE for scaling up to 10. Our e2e tests run on clusters of 4 n1-standard-4 nodes (4 vCPUs and 15 GB of memory each), which isn't much different from my Linux box (12 CPUs and 64 GB of memory).

So pod start time is the issue? That is good to know and worth looking into on my end...

No, we don't test against the latest k8s version for several reasons, including features and compatibility with other k8s providers (GCP, IBM, Azure, etc.).

I see, so if I want to run knative locally I need to match the cloud providers?
Out of curiosity, what version exactly are you testing?

Running against the latest k8s isn't currently a concern.

FWIW, it looks like IBM has 1.14.1 with the default at 1.13.6, which is pretty recent (the latest from the two most recent Kubernetes branches). AKS has 1.13 GA with 1.14 in preview. GKE supports 1.13.5, and of course customers on all of these clouds run unmanaged clusters with more recent versions.

kind defaults to 1.14.1 currently, but we run CI against all supported Kubernetes versions: https://testgrid.k8s.io/conformance-kind

@chaodaiG
Contributor

Closing this investigation for now; we'll re-evaluate in the future.

FYI @BenTheElder, based on my observations, pod start time is the bottleneck. Thanks a lot for the help and the quick turnaround throughout my investigation.

Also FYI @tcnghia, we decided not to integrate with KIND at this point. Thank you for helping me sweep out most of the failing tests.

/close

@knative-prow-robot
Collaborator

@chaodaiG: Closing this issue.

In response to this:

Closing this investigation for now; we'll re-evaluate in the future.

FYI @BenTheElder, based on my observations, pod start time is the bottleneck. Thanks a lot for the help and the quick turnaround throughout my investigation.

Also FYI @tcnghia, we decided not to integrate with KIND at this point. Thank you for helping me sweep out most of the failing tests.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@adrcunha
Contributor Author

I see, so if I want to run knative locally I need to match the cloud providers?

For stability, yes. The knative.dev website has instructions for several cloud providers, as well as for Minikube.

Out of curiosity, what version exactly are you testing?

CI and presubmit run against latest GKE.

kind defaults to 1.14.1 currently, but we run CI against all supported Kubernetes versions

Good to know, as it might be necessary in the future, thanks.

@chizhg
Member

chizhg commented Apr 2, 2020

/reopen

KIND can help us get a better understanding of the cluster state, since it can dump the cluster logs to a folder (e.g. with kind export logs) under $ARTIFACTS in Prow, which could be super helpful for developers fighting test flakiness. So we should re-evaluate using KIND to run integration tests for Knative, at least for some tests.
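A minimal sketch of what that could look like in a Prow job (the cluster name is an illustrative assumption; $ARTIFACTS is the directory Prow uploads with the job results):

    # Dump the kind cluster's logs into Prow's artifact directory so they
    # get uploaded with the job results. The cluster name is illustrative.
    kind export logs "${ARTIFACTS}/kind" --name knative-e2e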

@knative-prow-robot
Collaborator

@chizhg: Reopened this issue.

In response to this:

/reopen

KIND can help us get a better understanding of the cluster state, since it can dump the cluster logs to a folder (e.g. with kind export logs) under $ARTIFACTS in Prow, which could be super helpful for developers fighting test flakiness. So we should re-evaluate using KIND to run integration tests for Knative, at least for some tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@chaodaiG
Contributor

chaodaiG commented Apr 2, 2020

You'll also need to consider incorporating it with kntest, I believe.

@mattmoor
Member

mattmoor commented Apr 2, 2020

I'd love to see us try this because frankly it's going to be our most reliable way of tracking the set of upstream Kubernetes versions that we want to validate against.

Technically we should be testing against 1.16-1.18 right now, but GKE only has 1.15 in the default channel.

Getting some base set of tests (e.g. conformance) to run against kind across those versions would be pretty valuable IMO.
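A sketch of such a version matrix, assuming pinned node images per version (the image tags and the test target are illustrative assumptions):

    # Run a base test suite against several Kubernetes versions via kind.
    # Image tags and the test target are illustrative assumptions.
    for v in v1.16.9 v1.17.5 v1.18.2; do
      name="conf-${v//./-}"            # avoid dots in the cluster name
      kind create cluster --name "${name}" --image "kindest/node:${v}"
      go test ./test/conformance/...   # hypothetical conformance target
      kind delete cluster --name "${name}"
    done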

@knative-housekeeping-robot

Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

@knative-prow-robot added the lifecycle/stale label on Jul 2, 2020
@chizhg
Member

chizhg commented Jul 2, 2020

/remove-lifecycle stale

@knative-prow-robot removed the lifecycle/stale label on Jul 2, 2020
@coryrc
Contributor

coryrc commented Sep 15, 2020

/assign @mattmoor
apparently :-)

@mattmoor
Member

I'm not looking at this in Prow
/unassign

@coryrc
Contributor

coryrc commented Sep 16, 2020

Ah, I glossed over the "on Prow" part, but I figured the "using KIND" part was more important :)

@chaodaiG
Contributor

Enabled in #2427
/close

@knative-prow-robot
Collaborator

@chaodaiG: Closing this issue.

In response to this:

Enabled in #2427
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
