From a768c66b9aadb2f489908271651b15668099c1b2 Mon Sep 17 00:00:00 2001
From: Bridget McErlean
Date: Tue, 3 Dec 2019 10:31:18 -0500
Subject: [PATCH] Add an FAQ (#998)

This adds an FAQ to our documentation site. It mostly covers questions
related to the e2e plugin, but some of the information applies to plugins
in general.

Signed-off-by: Bridget McErlean
---
 site/_data/master-toc.yml |   6 +-
 site/docs/master/faq.md   | 172 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 177 insertions(+), 1 deletion(-)
 create mode 100644 site/docs/master/faq.md

diff --git a/site/_data/master-toc.yml b/site/_data/master-toc.yml
index add297700..373ba0b93 100644
--- a/site/_data/master-toc.yml
+++ b/site/_data/master-toc.yml
@@ -25,4 +25,8 @@ toc:
       - page: Using Private Images
         url: /pullsecrets
       - page: Advanced Customization
-        url: /gen
\ No newline at end of file
+        url: /gen
+  - title: Resources
+    subfolderitems:
+      - page: Frequently Asked Questions
+        url: /faq
diff --git a/site/docs/master/faq.md b/site/docs/master/faq.md
new file mode 100644
index 000000000..7c39c06ea
--- /dev/null
+++ b/site/docs/master/faq.md
@@ -0,0 +1,172 @@
+# Frequently Asked Questions
+
+## Kubernetes Conformance and end-to-end testing
+### Why were so many tests skipped?
+When running the `e2e` plugin with Sonobuoy, you will notice that a large number of tests are skipped by default.
+The reason for this is that the image used by Sonobuoy to run the Kubernetes conformance tests contains all the end-to-end tests for Kubernetes.
+However, only a subset of those tests are required to check conformance.
+For example, the v1.16 Kubernetes test image contains over 4000 tests; however, only 215 of those are conformance tests.
+
+The default mode for the e2e plugin (`non-disruptive-conformance`) will run all tests which contain the tag `[Conformance]` and exclude those with the `[Disruptive]` tag.
+This is to help prevent you from accidentally running tests which may disrupt workloads on your cluster.
+To run all the conformance tests, use the `certified-conformance` mode.
+
+Please refer to our [documentation for the `e2e` plugin][e2ePlugin] for more details of the built-in configurations.
+
+### How do I determine why my tests failed?
+Before debugging test failures, we recommend isolating any failures to verify that they are genuine and are not spurious or transient.
+Unfortunately, such failures can be common in complex, distributed systems.
+To do this, you can make use of the `--e2e-focus` flag when using the `run` command.
+This flag accepts a regex which will be used to find and run only the tests matching that regex.
+For example, you can provide the name of a test to run only that test:
+
+```
+sonobuoy run --e2e-focus "should update pod when spec was updated and update strategy is RollingUpdate"
+```
+
+If the test continues to fail and it appears to be a genuine failure, the next step would be to read the logs to understand why the test failed.
+To read the logs for a test failure, you can find the log file within the results tarball from Sonobuoy (`plugins/e2e/results/global/e2e.log`) or you can use the `results` command to show details of test failures.
+For example, the following commands retrieve the results tarball and then use [jq][jq] to return an object for each test failure with the failure message and the associated stdout.
+
+```
+outfile=$(sonobuoy retrieve) && \
+  sonobuoy results --mode detailed --plugin e2e $outfile | jq '. | select(.status == "failed") | .details'
+```
+
+Carefully read the test logs to see if anything stands out which could be the cause of the failure.
+For example: Were there difficulties when contacting a particular service? Are there any commonalities in the failed tests due to a particular feature?
+Often, the test logs will provide enough detail to allow you to determine why a test failed.
+
+If you need more information, Sonobuoy also queries the cluster upon completion of plugins.
+The details collected allow you to see the state of the cluster and whether there were any issues.
+For example: Did any of the nodes have memory pressure? Did the scheduler pod go down?
+
+As a last resort, you can also read the upstream test code to determine what actions were being performed at the point when the test failed.
+If you decide to take this approach, you must ensure that you are reading the version of the test code that corresponds to your test image.
+You can verify which version of the test image was used by inspecting the plugin definition which is available in the results tarball in `plugins/e2e/definition.json` under the key `Definition.spec.image`.
+For example, if the test image was `gcr.io/google-containers/conformance:v1.15.3`, you should read the code at the corresponding [v1.15.3 tag in GitHub][kubernetes-1.15.3].
+All the tests can be found within the `test/e2e` directory in the Kubernetes repository.
+
+### How can I run the E2E tests with certain test framework options set? What are the available options?
+How you provide options to the E2E test framework, and which options you can set, both depend on which version of Kubernetes you are testing.
+
+To view the available options that you can set when running the tests, you can run the test executable for the conformance image you will be using as follows:
+
+```
+KUBE_VERSION=
+docker run -it gcr.io/google-containers/conformance:$KUBE_VERSION ./e2e.test --help
+```
+
+You can also view the definitions of these test framework flags in the [Kubernetes repository][framework-flags].
+
+If you are running Kubernetes v1.16.0 or greater, you can use a feature introduced in that release which makes it easier to specify your own options.
+This new feature allows arbitrary options to be specified when the tests are invoked.
+To use this, you must ensure the environment variable `E2E_USE_GO_RUNNER=true` is set.
+This has been the default behavior in the CLI since Sonobuoy v0.16.1 and only needs to be set manually if you are working with a Sonobuoy manifest generated by an earlier version.
+If this is enabled, then you can provide your options with the flag `--plugin-env=e2e.E2E_EXTRA_ARGS`.
+For example, the following allows you to set provider-specific flags for running on GCE:
+
+```
+sonobuoy run --plugin-env=e2e.E2E_USE_GO_RUNNER=true \
+  --plugin-env=e2e.E2E_PROVIDER=gce \
+  --plugin-env=e2e.E2E_EXTRA_ARGS="--gce-zone=foo --gce-region=bar"
+```
+
+Before Kubernetes v1.16.0, it was necessary to build your own custom image which could execute the tests with the desired options.
+
+For details on the two different approaches that you can take, please refer to [our blog post][custom-e2e-image] which describes in more detail how to use the new v1.16.0 Go test runner and how to build your own custom images.
+
+
+### Some of the registries required for the tests are blocked with my test infrastructure. Can I still run the tests?
+Yes! Sonobuoy can be configured to use custom registries so that you can run the tests in air-gapped environments.
+
+For more information and details on how to configure your environment, please refer to [our documentation for custom registries and air-gapped environments][airgap].
+
+### We have some nodes with custom taints in our cluster and the tests won't start. How can I run the tests?
+Although Sonobuoy plugins can be adapted to use [custom Kubernetes PodSpecs][custom-podspecs] where tolerations for custom taints can be specified, these settings do not apply to workloads started by the Kubernetes end-to-end testing framework as part of running the `e2e` plugin.
+
+The end-to-end test framework checks the status of the cluster before beginning to run the tests.
+One of these checks verifies that all of the nodes are schedulable and ready to accept workloads.
+This check deems any nodes with a taint other than the master node taint (`node-role.kubernetes.io/master`) to be unschedulable.
+This means that any node with a different taint will not be considered ready for testing and will block the tests from starting.
+
+With the release of Kubernetes v1.17.0, you will be able to whitelist node taints so that any node with a whitelisted taint will be deemed schedulable as part of the pre-test checks.
+This will ensure that these nodes will not block the tests from starting.
+If you are running Kubernetes v1.17.0 or greater, you will be able to specify the taints to whitelist using the flag `--non-blocking-taints` which takes a comma-separated list of taints.
+To find out how to set this flag via Sonobuoy, please refer to our previous answer on how to set test framework options.
+
+This solution does not enable workloads created by the tests to run on these nodes.
+This is still an [open issue in Kubernetes][support-custom-taints].
+The workloads created by the end-to-end tests will continue to run only on untainted nodes.
+
+For all versions of Kubernetes prior to v1.17.0, there are two approaches that you may be able to take to allow the tests to run.
+
+The first is adjusting the number of nodes the test framework allows to be "not-ready".
+By default, the test framework will wait for all nodes to be ready.
+However, if only a subset of your nodes are tainted and the rest are otherwise suitable for accepting test workloads, you could provide the test framework flag `--allowed-not-ready-nodes` specifying the number of tainted nodes you have.
+By setting this, the test framework will allow for your tainted nodes to be in a "not-ready" state.
+However, this does not guarantee that your tests will start, as a node in your cluster may not be ready for another reason.
+Also, this approach will only work if there are untainted nodes, as some will still need to be available for the tests to run on.
+
+The only other approach is to untaint the nodes for the purposes of testing.
+
+### What tests can I run? How can I figure out what tests/tags I can select?
+The `e2e` plugin has a number of preconfigured modes for running tests, with the default mode running all conformance tests which are non-disruptive.
+It is possible to [configure the plugin][e2ePlugin] to provide a specific set of E2E tests to run instead.
+
+Which tests you can run depends on the version of Kubernetes you are testing, as the list of tests changes with each release.
+
+A list of the conformance tests is maintained in the [Kubernetes repository][kubernetes-conformance].
+Within the GitHub UI, you can change the branch to the tag that matches your Kubernetes version to see all the tests for that version.
+This list provides each test name as well as where you can find the test in the repository.
+You can include these test names in the `E2E_FOCUS` or `E2E_SKIP` environment variables when [running the plugin][e2ePlugin].
+
+Although the default behavior is to run the Conformance tests, you can run any of the other Kubernetes E2E tests with Sonobuoy.
+These are not required for checking that your cluster is conformant and we only recommend running these if there is specific behavior you wish to check.
+
+There are a large number of E2E tests available (over 4000 as of v1.16.0).
+Many of these tests have "tags" which show that they belong to a specific group, or have a particular trait.
+There isn't a definitive list of these tags; however, below are some of the most commonly seen tags:
+
+- Conformance
+- NodeConformance
+- Slow
+- Serial
+- Disruptive
+- Flaky
+- LinuxOnly
+- Feature:* (there are numerous feature tags)
+
+There are also specific tags for tests that belong to a particular [Special Interest Group (SIG)][sig-list].
+The following SIG tags exist within the E2E tests:
+
+- [sig-api-machinery]
+- [sig-apps]
+- [sig-auth]
+- [sig-autoscaling]
+- [sig-cli]
+- [sig-cloud-provider]
+- [sig-cloud-provider-gcp]
+- [sig-cluster-lifecycle]
+- [sig-instrumentation]
+- [sig-network]
+- [sig-node]
+- [sig-scheduling]
+- [sig-service-catalog]
+- [sig-storage]
+- [sig-ui]
+- [sig-windows]
+
+
+[kubernetes-podspec]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#podspec-v1-core
+[custom-e2e-image]: https://sonobuoy.io/custom-e2e-image/
+[custom-podspecs]: https://sonobuoy.io/customizing-plugin-podspecs/
+[sig-list]: https://github.com/kubernetes/community/blob/master/sig-list.md
+[jq]: https://stedolan.github.io/jq/
+[kubernetes-1.15.3]: https://github.com/kubernetes/kubernetes/tree/v1.15.3
+[kubernetes-conformance]: https://github.com/kubernetes/kubernetes/blob/master/test/conformance/testdata/conformance.txt
+[airgap]: airgap.md
+[e2ePlugin]: e2eplugin.md
+[customPlugins]: plugins.md
+[support-custom-taints]: https://github.com/kubernetes/kubernetes/issues/83329
+[framework-flags]: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/test_context.go
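
The tags listed in the FAQ can be combined into focus and skip expressions. The following is a minimal sketch, assuming that the `--e2e-focus` and `--e2e-skip` flags of `sonobuoy run` accept regexes matched against the full test name, and that the SIG tag appears before the `[Conformance]` tag in that name. The sample test name and the `selected` helper are illustrative only. The sketch checks the patterns locally with `grep -E` and prints the command rather than running it, so it does not need a cluster:

```shell
# Tags contain square brackets, which are regex metacharacters, so escape them.
focus='\[sig-network\].*\[Conformance\]'
skip='\[Disruptive\]|\[Flaky\]|\[Slow\]'

# Illustrative test name, used only to sanity-check the patterns locally.
sample='[sig-network] Services should serve a basic endpoint from pods [Conformance]'

# selected NAME: succeeds if NAME matches the focus pattern and not the skip pattern.
# This is a hypothetical helper, not part of Sonobuoy.
selected() {
  printf '%s\n' "$1" | grep -Eq -- "$focus" &&
    ! printf '%s\n' "$1" | grep -Eq -- "$skip"
}

if selected "$sample"; then
  printf 'would run: %s\n' "$sample"
fi

# Print the command instead of executing it.
printf '%s\n' "sonobuoy run --e2e-focus '$focus' --e2e-skip '$skip'"
```

Checking the expressions against a few names from the conformance list before starting a run can save a long test cycle when a pattern is escaped incorrectly.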