From a768c66b9aadb2f489908271651b15668099c1b2 Mon Sep 17 00:00:00 2001
From: Bridget McErlean <bmcerlean@vmware.com>
Date: Tue, 3 Dec 2019 10:31:18 -0500
Subject: [PATCH] Add an FAQ (#998)

This adds an FAQ to our documentation site. It mostly covers questions
related to the e2e plugin, but some of the information applies to
plugins in general.

Signed-off-by: Bridget McErlean <bmcerlean@vmware.com>
---
 site/_data/master-toc.yml |   6 +-
 site/docs/master/faq.md   | 172 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 177 insertions(+), 1 deletion(-)
 create mode 100644 site/docs/master/faq.md

diff --git a/site/_data/master-toc.yml b/site/_data/master-toc.yml
index add297700..373ba0b93 100644
--- a/site/_data/master-toc.yml
+++ b/site/_data/master-toc.yml
@@ -25,4 +25,8 @@ toc:
       - page: Using Private Images
         url: /pullsecrets
       - page: Advanced Customization
-        url: /gen
\ No newline at end of file
+        url: /gen
+  - title: Resources
+    subfolderitems:
+      - page: Frequently Asked Questions
+        url: /faq
diff --git a/site/docs/master/faq.md b/site/docs/master/faq.md
new file mode 100644
index 000000000..7c39c06ea
--- /dev/null
+++ b/site/docs/master/faq.md
@@ -0,0 +1,172 @@
+# Frequently Asked Questions
+
+## Kubernetes Conformance and end-to-end testing
+### Why were so many tests skipped?
+When running the `e2e` plugin with Sonobuoy, you will notice that a large number of tests are skipped by default.
+This is because the image used by Sonobuoy to run the Kubernetes conformance tests contains all the end-to-end tests for Kubernetes.
+However, only a subset of those tests are required to check conformance.
+For example, the v1.16 Kubernetes test image contains over 4000 tests, but only 215 of those are conformance tests.
+
+The default mode for the e2e plugin (`non-disruptive-conformance`) runs all tests that contain the tag `[Conformance]` and excludes those with the `[Disruptive]` tag.
+This helps prevent you from accidentally running tests that may disrupt workloads on your cluster.
+To run all the conformance tests, including the disruptive ones, use the `certified-conformance` mode.
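+
+The following sketch mimics this selection on a few hypothetical test names: it keeps names containing the `[Conformance]` tag and then drops those containing `[Disruptive]`.
+It only illustrates the filtering described above; it is not how Sonobuoy itself selects tests, and `select_non_disruptive_conformance` is a hypothetical helper name.
+
+```shell
+# select_non_disruptive_conformance (hypothetical helper): read test names,
+# one per line, keep those tagged [Conformance] and drop those tagged
+# [Disruptive]. Fixed-string matching (-F) avoids regex escaping.
+select_non_disruptive_conformance() {
+  grep -F '[Conformance]' | grep -vF '[Disruptive]'
+}
+
+# Example with made-up test names; only the first one is selected.
+printf '%s\n' \
+  '[sig-apps] Example test A [Conformance]' \
+  '[sig-node] Example test B [Disruptive] [Conformance]' \
+  '[sig-storage] Example test C [Slow]' |
+  select_non_disruptive_conformance
+```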
+
+Please refer to our [documentation for the `e2e` plugin][e2ePlugin] for more details of the built-in configurations.
+
+### How do I determine why my tests failed?
+Before debugging test failures, we recommend isolating each failure to verify that it is genuine and not spurious or transient.
+Unfortunately, spurious failures can be common in complex, distributed systems.
+To isolate a failure, use the `--e2e-focus` flag with the `run` command.
+This flag accepts a regular expression, and only the tests whose names match it are run.
+For example, you can provide the name of a test to run only that test:
+
+```
+sonobuoy run --e2e-focus "should update pod when spec was updated and update strategy is RollingUpdate"
+```
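+
+Because the flag takes a regular expression, a full test name containing characters such as `[`, `]`, `(`, `)` or `.` (the `[sig-...]` prefix, for instance) will not be matched literally.
+A minimal sketch of a helper that backslash-escapes those characters first; `escape_regex` is a hypothetical name, and the sketch assumes the focus expression is interpreted with Go's regular expression syntax, where a backslash before any punctuation character matches that character literally.
+
+```shell
+# escape_regex (hypothetical helper): backslash-escape regular expression
+# metacharacters in $1 so that the text is matched literally.
+escape_regex() {
+  printf '%s\n' "$1" | sed 's/[][\.*^$(){}?+|]/\\&/g'
+}
+
+# Example usage (placeholder test name):
+#   sonobuoy run --e2e-focus "$(escape_regex '<full test name>')"
+```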
+
+If the test continues to fail and it appears to be a genuine failure, the next step would be to read the logs to understand why the test failed.
+To read the logs for a test failure, you can find the log file within the results tarball from Sonobuoy (`plugins/e2e/results/global/e2e.log`) or you can use the `results` command to show details of test failures.
+For example, the following commands retrieve the results tarball and then use [jq][jq] to return an object for each test failure with the failure message and the associated stdout.
+
+```
+outfile=$(sonobuoy retrieve) && \
+  sonobuoy results --mode detailed --plugin e2e "$outfile" | jq 'select(.status == "failed") | .details'
+```
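+
+If you prefer to read the raw log, you can print that single file from the tarball without unpacking the whole archive.
+A minimal sketch, assuming the archive stores the log at exactly the `plugins/e2e/results/global/e2e.log` path given above and that your `tar` supports `-O` (extract to standard output, available in GNU tar and bsdtar); `read_e2e_log` is a hypothetical helper name.
+
+```shell
+# read_e2e_log (hypothetical helper): print the e2e log contained in the
+# Sonobuoy results tarball given as $1.
+read_e2e_log() {
+  tar -xzf "$1" -O plugins/e2e/results/global/e2e.log
+}
+
+# Example usage:
+#   outfile=$(sonobuoy retrieve) && read_e2e_log "$outfile" | less
+```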
+
+Carefully read the test logs to see if anything stands out which could be the cause of the failure.
+For example: Were there difficulties contacting a particular service? Do the failed tests have anything in common, such as a particular feature they exercise?
+Often, the test logs will provide enough detail to allow you to determine why a test failed.
+
+If you need more information, Sonobuoy also queries the cluster after the plugins have completed.
+The details collected allow you to see the state of the cluster and whether there were any issues.
+For example: Did any of the nodes have memory pressure? Did the scheduler pod go down?
+
+As a last resort, you can read the upstream test code to determine what actions were being performed when the test failed.
+If you decide to take this approach, you must ensure that you are reading the version of the test code that corresponds to your test image.
+You can verify which version of the test image was used by inspecting the plugin definition which is available in the results tarball in `plugins/e2e/definition.json` under the key `Definition.spec.image`.
+For example, if the test image was `gcr.io/google-containers/conformance:v1.15.3`, you should read the code at the corresponding [v1.15.3 tag in GitHub][kubernetes-1.15.3].
+All the tests can be found within the `test/e2e` directory in the Kubernetes repository.
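+
+This mapping from image to source code can be sketched as a small helper that takes the tag from the image reference and builds the GitHub URL of the matching `test/e2e` directory.
+It assumes the image is referenced by a tag that matches a Kubernetes release tag, as in the example above, and rejects digest references; `image_to_e2e_source_url` is a hypothetical name, and the usage comment assumes [jq][jq] is installed.
+
+```shell
+# image_to_e2e_source_url (hypothetical helper): map an image reference such
+# as gcr.io/google-containers/conformance:v1.15.3 to the test/e2e directory
+# at the matching tag in the Kubernetes repository.
+image_to_e2e_source_url() {
+  image="$1"
+  case "$image" in
+    *@*) echo "digest references are not supported: $image" >&2; return 1 ;;
+  esac
+  tag="${image##*:}"  # text after the last ':'
+  case "$tag" in
+    "$image"|*/*) echo "no tag found in image reference: $image" >&2; return 1 ;;
+  esac
+  printf 'https://github.com/kubernetes/kubernetes/tree/%s/test/e2e\n' "$tag"
+}
+
+# Example usage, reading the image from the plugin definition:
+#   image=$(tar -xzf "$outfile" -O plugins/e2e/definition.json | jq -r '.Definition.spec.image')
+#   image_to_e2e_source_url "$image"
+```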
+
+### How can I run the E2E tests with certain test framework options set? What are the available options?
+How you provide options to the E2E test framework, and which options you can set, depends on the version of Kubernetes you are testing.
+
+To view the available options that you can set when running the tests, you can run the test executable for the conformance image you will be using as follows:
+
+```
+KUBE_VERSION=<Kubernetes version you are using, e.g. v1.15.3>
+docker run --rm -it gcr.io/google-containers/conformance:$KUBE_VERSION ./e2e.test --help
+```
+
+You can also view the definitions of these test framework flags in the [Kubernetes repository][framework-flags].
+
+If you are running Kubernetes v1.16.0 or later, you can use a feature introduced in that release which makes it easier to specify your own options: arbitrary options can be passed to the tests when they are invoked.
+To use this, you must ensure the environment variable `E2E_USE_GO_RUNNER=true` is set.
+This is the default behavior of the Sonobuoy CLI from v0.16.1, so you only need to set it manually when working with a Sonobuoy manifest generated by an earlier version.
+With this enabled, you can provide your options with the flag `--plugin-env=e2e.E2E_EXTRA_ARGS`.
+For example, the following sets provider-specific flags for running on GCE:
+
+```
+sonobuoy run --plugin-env=e2e.E2E_USE_GO_RUNNER=true \
+  --plugin-env=e2e.E2E_PROVIDER=gce \
+  --plugin-env=e2e.E2E_EXTRA_ARGS="--gce-zone=foo --gce-region=bar"
+```
+
+Before Kubernetes v1.16.0, it was necessary to build your own custom image that could execute the tests with the desired options.
+
+For details on the two different approaches that you can take, please refer to [our blog post][custom-e2e-image] which describes in more detail how to use the new v1.16.0 Go test runner and how to build your own custom images.
+
+
+### Some of the registries required for the tests are blocked in my test infrastructure. Can I still run the tests?
+Yes! Sonobuoy can be configured to use custom registries so that you can run the tests in air-gapped environments.
+
+For more information and details on how to configure your environment, please refer to [our documentation for custom registries and air-gapped environments][airgap].
+
+### We have some nodes with custom taints in our cluster and the tests won't start. How can I run the tests?
+Although Sonobuoy plugins can be adapted to use [custom Kubernetes PodSpecs][custom-podspecs] where tolerations for custom taints can be specified, these settings do not apply to workloads started by the Kubernetes end-to-end testing framework as part of running the `e2e` plugin.
+
+The end-to-end test framework checks the status of the cluster before beginning to run the tests.
+One of these checks verifies that all of the nodes are schedulable and ready to accept workloads.
+This check deems any nodes with a taint other than the master node taint (`node-role.kubernetes.io/master`) to be unschedulable.
+This means that any node with a different taint will not be considered ready for testing and will block the tests from starting.
+
+With the release of Kubernetes v1.17.0, you will be able to whitelist node taints so that any node with a whitelisted taint will be deemed schedulable as part of the pre-test checks.
+This will ensure that these nodes will not block the tests from starting.
+If you are running Kubernetes v1.17.0 or greater, you will be able to specify the taints to whitelist using the flag `--non-blocking-taints` which takes a comma-separated list of taints.
+To find out how to set this flag via Sonobuoy, please refer to our previous answer on how to set test framework options.
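+
+One way to build the value for this flag is to collect the taint keys currently present on your nodes.
+A minimal sketch with a hypothetical `join_taint_keys` helper; it assumes the flag matches nodes by taint key and that setting it replaces the default list (which covers only the master taint), which is why the master key is kept if it is present.
+Review the resulting list before using it.
+
+```shell
+# join_taint_keys (hypothetical helper): read taint keys, one per line, drop
+# empty lines and duplicates, and print them as a comma-separated list.
+join_taint_keys() {
+  grep -v '^$' | sort -u | paste -s -d ',' -
+}
+
+# Example usage (Kubernetes v1.17.0 or later):
+#   taints=$(kubectl get nodes -o jsonpath='{range .items[*]}{range .spec.taints[*]}{.key}{"\n"}{end}{end}' | join_taint_keys)
+#   sonobuoy run --plugin-env=e2e.E2E_USE_GO_RUNNER=true \
+#     --plugin-env=e2e.E2E_EXTRA_ARGS="--non-blocking-taints=$taints"
+```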
+
+This solution does not enable workloads created by the tests to run on these nodes.
+This is still an [open issue in Kubernetes][support-custom-taints].
+The workloads created by the end-to-end tests will continue to run only on untainted nodes.
+
+For all versions of Kubernetes prior to v1.17.0, there are two approaches that you may be able to take to allow the tests to run.
+
+The first is to adjust the number of nodes that the test framework allows to be "not-ready".
+By default, the test framework will wait for all nodes to be ready.
+However, if only a subset of your nodes are tainted and the rest are otherwise suitable for accepting test workloads, you could provide the test framework flag `--allowed-not-ready-nodes` specifying the number of tainted nodes you have.
+By setting this, the test framework will allow for your tainted nodes to be in a "not-ready" state.
+However, this does not guarantee that your tests will start, as a node in your cluster may not be ready for another reason.
+Also, this approach only works if some of your nodes are untainted, since the tests still need nodes to run on.
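+
+To find the value for `--allowed-not-ready-nodes`, you can count the nodes that carry a taint other than the master taint.
+A minimal sketch with a hypothetical `count_blocking_nodes` helper, following the description above in treating any other taint as blocking; it ignores taint effects, so it may over-count.
+The usage comment assumes the v1.16.0 Go test runner described earlier; for older versions the flag has to be supplied through a custom image.
+
+```shell
+# count_blocking_nodes (hypothetical helper): read lines of the form
+# "<node-name><TAB><space-separated taint keys>" and count the nodes that
+# carry at least one taint other than the master taint.
+count_blocking_nodes() {
+  awk -F '\t' '
+    {
+      n = split($2, keys, " ")
+      for (i = 1; i <= n; i++) {
+        if (keys[i] != "node-role.kubernetes.io/master") { count++; break }
+      }
+    }
+    END { print count + 0 }
+  '
+}
+
+# Example usage:
+#   count=$(kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .spec.taints[*]}{.key}{" "}{end}{"\n"}{end}' | count_blocking_nodes)
+#   sonobuoy run --plugin-env=e2e.E2E_EXTRA_ARGS="--allowed-not-ready-nodes=$count"
+```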
+
+The only other approach is to untaint the nodes for the purposes of testing.
+
+### What tests can I run? How can I figure out what tests/tags I can select?
+The `e2e` plugin has a number of preconfigured modes for running tests, with the default mode running all conformance tests which are non-disruptive.
+It is possible to [configure the plugin][e2ePlugin] to provide a specific set of E2E tests to run instead.
+
+Which tests you can run depends on the version of Kubernetes you are testing, as the list of tests changes with each release.
+
+A list of the conformance tests is maintained in the [Kubernetes repository][kubernetes-conformance].
+Within the GitHub UI, you can change the branch to the tag that matches your Kubernetes version to see all the tests for that version.
+This list provides each test name as well as where you can find the test in the repository.
+You can include these test names in the `E2E_FOCUS` or `E2E_SKIP` environment variables when [running the plugin][e2ePlugin].
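+
+To focus on several tests at once, the names can be combined into a single regular expression with `|`.
+A minimal sketch with a hypothetical `build_focus_regex` helper that escapes regular expression metacharacters in each name (so names like `[sig-apps] ...` are matched literally) and joins them; it assumes Go-style regular expression syntax, and `tests.txt` in the usage comment is a hypothetical file with one test name per line.
+
+```shell
+# build_focus_regex (hypothetical helper): read test names, one per line,
+# escape regex metacharacters in each and join them with '|'.
+build_focus_regex() {
+  sed 's/[][\.*^$(){}?+|]/\\&/g' | paste -s -d '|' -
+}
+
+# Example usage:
+#   sonobuoy run --e2e-focus "$(build_focus_regex < tests.txt)"
+```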
+
+Although the default behavior is to run the Conformance tests, you can run any of the other Kubernetes E2E tests with Sonobuoy.
+These are not required for checking that your cluster is conformant and we only recommend running these if there is specific behavior you wish to check.
+
+There are a large number of E2E tests available (over 4000 as of v1.16.0).
+Many of these tests have "tags" which show that they belong to a specific group, or have a particular trait.
+There is no definitive list of these tags, but below are some of the most commonly seen:
+
+- Conformance
+- NodeConformance
+- Slow
+- Serial
+- Disruptive
+- Flaky
+- LinuxOnly
+- Feature:* (there are numerous feature tags)
+
+There are also specific tags for tests that belong to a particular [Special Interest Group (SIG)][sig-list].
+The following SIG tags exist within the E2E tests:
+
+- [sig-api-machinery]
+- [sig-apps]
+- [sig-auth]
+- [sig-autoscaling]
+- [sig-cli]
+- [sig-cloud-provider]
+- [sig-cloud-provider-gcp]
+- [sig-cluster-lifecycle]
+- [sig-instrumentation]
+- [sig-network]
+- [sig-node]
+- [sig-scheduling]
+- [sig-service-catalog]
+- [sig-storage]
+- [sig-ui]
+- [sig-windows]
+
+
+[kubernetes-podspec]: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.16/#podspec-v1-core
+[custom-e2e-image]: https://sonobuoy.io/custom-e2e-image/
+[custom-podspecs]: https://sonobuoy.io/customizing-plugin-podspecs/
+[sig-list]: https://github.com/kubernetes/community/blob/master/sig-list.md
+[jq]: https://stedolan.github.io/jq/
+[kubernetes-1.15.3]: https://github.com/kubernetes/kubernetes/tree/v1.15.3
+[kubernetes-conformance]: https://github.com/kubernetes/kubernetes/blob/master/test/conformance/testdata/conformance.txt
+[airgap]: airgap.md
+[e2ePlugin]: e2eplugin.md
+[customPlugins]: plugins.md
+[support-custom-taints]: https://github.com/kubernetes/kubernetes/issues/83329
+[framework-flags]: https://github.com/kubernetes/kubernetes/blob/master/test/e2e/framework/test_context.go