Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: KEP: Who are we and why are we here? #2363

Closed
wants to merge 32 commits into from
Closed

Conversation

hh
Copy link
Member

@hh hh commented Jul 11, 2018

  • Ask some introspective questions of our software interactions
  • allow cluster wide aggregation of the answers
  • analyse community wide results for non-obvious patterns

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 11, 2018
@k8s-ci-robot k8s-ci-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 11, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: jdumars

If they are not already assigned, you can assign the PR to them by writing /assign @jdumars in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hh
Copy link
Member Author

hh commented Jul 11, 2018

/cc @kubernetes/sig-architecture-feature-requests @kubernetes/sig-testing-feature-requests
/cc @WilliamDenniss @spiffxp @AishSundar

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 14, 2018
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the read and the cultural background!
found one typo :)

enable all kubernetes components to begin transmitting identity and purpose.

Setting this variable on all pods could be accomplished with
an admission or initialization controller allowng all other
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 15, 2018

## Drawbacks [optional]

Some feel this level of client ‘debugging’ doesn't belong in server-side logs.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DirectXMan12
Copy link
Contributor

At a glance, some of this looks similar to tracing. can you compare/contrast this a bit with opentracing? It seems like you could inject the user agent and traceback stuff optionally as persistent tracing information, but I'm not super familiar with the details.

@hh
Copy link
Member Author

hh commented Jul 18, 2018

Our goal is to make it simple to submit and gather API usage data, both cluster and community wide. If we want to identify the yet unseen patterns of use, we need the ability to easily instrument all K8s API consumers.

The core of this KEP is to explore easy methods to enable any application to verbosely convey, for each API call, the identity of the code behind a particular interaction. I'd prefer to enable this behavior by setting a variable vs creating special binaries with compile time instrumentation libraries.

Jaeger and Opentracing are great for debugging and development purposes. They do however require instrumentation logic to be added to the applications themselves. They do provide some beautiful insight, if you already know what your looking for. This would also limit the data we would be able to collect.

The initial APISnoop approach used mitm injection via a sidecar, similar to Envoy, and provided a non-kubernetes specific approach.

We explored using Istio / Envoy, but they were not designed to intercept communications destined for the APIServer.

At this point that things were looking a bit Rube Goldberg, hence the switch in focus to audit-logging.

While I'm enjoying the simplicity of audit-logging, it might be worth re-exploring the mitm / Envoy approach.

If we route APIServer traffic to route through a proxy like Envoy we could support APISnooping more than just the K8s API.

@dankohn
Copy link
Contributor

dankohn commented Jul 18, 2018

@bhs could you please add any comments you have re OpenTracing.

@bhs
Copy link

bhs commented Jul 18, 2018

There are three pieces here:

  1. Literally adding the necessary instrumentation to pass context / causality information from the clients into k8s, then through k8s while servicing the client API request(s)
  2. Adding information (e.g., "purpose," though also what actually happens (beyond mere context propagation) to decorate the transactions for later analysis
  3. Collection and analysis of all of the above

The third piece almost certainly does require "something new"... general-purpose tracing tools like Jaeger/Zipkin/etc are designed for performance analysis and root cause analysis, and it sounds like @hh intends for more bespoke sorts of insight as an output to the entire effort. It is conceivable to use a general-purpose distributed tracing system to collect the data, then write an analytical script to generate the human-ready output via API calls, but in any case "something new" will still need t obe written.

The second piece is kind of an "it depends"... OpenTracing has the notion of "baggage" which is designed for just this sort of user-level context information that has some user-level interpretation. At first glance it's very similar in spirit, though perhaps I'm missing something. Note that some people are nervous about baggage, though.

The first piece is where I see strong overlap with OpenTracing specifically. The actual context propagation and context management is what OT was designed for, and the API is thin, with no dependencies of its own (i.e., not a Rube Goldberg nightmare per the KEP's language).

I think the determinant would be (a) how much multi-process context propagation we expect to see here, and (b) what k8s wants to do with more debugging-centric context propagation. The main insight with (b) is that "context is context" regardless of the application, and the true "Rube Goldberg" outcome would be to record and propagate it multiple times in parallel.

If there was a decision to proceed with OT, I'd imagine that this effort could be / would be supported via a Tracer implementation that does some sort of passthrough/multiplexing with a "normal" Tracer... i.e., the WhyAreWeHereTracer / WAWHTracer would record the precise info it needed to, then pass things along to a (potentially noop) general-purpose Tracer.

Hope this helps.

@dankohn
Copy link
Contributor

dankohn commented Jul 18, 2018

@bhs thanks for the quick review. Let's plan to have @hh present this work to the OpenTracing community in the next couple months, once he has some progress to show.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 18, 2018
@k8s-ci-robot
Copy link
Contributor

@hh: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this pull request Aug 22, 2018
Automatic merge from submit-queue (batch tested with PRs 67430, 67550). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add e2e Test Description to User-Agent

This PR appends detail e2e test line number and test details to the User-Agent.
It's needed for kubernetes/community#2363 and Conformance Data gathering.

This can be easily tested with any 1.12.0-alpha or later that is configured for audit-logging.

Here is an example using the test-infra/kubetest dind from kubernetes/test-infra#9061

```
  cd ~/go/src/k8s.io/kubernetes/
  go get -u k8s.io/test-infra/kubetest
  kubetest --build=dind
  kubetest --up --deployment=dind
  export KUBERNETES_CONFORMANCE_TEST=y
  export KUBECONFIG=$(ls -rt /tmp/k8s-dind-kubecfg-* | tail -1)
  export DIND_K8S_DATA=$(ls -drt /tmp/dind-k8* | tail -1)
  make -j 8 GOGCFLAGS="-N -l -v" WHAT=test/e2e/e2e.test
  ./_output/local/bin/linux/amd64/e2e.test \
    --ginkgo.focus=\[Conformance\] \
    --ginkgo.seed=1436380640 \
    --v=2 \
    --ginkgo.skip='\[Slow\]|\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\]|\[HPA\]|Dashboard|Services.*functioning.*NodePort' 
  cp $DIND_KCS_DATA/audit/audit.log .
```

```
e2e.test/v1.12.0 (linux/amd64) kubernetes/94c2c6c -- k8s.io/kubernetes/test/e2e/kubectl/portforward.go:496 -- [sig-cli] Kubectl Port forwarding [k8s.io] With a server listening on localhost [k8s.io] that expects NO client request should support a client that connects, sends DATA, and disconnects
e2e.test/v1.12.0 (linux/amd64) kubernetes/94c2c6c -- k8s.io/kubernetes/test/e2e/network/dns.go:158 -- [sig-network] DNS should provide DNS for ExternalName services
e2e.test/v1.12.0 (linux/amd64) kubernetes/94c2c6c -- k8s.io/kubernetes/test/e2e/framework/framework.go:710 -- [sig-cli] Kubectl client [k8s.io] Kubectl cluster-info should check if Kubernetes master services is included in cluster-info  [Conformance]
e2e.test/v1.12.0 (linux/amd64) kubernetes/94c2c6c -- k8s.io/kubernetes/test/e2e/storage/persistent_volumes-local.go:562 -- [sig-storage] PersistentVolumes-local  StatefulSet with pod affinity should use volumes on one node when pod has affinity
```

Sample log file from e2e run:

[audit.log.gz](https://github.com/kubernetes/kubernetes/files/2298944/audit.log.gz)
@justaugustus
Copy link
Member

/kind kep

@quinton-hoole
Copy link
Contributor

@hh Very entertaining :-). Can this be closed? If not can it be abbreviated to to the bare essentials to facilitate effective review?

@hh
Copy link
Member Author

hh commented Oct 30, 2018

I'm working on abbreviating it to the bare essentials. I'll set it back to WIP.

@hh hh changed the title KEP: Who are we and why are we here? WIP: KEP: Who are we and why are we here? Oct 30, 2018
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 30, 2018
@justaugustus
Copy link
Member

REMINDER: KEPs are moving to k/enhancements on November 30. Please attempt to merge this KEP before then to signal consensus.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.

@justaugustus
Copy link
Member

KEPs have moved to k/enhancements.
This PR will be closed and any additional changes to this KEP should be submitted to k/enhancements.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.
/close

@k8s-ci-robot
Copy link
Contributor

@justaugustus: Closed this PR.

In response to this:

KEPs have moved to k/enhancements.
This PR will be closed and any additional changes to this KEP should be submitted to k/enhancements.
For more details on this change, review this thread.

Any questions regarding this move should be directed to that thread and not asked on GitHub.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants