Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

Add agent trace support #1442

Merged

Conversation

jodh-intel
Copy link
Contributor

Add configuration options to support the various Kata agent tracing modes and types. See the comments in the built configuration files for details:

  • cli/config/configuration-fc.toml
  • cli/config/configuration-qemu.toml

Fixes #1369.

Signed-off-by: James O. D. Hunt [email protected]

@jodh-intel
Copy link
Contributor Author

Note that the CI's will fail as this PR is blocked on:

... so adding the dnm label for now...

# full details.
#
# (default: disabled)
#enable_tracing = true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's time to have a specific section for tracing, [trace] might be...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. Yes, we could make a sub-section like the one below, but it might involve more rework as none of the other sections have sub-sections in them:

[agent.kata]

enable_debug = true

[agent.kata.trace]
enabled = true
mode = "dynamic"
type = "isolated"

But we haven't done that historically. For example, we have two "groups" of options for the hypervisor section:

memory_slots = ...
memory_offset = ...

block_device_driver = ...
block_device_cache_set = ...
block_device_cache_noflush = ...

For consistency they would also need new subsections too.

I am tempted to suggest -- although not ideal I know -- that we keep the status quo for now and potentially totally rework the configuration handling in v2.

#enable_tracing = true
#
#trace_mode = "dynamic"
#trace_type = "isolated"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there option mutually exclusive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -27,7 +27,7 @@ import (
//
// XXX: Increment for every change to the output format
// (meaning any change to the EnvInfo type).
const formatVersion = "1.0.21"
const formatVersion = "1.0.23"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mathematics has changed since I left the school 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha! Well, there are two bumps to this number in this PR :)

@jodh-intel
Copy link
Contributor Author

@kata-containers/runtime - ping y'all!

@jodh-intel
Copy link
Contributor Author

Related: #1352.

@jcvenegas
Copy link
Member

@jodh-intel ping this needs rebase

@jodh-intel jodh-intel force-pushed the add-agent-trace-support branch 3 times, most recently from 19bd0ac to 33eb4fb Compare April 15, 2019 11:20
@jodh-intel
Copy link
Contributor Author

/test

Ping @sboeuf, @egernst.

Copy link
Member

@egernst egernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of questions, but overall the changes look good to me. Thanks @jodh-intel

@jcvenegas jcvenegas added feature New functionality release labels Apr 15, 2019
Copy link
Member

@amshinde amshinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@chavafg
Copy link
Contributor

chavafg commented Apr 16, 2019

vm templating test is consistently failing:

+ sudo docker run --runtime=kata-runtime -d --name kata-ySDdaZ busybox tail -f /dev/null
ff99d46857e94e52526c7dcea53ddf489e958813ed1ac8a5b8b8634365983e97
docker: Error response from daemon: OCI runtime create failed: rpc error: code = Unavailable desc = transport is closing: unknown.
Makefile:123: recipe for target 'vm-factory' failed
make: *** [vm-factory] Error 125
Build step 'Execute shell' marked build as failure

http://jenkins.katacontainers.io/job/kata-containers-runtime-ubuntu-18-04-PR-initrd/1799/console
http://jenkins.katacontainers.io/job/kata-containers-runtime-ubuntu-18-04-PR-initrd/1801/console
http://jenkins.katacontainers.io/job/kata-containers-runtime-ubuntu-18-04-PR-initrd/1805/console

@jcvenegas
Copy link
Member

@chavafg could you open and issue?

Make `newAgentConfig()` return an explicit error rather than handling
the error scenario by simply returning the `error` object in the
`interface{}` return type. The old behaviour was confusing and
inconsistent with the other functions creating a new config type (shim,
proxy, etc).

Signed-off-by: James O. D. Hunt <[email protected]>
Refactor a common error into a new standard error object.

Signed-off-by: James O. D. Hunt <[email protected]>
Removed the unused `KataShimConfig` type and updated an error message
that incorrectly mentioned it.

Signed-off-by: James O. D. Hunt <[email protected]>
Replace the two versions of `makeRuntimeConfigFileData()` with a single
`MakeRuntimeConfigFileData()` in a new `katatestutils` package and a new
`katautils.GetDefaultHypervisorConfig()` to query the default hypervisor
details.

This isn't ideal but a new package had to be created to avoid circular
dependencies. It was also required since test code cannot be exported
from a package.

Signed-off-by: James O. D. Hunt <[email protected]>
Previously, the agent behaviour was controlled entirely using the
`kernel_params=` config option. This mechanism suffers from a subtle
problem - the runtime is not aware of how the agent will behave.

From now on, all significant agent options will be controlled from the
agent section in the configuration file. This allows the runtime to be
more aware of -- and in control of -- such agent settings. It would also
allow the underlying kernel CLI options to be modified in the future if
required.

This PR adds the only useful agent option as an explicit option by
adding an `enable_debug=true` option to the Kata agent section in
`configuration.toml`. This allows controlling agent debug to be handled
in the same manner as the other debug options.

This change is somewhat foundational: it permits the agent to be handled
consistently with other config file sections which is useful, but
arguably not essential (the old way worked). However, the new way of
handling agent options will be essential when introducing agent tracing
control as the runtime must be aware of the agent trace mode to allow
the runtime to modify its behaviour accordingly.

Signed-off-by: James O. D. Hunt <[email protected]>
Updated the agent vendoring for `StartTracing` and `StopTracing`. This
only changed a single file - the auto-generated gRPC protocol
buffer file.

This change resolves four vendoring issues:

- The github.com/kubernetes-incubator/cri-o project was renamed to
  github.com/cri-o/cri-o. Although github redirects, `dep` complains that
  it cannot find the old `github.com/kubernetes-incubator/cri-o` files
  under `vendor/` so remove the old config, relying on the existing (and
  in other respects identical) `dep` config.

- There was a stale dependency on `github.com/clearcontainers/proxy`
  which should have been removed when the Clear Containers code was
  excised.

- The latest version of the agent code vendored into the runtime prior
  to this commit was a merge commit (commit
  `48dd1c031530fce9bf16b0f6a7305979cedd8fc9`). This somehow confused `dep`
  which did *not* correctly pull in the latest version of the
  auto-generated gRPC code
  (`vendor/github.com/kata-containers/agent/protocols/grpc/agent.pb.go`).
  This is clear because commit `48dd1c031530fce9bf16b0f6a7305979cedd8fc9`
  is newer than the agent commit that introduced the `StartTracing` and
  `StopTracing` APIs (`00cf907afcb7c8e56f077cf45ae3615f612fdc9d`).

  Resolving the other two issues above seems to have resolved this issue
  as the correct version of this file has now been included in the
  vendoring, however note there is no change to the `dep` files as this
  version of `agent.pb.go` should already have been included (!)

- Updating `agent.pb.go` also removed the `AddInterface` and
  `RemoveInterface` API calls which should again also have been removed
  already.

  Updated tests to remove these redundant calls.

Signed-off-by: James O. D. Hunt <[email protected]>
Add configuration options to support the various Kata agent tracing
modes and types. See the comments in the built configuration files for
details:

- `cli/config/configuration-fc.toml`
- `cli/config/configuration-qemu.toml`

Fixes kata-containers#1369.

Signed-off-by: James O. D. Hunt <[email protected]>
@jodh-intel
Copy link
Contributor Author

/retest

@jodh-intel
Copy link
Contributor Author

Branch updated yet again. Let's try to get this landed before it goes stale again folks...

Note too the mess I've tried to resolve on the vendor commit (b573d9b).

Copy link
Contributor

@grahamwhaley grahamwhaley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this lgtm
let's try and get this landed this time...

@grahamwhaley
Copy link
Contributor

Only thing we are waiting for is the rhel-7 CI ....

@grahamwhaley
Copy link
Contributor

afaict, the rhel7 CI has passed: http://jenkins.katacontainers.io/job/kata-containers-agent-rhel-7-PR/1/ but not reported status back here. That may be because that looks like a new jenkins job - and we may have to enable some hook/auth on the github end. @chavafg - can you see/confirm if that is the case.
If so, when we are happy we understand why it is pending on this PR, let's get it merged....

@chavafg
Copy link
Contributor

chavafg commented Apr 25, 2019

@grahamwhaley yeah, this is job landed yesterday I think. I see some jobs on the queue waiting for some hours, so not sure what is happening, in the meantime feel free to merge without rhel reporting back.
/cc @GabyCT

@grahamwhaley
Copy link
Contributor

Thanks @chavafg - merging!!

@grahamwhaley grahamwhaley merged commit f4fe31e into kata-containers:master Apr 25, 2019
@GabyCT
Copy link
Contributor

GabyCT commented Apr 25, 2019

@grahamwhaley and @chavafg , yesterday I saw that the agent node of jenkins was having in trying to launch the VMs from rhel after killing the nodes, I saw that the VMs start to run without issues. I review the configuration and everything was ok, not sure why Jenkins could not launch the VM.

@grahamwhaley
Copy link
Contributor

Thanks @GabyCT - let's keep an eye open for it happening again

@jodh-intel
Copy link
Contributor Author

Errr. Something isn't right here. I triggered these builds almost 7 hours ago, and they are still "running"?!?

sles 12 certainly shouldn't be running as it fell over:

make[2]: Entering directory '/tmp/cri-install-runc.yroG/src/github.com/opencontainers/runc'
CGO_ENABLED=1 go build  -tags "seccomp apparmor netgo osusergo static_build" -installsuffix netgo -ldflags "-w -extldflags -static -X main.gitCommit="12f6a991201fdb8f82579582d5e00e28fba06d0a" -X main.version=1.0.0-rc6+dev " -o runc .
# github.com/opencontainers/runc
/usr/local/go/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
/usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../x86_64-suse-linux/bin/ld: cannot find -lseccomp
collect2: error: ld returned 1 exit status

Makefile:43: recipe for target 'static' failed

jodh-intel added a commit to jodh-intel/tests-1 that referenced this pull request Apr 29, 2019
Update the `check_vendor()` function in the static check script to run
our current vendoring tool (`dep`) in check mode. This will ensure our
vendoring does not get silently broken [1].

Fixes kata-containers#1506.

[1] - See for example kata-containers/runtime#1442.

Signed-off-by: James O. D. Hunt <[email protected]>
ganeshmaharaj pushed a commit to ganeshmaharaj/kata-tests that referenced this pull request Apr 30, 2019
Update the `check_vendor()` function in the static check script to run
our current vendoring tool (`dep`) in check mode. This will ensure our
vendoring does not get silently broken [1].

Fixes kata-containers#1506.

[1] - See for example kata-containers/runtime#1442.

Signed-off-by: James O. D. Hunt <[email protected]>
(cherry picked from commit 830983f)
Signed-off-by: Ganesh Maharaj Mahalingam <[email protected]>
ganeshmaharaj pushed a commit to ganeshmaharaj/kata-tests that referenced this pull request May 1, 2019
Update the `check_vendor()` function in the static check script to run
our current vendoring tool (`dep`) in check mode. This will ensure our
vendoring does not get silently broken [1].

Fixes kata-containers#1506.

[1] - See for example kata-containers/runtime#1442.

Signed-off-by: James O. D. Hunt <[email protected]>
(cherry picked from commit 830983f)
Signed-off-by: Ganesh Maharaj Mahalingam <[email protected]>
chavafg pushed a commit to chavafg/tests-1 that referenced this pull request May 2, 2019
Update the `check_vendor()` function in the static check script to run
our current vendoring tool (`dep`) in check mode. This will ensure our
vendoring does not get silently broken [1].

Fixes kata-containers#1506.

[1] - See for example kata-containers/runtime#1442.

Signed-off-by: James O. D. Hunt <[email protected]>
jcvenegas pushed a commit to jcvenegas/kata-test that referenced this pull request May 2, 2019
Update the `check_vendor()` function in the static check script to run
our current vendoring tool (`dep`) in check mode. This will ensure our
vendoring does not get silently broken [1].

Fixes kata-containers#1506.

[1] - See for example kata-containers/runtime#1442.

Signed-off-by: James O. D. Hunt <[email protected]>
(cherry picked from commit 830983f)
Signed-off-by: Ganesh Maharaj Mahalingam <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature New functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add agent tracing support
8 participants