-
Notifications
You must be signed in to change notification settings - Fork 374
Add agent trace support #1442
Add agent trace support #1442
Conversation
Note that the CI's will fail as this PR is blocked on:
... so adding the dnm label for now... |
# full details. | ||
# | ||
# (default: disabled) | ||
#enable_tracing = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it's time to have a specific section for tracing, [trace]
might be...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. Yes, we could make a sub-section like the one below, but it might involve more rework as none of the other sections have sub-sections in them:
[agent.kata]
enable_debug = true
[agent.kata.trace]
enabled = true
mode = "dynamic"
type = "isolated"
But we haven't done that historically. For example, we have two "groups" of options for the hypervisor section:
memory_slots = ...
memory_offset = ...
block_device_driver = ...
block_device_cache_set = ...
block_device_cache_noflush = ...
For consistency they would also need new subsections too.
I am tempted to suggest -- although not ideal I know -- that we keep the status quo for now and potentially totally rework the configuration handling in v2.
#enable_tracing = true | ||
# | ||
#trace_mode = "dynamic" | ||
#trace_type = "isolated" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there option mutually exclusive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -27,7 +27,7 @@ import ( | |||
// | |||
// XXX: Increment for every change to the output format | |||
// (meaning any change to the EnvInfo type). | |||
const formatVersion = "1.0.21" | |||
const formatVersion = "1.0.23" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mathematics has changed since I left the school 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha! Well, there are two bumps to this number in this PR :)
@kata-containers/runtime - ping y'all! |
Related: #1352. |
@jodh-intel ping this needs rebase |
19bd0ac
to
33eb4fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of questions, but overall the changes look good to me. Thanks @jodh-intel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
vm templating test is consistently failing:
http://jenkins.katacontainers.io/job/kata-containers-runtime-ubuntu-18-04-PR-initrd/1799/console |
@chavafg could you open and issue? |
Make `newAgentConfig()` return an explicit error rather than handling the error scenario by simply returning the `error` object in the `interface{}` return type. The old behaviour was confusing and inconsistent with the other functions creating a new config type (shim, proxy, etc). Signed-off-by: James O. D. Hunt <[email protected]>
Refactor a common error into a new standard error object. Signed-off-by: James O. D. Hunt <[email protected]>
Removed the unused `KataShimConfig` type and updated an error message that incorrectly mentioned it. Signed-off-by: James O. D. Hunt <[email protected]>
Replace the two versions of `makeRuntimeConfigFileData()` with a single `MakeRuntimeConfigFileData()` in a new `katatestutils` package and a new `katautils.GetDefaultHypervisorConfig()` to query the default hypervisor details. This isn't ideal but a new package had to be created to avoid circular dependencies. It was also required since test code cannot be exported from a package. Signed-off-by: James O. D. Hunt <[email protected]>
Previously, the agent behaviour was controlled entirely using the `kernel_params=` config option. This mechanism suffers from a subtle problem - the runtime is not aware of how the agent will behave. From now on, all significant agent options will be controlled from the agent section in the configuration file. This allows the runtime to be more aware of -- and in control of -- such agent settings. It would also allow the underlying kernel CLI options to be modified in the future if required. This PR adds the only useful agent option as an explicit option by adding an `enable_debug=true` option to the Kata agent section in `configuration.toml`. This allows controlling agent debug to be handled in the same manner as the other debug options. This change is somewhat foundational: it permits the agent to be handled consistently with other config file sections which is useful, but arguably not essential (the old way worked). However, the new way of handling agent options will be essential when introducing agent tracing control as the runtime must be aware of the agent trace mode to allow the runtime to modify its behaviour accordingly. Signed-off-by: James O. D. Hunt <[email protected]>
Updated the agent vendoring for `StartTracing` and `StopTracing`. This only changed a single file - the auto-generated gRPC protocol buffer file. This change resolves four vendoring issues: - The github.com/kubernetes-incubator/cri-o project was renamed to github.com/cri-o/cri-o. Although github redirects, `dep` complains that it cannot find the old `github.com/kubernetes-incubator/cri-o` files under `vendor/` so remove the old config, relying on the existing (and in other respects identical) `dep` config. - There was a stale dependency on `github.com/clearcontainers/proxy` which should have been removed when the Clear Containers code was excised. - The latest version of the agent code vendored into the runtime prior to this commit was a merge commit (commit `48dd1c031530fce9bf16b0f6a7305979cedd8fc9`). This somehow confused `dep` which did *not* correctly pull in the latest version of the auto-generated gRPC code (`vendor/github.com/kata-containers/agent/protocols/grpc/agent.pb.go`). This is clear because commit `48dd1c031530fce9bf16b0f6a7305979cedd8fc9` is newer than the agent commit that introduced the `StartTracing` and `StopTracing` APIs (`00cf907afcb7c8e56f077cf45ae3615f612fdc9d`). Resolving the other two issues above seems to have resolved this issue as the correct version of this file has now been included in the vendoring, however note there is no change to the `dep` files as this version of `agent.pb.go` should already have been included (!) - Updating `agent.pb.go` also removed the `AddInterface` and `RemoveInterface` API calls which should again also have been removed already. Updated tests to remove these redundant calls. Signed-off-by: James O. D. Hunt <[email protected]>
Add configuration options to support the various Kata agent tracing modes and types. See the comments in the built configuration files for details: - `cli/config/configuration-fc.toml` - `cli/config/configuration-qemu.toml` Fixes kata-containers#1369. Signed-off-by: James O. D. Hunt <[email protected]>
33eb4fb
to
ed64240
Compare
/retest |
Branch updated yet again. Let's try to get this landed before it goes stale again folks... Note too the mess I've tried to resolve on the vendor commit (b573d9b). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this lgtm
let's try and get this landed this time...
Only thing we are waiting for is the rhel-7 CI .... |
afaict, the rhel7 CI has passed: http://jenkins.katacontainers.io/job/kata-containers-agent-rhel-7-PR/1/ but not reported status back here. That may be because that looks like a new jenkins job - and we may have to enable some hook/auth on the github end. @chavafg - can you see/confirm if that is the case. |
@grahamwhaley yeah, this is job landed yesterday I think. I see some jobs on the queue waiting for some hours, so not sure what is happening, in the meantime feel free to merge without rhel reporting back. |
Thanks @chavafg - merging!! |
@grahamwhaley and @chavafg , yesterday I saw that the agent node of jenkins was having in trying to launch the VMs from rhel after killing the nodes, I saw that the VMs start to run without issues. I review the configuration and everything was ok, not sure why Jenkins could not launch the VM. |
Thanks @GabyCT - let's keep an eye open for it happening again |
Errr. Something isn't right here. I triggered these builds almost 7 hours ago, and they are still "running"?!? sles 12 certainly shouldn't be running as it fell over:
|
Update the `check_vendor()` function in the static check script to run our current vendoring tool (`dep`) in check mode. This will ensure our vendoring does not get silently broken [1]. Fixes kata-containers#1506. [1] - See for example kata-containers/runtime#1442. Signed-off-by: James O. D. Hunt <[email protected]>
Update the `check_vendor()` function in the static check script to run our current vendoring tool (`dep`) in check mode. This will ensure our vendoring does not get silently broken [1]. Fixes kata-containers#1506. [1] - See for example kata-containers/runtime#1442. Signed-off-by: James O. D. Hunt <[email protected]> (cherry picked from commit 830983f) Signed-off-by: Ganesh Maharaj Mahalingam <[email protected]>
Update the `check_vendor()` function in the static check script to run our current vendoring tool (`dep`) in check mode. This will ensure our vendoring does not get silently broken [1]. Fixes kata-containers#1506. [1] - See for example kata-containers/runtime#1442. Signed-off-by: James O. D. Hunt <[email protected]> (cherry picked from commit 830983f) Signed-off-by: Ganesh Maharaj Mahalingam <[email protected]>
Update the `check_vendor()` function in the static check script to run our current vendoring tool (`dep`) in check mode. This will ensure our vendoring does not get silently broken [1]. Fixes kata-containers#1506. [1] - See for example kata-containers/runtime#1442. Signed-off-by: James O. D. Hunt <[email protected]>
Update the `check_vendor()` function in the static check script to run our current vendoring tool (`dep`) in check mode. This will ensure our vendoring does not get silently broken [1]. Fixes kata-containers#1506. [1] - See for example kata-containers/runtime#1442. Signed-off-by: James O. D. Hunt <[email protected]> (cherry picked from commit 830983f) Signed-off-by: Ganesh Maharaj Mahalingam <[email protected]>
Add configuration options to support the various Kata agent tracing modes and types. See the comments in the built configuration files for details:
cli/config/configuration-fc.toml
cli/config/configuration-qemu.toml
Fixes #1369.
Signed-off-by: James O. D. Hunt [email protected]