Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oss merge libs 0.10.1 #874

Closed

Conversation

greyhame-s
Copy link
Contributor

First attempt at 0.10.1 merge from upstream. I also specifically picked up Greg's completed fix for adding euid to execve exit events. I had to make a few agent changes to get the build to succeed, so that will be a separate PR that will need to go in soon after this one.

fixup commits

b562ea53 HEAD@{16}: commit: fixup! struct definitions moved by 53aad03
fac67317 HEAD@{18}: rebase (continue): !fixup - removed dead code, or code that will be added in a subsequent cherry pick
87c6c2fc HEAD@{34}: commit: fixup! Merge with sprintf - snprintf cleanup from 08f078c
c4d7129f HEAD@{37}: commit: fixup! Allow enabling/disabling individual container engines on startup (docker windows support dropped by 07acb8c9)
c523c796 HEAD@{39}: commit: fixup! Make sinsp remove_inactive_threads() method public (simple merge against 2f235d7)
b54e415c HEAD@{43}: commit: fixup! Add procfs_utils.ut.cpp to the test binary (simple merge against 4b0cf1e)
75e96eb8 HEAD@{48}: commit: fixup! Enhancements to initial scan of /proc, for supportability (Joe, your old changes are split across 3 linux specific files now. Greg, please check that 08fd40e and 0fedd76 are merged properly)
d7a5fe84 HEAD@{50}: commit: fixup! Fill thread loginuid with default value -1, if /proc loginuid is unavailable (Joe, please check if your changes were correctly moved to the new linux/scap_procs.c file)

Andreagit97 and others added 30 commits December 15, 2022 17:11
This change increases the number of retries to retrieve container
information from CRI API from 3 to 5, as several failures were observed
with the maximum number of attempts set to 3.

Signed-off-by: Iacopo Rozzo <[email protected]>
Co-authored-by: Andrea Terzolo <[email protected]>
Signed-off-by: Melissa Kilby <[email protected]>
This change makes sure that 5 maximum retries to retrieve container information
are used with CRI only. It puts back the number of retries to 3 for all the
other container runtimes.
It also adjusts the maximum time to complete all their attempts to take into
account the increased of retries.

Signed-off-by: Iacopo Rozzo <[email protected]>
Signed-off-by: Andrea Terzolo <[email protected]>
Co-authored-by: Hendrik Brueckner <[email protected]>
Co-authored-by: Mauro Ezequiel Moltrasio <[email protected]>
Signed-off-by: Andrea Terzolo <[email protected]>
Co-authored-by: Hendrik Brueckner <[email protected]>
Signed-off-by: Andrea Terzolo <[email protected]>
Co-authored-by: Hendrik Brueckner <[email protected]>
greyhame-s and others added 17 commits February 9, 2023 10:27
Have sinsp_container_lookup with what was sinsp_container_lookup_state inside.
Also introduce convenience methods.

Signed-off-by: Angelo Puglisi <[email protected]>
This is still used in analyzer_thread.cpp so keep it in our fork.
* fix(container_engine): Only return on success or all retries failed

Instead of always returning a result on the first attempt, only return
results on success or when all retries have failed.

This prevents spurious "container" events for incomplete results.

This is especially important when both docker and cri are enabled,
when both must be tried due to the cgroup pattern overlapping, but
only one actually holds the container.

Signed-off-by: Mark Stemm <[email protected]>

* Log a warning when empty container infos are returned

When empty container infos are passed up due to all attempts failing,
log a warining. This will help highlight cases when the communication
with the container runtime isn't working properly.

Signed-off-by: Mark Stemm <[email protected]>

* Add debug log to note when a lookup is async or sync

The "async_xxx" refers to the code that performs the lookup (we used
to have a separate "docker" engine, but it's been removed.

To make it more clear about whether a lookup is synchronous or
asynchronous, add a debug log.

Signed-off-by: Mark Stemm <[email protected]>

* Use bundled valijson for "regular" build

valijson doesn't really have an ubuntu package, so it can't be
preinstalled. Use the bundled valijson instead.

* Add RE2 to container used for builds + tests

This way it will be present when building with
-DUSE_BUNDLED_DEPS=False

Signed-off-by: Mark Stemm <[email protected]>
This reverts commit 35d80de.

It was probably causing some container runtime tests to fail.
Userspace workaround for Linux kernel behaviors on ARM and zLinux,
was not fully effective, and has since been obviated by kernel driver/eBPF
probe logic to generate missing scap events by other means.  So this
changeset removes the userspace workaround.
* Revert "Revert "Merge upstream pr 688 (#121)" (#122)"

This reverts commit c8dbbf3.

This adds the fix back. I'll test with an agent PR that
updates/removes the tests.

* Add the ability to "defer" an async lookup

In some cases, the "server" code running run_impl might want to retry
its work until later. The current version can't do that--once a key is
dequeued using deque_next_key, it has to call store_value or lose the
request.

To make retries easier, add a method defer_lookup that pushes the
key (and optional value) back onto the request queue with a
configurable delay. After delay, the key will be pulled again with a
call to dequeue_next_key().

Signed-off-by: Mark Stemm <[email protected]>

* Use defer_lookup for container info retry instead of lookup_delayed

When the container async lookup class wants to retry a lookup, the
current version tries to use lookup_delayed to initiate a new request.

It turns out that that doesn't work--if there's already an existing
request in m_value_map, it assumes that the "server" doing run_impl
will eventually return an answer, and doesn't add a request to the
queue.

The solution is to use the newly added lookup_delayed instead, which
pushes the request back onto the queue with a short delay.

Signed-off-by: Mark Stemm <[email protected]>

* Use a separate max_wait_ms instead of re-using s_cri_timeout

Now that timeouts are working, it may take several seconds for
subsequent retries to complete. However, s_cri_timeout (typically 1
second) was being used for the max_wait_ms in cri_async_source. That
would mean that a lookup would expire before the server side had
retried the lookup.

The solution is to use a separate 10 second max_wait_ms, which matches
docker.

Signed-off-by: Mark Stemm <[email protected]>

Signed-off-by: Mark Stemm <[email protected]>
It isn't on Windows.

Signed-off-by: Grzegorz Nosek <[email protected]>
The function sinsp_thread_manager::reinit_thread_from_proc() was added
to draios/agent-libs as part of a now-obsolete workaround for an
ARM/zLinux platform bug.  That workaround has been removed, so now
we need to remove this no-longer-used function from sinsp_thread_manager.
In some cases, we want to ensure that a visitor does *not* change the
ast. This includes cases where the ast pointer used by the visitor is
read-only.

To support these use cases, add a const_expr_visitor interface where
all the visit() methods take a const argument.

Also add variants of accept() that take const_expr_visitor arguments
and call the const_expr_visitors visit() method.

Compiling, cloning, and stringifying asts are all cases that should
not change the underlying ast, so switch those to use
const_expr_visitor instead of expr_visitor.

A couple of compile private methods had to be changed to take const
arguments. They already didn't modify those arguments, so it was a
safe change.

Signed-off-by: Mark Stemm <[email protected]>

Signed-off-by: Mark Stemm <[email protected]>
We need them when building with hayabusa
We can't prevent losing setuid events completely and the uid
is pretty important for some execve-related rules, so explicitly
pass the uid in execve/at exit events

Signed-off-by: Grzegorz Nosek <[email protected]>
Co-authored-by: Angelo Puglisi <[email protected]>
Co-authored-by: Andrea Terzolo <[email protected]>
@poiana
Copy link
Contributor

poiana commented Feb 10, 2023

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@poiana
Copy link
Contributor

poiana commented Feb 10, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: greyhame-s
Once this PR has been reviewed and has the lgtm label, please assign incertum for approval by writing /assign @incertum in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana
Copy link
Contributor

poiana commented Feb 10, 2023

Thanks for your pull request. Before we can look at it, you'll need to add a 'DCO signoff' to your commits.

📝 Please follow instructions in the contributing guide to update your commits with the DCO

Full details of the Developer Certificate of Origin can be found at developercertificate.org.

The list of commits missing DCO signoff:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@greyhame-s greyhame-s closed this Feb 10, 2023
@jasondellaluce jasondellaluce deleted the oss-merge-libs-0.10.1-2023-02-02 branch February 10, 2023 10:45
@jasondellaluce jasondellaluce restored the oss-merge-libs-0.10.1-2023-02-02 branch February 10, 2023 10:46
@greyhame-s greyhame-s deleted the oss-merge-libs-0.10.1-2023-02-02 branch March 24, 2023 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.