
update: in nodriver mode, avoid loading proc, users and interfaces related information #122

Conversation

FedeDP
Contributor

@FedeDP FedeDP commented Nov 9, 2021

Signed-off-by: Federico Di Pierro [email protected]

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind cleanup

Any specific area of the project related to this PR?

/area libscap

/area libsinsp

What this PR does / why we need it:

In nodriver mode, skip loading proc, users and interfaces related information, as all event sources will be system-external.

Which issue(s) this PR fixes:

Possibly: falcosecurity/falco#1757

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@poiana
Contributor

poiana commented Nov 9, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: FedeDP
To complete the pull request process, please assign gnosek after the PR has been reviewed.
You can assign the PR to them by writing /assign @gnosek in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana
Contributor

poiana commented Nov 9, 2021

Hi @FedeDP. Thanks for your PR.

I'm waiting for a falcosecurity member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@poiana poiana added the size/M label Nov 9, 2021
@leogr
Member

leogr commented Nov 10, 2021

/ok-to-test

Contributor

@mstemm mstemm left a comment

Don't you still want the initial process/user/interface lists in nodriver mode? I'm not sure I understand why we would not want them.

@FedeDP
Contributor Author

FedeDP commented Nov 11, 2021

Don't you still want the initial process/user/interface lists in nodriver mode? I'm not sure I understand why we would not want them.

What's the use for that? I mean, is there any particular use case where we want that in nodriver mode?

EDIT: surely I could be missing a piece there :)

… loading proc, users and interfaces related information.

Signed-off-by: Federico Di Pierro <[email protected]>
@FedeDP FedeDP force-pushed the nodriver_no_scan_procs_users_interfaces branch from 11f87d3 to a70dd4a on November 12, 2021 16:13
@mstemm
Contributor

mstemm commented Nov 12, 2021

I thought nodriver mode still attempted to get a reconstructed view of the world and a minimal set of events that could be written to a capture file. (I don't remember the details). Do you know?

@FedeDP
Contributor Author

FedeDP commented Nov 15, 2021

I thought nodriver mode still attempted to get a reconstructed view of the world and a minimal set of events that could be written to a capture file. (I don't remember the details). Do you know?

As nodriver mode has no events from syscalls, there is no real "world's view" in that sense.
We always dump:

  • scap_write_machine_info
  • scap_write_iflist
  • scap_write_userlist
  • scap_write_proclist
  • scap_write_fdlist

when opening the scap file. With this change, all of these will be empty (except possibly machine_info).
Note, moreover, that these scap_write_ functions already have a

//
// No machine info on disk if the source is a plugin
//
if(handle->m_mode == SCAP_MODE_PLUGIN)
{
	return SCAP_SUCCESS;
}

I.e., we already avoid dumping the initial world view for plugins; this is what we need to do not only for plugins, but whenever the internal event source (i.e. syscalls) is disabled.
I may rearrange the code with a more explicit check (e.g. adding a helper method on scap, is_system_event_source_enabled(), or something similar).
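For illustration only, here is a rough sketch of how such a helper could generalize the existing plugin-mode check; apart from SCAP_MODE_PLUGIN, handle->m_mode and the proposed is_system_event_source_enabled() name, every identifier below is an assumption rather than actual libscap code:

// Hypothetical sketch, not the actual libscap implementation.
// The internal event source (syscalls) is assumed to be off both in
// plugin mode and in nodriver mode.
static bool is_system_event_source_enabled(scap_t* handle)
{
	return handle->m_mode != SCAP_MODE_PLUGIN &&
	       handle->m_mode != SCAP_MODE_NODRIVER;
}

// Each scap_write_* dumper could then bail out with the same check:
if(!is_system_event_source_enabled(handle))
{
	return SCAP_SUCCESS;
}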

@mstemm
Contributor

mstemm commented Nov 15, 2021

One big difference between nodriver mode and plugin mode is that when in plugin mode, the only events and state are within the plugin. All of libsinsp is effectively disabled.

That's not true for nodriver mode, where the goal is to still obtain process and thread level state of the system.

Let me get more familiar with nodriver mode and then comment more.

@mstemm
Contributor

mstemm commented Nov 16, 2021

I double-checked and nodriver mode depends on reading this info from /proc, so I think that if the goal is to come up with a "light" mode that allows only k8s audit logs and no syscalls, we should come up with another mode/solution, perhaps at the Falco level instead of the libs level, to fix this.

@FedeDP
Contributor Author

FedeDP commented Nov 16, 2021

I double-checked and nodriver mode depends on reading this info from /proc, so I think that if the goal is to come up with a "light" mode that allows only k8s audit logs and no syscalls, we should come up with another mode/solution, perhaps at the Falco level instead of the libs level, to fix this.

I think that when (if?) k8s audit logs become a plugin, the issue will be fixed in any case; it's OK for me to close this PR and wait for a proper plugin implementation.

@FedeDP
Contributor Author

FedeDP commented Nov 22, 2021

Closing because nodriver mode still needs a system view.
The issue itself (falcosecurity/falco#1757) will be fixed once k8s audit log becomes a plugin.

@FedeDP FedeDP closed this Nov 22, 2021
leogr pushed a commit to leogr/libs that referenced this pull request Jan 5, 2023
This reverts commit 35d80de.

It was probably causing some container runtime tests to fail.
leogr pushed a commit to leogr/libs that referenced this pull request Jan 5, 2023
* Revert "Revert "Merge upstream pr 688 (falcosecurity#121)" (falcosecurity#122)"

This reverts commit c8dbbf3.

This adds the fix back. I'll test with an agent PR that
updates/removes the tests.

* Add the ability to "defer" an async lookup

In some cases, the "server" code running run_impl might want to retry its work later. The current version can't do that: once a key is dequeued using dequeue_next_key, it has to call store_value or lose the request.

To make retries easier, add a method defer_lookup that pushes the key (and optional value) back onto the request queue with a configurable delay. After the delay, the key will be pulled again by a call to dequeue_next_key().

Signed-off-by: Mark Stemm <[email protected]>
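To make the mechanism concrete, the following is a minimal, self-contained sketch of a deferred-retry queue along these lines; it is not the libsinsp async_key_value_source implementation, and every type and member name below is an assumption made for illustration:

// Standalone sketch only; NOT the real async_key_value_source. All names
// below (retryable_lookup_queue, deferred_request, etc.) are made up.
#include <chrono>
#include <deque>
#include <mutex>
#include <optional>
#include <string>

struct deferred_request
{
	std::string key;
	std::optional<std::string> value;
	std::chrono::steady_clock::time_point available_after;
};

class retryable_lookup_queue
{
public:
	// defer_lookup: push the key (and optional value) back onto the
	// queue; dequeue_next_key() will return it again once `delay` has
	// elapsed.
	void defer_lookup(const std::string& key,
	                  const std::optional<std::string>& value,
	                  std::chrono::milliseconds delay)
	{
		std::lock_guard<std::mutex> guard(m_mutex);
		m_queue.push_back({key, value,
		                   std::chrono::steady_clock::now() + delay});
	}

	// dequeue_next_key: return the next key whose delay has elapsed,
	// or std::nullopt if nothing is ready yet.
	std::optional<std::string> dequeue_next_key()
	{
		std::lock_guard<std::mutex> guard(m_mutex);
		const auto now = std::chrono::steady_clock::now();
		for(auto it = m_queue.begin(); it != m_queue.end(); ++it)
		{
			if(it->available_after <= now)
			{
				std::string key = it->key;
				m_queue.erase(it);
				return key;
			}
		}
		return std::nullopt;
	}

private:
	std::mutex m_mutex;
	std::deque<deferred_request> m_queue;
};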

* Use defer_lookup for container info retry instead of lookup_delayed

When the container async lookup class wants to retry a lookup, the current version tries to use lookup_delayed to initiate a new request.

It turns out that this doesn't work: if there's already an existing request in m_value_map, it assumes that the "server" doing run_impl will eventually return an answer, and doesn't add a request to the queue.

The solution is to use the newly added defer_lookup instead, which pushes the request back onto the queue with a short delay.

Signed-off-by: Mark Stemm <[email protected]>
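As a hypothetical usage pattern for that retry path, reusing the retryable_lookup_queue sketch above: the do_lookup and store_value callbacks and the 125 ms delay are illustrative assumptions, not values taken from the real cri_async_source:

// Hypothetical "server" loop (cf. run_impl), built on the sketch above.
// do_lookup and store_value stand in for the real CRI query and result
// delivery; both are assumptions for illustration purposes.
#include <functional>

void serve_lookups(
        retryable_lookup_queue& queue,
        const std::function<std::optional<std::string>(const std::string&)>& do_lookup,
        const std::function<void(const std::string&, const std::string&)>& store_value)
{
	while(auto key = queue.dequeue_next_key())
	{
		if(auto info = do_lookup(*key))
		{
			// The runtime answered: hand the result back.
			store_value(*key, *info);
		}
		else
		{
			// Not ready yet: re-enqueue with a short delay instead
			// of dropping the request.
			queue.defer_lookup(*key, std::nullopt,
			                   std::chrono::milliseconds(125));
		}
	}
}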

* Use a separate max_wait_ms instead of re-using s_cri_timeout

Now that timeouts are working, it may take several seconds for subsequent retries to complete. However, s_cri_timeout (typically 1 second) was being used as the max_wait_ms in cri_async_source. That would mean that a lookup would expire before the server side had retried it.

The solution is to use a separate 10-second max_wait_ms, which matches Docker.

Signed-off-by: Mark Stemm <[email protected]>

@poiana poiana mentioned this pull request Feb 10, 2023