fix(libsinsp): enable metrics collector on all platforms #1870

Draft
wants to merge 1 commit into
base: master

mrgian
Contributor

@mrgian mrgian commented May 16, 2024

What type of PR is this?

Uncomment one (or more) /kind <> lines:

/kind bug

/kind cleanup

/kind design

/kind documentation

/kind failing-test

/kind feature

Any specific area of the project related to this PR?

Uncomment one (or more) /area <> lines:

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap-engine-bpf

/area libscap-engine-gvisor

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap-engine-nodriver

/area libscap-engine-noop

/area libscap-engine-source-plugin

/area libscap-engine-savefile

/area libscap

/area libpman

/area libsinsp

/area tests

/area proposals

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

fix(libsinsp): enable metrics collector on all platforms

Contributor

@FedeDP FedeDP left a comment

I was also wondering whether we should tie the available sinsp_stats_v2_collectors to build options such as MINIMAL_BUILD (for example, container-related ones will always be 0 on MINIMAL_BUILD builds).
This should be as simple as adding a compilation guard around the collector entries.
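
A minimal sketch of the compile-guard idea, under stated assumptions: `collector_entry`, `available_collectors`, and the entry names are hypothetical and are not the actual libsinsp definitions. Only the `MINIMAL_BUILD` macro comes from the comment above.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one counter exposed by the collector.
struct collector_entry {
	std::string name;
};

// Build the list of counters that make sense for this build flavour.
inline std::vector<collector_entry> available_collectors() {
	std::vector<collector_entry> entries = {
		{"n_threads"},
		{"n_fds"},
#ifndef MINIMAL_BUILD
		// Container-related counters would always be 0 on MINIMAL_BUILD
		// builds, so they are compiled out instead of being reported.
		{"n_containers"},
#endif
	};
	return entries;
}
```

With a guard like this, consumers never see an entry that cannot carry data on the current build, rather than receiving a counter that is permanently zero.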

@@ -274,9 +272,11 @@ class libs_metrics_collector
uint32_t m_metrics_flags = METRICS_V2_KERNEL_COUNTERS | METRICS_V2_LIBBPF_STATS | METRICS_V2_RESOURCE_UTILIZATION | METRICS_V2_STATE_COUNTERS | METRICS_V2_PLUGINS;
std::vector<metrics_v2> m_metrics;

#ifdef __linux__
Contributor

I think we might want to move these in the scap_platform vtable, likely as a struct scap_metrics_vtable (embedded in each scap_foo_platform), so that we could get platform-dependent metrics from the scap handle. Again, this might be an idea for a future refactor.
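
To make the vtable idea concrete, here is a rough sketch written as C-compatible C++. Everything in it is an assumption for illustration: the single `get_rss_kb` callback, the `fake_rss_kb` field, and the helpers `make_foo_platform` and `query_rss_kb` are hypothetical, and the layout does not mirror the real `scap_platform` structures.

```cpp
#include <cstdint>

// Hypothetical table of metric callbacks, in plain-C style.
struct scap_metrics_vtable {
	// Returns resident memory in KiB for the given platform.
	std::uint64_t (*get_rss_kb)(void* platform);
};

// Hypothetical platform that embeds the table.
struct scap_foo_platform {
	scap_metrics_vtable metrics;
	std::uint64_t fake_rss_kb;  // stand-in for real platform state
};

static std::uint64_t foo_get_rss_kb(void* platform) {
	return static_cast<scap_foo_platform*>(platform)->fake_rss_kb;
}

static scap_foo_platform make_foo_platform(std::uint64_t rss_kb) {
	return scap_foo_platform{{foo_get_rss_kb}, rss_kb};
}

// Caller side: a missing callback is treated as "no data" (0).
static std::uint64_t query_rss_kb(const scap_metrics_vtable& vt, void* platform) {
	return vt.get_rss_kb ? vt.get_rss_kb(platform) : 0;
}
```

The caller only needs the table and an opaque platform pointer, which is what would let the collector stay free of platform-specific code.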

Contributor

I haven't had time to check out this PR, but reading your comment @FedeDP, I would like that. Especially since, after the scap refactor, the CPU usage calculation is broken when there is only a plugin source, even on Linux: in that case we do not instantiate the agent info, which the CPU usage calculation relies on.

@mrgian mrgian changed the title [WIP] fix(libsinsp): enable metrics collector on all platforms fix(libsinsp): enable metrics collector on all platforms May 16, 2024
@mrgian mrgian marked this pull request as ready for review May 16, 2024 12:55
@poiana poiana requested a review from incertum May 16, 2024 12:56
@FedeDP
Contributor

FedeDP commented May 16, 2024

Since we don't need this for the next release, I'd put this in the
/milestone 0.18.0

@poiana poiana added this to the 0.18.0 milestone May 16, 2024
@mrgian
Contributor Author

mrgian commented May 16, 2024

I think we might want to move these in the scap_platform vtable, likely as a struct scap_metrics_vtable (embedded in each scap_foo_platform), so that we could get platform-dependent metrics from the scap handle. Again, this might be an idea for a future refactor.

Hey @FedeDP, makes sense!
Since you moved this to the next milestone and we are not in a hurry, I can take care of this :)

@incertum
Contributor

I think we might want to move these in the scap_platform vtable, likely as a struct scap_metrics_vtable (embedded in each scap_foo_platform), so that we could get platform-dependent metrics from the scap handle. Again, this might be an idea for a future refactor.

Hey @FedeDP, makes sense! Since you moved this to the next milestone and we are not in a hurry, I can take care of this :)

Added this as item to falcosecurity/falco#3194 (comment).
Just to reiterate: If we could fix the agent info initialization for Linux for the plugin platform (see falcosecurity/falco#2821) -- it would be fantastic. For macOS and Windows CPU utilization and memory usage calculation would need to be new code, not sure if truly needed, WDYT?

@FedeDP
Contributor

FedeDP commented May 17, 2024

If we could fix the agent info initialization for Linux for the plugin platform (see falcosecurity/falco#2821) -- it would be fantastic.

Agree!

For macOS and Windows CPU utilization and memory usage calculation would need to be new code, not sure if truly needed, WDYT?

I think it would be interesting to expose those metrics for macOS and Windows too, but yes, it's not high priority.

@incertum
Contributor

@mrgian hope all is well, just wanted to kindly check in and ask what our current plan is to get out of the regression in our scap platforms approach? (falcosecurity/falco#2821) If we can have a proper refactor -- amazing. Else I would also support something more intermediate to ensure the next Falco release does not have this regression anymore. CC @FedeDP @leogr

Thanks in advance!

@mrgian mrgian marked this pull request as ready for review July 17, 2024 08:52
@poiana poiana requested a review from LucaGuerra July 17, 2024 08:54
@incertum
Contributor

Hey @incertum For now I'm just moving linux-specific metrics collection logic to the scap_platform vtable. So that we can use the scap handle to gather platform-dependent metrics. This will make libs_metrics_collector platform agnostic. I'm not working on a proper refactor that will solve the regression, but if you have any idea for that please let me know!

Posted here falcosecurity/falco#2821 (comment)

@gnosek
Contributor

gnosek commented Jul 23, 2024

Seems like this one and #2821 are intertwined :)

@mrgian, please take a look at my comment #1969 (comment) for some ideas about the future direction of libscap/libsinsp and scap_platform. IMO, let's move stuff out of libscap, not into (especially here: libscap doesn't care one bit about these metrics, they're purely for libsinsp use).

If you agree with that, then I think it's not a good idea to add more stuff to scap_platform. Instead, we can make metrics_collector a virtual base class (this is effectively what a scap_platform is) and move the concrete implementation to e.g. userspace/libsinsp/linux/metrics_collector.cpp.

Then, we have two options for the consumers of the metrics:

  • provide a no-op userspace/libsinsp/generic/metrics_collector.cpp for other platforms so that we always have some metrics collector, or
  • add a #define (presumably via cmake) that says we do have a metrics_collector for this platform and #ifdef on that, rather than __linux__

(I'd rather go for the first option, personally)
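
The first option could look roughly like the sketch below. It is a single-file illustration under assumptions: the `metric` struct, the `snapshot()` method, the class names, and `make_metrics_collector` are hypothetical, and the `#ifdef __linux__` here only stands in for the build system choosing which `.cpp` file to compile.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical metric record; the real metrics_v2 struct has more fields.
struct metric {
	std::string name;
	std::uint64_t value;
};

// Platform-agnostic interface (would live in a shared header).
class metrics_collector {
public:
	virtual ~metrics_collector() = default;
	virtual std::vector<metric> snapshot() = 0;
};

// No-op fallback, e.g. userspace/libsinsp/generic/metrics_collector.cpp.
class generic_metrics_collector : public metrics_collector {
public:
	std::vector<metric> snapshot() override { return {}; }
};

#ifdef __linux__
// Concrete implementation, e.g. userspace/libsinsp/linux/metrics_collector.cpp.
class linux_metrics_collector : public metrics_collector {
public:
	std::vector<metric> snapshot() override {
		// Placeholder value; a real implementation would read system state.
		return {metric{"placeholder_metric", 0}};
	}
};
#endif

// In a real build, exactly one definition of this factory would be linked in.
inline std::unique_ptr<metrics_collector> make_metrics_collector() {
#ifdef __linux__
	return std::make_unique<linux_metrics_collector>();
#else
	return std::make_unique<generic_metrics_collector>();
#endif
}
```

Consumers call `make_metrics_collector()` unconditionally and always get a valid object, so they need no platform checks of their own.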

One thing to bikeshed would be the directory structure (it's trivial here, but it will set precedent for future per-platform components). I see two approaches:

libsinsp/linux/metrics_collector.cpp:

  • (good) we can build e.g. sinsp_linux.a from the whole libsinsp/linux directory, simplifying the build system a little
  • (bad) the API header would have to live directly in libsinsp/

libsinsp/metrics_collector/linux_metrics_collector.cpp:

  • (good) provides a nice place for a platform-agnostic header with the base class definition
  • (bad) platform-specific code is spread across directories, making it a bit less convenient to create common per-platform helpers (would have to live in something like libsinsp/linux/common.h)

I don't have a strong opinion on this either way tbh.

@mrgian
Contributor Author

mrgian commented Jul 23, 2024

Hey @gnosek,
I see now. I agree on keeping the metrics collection logic out of the scap_platform.
Also, scap_platform is plain C, which can make collecting other kinds of metrics harder.

Instead, we can make metrics_collector a virtual base class

If I'm not wrong, currently libs_resource_utilization (https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/metrics_collector.h#L271-L300) is the only class with linux-only code.
A similar solution would be making libs_resource_utilization a virtual class (with platform-specific implementations).

As you said, deciding on the directory naming will influence the development of future components, so I'll wait to know what the maintainers think.

@mrgian mrgian marked this pull request as draft July 23, 2024 09:57

github-actions bot commented Jul 23, 2024

Perf diff from master - unit tests

     5.36%     -1.51%  [.] gzfile_read
     1.57%     -0.99%  [.] sinsp_evt::get_direction
     6.78%     +0.92%  [.] sinsp::next
     3.79%     +0.87%  [.] sinsp_parser::process_event
     3.84%     +0.86%  [.] sinsp_evt::load_params
     2.82%     +0.63%  [.] sinsp_thread_manager::find_thread
     2.11%     -0.62%  [.] scap_event_decode_params
     1.53%     +0.55%  [.] std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release
     0.43%     +0.52%  [.] std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Identity, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, true, true> >::find
     9.66%     -0.51%  [.] sinsp_parser::reset

Heap diff from master - unit tests

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Benchmarks diff from master

Comparing gbench_data.json to /root/actions-runner/_work/libs/libs/build/gbench_data.json
Benchmark                                                         Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------
BM_sinsp_split_mean                                            -0.0255         -0.0255           152           148           152           148
BM_sinsp_split_median                                          -0.0268         -0.0267           152           148           152           148
BM_sinsp_split_stddev                                          -0.3974         -0.3975             1             1             1             1
BM_sinsp_split_cv                                              -0.3816         -0.3817             0             0             0             0
BM_sinsp_concatenate_paths_relative_path_mean                  +0.0274         +0.0274            60            61            60            61
BM_sinsp_concatenate_paths_relative_path_median                +0.0348         +0.0348            60            62            60            62
BM_sinsp_concatenate_paths_relative_path_stddev                -0.2858         -0.2862             1             0             1             0
BM_sinsp_concatenate_paths_relative_path_cv                    -0.3048         -0.3052             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_mean                     +0.0578         +0.0578            23            25            23            25
BM_sinsp_concatenate_paths_empty_path_median                   +0.0562         +0.0562            23            25            23            25
BM_sinsp_concatenate_paths_empty_path_stddev                   -0.2739         -0.2736             0             0             0             0
BM_sinsp_concatenate_paths_empty_path_cv                       -0.3136         -0.3133             0             0             0             0
BM_sinsp_concatenate_paths_absolute_path_mean                  +0.0006         +0.0006            62            62            62            62
BM_sinsp_concatenate_paths_absolute_path_median                +0.0066         +0.0066            62            62            62            62
BM_sinsp_concatenate_paths_absolute_path_stddev                +4.3443         +4.3339             0             1             0             1
BM_sinsp_concatenate_paths_absolute_path_cv                    +4.3411         +4.3306             0             0             0             0
BM_sinsp_split_container_image_mean                            -0.0294         -0.0294           414           402           414           402
BM_sinsp_split_container_image_median                          -0.0318         -0.0317           414           401           414           401
BM_sinsp_split_container_image_stddev                          +0.7112         +0.7145             2             3             2             3
BM_sinsp_split_container_image_cv                              +0.7630         +0.7664             0             0             0             0


codecov bot commented Jul 23, 2024

Codecov Report

Attention: Patch coverage is 92.66055% with 8 lines in your changes missing coverage. Please review.

Project coverage is 73.69%. Comparing base (6a0df22) to head (d0c2588).
Report is 65 commits behind head on master.

Files with missing lines                             Patch %   Lines
userspace/libsinsp/linux/resource_utilization.cpp    92.23%    8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1870      +/-   ##
==========================================
+ Coverage   73.57%   73.69%   +0.12%     
==========================================
  Files         253      255       +2     
  Lines       31875    31914      +39     
  Branches     5648     5619      -29     
==========================================
+ Hits        23452    23520      +68     
+ Misses       8408     8382      -26     
+ Partials       15       12       -3     
Flag        Coverage Δ
libsinsp    73.69% <92.66%> (+0.12%) ⬆️


@incertum
Contributor

If I'm not wrong, currently libs_resource_utilization (https://github.com/falcosecurity/libs/blob/master/userspace/libsinsp/metrics_collector.h#L271-L300) is the only class with linux-only code.

Confirmed.

As you said, deciding on the directory naming will influence the development of future components, so I'll wait to know what the maintainers think.

Also don't have any preference. Maybe go with what @gnosek deems slightly better, because Grzeg has been around the block for some time and I get all the callouts. The ifdefs were a good solution to get these metrics going. Now we can finally get it right. By now 4+ folks already refactored the libs metrics collector, so there is hope that we will stabilize that code at some point 🙃 .

@FedeDP
Contributor

FedeDP commented Aug 27, 2024

Any news on this @mrgian ?

@mrgian
Contributor Author

mrgian commented Aug 27, 2024

Hey @FedeDP,
Not yet!
We decided to refactor this again :( and I'm currently busy with other tasks.
So I don't think this will make it into the next release, but I will start working on it as soon as I can.

@FedeDP
Contributor

FedeDP commented Aug 27, 2024

Ok! Moving to next milestone then :)
/milestone 0.19.0

@poiana poiana modified the milestones: 0.18.0, 0.19.0 Aug 27, 2024
@mrgian mrgian force-pushed the plugin-api-metrics-win-test branch 4 times, most recently from b8b3ee8 to 9938e53 Compare October 8, 2024 15:29
@mrgian mrgian force-pushed the plugin-api-metrics-win-test branch from 9938e53 to d0c2588 Compare October 10, 2024 14:44
@mrgian
Contributor Author

mrgian commented Oct 10, 2024

Hey @incertum @gnosek,
I moved all the Linux-specific code into linux/resource_utilization.cpp.
When compiled on a non-Linux platform, instead of linux_resource_utilization we use a generic libs_metrics, whose to_metrics() returns an empty metrics vector.
This allows us to use the metrics collector on all platforms.
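
A hedged sketch of why an empty `to_metrics()` is enough for the collector to work everywhere. The names `libs_metrics` and `to_metrics()` come from the comment above; the `metric` struct, `fixed_metrics`, and the `collector` class are hypothetical stand-ins, not the code in this PR.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical metric record used only for this illustration.
struct metric {
	std::string name;
	std::uint64_t value;
};

// Generic source: contributes nothing, so it is safe on any platform.
class libs_metrics {
public:
	virtual ~libs_metrics() = default;
	virtual std::vector<metric> to_metrics() { return {}; }
};

// Hypothetical source that always has one value to report.
class fixed_metrics : public libs_metrics {
public:
	std::vector<metric> to_metrics() override {
		return {metric{"example_counter", 7}};
	}
};

// Hypothetical collector: concatenates whatever each source returns.
class collector {
public:
	void add(std::unique_ptr<libs_metrics> src) {
		m_sources.push_back(std::move(src));
	}

	std::vector<metric> snapshot() {
		std::vector<metric> out;
		for(auto& src : m_sources) {
			auto part = src->to_metrics();
			out.insert(out.end(), part.begin(), part.end());
		}
		return out;
	}

private:
	std::vector<std::unique_ptr<libs_metrics>> m_sources;
};
```

Because the generic source yields an empty vector, the aggregation loop needs no platform branches: on non-Linux builds the resource utilization slot simply adds nothing to the snapshot.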

WDYT?

@FedeDP
Contributor

FedeDP commented Nov 13, 2024

/cc @gnosek

/milestone 0.20.0

@poiana poiana requested a review from gnosek November 13, 2024 08:59
@poiana poiana modified the milestones: 0.19.0, 0.20.0 Nov 13, 2024
5 participants