Factorize queue code duplicated at two places #15098

Merged
merged 2 commits into from
Jan 17, 2023

L3n41c
Member

@L3n41c L3n41c commented Jan 17, 2023

What does this PR do?

  • Factorize the aggregating queue component which was duplicated in the container image metadata and the SBOM checks.
  • Use a time mocking library to make the test more reliable.
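As a rough illustration of the two points above, here is a minimal sketch of an aggregating queue whose timer is injectable, so a test can fire it by hand instead of sleeping. This is not the code from the PR: `newQueue`, `afterFunc`, `manualTimer`, and `demo` are hypothetical names, and the hand-rolled `manualTimer` merely stands in for whichever time mocking library the PR uses.

```go
package main

import (
	"fmt"
	"time"
)

// afterFunc matches the signature of time.After so a test can swap in a fake.
type afterFunc func(time.Duration) <-chan time.Time

// newQueue (hypothetical name) batches items sent on the returned channel and
// hands each batch to flush when it reaches maxItems, or when the retention
// timer armed by the first item of the batch fires.
func newQueue[T any](maxItems int, maxRetention time.Duration, after afterFunc, flush func([]T)) chan<- T {
	in := make(chan T)
	go func() {
		var batch []T
		var timeout <-chan time.Time // nil while the batch is empty, so it never fires
		doFlush := func() {
			flush(batch)
			batch, timeout = nil, nil
		}
		for {
			select {
			case item, ok := <-in:
				if !ok { // channel closed: flush what is left and stop
					if len(batch) > 0 {
						doFlush()
					}
					return
				}
				if len(batch) == 0 {
					timeout = after(maxRetention)
				}
				batch = append(batch, item)
				if len(batch) >= maxItems {
					doFlush()
				}
			case <-timeout:
				doFlush()
			}
		}
	}()
	return in
}

// manualTimer is a hand-rolled stand-in for a time mocking library: the timer
// only fires when the test calls fire.
type manualTimer struct{ c chan time.Time }

func (m *manualTimer) after(time.Duration) <-chan time.Time { return m.c }
func (m *manualTimer) fire()                                { m.c <- time.Time{} }

// demo sends four items: three fill a batch, the fourth is flushed by the timer.
func demo() [][]int {
	timer := &manualTimer{c: make(chan time.Time)}
	flushed := make(chan []int, 2)
	q := newQueue[int](3, time.Minute, timer.after, func(b []int) { flushed <- b })
	q <- 1
	q <- 2
	q <- 3
	first := <-flushed // flushed because the batch is full
	q <- 4
	timer.fire() // no real waiting: the test decides when the timeout elapses
	second := <-flushed
	close(q)
	return [][]int{first, second}
}

func main() {
	fmt.Println(demo()) // prints [[1 2 3] [4]]
}
```

Because the timer channel is unbuffered, `fire` only returns once the queue goroutine has observed the timeout, which is what keeps the sketch free of sleeps and timing races.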

Motivation

  • Code duplication is bad.
  • Flaky tests are bad.

Additional Notes

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

This change will be validated at the same time as the container image metadata (#14567) and SBOM (#14989) checks.

Reviewer's Checklist

  • If known, an appropriate milestone has been selected; otherwise the Triage milestone is set.
  • Use the major_change label if your change has a major impact on the code base, impacts multiple teams, or changes important well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer-facing change, use a releasenote.
  • A release note has been added or the changelog/no-changelog label has been applied.
  • Changed code has automated tests for its functionality.
  • Adequate QA/testing plan information is provided if the qa/skip-qa label is not applied.
  • At least one team/.. label has been applied, indicating the team(s) that should QA this change.
  • If applicable, docs team has been notified or an issue has been opened on the documentation repo.
  • If applicable, the need-change/operator and need-change/helm labels have been applied.
  • If applicable, the k8s/<min-version> label has been applied, indicating the lowest Kubernetes version compatible with this feature.
  • If applicable, the config template has been updated.

@L3n41c L3n41c added the team/containers, changelog/no-changelog, and "[deprecated] qa/skip-qa - use other qa/ labels" ([DEPRECATED] Please use qa/done or qa/no-code-change to skip creating a QA card) labels Jan 17, 2023
@L3n41c L3n41c added this to the 7.43.0 milestone Jan 17, 2023
@L3n41c L3n41c requested review from a team as code owners January 17, 2023 11:11
@L3n41c L3n41c mentioned this pull request Jan 17, 2023
10 tasks
Contributor

@clamoriniere clamoriniere left a comment


🔥

Contributor

@ogaca-dd ogaca-dd left a comment


LGTM 👍

@L3n41c L3n41c merged commit bb1fc00 into main Jan 17, 2023
@L3n41c L3n41c deleted the lenaic/generic_queue branch January 17, 2023 15:41
guyarb added a commit that referenced this pull request Jan 18, 2023
* [fargate] Make hostname resolution more reliable (#14746)

* [config/environment] Check AWS_EXECUTION_ENV in Fargate detection

* [util/fargate] Rely on features for ECS Fargate detection

* [fargate/detection] Rely on features to detect EKS

* [trace-agent/config] Call fargate.GetOrchestrator after loading config

* add unit-test for trace-agent config on fargate

* Add release note

* [cmd/trace-agent/config] Fix TestFargateConfig in macOS

Co-authored-by: Cedric Lamoriniere <[email protected]>

* 7.41.0 CHANGELOG (#14675) (#14745)

* Updated Python to 3.8.16

* CWS: sync BTFhub constants (#14804)

Co-authored-by: paulcacheux <[email protected]>

* [CSPM] respect verbose on compliance check cli cmd (#14750)

* CODEOWNERS: splitting files so USM can own its own files (#14789)

* config: test: Removed duplicated test (#14705)

* Running dockers in the kitchen test (#14589)

* ci: kitchen: Allow running dockers in kitchen test, and extend the filesystem

The PR introduces a way to run external dockers in the kitchen tests without pulling them
on the kitchen machines. As we cannot authenticate to dockerhub from the kitchen machines,
we work around that by pulling and saving the dockers in gitlab, uploading them to the
remote machine using kitchen, and then loading them on the remote machine so they are
available for usage.

In the PR we added steps to install docker and docker compose on the kitchen machines.

The PR introduces an example test that runs dockers.

During the PR we faced the problem of "no space left on the device", to solve those errors
we have to extend the filesystem of the remote machines.

* Fixed cr comments

* Debugging the artifacts

* Debugging the artifacts

* Debugging the artifacts

* Debugging the artifacts

* revert artifacts

* Giving another try to dependencies

* Fixed path

* Fixed CR comment

* [CWS] Add tests for activity dump processes content (#14708)

* [CWS] Add two checks to avoid adding nodes with abnormal paths in activity dumps (#14698)

* [gitlab] Repack macOS JUnit tarball to include correct name and job URL (#14793)

* Bump golang.org/x/tools from 0.3.0 to 0.4.0 in /pkg/security/secl (#14710)

* Bump golang.org/x/tools from 0.3.0 to 0.4.0 in /pkg/security/secl

Bumps [golang.org/x/tools](https://github.com/golang/tools) from 0.3.0 to 0.4.0.
- [Release notes](https://github.com/golang/tools/releases)
- [Commits](https://github.com/golang/tools/compare/v0.3.0...v0.4.0)

---
updated-dependencies:
- dependency-name: golang.org/x/tools
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Auto-generate go.sum and LICENSE-3rdparty.csv changes

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: paulcacheux <[email protected]>

* [single-machine-performance] Introduce regression detector jobs (#14528)

* [WIP][single-machine-performance] Introduce regression detector jobs

This PR intends to introduce the Single Machine Performance regression detector
into Agent CI. This builds on work done in #14477 and is peer to #14438. The
Regression Detector is a CI tool that determines, with some statistical
guarantee, whether a change introduced into a project modifies project
performance by more than random chance. The Regression Detector is not a
microbenchmarking tool and must operate on the whole Agent. This PR introduces
only 'throughput' as an optimization goal (how quickly the Regression
Detector can produce load into the Agent), but other goals are
possible. Regressions are checked per experiment; please see `tests/regression`
for details about how to define an experiment.

The Regression Detector runs today in vectordotdev/vector project and is
influential in keeping that project's performance consistently high.

REF SMP-208

Signed-off-by: Brian L. Troutwine <[email protected]>

* Use static smp binary

Signed-off-by: Brian L. Troutwine <[email protected]>

* different base sha calculation

Signed-off-by: Brian L. Troutwine <[email protected]>

* Try to clone the whole repo

Signed-off-by: Brian L. Troutwine <[email protected]>

* baseline sha computation redux

Signed-off-by: Brian L. Troutwine <[email protected]>

* specify region explicitly

Signed-off-by: Brian L. Troutwine <[email protected]>

* use smp 0.6.3-rc3

Signed-off-by: Brian L. Troutwine <[email protected]>

* Wait for job to complete, output report, status

Signed-off-by: Brian L. Troutwine <[email protected]>

* update job name

Signed-off-by: Brian L. Troutwine <[email protected]>

* Update smp, lading

Signed-off-by: Brian L. Troutwine <[email protected]>

* remove \

Signed-off-by: Brian L. Troutwine <[email protected]>

* Use smp 0.6.4

Signed-off-by: Brian L. Troutwine <[email protected]>

* diagnose why file_to_blackhole fails

Signed-off-by: Brian L. Troutwine <[email protected]>

* just one test for now

Signed-off-by: Brian L. Troutwine <[email protected]>

* set log level for smp

Signed-off-by: Brian L. Troutwine <[email protected]>

* tweaks

Signed-off-by: Brian L. Troutwine <[email protected]>

* debug

Signed-off-by: Brian L. Troutwine <[email protected]>

* actually add datadog.yaml et all, .gitignore issue?

Signed-off-by: Brian L. Troutwine <[email protected]>

* tidy up cases to initial trio, less file_to_blackhole which needs work

Signed-off-by: Brian L. Troutwine <[email protected]>

* update smp, config tweak

Signed-off-by: Brian L. Troutwine <[email protected]>

* override .gitignore

Signed-off-by: Brian L. Troutwine <[email protected]>

* Apply @GeorgeHahn's patches

Signed-off-by: Brian L. Troutwine <[email protected]>

* enable other tests, tweak OTEL

Signed-off-by: Brian L. Troutwine <[email protected]>

* more fiddling

Signed-off-by: Brian L. Troutwine <[email protected]>

* tweaks

Signed-off-by: Brian L. Troutwine <[email protected]>

* use markdown output report

Signed-off-by: Brian L. Troutwine <[email protected]>

* use OTEL http

Signed-off-by: Brian L. Troutwine <[email protected]>

* use smp 0.6.5-rc1

Signed-off-by: Brian L. Troutwine <[email protected]>

* debug -> info

Signed-off-by: Brian L. Troutwine <[email protected]>

* preserve output

Signed-off-by: Brian L. Troutwine <[email protected]>

* remove stray tick

Signed-off-by: Brian L. Troutwine <[email protected]>

* Update test/regression/README.md

Co-authored-by: Kylian Serrania <[email protected]>

* Update test/regression/README.md

Co-authored-by: Kylian Serrania <[email protected]>

Signed-off-by: Brian L. Troutwine <[email protected]>
Co-authored-by: Kylian Serrania <[email protected]>

* Split bundle params (#14702)

* Split BundleParams into ConfigParams and LogParams
* Move ConfigParams and LogParams to their own file
* Move WithXXX functions from BundleParams to config.Params
* Use constructors for config.Params
* Fix comp/core/log/params_test.go
* Make fields for log.Params unexported
* Make config.Params fields not exported.
* Fix package names in the security agent.
* Explain why `fx.Provide` is needed in bundle.go
* Remove configLoadSecurityAgent from NewSecurityAgentParams
* Add NewAgentParamsWithSecrets and NewAgentParamsWithoutSecrets

* CWS: sync BTFhub constants (#14815)

Co-authored-by: paulcacheux <[email protected]>

* Check the package exists before creating package. Restore install script after packaging. (#14777)

* change networks slack channel (#14819)

* fix close_time value display in INFO log (#14744)

* Updates prometheusScrape to support tag_by_endpoint and collect_counters_with_distributions (#14805)

* Updates prometheusScrape to support tag_by_endpoint

* Adds release note

* Cleans release note

* Also adds support for `collect_counters_with_distributions`

* Updates release note to include the second added parameter

* Updates release note based on suggestion by @clamoriniere

* Migrating flare to a component (#14234)

Migrating flare to a component

This adds a 'flare' component and reworks the flare package to be
compatible with fx apps and non-fx apps.

The flare generation now happens through a FlareBuilder which handles
all the logic of adding data to a flare. This FlareBuilder can be used
directly (by the flare package) or be received by each component when
they register a flare provider.

Migration workflow for each component would be to move their dedicated
code from the flare package to a flare provider.

Note: Until `cmd/systray/` is migrated to fx we can't start using the
flare component from other flares (on Windows the systray can create a
flare on its own).

* Add netlink process monitor (#14706)

This monitor reads process events from the netlink socket and runs the subscribed callbacks on parallel workers (one per CPU core).
ProcessMonitor requires root or the CAP_NET_ADMIN capability.

The aim is to let callers Subscribe() to the Exec and Exit process events,
with or without process metadata (Any, Name, MAPfile).

ProcessMonitor subscribes to netlink process events such as Exec and Exit
and calls the subscribed callbacks.
Initialize() scans the current processes and calls the subscribed callbacks.

Callbacks are executed in parallel by a pool of goroutines (runtime.NumCPU()).
callbackRunner is the callback queue. Its size is set by processMonitorMaxEvents.

Multiple teams can use the same ProcessMonitor:
each caller must guarantee that it calls Initialize() and Stop() exactly once, as the monitor maintains an internal reference counter.

Only one PID is allowed to hold the netlink process subscription socket connection.
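The callback pool and the reference counting described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the ProcessMonitor implementation: the `monitor` type, its `dispatch` method, and `demo` are hypothetical, and events are dispatched by hand because reading the real netlink socket needs root or CAP_NET_ADMIN.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// callback receives the PID carried by a process event (hypothetical signature).
type callback func(pid int)

// monitor is a simplified stand-in for a shared process monitor.
type monitor struct {
	mu        sync.Mutex
	refcount  int // number of Initialize calls not yet matched by Stop
	callbacks []callback
	jobs      chan func() // bounded callback queue
	wg        sync.WaitGroup
}

// Subscribe registers a callback to run for every dispatched event.
func (m *monitor) Subscribe(cb callback) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.callbacks = append(m.callbacks, cb)
}

// Initialize starts one worker per CPU on the first call and only bumps the
// reference counter on later calls.
func (m *monitor) Initialize(queueSize int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.refcount++
	if m.refcount > 1 {
		return
	}
	m.jobs = make(chan func(), queueSize)
	for i := 0; i < runtime.NumCPU(); i++ {
		m.wg.Add(1)
		go func(jobs <-chan func()) {
			defer m.wg.Done()
			for job := range jobs {
				job()
			}
		}(m.jobs)
	}
}

// dispatch queues one job per subscribed callback. In this sketch it must be
// called between Initialize and the last Stop.
func (m *monitor) dispatch(pid int) {
	m.mu.Lock()
	cbs := append([]callback(nil), m.callbacks...)
	jobs := m.jobs
	m.mu.Unlock()
	for _, cb := range cbs {
		cb := cb
		jobs <- func() { cb(pid) }
	}
}

// Stop decrements the counter; the last caller closes the queue and waits for
// the workers to drain it.
func (m *monitor) Stop() {
	m.mu.Lock()
	if m.refcount == 0 {
		m.mu.Unlock()
		return
	}
	m.refcount--
	last := m.refcount == 0
	jobs := m.jobs
	m.mu.Unlock()
	if last {
		close(jobs)
		m.wg.Wait()
	}
}

// demo simulates two users sharing one monitor and five process events.
func demo() int64 {
	m := &monitor{}
	var seen int64
	m.Subscribe(func(pid int) { atomic.AddInt64(&seen, 1) })
	m.Initialize(16)
	m.Initialize(16) // second user: the pool is not started again
	for pid := 1; pid <= 5; pid++ {
		m.dispatch(pid)
	}
	m.Stop() // first Stop only decrements the counter
	m.Stop() // last Stop drains the queue and stops the workers
	return atomic.LoadInt64(&seen)
}

func main() {
	fmt.Println(demo()) // prints 5
}
```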

* protocols: refactor tests to allow pre-post setups (#14817)

* protocols: refactor tests to allow pre-post setups

* Added temporary nolint for skippers

* Fixed bugs

* Escape path in get-acl command (#14818)

* ci: Add manual benchmark step for trace-agent (#14466)

* pkg/trace/config: Lower max tracer payload to 25 MB to better align with backend limits (#14782)

* Revert #14367 and use nano timestamp instead (#14825)

* Revert "Replace timestamp by increasing id to avoid configVersion matching different config changed in the same second"

This reverts commit f8e097de2aa3322670fcc6a6c8cfc5c1ed9d6239.

* Revert #14367 and use nano timestamp instead
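To illustrate why a second-resolution timestamp can give two different config changes the same version while a nanosecond timestamp separates them, here is a small sketch. `versionSeconds`, `versionNanos`, and `demo` are hypothetical helpers written for this example; how the Agent actually builds its configVersion is not shown here.

```go
package main

import (
	"fmt"
	"time"
)

// versionSeconds derives a version from a second-resolution timestamp.
func versionSeconds(t time.Time) int64 { return t.Unix() }

// versionNanos derives a version from a nanosecond-resolution timestamp.
func versionNanos(t time.Time) int64 { return t.UnixNano() }

// demo compares the versions of two changes made 250ms apart, within the same
// second. It reports whether each scheme produces a collision.
func demo() (secondsCollide, nanosCollide bool) {
	first := time.Date(2023, time.January, 17, 11, 11, 0, 100, time.UTC)
	second := first.Add(250 * time.Millisecond)
	return versionSeconds(first) == versionSeconds(second),
		versionNanos(first) == versionNanos(second)
}

func main() {
	secondsCollide, nanosCollide := demo()
	fmt.Println(secondsCollide, nanosCollide) // prints true false
}
```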

* Disable by default remote-tagger in clc-runner mode (#14821)

* fix gofmt -s for pkg/collector/collector_demux_test.go (#14808)

* Improve debug logging in cloud foundry container tagger (#14803)

* Add logging around container retries

* Add trace log

* Change to debug and add release note

* Delete Improve-container-tagger-logging-e48b0fffbe8563d0.yaml

* Add timestamp id to events

* Make id more specific, use container String method

* Just print class

* Update pkg/cloudfoundry/containertagger/container_tagger.go

Co-authored-by: NouemanKHAL <[email protected]>

* Address PR review

* Create event ID

Co-authored-by: NouemanKHAL <[email protected]>

* [Serverless] Merge serverless/main to main. (#14826)

* [Serverless] change account (#14755)

* Aj/buffer cold start span data (#14664)

* wip dirty commit - trace being created but not flushed properly. No further traces appearing

WIP: more debugging. StopChan properly set up

feat: Starting coldstart creator as a daemon, and receiving data from two channels. Todo: spec

feat: Update specs to write to channels

feat: Merge conflicts resolved for tests

feat: Use smaller methods to handle locking

fix: pass coldstartSpanId to sls-init main

feat: Remove default

feat: Use Millisecond as Second is far longer than necessary

feat: No need to export ColdStartSpanId

fix: update units

feat: Directionality for lambdaSpanChan as well as for initDurationChan

fix: No need for the nil check, I need to stop javascripting my go

feat: ints

* feat: rebase missing changes from merge commits

* feat: update ints after moving accounts

* Empty commit to trigger ci

* [Serverless] Fix flaky integration tests and make them more easily maintainable. (#14783)

* Retry serverless integration test failures automatically. (#14801)

* [Serverless] Allow some keys to be optional in serverless integration tests. (#14827)

* Ability to remove items from the json.

* Remove items from snapshot.

Co-authored-by: Maxime David <[email protected]>
Co-authored-by: AJ Stuyvenberg <[email protected]>

* Allow Regression Detector pipeline to fail (#14828)

At present there's a race condition in the CI pipeline with regard to the Regression
Detector: we rely on an artifact created by the main pipeline merge but have
no way of making a hard dependency on that artifact. If that artifact is not
present, the Regression Detection job is submitted and then immediately
fails. Absent a solution, we allow the Regression Detector job to fail,
which unfortunately masks any actual regressions it catches but avoids contributing to
alert blindness in the meantime.

Signed-off-by: Brian L. Troutwine <[email protected]>

Signed-off-by: Brian L. Troutwine <[email protected]>

* [process-agent] Remove unused properties from AgentConfig (#14842)

* [process-agent] Remove unused properties from AgentConfig

* Fix tests

* 7.41.1 changelog (#14822) (#14824)

* Add do-not-merge github action (#14843)

* [CWS] remove useless resolver function (#14792)

* [kitchen] Work around bundler and ruby version issue in verifier (#14851)

Modifies the script used to run kitchen tests to run the verify phase twice, and adds a pre_verify lifecycle hook to install the dependency needed for system-probe kitchen tests.

Works around an issue (version mismatch between ruby and bundler) that started happening after the release of version 2.4.0 of bundler.
As long as this workaround is needed, we can't have Gemfiles in test suites, and instead need to manually install gems whenever needed.

* Add the 'test' build tag to the 'unit-tests' flavor

This tag is needed to run unit tests but was not printed by
'inv print-default-build-tags -b unit-tests'. When running tests from
an IDE or another tool, we need the correct list of tags to be returned.

* flare: Added /opt/datadog-agent directory permissions to permissions.log (#14848)

* flare: Added /opt/datadog-agent directory permissions to permissions.log

system-probe internal files (sysprobe.socket, runtime compilation source files, prebuilt version, etc.) are located in /opt/datadog-agent.
When getting a flare, we cannot know those files' permissions (or whether they exist).

* Take directories from configuration

* Fixed cr comments

* Fixed cr comments

* Fixed cr comments

* Update comp/core/flare/helpers/helpers.go

Co-authored-by: maxime mouial <[email protected]>

* [USM] protocol classification: add RabbitMQ classification  (#14734)

* wip

* Fixed

* added support for amqp without tests

* added UT's for consumer and sender for rabbitmq

* removed redundant client and server

* added support to classify also protocol header of amqp

* removed redundant function

* test

* fixed most of the cr notes

* fixed all the cr notes

* add ut

* fixed licence issue

* fixed ci issue

* fixed event common protocol type number

* Revert update of github.com/DataDog/datadog-operator

* fixed all cr notes

* merged main

* fixed a cr note

* reverted datadog-operator

* update licence

* fixed ci issue

* merged main and updated ut

* fixed cr note

* added some UT's and support the latest classification uts update

* refactor the uts

* Added debug log

* Added debug log 2

* Added debug log 3

* Added pattern scanner

Co-authored-by: Guy Arbitman <[email protected]>

* Handle environment variables without an equal sign (#14806)

* usm: protocols: Refactored server creation (#14869)

* Removed example docker tests (#14852)

* [CWS][SEC-5573] add custom CWS product (#14748)

* [CWS] add custom CWS product

* Add a debouncer to limit reloads

* Update URL regexp to detect for Datadog's URL

In the past we used to edit the regexp every time Datadog opened a new
location. This commit allows the agent to detect all present and
future locations as long as they follow the format of 2 letters + 1
digit. Example: 'us3.datadoghq.com'.
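A pattern following the "2 letters + 1 digit" rule could be sketched as below. The expression and the `isRegionalSite` helper are assumptions made for illustration, not the regexp from the commit, and `zz9.datadoghq.com` is a made-up host used only to show that locations the code has never seen still match.

```go
package main

import (
	"fmt"
	"regexp"
)

// siteRe is a hypothetical pattern: two lowercase letters, one digit, then the
// datadoghq.com domain. The real regexp in the Agent may differ.
var siteRe = regexp.MustCompile(`^[a-z]{2}[0-9]\.datadoghq\.com$`)

// isRegionalSite reports whether host follows the "2 letters + 1 digit" format.
func isRegionalSite(host string) bool {
	return siteRe.MatchString(host)
}

func main() {
	hosts := []string{
		"us3.datadoghq.com",  // example from the commit message
		"zz9.datadoghq.com",  // made-up future location
		"usa3.datadoghq.com", // three letters: rejected
		"us3.example.com",    // wrong domain: rejected
	}
	for _, h := range hosts {
		fmt.Printf("%-20s %v\n", h, isRegionalSite(h))
	}
}
```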

* system-probe: tasks: Save all dockers from docker-compose files in the protocols dir (#14873)

* system-probe: tasks: Save all dockers from docker-compose files in the protocols dir

* Fixed lint

* [process-agent] Move data scrubber and disallow list from pkg/process/config (#14863)

- Move these two fields in preparation for removal of pkg/process/config package.
- Use inclusive naming where possible - will rename the config param in the future.
- Update imports in pkg/security using the DataScrubber type.

* add `integration_profiling` config option (#14847)

Add a new option to enable profiling of python integrations. It's used only within the python integrations. See https://github.com/DataDog/integrations-core/pull/13576.

* Fix flaky TestKSMCheckInitTags unit-test (#14832)

* Fix flaky TestKSMCheckInitTags unit-test
* improve config.GetConfiguredTags testability
* update GetConfiguredTags function description

* Deleting Security Agent for Windows resources (#14833)

* deleting windows resources

* removing windows operations for security-agent.build task

* removing secagent for windows resources in omnibus, addressing python lint

* [process-agent] Remove orchestrator config from AgentConfig (#14867)

* [process-agent] Move data scrubber and disallow list from pkg/process/config

- Move these two fields in preparation for removal of pkg/process/config package.
- Use inclusive naming where possible - will rename the config param in the future.
- Update imports in pkg/security using the DataScrubber type.

* [process-agent] Remove orchestrator config from AgentConfig

- Further decouple config management in prep for removal of pkg/process/config.
- Remove orchestrator config, push it into the pod check and collector structs.

* Address review feedback

* [process-agent] Display system probe process module status in process agent info commands (#14880)

Updates the process agent status information displayed by the datadog-agent status, process-agent status and process-agent --info commands to display whether or not the system probe's process module is enabled

* tooling: Add invoke vscode devcontainer cmd (#14031)

* Add invoke vscode envcontainer cmd

* Update agent_dev_env.md

* fix typo in documentation

* adding err to exit SecAgent. fixes hanging if there's no API key (#14856)

* Replace hardcoded /proc path with config field (#14773)

Use the config field instead of hardcoding /proc. The config field should
be automatically detected to either /proc or /host/proc inside containers.
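The idea can be sketched as building every procfs path from a configurable root rather than a literal "/proc". The `statusPath` helper and its `procRoot` parameter are hypothetical; the name of the actual config field is not given here.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
)

// statusPath builds the path of a process status file under procRoot, which
// stands in for the value read from configuration ("/proc" on a host,
// "/host/proc" inside a container).
func statusPath(procRoot string, pid int) string {
	return path.Join(procRoot, strconv.Itoa(pid), "status")
}

func main() {
	fmt.Println(statusPath("/proc", 42))      // prints /proc/42/status
	fmt.Println(statusPath("/host/proc", 42)) // prints /host/proc/42/status
}
```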

* usm: protocols: Added redis classification (#14886)

* usm: protocols: Added redis classification

* Fixed CR comment

* Fixed CR comment

* Fixed warning on centos

* [CWS] extract custom events package (#14230)

* [CWS] extract custom events package

* [CWS] extract selftest custom event

* [CWS] allow to specify a rate per rule through config

* post rebase

* add lint exception

* use the good sender

* [process-agent] Remove check intervals from pkg/process/config (#14878)

* [process-agent] Remove check intervals from pkg/process/config

- Remove check interval management from pkg/process/config package
- Never store intervals, just use config settings
- Generalize check for process and process RT check intervals

* Fix MacOS tests

* Address review feedback from @just-chillin

* flare: Ignore system probe dirs if they are empty (#14893)

* [CWS] increase exit event test timings (#14813)

* [CWS] fix rule id not sent for custom event (#14897)

* Adding return statement in GUI when an error is encountered

* [CI] Artifactory for Python (#14473)

* Introduce new E2E tests based on test-infra-definitions (#13643)

* manual check tracing uses new exhaustive tracing config option (#14892)

* manual check tracing uses new exhaustive tracing config option

Following up to https://github.com/DataDog/integrations-core/pull/13618, we now need to set both `integration_tracing` and `integration_tracing_exhaustive` config options to enable exhaustive tracing of integrations.

When manually running a check the increased overhead of exhaustive tracing (tracing all check methods) is acceptable. When continuous integration tracing is desired only the `integration_tracing` option should be set in order to keep the overhead minimal.

* update core agent check command

* fix sort order

* pkg/trace/traceutil: Add fast-path for NormalizeTags to reduce cpu usage (#14881)

* usm: remove the scenario of nil subprograms (#14909)

* usm: remove the scenario of nil subprograms

* Fixed CR comments

* Import order

* Fixed CR comments

* Bump datadog-api-client from 2.6.0 to 2.7.0 in /test/e2e/cws-tests (#14914)

Bumps [datadog-api-client](https://github.com/DataDog/datadog-api-client-python) from 2.6.0 to 2.7.0.
- [Release notes](https://github.com/DataDog/datadog-api-client-python/releases)
- [Changelog](https://github.com/DataDog/datadog-api-client-python/blob/master/CHANGELOG.md)
- [Commits](https://github.com/DataDog/datadog-api-client-python/compare/2.6.0...2.7.0)

---
updated-dependencies:
- dependency-name: datadog-api-client
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* usm: http2: improved functions (#14917)

* update profiling endpoint when the fips is enabled to avoid 404 (#14807)

* fix(fips): update profiling endpoint when the fips is enabled to avoid 404

Signed-off-by: Nicolas Guerguadj <[email protected]>

* pkg/clusteragent/admission: introduce deployment patcher (#14500)

* [CWS] avoid using readonly map for eBPF test prog (#14780)

* [e2e] add codeowners for new e2e tests (#14865)

* DogstatsD component improvements (#14839)

* Inject defaultLogFile
* Move main.go inside command/command.go
* Move start command to subcommands/start
* Dogstatsd uses pkg/cli/subcommands/version/command.go for version command
* Use similar code for cfgpath compare to datadog-agent

* eval.Opts holds MacroStore and VariableStore (#14874)

* [fake-datadog] add docker compose (#14902)

* [fake-datadog] add docker compose

* [fake-datadog] add docker instructions

* usm: mongo: Added mongo classification (#14809)

* usm: mongo: Added mongo classification

* Fixed CR comment

* Fixed CR comment

* Fixed CR comment

* Fixed CR comment

* Update agent_dev_env.md (#14887)

Co-authored-by: Kaylyn <[email protected]>

* [CWS][SEC-6508] use tail call limit to increase the number of args/envs (#14796)

* use tail call limit to increase the number of args/envs

* do not validate process overflow events to avoid scrubbing argv and timeout

* [notifications] Catch all image pull errors as infra failures (#14926)

Updates the regex to match infra failure logs when pulling images to include more patterns. The previous pattern didn't catch the following line:

WARNING: Failed to pull image with policy "always": context deadline exceeded (manager.go:203:7197s)

* Do not install the integrations downloader for python 2 (#14920)

* usm: classification: Shrink classification buffer to 24 bytes (#14925)

* config: usm: Added USM to system-probe.yaml.example file (#14908)

* setupConfig consumes 1 param instead of many, adding to SecAgent constructor (#14884)

* changing func signature of setupConfig

* setting security agent config file instead of merging because Viper only supports 1 config file per viper instance

* Revert "setting security agent config file instead of merging because Viper only supports 1 config file per viper instance"

This reverts commit 8e6736d5025db79e5c1f552a983f9050f86a2c5c.

* MergeConfigurationFiles is just for SecAgent

* undo moving sys probe and secagent merge

fix return of merge

* rename configMissingOK field to baseConfigMissingOK

* setting secagent config path and config load secrets params

* adding secagent bundle param test

* reverting renaming configMissingOK to baseConfigMissingOK

* params.configMissingOK should be false

* fixing test post bundle breaking into config and log components

* config params test copyright info

* [e2e/ndm] add snmp test environment (#14768)

* [e2e/ndm] add snmpsim data folder

* [new e2e test] update test-infra-definition version

* [e2e] fix aws signature

* [e2e/ndm] add snmp test environment

* [e2e/ndm] simplify err return code

* [e2e/ndm] remove unused close function

* [e2e/ndm] actually parse flags

* [e2e] ndm: fix destroy

* [e2e/ndm] add copyright header

* [CWS] extract probe from event and activity dump manager (#14515)

* [CWS] extract TC resolver into own resolver

* no probe in event

* include tcresolver in usual resolvers

* fix test

* apply review suggestion

* apply review suggestion v2

* [corechecks/snmp] Add IP Addresses to NDM Metadata interfaces (IPv4) (#14823)

* {Dockerfiles/agent,trace-agent/config}: disable apm `max_memory` and `max_cpu_percent` by default (#14850)

* [pkg/otlp] Add a simple example on metric export (#14784)

* Bump github.com/vektra/mockery/v2 from 2.15.0 to 2.16.0 in /internal/tools (#14913)

* Bump github.com/vektra/mockery/v2 in /internal/tools

Bumps [github.com/vektra/mockery/v2](https://github.com/vektra/mockery) from 2.15.0 to 2.16.0.
- [Release notes](https://github.com/vektra/mockery/releases)
- [Changelog](https://github.com/vektra/mockery/blob/master/.goreleaser.yml)
- [Commits](https://github.com/vektra/mockery/compare/v2.15.0...v2.16.0)

---
updated-dependencies:
- dependency-name: github.com/vektra/mockery/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* gen mocks

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Paul Cacheux <[email protected]>

* usm: Reducing chances for mistakes in the protocol type values (#14816)

* usm: classification: split the functions and helpers to protocol-dedicated-files (#14924)

* usm: classification: split the functions and helpers to protocol-dedicated-files

* usm: classification: rename protocol-classification-helpers to protocol-classification

* [process-agent] Remove host info from AgentConfig (#14885)

* [process-agent] Remove host info from AgentConfig

* Fix info command per review feedback

* [process-agent] Remove remaining properties from AgentConfig (#14889)

* Ignore RemoteSamplingClient when marshaling agent config (#14927)

* Ignore RemoteSamplingClient when marshaling agent config

* Add release note

* pkg/obfuscate: fix panic due to missing logger (#14859)

Obfuscator.log was uninitialized which was causing agent panic

* Update github.com/lxn/walk version (#14905)

* gitignore runtime compiled hash files (#14764)

* Try ignoring runtime compiled hash files

* Build object files before linting

* [process-agent] Remove pkg/process/config package (#14904)

* [process-agent] Remove pkg/process/config package

* Address review feedback from @kkhor-datadog

- Revert back to using util.PathExists for simplicity
- Clean up code with early exits

* Review feedback from @sgnn7

* Bump github.com/avast/retry-go/v4 from 4.3.1 to 4.3.2 (#14935)

Bumps [github.com/avast/retry-go/v4](https://github.com/avast/retry-go) from 4.3.1 to 4.3.2.
- [Release notes](https://github.com/avast/retry-go/releases)
- [Commits](https://github.com/avast/retry-go/compare/4.3.1...4.3.2)

---
updated-dependencies:
- dependency-name: github.com/avast/retry-go/v4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/prometheus/procfs from 0.8.0 to 0.9.0 (#14934)

Bumps [github.com/prometheus/procfs](https://github.com/prometheus/procfs) from 0.8.0 to 0.9.0.
- [Release notes](https://github.com/prometheus/procfs/releases)
- [Commits](https://github.com/prometheus/procfs/compare/v0.8.0...v0.9.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/procfs
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [CWS Agent] Bugfixing SecAgent Params constructor (#14939)

* [USM] use per-cpu array map instead of in-stack buffer for classification (#14756)

* protocol classification: add per-cpu array map

Signed-off-by: Guillaume Pagnoux <[email protected]>

* Outsmart the verifier

* change map type on unsupported systems

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix runtime-compilation on older kernels + doc

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix array map

Signed-off-by: Guillaume Pagnoux <[email protected]>

* docs & refactor

Signed-off-by: Guillaume Pagnoux <[email protected]>

* add missing editor flag to change map type

Signed-off-by: Guillaume Pagnoux <[email protected]>

* usm: Reverted #14925

Signed-off-by: Guillaume Pagnoux <[email protected]>
Co-authored-by: Guy Arbitman <[email protected]>

* [gitlab] Use DEB buildimage based on Ubuntu 14.04 instead of Debian 8 (#14929)

* Adding config option to disable delta profiles when profiling the Agent

* Fixed nil return instead of an error in DogStatsD file replay

* Removed sending API key as params in forwarder

* [CWS] remove now useless runtime files sync check (#14945)

* flags package to organize security agent subcommand flags (#14906)

* [CI] Improve visibility for `docker run` commands in the CI (#14899)

Add line breaks for docker run commands

* [CWS Agent] SecAgent command pkg to replace common pkg, moving status and version subcommands (#14907)

* adding command package, to replace common

* status and version subcommands

* Bump github.com/itchyny/gojq from 0.12.10 to 0.12.11 (#14938)

Bumps [github.com/itchyny/gojq](https://github.com/itchyny/gojq) from 0.12.10 to 0.12.11.
- [Release notes](https://github.com/itchyny/gojq/releases)
- [Changelog](https://github.com/itchyny/gojq/blob/main/CHANGELOG.md)
- [Commits](https://github.com/itchyny/gojq/compare/v0.12.10...v0.12.11)

---
updated-dependencies:
- dependency-name: github.com/itchyny/gojq
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Replacing TODOs in exposed comments with more meaningful comments (#14901)

* Revert "[agent] Support for running secrets backends with sha256 verification (#14529)" (#14940)

This reverts commit deb7fce8f668a4bca6697e76d0b77cb67d7f46f7.

* missing import in file with unsupported build flag (#14952)

* Bump golang.org/x/text from 0.5.0 to 0.6.0 (#14948)

Bumps [golang.org/x/text](https://github.com/golang/text) from 0.5.0 to 0.6.0.
- [Release notes](https://github.com/golang/text/releases)
- [Commits](https://github.com/golang/text/compare/v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: golang.org/x/text
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Download btfs for kitchen tests (#14587)

* Save btfs to dd-agent-omnibus s3 bucket

* Update folders to match new btfhub-archive names

* Download BTFs during kitchen-prepare task

* Add more details to error message

* Fix permissions

* Use btfs from dev box

* Update gitignore

* Check for bpftool compatibility outside of generate_minimized_btfs

* Change x86-64 -> x86_64

* Fix generating minimized btfs

* Fix bpftool compatibility check helper

* Fix python linting

* Fix python lint

* Only run BTF preparation outside of CI

* Explicitly indicate CI kitchen preparation

Co-authored-by: Hasan Mahmood <[email protected]>
Co-authored-by: Bryce Kahle <[email protected]>

* [secrets] Fix getDDAgentUserSID to account for NT AUTHORITY\SYSTEM (#14941)

* [secrets] Fix getDDAgentUserSID to account for NT AUTHORITY\SYSTEM

* Address review feedback from @clarkb7

* usm: classification: removed redundant nolint (#14958)

* Bump wheel versions (#14918)

* Fixing system_probe.py on linux machine (#14959)

* document trace API v04, including response (#14868)

* [CWS] improve mount fallback (#14779)

* [CWS] improve mount fallback

* post review

* [CWS] bump security agent policies to v0.42.1 (#14964)

* [orchestration] Add Vertical Pod Autoscalers (#14669)

* [orchestration] Add Vertical Pod Autoscalers

We want to start collecting Vertical Pod Autoscalers from Kubernetes.

Co-authored-by: Kangyi LI <[email protected]>
Co-authored-by: Bryce Eadie <[email protected]>

* [usm] Extract batching functionality into package (#14712)

* [Process Agent] Split Collector into Runner and Submitter (#14883)

* WIP

* Collector and submitter split, need to fix tests

* Rename receivers to `s`

* Delete components directory

* Add RT reporting to submitter

* Add `dropCheckPayloads` back into the submitter

* Move submitter tests to its own file

* Delete component.go

* clean up comments and unused code

* Fix a couple tests

* Fix orchestrator tests

* Fix tests

* Fix copyright header

* Fix linter issues

* Use mocks in tests

* Fix import

* Fix data race in tests

* Fix data race in tests

* Update cmd/process-agent/collector.go

Co-authored-by: Ivan Ilichev <[email protected]>

* Refactor `Submit` to not return an error

* Remove `init()` in favor of using mock config

* Remove `init()` in favor of using mock config

* Update mockery to use version 2.16 since they were updated in #14913

* Fix linter errors (again)

* Fix `TestPodCheck/enabled` failing due to the clustername package caching a bad cluster name

* Remove `forwarderRetryQueueMaxBytes`

Co-authored-by: Ivan Ilichev <[email protected]>

* Bump Collector dependencies to v1.0.0-RC2/v0.68.0 (#14864)

* Bump Collector dependencies to v1.0.0-RC2/v0.68.0

* Revert InstrumentationLibraryMetadataAsTags changes

* Update collector test configuration error message

* Address PR comments

* Increase speed of generate_minimized_btfs jobs (#14585)

Co-authored-by: Bryce Kahle <[email protected]>

* Add dynamic way of determining eBPF helper availability on runtime compilation (#14685)

* Add KernelHeaderOptions type to prevent ebpf package dependency

* Add function to get available helpers on host

* Use dynamic method of finding available helpers

* Use static list for kernels with __BPF_FUNC_MAPPER macro

* Limit TestGetAvailableHelpers to kernels where it will work

* Fix udp bind for random ports (#14956)

* NDM: Add snmp.interface_status metric (#14797)

* NDM: Add snmp.interface_status metric

* update test

* Add reno

* Address review

* Rename metric

* Address review

* Add InterfaceStatus enum

* Remove iota and use explicit values

* NDM: Add snmp.device.[un]reachable metrics (#14649)

* NDM: Add snmp.device_up metric

* Address review

* update reno

* Address review

* fix import

* Improve log message (#14968)

Log the underlying error when GetUnitTypeProperties fails

* Use rv "0" when polling endpoint list (#13906)

Since this code path polls the endpoint list endpoint once every 60s by
default to update the internal stat in the agent, we don't really need
the consistency guarantees we implicitly get from the unset resource
version.

When the resource version is unset, the api-server needs to fetch all
endpoints from etcd, causing a costly round-trip that can potentially
result in a lot of data traffic. When setting resource version "0", all
requests are handled by the watch cache, meaning they will be much more
efficient and less costly.

For the most part, the actual returned data will be the same, but in
some cases where the API-servers are having a bad time, the data might
be a bit stale; but that is not very common. In that case, getting data
from the watch cache instead of not being able to list at all is
preferable.

The semantics are described in detail here:
https://kubernetes.io/docs/reference/using-api/api-concepts/#semantics-for-get-and-list

Signed-off-by: Odin Ugedal <[email protected]>

Signed-off-by: Odin Ugedal <[email protected]>
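
To make the difference between the two list requests concrete, here is a minimal sketch using only the Go standard library. The helper name `endpointsListURL` is hypothetical and is not the agent's code; it only shows how the `resourceVersion` query parameter changes the request that reaches the API server.

```go
package main

import (
	"fmt"
	"net/url"
)

// endpointsListURL (hypothetical helper) builds the path and query of a
// Kubernetes "list endpoints" request. An empty resourceVersion leaves the
// parameter unset; "0" allows the API server to answer from its watch cache.
func endpointsListURL(resourceVersion string) string {
	u := url.URL{Path: "/api/v1/endpoints"}
	q := url.Values{}
	if resourceVersion != "" {
		q.Set("resourceVersion", resourceVersion)
	}
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	fmt.Println(endpointsListURL(""))  // unset: consistent read
	fmt.Println(endpointsListURL("0")) // "0": may be served from the watch cache
}
```

With a Kubernetes client library, the same choice is typically expressed by setting the resource version in the list options rather than by building the URL by hand.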

* Remove `CCA_IN_AD` flag and related unused code (#14955)

* remove CCA_IN_AD config flag

* PR feedback

* remove unused providers

* pr feedback

* epforwarder: add additional debug logging (#14161)

* Fix small typo in install XML. (#14687)

Causes Wix to throw error (although apparently non-fatal)

* CWS: sync BTFhub constants (#14986)

Co-authored-by: paulcacheux <[email protected]>

* Revert "pkg/obfuscate: improve formatting and string parsing in the SQL obfuscator (#11967)" (#14976)

This reverts commit 8ab1d187421087d8ae746ec0dcca00f25918a9f0.

* [CWS] remove unsafe pointer from eval.Context (#14890)

* [CWS] remove unsafe pointer for eval.Context

* Add user context

* move perf helper to a perf file

* remove resolvers from event

* generate handlers

* add extra field handlers

* remove accessors from probe

* remove model mock

* fix unit and functional tests

* refactor model/field_handlers

* add helper for common object creation

* fix stress tests

* [workloadmeta/collectors/containerd] Collect image metadata (#14592)

* [util/containerd] Rename Image to ImageOfContainer

To be able to introduce a new Image func that gets an image just by image ID,
regardless of whether it's being used by a container.

* [util/containerd] Add Image func

* [workloadmeta] Add GetImage func

* [config] Add option to enable image collection in workloadmeta

* [workloadmeta/collectors/containerd] Collect image metadata

* [CSPM] remove the hostSelector field not used anymore (#14770)

* [CSPM] remove the hostSelector field not used anymore

In a more global effort to remove the internal compliance DSL after
our move to rego, this commit removes one field where it is still
being used.

The hostSelector field was put in place to make sure we only run
specific rules on hosts that match, in particular for k8s nodes.
However, the rules were not used anymore since the hosts'
"master" labels are not properly set. We rely on other side effects
(like process and file existence) to avoid running some rules on
bad nodes.

* [CSPM] remove k8s nodeLabels retrieval from compliance rules execution

Now that hostSelector fields have been removed, fetching the k8s node labels
is not required anymore and completely useless.

This PR just removes the nodeLabels fetching and all the subsequent
dependencies.

* [CWS] add tests for live process monitoring (#14944)

* [system-probe][NET-2899] fix race condition in ephemeral port checker (#14802)

* [NET-2899] use mutex to lock fields causing race condition in ephemeral port checker

* [NET-2899] gofmt on changed files

* [NET-2899] remove mutex, move racy code to sync.once func
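
The last step above replaces a mutex with `sync.Once`. A minimal sketch of that pattern follows, assuming a hypothetical `ephemeralRange` type with hard-coded sample bounds; it illustrates the technique and is not the system-probe implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// ephemeralRange is a hypothetical stand-in for the port checker state.
type ephemeralRange struct {
	once      sync.Once
	low, high uint16
	loads     int // counts how many times the range was loaded
}

// load initializes the range exactly once, even with concurrent callers.
// The bounds are sample values; real code would read them from the host.
func (r *ephemeralRange) load() {
	r.once.Do(func() {
		r.low, r.high = 32768, 60999
		r.loads++
	})
}

// IsEphemeral reports whether port falls inside the lazily loaded range.
func (r *ephemeralRange) IsEphemeral(port uint16) bool {
	r.load()
	return port >= r.low && port <= r.high
}

func main() {
	r := &ephemeralRange{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.IsEphemeral(40000)
		}()
	}
	wg.Wait()
	fmt.Println("loads:", r.loads, "ephemeral:", r.IsEphemeral(40000))
}
```

Because `once.Do` only returns after the initialization function has completed, the fields are written once and can then be read without a lock.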

* [CWS] restore SECL documentation generation (#14993)

* [CWS] fix event missing field resolver (#14992)

* fix missing fields resolver in some events (around policy eval CLI)

* do not emit event in policy eval output

* Add __TARGET_ARCH_ to runtime compilation flags (#14983)

* Add __TARGET_ARCH_ to runtime compilation flags

* Use append instead

* Re-delete http runtime asset hash file (#14982)

* Add CO-RE version of TCP Queue Length check (#14763)

* Add CO-RE version of TCP Queue Length check

* Fix version

* Fix generate BTF job

* Invert err check on CO-RE load

* Add helper for missing BTF check

* Bump golang.org/x/tools from 0.4.0 to 0.5.0 in /pkg/security/secl (#14996)

* Bump golang.org/x/tools from 0.4.0 to 0.5.0 in /pkg/security/secl

Bumps [golang.org/x/tools](https://github.com/golang/tools) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/golang/tools/releases)
- [Commits](https://github.com/golang/tools/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: golang.org/x/tools
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Auto-generate go.sum and LICENSE-3rdparty.csv changes

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>

* Fix gateway lookup tests (#14951)

* [usm] Reduce HTTP test memory utilization (#15006)

* [CWS] mount fallback to pid 1 by default (#15007)

* [CWS][SEC-4020] parse args and envs from the new program stack pages (#13008)

* CWS: parse args and envs from the process stack

* remove useless function parameter

* get env vars offset from new program stack as well

* return from tailcall loop sooner

* use different kprobe to fix kernel function call order on CentOS 7

* [process-agent] Refactor conn rates with util/subscriptions (#14988)

* [process-agent] Refactor conn rates with util/subscriptions

* Update with a unit test for pub/sub

* Address feedback from @hmahmood

* [CWS] change programs to avoid mixing events between tests (#15012)

* [CWS] rework event json marshalling (#15010)

* externalize serialization

* a bit of cleanup

* refactor schema validators

* fix printfs

* re-enable policy eval event json

* fix trace dispatching

* fix deadcode

* fix validateProcessContextSECL error output

* [USM] protocol classification: add PostgreSQL classification (#14625)

* protocol classification: add per-cpu array map

Signed-off-by: Guillaume Pagnoux <[email protected]>

* Outsmart the verifier

* protocol classification: add per-cpu array map

Signed-off-by: Guillaume Pagnoux <[email protected]>

* Outsmart the verifier

* protocol classification: add PostgreSQL classification

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix licenses & set postgres port in test

Signed-off-by: Guillaume Pagnoux <[email protected]>

* test: fix port

Signed-off-by: Guillaume Pagnoux <[email protected]>

* test: use JoinHostPort instead of Sprintf

Signed-off-by: Guillaume Pagnoux <[email protected]>

* [USM] protocol classification: add Postgres detection

Signed-off-by: Guillaume Pagnoux <[email protected]>

* revert check_command fix

Signed-off-by: Guillaume Pagnoux <[email protected]>

* postgres: refactor check_command

Signed-off-by: Guillaume Pagnoux <[email protected]>

* change map type on unsupported systems

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix runtime-compilation on older kernels + doc

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix array map

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix merge

Signed-off-by: Guillaume Pagnoux <[email protected]>

* tests: add documentation

Signed-off-by: Guillaume Pagnoux <[email protected]>

* tests: add long query test

Signed-off-by: Guillaume Pagnoux <[email protected]>

* docs & refactor

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix licenses

Signed-off-by: Guillaume Pagnoux <[email protected]>

* refactor

Signed-off-by: Guillaume Pagnoux <[email protected]>

* postgres: try to classify from client start messages

Signed-off-by: Guillaume Pagnoux <[email protected]>

* add missing Cgo defs

Signed-off-by: Guillaume Pagnoux <[email protected]>

* add postgres docker image pulling

Signed-off-by: Guillaume Pagnoux <[email protected]>

* add missing editor flag to change map type

Signed-off-by: Guillaume Pagnoux <[email protected]>

* remove unused import

Signed-off-by: Guillaume Pagnoux <[email protected]>

* case-insensitive check + docs

Signed-off-by: Guillaume Pagnoux <[email protected]>

* check on tmp buf

Signed-off-by: Guillaume Pagnoux <[email protected]>

* docs

Signed-off-by: Guillaume Pagnoux <[email protected]>

* try fixing verifier issue

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix verifier issue

Signed-off-by: Guillaume Pagnoux <[email protected]>

* tests: fix docker-compose path

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fixup! Merge remote-tracking branch 'origin/main' into guillaume.pagnoux/USMO-9-protocol-classification-posgres

* re-delete re-added files

Signed-off-by: Guillaume Pagnoux <[email protected]>

* go mod tidy

Signed-off-by: Guillaume Pagnoux <[email protected]>

* add docs

Signed-off-by: Guillaume Pagnoux <[email protected]>

* remove redundant check

Signed-off-by: Guillaume Pagnoux <[email protected]>

* refactor server creation in tests

Signed-off-by: Guillaume Pagnoux <[email protected]>

* rename guards

Signed-off-by: Guillaume Pagnoux <[email protected]>

* specify postgres version in docker-compose

Signed-off-by: Guillaume Pagnoux <[email protected]>

* tests: skip when using NAT

Signed-off-by: Guillaume Pagnoux <[email protected]>

* split sql files

Signed-off-by: Guillaume Pagnoux <[email protected]>

* tests: add tests for all supported sql queries

Signed-off-by: Guillaume Pagnoux <[email protected]>

* move postgres struct to postgres-defs.h

Signed-off-by: Guillaume Pagnoux <[email protected]>

* remove redundant check

Signed-off-by: Guillaume Pagnoux <[email protected]>

* classify on command completion messages as well

Signed-off-by: Guillaume Pagnoux <[email protected]>

* add long response test

Signed-off-by: Guillaume Pagnoux <[email protected]>

* re-enable query detection

Signed-off-by: Guillaume Pagnoux <[email protected]>

Signed-off-by: Guillaume Pagnoux <[email protected]>
Co-authored-by: Guy Arbitman <[email protected]>

* [process-agent] Scaffold components for process agent (#14972)

* [process-agent] Scaffold components for process agent

* Address review comments from @ogaca-dd

* Address review comments from @ogaca-dd

* Change to use context.Context and reintroduce empty Component interface to suppress linting

* CWS: sync BTFhub constants (#15023)

Co-authored-by: paulcacheux <[email protected]>

* [CWS] rework/cleanup `FieldHandlers` (#15015)

* remove probe from FieldHandlers

* cleanup `NewProcessResolver` resolvers dependency

* resolvers only need a link to the manager

* Update CODEOWNERS (#15024)

* Use sc query to gain information about the service before attempting to stop it. (#15028)

* [security-agent] remove redundant String() in compliance agent log (#15026)

* [invoke] Print summary of test failures at the end of inv test (#14682)

Updates the inv test command to print a summary of failed tests at the end of a run, across all modules and flavors that were tested, to more easily identify the list of failures, without having to visually parse the full job logs.

* [system-probe][NET-2891] Fix tcp retransmit count (#14740)

* [NET-2891] initial pass at changes to prebuilt code

* [NET-2891] use retrans_out for runtime compiled tcp_retransmit counter

* [NET-2891] runtime compiled version of tcp_retrans updates

* [NET-2891] remove debug comment

* [NET-2891] fix log

* [NET-2891] update bytecode

* [NET-2891] code review comments, regenerate license

* [NET-2891] newline

* [NET-2891] fix probe definitions

* [NET-2891] update comment

* [NET-2891] runtime compilation fixes

* [NET-2891] fix byte padding for args init

* [NET-2891] fix formatting

* testing debug logic

* more debug logic, added some config for the map

* [NET-2891] enable kretprobe and remove debug

* [NET-2891] disable bpf debug by default

* [NET-2891] update bytecode

* [NET-2891] mark function as maybe unused

* [NET-2891] handle different paths of incremental vs absolute retransmit counters

* [NET-2891] use enum to track increment vs absolute retransmits

* [NET-2891] change enum values

* [NET-2891] move retrans code to runtime tracer

* pulled in new gitignore

* Revert "pulled in new gitignore"

This reverts commit b4b0df587aeb6b6f655ea90d7bc96ae250934170.

* remove runtime gen files, code review comments

* [NET-2891] use retransmit count none in runtime tracer

* [NET-2891] use retransmit_count_none in handle_tcp_stats

* [NET-2891] nit comments from code review

* [NET-2891] try to get runtime compilation working on 4.4 kernel

* usm: upgraded pgdriver version to indirectly upgrade mellium.in/sasl version due to a CVE found (#15030)

* usm: upgraded pgdriver version to indirectly upgrade mellium.in/sasl version due to a CVE found

* Fixed go.sum

* [CWS] fix signal test (#15025)

* [process-agent] Support dynamically enabling profiling for process agent from CLI (#14995)

Adds support for dynamically enabling profiling for the process agent from the CLI

* pkg/obfuscate: Fix parsing of sqlserver identifiers enclosed in square brackets (#15019)

* DBM-2010 Fix parsing of sqlserver literals enclosed in square brackets

* .gitlab: move APM benchmark job to manual only (#15036)

* fix datatype (#13791)

related to #13770

* [AD/prometheus] Ignore headless services (#15031)

* Fix stop service (#15035)

* Fix check conf directory

During the migration to components, the hardcoded directory 'etc/confd'
for check configuration was removed.

* Fix shipping of 'version-history.json' and 'registry.json' in flares

When migrating to components, the logic to include /opt/datadog-agent/run/
handled it as a file instead of a folder. This broke collecting
'version-history.json' and 'registry.json' from it.

* Fix datadog.yaml file name in flare

* Force file permission to 644 within a flare

* auto instru: add rc provider (#15008)

* pkg/obfuscate: use github.com/outcaste-io/ristretto instead of github.com/dgraph-io/ristretto (#15005)

Migrate the usage of github.com/dgraph-io/ristretto to github.com/outcaste-io/ristretto

* [workloadmeta/kubelet] Parse image ID if name is a SHA256

We now try to parse the resolved image ID if the image in the pod's
container status is a SHA256. This seems to happen when pinning the
SHA256 in the container spec. This fixes an issue where `image:` filters
in DD_CONTAINER_INCLUDE/DD_CONTAINER_EXCLUDE would not be respected.

* pkg/trace/api: remove unused internal OTLP HTTP server (#14965)

* [pkg/trace/api] Remove unused OTLP HTTP server

* [pkg/trace] Remove protocol argument

* Remove unnecessary fmt.Sprintf

* Fix tests

* [CWS] cleanup last uses of `jsonschema_description` (#15050)

* [Serverless] Merge `serverless/main` to `main` (#14980)

* [Serverless] change account (#14755)

* Aj/buffer cold start span data (#14664)

* wip dirty commit - trace being created but not flushed properly. No further traces appearing

WIP: more debugging. StopChan properly set up

feat: Starting coldstart creator as a daemon, and receiving data from two channels. Todo: spec

feat: Update specs to write to channels

feat: Merge conflicts resolved for tests

feat: Use smaller methods to handle locking

fix: pass coldstartSpanId to sls-init main

feat: Remove default

feat: Use Millisecond as Second is far longer than necessary

feat: No need to export ColdStartSpanId

fix: update units

feat: Directionality for lambdaSpanChan as well as for initDurationChan

fix: No need for the nil check, I need to stop javascripting my go

feat: ints

* feat: rebase missing changes from merge commits

* feat: update ints after moving accounts

* Empty commit to trigger ci

* [Serverless] Fix flaky integration tests and make them more easily maintainable. (#14783)

* Retry serverless integration test failures automatically. (#14801)

* [Serverless] Allow some keys to be optional in serverless integration tests. (#14827)

* Ability to remove items from the json.

* Remove items from snapshot.

* Do not expect spans when there is no spans object. (#14396)

* [Serverless] Improve stability of two tests. (#14895)

* Increase timeout while decreasing test time.

* Increase timeout in test.

* [Serverless] Consolidate log normalization to single file for integration tests. (#15004)

* Consolidate log normalization to single file.

* Save raw logs to a temp dir.

* Fix linting issues.

Co-authored-by: Maxime David <[email protected]>
Co-authored-by: AJ Stuyvenberg <[email protected]>

* Fixes multiple problems with http processing/tagging on Windows. (#15022)

* Fixes multiple problems with http processing/tagging on Windows.
- There was an offset error in which the port was not properly computed
  on ipv6 connections
- There was a problem with computing whether an ipv6 address was loopback or
  not
- The fullpath indication (which is used to compute the key) was not
  properly being computed.  This led to the same tuple being used
  as a different key, so transactions were not properly combined.

* fix grammar error in release notes

* Add the plumbing in the agent forwarder to submit container images and SBOM (#14962)

* Improve documentation for BundleParams (#15011)

* pkg/clusteragent/admission: add unit tests (#15044)

* [CWS] bump syscall table + extract into separate task (#15061)

* 5.19 -> 6.1

* switch syscall table generator from go generate to task

* extract linux version

* [gitlab] Temporarily disable SUSE Agent 5 upgrade tests (#15055)

* [corechecks/snmp] Add LLDP remote device IP address (#14946)

* [CWS] add discarders eBPF unit test (#14471)

* [CWS] add discarder retention ut

* add another test

* add a unit test task

* add trace param

* make eBPF test part of the CI

* fake time to speed up tests

* bump baloum version

* add more tests

* [CWS Agent] Moving SecAgent subcommands to new dir part 2 (#14915)

* moving flare command to subcommands dir

* consolidating and moving secagent config package

* moving runtime to subcommands dir

* moved check subcommand, updated compliance subcommand which is the entry point to check funcs

* moving compliance cmd to subcommand dir

* exporting CliParams and RunCheck in Check subcommand for Compliance tests

* fixing cluster agent entry point into the check subcommand

* Add `container_image` core check (#14567)

* Reorganize the specs for some kitchen test (#15027)

* [check command] Add `--instance-filter` option (#15034)

* Migrate systray to an fx.App (#14985)

Deprecate single-dash args and add double-dash args

Move code from cmd/systray to comp/systray

Update UAC manifest to requireAdministrator

Fix log file and add `system_tray.log_file` configuration option.

* epforwarder: update dbm samples endpoint prefix (#15053)

dbm-metrics-intake and dbquery-intake resolve to the same IPs. This change cleans up code so that we're only referencing one endpoint name.

* [process-agent] Refactor Check interface (#15063)

* [process-agent] Refactor Check interface

- Refactors Check interface to consolidate CheckWithRealTime features
- This will simplify integration with components in the future PRs since it eliminates casts

* Address feedback from @just-chillin

* usm: postgres classification: Reduced 5 seconds per test, 1m30s in total (#15070)

Improved the regex we use to detect whether the server is up and running, so that
we can drop the 'wait 5 seconds' in GetPGHandle

* CWS: sync BTFhub constants (#15074)

Co-authored-by: paulcacheux <[email protected]>

* [DCA] Convert commands to Fx apps

* Extract magic strings into command.* constants

* [CWS] Add 4 tests, one for each kernel rate limiter algo (#15064)

* [CWS] remove useless callbacks (#15046)

* remove useless error check

* remove useless callback

* Add `SBOM` core check (#14989)

* Prevent check from running after it was unscheduled. (#15065)

* Prevent check from running after it was unscheduled.

If a check runs after it was unscheduled, in particular after its
sender and samplers were removed, it would create the sender and samplers
again, leaking resources. This may happen if the check was cancelled
after it was put in the worker channel, but before the worker called Run.

This change adjusts check_wrapper to make Cancel fully mutually
exclusive with Run, and adds a flag that would prevent Run from
executing the check after Cancel has completed.

* go fmt

* Update test helper
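
The mutual exclusion between Cancel and Run described above can be sketched as follows. The `checkWrapper` type and its fields here are hypothetical simplifications for illustration, not the agent's actual check_wrapper; the point is that one mutex guards both methods and a flag makes Run a no-op once Cancel has completed.

```go
package main

import (
	"fmt"
	"sync"
)

// checkWrapper is a hypothetical, simplified wrapper around a check body.
type checkWrapper struct {
	mu        sync.Mutex
	cancelled bool
	run       func() // the wrapped check body
}

// Run executes the check unless it was cancelled. It holds the mutex for the
// whole execution, so Cancel cannot complete while a run is in progress.
func (w *checkWrapper) Run() bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.cancelled {
		return false // unscheduled: do not recreate senders or samplers
	}
	w.run()
	return true
}

// Cancel waits for any in-flight Run, then prevents every later Run.
func (w *checkWrapper) Cancel() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.cancelled = true
}

func main() {
	runs := 0
	w := &checkWrapper{run: func() { runs++ }}
	fmt.Println("first run executed:", w.Run())
	w.Cancel()
	fmt.Println("run after cancel executed:", w.Run())
	fmt.Println("total runs:", runs)
}
```

The trade-off of this design is that Cancel blocks until a running check finishes, which is what makes it safe to release the check's resources afterwards.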

* Restrict flare file from being accessible by other users on Unix (#14862)

* pkg/clusteragent/admission/patch: poll rc on leadership switch (#15062)

* pkg/clusteragent/admission: add additional libconfig env vars (#15059)

* usm: classification: Split USM and NPM classifications (#15075)

USM does not need all classifiers, only those which we have dispatchers for (HTTP, and soon HTTP2)

* Python memory telemetry (#14757)

* Track memory used by the python arena allocator

pymalloc [1], Python's built-in arena allocator, is responsible for
handling small-sized allocations, while the rest goes through
the system malloc.

This patch tracks the amount of memory requested by pymalloc from the
operating system, allowing low cost, low granularity view into a
segment of python memory usage.

[1]: https://docs.python.org/3/c-api/memory.html#the-pymalloc-allocator

* inv -e rtloader.format

* Remove rtloader_mem.h from rtloader.h

This allows calling C malloc without warnings when we implement a
custom raw memory allocator for python.

* Add python raw allocator tracking.

Together with tracking pymalloc requests, this should give a
comprehensive picture of memory allocated by the python interpreter.

* Make sure to call global malloc/free

In Pyraw allocator implementation, make sure to call global
malloc/calloc/realloc/free symbols, to avoid undesired interaction
with the rtloader-specific memory tracking (for example, call libc
free instead of RtLoader::free).

* Move all memory tracking to the same file

* Update Go naming to match C functions

pymalloc is now one of two tracked allocators, use pymem as umbrella.

* Add a note about new metrics to the docs

* Python memory telemetry supports py3 only

* Add releasenote

* Expand telemetry documentation.

* Update docs/dev/agent_memory.md

Co-authored-by: Kari Halsted <[email protected]>

* Update docs/dev/agent_memory.md

Co-authored-by: Kari Halsted <[email protected]>

* Update docs/dev/agent_memory.md

Co-authored-by: Kari Halsted <[email protected]>

* Update releasenotes/notes/pymem-telemetry-0f62acb520d80a1f.yaml

Co-authored-by: Kari Halsted <[email protected]>

* Update rtloader/three/three_mem.cpp

Co-authored-by: Scott Opell <[email protected]>

* Improve metric description and remove outdated comment.

* Fix typo

* Add a comment about allocation size adjustments

Co-authored-by: Kari Halsted <[email protected]>
Co-authored-by: Scott Opell <[email protected]>

* Add telemetry for number of contexts per origin (#15016)

* Add telemetry for number of contexts per origin

Report number of contexts at the end of flush for each container
sending dogstatsd metrics.

This PR relies on origin detection to provide a set of identifying
tags for each origin, and reports number of distinct contexts for each
tag set. While this may not fully identify individual origins when
running with low tagger cardinality, it accurately reflects the way
the agent would aggregate metrics from different origins together if their
tags end up the same.

* Only enable per-origin stats if telemetry is enabled.
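
The per-origin counting described above can be pictured with a small sketch. The function names and the input shape (a map from a context identifier to its origin tags) are assumptions made for illustration and do not reflect the aggregator's real data structures.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// originKey (hypothetical) turns a tag set into an order-independent key, so
// origins whose tags end up identical are counted together.
func originKey(originTags []string) string {
	sorted := append([]string(nil), originTags...)
	sort.Strings(sorted)
	return strings.Join(sorted, ",")
}

// contextsPerOrigin counts distinct contexts for each origin tag set.
// The map keys are assumed to be unique context identifiers.
func contextsPerOrigin(contextOrigins map[string][]string) map[string]int {
	counts := make(map[string]int)
	for _, originTags := range contextOrigins {
		counts[originKey(originTags)]++
	}
	return counts
}

func main() {
	counts := contextsPerOrigin(map[string][]string{
		"ctx-1": {"kube_namespace:web", "image_name:nginx"},
		"ctx-2": {"image_name:nginx", "kube_namespace:web"},
		"ctx-3": {"kube_namespace:batch"},
	})
	fmt.Println(counts)
}
```

In this example the first two contexts share one origin key even though their tags are listed in a different order, which mirrors the low-cardinality caveat in the description.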

* [process-agent] Fix kitchen tests for process agent on main (#15072)

* include `functests` in `DD_PIPELINE_ID` for system probe and security agent functests (#15043)

* include `functests` in DD_PIPELINE_ID for system probe and security agent functests

* simpler/shorter pipeline_id

* [install_script] Backport removal of RPM signing key 4172A230 (#15082)

* [corechecks/snmp] LLDP resolve local interface (#14991)

* [CWS] fix rule in error reported twice (#15084)

* Add java package in our circle-ci image (#14665)

* Use DMI on EC2 Nitro instances to get host aliases

The Agent now leverages DMI information on Unix to get the instance ID on AWS EC2 when the metadata endpoint fails or
is not accessible. The instance ID is exposed through DMI only on AWS Nitro instances.

This will not change the hostname of the Agent upon upgrading but will add to the list of host aliases.

* [CWS] add inode to pid context to detect exec loss (#14661)

* [CWS] add revision to pid context

* use inode instead of revision

* Fix post rebase

* Fix serializer tests flakiness (#15093)

* [RCM-632] Add UUID in request (#15088)

* Add org uuid field

* Add org uuid in request

* Remove generate file

* Comment exported method

* fix the receiver name consistency (#15068)

* Add limits to allocated dictionaries, prevent browser cross-site requests (#15067)

* pkg/trace/api: Move semantic conventions to separate internal package (#14963)

* [pkg/trace/api] Move semantic conventions to separate internal package

* Rename to shared

* Move tagContainersTags back to API package

* Rename package to 'header'

* Fix Windows build

* Factorize queue code duplicated at two places (#15098)

* Factorize the aggregating queue used by the SBOM and container image checks

* Mock time functions to make tests more reliable

* [single-machine-performance] Push agent containers to SMP ECR (#14438)

* [single-machine-performance] Push agent container to SMP ECR

This commit is an attempt to introduce pushing containers from Agent CI for
single-machine-performance's Regression Detector in our isolated
infrastructure. Much like we have done for vectordotdev/vector we intend to run
the Regression Detector on Agent changes, gi…
nenadnoveljic added a commit that referenced this pull request Feb 6, 2023
* Fix datadog.yaml file name in flare

* Force file permission to 644 within a flare

* auto instru: add rc provider (#15008)

* pkg/obfuscate: use github.com/outcaste-io/ristretto instead of github.com/dgraph-io/ristretto (#15005)

Migrate the usage of github.com/dgraph-io/ristretto to github.com/outcaste-io/ristretto

* [workloadmeta/kubelet] Parse image ID if name is a SHA256

We now try to parse the resolved image ID if the image in the pod's
container status is a SHA256. This seems to happen when pinning the
SHA256 in the container spec. This fixes an issue where `image:` filters
in DD_CONTAINER_INCLUDE/DD_CONTAINER_EXCLUDE would not be respected.

* pkg/trace/api: remove unused internal OTLP HTTP server (#14965)

* [pkg/trace/api] Remove unused OTLP HTTP server

* [pkg/trace] Remove protocol argument

* Remove unnecessary fmt.Sprintf

* Fix tests

* [CWS] cleanup last uses of `jsonschema_description` (#15050)

* [Serverless] Merge `serverless/main` to `main` (#14980)

* [Serverless] change account (#14755)

* Aj/buffer cold start span data (#14664)

* wip dirty commit - trace being created but not flushed properly. No further traces appearing

WIP: more debugging. StopChan properly set up

feat: Starting coldstart creator as a daemon, and receiving data from two channels. Todo: spec

feat: Update specs to write to channels

feat: Merge conflicts resolved for tests

feat: Use smaller methods to handle locking

fix: pass coldstartSpanId to sls-init main

feat: Remove default

feat: Use Millisecond as Second is far longer than necessary

feat: No need to export ColdStartSpanId

fix: update units

feat: Directionality for lambdaSpanChan as well as for initDurationChan

fix: No need for the nil check, I need to stop javascripting my go

feat: ints

* feat: rebase missing changes from merge commits

* feat: update ints after moving accounts

* Empty commit to trigger ci

* [Serverless] Fix flaky integration tests and make them more easily maintainable. (#14783)

* Retry serverless integration test failures automatically. (#14801)

* [Serverless] Allow some keys to be optional in serverless integration tests. (#14827)

* Ability to remove items from the json.

* Remove items from snapshot.

* Do not expect spans when there is no spans object. (#14396)

* [Serverless] Improve stability of two tests. (#14895)

* Increase timeout while decreasing test time.

* Increase timeout in test.

* [Serverless] Consolidate log normalization to single file for integration tests. (#15004)

* Consolidate log normalization to single file.

* Save raw logs to a temp dir.

* Fix linting issues.

Co-authored-by: Maxime David <[email protected]>
Co-authored-by: AJ Stuyvenberg <[email protected]>

* Fixes multiple problems with http processing/tagging on Windows. (#15022)

* Fixes multiple problems with http processing/tagging on Windows.
- There was an offset error in which the port was not properly computed
  on ipv6 connections
- There was a problem with computing whether an ipv6 address was loopback or
  not
- The fullpath indication (which is used to compute the key) was not
  properly being computed.  This led to the same tuple being used
  as a different key, so transactions were not properly combined.

* fix grammar error in release notes

* Add the plumbing in the agent forwarder to submit container images and SBOM (#14962)

* Improve documentation for BundleParams (#15011)

* pkg/clusteragent/admission: add unit tests (#15044)

* [CWS] bump syscall table + extract into separate task (#15061)

* 5.19 -> 6.1

* switch syscall table generator from go generate to task

* extract linux version

* [gitlab] Temporarily disable SUSE Agent 5 upgrade tests (#15055)

* [corechecks/snmp] Add LLDP remote device IP address (#14946)

* [CWS] add discarders eBPF unit test (#14471)

* [CWS] add discarder retention ut

* add another test

* add a unit test task

* add trace param

* make eBPF test part of the CI

* fake time to speed up tests

* bump baloum version

* add more tests

* [CWS Agent] Moving SecAgent subcommands to new dir part 2 (#14915)

* moving flare command to subcommands dir

* consolidating and moving secagent config package

* moving runtime to subcommands dir

* moved check subcommand, updated compliance subcommand which is the entry point to check funcs

* moving compliance cmd to subcommand dir

* exporting CliParams and RunCheck in Check subcommand for Compliance tests

* fixing cluster agent entry point into the check subcommand

* Add `container_image` core check (#14567)

* Reorganize the specs for some kitchen test (#15027)

* [check command] Add `--instance-filter` option (#15034)

* Migrate systray to an fx.App (#14985)

Deprecate single-dash args and add double-dash args

Move code from cmd/systray to comp/systray

Update UAC manifest to requireAdministrator

Fix log file and add `system_tray.log_file` configuration option.

* epforwarder: update dbm samples endpoint prefix (#15053)

dbm-metrics-intake and dbquery-intake resolve to the same IPs. This change cleans up code so that we're only referencing one endpoint name.

* [process-agent] Refactor Check interface (#15063)

* [process-agent] Refactor Check interface

- Refactors Check interface to consolidate CheckWithRealTime features
- This will simplify integration with components in the future PRs since it eliminates casts

* Address feedback from @just-chillin

* usm: postgres classification: Reduced 5 seconds per test, 1m30s in total (#15070)

Improved the regex we use to detect whether the server is up and running, so that
we can skip the 'wait 5 seconds' in GetPGHandle

* CWS: sync BTFhub constants (#15074)

Co-authored-by: paulcacheux <[email protected]>

* [DCA] Convert commands to Fx apps

* Extract magic strings into command.* constants

* [CWS] Add 4 tests, one for each kernel rate limiter algo (#15064)

* [CWS] remove useless callbacks (#15046)

* remove useless error check

* remove useless callback

* Add `SBOM` core check (#14989)

* Prevent check from running after it was unscheduled. (#15065)

* Prevent check from running after it was unscheduled.

If a check runs after it was unscheduled, in particular after its
sender and samplers were removed, it would create the sender and samplers
again, leaking resources. This may happen if the check was cancelled
after it was put in the worker channel, but before the worker called Run.

This change adjusts check_wrapper to make Cancel fully mutually
exclusive with Run, and adds a flag that would prevent Run from
executing the check after Cancel has completed.
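A minimal sketch of the idea described above, assuming a single mutex shared by Run and Cancel plus a boolean flag. The names `runner`, `guardedCheck` and `countingCheck` are hypothetical and are not taken from the Agent's check_wrapper:

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for a check; it is not the Agent's interface.
type runner interface {
	Run() error
}

// guardedCheck wraps a runner so that Run and Cancel never overlap, and so
// that Run becomes a no-op once Cancel has completed.
type guardedCheck struct {
	mu        sync.Mutex
	cancelled bool
	inner     runner
}

// Run executes the wrapped check unless Cancel has already completed.
func (g *guardedCheck) Run() error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.cancelled {
		// The check was unscheduled: do not recreate any resources.
		return nil
	}
	return g.inner.Run()
}

// Cancel waits for any in-flight Run to finish, then blocks future runs.
func (g *guardedCheck) Cancel() {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.cancelled = true
}

// countingCheck records how many times it actually ran.
type countingCheck struct{ runs int }

func (c *countingCheck) Run() error { c.runs++; return nil }

func main() {
	inner := &countingCheck{}
	g := &guardedCheck{inner: inner}
	_ = g.Run() // runs normally
	g.Cancel()
	_ = g.Run() // skipped: Cancel has completed
	fmt.Println(inner.runs)
}
```

Holding the same lock for the whole duration of Run is what makes the two calls mutually exclusive; the flag alone would not cover a Run that is already in progress when Cancel is called.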

* go fmt

* Update test helper

* Restrict flare file from being accessible by other users on Unix (#14862)

* pkg/clusteragent/admission/patch: poll rc on leadership switch (#15062)

* pkg/clusteragent/admission: add additional libconfig env vars (#15059)

* usm: classification: Split USM and NPM classifications (#15075)

USM does not need all classifiers, only those which we have dispatchers for (HTTP, and soon HTTP2)

* Python memory telemetry (#14757)

* Track memory used by the python arena allocator

pymalloc [1], Python's built-in arena allocator, is responsible for
handling small-sized allocations, while the rest goes through
the system malloc.

This patch tracks the amount of memory requested by pymalloc from the
operating system, allowing low cost, low granularity view into a
segment of python memory usage.

[1]: https://docs.python.org/3/c-api/memory.html#the-pymalloc-allocator

* inv -e rtloader.format

* Remove rtloader_mem.h from rtloader.h

This allows calling C malloc without warnings when we implement a
custom raw memory allocator for python.

* Add python raw allocator tracking.

Together with tracking pymalloc requests, this should give a
comprehensive picture of memory allocated by the python interpreter.

* Make sure to call global malloc/free

In Pyraw allocator implementation, make sure to call global
malloc/calloc/realloc/free symbols, to avoid undesired interaction
with the rtloader-specific memory tracking (for example, call libc
free instead of RtLoader::free).

* Move all memory tracking to the same file

* Update Go naming to match C functions

pymalloc is now one of two tracked allocators, use pymem as umbrella.

* Add a note about new metrics to the docs

* Python memory telemetry supports py3 only

* Add releasenote

* Expand telemetry documentation.

* Update docs/dev/agent_memory.md

Co-authored-by: Kari Halsted <[email protected]>

* Update docs/dev/agent_memory.md

Co-authored-by: Kari Halsted <[email protected]>

* Update docs/dev/agent_memory.md

Co-authored-by: Kari Halsted <[email protected]>

* Update releasenotes/notes/pymem-telemetry-0f62acb520d80a1f.yaml

Co-authored-by: Kari Halsted <[email protected]>

* Update rtloader/three/three_mem.cpp

Co-authored-by: Scott Opell <[email protected]>

* Improve metric description and remove outdated comment.

* Fix typo

* Add a comment about allocation size adjustments

Co-authored-by: Kari Halsted <[email protected]>
Co-authored-by: Scott Opell <[email protected]>

* Add telemetry for number of contexts per origin (#15016)

* Add telemetry for number of contexts per origin

Report number of contexts at the end of flush for each container
sending dogstatsd metrics.

This PR relies on origin detection to provide a set of identifying
tags for each origin, and reports number of distinct contexts for each
tag set. While this may not fully identify individual origins when
running with low tagger cardinality, it accurately reflects the way
agent would aggregate metrics from different origins together if their
tags end up the same.
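To illustrate the counting described above, here is a rough sketch that groups context identifiers by origin tag set. The types and helpers (`contextCounter`, `originKey`, `observe`, `counts`) are hypothetical and only show why two origins with identical tags end up in the same bucket; this is not the Agent's implementation:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// originKey builds a canonical key from an origin's tag set, so that the
// same tags in a different order map to the same bucket.
func originKey(tags []string) string {
	sorted := append([]string(nil), tags...)
	sort.Strings(sorted)
	return strings.Join(sorted, ",")
}

// contextCounter tracks the distinct context identifiers seen per tag set.
type contextCounter struct {
	contexts map[string]map[uint64]struct{}
}

func newContextCounter() *contextCounter {
	return &contextCounter{contexts: map[string]map[uint64]struct{}{}}
}

// observe records that a context was seen for the given origin tags.
func (c *contextCounter) observe(originTags []string, contextKey uint64) {
	key := originKey(originTags)
	if c.contexts[key] == nil {
		c.contexts[key] = map[uint64]struct{}{}
	}
	c.contexts[key][contextKey] = struct{}{}
}

// counts returns the number of distinct contexts per origin tag set.
func (c *contextCounter) counts() map[string]int {
	out := make(map[string]int, len(c.contexts))
	for key, set := range c.contexts {
		out[key] = len(set)
	}
	return out
}

func main() {
	c := newContextCounter()
	// Two containers that resolve to the same tags share one bucket.
	c.observe([]string{"kube_namespace:web"}, 1)
	c.observe([]string{"kube_namespace:web"}, 2)
	c.observe([]string{"kube_namespace:web"}, 1) // duplicate context
	c.observe([]string{"kube_namespace:db"}, 1)
	counts := c.counts()
	fmt.Println(counts["kube_namespace:web"], counts["kube_namespace:db"])
}
```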

* Only enable per-origin stats if telemetry is enabled.

* [process-agent] Fix kitchen tests for process agent on main (#15072)

* include `functests` in `DD_PIPELINE_ID` for system probe and security agent functests (#15043)

* include `functests` in DD_PIPELINE_ID for system probe and security agent functests

* simpler/shorter pipeline_id

* [install_script] Backport removal of RPM signing key 4172A230 (#15082)

* [corechecks/snmp] LLDP resolve local interface (#14991)

* [CWS] fix rule in error reported twice (#15084)

* Add java package in our circle-ci image (#14665)

* Use DMI on EC2 Nitro instances to get host aliases

The Agent now leverages DMI information on Unix to get the instance ID on AWS EC2 when the metadata endpoint fails or
is not accessible. The instance ID is exposed through DMI only on AWS Nitro instances.

This will not change the hostname of the Agent upon upgrading but will add to the list of host aliases.

* [CWS] add inode to pid context to detect exec loss (#14661)

* [CWS] add revision to pid context

* use inode instead of revision

* Fix post rebase

* Fix serializer tests flakiness (#15093)

* [RCM-632] Add UUID in request (#15088)

* Add org uuid field

* Add org uuid in request

* Remove generate file

* Comment exported method

* fix the receiver name consistency (#15068)

* Add limits to allocated dictionaries, prevent browser cross-site requests (#15067)

* pkg/trace/api: Move semantic conventions to separate internal package (#14963)

* [pkg/trace/api] Move semantic conventions to separate internal package

* Rename to shared

* Move tagContainersTags back to API package

* Rename package to 'header'

* Fix Windows build

* Factorize queue code duplicated at two places (#15098)

* Factorize the aggregating queue used by the SBOM and container image checks

* Mock time functions to make tests more reliable

* [single-machine-performance] Push agent containers to SMP ECR (#14438)

* [single-machine-performance] Push agent container to SMP ECR

This commit is an attempt to introduce pushing containers from Agent CI for
single-machine-performance's Regression Detector in our isolated
infrastructure. Much like we have done for vectordotdev/vector we intend to run
the Regression Detector on Agent changes, giving a reasonable statistical
guarantee that a change does or does not modify Agent performance by more than
random chance. In order for the Regression Detector to run jobs it must have
access to a 'baseline' and 'comparison' target. Baseline in this project would
be a container built from current `main` branch, comparison would be a container
built from the tip of a PR.

The main thing demonstrated here is that the team credentials SMP has created
for Agent are functional and are able to push up containers, in a way that is
acceptable to Agent Platform. I have amended `.docker_build_job_definition` to
mirror every created container to single-machine-performance's ECR, noting that
the tag now avoids the use of `CI_PIPELINE_ID`. In a later commit we will
introduce job submission and will rely on being able to compute the tag of a
previous pipeline's container from available Gitlab metadata, specifically
`CI_COMMIT_SHA` for the comparison container and whatever metadata maps to the
base branch's current SHA, `CI_MERGE_REQUEST_SOURCE_BRANCH_SHA`?

There are two outstanding questions regarding this work that I am aware of:

* Is there a race condition present between the triggering of this pipeline vs
main if users squash commits?
* Should we grant the existing CI user permissions into
single-machine-performance rather than use an issued bot account as done
presently and for vectordotdev/vector?

We've successfully demonstrated pushing up containers in a previous iteration of
this work, see https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/195939127.

Signed-off-by: Brian L. Troutwine <[email protected]>

* PR feedback

Signed-off-by: Brian L. Troutwine <[email protected]>

* trim ECR URL out of destination

Signed-off-by: Brian L. Troutwine <[email protected]>

* correct job dependency

Signed-off-by: Brian L. Troutwine <[email protected]>

* drop parallel.matrix

Signed-off-by: Brian L. Troutwine <[email protected]>

Signed-off-by: Brian L. Troutwine <[email protected]>

* Bump github.com/Microsoft/hcsshim from 0.9.4 to 0.9.6 (#14785)

Bumps [github.com/Microsoft/hcsshim](https://github.com/Microsoft/hcsshim) from 0.9.4 to 0.9.6.
- [Release notes](https://github.com/Microsoft/hcsshim/releases)
- [Commits](https://github.com/Microsoft/hcsshim/compare/v0.9.4...v0.9.6)

---
updated-dependencies:
- dependency-name: github.com/Microsoft/hcsshim
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [Windows] implement mapping of pid to service name (#15039)

* [Windows] implement mapping of pid to service name

Checks the pid against the table of SCM controlled processes.
If it's SCM controlled, returns the service information.

Because we must enumerate the entire SCM (there doesn't seem to be an api
for that), the SCMManager object maintains a cache of objects, and refreshes
only when it sees a PID it hasn't seen before.

On a machine with high process churn, this could still result in a lot
of accesses.  However, if process agent queries only when doing the
process check (i.e. every 30s), then it should only iterate the list
once per 30s.
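The caching behaviour described above can be sketched as follows. This is an illustration only: `pidServiceCache`, `serviceInfo` and the injected `enumerate` function are hypothetical names, and the real SCM enumeration is replaced by a plain function returning a map:

```go
package main

import "fmt"

// serviceInfo is a simplified, hypothetical record for an SCM-controlled service.
type serviceInfo struct{ Name string }

// pidServiceCache refreshes its view of the services only when it is asked
// about a PID it has never seen before.
type pidServiceCache struct {
	enumerate func() map[uint32]serviceInfo // stands in for the full SCM enumeration
	services  map[uint32]serviceInfo
	seen      map[uint32]struct{}
	refreshes int
}

func newPidServiceCache(enumerate func() map[uint32]serviceInfo) *pidServiceCache {
	return &pidServiceCache{
		enumerate: enumerate,
		services:  map[uint32]serviceInfo{},
		seen:      map[uint32]struct{}{},
	}
}

// lookup returns the service owning pid, if the PID is SCM controlled.
func (c *pidServiceCache) lookup(pid uint32) (serviceInfo, bool) {
	if _, known := c.seen[pid]; !known {
		// Unknown PID: pay for one full enumeration, then remember the PID
		// so that non-service processes do not trigger further refreshes.
		c.services = c.enumerate()
		c.refreshes++
		c.seen[pid] = struct{}{}
	}
	info, ok := c.services[pid]
	return info, ok
}

func main() {
	cache := newPidServiceCache(func() map[uint32]serviceInfo {
		return map[uint32]serviceInfo{100: {Name: "Spooler"}}
	})
	info, ok := cache.lookup(100)
	fmt.Println(info.Name, ok)
	_, ok = cache.lookup(200) // not a service, but now remembered
	fmt.Println(ok)
	cache.lookup(200) // served from the cache, no new enumeration
	fmt.Println(cache.refreshes)
}
```

The sketch does not expire entries for processes that have exited; a real cache would need some way to handle PID reuse.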

* ci fixes, add tests

* fix test/improper conversion of data buffer

* review feedback

* more review feedback

* Rename structure

* Update README.md (#15126)

* Update README.md

* Update README.md

* Update README.md

* [CWS] remove useless variable (#15112)

* [CWS][SEC-4478] add RC to e2e tests (#14877)

* [CWS] add RC to e2e tests

* fix host name

* check remote-config before file

* test embedded policy

* use a configmap to have a first policy

* make rc configurable

* CWS: sync BTFhub constants (#15123)

Co-authored-by: paulcacheux <[email protected]>

* [CWS] rename json fields to make them less misleading (#15097)

* pkg/trace/testutil: improve the randomization in test spans generator (#15108)

* DOCS-2215 Add @env variables to datadog.yaml (#10069)

Co-authored-by: hestonhoffman <[email protected]>

* [CSPM] e2e remote configuration fix (#15130)

* Add missing remote_configuration_enabled parameters in CSPM workflow

* add other missing parameters

* [CWS] chain only different binaries on activity dumps (#15095)

* [CWS][SEC-6381] Update template configuration file to add activity dumps and network detection parameters (#14835)

* [CWS] remove unsafe usage in `ScopedVariables` (#15134)

* Revert Remove CCA_IN_AD flag and related unused code  (#15115)

* Revert "Remove `CCA_IN_AD` flag and related unused code (#14955)"

This reverts commit 394ac59ad9707dfff8009c0dec03320d1df20098.

# Conflicts:
#	cmd/agent/common/autodiscovery.go

* Bump github.com/hashicorp/consul/api from 1.13.0 to 1.15.3 (#13978)

Bumps [github.com/hashicorp/consul/api](https://github.com/hashicorp/consul) from 1.13.0 to 1.15.3.
- [Release notes](https://github.com/hashicorp/consul/releases)
- [Changelog](https://github.com/hashicorp/consul/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/consul/compare/v1.13.0...api/v1.15.3)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/consul/api
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Lénaïc Huard <[email protected]>

* usm: classification: Fixed AMQP publish flakiness (#15071)

* usm: classification: Fixed AMQP publish flakiness

Publish has no response, so added classification by close-reply and close-ok messages.

* Moved closing of the client into the teardown

* Fixed test error

* Removed wrong bpf debug setup

* Set windows-agent as CODEOWNER for systray (#15117)

* Fix shutdown deadlock in docker socket tailer (#15138)

* fix shutdown deadlock in docker socket tailer

* Bump github.com/CycloneDX/cyclonedx-go from 0.6.0 to 0.7.0 (#15081)

* Bump github.com/CycloneDX/cyclonedx-go from 0.6.0 to 0.7.0

Bumps [github.com/CycloneDX/cyclonedx-go](https://github.com/CycloneDX/cyclonedx-go) from 0.6.0 to 0.7.0.
- [Release notes](https://github.com/CycloneDX/cyclonedx-go/releases)
- [Changelog](https://github.com/CycloneDX/cyclonedx-go/blob/master/.goreleaser.yml)
- [Commits](https://github.com/CycloneDX/cyclonedx-go/compare/v0.6.0...v0.7.0)

---
updated-dependencies:
- dependency-name: github.com/CycloneDX/cyclonedx-go
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Adapt conversion functions to CycloneDX/cyclonedx-go 0.7.0

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Lénaïc Huard <[email protected]>

* [usm] Fix `ProcessMonitor` termination (#15140)

* [usm] Fix `ProcessMonitor` termination

* Ensure `refcount` doesn't go below zero

* [gitlab] update builderimages for go 1.19.4 builds (#14930)

* [gitlab] update builderimages for go 1.19.4 builds

* [reno] adding relnote for golang 1.19.4 update

* [golang] bumping 1.19.4 in other relevant places

* [circleci] updating images

* [golangci-lint] fixing issues after 1.19.4 bump

* [gitlab] source_test: force external linker when running -race - troubleshooting

* [omnibus] devtoolset changes not required w/RHEL when building with centos 7+

* Revert "[gitlab] source_test: force external linker when running -race - troubleshooting"

This reverts commit 8a7d7344297554cf0eb42ee7f59ed847787ce217.

* [gitlab] source_test: use centos7 based images to enable -race detector

* [golangci-lint] address new format issue after merge

* Apply suggestions from code review

Co-authored-by: Vickenty Fesunov <[email protected]>

* [golangci-lint] address missed lints on windows

* [gitlab] updating buildimages yet again

* [gitlab] setting buildimages after buildimages merged to main

Co-authored-by: Vickenty Fesunov <[email protected]>

* [Windows] Create SCM and Windows Service utilities

Adds a set of SCM and Windows Service utility functions

* Revert "[system-probe][NET-2891] Fix tcp retransmit count (#14740)" (#15141)

This reverts commit e677798efdfc27223f9c036c1fc2e429b2dd24e7.

* [process-agent] Remove check singletons (#15121)

* [process-agent] Remove check singletons

* Address feedback from @just-chillin

* Simplified the interface of httptx (#15146)

* Simplified the interface of httptx

* Fixed error

* [CWS] refactor eval options (#15132)

* [CWS] refactor eval options

* fix windows

* [CWS] enable ring buffer by default (#15111)

* Admission controller: support injecting multiple libs in the same pod (#14736)

* [NDM] Do not send empty tags for snmp.interface.status (#15157)

* Do not send empty tags for snmp.interface.status

* fix test

* [RCM-598] upgrade(remote-config): Use layered gRPC client between trace-agent & core-agent (#15100)

* upgrade(remote-config): Bump message size limit to 500MB

* fix(size): Size down to 110MB max

* refactor(auth): Refactor RC auth

* fix(interface): Remove opts

* [CWS] AD: drop event if its process lineage is incomplete and add a guard to avoid sending empty dumps (#15013)

In addition, two new metrics are introduced to trace these new drops.

* [tasks/licenses] Don't call `open` on dirs (#15161)


Co-authored-by: Alexandre Menasria <[email protected]>

* [process-agent] Set HintMask in CollectorProc during process checks. (#14759)

Adds a new process discovery hint in the process agent when the regular process and container checks run.

* [fake-datadog] fix index list numbers (#15154)

* Enable orchestrator manifest collection by default (#15094)

manifest collection GA

* Update CODEOWNERS and JOBOWNERS after team and job renames (#15021)

Updates the JOBOWNERS and the GITHUB_SLACK_MAP file to account for recent team and job changes.

* Changed TSM to USM (#15143)

* Network USM : add java TLS support (#14620)

Adding support to attach to a live java process and send it the "agent-usm.jar" runtime agent payload

Supporting JVMTI Hotspot mechanism

Configuration:
DD_SERVICE_MONITORING_CONFIG_ENABLE_JAVA_TLS_SUPPORT = true
service_monitoring_config:
  enable_java_tls_support: true


Co-authored-by: Guy Arbitman <[email protected]>

* Fix instance-filter error type (#15163)

* cmd/trace-agent: set gomemlimit based on cgroups (#14552)

* pkg/runtime,cmd/trace-agent: set gomemlimit based on cgroups

* search in stderr for expected log line as well (#15167)

* Add additional distros/versions to btfhub archive build (#15152)

* Tweak system-probe kitchen tests (#15165)

* [serverless] feat: add _dd.origin tags for azure and gcp (#15137)

* feat: add _dd.origin tags for azure and gcp

* add release note

* remove serverless release notes from this repo

* Bump github.com/Microsoft/go-winio from 0.5.2 to 0.6.0 (#13728)

* Bump github.com/Microsoft/go-winio from 0.5.2 to 0.6.0

Bumps [github.com/Microsoft/go-winio](https://github.com/Microsoft/go-winio) from 0.5.2 to 0.6.0.
- [Release notes](https://github.com/Microsoft/go-winio/releases)
- [Commits](https://github.com/Microsoft/go-winio/compare/v0.5.2...v0.6.0)

---
updated-dependencies:
- dependency-name: github.com/Microsoft/go-winio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Auto-generate go.sum and LICENSE-3rdparty.csv changes

* go mod tidy

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
Co-authored-by: Brian Floersch <[email protected]>

* Factorize all the functions creating a pointer (#15118)

* Logging and pipeline changes to address failures to send data to DBM endpoints (#15058)

* epforwarder: double dbm-samples defaultBatchMaxContentSize

We have been seeing some issues where data fails to send
to our dbm samples endpoint. We see far fewer examples of
the dbm-activity and dbm-metrics pipelines being blocked or
failing to send data, and we suspect that this is partly
due to the drastic differences in size between those
pipelines and the dbm-samples pipeline.

This change doubles the defaultBatchMaxContentSize for
the dbm-samples pipeline to try to address this.

* batch_strategy: add debug logging for dbm pipelines

* http/destination: enrich log messages about sleep backoff

* logs...http/destination: additional logging for err responses

* logs: new metric to capture err status codes from http resp

* Update ebpf-manager (#15174)

* Send failed report first (#14700)

* [RCM] Allow agent clients to specify their k8s cluster (#15148)

This allows predicates to be written and configurations to be targeted
towards agent clients running on a particular cluster.

* Container image metadata check: add layers history (#15150)

* Add to permissions.log windows permissions (#14866)

Add to permissions.log windows permissions

* [process-agent] Fix data race in a test (#15177)

* [windows] [system-probe] Split connection gathering between open and closed. (#14164)

* Split connection gathering between open and closed.

Continue to only poll open connections on 30s intervals.
Poll closed connections on 30s interval, or when async notification
threshold is reached.

gofmt

Review feedback
Change the buffer used for reading from the driver to its own type rather
than generic slice, to make more clear the pointer relationship in
function calls where pointers are used.

Change resizeDriverBuffer to operate on the buffer type for clarity

add better default guessing to new config variable

review feedback; wait for closed connections loop to stop before exiting

lint

update to latest driver for signing

Sync driver header file correctly for build.
Fixes hang of system probe on exit.
Fixes correctly draining full bucket

fix typo in install xml

Rebase to main (230106)
Update to latest test driver

(rebased to one change)

* Update release.json to 2.3.0 unsigned driver for signing build

* update to signed version

* USM: new configuration path service_monitoring_config.enable_go_tls_support (#15156)

* USM: new configuration path service_monitoring_config.enable_go_tls_support

* Fixed CR comment

* Added releasenotes about the changed configuration value

* Update releasenotes/notes/changing-gotls-configuration-flag-cb003ca4d25472ad.yaml

Co-authored-by: May Lee <[email protected]>

Co-authored-by: Nicolas PLANEL <[email protected]>
Co-authored-by: May Lee <[email protected]>

* [CWS] init args/envs for empty data from proc (#15181)

* [serializer] Use google's protobuf lib for marshaling SBOMs and Images (#15172)

* [serializer] Use google's protobuf lib for marshaling SBOMs

* [serializer] Use google's protobuf lib for marshaling Container Images

Co-authored-by: Lénaïc Huard <[email protected]>

* [CWS] add dump count limiter (#15014)

* Revert "Tweak system-probe kitchen tests (#15165)" (#15179)

This reverts commit e4aab732d2a9083d832c5a850773832fd551149b.

* [CSPM] Make sure the failing reports are reported first using a stable sort (#15189)

* convert config subcommand to fx (#15178)

* Add GlobalTags on basic telemetry when no hostname is detected (#14776)

* Auto compute timeWindow on external metrics based on DatadogMetrics maxAge (#14840)

* Auto compute timeWindow on external metrics based on DatadogMetrics maxAge

* Update releasenotes/notes/add-automatic-time-window-ffed100742f51246.yaml

Co-authored-by: Kaylyn <[email protected]>

* Review feedback

Co-authored-by: Kaylyn <[email protected]>

* Add a Stop lifecycle for log to call log.Flush (#15185)

kitchen_test_system_probe_xxx are flaky tests.

* Use Wixsharp for the Windows Installer (#13459)

* [gitlab+golang] updating builders + bumping tooling 1.19.5 (#15164)

* [gitlab+golang] updating builders + bumping tooling 1.19.5

* [changelog] updating changelog entry

* [circleci] updating runners

* [test] small cleanup

* [release] PR merged so MACOS_BUILD_VERSION back to master

* Add detection for ECS EC2 and start workloadmeta ECS collector accordingly (#14978)

* Adding log source latency stats to InfoProvider  (#15038)

* adding NoProxyNonexactMatchExplicitlySet to ProxyMeta payload within Host Payload to identify if customer did explicitly use/change the param no_proxy_nonexact_match within agent config.

* adding testing log source attributes

* removing bytes read and adding that attribute into log source instead

* adjust type32 to type64 for the agent build

* Update pkg/logs/sources/source.go

Byte to Bytes

Co-authored-by: Dustin J. Mitchell <[email protected]>

* new helper function for BytesRead type

* Update InfoProvider with Log Latency and getting rid of log latency stats from Source

* Changing language and replacing strconv.FormatInt with fmt.Sprintf function.

Co-authored-by: Dustin J. Mitchell <[email protected]>

* [omnibus] retrieve freetds tarball using HTTPS instead of FTP (#15159)

* cmd/cluster-agent/subcommands/start: fix rcClient instantiation (#15193)

* [dca] filter pod annotation during workloadmeta collection (#15089)

Co-authored-by: Lénaïc Huard <[email protected]>

* introduce root node validator (#15032)

* [workloadmeta/containerd] Collect SBOMs with Trivy (#15139)

* [go.mod] Add Trivy

* [util/kubernetes/apiserver/leaderelection] Adapt to new k8s version

* Run inv security-agent.gen-mocks

* [docker] Adapt tests and fakes to new version

* [util/containerd] Expose raw client

* [util] Add Trivy client

* [workloadmeta/collectors/containerd] Collect SBOMs with Trivy

* [build-tags] Use trivy only in core agent

* Add release note

* Add missing licenses

* [github/codeowners] Assign /pkg/util/trivy to container-integrations

* [security-agent] bump security agent policies to v0.43.0 (#15190)

* Fix failing test (#15180)

rerun pipeline

* Fix `sbom` check (#15192)

* [DCA] Add tagger-list and workload-list cmds (#15135)

* [DCA] Add tagger-list and workload-list cmds

* Apply suggestions from code review

Co-authored-by: Lénaïc Huard <[email protected]>

* Include workload- and tagger-list in DCA flare

Co-authored-by: Lénaïc Huard <[email protected]>

* fix(cluster-agent): apply cluster-name normalization in ksm-core (#15057)

Co-authored-by: Bryce Eadie <[email protected]>
Co-authored-by: Xavier Lucas <[email protected]>

* Add option to use image mount instead of image export (requires mounting /var/lib and SYS_ADMIN in core Agent) (#15183)

* [tasks] Read go unit test reports in utf-8 format (#15202)

Explicitly sets the file reading encoding to utf-8 when reading the Go unit test report json file.

* [CSPM] bump `github.com/open-policy-agent/opa` to v0.48.0 (#15200)

* bump `github.com/open-policy-agent/opa` to v0.48.0

* `inv -e generate-licenses`

* [corechecks/snmp] Add `id` and `source_type` to Topology Links data (#15184)

Co-authored-by: pducolin <[email protected]>

* add orchestrator and cws url overwrite when fips is enabled (#15195)

* feat(fips): add orchestrator and cws url overwrite when fips is enabled

Signed-off-by: Nicolas Guerguadj <[email protected]>

* [CWS] Converting SecAgent Start command to fx (#14814)

* compliance subcmd using log and config components

* runtime using components

* fixing rebase clobbers

* moving root command

subcommand

* converting app/start.go and app/app.go to start/command.go (and command.go in previous commit)

start

start test

start

start

start command var change

start cmd import

* compliance, runtime, and check commands using command instead of common

* moving logs context to command from common, and deleting duplicated logs context code in runtime

* param setup

* handle case with no config files

* squashing more usages of pkg/util/log and pkg/config

* moving ConfigParams in compliance check fnc until after cfgpath has been parsed

* adding cluster agent bool to check entrypoint

* using log component in logs context

* fixing no api key error message

* adding log.Flush() to one shot funcs. TODO: remove once log component has self-flushing capabilities

* release note

* release note edits

* using lambda fnc to handle different entrypoints to check

* release note edit

* commit: removing log.Flush() because log component now has a lifecycle hook for flushing: https://github.com/DataDog/datadog-agent/pull/15185/files

* fixing check unsupported

* fixing import in check unsupported

* release note edit

* Ensure cloud foundry container tags are unique (#15066)

* Ensure collector tags are unique

* small refactor

* Format file

Co-authored-by: NouemanKHAL <[email protected]>

* [collector/python] uses ianlancetaylor's cgosymbolizer. (#14673)

* [collector/python] uses ianlancetaylor's cgosymbolizer.

The one we ship and use is outdated and is known to cause hangs.
See: https://github.com/golang/go/issues/45558#issuecomment-820764029

* Update cgosymbolizer.

* Update license

* add linux build tag

Co-authored-by: Brian Floersch <[email protected]>

* Configure analyzers used for SBOM generation (#15204)

* Allow specifying the trivy analyzers
* Only scan a few folders when using only os analyzers

Co-authored-by: Cedric Lamoriniere <[email protected]>

* Network USM : avoid ebpf maps contentions (#15166)

TL;DR: all maps keyed by conn_tuple_t must be resized by the loader to MaxConnectionTracked (65536 by default).

This avoids hash_lru_map contention (+50% system CPU on user pods) caused by maps that are too small compared to the number of connections.
On the staging setup: >8000 sockets running on 16 cores, on the packet receive path (socket/classifier).

The main issue is the kernel spinlock on an internal LRU list for evicted elements; this list is shared across all cores.

Moved the eBPF maps to the eBPF programs that instantiate them, as they are not shared.
Instantiate maps only once.

* Add test for multiple items (#13319)

* Use container_image_collection in config options (#15197)

rename `workload.image_collection` config option to
`container_image_collection` to be more product feature oriented.

* [CWS] [SEC-176] Enable CWS in integration tests (#14146)

* Enable CWS in integration tests

* Remove system probe files at package removal

* Display remaining files after package removal

* Catch error when parsing 'status' JSON output

* Do not test CWS on iot agent flavor

* Only enable CWS on supported platforms

* Give more time to security-agent to communicate with system-probe

* Retry to reach the agent config endpoint

* Install policycoreutils-python on CentOS to apply SElinux rules at install time

* Add release notes

* deleting original MergeConfigurationFiles fnc (#14896)

* CWS: sync BTFhub constants (#15215)

Co-authored-by: paulcacheux <[email protected]>

* Move public IPv4 support to the cloudprovider package

* Remove dead code for EC2 local IPv4

This code is no longer used since PR #12971.

* Move EC2 imds helper to their own file

* Move EC2 network metadata support to its own file

* Adding 'cloud_provider_source' to the inventories payload

We now track where we fetch cloud provider metadata from.
This only supports AWS EC2 for now. Depending on the instance
type, configuration, ... the Agent can use multiple sources to deduce
that it's running on EC2.
The source used is now sent as 'cloud_provider_source' in inventories.

* [kitchen] Fix CWS integration tests on CentOS step-by-step tests (#15218)

Follow-up of #14146, applies the SELinux fix (from the install script cookbook) to the step-by-step cookbook, to make sure system-probe works correctly on CentOS 7 (which has SELinux enabled by default).

* pkg/clusteragent/admission: fix rc tracking annotations (#15219)

* Update release.json and Go modules for 6/7.43.0-rc.1 (#15216)

* [CWS] force DNS resolver to read `/etc/resolv.conf` (#15220)

* [CWS] fix stacktrace in signal/ptrace rules evaluation (#15225)

* [CWS] remove dead-lock in AD finalize when resolving tags (#15233)

* [workloadmeta/collectors/containerd] Disable sbom correctly when Trivy is not built (#15234)

When SBOM collection was enabled in a build without Trivy, the agent was still
pushing images to the `imagesToScan` channel. The channel was not initialized,
so this was blocking the agent.

* Bump snowflake to 2.8.3 and add back installing library (#15207)

* Bump snowflake to 2.8.3 and add back installing library

* Only include pip change

* Add back snowflake bump

* Fix version

* Changelogs for 7.42.0 release (#15158) (#15237)

* Changelogs for 7.42.0 release (#15158)

* Changelogs for 7.42.0 release

* Update CHANGELOG.rst

* Update CHANGELOG.rst

* Update CHANGELOG.rst

* Add empty space

* https: soWatcher, shared_libraries use a pathIdentifier as key of ELF binaries (#13748)

Bugfixes to support `network_config.enable_https_monitoring` in k8s clusters

* pathIdentifier is a unique, system-wide ID of an ELF binary, as it uses dev and inode as its key.
* New Unregister path: processMonitor receives process EXIT events and unregisters the uprobe (maintained by a refcount).
* The eBPF UID uses pathIdentifier as its source of truth and a wider alphabet (base64), especially because the UID is limited (5 chars).

Motivation : Follow up on #incident-16860, #incident-18347

* Add logging in max cpu/mem defaulting (#15257)

* Fix the invocation of the secret backend from the cluster agent (#15250)

* Bump the version of `emicklei/go-restful` (#15252)

* Bump the version of `gopkg.in/yaml.v2` (#15253)

* [process-agent] Fix nil deref in check cmd (#15254)

* [sbom] Store generation duration and report it (#15258)

* [workloadmeta/collectors/containerd] Add image scan duration to telemetry

* [workloadmeta/collectors/containerd] Store SBOM generation duration

* [corechecks/sbom] Process SBOM generation duration

* [sbom] Store and report generation time

* [CWS] flush upstream kernel btf spec cache after use (#15264)

* Fix a bug in workloadmeta containerd collector (#15260)

* Update release.json and Go modules for 6/7.43.0-rc.2 (#15243)

* Fix system-probe build tags difference (#15268)

* [USM] go TLS cleanup debug messages (#15246)

* scan existence /proc/pid for 10 ms, it's better to do that in the callback
* report golang hooking issue only if it's a golang binary
* report only once when we unregister binary

* Release BTF cached by cilium/ebpf (#15269)

* [process-agent] Fix `Drop Check Payloads` status (#15274)

* Workaround lxn/walk issue on Windows 7/2008r2 (#15275)

* [CWS] bump security agent policies to v0.43.1 (#15280)

* CWS: sync BTFhub constants (#15285)

Co-authored-by: paulcacheux <[email protected]>

* [CWS] useless lock from AD manager (#15287)

* Update last_stable entries in release.json to 6/7.42.0 (#15289)

Updates the last_stable entries to 6/7.42.0 on main.

* [contimage] Split container image metadata in one event per registry (#15292)

* [sbom] Split sbom in one event per registry (#15295)

* [kitchen] Use official datadog cookbook for initial Agent install in upgrade scenario (#15300)

Updates the win-upgrade-rollback kitchen test suite to use the official datadog cookbook for the initial install.

* [CWS] always lock AD in the same order (#15290)

* Update release.json and Go modules for 6/7.43.0-rc.3 (#15296)

* Fix silent mutation of integration.Config in secret decryption (#15298)

* Do not run SBOM collection while running `agent check` (#15327)

* Lower memory allocated to ring buffer (#15245)

* [Windows] Fix the connection established check. (#15301)

Fixes the reporting of the established state on Windows.
Also disables the test for `TCPCollectionDisabled`, as it is a (now)
known problem on Windows.

* Remove BTF exceptions (#15316)

We have these kernels now

* Speed up system-probe build by not copying unnecessary files (#15231)

* Speed up system-probe build by not copying unnecessary files

* Add fallback to find if rsync not available

* Fix python lint

* Bump Collector dependencies to v1.0.0-RC4/v0.70.0 (#15230)

* Bump Collector dependencies to v1.0.0-RC4/v0.70.0

* Add to release notes

* Fix format

* Update releasenotes/notes/v0.70.0-otel-c59cf4b8673d9497dc27f4d4f38dea2db79e74ed.yaml

Co-authored-by: Bryce Eadie <[email protected]>

* Upgrade opentelemetry-collector-contrib version

---------

Co-authored-by: Bryce Eadie <[email protected]>

* Address VPA fixes caught in QA (#15328)

* Address fixes caught in QA

* Commit to retrigger build

* protocols: Uses alpine based images as they are slimmer (#15310)

* system-probe: re-ordered protocols into directories (#15308)

* protocol classifications: tests: Restructure tests to cut 50% runtime (#14987)

* protocol classifications: tests: Restructure tests to cut 50% runtime

* Fixed cr comment

* Bump go.uber.org/zap from 1.23.0 to 1.24.0 in /pkg/otlp/model (#14597)

Bumps [go.uber.org/zap](https://github.com/uber-go/zap) from 1.23.0 to 1.24.0.
- [Release notes](https://github.com/uber-go/zap/releases)
- [Changelog](https://github.com/uber-go/zap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/uber-go/zap/compare/v1.23.0...v1.24.0)

---
updated-dependencies:
- dependency-name: go.uber.org/zap
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* java tls: remove requirement for runtime compilation (#15276)

* [usm] Fix concurrency issue in test (#15273)

* [system-probe] Handle stats overflows (#15282)

* [system-probe] make sure stats are reported as soon as a module registration happens (#15329)

When modules are registered in the system-probe, a process to update
stats in the global `l` field every 15 seconds starts.

Those stats are pulled in by the `agent status` command which user can
use to see what's running in the agent.

The loop was written as:

```
for now := range ticker {
  // update stuff
}
```

This means that for the first 15 seconds of the agent's life, stats would
come back incomplete.

This PR changes the stats loop to compute stats *then* wait for 15
seconds.

QA instructions (meant to run on a vanilla linux host):

* in one terminal window run `watch "sudo -u dd-agent
  /opt/datadog-agent/bin/agent/agent  status  | head -n10"`
* in another terminal window, restart the system-probe
* you should *not*  see errors about systemprobe.tmpl failing to render.
  Running these steps on 7.42 and below shows:

  ```
  Getting the status from the agent.
  template: /systemprobe.tmpl:11:34: executing "/systemprobe.tmpl" at <.updated_at>: invalid value; expected float64
  ```

  For the first 15 seconds.

  You should *not* see these errors in the first 15 seconds of the
  system probe running (or ever).

* Bump resource limits for circleci unit_tests job (#15342)

* usm: Added usm to agent status command (#15232)

* usm: Added usm to agent status command

* Added releasenotes

* Rename releasenotes file due to a CI linter

* Fixed CR comment

* Fixed release notes

* Fixed cr comment

* Fixed tests

* Reducing duplication

* Fixed tests

* [system-probe] Add process monitoring and USM tagging (#12280)

* tracer: protocol classification: Adding a workaround to handle hitting instructions limits on socket filter (#15343)

* tracer: protocol classification: Adding a workaround to handle hitting instructions limits on socket filter

* Fixed condition

* fix(RCM): Fix timeout of client bypass (#15326)

Before this commit, when the previous request was still pending,
the client bypass wouldn't time out until it managed to send its
request, meaning that the effective maximum timeout for a new client
was (request TTL) + (bypass TTL).

This was made more visible when the refresh interval went from 60s
to 5s minimum.

Now, the client bypass timeout takes the previous request into account
as well, so that a client doesn't wait for more than 2s on its first
request

* [CWS Agent] fix signature of NewRuleFilterModel on non-linux platforms

* [CWS] fix size of args (#15323)

* [CWS] split ResolveFields (#15261)

* pkg/clusteragent/admission/patch: make file provider ready for e2e testing (#15221)

* pkg/clusteragent/admission/patch: make file provider ready for e2e testing

* [USM] old java hotspot need credential (#15278)

Hotspot rejects connections based on credentials, checking the uid/gid of the connect() peer (SOL_SOCKET/SO_PEERCRED),
but older Hotspot JREs (1.8.0) don't accept root and explicitly require a matching uid/gid.

Side effect for Go: during the connect() syscall we don't want to fork(), and we stay on the same pthread
to avoid side effects of setting the effective uid/gid.

* [corechecks/snmp] No local resolution if multiple results (#15262)

* [corechecks/snmp] Use dd_id instead of idType ndm (#15265)

* [pkg/trace] Embed ptraceotlp.UnimplementedGRPCServer to address future breaking change (#15291)

* Moved couple of noisy logs to trace (#15340)

* Don't update the configuration if it already exists (#15339)

The failures in the unit tests are unrelated.

* pkg/trace: Emit APM onboarding events on startup (#14799)

Collect trace agent startup errors and successes using instrumentation-telemetry "apm-onboarding-event" messages.

* switch account (#14572)

* lower log level (#15306)

* tracer: Use aliases to string instead of converting types (#15344)

* tracer: Use aliases to string instead of converting types

* Removed another conversion

* [CWS] remove unsafe cache (#15213)

* specialize string cache

* int and bool caches

* remove unused pointer import

* better error reporting

* fixup some cache get and cleanup

* remove `AppendFieldValues` (#15244)

* Separate system-probe config from datadog config (#14024)

* [serverless] fix: do not try to enable log api for local testing (#15229)

* fix: do not try to enable log api for local testing

* refactor: move out some code in functions, do not use go routine when not needed

* Revert "refactor: move out some code in functions, do not use go routine when not needed"

This reverts commit 750aa784895578ef01c7ffbe5bb150542b9b621f.

* use go friendly return to avoid extra indent

* export one constant rather than reusing string for local test

* [omnibus] Upgrade setuptools to 66.1.1, pip to 22.3.1 in Python 3 embedded environment (#15356)

- The python3 software definition has been updated to install the versions of pip (==22.0.4) and setuptools (==56.0.0) that are bundled alongside Python 3.8.16,
- The pip3 and setuptools3 software definitions have been updated: instead of installing from scratch (using python3 setup.py install), they use the bundled pip to install themselves. pip3 has been updated to 22.3.1, setuptools3 to 66.1.1,
- pip-tools (installed in the datadog-agent-integrations-py3 software definition) was upgraded from 6.4.0 to 6.12.1, as 6.5.0+ is required for pip 22.x support.

Co-authored-by: Lénaïc Huard <[email protected]>

* [CWS] remove useless usage of unsafe in SECL registers (#15214)

* [corechecks/{containerimage,sbom}] Fix parsing of config (#15355)

* [corechecks/containerimage] Fix parsing of config

* [corechecks/sbom] Fix parsing of config

* [kitchen] Use busser-rspec_datadog gem for tests (#15271)

Switches kitchen tests to use the busser-rspec_datadog gem, published to RubyGems by Datadog, from the busser-rspec-datadog fork of busser-rspec.

This gem behaves the same way as the upstream busser-rspec gem, except for the bundler version it installs, which is pinned to 2.3.26, to ensure it remains compatible with Ruby 2.5.

To do so, all kitchen folders previously named rspec are now named rspec_datadog (as busser uses folder names to guess which gem to install).

Removes the workaround introduced in #14851.

* [gitlab] Add Windows Agent team to GITHUB_SLACK_MAP (#15203)

Add the Windows Agent team to the GITHUB_SLACK_MAP.

* [Serverless] fix http + https proxy (#15320)

* USM: adding service_monitoring.java_agent_args=string parameter (#15314)

USM: adding service_monitoring.java_agent_args=string parameter
to pass through injected agent-usm.jar : agentmain(java_agent_args)

* [golangci-lint] Upgrade to version 1.50.1 (#15348)

* Also increase golangci-lint timeout

* Bump requests from 2.28.1 to 2.28.2 in /test/e2e/cws-tests (#15375)

Bumps [requests](https://github.com/psf/requests) from 2.28.1 to 2.28.2.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.28.1...v2.28.2)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump datadog-api-client from 2.7.0 to 2.8.0 in /test/e2e/cws-tests (#15374)

Bumps [datadog-api-client](https://github.com/DataDog/datadog-api-client-python) from 2.7.0 to 2.8.0.
- [Release notes](https://github.com/DataDog/datadog-api-client-python/releases)
- [Changelog](https://github.com/DataDog/datadog-api-client-python/blob/master/CHANGELOG.md)
- [Commits](https://github.com/DataDog/datadog-api-client-python/compare/2.7.0...2.8.0)

---
updated-dependencies:
- dependency-name: datadog-api-client
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [CWS] introduce SECLDoc and add secl field examples to documentation (#15109)

* [CWS] improve SetFieldValue tests (#15201)

* [CWS] fix e2e tests (#15381)

* Add history workload-list container-image entity (#15353)

* [workloadmeta] Delete unnecessary attrs of ContainerImageMetadata (#15385)

* [CWS] bump ebpf-manager fixing a data race (#15382)

* [CSPM] [SEC-6966] Allow specifying processes environment variables as rule inputs (#15241)

* [CSPM] Allow specifying processes environment variables as rule inputs

* [CSPM] Support env variables without "="

* [CSPM] Add omitempty flag for Process Envs param

* [tagger/telemetry] Extract subsystem const (#15386)

* Bump system probe build image (remove entrypoint) (#15393)

* [pkg/otlp/model] Do not send first value for cumulative monotonic sums if start timestamp matches timestamp (#15363)

* [pkg/otlp/model] Do not send first value for cumulative monotonic sums if start timestamp matches timestamp

* Add to release note

* Fix filename lint

* Fix release note type

* [process-agent] Refactor chunking to use generics (#15318)

* First draft of generic chunking

* Second draft of generics

* All use cases migrated to generics

* Fix null ptr return

* Add comments

* Remove ptr

* Fix test

* Use SetActiveChunk API

* Relnotes

* Relnotes update

* change flake8 url to github (#15398)

* Oracle integration boilerplate

* Create dockerpool for testing

* Add pkgs

* register oracle check

* hello world metric

---------

Signed-off-by: Brian L. Troutwine <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Nicolas Guerguadj <[email protected]>
Co-authored-by: Maxime mouial <[email protected]>
Co-authored-by: Ahmed Mezghani <[email protected]>
Co-authored-by: Katie Hockman <[email protected]>
Co-authored-by: Julio Greff <[email protected]>
Co-authored-by: Pablo Baeyens <[email protected]>
Co-authored-by: Paul Cacheux <[email protected]>
Co-authored-by: Rey Abolofia <[email protected]>
Co-authored-by: Maxime David <[email protected]>
Co-authored-by: AJ Stuyvenberg <[email protected]>
Co-authored-by: Derek Brown <[email protected]>
Co-authored-by: Lénaïc Huard <[email protected]>
Co-authored-by: Olivier G <[email protected]>
Co-authored-by: Slavek Kabrda <[email protected]>
Co-authored-by: Alexandre Yang <[email protected]>
Co-authored-by: Sylvain Afchain <[email protected]>
Co-authored-by: modernplumbing <[email protected]>
Co-authored-by: Julien Lebot <[email protected]>
Co-authored-by: Branden Clark <[email protected]>
Co-authored-by: Emma Ferguson <[email protected]>
Co-authored-by: Ivan Ilichev <[email protected]>
Co-authored-by: Guy Arbitman <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: paulcacheux <[email protected]>
Co-authored-by: Jonathan Ribas <[email protected]>
Co-authored-by: Vickenty Fesunov <[email protected]>
Co-authored-by: maxime mouial <[email protected]>
Co-authored-by: Kyle Verhoog <[email protected]>
Co-authored-by: Kari Halsted <[email protected]>
Co-authored-by: Scott Opell <[email protected]>
Co-authored-by: Nicolas PLANEL <[email protected]>
Co-authored-by: Paul <[email protected]>
Co-authored-by: William Yu <[email protected]>
Co-authored-by: Andrew Glaude <[email protected]>
Co-authored-by: Brian L. Troutwine <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Yoann Ghigoff <[email protected]>
Co-authored-by: ruthnaebeck <[email protected]>
Co-authored-by: hestonhoffman <[email protected]>
Co-authored-by: Brian Floersch <[email protected]>
Co-authored-by: Lénaïc Huard <[email protected]>
Co-authored-by: Pedro Lambert <[email protected]>
Co-authored-by: Jaime Fullaondo <[email protected]>
Co-authored-by: Rich L <[email protected]>
Co-authored-by: Adam Karpowich <[email protected]>
Co-authored-by: Florian Veaux <[email protected]>
Co-authored-by: Baptiste Foy <[email protected]>
Co-authored-by: David Ortiz <[email protected]>
Co-authored-by: Alexandre Menasria <[email protected]>
Co-authored-by: daniel-taf <[email protected]>
Co-authored-by: pducolin <[email protected]>
Co-authored-by: Kangyi LI <[email protected]>
Co-authored-by: Kylian Serrania <[email protected]>
Co-authored-by: Usama Saqib <[email protected]>
Co-authored-by: Bryce Kahle <[email protected]>
Co-authored-by: alexgallotta <[email protected]>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sylvain Baubeau <[email protected]>
Co-authored-by: Kyle Ames <[email protected]>
Co-authored-by: Len Gamburg <[email protected]>
Co-authored-by: May Lee <[email protected]>
Co-authored-by: Guillaume Fournier <[email protected]>
Co-authored-by: Pierre Guilleminot <[email protected]>
Co-authored-by: Vincent Boulineau <[email protected]>
Co-authored-by: Kaylyn <[email protected]>
Co-authored-by: Duong (Yoon) <[email protected]>
Co-authored-by: Dustin J. Mitchell <[email protected]>
Co-authored-by: David du Colombier <[email protected]>
Co-authored-by: Cedric Lamoriniere <[email protected]>
Co-authored-by: Bryce Eadie <[email protected]>
Co-authored-by: Xavier Lucas <[email protected]>
Co-authored-by: Nicolas Guerguadj <[email protected]>
Co-authored-by: Sarah Witt <[email protected]>
Co-authored-by: NouemanKHAL <[email protected]>
Co-authored-by: Rémy Mathieu <[email protected]>
Co-authored-by: Kaden Wilkinson <[email protected]>
Co-authored-by: Kacper <[email protected]>
Co-authored-by: Andrew Zhang <[email protected]>
Co-authored-by: Corrina Sivak <[email protected]>
Co-authored-by: Yang Song <[email protected]>
Co-authored-by: Joshua Lineaweaver <[email protected]>
Co-authored-by: Hasan Mahmood <[email protected]>
Co-authored-by: Lee Avital <[email protected]>
Co-authored-by: paullegranddc <[email protected]>
Co-authored-by: Misha Badov <[email protected]>
Co-authored-by: alexbarksdale <[email protected]>
Labels
changelog/no-changelog [deprecated] qa/skip-qa - use other qa/ labels [DEPRECATED] Please use qa/done or qa/no-code-change to skip creating a QA card team/containers