Add option to enable profiling of Python integrations #13576

Merged 1 commit on Dec 27, 2022

djova (Contributor) commented Dec 23, 2022

What does this PR do?

Add a new config option, `integration_profiling`, which enables [profiling](https://docs.datadoghq.com/profiler/enabling/python/#usage) of Python integrations.
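
As a rough illustration of how such a flag might gate the profiler, here is a minimal sketch. Only the option name `integration_profiling` comes from this PR; the helpers `is_affirmative_value` and `maybe_start_profiler` and the service name `datadog-agent-integrations` are hypothetical and are not the code added here. The `ddtrace` import is guarded because the library may not be installed.

```python
def is_affirmative_value(value):
    """Interpret common truthy config spellings (hypothetical helper)."""
    if isinstance(value, str):
        return value.strip().lower() in {"1", "true", "yes", "on"}
    return bool(value)


def maybe_start_profiler(agent_config, service="datadog-agent-integrations"):
    """Start the ddtrace profiler only when `integration_profiling` is set.

    Returns the started profiler, or None when profiling is disabled or
    ddtrace is not available. The default service name is an assumption.
    """
    if not is_affirmative_value(agent_config.get("integration_profiling", False)):
        return None
    try:
        from ddtrace.profiling import Profiler
    except ImportError:
        return None
    profiler = Profiler(service=service)
    profiler.start()
    return profiler
```

With the option absent or false, nothing is imported or started, so the default behavior of the integrations would be unchanged in this sketch.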

Motivation

This has proven valuable several times already while troubleshooting performance issues, so we're adding it as an option so that it is available by default without requiring a custom build.

Additional Notes

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • PR title must be written as a CHANGELOG entry (see why)
  • File changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached
  • If the PR doesn't need to be tested during QA, please add a qa/skip-qa label.

@djova djova requested a review from a team as a code owner December 23, 2022 20:38
@ghost ghost added the base_package label Dec 23, 2022
djova added a commit to DataDog/datadog-agent that referenced this pull request Dec 23, 2022
Add a new option to enable profiling of python integrations. It's used only within the python integrations. See DataDog/integrations-core#13576.
codecov bot commented Dec 23, 2022

Codecov Report

Merging #13576 (c86b0f7) into master (28ad542) will increase coverage by 0.23%.
The diff coverage is 33.33%.

Flag Coverage Δ
activemq_xml 82.31% <ø> (ø)
aerospike 87.17% <ø> (+0.32%) ⬆️
airflow 90.00% <ø> (ø)
amazon_msk 88.67% <ø> (ø)
ambari 85.75% <ø> (ø)
apache 95.08% <ø> (ø)
arangodb 98.21% <ø> (ø)
argocd 88.43% <ø> (ø)
avi_vantage 92.50% <ø> (ø)
azure_iot_edge 82.00% <ø> (ø)
boundary 100.00% <ø> (ø)
btrfs 82.91% <ø> (ø)
cacti 87.90% <ø> (ø)
calico 83.33% <ø> (ø)
cassandra_nodetool 93.16% <ø> (ø)
ceph 91.02% <ø> (ø)
cert_manager 77.41% <ø> (ø)
cilium 75.34% <ø> (+0.93%) ⬆️
citrix_hypervisor 87.50% <ø> (ø)
clickhouse 95.31% <ø> (ø)
cloud_foundry_api 95.98% <ø> (+0.12%) ⬆️
cloudera 99.08% <ø> (ø)
cockroachdb 90.96% <ø> (ø)
consul 91.64% <ø> (ø)
coredns 94.54% <ø> (ø)
couch 95.19% <ø> (+0.24%) ⬆️
couchbase 83.58% <ø> (ø)
datadog_checks_base 89.51% <33.33%> (+0.32%) ⬆️
datadog_checks_dev 82.24% <ø> (+0.07%) ⬆️
datadog_checks_downloader 78.64% <ø> (+0.99%) ⬆️
datadog_cluster_agent 90.00% <ø> (ø)
ddev 98.63% <ø> (ø)
disk 91.69% <ø> (ø)
dns_check 93.90% <ø> (ø)
druid 97.70% <ø> (ø)
ecs_fargate 80.05% <ø> (ø)
eks_fargate 94.05% <ø> (ø)
elastic 91.61% <ø> (ø)
envoy 94.00% <ø> (-0.23%) ⬇️
etcd 93.96% <ø> (ø)
exchange_server 96.85% <ø> (+11.81%) ⬆️
external_dns 89.09% <ø> (ø)
fluentd 94.77% <ø> (ø)
foundationdb 83.88% <ø> (ø)
gearmand 78.26% <ø> (+1.24%) ⬆️
gitlab 89.94% <ø> (ø)
gitlab_runner 91.94% <ø> (ø)
glusterfs 80.09% <ø> (+0.92%) ⬆️
go_expvar 92.73% <ø> (ø)
gunicorn 92.10% <ø> (-0.76%) ⬇️
haproxy 95.12% <ø> (+0.16%) ⬆️
harbor 80.04% <ø> (ø)
hazelcast 92.39% <ø> (ø)
hdfs_datanode 89.74% <ø> (ø)
hdfs_namenode 86.72% <ø> (ø)
http_check 95.38% <ø> (+2.08%) ⬆️
ibm_ace 91.79% <ø> (ø)
ibm_db2 95.10% <ø> (ø)
ibm_i 81.95% <ø> (ø)
ibm_mq 91.32% <ø> (ø)
ibm_was 96.08% <ø> (ø)
iis 94.61% <ø> (+38.78%) ⬆️
impala 97.97% <ø> (ø)
istio 77.65% <ø> (+0.55%) ⬆️
kafka_consumer 84.06% <ø> (ø)
kong 87.56% <ø> (ø)
kube_apiserver_metrics 97.69% <ø> (ø)
kube_controller_manager 96.00% <ø> (ø)
kube_dns 95.33% <ø> (ø)
kube_metrics_server 94.87% <ø> (ø)
kube_proxy 96.89% <ø> (ø)
kube_scheduler 96.53% <ø> (ø)
kubelet 90.96% <ø> (ø)
linkerd 85.14% <ø> (+1.14%) ⬆️
linux_proc_extras 96.22% <ø> (ø)
mapr 82.70% <ø> (ø)
mapreduce 81.77% <ø> (+0.46%) ⬆️
mcache 93.26% <ø> (ø)
mesos_master 89.75% <ø> (ø)
mesos_slave 93.63% <ø> (ø)
mongo 96.51% <ø> (ø)
network 93.92% <ø> (+0.95%) ⬆️
nfsstat 95.20% <ø> (ø)
nginx_ingress_controller 98.36% <ø> (ø)
openldap 96.33% <ø> (ø)
openmetrics 97.90% <ø> (ø)
openstack 51.45% <ø> (ø)
openstack_controller 90.94% <ø> (ø)
oracle 90.24% <ø> (ø)
pdh_check 95.65% <ø> (ø)
pgbouncer 91.33% <ø> (ø)
php_fpm 90.25% <ø> (+0.84%) ⬆️
postfix 88.04% <ø> (ø)
powerdns_recursor 96.65% <ø> (ø)
process 85.42% <ø> (+0.28%) ⬆️
prometheus 94.17% <ø> (ø)
proxysql 98.97% <ø> (ø)
pulsar 100.00% <ø> (ø)
rabbitmq 94.41% <ø> (ø)
redisdb 87.50% <ø> (ø)
rethinkdb 97.93% <ø> (ø)
riak 99.22% <ø> (ø)
riakcs 93.61% <ø> (ø)
scylla 100.00% <ø> (ø)
silk 93.33% <ø> (ø)
singlestore 90.81% <ø> (ø)
snmp 85.49% <ø> (+0.04%) ⬆️
snowflake 96.47% <ø> (ø)
sonarqube 98.21% <ø> (ø)
spark 93.57% <ø> (-0.29%) ⬇️
squid 100.00% <ø> (ø)
ssh_check 91.58% <ø> (ø)
statsd 87.36% <ø> (+1.05%) ⬆️
supervisord 92.30% <ø> (ø)
system_core 90.90% <ø> (ø)
system_swap 98.30% <ø> (ø)
tcp_check 91.58% <ø> (ø)
teamcity 88.35% <ø> (+2.87%) ⬆️
teradata 94.24% <ø> (ø)
tls 91.82% <ø> (+0.84%) ⬆️
tokumx 58.40% <ø> (?)
traffic_server 96.13% <ø> (ø)
twemproxy 79.45% <ø> (ø)
twistlock 79.62% <ø> (ø)
varnish 84.39% <ø> (+0.26%) ⬆️
vault 95.53% <ø> (+0.57%) ⬆️
vertica 98.50% <ø> (ø)
voltdb 96.84% <ø> (ø)
vsphere 89.91% <ø> (+0.08%) ⬆️
win32_event_log 86.40% <ø> (+0.27%) ⬆️
windows_performance_counters 98.36% <ø> (ø)
windows_service 98.00% <ø> (ø)
wmi_check 92.91% <ø> (ø)
yarn 89.14% <ø> (ø)
zk 86.63% <ø> (+1.55%) ⬆️

Flags with carried forward coverage won't be shown.

@djova djova force-pushed the djova/add-integration-profiling-option branch from 844c579 to c86b0f7 Compare December 23, 2022 20:47
djova added a commit that referenced this pull request Dec 23, 2022
Follow-up to #13576, update the integrations tracing naming scheme to ensure all integrations appear under a single service, matching the name of the service used for profiling. This enables better linking between APM & profiling data.

Since all integrations are now reporting under a single service, we now include the integration name in the resource names to be able to differentiate between the integrations.
@ofek ofek changed the title add option to enable profiling of python integrations Add option to enable profiling of Python integrations Dec 27, 2022
@ofek ofek merged commit ea51537 into master Dec 27, 2022
@ofek ofek deleted the djova/add-integration-profiling-option branch December 27, 2022 17:26
djova added a commit that referenced this pull request Dec 27, 2022
Follow-up to #13576, update the integrations tracing naming scheme to ensure all integrations appear under a single service, matching the name of the service used for profiling. This enables better linking between APM & profiling data.

Since all integrations are now reporting under a single service, we now include the integration name in the resource names to be able to differentiate between the integrations.
djova added a commit that referenced this pull request Dec 27, 2022
Follow-up to #13576, update the integrations tracing naming scheme to ensure all integrations appear under a single service, matching the name of the service used for profiling. This enables better linking between APM & profiling data, specifically the "Code Hotspots" feature.

Since all integrations are now reporting under a single service, the resource name is updated to refer only to the integration name in order to enable us to differentiate reporting from the different integrations. They are all visible in the resource list for that one common service.
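
The naming scheme described in this commit message can be sketched as follows. This is only an illustration: the constant `INTEGRATION_SERVICE`, its value, and the function `span_names` are hypothetical names, not taken from the commit.

```python
# Hypothetical shared service name; the real value is not stated in this thread.
INTEGRATION_SERVICE = "datadog-agent-integrations"


def span_names(integration_name):
    """Return (service, resource) for a check run under the single-service scheme.

    Every integration reports under the same service, and the resource name
    is just the integration name, so integrations stay distinguishable.
    """
    return INTEGRATION_SERVICE, integration_name


# All integrations share one service; the resource list tells them apart.
resources = sorted(span_names(name)[1] for name in ("postgres", "redisdb", "nginx"))
```

Keeping the service identical to the one used for profiling is what the message says enables the linking between APM and profiling data; the resource name then carries the per-integration detail.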
djova added a commit to DataDog/datadog-agent that referenced this pull request Dec 28, 2022
Add a new option to enable profiling of python integrations. It's used only within the python integrations. See DataDog/integrations-core#13576.
djova added a commit that referenced this pull request Dec 28, 2022
* update integration tracing naming scheme

Follow-up to #13576, update the integrations tracing naming scheme to ensure all integrations appear under a single service, matching the name of the service used for profiling. This enables better linking between APM & profiling data, specifically the "Code Hotspots" feature.

Since all integrations are now reporting under a single service, the resource name is updated to refer only to the integration name in order to enable us to differentiate reporting from the different integrations. They are all visible in the resource list for that one common service.

* dbm use job name
guyarb added a commit to DataDog/datadog-agent that referenced this pull request Jan 18, 2023
* [fargate] Make hostname resolution more reliable (#14746)

* [config/environment] Check AWS_EXECUTION_ENV in Fargate detection

* [util/fargate] Rely on features for ECS Fargate detection

* [fargate/detection] Rely on features to detect EKS

* [trace-agent/config] Call fargate.GetOrchestrator after loading config

* add unit-test for trace-agent config on fargate

* Add release note

* [cmd/trace-agent/config] Fix TestFargateConfig in macOS

Co-authored-by: Cedric Lamoriniere <[email protected]>

* 7.41.0 CHANGELOG (#14675) (#14745)

* Updated Python to 3.8.16

* CWS: sync BTFhub constants (#14804)

Co-authored-by: paulcacheux <[email protected]>

* [CSPM] respect verbose on compliance check cli cmd (#14750)

* CODEOWNERS: splitting files so USM can own its own files (#14789)

* config: test: Removed duplicated test (#14705)

* Running dockers in the kitchen test (#14589)

* ci: kitchen: Allow running dockers in kitchen test, and extend the filesystem

This PR introduces a way to run external Docker images in the kitchen tests without pulling them there.
As we cannot authenticate to Docker Hub from the kitchen machines, we had to work around that:
we pull and save the images in GitLab, upload them to the remote machine
using kitchen, and then load them on the remote machine so they are available
for use.

The PR also adds steps to install docker and docker compose on the kitchen machines.

The PR introduces an example test that runs Docker containers.

While working on the PR we hit "no space left on the device" errors; to solve them
we had to extend the filesystem of the remote machines.

* Fixed cr comments

* Debugging the artifacts

* Debugging the artifacts

* Debugging the artifacts

* Debugging the artifacts

* revert artifacts

* Giving another try to dependencies

* Fixed path

* Fixed CR comment

* [CWS] Add tests for activity dump processes content (#14708)

* [CWS] Add two checks to avoid adding nodes with abnormal paths in activity dumps (#14698)

* [gitlab] Repack macOS JUnit tarball to include correct name and job URL (#14793)

* Bump golang.org/x/tools from 0.3.0 to 0.4.0 in /pkg/security/secl (#14710)

* Bump golang.org/x/tools from 0.3.0 to 0.4.0 in /pkg/security/secl

Bumps [golang.org/x/tools](https://github.com/golang/tools) from 0.3.0 to 0.4.0.
- [Release notes](https://github.com/golang/tools/releases)
- [Commits](https://github.com/golang/tools/compare/v0.3.0...v0.4.0)

---
updated-dependencies:
- dependency-name: golang.org/x/tools
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Auto-generate go.sum and LICENSE-3rdparty.csv changes

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: paulcacheux <[email protected]>

* [single-machine-performance] Introduce regression detector jobs (#14528)

* [WIP][single-machine-performance] Introduce regression detector jobs

This PR intends to introduce the Single Machine Performance regression detector
into Agent CI. This builds on work done in #14477 and is peer to #14438. The
Regression Detector is a CI tool that determines if a change introduced into a
project modifies project performance in a way that is more than just random
chance with some statistical guarantee. The Regression Detector is not a
microbenchmarking tool and must operate on the whole Agent. This PR introduces
only 'throughput' as an optimization goal -- how quickly can the Regression
Detector produce load into the Agent -- but other goals are
possible. Regressions are checked per-experiment, please see `tests/regression`
for details about how to define an experiment.

The Regression Detector runs today in vectordotdev/vector project and is
influential in keeping that project's performance consistently high.

REF SMP-208

Signed-off-by: Brian L. Troutwine <[email protected]>

* Use static smp binary

Signed-off-by: Brian L. Troutwine <[email protected]>

* different base sha calculation

Signed-off-by: Brian L. Troutwine <[email protected]>

* Try to clone the whole repo

Signed-off-by: Brian L. Troutwine <[email protected]>

* baseline sha computation redux

Signed-off-by: Brian L. Troutwine <[email protected]>

* specify region explicitly

Signed-off-by: Brian L. Troutwine <[email protected]>

* use smp 0.6.3-rc3

Signed-off-by: Brian L. Troutwine <[email protected]>

* Wait for job to complete, output report, status

Signed-off-by: Brian L. Troutwine <[email protected]>

* update job name

Signed-off-by: Brian L. Troutwine <[email protected]>

* Update smp, lading

Signed-off-by: Brian L. Troutwine <[email protected]>

* remove \

Signed-off-by: Brian L. Troutwine <[email protected]>

* Use smp 0.6.4

Signed-off-by: Brian L. Troutwine <[email protected]>

* diagnose why file_to_blackhole fails

Signed-off-by: Brian L. Troutwine <[email protected]>

* just one test for now

Signed-off-by: Brian L. Troutwine <[email protected]>

* set log level for smp

Signed-off-by: Brian L. Troutwine <[email protected]>

* tweaks

Signed-off-by: Brian L. Troutwine <[email protected]>

* debug

Signed-off-by: Brian L. Troutwine <[email protected]>

* actually add datadog.yaml et al., .gitignore issue?

Signed-off-by: Brian L. Troutwine <[email protected]>

* tidy up cases to initial trio, less file_to_blackhole which needs work

Signed-off-by: Brian L. Troutwine <[email protected]>

* update smp, config tweak

Signed-off-by: Brian L. Troutwine <[email protected]>

* override .gitignore

Signed-off-by: Brian L. Troutwine <[email protected]>

* Apply @GeorgeHahn's patches

Signed-off-by: Brian L. Troutwine <[email protected]>

* enable other tests, tweak OTEL

Signed-off-by: Brian L. Troutwine <[email protected]>

* more fiddling

Signed-off-by: Brian L. Troutwine <[email protected]>

* tweaks

Signed-off-by: Brian L. Troutwine <[email protected]>

* use markdown output report

Signed-off-by: Brian L. Troutwine <[email protected]>

* use OTEL http

Signed-off-by: Brian L. Troutwine <[email protected]>

* use smp 0.6.5-rc1

Signed-off-by: Brian L. Troutwine <[email protected]>

* debug -> info

Signed-off-by: Brian L. Troutwine <[email protected]>

* preserve output

Signed-off-by: Brian L. Troutwine <[email protected]>

* remove stray tick

Signed-off-by: Brian L. Troutwine <[email protected]>

* Update test/regression/README.md

Co-authored-by: Kylian Serrania <[email protected]>

* Update test/regression/README.md

Co-authored-by: Kylian Serrania <[email protected]>

Signed-off-by: Brian L. Troutwine <[email protected]>
Co-authored-by: Kylian Serrania <[email protected]>

* Split bundle params (#14702)

* Split BundleParams into ConfigParams and LogParams
* Move ConfigParams and LogParams to their own file
* Move WithXXX functions from BundleParams to config.Params
* Use constructors for config.Params
* Fix comp/core/log/params_test.go
* Make fields for log.Params unexported
* Make config.Params fields not exported.
* Fix package names in the security agent.
* Explain why `fx.Provide` is needed in bundle.go
* Remove configLoadSecurityAgent from NewSecurityAgentParams
* Add NewAgentParamsWithSecrets and NewAgentParamsWithoutSecrets

* CWS: sync BTFhub constants (#14815)

Co-authored-by: paulcacheux <[email protected]>

* Check the package exists before creating package. Restore install script after packaging. (#14777)

* change networks slack channel (#14819)

* fix close_time value display in INFO log (#14744)

* Updates prometheusScrape to support tag_by_endpoint and collect_counters_with_distributions (#14805)

* Updates prometheusScrape to support tag_by_endpoint

* Adds release note

* Cleans release note

* Also adds support for `collect_counters_with_distributions`

* Updates release note to include the second added parameter

* Updates release note based on suggestion by @clamoriniere

* Migrating flare to a component (#14234)

Migrating flare to a component

This adds a 'flare' component and reworks the flare package to be
compatible with both fx and non-fx apps.

The flare generation now happens through a FlareBuilder which handles
all the logic of adding data to a flare. This FlareBuilder can be used
directly (by the flare package) or be received by each component when
they register a flare provider.

Migration workflow for each component would be to move their dedicated
code from the flare package to a flare provider.

Note: Until `cmd/systray/` is migrated to fx, we can't start using the
flare component from other flare (on Windows, the systray can create a
flare on its own).

* Add netlink process monitor (#14706)

This monitor reads the netlink socket's process event queue and handles events on parallel workers (mapped to the n CPU cores).
ProcessMonitor requires root or the CAP_NET_ADMIN capability.

The aim is to Subscribe() to the process events Exec and Exit,
with or without process metadata: Any, Name, MAPfile.

ProcessMonitor subscribes to netlink process events such as Exec and Exit
and calls the subscribed callbacks.
Initialize() scans the current processes and calls the subscribed callbacks.

Callbacks are executed in parallel via a pool of goroutines (runtime.NumCPU()).
callbackRunner is the callback queue; its size is set by processMonitorMaxEvents.

Multiple teams can use the same ProcessMonitor;
each caller must call Initialize() and Stop() exactly once, because the monitor maintains an internal reference counter.

Only one PID is allowed to hold the netlink process subscription (socket connection).
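
The subscribe/dispatch/reference-count pattern described above can be sketched roughly as follows. The Agent's implementation is in Go and reads from a netlink socket; this Python sketch leaves out the socket and the initial process scan, uses a thread pool in place of goroutines, and every name in it (`ProcessMonitorSketch`, `dispatch`, and so on) is hypothetical.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


class ProcessMonitorSketch:
    """Shared monitor: callbacks per event type, a worker pool, a refcount."""

    def __init__(self, workers=None):
        self._lock = threading.Lock()
        self._refcount = 0
        self._callbacks = {"exec": [], "exit": []}
        self._pool = None
        self._workers = workers or os.cpu_count() or 1

    def subscribe(self, event, callback):
        with self._lock:
            self._callbacks[event].append(callback)

    def initialize(self):
        # Only the first caller creates the pool; later callers bump the counter.
        with self._lock:
            self._refcount += 1
            if self._refcount == 1:
                self._pool = ThreadPoolExecutor(max_workers=self._workers)

    def dispatch(self, event, pid):
        # Run every subscribed callback for this event on the worker pool.
        with self._lock:
            callbacks = list(self._callbacks[event])
            pool = self._pool
        if pool is None:
            return []
        return [pool.submit(cb, pid) for cb in callbacks]

    def stop(self):
        # Only the last caller tears the pool down.
        with self._lock:
            if self._refcount == 0:
                return
            self._refcount -= 1
            if self._refcount > 0:
                return
            pool, self._pool = self._pool, None
        pool.shutdown(wait=True)
```

The reference counter is why each user must pair exactly one `initialize()` with one `stop()`: an extra `stop()` would shut the pool down while another team is still subscribed.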

* protocols: refactor tests to allow pre-post setups (#14817)

* protocols: refactor tests to allow pre-post setups

* Added temporary nolint for skippers

* Fixed bugs

* Escape path in get-acl command (#14818)

* ci: Add manual benchmark step for trace-agent (#14466)

* pkg/trace/config: Lower max tracer payload to 25 MB to better align with backend limits (#14782)

* Revert #14367 and use nano timestamp instead (#14825)

* Revert "Replace timestamp by increasing id to avoid configVersion matching different config changed in the same second"

This reverts commit f8e097de2aa3322670fcc6a6c8cfc5c1ed9d6239.

* Revert #14367 and use nano timestamp instead

* Disable by default remote-tagger in clc-runner mode (#14821)

* fix gofmt -s for pkg/collector/collector_demux_test.go (#14808)

* Improve debug logging in cloud foundry container tagger (#14803)

* Add logging around container retries

* Add trace log

* Change to debug and add release note

* Delete Improve-container-tagger-logging-e48b0fffbe8563d0.yaml

* Add timestamp id to events

* Make id more specific, use container String method

* Just print class

* Update pkg/cloudfoundry/containertagger/container_tagger.go

Co-authored-by: NouemanKHAL <[email protected]>

* Address PR review

* Create event ID

Co-authored-by: NouemanKHAL <[email protected]>

* [Serverless] Merge serverless/main to main. (#14826)

* [Serverless] change account (#14755)

* Aj/buffer cold start span data (#14664)

* wip dirty commit - trace being created but not flushed properly. No further traces appearing

WIP: more debugging. StopChan properly set up

feat: Starting coldstart creator as a daemon, and receiving data from two channels. Todo: spec

feat: Update specs to write to channels

feat: Merge conflicts resolved for tests

feat: Use smaller methods to handle locking

fix: pass coldstartSpanId to sls-init main

feat: Remove default

feat: Use Millisecond as Second is far longer than necessary

feat: No need to export ColdStartSpanId

fix: update units

feat: Directionality for lambdaSpanChan as well as for initDurationChan

fix: No need for the nil check, I need to stop javascripting my go

feat: ints

* feat: rebase missing changes from merge commits

* feat: update ints after moving accounts

* Empty commit to trigger ci

* [Serverless] Fix flaky integration tests and make them more easily maintainable. (#14783)

* Retry serverless integration test failures automatically. (#14801)

* [Serverless] Allow some keys to be optional in serverless integration tests. (#14827)

* Ability to remove items from the json.

* Remove items from snapshot.

Co-authored-by: Maxime David <[email protected]>
Co-authored-by: AJ Stuyvenberg <[email protected]>

* Allow Regression Detector pipeline to fail (#14828)

At present there's a race condition in the CI pipeline with regard to Regression
Detector: we rely on an artifact to be created by main pipeline merge but have
no way of making a hard dependency on that artifact. If that artifact is not
present then the Regression Detection job will be submitted and then immediately
fail. Absent a solution we allow the Regression Detector job to fail, which
unfortunately means any actual regressions it catches will not fail the
pipeline, but it also avoids contributing to alert blindness in the meanwhile.

Signed-off-by: Brian L. Troutwine <[email protected]>

Signed-off-by: Brian L. Troutwine <[email protected]>

* [process-agent] Remove unused properties from AgentConfig (#14842)

* [process-agent] Remove unused properties from AgentConfig

* Fix tests

* 7.41.1 changelog (#14822) (#14824)

* Add do-not-merge github action (#14843)

* [CWS] remove useless resolver function (#14792)

* [kitchen] Work around bundler and ruby version issue in verifier (#14851)

Modifies the script used to run kitchen tests to run the verify phase twice, and adds a pre_verify lifecycle hook to install the dependency needed for system-probe kitchen tests.

Works around an issue (version mismatch between ruby and bundler) that started happening after the release of version 2.4.0 of bundler.
As long as this workaround is needed, we can't have Gemfiles in test suites, and instead need to manually install gems whenever needed.

* Add the 'test' build tag to the 'unit-tests' flavor

This tag is needed to run unit tests but was not printed by
'inv print-default-build-tags -b unit-tests'. When running tests from
an IDE or another tool we need the correct list of tags to be returned.

* flare: Added /opt/datadog-agent directory permissions to permissions.log (#14848)

* flare: Added /opt/datadog-agent directory permissions to permissions.log

system-probe internal files (sysprobe.socket, runtime compilation source files, prebuilt version, etc.) are located in /opt/datadog-agent.
Without this change, a flare does not tell us those files' permissions (or whether they exist).

* Take directories from configuration

* Fixed cr comments

* Fixed cr comments

* Fixed cr comments

* Update comp/core/flare/helpers/helpers.go

Co-authored-by: maxime mouial <[email protected]>

* [USM] protocol classification: add RabbitMQ classification  (#14734)

* wip

* Fixed

* added support for amqp without tests

* added UT's for consumer and sender for rabbitmq

* removed redundant client and server

* added support to classify also protocol header of amqp

* removed redundant function

* test

* fixed most of the cr notes

* fixed all the cr notes

* add ut

* fixed licence issue

* fixed ci issue

* fixed event common protocol type number

* Revert update of github.com/DataDog/datadog-operator

* fixed all cr notes

* merged main

* fixed a cr note

* reverted datadog-operation

* update licence

* fixed ci issue

* merged main and updated ut

* fixed cr note

* added some UT's and support the latest classification uts update

* refactor the uts

* Added debug log

* Added debug log 2

* Added debug log 3

* Added pattern scanner

Co-authored-by: Guy Arbitman <[email protected]>

* Handle environment variables without an equal sign (#14806)

* usm: protocols: Refactored server creation (#14869)

* Removed example docker tests (#14852)

* [CWS][SEC-5573] add custom CWS product (#14748)

* [CWS] add custom CWS product

* Add a debouncer to limit reloads

* Update URL regexp to detect for Datadog's URL

In the past we used to edit the regexp every time Datadog opened a new
location. This commit allows the agent to detect all present and
future locations as long as they follow the format of 2 letters + 1
digit. Example: 'us3.datadoghq.com'.
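
A pattern in the spirit of this description might look like the sketch below. It is not the Agent's actual regexp (the Agent is written in Go and its pattern is not shown here); `DD_SITE_RE` and `is_datadog_site` are hypothetical names, and treating the location prefix as optional and covering only `datadoghq.com` are assumptions.

```python
import re

# Assumed shape: an optional "<2 letters><1 digit>." location prefix
# in front of datadoghq.com, e.g. "us3.datadoghq.com".
DD_SITE_RE = re.compile(r"(?:[a-z]{2}\d\.)?datadoghq\.com")


def is_datadog_site(host):
    """Return True when the host matches the assumed location format."""
    return DD_SITE_RE.fullmatch(host.lower()) is not None
```

Matching on the shape of the prefix rather than on a fixed list is what removes the need to edit the pattern for each new location.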

* system-probe: tasks: Save all dockers from docker-compose files in the protocols dir (#14873)

* system-probe: tasks: Save all dockers from docker-compose files in the protocols dir

* Fixed lint

* [process-agent] Move data scrubber and disallow list from pkg/process/config (#14863)

- Move these two fields in preparation for removal of pkg/process/config package.
- Use inclusive naming where possible - will rename the config param in the future.
- Update imports in pkg/security using the DataScrubber type.

* add `integration_profiling` config option (#14847)

Add a new option to enable profiling of python integrations. It's used only within the python integrations. See https://github.com/DataDog/integrations-core/pull/13576.

* Fix flaky TestKSMCheckInitTags unit-test (#14832)

* Fix flaky TestKSMCheckInitTags unit-test
* improve config.GetConfiguredTags testability
* update GetConfiguredTags function description

* Deleting Security Agent for Windows resources (#14833)

* deleting windows resources

* removing windows operations for security-agent.build task

* removing secagent for windows resources in omnibus, addressing python lint

* [process-agent] Remove orchestrator config from AgentConfig (#14867)

* [process-agent] Move data scrubber and disallow list from pkg/process/config

- Move these two fields in preparation for removal of pkg/process/config package.
- Use inclusive naming where possible - will rename the config param in the future.
- Update imports in pkg/security using the DataScrubber type.

* [process-agent] Remove orchestrator config from AgentConfig

- Further decouple config management in prep for removal of pkg/process/config.
- Remove orchestrator config, push it into the pod check and collector structs.

* Address review feedback

* [process-agent] Display system probe process module status in process agent info commands (#14880)

Updates the process agent status information displayed by the datadog-agent status, process-agent status and process-agent --info commands to display whether or not the system probe's process module is enabled

* tooling: Add invoke vscode devcontainer cmd (#14031)

* Add invoke vscode envcontainer cmd

* Update agent_dev_env.md

* fix typo in documentation

* adding err to exit SecAgent. fixes hanging if there's no API key (#14856)

* Replace hardcoded /proc path with config field (#14773)

Use the config field instead of hardcoding /proc. The config field should
be automatically detected to either /proc or /host/proc inside containers.

* usm: protocols: Added redis classification (#14886)

* usm: protocols: Added redis classification

* Fixed CR comment

* Fixed CR comment

* Fixed warning on centos

* [CWS] extract custom events package (#14230)

* [CWS] extract custom events package

* [CWS] extract selftest custom event

* [CWS] allow to specify a rate per rule through config

* post rebase

* add lint exception

* use the good sender

* [process-agent] Remove check intervals from pkg/process/config (#14878)

* [process-agent] Remove check intervals from pkg/process/config

- Remove check interval management from pkg/process/config package
- Never store intervals, just use config settings
- Generalize check for process and process RT check intervals

* Fix MacOS tests

* Address review feedback from @just-chillin

* flare: Ignore system probe dirs if they are empty (#14893)

* [CWS] increase exit event test timings (#14813)

* [CWS] fix rule id not sent for custom event (#14897)

* Adding return statement in GUI when an error is encountered

* [CI] Artifactory for Python (#14473)

* Introduce new E2E tests based on test-infra-definitions (#13643)

* manual check tracing uses new exhaustive tracing config option (#14892)

* manual check tracing uses new exhaustive tracing config option

Following up to https://github.com/DataDog/integrations-core/pull/13618, we now need to set both `integration_tracing` and `integration_tracing_exhaustive` config options to enable exhaustive tracing of integrations.

When manually running a check the increased overhead of exhaustive tracing (tracing all check methods) is acceptable. When continuous integration tracing is desired only the `integration_tracing` option should be set in order to keep the overhead minimal.

* update core agent check command

* fix sort order
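
How the two options combine, as described in the message above, can be sketched like this. The option names `integration_tracing` and `integration_tracing_exhaustive` come from the commit message; the function `tracing_mode` and the labels it returns are hypothetical.

```python
def tracing_mode(config):
    """Map the two config options to a tracing level (labels are illustrative).

    Exhaustive tracing needs both options set; `integration_tracing` alone
    keeps the overhead minimal.
    """
    if not config.get("integration_tracing", False):
        return "off"
    if config.get("integration_tracing_exhaustive", False):
        return "exhaustive"
    return "minimal"
```

In this sketch, setting only `integration_tracing_exhaustive` has no effect, which matches the statement that both options are needed for exhaustive tracing.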

* pkg/trace/traceutil: Add fast-path for NormalizeTags to reduce cpu usage (#14881)

* usm: remove the scenario of nil subprograms (#14909)

* usm: remove the scenario of nil subprograms

* Fixed CR comments

* Import order

* Fixed CR comments

* Bump datadog-api-client from 2.6.0 to 2.7.0 in /test/e2e/cws-tests (#14914)

Bumps [datadog-api-client](https://github.com/DataDog/datadog-api-client-python) from 2.6.0 to 2.7.0.
- [Release notes](https://github.com/DataDog/datadog-api-client-python/releases)
- [Changelog](https://github.com/DataDog/datadog-api-client-python/blob/master/CHANGELOG.md)
- [Commits](https://github.com/DataDog/datadog-api-client-python/compare/2.6.0...2.7.0)

---
updated-dependencies:
- dependency-name: datadog-api-client
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* usm: http2: improved functions (#14917)

* update profiling endpoint when the fips is enabled to avoid 404 (#14807)

* fix(fips): update profiling endpoint when the fips is enabled to avoid 404

Signed-off-by: Nicolas Guerguadj <[email protected]>

* pkg/clusteragent/admission: introduce deployment patcher (#14500)

* [CWS] avoid using readonly map for eBPF test prog (#14780)

* [e2e] add codeowners for new e2e tests (#14865)

* DogstatsD component improvements (#14839)

* Inject defaultLogFile
* Move main.go inside command/command.go
* Move start command to subcommands/start
* Dogstatsd uses pkg/cli/subcommands/version/command.go for version command
* Use similar code for cfgpath compare to datadog-agent

* eval.Opts holds MacroStore and VariableStore (#14874)

* [fake-datadog] add docker compose (#14902)

* [fake-datadog] add docker compose

* [fake-datadog] add docker instructions

* usm: mongo: Added mongo classification (#14809)

* usm: mongo: Added mongo classification

* Fixed CR comment

* Fixed CR comment

* Fixed CR comment

* Fixed CR comment

* Update agent_dev_env.md (#14887)

Co-authored-by: Kaylyn <[email protected]>

* [CWS][SEC-6508] use tail call limit to increase the number of args/envs (#14796)

* use tail call limit to increase the number of args/envs

* do not validate process overflow events to avoid scrubbing argv and timeout

* [notifications] Catch all image pull errors as infra failures (#14926)

Updates the regex to match infra failure logs when pulling images to include more patterns. The previous pattern didn't catch the following line:

WARNING: Failed to pull image with policy "always": context deadline exceeded (manager.go:203:7197s)
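
As a rough illustration of the kind of match described above, the following Go sketch classifies the quoted log line as an infrastructure failure. The pattern and the `isInfraFailure` helper are assumptions made for this example; the regex actually used by the notification tooling is not shown in this message and may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// imagePullFailure is a hypothetical pattern: it matches any image pull
// failure regardless of the pull policy or the error that follows.
var imagePullFailure = regexp.MustCompile(`Failed to pull image with policy "[^"]*":`)

// isInfraFailure reports whether a job log line looks like an image pull error.
func isInfraFailure(logLine string) bool {
	return imagePullFailure.MatchString(logLine)
}

func main() {
	line := `WARNING: Failed to pull image with policy "always": context deadline exceeded (manager.go:203:7197s)`
	fmt.Println(isInfraFailure(line))
}
```

Matching on the stable prefix rather than on a specific error text is what lets a pattern like this catch new failure messages such as `context deadline exceeded`.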

* Do not install the integrations downloader for python 2 (#14920)

* usm: classification: Shrink classification buffer to 24 bytes (#14925)

* config: usm: Added USM to system-probe.yaml.example file (#14908)

* setupConfig consumes 1 param instead of many, adding to SecAgent constructor (#14884)

* changing func signature of setupConfig

* setting security agent config file instead of merging because Viper only supports 1 config file per viper instance

* Revert "setting security agent config file instead of merging because Viper only supports 1 config file per viper instance"

This reverts commit 8e6736d5025db79e5c1f552a983f9050f86a2c5c.

* MergeConfigurationFiles is just for SecAgent

* undo moving sys probe and secagent merge

fix return of merge

* rename configMissingOK field to baseConfigMissingOK

* setting secagent config path and config load secrets params

* adding secagent bundle param test

* reverting renaming configMissingOK to baseConfigMissingOK

* params.configMissingOK should be false

* fixing test post bundle breaking into config and log components

* config params test copyright info

* [e2e/ndm] add snmp test environment (#14768)

* [e2e/ndm] add snmpsim data folder

* [new e2e test] update test-infra-definition version

* [e2e] fix aws signature

* [e2e/ndm] add snmp test environment

* [e2e/ndm] simplify err return code

* [e2e/ndm] remove unused close function

* [e2e/ndm] actually parse flags

* [e2e] ndm: fix destroy

* [e2e/ndm] add copyright header

* [CWS] extract probe from event and activity dump manager (#14515)

* [CWS] extract TC resolver into own resolver

* no probe in event

* include tcresolver in usual resolvers

* fix test

* apply review suggestion

* apply review suggestion v2

* [corechecks/snmp] Add IP Addresses to NDM Metadata interfaces (IPv4) (#14823)

* {Dockerfiles/agent,trace-agent/config}: disable apm `max_memory` and `max_cpu_percent` by default (#14850)

* [pkg/otlp] Add a simple example on metric export (#14784)

* Bump github.com/vektra/mockery/v2 from 2.15.0 to 2.16.0 in /internal/tools (#14913)

* Bump github.com/vektra/mockery/v2 in /internal/tools

Bumps [github.com/vektra/mockery/v2](https://github.com/vektra/mockery) from 2.15.0 to 2.16.0.
- [Release notes](https://github.com/vektra/mockery/releases)
- [Changelog](https://github.com/vektra/mockery/blob/master/.goreleaser.yml)
- [Commits](https://github.com/vektra/mockery/compare/v2.15.0...v2.16.0)

---
updated-dependencies:
- dependency-name: github.com/vektra/mockery/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* gen mocks

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Paul Cacheux <[email protected]>

* usm: Reducing chances for mistakes in the protocol type values (#14816)

* usm: classification: split the functions and helpers to protocol-dedicated-files (#14924)

* usm: classification: split the functions and helpers to protocol-dedicated-files

* usm: classification: rename protocol-classification-helpers to protocol-classification

* [process-agent] Remove host info from AgentConfig (#14885)

* [process-agent] Remove host info from AgentConfig

* Fix info command per review feedback

* [process-agent] Remove remaining properties from AgentConfig (#14889)

* Ignore RemoteSamplingClient when marshaling agent config (#14927)

* Ignore RemoteSamplingClient when marshaling agent config

* Add release note

* pkg/obfuscate: fix panic due to missing logger (#14859)

Obfuscator.log was uninitialized which was causing agent panic

* Update github.com/lxn/walk version (#14905)

* gitignore runtime compiled hash files (#14764)

* Try ignoring runtime compiled hash files

* Build object files before linting

* [process-agent] Remove pkg/process/config package (#14904)

* [process-agent] Remove pkg/process/config package

* Address review feedback from @kkhor-datadog

- Revert back to using util.PathExists for simplicity
- Clean up code with early exits

* Review feedback from @sgnn7

* Bump github.com/avast/retry-go/v4 from 4.3.1 to 4.3.2 (#14935)

Bumps [github.com/avast/retry-go/v4](https://github.com/avast/retry-go) from 4.3.1 to 4.3.2.
- [Release notes](https://github.com/avast/retry-go/releases)
- [Commits](https://github.com/avast/retry-go/compare/4.3.1...4.3.2)

---
updated-dependencies:
- dependency-name: github.com/avast/retry-go/v4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump github.com/prometheus/procfs from 0.8.0 to 0.9.0 (#14934)

Bumps [github.com/prometheus/procfs](https://github.com/prometheus/procfs) from 0.8.0 to 0.9.0.
- [Release notes](https://github.com/prometheus/procfs/releases)
- [Commits](https://github.com/prometheus/procfs/compare/v0.8.0...v0.9.0)

---
updated-dependencies:
- dependency-name: github.com/prometheus/procfs
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [CWS Agent] Bugfixing SecAgent Params constructor (#14939)

* [USM] use per-cpu array map instead of in-stack buffer for classification (#14756)

* protocol classification: add per-cpu array map

Signed-off-by: Guillaume Pagnoux <[email protected]>

* Outsmart the verifier

* change map type on unsupported systems

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix runtime-compilation on older kernels + doc

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix array map

Signed-off-by: Guillaume Pagnoux <[email protected]>

* docs & refactor

Signed-off-by: Guillaume Pagnoux <[email protected]>

* add missing editor flag to change map type

Signed-off-by: Guillaume Pagnoux <[email protected]>

* usm: Reverted #14925

Signed-off-by: Guillaume Pagnoux <[email protected]>
Co-authored-by: Guy Arbitman <[email protected]>

* [gitlab] Use DEB buildimage based on Ubuntu 14.04 instead of Debian 8 (#14929)

* Adding config option to disable delta profiles when profiling the Agent

* Fixed nil return instead of an error in DogStatsD file replay

* Removed sending API key as params in forwarder

* [CWS] remove now useless runtime files sync check (#14945)

* flags package to organize security agent subcommand flags (#14906)

* [CI] Improve visibility for `docker run` commands in the CI (#14899)

Add line breaks for docker run commands

* [CWS Agent] SecAgent command pkg to replace common pkg, moving status and version subcommands (#14907)

* adding command package, to replace common

* status and version subcommands

* Bump github.com/itchyny/gojq from 0.12.10 to 0.12.11 (#14938)

Bumps [github.com/itchyny/gojq](https://github.com/itchyny/gojq) from 0.12.10 to 0.12.11.
- [Release notes](https://github.com/itchyny/gojq/releases)
- [Changelog](https://github.com/itchyny/gojq/blob/main/CHANGELOG.md)
- [Commits](https://github.com/itchyny/gojq/compare/v0.12.10...v0.12.11)

---
updated-dependencies:
- dependency-name: github.com/itchyny/gojq
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Replacing TODOs in exposed comments with more meaningful comments (#14901)

* Revert "[agent] Support for running secrets backends with sha256 verification (#14529)" (#14940)

This reverts commit deb7fce8f668a4bca6697e76d0b77cb67d7f46f7.

* missing import in file with unsupported build flag (#14952)

* Bump golang.org/x/text from 0.5.0 to 0.6.0 (#14948)

Bumps [golang.org/x/text](https://github.com/golang/text) from 0.5.0 to 0.6.0.
- [Release notes](https://github.com/golang/text/releases)
- [Commits](https://github.com/golang/text/compare/v0.5.0...v0.6.0)

---
updated-dependencies:
- dependency-name: golang.org/x/text
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Download btfs for kitchen tests (#14587)

* Save btfs to dd-agent-omnibus s3 bucket

* Update folders to match new btfhub-archive names

* Download BTFs during kitchen-prepare task

* Add more details to error message

* Fix permissions

* Use btfs from dev box

* Update gitignore

* Check for bpftool compatibility outside of generate_minimized_btfs

* Change x86-64 -> x86_64

* Fix generating minimized btfs

* Fix bpftool compatibility check helper

* Fix python linting

* Fix python lint

* Only run BTF preparation outside of CI

* Explicitly indicate CI kitchen preparation

Co-authored-by: Hasan Mahmood <[email protected]>
Co-authored-by: Bryce Kahle <[email protected]>

* [secrets] Fix getDDAgentUserSID to account for NT AUTHORITY\SYSTEM (#14941)

* [secrets] Fix getDDAgentUserSID to account for NT AUTHORITY\SYSTEM

* Address review feedback from @clarkb7

* usm: classification: removed redundant nolint (#14958)

* Bump wheel versions (#14918)

* Fixing system_probe.py on linux machine (#14959)

* document trace API v04, including response (#14868)

* [CWS] improve mount fallback (#14779)

* [CWS] improve mount fallback

* post review

* [CWS] bump security agent policies to v0.42.1 (#14964)

* [orchestration] Add Vertical Pod Autoscalers (#14669)

* [orchestration] Add Vertical Pod Autoscalers

We want to start collecting Vertical Pod Autoscalers from Kubernetes.

Co-authored-by: Kangyi LI <[email protected]>
Co-authored-by: Bryce Eadie <[email protected]>

* [usm] Extract batching functionality into package (#14712)

* [Process Agent] Split Collector into Runner and Submitter (#14883)

* WIP

* Collector and submitter split, need to fix tests

* Rename receivers to `s`

* Delete components directory

* Add RT reporting to submitter

* Add `dropCheckPayloads` back into the submitter

* Move submitter tests to its own file

* Delete component.go

* clean up comments and unused code

* Fix a couple tests

* Fix orchestrator tests

* Fix tests

* Fix copyright header

* Fix linter issues

* Use mocks in tests

* Fix import

* Fix data race in tests

* Fix data race in tests

* Update cmd/process-agent/collector.go

Co-authored-by: Ivan Ilichev <[email protected]>

* Refactor `Submit` to not return an error

* Remove `init()` in favor of using mock config

* Remove `init()` in favor of using mock config

* Update mockery to use version 2.16 since they were updated in #14913

* Fix linter errors (again)

* Fix `TestPodCheck/enabled` failing due to the clustername package caching a bad cluster name

* Remove `forwarderRetryQueueMaxBytes`

Co-authored-by: Ivan Ilichev <[email protected]>

* Bump Collector dependencies to v1.0.0-RC2/v0.68.0 (#14864)

* Bump Collector dependencies to v1.0.0-RC2/v0.68.0

* Revert InstrumentationLibraryMetadataAsTags changes

* Update collector test configuration error message

* Address PR comments

* Increase speed of generate_minimized_btfs jobs (#14585)

Co-authored-by: Bryce Kahle <[email protected]>

* Add dynamic way of determining eBPF helper availability on runtime compilation (#14685)

* Add KernelHeaderOptions type to prevent ebpf package dependency

* Add function to get available helpers on host

* Use dynamic method of finding available helpers

* Use static list for kernels with __BPF_FUNC_MAPPER macro

* Limit TestGetAvailableHelpers to kernels where it will work

* Fix udp bind for random ports (#14956)

* NDM: Add snmp.interface_status metric (#14797)

* NDM: Add snmp.interface_status metric

* update test

* Add reno

* Address review

* Rename metric

* Address review

* Add InterfaceStatus enum

* Remove iota and use explicit values

* NDM: Add snmp.device.[un]reachable metrics (#14649)

* NDM: Add snmp.device_up metric

* Address review

* update reno

* Address review

* fix import

* Improve log message (#14968)

Log the underlying error when GetUnitTypeProperties fails

* Use rv "0" when polling endpoint list (#13906)

Since this code path polls the endpoint list endpoint once every 60s by
default to update the internal state in the agent, we don't really need
the consistency guarantees we implicitly get from the unset resource
version.

When the resource version is unset, the api-server needs to fetch all
endpoints from etcd, causing a costly round-trip that can potentially
result in a lot of data traffic. When setting resource version "0", all
requests are handled by the watch cache, meaning they will be much more
efficient and less costly.

For the most part, the actual returned data will be the same, but in
some cases where the API-servers are having a bad time, the data might
be a bit stale, though that is not very common. In that case, getting data
from the watch cache instead of not being able to list at all is
preferable.

The semantics are described in detail here:
https://kubernetes.io/docs/reference/using-api/api-concepts/#semantics-for-get-and-list

Signed-off-by: Odin Ugedal <[email protected]>

Signed-off-by: Odin Ugedal <[email protected]>
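
To make the difference concrete, here is a small Go sketch that builds the list request URL with and without the `resourceVersion` query parameter. It only uses the standard library; `endpointsListURL` and the API server address are illustrative assumptions, and the agent itself goes through a Kubernetes client rather than hand-built URLs.

```go
package main

import (
	"fmt"
	"net/url"
)

// endpointsListURL is a hypothetical helper that builds the URL for listing
// Endpoints across all namespaces. An empty resourceVersion leaves the
// parameter unset, which asks for the most recent data; "0" allows the
// API server to answer from its watch cache.
func endpointsListURL(apiServer, resourceVersion string) (string, error) {
	u, err := url.Parse(apiServer)
	if err != nil {
		return "", err
	}
	u.Path = "/api/v1/endpoints"
	q := url.Values{}
	if resourceVersion != "" {
		q.Set("resourceVersion", resourceVersion)
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// The host below is only an example address.
	cached, err := endpointsListURL("https://kubernetes.default.svc", "0")
	if err != nil {
		panic(err)
	}
	fmt.Println(cached)
}
```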

* Remove `CCA_IN_AD` flag and related unused code (#14955)

* remove CCA_IN_AD config flag

* PR feedback

* remove unused providers

* pr feedback

* epforwarder: add additional debug logging (#14161)

* Fix small typo in install XML. (#14687)

Causes Wix to throw error (although apparently non-fatal)

* CWS: sync BTFhub constants (#14986)

Co-authored-by: paulcacheux <[email protected]>

* Revert "pkg/obfuscate: improve formatting and string parsing in the SQL obfuscator (#11967)" (#14976)

This reverts commit 8ab1d187421087d8ae746ec0dcca00f25918a9f0.

* [CWS] remove unsafe pointer from eval.Context (#14890)

* [CWS] remove unsafe pointer for eval.Context

* Add user context

* move perf helper to a perf file

* remove resolvers from event

* generate handlers

* add extra field handlers

* remove accessors from probe

* remove model mock

* fix unit and functional tests

* refactor model/field_handlers

* add helper for common object creation

* fix stress tests

* [workloadmeta/collectors/containerd] Collect image metadata (#14592)

* [util/containerd] Rename Image to ImageOfContainer

This makes room for a new Image func that gets an image just by image ID,
regardless of whether it's being used by a container.

* [util/containerd] Add Image func

* [workloadmeta] Add GetImage func

* [config] Add option to enable image collection in workloadmeta

* [workloadmeta/collectors/containerd] Collect image metadata

* [CSPM] remove the hostSelector field not used anymore (#14770)

* [CSPM] remove the hostSelector field not used anymore

In a more global effort to remove the internal compliance DSL after
our move to rego, this commit removes one field where it is still
being used.

The hostSelector field was put in place to make sure
we only run specific rules on hosts that match, in particular for
k8s nodes. However, the rules were not used anymore since the hosts'
"master" labels are not properly set. We rely on other side effects
(like process and file existence) to avoid running some rules on
bad nodes.

* [CSPM] remove k8s nodeLabels retrieval from compliance rules execution

Now that hostSelector fields have been removed, fetching the k8s node labels
is not required anymore and completely useless.

This PR just removes the nodeLabels fetching and all the subsequent
dependencies.

* [CWS] add tests for live process monitoring (#14944)

* [system-probe][NET-2899] fix race condition in ephemeral port checker (#14802)

* [NET-2899] use mutex to lock fields causing race condition in ephemeral port checker

* [NET-2899] gofmt on changed files

* [NET-2899] remove mutex, move racy code to sync.once func
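
The final approach in this change, moving one-time initialization into a `sync.Once`, can be sketched as below. The `portRange` type, its fields, and the loader function are hypothetical stand-ins for illustration; the actual ephemeral port checker is not reproduced here.

```go
package main

import (
	"fmt"
	"sync"
)

// portRange is a hypothetical holder for a lazily loaded ephemeral port range.
type portRange struct {
	once  sync.Once
	low   uint16
	high  uint16
	loads int // number of times the loader actually ran
}

// get runs load at most once, even when called from many goroutines.
// sync.Once also guarantees that the writes made inside Do are visible
// to every caller after Do returns, so no extra mutex is needed.
func (r *portRange) get(load func() (uint16, uint16)) (uint16, uint16) {
	r.once.Do(func() {
		r.low, r.high = load()
		r.loads++
	})
	return r.low, r.high
}

func main() {
	r := &portRange{}
	// The range values here are example numbers, not read from the host.
	load := func() (uint16, uint16) { return 32768, 60999 }

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			r.get(load)
		}()
	}
	wg.Wait()
	fmt.Println("loader runs:", r.loads)
}
```

Compared with guarding every field access with a mutex, this keeps the read path lock-free once the values are initialized.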

* [CWS] restore SECL documentation generation (#14993)

* [CWS] fix event missing field resolver (#14992)

* fix missing fields resolver in some events (around policy eval CLI)

* do not emit event in policy eval output

* Add __TARGET_ARCH_ to runtime compilation flags (#14983)

* Add __TARGET_ARCH_ to runtime compilation flags

* Use append instead

* Re-delete http runtime asset hash file (#14982)

* Add CO-RE version of TCP Queue Length check (#14763)

* Add CO-RE version of TCP Queue Length check

* Fix version

* Fix generate BTF job

* Invert err check on CO-RE load

* Add helper for missing BTF check

* Bump golang.org/x/tools from 0.4.0 to 0.5.0 in /pkg/security/secl (#14996)

* Bump golang.org/x/tools from 0.4.0 to 0.5.0 in /pkg/security/secl

Bumps [golang.org/x/tools](https://github.com/golang/tools) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/golang/tools/releases)
- [Commits](https://github.com/golang/tools/compare/v0.4.0...v0.5.0)

---
updated-dependencies:
- dependency-name: golang.org/x/tools
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>

* Auto-generate go.sum and LICENSE-3rdparty.csv changes

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com>

* Fix gateway lookup tests (#14951)

* [usm] Reduce HTTP test memory utilization (#15006)

* [CWS] mount fallback to pid 1 by default (#15007)

* [CWS][SEC-4020] parse args and envs from the new program stack pages (#13008)

* CWS: parse args and envs from the process stack

* remove useless function parameter

* get env vars offset from new program stack as well

* return from tailcall loop sooner

* use different kprobe to fix kernel function call order on CentOS 7

* [process-agent] Refactor conn rates with util/subscriptions (#14988)

* [process-agent] Refactor conn rates with util/subscriptions

* Update with a unit test for pub/sub

* Address feedback from @hmahmood

* [CWS] change programs to avoid mixing events between tests (#15012)

* [CWS] rework event json marshalling (#15010)

* externalize serialization

* a bit of cleanup

* refactor schema validators

* fix printfs

* re-enable policy eval event json

* fix trace dispatching

* fix deadcode

* fix validateProcessContextSECL error output

* [USM] protocol classification: add PostgreSQL classification (#14625)

* protocol classification: add per-cpu array map

Signed-off-by: Guillaume Pagnoux <[email protected]>

* Outsmart the verifier

* protocol classification: add per-cpu array map

Signed-off-by: Guillaume Pagnoux <[email protected]>

* Outsmart the verifier

* protocol classification: add PostgreSQL classification

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix licenses & set postgres port in test

Signed-off-by: Guillaume Pagnoux <[email protected]>

* test: fix port

Signed-off-by: Guillaume Pagnoux <[email protected]>

* test: use JoinHostPort instead of Sprintf

Signed-off-by: Guillaume Pagnoux <[email protected]>

* [USM] protocol classification: add Postgres detection

Signed-off-by: Guillaume Pagnoux <[email protected]>

* revert check_command fix

Signed-off-by: Guillaume Pagnoux <[email protected]>

* postgres: refactor check_command

Signed-off-by: Guillaume Pagnoux <[email protected]>

* change map type on unsupported systems

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix runtime-compilation on older kernels + doc

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix array map

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix merge

Signed-off-by: Guillaume Pagnoux <[email protected]>

* tests: add documentation

Signed-off-by: Guillaume Pagnoux <[email protected]>

* tests: add long query test

Signed-off-by: Guillaume Pagnoux <[email protected]>

* docs & refactor

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix licenses

Signed-off-by: Guillaume Pagnoux <[email protected]>

* refactor

Signed-off-by: Guillaume Pagnoux <[email protected]>

* postgres: try to classify from client start messages

Signed-off-by: Guillaume Pagnoux <[email protected]>

* add missing Cgo defs

Signed-off-by: Guillaume Pagnoux <[email protected]>

* add postgres docker image pulling

Signed-off-by: Guillaume Pagnoux <[email protected]>

* add missing editor flag to change map type

Signed-off-by: Guillaume Pagnoux <[email protected]>

* remove unused import

Signed-off-by: Guillaume Pagnoux <[email protected]>

* case-insensitive check + docs

Signed-off-by: Guillaume Pagnoux <[email protected]>

* check on tmp buf

Signed-off-by: Guillaume Pagnoux <[email protected]>

* docs

Signed-off-by: Guillaume Pagnoux <[email protected]>

* try fixing verifier issue

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fix verifier issue

Signed-off-by: Guillaume Pagnoux <[email protected]>

* tests: fix docker-compose path

Signed-off-by: Guillaume Pagnoux <[email protected]>

* fixup! Merge remote-tracking branch 'origin/main' into guillaume.pagnoux/USMO-9-protocol-classification-posgres

* re-delete re-added files

Signed-off-by: Guillaume Pagnoux <[email protected]>

* go mod tidy

Signed-off-by: Guillaume Pagnoux <[email protected]>

* add docs

Signed-off-by: Guillaume Pagnoux <[email protected]>

* remove redundant check

Signed-off-by: Guillaume Pagnoux <[email protected]>

* refactor server creation in tests

Signed-off-by: Guillaume Pagnoux <[email protected]>

* rename guards

Signed-off-by: Guillaume Pagnoux <[email protected]>

* specify postgres version in docker-compose

Signed-off-by: Guillaume Pagnoux <[email protected]>

* tests: skip when using NAT

Signed-off-by: Guillaume Pagnoux <[email protected]>

* split sql files

Signed-off-by: Guillaume Pagnoux <[email protected]>

* tests: add tests for all supported sql queries

Signed-off-by: Guillaume Pagnoux <[email protected]>

* move postgres struct to postgres-defs.h

Signed-off-by: Guillaume Pagnoux <[email protected]>

* remove redundant check

Signed-off-by: Guillaume Pagnoux <[email protected]>

* classify on command completion messages as well

Signed-off-by: Guillaume Pagnoux <[email protected]>

* add long response test

Signed-off-by: Guillaume Pagnoux <[email protected]>

* re-enable query detection

Signed-off-by: Guillaume Pagnoux <[email protected]>

Signed-off-by: Guillaume Pagnoux <[email protected]>
Co-authored-by: Guy Arbitman <[email protected]>
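
One of the test changes listed above replaces `Sprintf` with `JoinHostPort` when building a server address. The Go sketch below shows why that matters; `serverAddress` is a hypothetical helper and the port is only an example value.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// serverAddress is a hypothetical helper that builds a "host:port" string.
// net.JoinHostPort wraps IPv6 literals in brackets, which a plain
// fmt.Sprintf("%s:%d", host, port) would not do.
func serverAddress(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(serverAddress("127.0.0.1", 5432))
	fmt.Println(serverAddress("::1", 5432))
}
```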

* [process-agent] Scaffold components for process agent (#14972)

* [process-agent] Scaffold components for process agent

* Address review comments from @ogaca-dd

* Address review comments from @ogaca-dd

* Change to use context.Context and reintroduce empty Component interface to suppress linting

* CWS: sync BTFhub constants (#15023)

Co-authored-by: paulcacheux <[email protected]>

* [CWS] rework/cleanup `FieldHandlers` (#15015)

* remove probe from FieldHandlers

* cleanup `NewProcessResolver` resolvers dependency

* resolvers only need a link to the manager

* Update CODEOWNERS (#15024)

* Use sc query to gain information about the service before attempting to stop it. (#15028)

* [security-agent] remove redundant String() in compliance agent log (#15026)

* [invoke] Print summary of test failures at the end of inv test (#14682)

Updates the inv test command to print a summary of failed tests at the end of a run, across all modules and flavors that were tested, to more easily identify the list of failures, without having to visually parse the full job logs.

* [system-probe][NET-2891] Fix tcp retransmit count (#14740)

* [NET-2891] initial pass at changes to prebuilt code

* [NET-2891] use retrans_out for runtime compiled tcp_retransmit counter

* [NET-2891] runtime compiled version of tcp_retrans updates

* [NET-2891] remove debug comment

* [NET-2891] fix log

* [NET-2891] update bytecode

* [NET-2891] code review comments, regenerate license

* [NET-2891] newline

* [NET-2891] fix probe definitions

* [NET-2891] update comment

* [NET-2891] runtime compilation fixes

* [NET-2891] fix byte padding for args init

* [NET-2891] fix formatting

* testing debug logic

* more debug logic, added some config for the map

* [NET-2891] enable kretprobe and remove debug

* [NET-2891] disable bpf debug by default

* [NET-2891] update bytecode

* [NET-2891] mark function as maybe unused

* [NET-2891] handle different paths of incremental vs absolute retransmit counters

* [NET-2891] use enum to track increment vs absolute retransmits

* [NET-2891] change enum values

* [NET-2891] move retrans code to runtime tracer

* pulled in new gitignore

* Revert "pulled in new gitignore"

This reverts commit b4b0df587aeb6b6f655ea90d7bc96ae250934170.

* remove runtime gen files, code review comments

* [NET-2891] use retransmit count none in runtime tracer

* [NET-2891] use retransmit_count_none in handle_tcp_stats

* [NET-2891] nit comments from code review

* [NET-2891] try to get runtime compilation working on 4.4 kernel

* usm: upgraded pgdriver version to indirectly upgrade mellium.in/sasl version due to a CVE found (#15030)

* usm: upgraded pgdriver version to indirectly upgrade mellium.in/sasl version due to a CVE found

* Fixed go.sum

* [CWS] fix signal test (#15025)

* [process-agent] Support dynamically enabling profiling for process agent from CLI (#14995)

Adds support for dynamically enabling profiling for the process agent from the CLI

* pkg/obfuscate: Fix parsing of sqlserver identifiers enclosed in square brackets (#15019)

* DBM-2010 Fix parsing of sqlserver literals enclosed in square brackets

* .gitlab: move APM benchmark job to manual only (#15036)

* fix datatype (#13791)

related to #13770

* [AD/prometheus] Ignore headless services (#15031)

* Fix stop service (#15035)

* Fix check conf directory

During the migration to components, the hardcoded directory 'etc/confd'
for check configuration was removed.

* Fix shipping of 'version-history.json' and 'registry.json' in flares

When migrating to components, the logic to include /opt/datadog-agent/run/
handled it as a file instead of a folder. This broke collecting
'version-history.json' and 'registry.json' from it.

* Fix datadog.yaml file name in flare

* Force file permission to 644 within a flare

* auto instru: add rc provider (#15008)

* pkg/obfuscate: use github.com/outcaste-io/ristretto instead of github.com/dgraph-io/ristretto (#15005)

Migrate the usage of github.com/dgraph-io/ristretto to github.com/outcaste-io/ristretto

* [workloadmeta/kubelet] Parse image ID if name is a SHA256

We now try to parse the resolved image ID if the image in the pod's
container status is a SHA256. This seems to happen when pinning the
SHA256 in the container spec. This fixes an issue where `image:` filters
in DD_CONTAINER_INCLUDE/DD_CONTAINER_EXCLUDE would not be respected.
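
A minimal Go sketch of the decision described above: when the image reported in the container status is a bare SHA256 reference, fall back to the resolved image ID for parsing. The `isSHA256` and `imageToParse` helpers, the digest pattern, and the sample image ID format are assumptions for illustration, not the collector's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// sha256Ref is an assumed pattern for a bare digest such as "sha256:<64 hex chars>".
var sha256Ref = regexp.MustCompile(`^sha256:[0-9a-f]{64}$`)

// isSHA256 reports whether the image string is only a SHA256 digest.
func isSHA256(image string) bool {
	return sha256Ref.MatchString(image)
}

// imageToParse is a hypothetical helper that picks which string to parse:
// the resolved image ID when the status image is a bare digest,
// otherwise the status image itself.
func imageToParse(statusImage, statusImageID string) string {
	if isSHA256(statusImage) {
		return statusImageID
	}
	return statusImage
}

func main() {
	digest := "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
	// The image ID below is an example value in a registry-qualified form.
	imageID := "docker-pullable://nginx@" + digest
	fmt.Println(imageToParse(digest, imageID))
	fmt.Println(imageToParse("nginx:1.23", imageID))
}
```

Parsing the resolved ID recovers a repository name, which is what an `image:` filter needs in order to match.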

* pkg/trace/api: remove unused internal OTLP HTTP server (#14965)

* [pkg/trace/api] Remove unused OTLP HTTP server

* [pkg/trace] Remove protocol argument

* Remove unnecessary fmt.Sprintf

* Fix tests

* [CWS] cleanup last uses of `jsonschema_description` (#15050)

* [Serverless] Merge `serverless/main` to `main` (#14980)

* [Serverless] change account (#14755)

* Aj/buffer cold start span data (#14664)

* wip dirty commit - trace being created but not flushed properly. No further traces appearing

WIP: more debugging. StopChan properly set up

feat: Starting coldstart creator as a daemon, and receiving data from two channels. Todo: spec

feat: Update specs to write to channels

feat: Merge conflicts resolved for tests

feat: Use smaller methods to handle locking

fix: pass coldstartSpanId to sls-init main

feat: Remove default

feat: Use Millisecond as Second is far longer than necessary

feat: No need to export ColdStartSpanId

fix: update units

feat: Directionality for lambdaSpanChan as well as for initDurationChan

fix: No need for the nil check, I need to stop javascripting my go

feat: ints

* feat: rebase missing changes from merge commits

* feat: update ints after moving accounts

* Empty commit to trigger ci

* [Serverless] Fix flaky integration tests and make them more easily maintainable. (#14783)

* Retry serverless integration test failures automatically. (#14801)

* [Serverless] Allow some keys to be option in serverless integration tests. (#14827)

* Ability to remove items from the json.

* Remove items from snapshot.

* Do not expect spans when there is no spans object. (#14396)

* [Serverless] Improve stability of two tests. (#14895)

* Increase timeout while decreasing test time.

* Increase timeout in test.

* [Serverless] Consolidate log normalization to single file for integration tests. (#15004)

* Consolidate log normalization to single file.

* Save raw logs to a temp dir.

* Fix linting issues.

Co-authored-by: Maxime David <[email protected]>
Co-authored-by: AJ Stuyvenberg <[email protected]>

* Fixes multiple problems with http processing/tagging on Windows. (#15022)

* Fixes multiple problems with http processing/tagging on Windows.
- There was an offset error in which the port was not properly computed
  on ipv6 connections
- There was a problem with computing whether an ipv6 address was loopback or
  not
- The fullpath indication (which is used to compute the key) was not
  properly being computed.  This led to the same tuple being used
  as a different key, so transactions were not properly combined.

* fix grammar error in release notes

* Add the plumbing in the agent forwarder to submit container images and SBOM (#14962)

* Improve documentation for BundleParams (#15011)

* pkg/clusteragent/admission: add unit tests (#15044)

* [CWS] bump syscall table + extract into separate task (#15061)

* 5.19 -> 6.1

* switch syscall table generator from go generate to task

* extract linux version

* [gitlab] Temporarily disable SUSE Agent 5 upgrade tests (#15055)

* [corechecks/snmp] Add LLDP remote device IP address (#14946)

* [CWS] add discarders eBPF unit test (#14471)

* [CWS] add discarder retention ut

* add another test

* add a unit test task

* add trace param

* make eBPF test part of the CI

* fake time to speed up tests

* bump baloum version

* add more tests

* [CWS Agent] Moving SecAgent subcommands to new dir part 2 (#14915)

* moving flare command to subcommands dir

* consolidating and moving secagent config package

* moving runtime to subcommands dir

* moved check subcommand, updated compliance subcommand which is the entry point to check funcs

* moving compliance cmd to subcommand dir

* exporting CliParams and RunCheck in Check subcommand for Compliance tests

* fixing cluster agent entry point into the check subcommand

* Add `container_image` core check (#14567)

* Reorganize the specs for some kitchen test (#15027)

* [check command] Add `--instance-filter` option (#15034)

* Migrate systray to an fx.App (#14985)

Deprecate single-dash args and add double-dash args

Move code from cmd/systray to comp/systray

Update UAC manifest to requireAdministrator

Fix log file and add `system_tray.log_file` configuration option.

* epforwarder: update dbm samples endpoint prefix (#15053)

dbm-metrics-intake and dbquery-intake resolve to the same IPs. This change cleans up code so that we're only referencing one endpoint name.

* [process-agent] Refactor Check interface (#15063)

* [process-agent] Refactor Check interface

- Refactors Check interface to consolidate CheckWithRealTime features
- This will simplify integration with components in future PRs since it eliminates casts

* Address feedback from @just-chillin

* usm: postgres classification: Reduced 5 seconds per test, 1m30s in total (#15070)

Improved the regex used to detect whether the server is up and running, which
lets us drop the 'wait 5 seconds' in GetPGHandle

* CWS: sync BTFhub constants (#15074)

Co-authored-by: paulcacheux <[email protected]>

* [DCA] Convert commands to Fx apps

* Extract magic strings into command.* constants

* [CWS] Add 4 tests, one for each kernel rate limiter algo (#15064)

* [CWS] remove useless callbacks (#15046)

* remove useless error check

* remove useless callback

* Add `SBOM` core check (#14989)

* Prevent check from running after it was unscheduled. (#15065)

* Prevent check from running after it was unscheduled.

If a check runs after it was unscheduled, in particular after its
sender and samplers were removed, it would create the sender and samplers
again, leaking resources. This may happen if the check was cancelled
after it was put in the worker channel, but before the worker called Run.

This change adjusts check_wrapper to make Cancel fully mutually
exclusive with Run, and adds a flag that would prevent Run from
executing the check after Cancel has completed.
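
The approach described above (Cancel mutually exclusive with Run, plus a flag that blocks Run once Cancel has completed) can be sketched roughly as follows. This is not the actual check_wrapper code; the `Check` interface, `checkWrapper` type, and `countingCheck` helper are assumptions made for illustration.

```go
package main

import "sync"

// Check is a hypothetical minimal check interface for this sketch.
type Check interface {
	Run() error
}

// checkWrapper serializes Run and Cancel with one mutex and remembers
// whether Cancel has already completed.
type checkWrapper struct {
	mu        sync.Mutex
	inner     Check
	cancelled bool
}

// Run executes the inner check unless Cancel has already completed.
// The boolean reports whether the inner check was actually executed.
func (w *checkWrapper) Run() (bool, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.cancelled {
		// The check was unscheduled while waiting in the worker queue.
		return false, nil
	}
	return true, w.inner.Run()
}

// Cancel blocks until any in-flight Run has finished, then sets the flag
// so that later Run calls become no-ops.
func (w *checkWrapper) Cancel() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.cancelled = true
}

// countingCheck is a stand-in check that records how many times it ran.
type countingCheck struct{ runs int }

func (c *countingCheck) Run() error {
	c.runs++
	return nil
}
```

Holding the same mutex in both methods means a Run that is already executing finishes before Cancel returns, and a Run dequeued afterwards sees the flag and never recreates the sender or samplers.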

* go fmt

* Update test helper

* Restrict flare file from being accessible by other users on Unix (#14862)

* pkg/clusteragent/admission/patch: poll rc on leadership switch (#15062)

* pkg/clusteragent/admission: add additional libconfig env vars (#15059)

* usm: classification: Split USM and NPM classifications (#15075)

USM does not need all classifiers, only those which we have dispatchers for (HTTP, and soon HTTP2)

* Python memory telemetry (#14757)

* Track memory used by the python arena allocator

pymalloc [1], Python's built-in arena allocator, is responsible for
handling small-sized allocations, while the rest goes through
the system malloc.

This patch tracks the amount of memory requested by pymalloc from the
operating system, allowing a low-cost, low-granularity view into a
segment of Python memory usage.

[1]: https://docs.python.org/3/c-api/memory.html#the-pymalloc-allocator

* inv -e rtloader.format

* Remove rtloader_mem.h from rtloader.h

This allows calling C malloc without warnings when we implement a
custom raw memory allocator for Python.

* Add python raw allocator tracking.

Together with tracking pymalloc requests, this should give a
comprehensive picture of memory allocated by the Python interpreter.

* Make sure to call global malloc/free

In the Pyraw allocator implementation, make sure to call the global
malloc/calloc/realloc/free symbols, to avoid undesired interaction
with the rtloader-specific memory tracking (for example, call libc
free instead of RtLoader::free).

* Move all memory tracking to the same file

* Update Go naming to match C functions

pymalloc is now one of two tracked allocators, use pymem as umbrella.

* Add a note about new metrics to the docs

* Python memory telemetry supports py3 only

* Add releasenote

* Expand telemetry documentation.

* Update docs/dev/agent_memory.md

Co-authored-by: Kari Halsted <[email protected]>

* Update docs/dev/agent_memory.md

Co-authored-by: Kari Halsted <[email protected]>

* Update docs/dev/agent_memory.md

Co-authored-by: Kari Halsted <[email protected]>

* Update releasenotes/notes/pymem-telemetry-0f62acb520d80a1f.yaml

Co-authored-by: Kari Halsted <[email protected]>

* Update rtloader/three/three_mem.cpp

Co-authored-by: Scott Opell <[email protected]>

* Improve metric description and remove outdated comment.

* Fix typo

* Add a comment about allocation size adjustments

Co-authored-by: Kari Halsted <[email protected]>
Co-authored-by: Scott Opell <[email protected]>

* Add telemetry for number of contexts per origin (#15016)

* Add telemetry for number of contexts per origin

Report number of contexts at the end of flush for each container
sending dogstatsd metrics.

This PR relies on origin detection to provide a set of identifying
tags for each origin, and reports the number of distinct contexts for each
tag set. While this may not fully identify individual origins when
running with low tagger cardinality, it accurately reflects the way
the agent would aggregate metrics from different origins together if their
tags end up the same.
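
As a rough illustration of counting distinct contexts per origin tag set, here is a minimal Go sketch. It is not the aggregator's implementation; `originKey`, `contextCounter`, and the string context keys are hypothetical names and simplifications chosen for this example.

```go
package main

import (
	"sort"
	"strings"
)

// originKey builds a canonical key from an origin's tag set, so that two
// origins with the same tags (in any order) share one counter.
func originKey(tags []string) string {
	sorted := append([]string(nil), tags...)
	sort.Strings(sorted)
	return strings.Join(sorted, ",")
}

// contextCounter tracks the distinct context keys seen per origin tag set
// between two flushes.
type contextCounter struct {
	seen map[string]map[string]struct{}
}

func newContextCounter() *contextCounter {
	return &contextCounter{seen: make(map[string]map[string]struct{})}
}

// Track records that contextKey was observed for the given origin tags.
func (c *contextCounter) Track(originTags []string, contextKey string) {
	key := originKey(originTags)
	contexts, ok := c.seen[key]
	if !ok {
		contexts = make(map[string]struct{})
		c.seen[key] = contexts
	}
	contexts[contextKey] = struct{}{}
}

// Flush returns the number of distinct contexts per origin tag set and
// resets the counter for the next flush interval.
func (c *contextCounter) Flush() map[string]int {
	counts := make(map[string]int, len(c.seen))
	for key, contexts := range c.seen {
		counts[key] = len(contexts)
	}
	c.seen = make(map[string]map[string]struct{})
	return counts
}
```

Keying on the sorted tag set mirrors the caveat in the commit message: origins whose tags are identical are counted together, which matches how their metrics would be aggregated.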

* Only enable per-origin stats if telemetry is enabled.

* [process-agent] Fix kitchen tests for process agent on main (#15072)

* include `functests` in `DD_PIPELINE_ID` for system probe and security agent functests (#15043)

* include `functests` in DD_PIPELINE_ID for system probe and security agent functests

* simpler/shorter pipeline_id

* [install_script] Backport removal of RPM signing key 4172A230 (#15082)

* [corechecks/snmp] LLDP resolve local interface (#14991)

* [CWS] fix rule in error reported twice (#15084)

* Add java package in our circle-ci image (#14665)

* Use DMI on EC2 Nitro instances to get host aliases

The Agent now leverages DMI information on Unix to get the instance ID on AWS EC2 when the metadata endpoint fails or
is not accessible. The instance ID is exposed through DMI only on AWS Nitro instances.

This will not change the hostname of the Agent upon upgrading but will add to the list of host aliases.

* [CWS] add inode to pid context to detect exec loss (#14661)

* [CWS] add revision to pid context

* use inode instead of revision

* Fix post rebase

* Fix serializer tests flakiness (#15093)

* [RCM-632] Add UUID in request (#15088)

* Add org uuid field

* Add org uuid in request

* Remove generate file

* Comment exported method

* fix the receiver name consistency (#15068)

* Add limits to allocated dictionaries, prevent browser cross-site requests (#15067)

* pkg/trace/api: Move semantic conventions to separate internal package (#14963)

* [pkg/trace/api] Move semantic conventions to separate internal package

* Rename to shared

* Move tagContainersTags back to API package

* Rename package to 'header'

* Fix Windows build

* Factorize queue code duplicated at two places (#15098)

* Factorize the aggregating queue used by the SBOM and container image checks

* Mock time functions to make tests more reliable

* [single-machine-performance] Push agent containers to SMP ECR (#14438)

* [single-machine-performance] Push agent container to SMP ECR

This commit is an attempt to introduce pushing containers from Agent CI for
single-machine-performance's Regression Detector in our isolated
infrastructure. Much like we have done for vectordotdev/vector we intend to run
the Regression Detector on Agent changes, gi…