Add option to enable profiling of Python integrations #13576

djova · 2022-12-23T20:38:08Z

What does this PR do?

Add a new config option, `integration_profiling`, which enables profiling of Python integrations.
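As a rough illustration of how such an option could gate the profiler, here is a minimal sketch. It is not the code from this PR: the helper names `profiling_enabled` and `maybe_start_profiler`, the `agent_config` mapping, and the service name are all assumptions made for the example. Only `ddtrace.profiling.Profiler` and its `start()` method come from the public ddtrace API.

```python
from typing import Any, Mapping, Optional

# Hypothetical service name; the PR text does not state the real one.
PROFILING_SERVICE = "datadog-agent-integrations"


def profiling_enabled(agent_config: Mapping[str, Any]) -> bool:
    """Return True when `integration_profiling` is set to a truthy value."""
    value = agent_config.get("integration_profiling", False)
    if isinstance(value, str):
        # Config values may arrive as strings (for example from environment variables).
        return value.strip().lower() in {"1", "true", "yes", "on"}
    return bool(value)


def maybe_start_profiler(agent_config: Mapping[str, Any]) -> Optional[Any]:
    """Start the ddtrace profiler if the option is on and ddtrace is importable."""
    if not profiling_enabled(agent_config):
        return None
    try:
        # Imported lazily so the import cost is only paid when profiling is requested.
        from ddtrace.profiling import Profiler
    except ImportError:
        return None
    profiler = Profiler(service=PROFILING_SERVICE)
    profiler.start()
    return profiler
```

With the option unset or false, `maybe_start_profiler` returns `None` and nothing from ddtrace is imported, which keeps the default path free of profiling overhead.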

Motivation

This has proven valuable several times already while troubleshooting performance issues, so we're adding it as an option that is available by default, without requiring a custom build.

Additional Notes

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
PR title must be written as a CHANGELOG entry (see why)
File changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
PR must have changelog/ and integration/ labels attached
If the PR doesn't need to be tested during QA, please add a qa/skip-qa label.

See DataDog/integrations-core#13576

Add a new option to enable profiling of python integrations. It's used only within the python integrations. See DataDog/integrations-core#13576.

codecov · 2022-12-23T20:44:22Z

Codecov Report

Merging #13576 (c86b0f7) into master (28ad542) will increase coverage by 0.23%.
The diff coverage is 33.33%.

Flag	Coverage Δ
activemq_xml	`82.31% <ø> (ø)`
aerospike	`87.17% <ø> (+0.32%)`	⬆️
airflow	`90.00% <ø> (ø)`
amazon_msk	`88.67% <ø> (ø)`
ambari	`85.75% <ø> (ø)`
apache	`95.08% <ø> (ø)`
arangodb	`98.21% <ø> (ø)`
argocd	`88.43% <ø> (ø)`
avi_vantage	`92.50% <ø> (ø)`
azure_iot_edge	`82.00% <ø> (ø)`
boundary	`100.00% <ø> (ø)`
btrfs	`82.91% <ø> (ø)`
cacti	`87.90% <ø> (ø)`
calico	`83.33% <ø> (ø)`
cassandra_nodetool	`93.16% <ø> (ø)`
ceph	`91.02% <ø> (ø)`
cert_manager	`77.41% <ø> (ø)`
cilium	`75.34% <ø> (+0.93%)`	⬆️
citrix_hypervisor	`87.50% <ø> (ø)`
clickhouse	`95.31% <ø> (ø)`
cloud_foundry_api	`95.98% <ø> (+0.12%)`	⬆️
cloudera	`99.08% <ø> (ø)`
cockroachdb	`90.96% <ø> (ø)`
consul	`91.64% <ø> (ø)`
coredns	`94.54% <ø> (ø)`
couch	`95.19% <ø> (+0.24%)`	⬆️
couchbase	`83.58% <ø> (ø)`
datadog_checks_base	`89.51% <33.33%> (+0.32%)`	⬆️
datadog_checks_dev	`82.24% <ø> (+0.07%)`	⬆️
datadog_checks_downloader	`78.64% <ø> (+0.99%)`	⬆️
datadog_cluster_agent	`90.00% <ø> (ø)`
ddev	`98.63% <ø> (ø)`
disk	`91.69% <ø> (ø)`
dns_check	`93.90% <ø> (ø)`
druid	`97.70% <ø> (ø)`
ecs_fargate	`80.05% <ø> (ø)`
eks_fargate	`94.05% <ø> (ø)`
elastic	`91.61% <ø> (ø)`
envoy	`94.00% <ø> (-0.23%)`	⬇️
etcd	`93.96% <ø> (ø)`
exchange_server	`96.85% <ø> (+11.81%)`	⬆️
external_dns	`89.09% <ø> (ø)`
fluentd	`94.77% <ø> (ø)`
foundationdb	`83.88% <ø> (ø)`
gearmand	`78.26% <ø> (+1.24%)`	⬆️
gitlab	`89.94% <ø> (ø)`
gitlab_runner	`91.94% <ø> (ø)`
glusterfs	`80.09% <ø> (+0.92%)`	⬆️
go_expvar	`92.73% <ø> (ø)`
gunicorn	`92.10% <ø> (-0.76%)`	⬇️
haproxy	`95.12% <ø> (+0.16%)`	⬆️
harbor	`80.04% <ø> (ø)`
hazelcast	`92.39% <ø> (ø)`
hdfs_datanode	`89.74% <ø> (ø)`
hdfs_namenode	`86.72% <ø> (ø)`
http_check	`95.38% <ø> (+2.08%)`	⬆️
ibm_ace	`91.79% <ø> (ø)`
ibm_db2	`95.10% <ø> (ø)`
ibm_i	`81.95% <ø> (ø)`
ibm_mq	`91.32% <ø> (ø)`
ibm_was	`96.08% <ø> (ø)`
iis	`94.61% <ø> (+38.78%)`	⬆️
impala	`97.97% <ø> (ø)`
istio	`77.65% <ø> (+0.55%)`	⬆️
kafka_consumer	`84.06% <ø> (ø)`
kong	`87.56% <ø> (ø)`
kube_apiserver_metrics	`97.69% <ø> (ø)`
kube_controller_manager	`96.00% <ø> (ø)`
kube_dns	`95.33% <ø> (ø)`
kube_metrics_server	`94.87% <ø> (ø)`
kube_proxy	`96.89% <ø> (ø)`
kube_scheduler	`96.53% <ø> (ø)`
kubelet	`90.96% <ø> (ø)`
linkerd	`85.14% <ø> (+1.14%)`	⬆️
linux_proc_extras	`96.22% <ø> (ø)`
mapr	`82.70% <ø> (ø)`
mapreduce	`81.77% <ø> (+0.46%)`	⬆️
mcache	`93.26% <ø> (ø)`
mesos_master	`89.75% <ø> (ø)`
mesos_slave	`93.63% <ø> (ø)`
mongo	`96.51% <ø> (ø)`
network	`93.92% <ø> (+0.95%)`	⬆️
nfsstat	`95.20% <ø> (ø)`
nginx_ingress_controller	`98.36% <ø> (ø)`
openldap	`96.33% <ø> (ø)`
openmetrics	`97.90% <ø> (ø)`
openstack	`51.45% <ø> (ø)`
openstack_controller	`90.94% <ø> (ø)`
oracle	`90.24% <ø> (ø)`
pdh_check	`95.65% <ø> (ø)`
pgbouncer	`91.33% <ø> (ø)`
php_fpm	`90.25% <ø> (+0.84%)`	⬆️
postfix	`88.04% <ø> (ø)`
powerdns_recursor	`96.65% <ø> (ø)`
process	`85.42% <ø> (+0.28%)`	⬆️
prometheus	`94.17% <ø> (ø)`
proxysql	`98.97% <ø> (ø)`
pulsar	`100.00% <ø> (ø)`
rabbitmq	`94.41% <ø> (ø)`
redisdb	`87.50% <ø> (ø)`
rethinkdb	`97.93% <ø> (ø)`
riak	`99.22% <ø> (ø)`
riakcs	`93.61% <ø> (ø)`
scylla	`100.00% <ø> (ø)`
silk	`93.33% <ø> (ø)`
singlestore	`90.81% <ø> (ø)`
snmp	`85.49% <ø> (+0.04%)`	⬆️
snowflake	`96.47% <ø> (ø)`
sonarqube	`98.21% <ø> (ø)`
spark	`93.57% <ø> (-0.29%)`	⬇️
squid	`100.00% <ø> (ø)`
ssh_check	`91.58% <ø> (ø)`
statsd	`87.36% <ø> (+1.05%)`	⬆️
supervisord	`92.30% <ø> (ø)`
system_core	`90.90% <ø> (ø)`
system_swap	`98.30% <ø> (ø)`
tcp_check	`91.58% <ø> (ø)`
teamcity	`88.35% <ø> (+2.87%)`	⬆️
teradata	`94.24% <ø> (ø)`
tls	`91.82% <ø> (+0.84%)`	⬆️
tokumx	`58.40% <ø> (?)`
traffic_server	`96.13% <ø> (ø)`
twemproxy	`79.45% <ø> (ø)`
twistlock	`79.62% <ø> (ø)`
varnish	`84.39% <ø> (+0.26%)`	⬆️
vault	`95.53% <ø> (+0.57%)`	⬆️
vertica	`98.50% <ø> (ø)`
voltdb	`96.84% <ø> (ø)`
vsphere	`89.91% <ø> (+0.08%)`	⬆️
win32_event_log	`86.40% <ø> (+0.27%)`	⬆️
windows_performance_counters	`98.36% <ø> (ø)`
windows_service	`98.00% <ø> (ø)`
wmi_check	`92.91% <ø> (ø)`
yarn	`89.14% <ø> (ø)`
zk	`86.63% <ø> (+1.55%)`	⬆️


Add a new config option, `integration_profiling`, which enables [profiling](https://docs.datadoghq.com/profiler/enabling/python/#usage) of Python integrations. This has proven valuable several times already while troubleshooting performance issues, so we're adding it as an option that is available by default, without requiring a custom build.

Follow-up to #13576, update the integrations tracing naming scheme to ensure all integrations appear under a single service, matching the name of the service used for profiling. This enables better linking between APM & profiling data. Since all integrations are now reporting under a single service, we now include the integration name in the resource names to be able to differentiate between the integrations.

Follow-up to #13576, update the integrations tracing naming scheme to ensure all integrations appear under a single service, matching the name of the service used for profiling. This enables better linking between APM & profiling data, specifically the "Code Hotspots" feature. Since all integrations are now reporting under a single service, the resource name is updated to refer only to the integration name in order to enable us to differentiate reporting from the different integrations. They are all visible in the resource list for that one common service.
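The naming scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the `SpanNaming` type, the `span_naming_for` helper, and the service name are hypothetical and are not taken from the follow-up change itself.

```python
from dataclasses import dataclass

# Hypothetical shared service name; it is assumed to match the one used for profiling.
INTEGRATIONS_SERVICE = "datadog-agent-integrations"


@dataclass(frozen=True)
class SpanNaming:
    """Service and resource names attached to a traced check run."""

    service: str
    resource: str


def span_naming_for(integration_name: str) -> SpanNaming:
    """Report every integration under one service, with the integration as the resource."""
    return SpanNaming(service=INTEGRATIONS_SERVICE, resource=integration_name)


if __name__ == "__main__":
    for name in ("postgres", "redisdb"):
        print(span_naming_for(name))
```

Because the service is constant, traces and profiles share the same service name, while the resource list of that service still separates one integration from another.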


* update integration tracing naming scheme

  Follow-up to #13576, update the integrations tracing naming scheme to ensure all integrations appear under a single service, matching the name of the service used for profiling. This enables better linking between APM & profiling data, specifically the "Code Hotspots" feature. Since all integrations are now reporting under a single service, the resource name is updated to refer only to the integration name in order to enable us to differentiate reporting from the different integrations. They are all visible in the resource list for that one common service.

* dbm use job name

* add `integration_profiling` config option (#14847)

  Add a new option to enable profiling of python integrations. It's used only within the python integrations. See https://github.com/DataDog/integrations-core/pull/13576.
Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [CWS Agent] Bugfixing SecAgent Params constructor (#14939) * [USM] use per-cpu array map instead of in-stack buffer for classification (#14756) * protocol classification: add per-cpu array map Signed-off-by: Guillaume Pagnoux <[email protected]> * Outsmart the verifier * change map type on unsupported systems Signed-off-by: Guillaume Pagnoux <[email protected]> * fix runtime-compilation on older kernels + doc Signed-off-by: Guillaume Pagnoux <[email protected]> * fix array map Signed-off-by: Guillaume Pagnoux <[email protected]> * docs & refactor Signed-off-by: Guillaume Pagnoux <[email protected]> * add missing editor flag to change map type Signed-off-by: Guillaume Pagnoux <[email protected]> * usm: Reverted #14925 Signed-off-by: Guillaume Pagnoux <[email protected]> Co-authored-by: Guy Arbitman <[email protected]> * [gitlab] Use DEB buildimage based on Ubuntu 14.04 instead of Debian 8 (#14929) * Adding config option to disable delta profiles when profiling the Agent * Fixed nil return instead of an error in DogStatsD file replay * Removed sending API key as params in forwarder * [CWS] remove now useless runtime files sync check (#14945) * flags package to organize security agent subcommand flags (#14906) * [CI] Improve visibility for `docker run` commands in the CI (#14899) Add line breaks for docker run commands * [CWS Agent] SecAgent command pkg to replace common pkg, moving status and version subcommands (#14907) * adding command package, to replace common * status and version subcommands * Bump github.com/itchyny/gojq from 0.12.10 to 0.12.11 (#14938) Bumps [github.com/itchyny/gojq](https://github.com/itchyny/gojq) from 0.12.10 to 0.12.11. 
- [Release notes](https://github.com/itchyny/gojq/releases) - [Changelog](https://github.com/itchyny/gojq/blob/main/CHANGELOG.md) - [Commits](https://github.com/itchyny/gojq/compare/v0.12.10...v0.12.11) --- updated-dependencies: - dependency-name: github.com/itchyny/gojq dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Replacing TODOs in exposed comments with more meaningful comments (#14901) * Revert "[agent] Support for running secrets backends with sha256 verification (#14529)" (#14940) This reverts commit deb7fce8f668a4bca6697e76d0b77cb67d7f46f7. * missing import in file with unsupported build flag (#14952) * Bump golang.org/x/text from 0.5.0 to 0.6.0 (#14948) Bumps [golang.org/x/text](https://github.com/golang/text) from 0.5.0 to 0.6.0. - [Release notes](https://github.com/golang/text/releases) - [Commits](https://github.com/golang/text/compare/v0.5.0...v0.6.0) --- updated-dependencies: - dependency-name: golang.org/x/text dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Download btfs for kitchen tests (#14587) * Save btfs to dd-agent-omnibus s3 bucket * Update folders to match new btfhub-archive names * Download BTFs during kitchen-prepare task * Add more details to error message * Fix permissions * Use btfs from dev box * Update gitignore * Check for bpftool compatability outside of generate_minimized_btfs * Change x86-64 -> x86_64 * Fix generating minimized btfs * Fix bpftool compatability check helper * Fix python linting * Fix python lint * Only run BTF preparation outside of CI * Explicitly indicate CI kitchen preparation Co-authored-by: Hasan Mahmood <[email protected]> Co-authored-by: Bryce Kahle <[email protected]> * [secrets] Fix getDDAgentUserSID to account for NT AUTHORITY\SYSTEM (#14941) * [secrets] Fix getDDAgentUserSID to account for NT AUTHORITY\SYSTEM * Address review feedback from @clarkb7 * usm: classification: removed redundant nolint (#14958) * Bump wheel versions (#14918) * Fixing system_probe.py on linux machine (#14959) * document trace API v04, including response (#14868) * [CWS] improve mount fallback (#14779) * [CWS] improve mount fallback * post review * [CWS] bump security agent policies to v0.42.1 (#14964) * [orchestration] Add Vertical Pod Autoscalers (#14669) * [orchestration] Add Vertical Pod Autoscalers We want to start collecting Vertical Pod Autoscalers from Kubernetes. 
Co-authored-by: Kangyi LI <[email protected]> Co-authored-by: Bryce Eadie <[email protected]> * [usm] Extract batching functionality into package (#14712) * [Process Agent] Split Collector into Runner and Submitter (#14883) * WIP * Collector and submitter split, need to fix tests * Rename receivers to `s` * Delete components directory * Add RT reporting to submitter * Add `dropCheckPayloads` back into the submitter * Move submitter tests to it's own file * Delete component.go * clean up comments and unused code * Fix a couple tests * Fix orchestrator tests * Fix tests * Fix copyright header * Fix linter issues * Use mocks in tests * Fix import * Fix data race in tests * Fix data race in tests * Update cmd/process-agent/collector.go Co-authored-by: Ivan Ilichev <[email protected]> * Refactor `Submit` to not return an error * Remove `init()` in favor of using mock config * Remove `init()` in favor of using mock config * Update mockery to use version 2.16 since they were updated in #14913 * Fix linter errors (again) * Fix `TestPodCheck/enabled` failing due to the clustername package caching a bad cluster name * Remove `forwarderRetryQueueMaxBytes` Co-authored-by: Ivan Ilichev <[email protected]> * Bump Collector dependencies to v1.0.0-RC2/v0.68.0 (#14864) * Bump Collector dependencies to v1.0.0-RC2/v0.68.0 * Revert InstrumentationLibraryMetadataAsTags changes * Update collector test configuration error message * Address PR comments * Increase speed of generate_minimized_btfs jobs (#14585) Co-authored-by: Bryce Kahle <[email protected]> * Add dynamic way of determining eBPF helper availability on runtime compilation (#14685) * Add KernelHeaderOptions type to prevent ebpf package dependency * Add function to get available helpers on host * Use dynamic method of finding available helpers * Use static list for kernels with __BPF_FUNC_MAPPER macro * Limit TestGetAvailableHelpers to kernels where it will work * Fix udp bind for random ports (#14956) * NDM: Add 
snmp.interface_status metric (#14797) * NDM: Add snmp.interface_status metric * update test * Add reno * Address review * Rename metric * Address review * Add InterfaceStatus enum * Remove iota and use explicit values * NDM: Add snmp.device.[un]reachable metrics (#14649) * NDM: Add snmp.device_up metric * Address review * update reno * Address review * fix import * Improve log message (#14968) Log the underlying error when GetUnitTypeProperties fails * Use rv "0" when polling endpoint list (#13906) Since this code path polls the endpoint list endpoint once every 60s by default to update the internal stat in the agent, we don't really need the consistency guarantees we implicitly get from the unset resource version. When the resource version is unset, the api-server needs to fetch all endpoints from etcd, causing a costly round-trip that can potentially result in a lot of data traffic. When setting resource version "0", all requests are handled by the watch cache, meaning they will be much more efficient and less costly. For the most part, the actual returned data will be the same, but in some cases where the API-servers are having a bad time, the data might be a bit stale; that is not very common. In that case, getting data from the watch cache instead of not being able to list at all is preferable. The semantics are described in detail here: https://kubernetes.io/docs/reference/using-api/api-concepts/#semantics-for-get-and-list Signed-off-by: Odin Ugedal <[email protected]> Signed-off-by: Odin Ugedal <[email protected]> * Remove `CCA_IN_AD` flag and related unused code (#14955) * remove CCA_IN_AD config flag * PR feedback * remove unused providers * pr feedback * epforwarder: add additional debug logging (#14161) * Fix small typo in install XML.
(#14687) Causes Wix to throw error (although apparently non-fatal) * CWS: sync BTFhub constants (#14986) Co-authored-by: paulcacheux <[email protected]> * Revert "pkg/obfuscate: improve formatting and string parsing in the SQL obfuscator (#11967)" (#14976) This reverts commit 8ab1d187421087d8ae746ec0dcca00f25918a9f0. * [CWS] remove unsafe pointer from eval.Context (#14890) * [CWS] remove unsafe pointer for eval.Context * Add user context * move perf helper to a perf file * remove resolvers from event * generate handlers * add extra field handlers * remove accessors from probe * remove model mock * fix unit and functional tests * refactor model/field_handlers * add helper for common object creation * fix stress tests * [workloadmeta/collectors/containerd] Collect image metadata (#14592) * [util/containerd] Rename Image to ImageOfContainer To be able to introduce a new Image func that gets an image just by image ID, regardless of whether it's being used in a container. * [util/containerd] Add Image func * [workloadmeta] Add GetImage func * [config] Add option to enable image collection in workloadmeta * [workloadmeta/collectors/containerd] Collect image metadata * [CSPM] remove the hostSelector field not used anymore (#14770) * [CSPM] remove the hostSelector field not used anymore In a more global effort to remove the internal compliance DSL after our move to rego, this commit removes one field where it is still being used. The hostSelector field was put in place to make sure we only run specific rules on hosts that match, in particular for k8s nodes. However, the rules were no longer used, since the hosts' "master" labels are not properly set. We rely on other side effects (like process and file existence) to avoid running some rules on bad nodes. * [CSPM] remove k8s nodeLabels retrieval from compliance rules execution Now that hostSelector fields have been removed, fetching the k8s node labels is no longer required.
This PR just removes the nodeLabels fetching and all the subsequent dependencies. * [CWS] add tests for live process monitoring (#14944) * [system-probe][NET-2899] fix race condition in ephemeral port checker (#14802) * [NET-2899] use mutex to lock fields causing race condition in ephemeral port checker * [NET-2899] gofmt on changed files * [NET-2899] remove mutex, move racey code to sync.once func * [CWS] restore SECL documentation generation (#14993) * [CWS] fix event missing field resolver (#14992) * fix missing fields resolver in some events (around policy eval CLI) * do not emit event in policy eval output * Add __TARGET_ARCH_ to runtime compilation flags (#14983) * Add __TARGET_ARCH_ to runtime compilation flags * Use append instead * Re-delete http runtime asset hash file (#14982) * Add CO-RE version of TCP Queue Length check (#14763) * Add CO-RE version of TCP Queue Length check * Fix version * Fix generate BTF job * Invert err check on CO-RE load * Add helper for missing BTF check * Bump golang.org/x/tools from 0.4.0 to 0.5.0 in /pkg/security/secl (#14996) * Bump golang.org/x/tools from 0.4.0 to 0.5.0 in /pkg/security/secl Bumps [golang.org/x/tools](https://github.com/golang/tools) from 0.4.0 to 0.5.0. - [Release notes](https://github.com/golang/tools/releases) - [Commits](https://github.com/golang/tools/compare/v0.4.0...v0.5.0) --- updated-dependencies: - dependency-name: golang.org/x/tools dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> * Auto-generate go.sum and LICENSE-3rdparty.csv changes Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com> * Fix gateway lookup tests (#14951) * [usm] Reduce HTTP test memory utilization (#15006) * [CWS] mount fallback to pid 1 by default (#15007) * [CWS][SEC-4020] parse args and envs from the new program stack pages (#13008) * CWS: parse args and envs from the process stack * remove useless function parameter * get env vars offset from new program stack as well * return from tailcall loop sooner * use different kprobe to fix kernel function call order on CentOS 7 * [process-agent] Refactor conn rates with util/subscriptions (#14988) * [process-agent] Refactor conn rates with util/subscriptions * Update with a unit test for pub/sub * Address feedback from @hmahmood * [CWS] change programs to avoid mixing events between tests (#15012) * [CWS] rework event json marshalling (#15010) * externalize serialization * a bit of cleanup * refactor schema validators * fix printfs * re-enable policy eval event json * fix trace dispatching * fix deadcode * fix validateProcessContextSECL error output * [USM] protocol classification: add PostgreSQL classification (#14625) * protocol classification: add per-cpu array map Signed-off-by: Guillaume Pagnoux <[email protected]> * Outsmart the verifier * protocol classification: add per-cpu array map Signed-off-by: Guillaume Pagnoux <[email protected]> * Outsmart the verifier * protocol classification: add PostgreSQL classification Signed-off-by: Guillaume Pagnoux <[email protected]> * fix licenses & set postgres port in test Signed-off-by: Guillaume Pagnoux <[email protected]> * test: fix port Signed-off-by: Guillaume Pagnoux <[email protected]> * test: use JoinHostPort instead of Sprintf Signed-off-by: Guillaume Pagnoux <[email 
protected]> * [USM] protocol classification: add Postgres detection Signed-off-by: Guillaume Pagnoux <[email protected]> * revert check_command fix Signed-off-by: Guillaume Pagnoux <[email protected]> * postgres: refactor check_command Signed-off-by: Guillaume Pagnoux <[email protected]> * change map type on unsupported systems Signed-off-by: Guillaume Pagnoux <[email protected]> * fix runtime-compilation on older kernels + doc Signed-off-by: Guillaume Pagnoux <[email protected]> * fix array map Signed-off-by: Guillaume Pagnoux <[email protected]> * fix merge Signed-off-by: Guillaume Pagnoux <[email protected]> * tests: add documentation Signed-off-by: Guillaume Pagnoux <[email protected]> * tests: add long query test Signed-off-by: Guillaume Pagnoux <[email protected]> * docs & refactor Signed-off-by: Guillaume Pagnoux <[email protected]> * fix licenses Signed-off-by: Guillaume Pagnoux <[email protected]> * refactor Signed-off-by: Guillaume Pagnoux <[email protected]> * postgres: try to classify from client start messages Signed-off-by: Guillaume Pagnoux <[email protected]> * add missing Cgo defs Signed-off-by: Guillaume Pagnoux <[email protected]> * add postgres docker image pulling Signed-off-by: Guillaume Pagnoux <[email protected]> * add missing editor flag to change map type Signed-off-by: Guillaume Pagnoux <[email protected]> * remove unused import Signed-off-by: Guillaume Pagnoux <[email protected]> * case-insensitive check + docs Signed-off-by: Guillaume Pagnoux <[email protected]> * check on tmp buf Signed-off-by: Guillaume Pagnoux <[email protected]> * docs Signed-off-by: Guillaume Pagnoux <[email protected]> * try fixing verifier issue Signed-off-by: Guillaume Pagnoux <[email protected]> * fix verifier issue Signed-off-by: Guillaume Pagnoux <[email protected]> * tests: fix docker-compose path Signed-off-by: Guillaume Pagnoux <[email protected]> * fixup! 
Merge remote-tracking branch 'origin/main' into guillaume.pagnoux/USMO-9-protocol-classification-posgres * re-delete re-added files Signed-off-by: Guillaume Pagnoux <[email protected]> * go mod tidy Signed-off-by: Guillaume Pagnoux <[email protected]> * add docs Signed-off-by: Guillaume Pagnoux <[email protected]> * remove redundant check Signed-off-by: Guillaume Pagnoux <[email protected]> * refactor server creation in tests Signed-off-by: Guillaume Pagnoux <[email protected]> * rename guards Signed-off-by: Guillaume Pagnoux <[email protected]> * specify postgres version in docker-compose Signed-off-by: Guillaume Pagnoux <[email protected]> * tests: skip when using NAT Signed-off-by: Guillaume Pagnoux <[email protected]> * split sql files Signed-off-by: Guillaume Pagnoux <[email protected]> * tests: add tests for all supported sql queries Signed-off-by: Guillaume Pagnoux <[email protected]> * move postgres struct to postgres-defs.h Signed-off-by: Guillaume Pagnoux <[email protected]> * remove redundant check Signed-off-by: Guillaume Pagnoux <[email protected]> * classify on command completion messages as well Signed-off-by: Guillaume Pagnoux <[email protected]> * add long response test Signed-off-by: Guillaume Pagnoux <[email protected]> * re-enable query detection Signed-off-by: Guillaume Pagnoux <[email protected]> Signed-off-by: Guillaume Pagnoux <[email protected]> Co-authored-by: Guy Arbitman <[email protected]> * [process-agent] Scaffold components for process agent (#14972) * [process-agent] Scaffold components for process agent * Addresss review comments from @ogaca-dd * Addresss review comments from @ogaca-dd * Change to use context.Context and reintroduce empty Component interface to suppress linting * CWS: sync BTFhub constants (#15023) Co-authored-by: paulcacheux <[email protected]> * [CWS] rework/cleanup `FieldHandlers` (#15015) * remove probe from FieldHandlers * cleanup `NewProcessResolver` resolvers dependency * resolvers only need a link to the 
manager * Update CODEOWNERS (#15024) * Use sc query to gain information about the service before attempting to stop it. (#15028) * [security-agent] remove redundant String() in compliance agent log (#15026) * [invoke] Print summary of test failures at the end of inv test (#14682) Updates the inv test command to print a summary of failed tests at the end of a run, across all modules and flavors that were tested, to more easily identify the list of failures, without having to visually parse the full job logs. * [system-probe][NET-2891] Fix tcp retransmit count (#14740) * [NET-2891] initial pass at changes to prebuilt code * [NET-2891] use retrans_out for runtime compiled tcp_retransmit counter * [NET-2891] runtime compiled version of tcp_retrans updates * [NET-2891] remove debug comment * [NET-2891] fix log * [NET-2891] update bytecode * [NET-2891] code review comments, regenerate license * [NET-2891] newline * [NET-2891] fix probe definitions * [NET-2891] update comment * [NET-2891] runtime compilation fixes * [NET-2891] fix byte padding for args init * [NET-2891] fix formatting * testing debug logic * more debug logic, added some config for the map * [NET-2891] enable kretprobe and remove debug * [NET-2891] disable bpf debug be default * [NET-2891] update bytecode * [NET-2891] make function as maybe unused * [NET-2891] handle different paths of incremental vs absolute retransmit counters * [NET-2891] use enum to track increment vs absolute retransmits * [NET-2891] change enum values * [NET-2891] move retrans code to runtime tracer * pulled in new gitignore * Revert "pulled in new gitignore" This reverts commit b4b0df587aeb6b6f655ea90d7bc96ae250934170. 
* remove runtime gen files, code review comments * [NET-2891] use retransmit count none in runtime tracer * [NET-2891] use retransmit_count_none in handle_tcp_stats * [NET-2891] nit comments from code review * [NET-2891] try to get runtime compilation working on 4.4 kernel * usm: upgraded pgdriver version to indirectly upgrade mellium.in/sasl version due to a CVE found (#15030) * usm: upgraded pgdriver version to indirectly upgrade mellium.in/sasl version due to a CVE found * Fixed go.sum * [CWS] fix signal test (#15025) * [process-agent] Support dynamically enabling profiling for process agent from CLI (#14995) Adds support for dynamically enabling profiling for the process agent from the CLI * pkg/obfuscate: Fix parsing of sqlserver identifiers enclosed in square brackets (#15019) * DBM-2010 Fix parsing of sqlserver literals enclosed in square brackets * .gitlab: move APM benchmark job to manual only (#15036) * fix datatype (#13791) related to #13770 * [AD/prometheus] Ignore headless services (#15031) * Fix stop service (#15035) * Fix check conf directory During the migration to components, the hardcoded directory 'etc/confd' for check configuration was removed. * Fix shipping of 'version-history.json' and 'registry.json' in flares When migrating to components, the logic to include /opt/datadog-agent/run/ treated it as a file instead of a folder. This broke collecting 'version-history.json' and 'registry.json' from it. * Fix datadog.yaml file name in flare * Force file permission to 644 within a flare * auto instru: add rc provider (#15008) * pkg/obfuscate: use github.com/outcaste-io/ristretto instead of github.com/dgraph-io/ristretto (#15005) Migrate the usage of github.com/dgraph-io/ristretto to github.com/outcaste-io/ristretto * [workloadmeta/kubelet] Parse image ID if name is a SHA256 We now try to parse the resolved image ID if the image in the pod's container status is a SHA256. This seems to happen when pinning the SHA256 in the container spec. 
This fixes an issue where `image:` filters in DD_CONTAINER_INCLUDE/DD_CONTAINER_EXCLUDE would not be respected. * pkg/trace/api: remove unused internal OTLP HTTP server (#14965) * [pkg/trace/api] Remove unused OTLP HTTP server * [pkg/trace] Remove protocol argument * Remove unnecessary fmt.Sprintf * Fix tests * [CWS] cleanup last uses of `jsonschema_description` (#15050) * [Serverless] Merge `serverless/main` to `main` (#14980) * [Serverless] change account (#14755) * Aj/buffer cold start span data (#14664) * wip dirty commit - trace being created but not flushed properly. No further traces appearing WIP: more debugging. StopChan properly set up feat: Starting coldstart creator as a daemon, and recieving data from two channels. Todo: spec feat: Update specs to write to channels feat: Merge conflicts resolved for tests feat: Use smaller methods to handle locking fix: pass coldstartSpanId to sls-init main feat: Remove default feat: Use Millisecond as Second is far longer than necessary feat: No need to export ColdStartSpanId fix: update units feat: Directionality for lambdaSpanChan as well as for initDurationChan fix: No need for the nil check, I need to stop javascripting my go feat: ints * feat: rebase missing changes from merge commits * feat: update ints after moving accounts * Empty commit to trigger ci * [Serverless] Fix flaky integration tests and make them more easily maintainable. (#14783) * Retry serverless integration test failures automatically. (#14801) * [Serverless] Allow some keys to be option in serverless integration tests. (#14827) * Ability to remove items from the json. * Remove items from snapshot. * Do not expect spans when there is no spans object. (#14396) * [Serverless] Improve stability of two tests. (#14895) * Increase timeout while decreasing test time. * Increase timeout in test. * [Serverless] Consolidate log normalization to single file for integration tests. (#15004) * Consolidate log normalization to single file. 
* Save raw logs to a temp dir. * Fix linting issues. Co-authored-by: Maxime David <[email protected]> Co-authored-by: AJ Stuyvenberg <[email protected]> * Fixes multiple problems with http processing/tagging on Windows. (#15022) * Fixes multiple problems with http processing/tagging on Windows. - There was an offset error in which the port was not properly computed on ipv6 connections - There was a problem with computing whether an ipv6 address was loopback or not - The fullpath indication (which is used to compute the key) was not properly being computed. This led to the same tuple being used as a different key, so transactions were not properly combined. * fix grammar error in release notes * Add the plumbing in the agent forwarder to submit container images and SBOM (#14962) * Improve documentation for BundleParams (#15011) * pkg/clusteragent/admission: add unit tests (#15044) * [CWS] bump syscall table + extract into separate task (#15061) * 5.19 -> 6.1 * switch syscall table generator from go generate to task * extract linux version * [gitlab] Temporarily disable SUSE Agent 5 upgrade tests (#15055) * [corechecks/snmp] Add LLDP remote device IP address (#14946) * [CWS] add discarders eBPF unit test (#14471) * [CWS] add discarder retention ut * add another test * add a unit test task * add trace param * make eBPF test part of the CI * fake time to speed up tests * bump baloum version * add more tests * [CWS Agent] Moving SecAgent subcommands to new dir part 2 (#14915) * moving flare command to subcommands dir * consolidating and moving secagent config package * moving runtime to subcommands dir * moved check subcommand, updated compliance subcommand which is the entry point to check funcs * moving compliance cmd to subcommand dir * exporting CliParams and RunCheck in Check subcommand for Compliance tests * fixing cluster agent entry point into the check subcommand * Add `container_image` core check (#14567) * Reorganize the specs for some kitchen test (#15027) * 
[check command] Add `--instance-filter` option (#15034) * Migrate systray to an fx.App (#14985) Deprecate single-dash args and add double-dash args Move code from cmd/systray to comp/systray Update UAC manifest to requireAdministrator Fix log file and add `system_tray.log_file` configuration option. * epforwarder: update dbm samples endpoint prefix (#15053) dbm-metrics-intake and dbquery-intake resolve to the same IPs. This change cleans up code so that we're only referencing one endpoint name. * [process-agent] Refactor Check interface (#15063) * [process-agent] Refactor Check interface - Refactors Check interface to consolidate CheckWithRealTime features - This will simplify integration with components in the future PRs since it eliminates casts * Address feedback from @just-chillin * usm: postgres classification: Reduced 5 seconds per test, 1m30s in total (#15070) Improved the regex we use to detect whether the server is up and running, which lets us skip the 'wait 5 seconds' in GetPGHandle * CWS: sync BTFhub constants (#15074) Co-authored-by: paulcacheux <[email protected]> * [DCA] Convert commands to Fx apps * Extract magic strings into command.* constants * [CWS] Add 4 tests, one for each kernel rate limiter algo (#15064) * [CWS] remove useless callbacks (#15046) * remove useless error check * remove useless callback * Add `SBOM` core check (#14989) * Prevent check from running after it was unscheduled. (#15065) * Prevent check from running after it was unscheduled. If a check runs after it was unscheduled, in particular after its sender and samplers were removed, it would create the sender and samplers again, leaking resources. This may happen if the check was cancelled after it was put in the worker channel, but before the worker called Run. This change adjusts check_wrapper to make Cancel fully mutually exclusive with Run, and adds a flag that prevents Run from executing the check after Cancel has completed. 
* go fmt * Update test helper * Restrict flare file from being accessible by other users on Unix (#14862) * pkg/clusteragent/admission/patch: poll rc on leadership switch (#15062) * pkg/clusteragent/admission: add additional libconfig env vars (#15059) * usm: classification: Split USM and NPM classifications (#15075) USM does not need all classifiers, only those which we have dispatchers for (HTTP, and soon HTTP2) * Python memory telemetry (#14757) * Track memory used by the python arena allocator pymalloc [1], Python built-in arena allocator is responsible for handling small-sized allocations, while the rest goes through the system malloc. This patch tracks the amount of memory requested by pymalloc from the operating system, allowing low cost, low granularity view into a segment of python memory usage. [1]: https://docs.python.org/3/c-api/memory.html#the-pymalloc-allocator * inv -e rtloader.format * Remove rtloader_mem.h from rtloader.h This allows to call C malloc without warnings when we implement a custom raw memory allocator for python. * Add python raw allocator tracking. Together with tracking pymalloc requests, this should give comprehensive picture of memory allocated by the python interpreter. * Make sure to call global malloc/free In Pyraw allocator implementation, make sure to call global malloc/calloc/realloc/free symbols, to avoid undesired interaction with the rtloader-specific memory tracking (for example, call libc free instead of RtLoader::free). * Move all memory tracking to the same file * Update Go naming to match C functions pymalloc is now one of two tracked allocators, use pymem as umbrella. * Add a note about new metrics to the docs * Python memory telemetry supports py3 only * Add releasenote * Expand telemetry documentation. 
* Update docs/dev/agent_memory.md Co-authored-by: Kari Halsted <[email protected]> * Update docs/dev/agent_memory.md Co-authored-by: Kari Halsted <[email protected]> * Update docs/dev/agent_memory.md Co-authored-by: Kari Halsted <[email protected]> * Update releasenotes/notes/pymem-telemetry-0f62acb520d80a1f.yaml Co-authored-by: Kari Halsted <[email protected]> * Update rtloader/three/three_mem.cpp Co-authored-by: Scott Opell <[email protected]> * Improve metric description and remove outdated comment. * Fix typo * Add a comment about allocation size adjustments Co-authored-by: Kari Halsted <[email protected]> Co-authored-by: Scott Opell <[email protected]> * Add telemetry for number of contexts per origin (#15016) * Add telemetry for number of contexts per origin Report number of contexts at the end of flush for each container sending dogstatsd metrics. This PR relies on origin detection to provide a set of identifying tags for each origin, and reports number of distinct contexts for each tag set. While this may not fully identify individual origins when running with low tagger cardinality, it accurately reflects the way agent would aggregate metrics from different origins together if their tags end up the same. * Only enable per-origin stats if telemetry is enabled. * [process-agent] Fix kitchen tests for process agent on main (#15072) * include `functests` in `DD_PIPELINE_ID` for system probe and security agent functests (#15043) * include `functests` in DD_PIPELINE_ID for system probe and security agent functests * simpler/shorter pipeline_id * [install_script] Backport removal of RPM signing key 4172A230 (#15082) * [corechecks/snmp] LLDP resolve local interface (#14991) * [CWS] fix rule in error reported twice (#15084) * Add java package in our circle-ci image (#14665) * Use DMI on EC2 Nitro instances to get host aliases The Agent now leverage DMI information on Unix to get the instance ID on AWS EC2 when the metadata endpoint fails or is not accessible. 
The instance ID is exposed throught DMI only on AWS Nitro instances. This will not change the hostname of the Agent upon upgrading but will add to the list of host aliases. * [CWS] add inode to pid context to detect exec loss (#14661) * [CWS] add revision to pid context * use inode instead of revision * Fix post rebase * Fix serializer tests flakiness (#15093) * [RCM-632] Add UUID in request (#15088) * Add org uuid field * Add org uuid in request * Remove generate file * Comment exported method * fix the receiver name consistency (#15068) * Add limits to allocated dictionaries, prevent browser cross-site requests (#15067) * pkg/trace/api: Move semantic conventions to separate internal package (#14963) * [pkg/trace/api] Move semantic conventions to separate internal package * Rename to shared * Move tagContainersTags back to API package * Rename package to 'header' * Fix Windows build * Factorize queue code duplicated at two places (#15098) * Factorize the aggregating queue used by the SBOM and container image checks * Mock time functions to make tests more reliable * [single-machine-performance] Push agent containers to SMP ECR (#14438) * [single-machine-performance] Push agent container to SMP ECR This commit is an attempt to introduce pushing containers from Agent CI for single-machine-performance's Regression Detector in our isolated infrastructure. Much like we have done for vectordotdev/vector we intend to run the Regression Detector on Agent changes, gi…

djova requested a review from a team as a code owner December 23, 2022 20:38

ghost added the base_package label Dec 23, 2022

djova added a commit to DataDog/datadog-agent that referenced this pull request Dec 23, 2022

add integration_profiling config option

bf28d3b

See DataDog/integrations-core#13576

djova added a commit to DataDog/datadog-agent that referenced this pull request Dec 23, 2022

add integration_profiling config option

090a16f

Add a new option to enable profiling of Python integrations. It is used only within the Python integrations. See DataDog/integrations-core#13576.
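
To illustrate the idea of gating profiling behind the `integration_profiling` option, here is a minimal sketch. It is not the code from this PR: the profiler the PR actually uses is not shown in this conversation, so the standard-library `cProfile` stands in for it. The names `run_check` and `is_affirmative`, and the plain dict used as the Agent config, are hypothetical and exist only for this example.

```python
import cProfile
import io
import pstats


def is_affirmative(value):
    """Interpret common truthy config values (hypothetical stand-in helper)."""
    if isinstance(value, str):
        return value.strip().lower() in ("1", "true", "yes", "on")
    return bool(value)


def run_check(check_fn, agent_config):
    """Run check_fn, profiling it only when integration_profiling is enabled.

    Returns (result, report); report is None when profiling is disabled.
    agent_config is a plain dict here, an assumption made for illustration.
    """
    if not is_affirmative(agent_config.get("integration_profiling", False)):
        # Option is off (the default): run the check with no profiling overhead.
        return check_fn(), None

    # Option is on: profile a single run with the stdlib profiler.
    profiler = cProfile.Profile()
    result = profiler.runcall(check_fn)

    # Render the five most expensive entries by cumulative time.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()
```

Calling `run_check(my_check, {"integration_profiling": True})` returns the check's result together with a text report, while an empty config returns the result and `None`. Keeping the option off by default means checks pay no profiling cost unless someone is actively troubleshooting.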

djova mentioned this pull request Dec 23, 2022

add integration_profiling config option DataDog/datadog-agent#14847

Merged

10 tasks

djova added the changelog/Added label Dec 23, 2022

djova force-pushed the djova/add-integration-profiling-option branch from 844c579 to c86b0f7 Compare December 23, 2022 20:47

djova mentioned this pull request Dec 23, 2022

Update integration tracing naming scheme #13579

Merged

5 tasks

ofek approved these changes Dec 27, 2022

ofek changed the title ~~add option to enable profiling of python integrations~~ Add option to enable profiling of Python integrations Dec 27, 2022

ofek merged commit ea51537 into master Dec 27, 2022

ofek deleted the djova/add-integration-profiling-option branch December 27, 2022 17:26
