Fixes multiple problems with http processing/tagging on Windows. #15022
- There was an offset error in which the port was not properly computed on IPv6 connections.
- There was a problem with computing whether an IPv6 address was loopback or not.
- The fullpath indication (which is used to compute the key) was not being computed properly. This led to the same tuple being used with a different key, so transactions were not properly combined.
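To illustrate the third point, here is a minimal sketch of how an inconsistently computed flag inside a map key splits what should be one aggregated entry. The `connTuple` and `txKey` types and the `countKeys` helper are hypothetical and only stand in for whatever the agent actually uses; the real key layout is not shown in this PR.

```go
package main

import "fmt"

// connTuple is a hypothetical stand-in for the connection tuple.
type connTuple struct {
	SrvAddr, CliAddr [16]byte
	SrvPort, CliPort uint16
	Family           uint16
}

// txKey is a hypothetical aggregation key: the tuple plus a full-path flag.
type txKey struct {
	Tup      connTuple
	FullPath bool
}

// countKeys aggregates the given keys into a map and returns how many
// distinct entries result.
func countKeys(keys []txKey) int {
	agg := make(map[txKey]int)
	for _, k := range keys {
		agg[k]++
	}
	return len(agg)
}

func main() {
	tup := connTuple{SrvPort: 80, CliPort: 51234, Family: 23}

	// Flag computed inconsistently for the same tuple: two map entries.
	inconsistent := []txKey{{Tup: tup, FullPath: true}, {Tup: tup, FullPath: false}}
	// Flag computed consistently: the transactions land in one entry.
	consistent := []txKey{{Tup: tup, FullPath: true}, {Tup: tup, FullPath: true}}

	fmt.Println("inconsistent flag:", countKeys(inconsistent), "entries")
	fmt.Println("consistent flag:", countKeys(consistent), "entries")
}
```

Because Go compares struct keys field by field, any field that differs between two otherwise identical keys is enough to prevent the transactions from being combined.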
Benchmarks: Found 0 performance improvements and 0 performance regressions! Performance is the same for 3 cases.
LGTM
@@ -486,7 +486,7 @@ func httpCallbackOnHTTPConnectionTraceTaskConnConn(eventInfo *etw.DDEtwEventInfo
 	connOpen.conn.tup.Family = binary.LittleEndian.Uint16(userData[12:14])
 	connOpen.conn.tup.SrvPort = binary.BigEndian.Uint16(userData[14:16])
 	copy(connOpen.conn.tup.SrvAddr[:], userData[20:36])
-	connOpen.conn.tup.CliPort = binary.BigEndian.Uint16(userData[36:48])
+	connOpen.conn.tup.CliPort = binary.BigEndian.Uint16(userData[46:48])
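For readers wondering why the old slice bounds did not panic: `binary.BigEndian.Uint16` only reads the first two bytes of the slice it receives, so `userData[36:48]` silently decoded bytes 36 and 37 instead of 46 and 47. The sketch below reproduces this on a synthetic buffer. The offsets are taken from the diff above; the helper names and the buffer contents are made up for illustration and do not come from the agent code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// clientPortBuggy mirrors the old slice bounds. Uint16 reads only the first
// two bytes of its argument, so this decodes bytes 36 and 37.
func clientPortBuggy(userData []byte) uint16 {
	return binary.BigEndian.Uint16(userData[36:48])
}

// clientPortFixed mirrors the corrected bounds and decodes bytes 46 and 47.
func clientPortFixed(userData []byte) uint16 {
	return binary.BigEndian.Uint16(userData[46:48])
}

// sampleUserData builds a synthetic 48-byte payload using the offsets from
// the diff. Everything outside those offsets is left zeroed (an assumption).
func sampleUserData() []byte {
	buf := make([]byte, 48)
	binary.LittleEndian.PutUint16(buf[12:14], 23) // 23 is AF_INET6 on Windows
	binary.BigEndian.PutUint16(buf[14:16], 8080)  // server port
	buf[35] = 1                                   // server address ::1 in bytes 20:36
	binary.BigEndian.PutUint16(buf[46:48], 51234) // client port
	return buf
}

func main() {
	buf := sampleUserData()
	fmt.Println("old bounds:", clientPortBuggy(buf))
	fmt.Println("new bounds:", clientPortFixed(buf))
}
```

With this buffer the old bounds decode the zeroed bytes at offset 36, while the new bounds return the port that was written at offset 46.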
Nice catch
Looks good, one suggestion!
fixes:
  - |
    On Windows, fixes bug in which HTTP connections were not properly accounted
    for when the client and server were same host (loopback).
-    for when the client and server were same host (loopback).
+    for when the client and server were the same host (loopback).
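As background for the loopback case this release note refers to, here is one way to test a 16-byte address for loopback in Go using the standard library. This is only a sketch: `isLoopback16` is a hypothetical helper, and the PR does not show how the agent performs the check.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopback16 reports whether a 16-byte address is a loopback address.
// net.IP.IsLoopback covers both ::1 and IPv4-mapped 127.0.0.0/8 addresses.
func isLoopback16(addr [16]byte) bool {
	return net.IP(addr[:]).IsLoopback()
}

func main() {
	var v6Loopback [16]byte
	v6Loopback[15] = 1 // ::1

	var mapped [16]byte
	copy(mapped[:], net.ParseIP("127.0.0.1").To16()) // ::ffff:127.0.0.1

	var other [16]byte
	copy(other[:], net.ParseIP("2001:db8::1")) // documentation prefix, not loopback

	fmt.Println(isLoopback16(v6Loopback), isLoopback16(mapped), isLoopback16(other))
}
```

Checking only the first byte for 127, as one might for IPv4, would miss `::1`, which is why the IPv6 case needs its own comparison.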
* Fixes multiple problems with http processing/tagging on Windows.
  - There was an offset error in which the port was not properly computed on ipv6 connections
  - There was a problem with computing whether an ipv6 address was loopback or not
  - The fullpath indication (which is used to compute the key) was not properly being computed. This led to the same tuple being used as a different key, so transactions were not properly combined.
* fix grammar error in release notes
* [fargate] Make hostname resolution more reliable (#14746) * [config/environment] Check AWS_EXECUTION_ENV in Fargate detection * [util/fargate] Rely on features for ECS Fargate detection * [fargate/detection] Rely on features to detect EKS * [trace-agent/config] Call fargate.GetOrchestrator after loading config * add unit-test for trace-agent config on fargate * Add release note * [cmd/trace-agent/config] Fix TestFargateConfig in macOS Co-authored-by: Cedric Lamoriniere <[email protected]> * 7.41.0 CHANGELOG (#14675) (#14745) * Updated Python to 3.8.16 * CWS: sync BTFhub constants (#14804) Co-authored-by: paulcacheux <[email protected]> * [CSPM] respect verbose on compliance check cli cmd (#14750) * CODEOWNERS: splitting files so USM can own its own files (#14789) * config: test: Removed duplicated test (#14705) * Running dockers in the kitchen test (#14589) * ci: kitchen: Allow running dockers in kitchen test, and extend the filesystem The PR introduce a way to run external dockers in the kitchen tests, without pulling them As we cannot authenticate in the kitchen machines to dockerhub, we had to work around that and we are pulling and saving the dockers in gitlab, uploading them to the remote machine using kitchen, and then loading those dockers on the remote machine so they are available for usage. In the PR we added steps to install docker and docker compose on the kitchen machines. The PR introduce an example test that runs dockers. During the PR we faced the problem of "no space left on the device", to solve those errors we have to extend the filesystem of the remote machines. 
* Fixed cr comments * Debugging the artifacts * Debugging the artifacts * Debugging the artifacts * Debugging the artifacts * revert artifacts * Giving another try to dependencies * Fixed path * Fixed CR comment * [CWS] Add tests for activity dump processes content (#14708) * [CWS] Add two checks to avoid adding nodes with abnormal paths in activity dumps (#14698) * [gitlab] Repack macOS JUnit tarball to include correct name and job URL (#14793) * Bump golang.org/x/tools from 0.3.0 to 0.4.0 in /pkg/security/secl (#14710) * Bump golang.org/x/tools from 0.3.0 to 0.4.0 in /pkg/security/secl Bumps [golang.org/x/tools](https://github.com/golang/tools) from 0.3.0 to 0.4.0. - [Release notes](https://github.com/golang/tools/releases) - [Commits](https://github.com/golang/tools/compare/v0.3.0...v0.4.0) --- updated-dependencies: - dependency-name: golang.org/x/tools dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * Auto-generate go.sum and LICENSE-3rdparty.csv changes Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: paulcacheux <[email protected]> * [single-machine-performance] Introduce regression detector jobs (#14528) * [WIP][single-machine-performance] Introduce regression detector jobs This PR intends to introduce the Single Machine Performance regression detector into Agent CI. This builds on work done in #14477 and is peer to #14438. The Regression Detector is a CI tool that determines if a changed introduced into a project modifies project performance in a way that is more than just random chance with some statistical guarantee. The Regression Detector is not a microbenchmarking tool and must operate on the whole Agent. This PR introduces only 'throughput' as an optmization goal -- how quickly can the Regression Detector produce load into the Agent -- but other goals are possible. 
Regressions are checked per-experiment, please see `tests/regression` for details about how to define an experiment. The Regression Detector runs today in vectordotdev/vector project and is influential in keeping that project's performance consistently high. REF SMP-208 Signed-off-by: Brian L. Troutwine <[email protected]> * Use static smp binary Signed-off-by: Brian L. Troutwine <[email protected]> * different base sha calculation Signed-off-by: Brian L. Troutwine <[email protected]> * Try to clone the whole repo Signed-off-by: Brian L. Troutwine <[email protected]> * baseline sha computation redux Signed-off-by: Brian L. Troutwine <[email protected]> * specify region explicitly Signed-off-by: Brian L. Troutwine <[email protected]> * use smp 0.6.3-rc3 Signed-off-by: Brian L. Troutwine <[email protected]> * Wait for job to complete, output report, status Signed-off-by: Brian L. Troutwine <[email protected]> * update job name Signed-off-by: Brian L. Troutwine <[email protected]> * Update smp, lading Signed-off-by: Brian L. Troutwine <[email protected]> * remove \ Signed-off-by: Brian L. Troutwine <[email protected]> * Use smp 0.6.4 Signed-off-by: Brian L. Troutwine <[email protected]> * diagnose why file_to_blackhole fails Signed-off-by: Brian L. Troutwine <[email protected]> * just one test for now Signed-off-by: Brian L. Troutwine <[email protected]> * set log level for smp Signed-off-by: Brian L. Troutwine <[email protected]> * tweaks Signed-off-by: Brian L. Troutwine <[email protected]> * debug Signed-off-by: Brian L. Troutwine <[email protected]> * actually add datadog.yaml et all, .gitignore issue? Signed-off-by: Brian L. Troutwine <[email protected]> * tidy up cases to initial trio, less file_to_blackhole which needs work Signed-off-by: Brian L. Troutwine <[email protected]> * update smp, config tweak Signed-off-by: Brian L. Troutwine <[email protected]> * override .gitignore Signed-off-by: Brian L. 
Troutwine <[email protected]> * Apply @GeorgeHahn's patches Signed-off-by: Brian L. Troutwine <[email protected]> * enable other tests, tweak OTEL Signed-off-by: Brian L. Troutwine <[email protected]> * more fiddling Signed-off-by: Brian L. Troutwine <[email protected]> * tweaks Signed-off-by: Brian L. Troutwine <[email protected]> * use markdown output report Signed-off-by: Brian L. Troutwine <[email protected]> * use OTEL http Signed-off-by: Brian L. Troutwine <[email protected]> * use smp 0.6.5-rc1 Signed-off-by: Brian L. Troutwine <[email protected]> * debug -> info Signed-off-by: Brian L. Troutwine <[email protected]> * preserve output Signed-off-by: Brian L. Troutwine <[email protected]> * remove stray tick Signed-off-by: Brian L. Troutwine <[email protected]> * Update test/regression/README.md Co-authored-by: Kylian Serrania <[email protected]> * Update test/regression/README.md Co-authored-by: Kylian Serrania <[email protected]> Signed-off-by: Brian L. Troutwine <[email protected]> Co-authored-by: Kylian Serrania <[email protected]> * Split bundle params (#14702) * Split BundleParams into ConfigParams and LogParams * Move ConfigParams and LogParams to their own file * Move WithXXX functions from BundleParams to config.Params * Use constructors for config.Params * Fix comp/core/log/params_test.go * Make fields for log.Params unexported * Make config.Params fields not exported. * Fix package names in the security agent. * Explain why `fx.Provide` is needed in bundle.go * Remove configLoadSecurityAgent from NewSecurityAgentParams * Add NewAgentParamsWithSecrets and NewAgentParamsWithoutSecrets * CWS: sync BTFhub constants (#14815) Co-authored-by: paulcacheux <[email protected]> * Check the package exists before creating package. Restore install script after packaging. 
(#14777) * change networks slack channel (#14819) * fix close_time value display in INFO log (#14744) * Updates prometheusScrape to support tag_by_endpoint and collect_counters_with_distributions (#14805) * Updates prometheusScrape to support tag_by_endpoint * Adds release note * Cleans release note * Also adds support for `collect_counters_with_distributions` * Updates release note to include the second added parameter * Updates release note based on suggestion by @clamoriniere * Migrating flare to a component (#14234) Migrating flare to a component This adds a 'flare' component and rework the flare package to be compatible with fx app and non-fx app. The flare generation now happens through a FlareBuilder which handles all the logic of adding data to a flare. This FlareBuilder can be used directly (by the flare package) or be received by each component when they register a flare provider. Migration workflow for each component would be to move their dedicated code from the flare package to a flare provider. Note: Until `cmd/systray/` is migrated to fx we can't start using the flare component from other flare (on windows the systray can create flare on it's own). * Add netlink process monitor (#14706) This monitor will read the netlink socket process events queue and run it on parallel worker (map to n cpu cores) ProcessMonitor require root or CAP_NET_ADMIN capabilities Aim to Subscribe() to process event Exec, Exit With or without metadata process Any, Name, MAPfile ProcessMonitor will subscribe to the netlink process events like Exec, Exit and call the subscribed callbacks Initialize() will scan the current process and will call the subscribed callbacks callbacks will be executed in parallel via a pool of goroutines (runtime.NumCPU()) callbackRunner is callbacks queue. 
The queue size is set by processMonitorMaxEvents Multiple team can use the same ProcessMonitor, the callers need to guarantee calling each Initialize() Stop() one single time this maintain an internal reference counter Netlink process subscription, socket connection is allowed only by one PID * protocols: refactor tests to allow pre-post setups (#14817) * protocols: refactor tests to allow pre-post setups * Added temporary nolint for skippers * Fixed bugs * Escape path in get-acl command (#14818) * ci: Add manual benchmark step for trace-agent (#14466) * pkg/trace/config: Lower max tracer payload to 25 MB to better align with backend limits (#14782) * Revert #14367 and use nano timestamp instead (#14825) * Revert "Replace timestamp by increasing id to avoid configVersion matching different config changed in the same second" This reverts commit f8e097de2aa3322670fcc6a6c8cfc5c1ed9d6239. * Revert #14367 and use nano timestamp instead * Disable by default remote-tagger in clc-runner mode (#14821) * fix gofmt -s for pkg/collector/collector_demux_test.go (#14808) * Improve debug logging in cloud foundry container tagger (#14803) * Add logging around container retries * Add trace log * Change to debug and add release note * Delete Improve-container-tagger-logging-e48b0fffbe8563d0.yaml * Add timestamp id to events * Make id more specific, use container String method * Just print class * Update pkg/cloudfoundry/containertagger/container_tagger.go Co-authored-by: NouemanKHAL <[email protected]> * Address PR review * Create event ID Co-authored-by: NouemanKHAL <[email protected]> * [Serverless] Merge serverless/main to main. (#14826) * [Serverless] change account (#14755) * Aj/buffer cold start span data (#14664) * wip dirty commit - trace being created but not flushed properly. No further traces appearing WIP: more debugging. StopChan properly set up feat: Starting coldstart creator as a daemon, and recieving data from two channels. 
Todo: spec feat: Update specs to write to channels feat: Merge conflicts resolved for tests feat: Use smaller methods to handle locking fix: pass coldstartSpanId to sls-init main feat: Remove default feat: Use Millisecond as Second is far longer than necessary feat: No need to export ColdStartSpanId fix: update units feat: Directionality for lambdaSpanChan as well as for initDurationChan fix: No need for the nil check, I need to stop javascripting my go feat: ints * feat: rebase missing changes from merge commits * feat: update ints after moving accounts * Empty commit to trigger ci * [Serverless] Fix flaky integration tests and make them more easily maintainable. (#14783) * Retry serverless integration test failures automatically. (#14801) * [Serverless] Allow some keys to be option in serverless integration tests. (#14827) * Ability to remove items from the json. * Remove items from snapshot. Co-authored-by: Maxime David <[email protected]> Co-authored-by: AJ Stuyvenberg <[email protected]> * Allow Regression Detector pipeline to fail (#14828) At present there's a race condition in the CI pipeline with regard to Regression Detector: we rely on an artifact to be created by main pipeline merge but have no way of making a hard dependency on that artifact. If that artifact is not present then the Regression Detection job will be submitted and then immediately fail. Absent a solution we allow the Regression Detector job to fail, unfortunately making any actual regressions caught but also not contributing to alert blindness in the meanwhile. Signed-off-by: Brian L. Troutwine <[email protected]> Signed-off-by: Brian L. 
Troutwine <[email protected]> * [process-agent] Remove unused properties from AgentConfig (#14842) * [process-agent] Remove unused properties from AgentConfig * Fix tests * 7.41.1 changelog (#14822) (#14824) * Add do-not-merge github action (#14843) * [CWS] remove useless resolver function (#14792) * [kitchen] Work around bundler and ruby version issue in verifier (#14851) Modifies the script used to run kitchen tests to run the verify phase twice, and adds a pre_verify lifecycle hook to install the dependency needed for system-probe kitchen tests. Works around an issue (version mismatch between ruby and bundler) that started happening after the release of version 2.4.0 of bundler. As long as this workaround is needed, we can't have Gemfiles in test suites, and instead need to manually install gems whenever needed. * Add the 'test' build tag to the 'unit-tests' flavor This tag is needed to run unit-test but was not printed by 'inv print-default-build-tags -b unit-tests'. When running tests from a IDE or other we need the correct list of tags to be returned. * flare: Added /opt/datadog-agent directory permissions to permissions.log (#14848) * flare: Added /opt/datadog-agent directory permissions to permissions.log system-probe internal files (sysprobe.socket, runtime compilation source files, prebuilt version, etc.) are located in /opt/datadog-agent when getting a flare, we cannot know those files permissions (and if they exist). 
* Take directories from configuration * Fixed cr comments * Fixed cr comments * Fixed cr comments * Update comp/core/flare/helpers/helpers.go Co-authored-by: maxime mouial <[email protected]> * [USM] protocol classification: add RabbitMQ classification (#14734) * wip * Fixed * added support for amqp without tests * added UT's for consumer and sender for rabbitmq * removed redundant client and server * added support to classify also protocol header of amqp * removed redundant function * test * fixed most of the cr notes * fixed all the cr notes * add ut * fixed licence issue * fixed ci issue * fixed event common protocol type number * Revert update of github.com/DataDog/datadog-operator * fixed all cr notes * merged main * fixed a cr note * reverted datadog-operation * update licence * fixed ci issue * merged main and updated ut * fixed cr note * added some UT's and support the latest classification uts update * refactor the uts * Added debug log * Added debug log 2 * Added debug log 3 * Added pattern scanner Co-authored-by: Guy Arbitman <[email protected]> * Handle environment variables without an equal sign (#14806) * usm: protocols: Refactored server creation (#14869) * Removed example docker tests (#14852) * [CWS][SEC-5573] add custom CWS product (#14748) * [CWS] add custom CWS product * Add a debouncer to limit reloads * Update URL regexp to detect for Datadog's URL In the past we use to edit the regexp everytime Datadog would open a new location. This commit allow the agent to detect for all present and future locations as long as they follow the format of 2 letters + 1 digit. Example: 'us3.datadoghq.com'. 
* system-probe: tasks: Save all dockers from docker-compose files in the protocols dir (#14873) * system-probe: tasks: Save all dockers from docker-compose files in the protocols dir * Fixed lint * [process-agent] Move data scrubber and disallow list from pkg/process/config (#14863) - Move these two fields in preparation for removal of pkg/process/config package. - Use inclusive naming where possible - will rename the config param in the future. - Update imports in pkg/security using the DataScrubber type. * add `integration_profiling` config option (#14847) Add a new option to enable profiling of python integrations. It's used only within the python integrations. See https://github.com/DataDog/integrations-core/pull/13576. * Fix flaky TestKSMCheckInitTags unit-test (#14832) * Fix flaky TestKSMCheckInitTags unit-test * improve config.GetConfiguredTags testability * update GetConfiguredTags function description * Deleting Security Agent for Windows resources (#14833) * deleting windows resources * removing windows operations for security-agent.build task * removing secagent for windows resources in omnibus, addressing python lint * [process-agent] Remove orchestrator config from AgentConfig (#14867) * [process-agent] Move data scrubber and disallow list from pkg/process/config - Move these two fields in preparation for removal of pkg/process/config package. - Use inclusive naming where possible - will rename the config param in the future. - Update imports in pkg/security using the DataScrubber type. * [process-agent] Remove orchestrator config from AgentConfig - Further decouple config management in prep for removal of pkg/process/config. - Remove orchestrator config, push it into the pod check and collector structs. 
* Address review feedback * [process-agent] Display system probe process module status in process agent info commands (#14880) Updates the process agent status information displayed by the datadog-agent status, process-agent status and process-agent --info commands to display whether or not the system probe's process module is enabled * tooling: Add invoke vscode devcontainer cmd (#14031) * Add invoke vscode envcontainer cmd * Update agent_dev_env.md * fix typo in documentation * adding err to exit SecAgent. fixes hanging if there's no API key (#14856) * Replace hardcoded /proc path with config field (#14773) Use the config field instead of hardcoding /proc. The config field should be automatically detected to either /proc or /host/proc inside containers. * usm: protocols: Added redis classification (#14886) * usm: protocols: Added redis classification * Fixed CR comment * Fixed CR comment * Fixed warning on centos * [CWS] extract custom events package (#14230) * [CWS] extract custom events package * [CWS] extract selftest custom event * [CWS] allow to specify a rate per rule through config * post rebase * add lint exception * use the good sender * [process-agent] Remove check intervals from pkg/process/config (#14878) * [process-agent] Remove check intervals from pkg/process/config - Remove check interval management from pkg/process/config package - Never store intervals, just use config settings - Generalize check for process and process RT check intervals * Fix MacOS tests * Address review feedback from @just-chillin * flare: Ignore system probe dirs if they are empty (#14893) * [CWS] increase exit event test timings (#14813) * [CWS] fix rule id not sent for custom event (#14897) * Adding return statment in GUI when an error is encountered * [CI] Artifactory for Python (#14473) * Introduce new E2E tests based on test-infra-definitions (#13643) * manual check tracing uses new exhaustive tracing config option (#14892) * manual check tracing uses new exhaustive 
tracing config option Following up to https://github.com/DataDog/integrations-core/pull/13618, we now need to set both `integration_tracing` and `integration_tracing_exhaustive` config options to enable exhaustive tracing of integrations. When manually running a check the increased overhead of exhaustive tracing (tracing all check methods) is acceptable. When continuous integration tracing is desired only the `integration_tracing` option should be set in order to keep the overhead minimal. * update core agent check command * fix sort order * pkg/trace/traceutil: Add fast-path for NormalizeTags to reduce cpu usage (#14881) * usm: remove the scenario of nil subprograms (#14909) * usm: remove the scenario of nil subprograms * Fixed CR comments * Import order * Fixed CR comments * Bump datadog-api-client from 2.6.0 to 2.7.0 in /test/e2e/cws-tests (#14914) Bumps [datadog-api-client](https://github.com/DataDog/datadog-api-client-python) from 2.6.0 to 2.7.0. - [Release notes](https://github.com/DataDog/datadog-api-client-python/releases) - [Changelog](https://github.com/DataDog/datadog-api-client-python/blob/master/CHANGELOG.md) - [Commits](https://github.com/DataDog/datadog-api-client-python/compare/2.6.0...2.7.0) --- updated-dependencies: - dependency-name: datadog-api-client dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * usm: http2: improved functions (#14917) * update profiling endpoint when the fips is enabled to avoid 404 (#14807) * fix(fips): update profiling endpoint when the fips is enabled to avoid 404 Signed-off-by: Nicolas Guerguadj <[email protected]> * pkg/clusteragent/admission: introduce deployment patcher (#14500) * [CWS] avoid using readonly map for eBPF test prog (#14780) * [e2e] add codeowners for new e2e tests (#14865) * DogstatsD component improvements (#14839) * Inject defaultLogFile * Move main.go inside command/command.go * Move start command to subcommands/start * Dogstatsd uses pkg/cli/subcommands/version/command.go for version command * Use similar code for cfgpath compare to datadog-agent * eval.Opts holds MacroStore and VariableStore (#14874) * [fake-datadog] add docker compose (#14902) * [fake-datadog] add docker compose * [fake-datadog] add docker instructions * usm: mongo: Added mongo classification (#14809) * usm: mongo: Added mongo classification * Fixed CR comment * Fixed CR comment * Fixed CR comment * Fixed CR comment * Update agent_dev_env.md (#14887) Co-authored-by: Kaylyn <[email protected]> * [CWS][SEC-6508] use tail call limit to increase the number of args/envs (#14796) * use tail call limit to increase the number of args/envs * do not validate process overflow events to avoid srubbing argv and timeout * [notifications] Catch all image pull errors as infra failures (#14926) Updates the regex to match infra failure logs when pulling images to include more patterns. 
The previous pattern didn't catch the following line: WARNING: Failed to pull image with policy "always": context deadline exceeded (manager.go:203:7197s) * Do not install the integrations downloader for python 2 (#14920) * usm: classification: Shrink classification buffer to 24 bytes (#14925) * config: usm: Added USM to system-probe.yaml.example file (#14908) * setupConfig consumes 1 param instead of many, adding to SecAgent constructor (#14884) * changing func signature of setupConfig * setting security agent config file instead of merging because Viper only supports 1 config file per viper instance * Revert "setting security agent config file instead of merging because Viper only supports 1 config file per viper instance" This reverts commit 8e6736d5025db79e5c1f552a983f9050f86a2c5c. * MergeConfigurationFiles is just for SecAgent * undo moving sys probe and secagent merge fix return of merge * rename configMissingOK field to baseConfigMissingOK * setting secagent config path and config load secrets params * adding secagent bundle param test * reverting renaming configMissingOK to baseConfigMissingOK * params.configMissingOK should be false * fixing test post bundle breaking into config and log components * config params test copywrite info * [e2e/ndm] add snmp test environment (#14768) * [e2e/ndm] add snmpsim data folder * [new e2e test] update test-infra-definition version * [e2e] fix aws signature * [e2e/ndm] add snmp test environment * [e2e/ndm] simpliofy err return code * [e2e/ndm] remove unused close function * [e2e/ndm] actually parse flags * [e2e] ndm: fix destroy * [e2e/ndm] add copyright header * [CWS] extract probe from event and activity dump manager (#14515) * [CWS] extract TC resolver into own resolver * no probe in event * include tcresolver in usual resolvers * fix test * apply review suggestion * apply review suggestion v2 * [corechecks/snmp] Add IP Addresses to NDM Metadata interfaces (IPv4) (#14823) * {Dockerfiles/agent,trace-agent/config}: 
disable apm `max_memory` and `max_cpu_percent` by default (#14850) * [pkg/otlp] Add a simple example on metric export (#14784) * Bump github.com/vektra/mockery/v2 from 2.15.0 to 2.16.0 in /internal/tools (#14913) * Bump github.com/vektra/mockery/v2 in /internal/tools Bumps [github.com/vektra/mockery/v2](https://github.com/vektra/mockery) from 2.15.0 to 2.16.0. - [Release notes](https://github.com/vektra/mockery/releases) - [Changelog](https://github.com/vektra/mockery/blob/master/.goreleaser.yml) - [Commits](https://github.com/vektra/mockery/compare/v2.15.0...v2.16.0) --- updated-dependencies: - dependency-name: github.com/vektra/mockery/v2 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * gen mocks Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Paul Cacheux <[email protected]> * usm: Reducing chances for mistakes in the protocol type values (#14816) * usm: classification: split the functions and helpers to protocol-dedicated-files (#14924) * usm: classification: split the functions and helpers to protocol-dedicated-files * usm: classification: rename protocol-classification-helpers to protocol-classification * [process-agent] Remove host info from AgentConfig (#14885) * [process-agent] Remove host info from AgentConfig * Fix info command per review feedback * [process-agent] Remove remaining properties from AgentConfig (#14889) * Ignore RemoteSamplingClient when marshaling agent config (#14927) * Ignore RemoteSamplingClient when marshaling agent config * Add release note * pkg/obfuscate: fix panic due to missing logger (#14859) Obfuscator.log was uninitialized which was causing agent panic * Update github.com/lxn/walk version (#14905) * gitignore runtime compiled hash files (#14764) * Try ignoring runtime compiled hash files * Build object files before linting * [process-agent] 
Remove pkg/process/config package (#14904) * [process-agent] Remove pkg/process/config package * Address review feedback from @kkhor-datadog - Revert back to using util.PathExists for simplicity - Clean up code with early exits * Review feedback from @sgnn7 * Bump github.com/avast/retry-go/v4 from 4.3.1 to 4.3.2 (#14935) Bumps [github.com/avast/retry-go/v4](https://github.com/avast/retry-go) from 4.3.1 to 4.3.2. - [Release notes](https://github.com/avast/retry-go/releases) - [Commits](https://github.com/avast/retry-go/compare/4.3.1...4.3.2) --- updated-dependencies: - dependency-name: github.com/avast/retry-go/v4 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump github.com/prometheus/procfs from 0.8.0 to 0.9.0 (#14934) Bumps [github.com/prometheus/procfs](https://github.com/prometheus/procfs) from 0.8.0 to 0.9.0. - [Release notes](https://github.com/prometheus/procfs/releases) - [Commits](https://github.com/prometheus/procfs/compare/v0.8.0...v0.9.0) --- updated-dependencies: - dependency-name: github.com/prometheus/procfs dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [CWS Agent] Bugfixing SecAgent Params constructor (#14939) * [USM] use per-cpu array map instead of in-stack buffer for classification (#14756) * protocol classification: add per-cpu array map Signed-off-by: Guillaume Pagnoux <[email protected]> * Outsmart the verifier * change map type on unsupported systems Signed-off-by: Guillaume Pagnoux <[email protected]> * fix runtime-compilation on older kernels + doc Signed-off-by: Guillaume Pagnoux <[email protected]> * fix array map Signed-off-by: Guillaume Pagnoux <[email protected]> * docs & refactor Signed-off-by: Guillaume Pagnoux <[email protected]> * add missing editor flag to change map type Signed-off-by: Guillaume Pagnoux <[email protected]> * usm: Reverted #14925 Signed-off-by: Guillaume Pagnoux <[email protected]> Co-authored-by: Guy Arbitman <[email protected]> * [gitlab] Use DEB buildimage based on Ubuntu 14.04 instead of Debian 8 (#14929) * Adding config option to disable delta profiles when profiling the Agent * Fixed nil return instead of an error in DogStatsD file replay * Removed sending API key as params in forwarder * [CWS] remove now useless runtime files sync check (#14945) * flags package to organize security agent subcommand flags (#14906) * [CI] Improve visibility for `docker run` commands in the CI (#14899) Add line breaks for docker run commands * [CWS Agent] SecAgent command pkg to replace common pkg, moving status and version subcommands (#14907) * adding command package, to replace common * status and version subcommands * Bump github.com/itchyny/gojq from 0.12.10 to 0.12.11 (#14938) Bumps [github.com/itchyny/gojq](https://github.com/itchyny/gojq) from 0.12.10 to 0.12.11. 
- [Release notes](https://github.com/itchyny/gojq/releases) - [Changelog](https://github.com/itchyny/gojq/blob/main/CHANGELOG.md) - [Commits](https://github.com/itchyny/gojq/compare/v0.12.10...v0.12.11) --- updated-dependencies: - dependency-name: github.com/itchyny/gojq dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Replacing TODOs in exposed comments with more meaningful comments (#14901) * Revert "[agent] Support for running secrets backends with sha256 verification (#14529)" (#14940) This reverts commit deb7fce8f668a4bca6697e76d0b77cb67d7f46f7. * missing import in file with unsupported build flag (#14952) * Bump golang.org/x/text from 0.5.0 to 0.6.0 (#14948) Bumps [golang.org/x/text](https://github.com/golang/text) from 0.5.0 to 0.6.0. - [Release notes](https://github.com/golang/text/releases) - [Commits](https://github.com/golang/text/compare/v0.5.0...v0.6.0) --- updated-dependencies: - dependency-name: golang.org/x/text dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Download btfs for kitchen tests (#14587) * Save btfs to dd-agent-omnibus s3 bucket * Update folders to match new btfhub-archive names * Download BTFs during kitchen-prepare task * Add more details to error message * Fix permissions * Use btfs from dev box * Update gitignore * Check for bpftool compatability outside of generate_minimized_btfs * Change x86-64 -> x86_64 * Fix generating minimized btfs * Fix bpftool compatability check helper * Fix python linting * Fix python lint * Only run BTF preparation outside of CI * Explicitly indicate CI kitchen preparation Co-authored-by: Hasan Mahmood <[email protected]> Co-authored-by: Bryce Kahle <[email protected]> * [secrets] Fix getDDAgentUserSID to account for NT AUTHORITY\SYSTEM (#14941) * [secrets] Fix getDDAgentUserSID to account for NT AUTHORITY\SYSTEM * Address review feedback from @clarkb7 * usm: classification: removed redundant nolint (#14958) * Bump wheel versions (#14918) * Fixing system_probe.py on linux machine (#14959) * document trace API v04, including response (#14868) * [CWS] improve mount fallback (#14779) * [CWS] improve mount fallback * post review * [CWS] bump security agent policies to v0.42.1 (#14964) * [orchestration] Add Vertical Pod Autoscalers (#14669) * [orchestration] Add Vertical Pod Autoscalers We want to start collecting Vertical Pod Autoscalers from Kubernetes. 
Co-authored-by: Kangyi LI <[email protected]> Co-authored-by: Bryce Eadie <[email protected]> * [usm] Extract batching functionality into package (#14712) * [Process Agent] Split Collector into Runner and Submitter (#14883) * WIP * Collector and submitter split, need to fix tests * Rename receivers to `s` * Delete components directory * Add RT reporting to submitter * Add `dropCheckPayloads` back into the submitter * Move submitter tests to it's own file * Delete component.go * clean up comments and unused code * Fix a couple tests * Fix orchestrator tests * Fix tests * Fix copyright header * Fix linter issues * Use mocks in tests * Fix import * Fix data race in tests * Fix data race in tests * Update cmd/process-agent/collector.go Co-authored-by: Ivan Ilichev <[email protected]> * Refactor `Submit` to not return an error * Remove `init()` in favor of using mock config * Remove `init()` in favor of using mock config * Update mockery to use version 2.16 since they were updated in #14913 * Fix linter errors (again) * Fix `TestPodCheck/enabled` failing due to the clustername package caching a bad cluster name * Remove `forwarderRetryQueueMaxBytes` Co-authored-by: Ivan Ilichev <[email protected]> * Bump Collector dependencies to v1.0.0-RC2/v0.68.0 (#14864) * Bump Collector dependencies to v1.0.0-RC2/v0.68.0 * Revert InstrumentationLibraryMetadataAsTags changes * Update collector test configuration error message * Address PR comments * Increase speed of generate_minimized_btfs jobs (#14585) Co-authored-by: Bryce Kahle <[email protected]> * Add dynamic way of determining eBPF helper availability on runtime compilation (#14685) * Add KernelHeaderOptions type to prevent ebpf package dependency * Add function to get available helpers on host * Use dynamic method of finding available helpers * Use static list for kernels with __BPF_FUNC_MAPPER macro * Limit TestGetAvailableHelpers to kernels where it will work * Fix udp bind for random ports (#14956) * NDM: Add 
snmp.interface_status metric (#14797) * NDM: Add snmp.interface_status metric * update test * Add reno * Address review * Rename metric * Address review * Add InterfaceStatus enum * Remove iota and use explicit values * NDM: Add snmp.device.[un]reachable metrics (#14649) * NDM: Add snmp.device_up metric * Address review * update reno * Address review * fix import * Improve log message (#14968) Log the underlying error when GetUnitTypeProperties fails * Use rv "0" when polling endpoint list (#13906) Since this code path polls the endpoint list endpoint once every 60s by default to update the internal stat in the agent, we don't really need the consistency guarantees we implicitly get from the unset resource version. When the resource version is unset, the api-server needs to fetch all endpoints from etcd, causing a costly round-trip that can potentially result in a lot of data traffic. When setting resource version "0", all requests are handled by the watch cache, meaning they will be much more efficient and less costly. For the most part, the actual returned data will be the same, but in some cases where the API-servers are having a bad time, the data might be a bit stale; but that is not very common. In that case, getting data from the watch cache instead of not being able to list at all is preferable. The semantics are described in detail here: https://kubernetes.io/docs/reference/using-api/api-concepts/#semantics-for-get-and-list Signed-off-by: Odin Ugedal <[email protected]> Signed-off-by: Odin Ugedal <[email protected]> * Remove `CCA_IN_AD` flag and related unused code (#14955) * remove CCA_IN_AD config flag * PR feedback * remove unused providers * pr feedback * epforwarder: add additional debug logging (#14161) * Fix small typo in install XML. 
(#14687) Causes Wix to throw error (although apparently non-fatal) * CWS: sync BTFhub constants (#14986) Co-authored-by: paulcacheux <[email protected]> * Revert "pkg/obfuscate: improve formatting and string parsing in the SQL obfuscator (#11967)" (#14976) This reverts commit 8ab1d187421087d8ae746ec0dcca00f25918a9f0. * [CWS] remove unsafe pointer from eval.Context (#14890) * [CWS] remove unsafe pointer for eval.Context * Add user context * move perf helper to a perf file * remove resolvers from event * generate handlers * add extra field handlers * remove accessors from probe * remove model mock * fix unit and functional tests * refactor model/field_handlers * add helper for common object creation * fix stress tests * [workloadmeta/collectors/containerd] Collect image metadata (#14592) * [util/containerd] Rename Image to ImageOfContainer To be able to introduce a new Image func that gets an image just by image ID, regardless of whether it's being used in container. * [util/containerd] Add Image func * [workloadmeta] Add GetImage func * [config] Add option to enable image collection in workloadmeta * [workloadmeta/collectors/containerd] Collect image metadata * [CSPM] remove the hostSelector field not used anymore (#14770) * [CSPM] remove the hostSelector field not used anymore In a more global effort to remove the internal compliance DSL after our move to rego, this commit removes one field where it is still being used. The hostSelector field has been put in place in order to make sure we only run specific rules on hosts that match, in particular for k8s nodes. However, the rule were not used anymore since the hosts "master" labels are not properly set. We rely other side effects (like process and file existence) to avoid running some rules on bad nodes. * [CSPM] remove k8s nodeLabels retrieval from compliance rules execution Now that hostSelector fields have been removed, fetching the k8s node labels is not required anymore and completely useless. 
This PR just remove the nodeLabels fetching and all the subsequent dependencies. * [CWS] add tests for live process monitoring (#14944) * [system-probe][NET-2899] fix race condition in ephemeral port checker (#14802) * [NET-2899] use mutex to lock fields causing race condition in ephemeral port checker * [NET-2899] gofmt on changed files * [NET-2899] remove mutex, move racey code to sync.once func * [CWS] restore SECL documentation generation (#14993) * [CWS] fix event missing field resolver (#14992) * fix missing fields resolver in some events (around policy eval CLI) * do not emit event in policy eval output * Add __TARGET_ARCH_ to runtime compilation flags (#14983) * Add __TARGET_ARCH_ to runtime compilation flags * Use append instead * Re-delete http runtime asset hash file (#14982) * Add CO-RE version of TCP Queue Length check (#14763) * Add CO-RE version of TCP Queue Length check * Fix version * Fix generate BTF job * Invert err check on CO-RE load * Add helper for missing BTF check * Bump golang.org/x/tools from 0.4.0 to 0.5.0 in /pkg/security/secl (#14996) * Bump golang.org/x/tools from 0.4.0 to 0.5.0 in /pkg/security/secl Bumps [golang.org/x/tools](https://github.com/golang/tools) from 0.4.0 to 0.5.0. - [Release notes](https://github.com/golang/tools/releases) - [Commits](https://github.com/golang/tools/compare/v0.4.0...v0.5.0) --- updated-dependencies: - dependency-name: golang.org/x/tools dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> * Auto-generate go.sum and LICENSE-3rdparty.csv changes Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com> * Fix gateway lookup tests (#14951) * [usm] Reduce HTTP test memory utilization (#15006) * [CWS] mount fallback to pid 1 by default (#15007) * [CWS][SEC-4020] parse args and envs from the new program stack pages (#13008) * CWS: parse args and envs from the process stack * remove useless function parameter * get env vars offset from new program stack as well * return from tailcall loop sooner * use different kprobe to fix kernel function call order on CentOS 7 * [process-agent] Refactor conn rates with util/subscriptions (#14988) * [process-agent] Refactor conn rates with util/subscriptions * Update with a unit test for pub/sub * Address feedback from @hmahmood * [CWS] change programs to avoid mixing events between tests (#15012) * [CWS] rework event json marshalling (#15010) * externalize serialization * a bit of cleanup * refactor schema validators * fix printfs * re-enable policy eval event json * fix trace dispatching * fix deadcode * fix validateProcessContextSECL error output * [USM] protocol classification: add PostgreSQL classification (#14625) * protocol classification: add per-cpu array map Signed-off-by: Guillaume Pagnoux <[email protected]> * Outsmart the verifier * protocol classification: add per-cpu array map Signed-off-by: Guillaume Pagnoux <[email protected]> * Outsmart the verifier * protocol classification: add PostgreSQL classification Signed-off-by: Guillaume Pagnoux <[email protected]> * fix licenses & set postgres port in test Signed-off-by: Guillaume Pagnoux <[email protected]> * test: fix port Signed-off-by: Guillaume Pagnoux <[email protected]> * test: use JoinHostPort instead of Sprintf Signed-off-by: Guillaume Pagnoux <[email 
protected]> * [USM] protocol classification: add Postgres detection Signed-off-by: Guillaume Pagnoux <[email protected]> * revert check_command fix Signed-off-by: Guillaume Pagnoux <[email protected]> * postgres: refactor check_command Signed-off-by: Guillaume Pagnoux <[email protected]> * change map type on unsupported systems Signed-off-by: Guillaume Pagnoux <[email protected]> * fix runtime-compilation on older kernels + doc Signed-off-by: Guillaume Pagnoux <[email protected]> * fix array map Signed-off-by: Guillaume Pagnoux <[email protected]> * fix merge Signed-off-by: Guillaume Pagnoux <[email protected]> * tests: add documentation Signed-off-by: Guillaume Pagnoux <[email protected]> * tests: add long query test Signed-off-by: Guillaume Pagnoux <[email protected]> * docs & refactor Signed-off-by: Guillaume Pagnoux <[email protected]> * fix licenses Signed-off-by: Guillaume Pagnoux <[email protected]> * refactor Signed-off-by: Guillaume Pagnoux <[email protected]> * postgres: try to classify from client start messages Signed-off-by: Guillaume Pagnoux <[email protected]> * add missing Cgo defs Signed-off-by: Guillaume Pagnoux <[email protected]> * add postgres docker image pulling Signed-off-by: Guillaume Pagnoux <[email protected]> * add missing editor flag to change map type Signed-off-by: Guillaume Pagnoux <[email protected]> * remove unused import Signed-off-by: Guillaume Pagnoux <[email protected]> * case-insensitive check + docs Signed-off-by: Guillaume Pagnoux <[email protected]> * check on tmp buf Signed-off-by: Guillaume Pagnoux <[email protected]> * docs Signed-off-by: Guillaume Pagnoux <[email protected]> * try fixing verifier issue Signed-off-by: Guillaume Pagnoux <[email protected]> * fix verifier issue Signed-off-by: Guillaume Pagnoux <[email protected]> * tests: fix docker-compose path Signed-off-by: Guillaume Pagnoux <[email protected]> * fixup! 
Merge remote-tracking branch 'origin/main' into guillaume.pagnoux/USMO-9-protocol-classification-posgres * re-delete re-added files Signed-off-by: Guillaume Pagnoux <[email protected]> * go mod tidy Signed-off-by: Guillaume Pagnoux <[email protected]> * add docs Signed-off-by: Guillaume Pagnoux <[email protected]> * remove redundant check Signed-off-by: Guillaume Pagnoux <[email protected]> * refactor server creation in tests Signed-off-by: Guillaume Pagnoux <[email protected]> * rename guards Signed-off-by: Guillaume Pagnoux <[email protected]> * specify postgres version in docker-compose Signed-off-by: Guillaume Pagnoux <[email protected]> * tests: skip when using NAT Signed-off-by: Guillaume Pagnoux <[email protected]> * split sql files Signed-off-by: Guillaume Pagnoux <[email protected]> * tests: add tests for all supported sql queries Signed-off-by: Guillaume Pagnoux <[email protected]> * move postgres struct to postgres-defs.h Signed-off-by: Guillaume Pagnoux <[email protected]> * remove redundant check Signed-off-by: Guillaume Pagnoux <[email protected]> * classify on command completion messages as well Signed-off-by: Guillaume Pagnoux <[email protected]> * add long response test Signed-off-by: Guillaume Pagnoux <[email protected]> * re-enable query detection Signed-off-by: Guillaume Pagnoux <[email protected]> Signed-off-by: Guillaume Pagnoux <[email protected]> Co-authored-by: Guy Arbitman <[email protected]> * [process-agent] Scaffold components for process agent (#14972) * [process-agent] Scaffold components for process agent * Addresss review comments from @ogaca-dd * Addresss review comments from @ogaca-dd * Change to use context.Context and reintroduce empty Component interface to suppress linting * CWS: sync BTFhub constants (#15023) Co-authored-by: paulcacheux <[email protected]> * [CWS] rework/cleanup `FieldHandlers` (#15015) * remove probe from FieldHandlers * cleanup `NewProcessResolver` resolvers dependency * resolvers only need a link to the 
manager * Update CODEOWNERS (#15024) * Use sc query to gain information about the service before attempting to stop it. (#15028) * [security-agent] remove redundant String() in compliance agent log (#15026) * [invoke] Print summary of test failures at the end of inv test (#14682) Updates the inv test command to print a summary of failed tests at the end of a run, across all modules and flavors that were tested, to more easily identify the list of failures, without having to visually parse the full job logs. * [system-probe][NET-2891] Fix tcp retransmit count (#14740) * [NET-2891] initial pass at changes to prebuilt code * [NET-2891] use retrans_out for runtime compiled tcp_retransmit counter * [NET-2891] runtime compiled version of tcp_retrans updates * [NET-2891] remove debug comment * [NET-2891] fix log * [NET-2891] update bytecode * [NET-2891] code review comments, regenerate license * [NET-2891] newline * [NET-2891] fix probe definitions * [NET-2891] update comment * [NET-2891] runtime compilation fixes * [NET-2891] fix byte padding for args init * [NET-2891] fix formatting * testing debug logic * more debug logic, added some config for the map * [NET-2891] enable kretprobe and remove debug * [NET-2891] disable bpf debug be default * [NET-2891] update bytecode * [NET-2891] make function as maybe unused * [NET-2891] handle different paths of incremental vs absolute retransmit counters * [NET-2891] use enum to track increment vs absolute retransmits * [NET-2891] change enum values * [NET-2891] move retrans code to runtime tracer * pulled in new gitignore * Revert "pulled in new gitignore" This reverts commit b4b0df587aeb6b6f655ea90d7bc96ae250934170. 
* remove runtime gen files, code review comments * [NET-2891] use retransmit count none in runtime tracer * [NET-2891] use retransmit_count_none in handle_tcp_stats * [NET-2891] nit comments from code review * [NET-2891] try to get runtime compilation working on 4.4 kernel * usm: upgraded pgdriver version to indirectly upgrade mellium.in/sasl version due to a CVE ofound (#15030) * usm: upgraded pgdriver version to indirectly upgrade mellium.in/sasl version due to a CVE ofound * Fixed go.sum * [CWS] fix signal test (#15025) * [process-agent] Support dynamically enabling profiling for process agent from CLI (#14995) Adds support for dynamically enabling profiling for the process agent from the CLI * pkg/obfuscate: Fix parsing of sqlserver identifiers enclosed in square brackets (#15019) * DBM-2010 Fix parsing of sqlserver literals enclosed in square brackets * .gitlab: move APM benchmark job to manual only (#15036) * fix datatype (#13791) related to #13770 * [AD/prometheus] Ignore headless services (#15031) * Fix stop service (#15035) * Fix check conf directory Durring the migration to component the hardcoded directory 'etc/confd' for check configuration was removed. * Fix shipping of 'version-history.json' and 'registry.json' in flares When migrating to component the logic to include /opt/datadog-agent/run/ was handled as a file instead of a folder. This broke collecting 'version-history.json' and 'registry.json from it. * Fix datadog.yaml file name in flare * Force file permission to 644 within a flare * auto instru: add rc provider (#15008) * pkg/obfuscate: use github.com/outcaste-io/ristretto instead of github.com/dgraph-io/ristretto (#15005) Migrate the usage of github.com/dgraph-io/ristretto to github.com/outcaste-io/ristretto * [workloadmeta/kubelet] Parse image ID if name is a SHA256 We now try to parse the resolved image ID if the image in the pod's container status is a SHA256. This seems to happen when pinning the SHA256 in the container spec. 
This fixes an issue where `image:` filters in DD_CONTAINER_INCLUDE/DD_CONTAINER_EXCLUDE would not be respected. * pkg/trace/api: remove unused internal OTLP HTTP server (#14965) * [pkg/trace/api] Remove unused OTLP HTTP server * [pkg/trace] Remove protocol argument * Remove unnecessary fmt.Sprintf * Fix tests * [CWS] cleanup last uses of `jsonschema_description` (#15050) * [Serverless] Merge `serverless/main` to `main` (#14980) * [Serverless] change account (#14755) * Aj/buffer cold start span data (#14664) * wip dirty commit - trace being created but not flushed properly. No further traces appearing WIP: more debugging. StopChan properly set up feat: Starting coldstart creator as a daemon, and recieving data from two channels. Todo: spec feat: Update specs to write to channels feat: Merge conflicts resolved for tests feat: Use smaller methods to handle locking fix: pass coldstartSpanId to sls-init main feat: Remove default feat: Use Millisecond as Second is far longer than necessary feat: No need to export ColdStartSpanId fix: update units feat: Directionality for lambdaSpanChan as well as for initDurationChan fix: No need for the nil check, I need to stop javascripting my go feat: ints * feat: rebase missing changes from merge commits * feat: update ints after moving accounts * Empty commit to trigger ci * [Serverless] Fix flaky integration tests and make them more easily maintainable. (#14783) * Retry serverless integration test failures automatically. (#14801) * [Serverless] Allow some keys to be option in serverless integration tests. (#14827) * Ability to remove items from the json. * Remove items from snapshot. * Do not expect spans when there is no spans object. (#14396) * [Serverless] Improve stability of two tests. (#14895) * Increase timeout while decreasing test time. * Increase timeout in test. * [Serverless] Consolidate log normalization to single file for integration tests. (#15004) * Consolidate log normalization to single file. 
* Save raw logs to a temp dir. * Fix linting issues. Co-authored-by: Maxime David <[email protected]> Co-authored-by: AJ Stuyvenberg <[email protected]> * Fixes multiple problems with http processing/tagging on Windows. (#15022) * Fixes multiple problems with http processing/tagging on Windows. - There was an offset error in which the port was not properly computed on ipv6 connections - There was a problem with computing whether an ipv6 address was loopback or not - The fullpath indication (which is used to compute the key) was not properly being computed. This led to the same tuple being used as a different key, so transactions were not properly combined. * fix grammar error in release notes * Add the plumbing in the agent forwarder to submit container images and SBOM (#14962) * Improve documentation for BundleParams (#15011) * pkg/clusteragent/admission: add unit tests (#15044) * [CWS] bump syscall table + extract into separate task (#15061) * 5.19 -> 6.1 * switch syscall table generator from go generate to task * extract linux version * [gitlab] Temporarily disable SUSE Agent 5 upgrade tests (#15055) * [corechecks/snmp] Add LLDP remote device IP address (#14946) * [CWS] add discarders eBPF unit test (#14471) * [CWS] add discarder retention ut * add another test * add a unit test task * add trace param * make eBPF test part of the CI * fake time to speed up tests * bump baloum version * add more tests * [CWS Agent] Moving SecAgent subcommands to new dir part 2 (#14915) * moving flare command to subcommands dir * consolidating and moving secagent config package * moving runtime to subcommands dir * moved check subcommand, updated compliance subcommand which is the entry point to check funcs * moving compliance cmd to subcommand dir * exporting CliParams and RunCheck in Check subcommand for Compliance tests * fixing cluster agent entry point into the check subcommand * Add `container_image` core check (#14567) * Reorganize the specs for some kitchen test (#15027) * 
[check command] Add `--instance-filter` option (#15034) * Migrate systray to an fx.App (#14985) Deprecate single-dash args and add double-dash args Move code from cmd/systray to comp/systray Update UAC manifest to requireAdministrator Fix log file and add `system_tray.log_file` configuration option. * epforwarder: update dbm samples endpoint prefix (#15053) dbm-metrics-intake and dbquery-intake resolve to the same IPs. This change cleans up code so that we're only referencing one endpoint name. * [process-agent] Refactor Check interface (#15063) * [process-agent] Refactor Check interface - Refactors Check interface to consolidate CheckWithRealTime features - This will simplify integration with components in the future PRs since it eliminates casts * Address feedback from @just-chillin * usm: postgres classification: Reduced 5 seconds per test, 1m30s in total (#15070) Improved the regex for which we are using to detect if the server is up and running, by that we can spare the 'wait 5 seconds' in GetPGHandle * CWS: sync BTFhub constants (#15074) Co-authored-by: paulcacheux <[email protected]> * [DCA] Convert commands to Fx apps * Extract magic strings into command.* constants * [CWS] Add 4 tests, one for each kernel rate limiter algo (#15064) * [CWS] remove useless callbacks (#15046) * remove useless error check * remove useless callback * Add `SBOM` core check (#14989) * Prevent check from running after it was unscheduled. (#15065) * Prevent check from running after it was unscheduled. If a check runs after it was unscheduled, in particular after it's sender and samplers were removed, would create sender and samplers again, leaking resources. This may happen if the check was cancelled after it was put in the worker channel, but before worker called Run. This change adjusts check_wrapper to make Cancel fully mutually exclusive with Run, and adds a flag that would prevent Run from executing the check after Cancel has completed. 
* go fmt * Update test helper * Restrict flare file from being accessible by other users on Unix (#14862) * pkg/clusteragent/admission/patch: poll rc on leadership switch (#15062) * pkg/clusteragent/admission: add additional libconfig env vars (#15059) * usm: classification: Split USM and NPM classifications (#15075) USM does not need all classifiers, only those which we have dispatchers for (HTTP, and soon HTTP2) * Python memory telemetry (#14757) * Track memory used by the python arena allocator pymalloc [1], Python built-in arena allocator is responsible for handling small-sized allocations, while the rest goes through the system malloc. This patch tracks the amount of memory requested by pymalloc from the operating system, allowing low cost, low granularity view into a segment of python memory usage. [1]: https://docs.python.org/3/c-api/memory.html#the-pymalloc-allocator * inv -e rtloader.format * Remove rtloader_mem.h from rtloader.h This allows to call C malloc without warnings when we implement a custom raw memory allocator for python. * Add python raw allocator tracking. Together with tracking pymalloc requests, this should give comprehensive picture of memory allocated by the python interpreter. * Make sure to call global malloc/free In Pyraw allocator implementation, make sure to call global malloc/calloc/realloc/free symbols, to avoid undesired interaction with the rtloader-specific memory tracking (for example, call libc free instead of RtLoader::free). * Move all memory tracking to the same file * Update Go naming to match C functions pymalloc is now one of two tracked allocators, use pymem as umbrella. * Add a note about new metrics to the docs * Python memory telemetry supports py3 only * Add releasenote * Expand telemetry documentation. 
* Update docs/dev/agent_memory.md Co-authored-by: Kari Halsted <[email protected]> * Update docs/dev/agent_memory.md Co-authored-by: Kari Halsted <[email protected]> * Update docs/dev/agent_memory.md Co-authored-by: Kari Halsted <[email protected]> * Update releasenotes/notes/pymem-telemetry-0f62acb520d80a1f.yaml Co-authored-by: Kari Halsted <[email protected]> * Update rtloader/three/three_mem.cpp Co-authored-by: Scott Opell <[email protected]> * Improve metric description and remove outdated comment. * Fix typo * Add a comment about allocation size adjustments Co-authored-by: Kari Halsted <[email protected]> Co-authored-by: Scott Opell <[email protected]> * Add telemetry for number of contexts per origin (#15016) * Add telemetry for number of contexts per origin Report number of contexts at the end of flush for each container sending dogstatsd metrics. This PR relies on origin detection to provide a set of identifying tags for each origin, and reports number of distinct contexts for each tag set. While this may not fully identify individual origins when running with low tagger cardinality, it accurately reflects the way agent would aggregate metrics from different origins together if their tags end up the same. * Only enable per-origin stats if telemetry is enabled. * [process-agent] Fix kitchen tests for process agent on main (#15072) * include `functests` in `DD_PIPELINE_ID` for system probe and security agent functests (#15043) * include `functests` in DD_PIPELINE_ID for system probe and security agent functests * simpler/shorter pipeline_id * [install_script] Backport removal of RPM signing key 4172A230 (#15082) * [corechecks/snmp] LLDP resolve local interface (#14991) * [CWS] fix rule in error reported twice (#15084) * Add java package in our circle-ci image (#14665) * Use DMI on EC2 Nitro instances to get host aliases The Agent now leverage DMI information on Unix to get the instance ID on AWS EC2 when the metadata endpoint fails or is not accessible. 
The instance ID is exposed through DMI only on AWS Nitro instances. This will not change the hostname of the Agent upon upgrading but will add the instance ID to the list of host aliases. * [CWS] add inode to pid context to detect exec loss (#14661) * [CWS] add revision to pid context * use inode instead of revision * Fix post rebase * Fix serializer tests flakiness (#15093) * [RCM-632] Add UUID in request (#15088) * Add org uuid field * Add org uuid in request * Remove generate file * Comment exported method * fix the receiver name consistency (#15068) * Add limits to allocated dictionaries, prevent browser cross-site requests (#15067) * pkg/trace/api: Move semantic conventions to separate internal package (#14963) * [pkg/trace/api] Move semantic conventions to separate internal package * Rename to shared * Move tagContainersTags back to API package * Rename package to 'header' * Fix Windows build * Factorize queue code duplicated at two places (#15098) * Factorize the aggregating queue used by the SBOM and container image checks * Mock time functions to make tests more reliable * [single-machine-performance] Push agent containers to SMP ECR (#14438) * [single-machine-performance] Push agent container to SMP ECR This commit is an attempt to introduce pushing containers from Agent CI for single-machine-performance's Regression Detector in our isolated infrastructure. Much like we have done for vectordotdev/vector we intend to run the Regression Detector on Agent changes, gi…
* [process-agent] Refactor Check interface (#15063) * [process-agent] Refactor Check interface - Refactors Check interface to consolidate CheckWithRealTime features - This will simplify integration with components in the future PRs since it eliminates casts * Address feedback from @just-chillin * usm: postgres classification: Reduced 5 seconds per test, 1m30s in total (#15070) Improved the regex for which we are using to detect if the server is up and running, by that we can spare the 'wait 5 seconds' in GetPGHandle * CWS: sync BTFhub constants (#15074) Co-authored-by: paulcacheux <[email protected]> * [DCA] Convert commands to Fx apps * Extract magic strings into command.* constants * [CWS] Add 4 tests, one for each kernel rate limiter algo (#15064) * [CWS] remove useless callbacks (#15046) * remove useless error check * remove useless callback * Add `SBOM` core check (#14989) * Prevent check from running after it was unscheduled. (#15065) * Prevent check from running after it was unscheduled. If a check runs after it was unscheduled, in particular after it's sender and samplers were removed, would create sender and samplers again, leaking resources. This may happen if the check was cancelled after it was put in the worker channel, but before worker called Run. This change adjusts check_wrapper to make Cancel fully mutually exclusive with Run, and adds a flag that would prevent Run from executing the check after Cancel has completed. 
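The Cancel/Run exclusion described in that commit message can be sketched as follows. This is a simplified illustration under stated assumptions, not the Agent's `check_wrapper` code: the type `checkWrapper`, its fields, and the boolean return value of `Run` are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// checkWrapper is a hypothetical stand-in for a wrapper around a check.
// A single mutex makes Run and Cancel mutually exclusive, and the
// cancelled flag stops Run from executing once Cancel has completed.
type checkWrapper struct {
	mu        sync.Mutex
	cancelled bool
	run       func() // the wrapped check body
	cancel    func() // releases the check's resources
}

// Run executes the check unless it was already cancelled.
// It reports whether the check body actually ran.
func (w *checkWrapper) Run() bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.cancelled {
		return false // unscheduled: do not recreate any resources
	}
	w.run()
	return true
}

// Cancel waits for any in-flight Run, releases resources, and marks the
// wrapper so that a Run still queued in a worker channel becomes a no-op.
func (w *checkWrapper) Cancel() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.cancel()
	w.cancelled = true
}

func main() {
	runs := 0
	w := &checkWrapper{
		run:    func() { runs++ },
		cancel: func() {},
	}
	fmt.Println(w.Run()) // the check runs
	w.Cancel()
	fmt.Println(w.Run()) // skipped after Cancel
	fmt.Println("runs:", runs)
}
```

Holding the lock for the whole of `Run` is the simplest way to get the exclusion; the trade-off is that `Cancel` blocks until a running check finishes.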
* go fmt * Update test helper * Restrict flare file from being accessible by other users on Unix (#14862) * pkg/clusteragent/admission/patch: poll rc on leadership switch (#15062) * pkg/clusteragent/admission: add additional libconfig env vars (#15059) * usm: classification: Split USM and NPM classifications (#15075) USM does not need all classifiers, only those which we have dispatchers for (HTTP, and soon HTTP2) * Python memory telemetry (#14757) * Track memory used by the python arena allocator pymalloc [1], Python built-in arena allocator is responsible for handling small-sized allocations, while the rest goes through the system malloc. This patch tracks the amount of memory requested by pymalloc from the operating system, allowing low cost, low granularity view into a segment of python memory usage. [1]: https://docs.python.org/3/c-api/memory.html#the-pymalloc-allocator * inv -e rtloader.format * Remove rtloader_mem.h from rtloader.h This allows to call C malloc without warnings when we implement a custom raw memory allocator for python. * Add python raw allocator tracking. Together with tracking pymalloc requests, this should give comprehensive picture of memory allocated by the python interpreter. * Make sure to call global malloc/free In Pyraw allocator implementation, make sure to call global malloc/calloc/realloc/free symbols, to avoid undesired interaction with the rtloader-specific memory tracking (for example, call libc free instead of RtLoader::free). * Move all memory tracking to the same file * Update Go naming to match C functions pymalloc is now one of two tracked allocators, use pymem as umbrella. * Add a note about new metrics to the docs * Python memory telemetry supports py3 only * Add releasenote * Expand telemetry documentation. 
* Update docs/dev/agent_memory.md Co-authored-by: Kari Halsted <[email protected]> * Update docs/dev/agent_memory.md Co-authored-by: Kari Halsted <[email protected]> * Update docs/dev/agent_memory.md Co-authored-by: Kari Halsted <[email protected]> * Update releasenotes/notes/pymem-telemetry-0f62acb520d80a1f.yaml Co-authored-by: Kari Halsted <[email protected]> * Update rtloader/three/three_mem.cpp Co-authored-by: Scott Opell <[email protected]> * Improve metric description and remove outdated comment. * Fix typo * Add a comment about allocation size adjustments Co-authored-by: Kari Halsted <[email protected]> Co-authored-by: Scott Opell <[email protected]> * Add telemetry for number of contexts per origin (#15016) * Add telemetry for number of contexts per origin Report number of contexts at the end of flush for each container sending dogstatsd metrics. This PR relies on origin detection to provide a set of identifying tags for each origin, and reports number of distinct contexts for each tag set. While this may not fully identify individual origins when running with low tagger cardinality, it accurately reflects the way agent would aggregate metrics from different origins together if their tags end up the same. * Only enable per-origin stats if telemetry is enabled. * [process-agent] Fix kitchen tests for process agent on main (#15072) * include `functests` in `DD_PIPELINE_ID` for system probe and security agent functests (#15043) * include `functests` in DD_PIPELINE_ID for system probe and security agent functests * simpler/shorter pipeline_id * [install_script] Backport removal of RPM signing key 4172A230 (#15082) * [corechecks/snmp] LLDP resolve local interface (#14991) * [CWS] fix rule in error reported twice (#15084) * Add java package in our circle-ci image (#14665) * Use DMI on EC2 Nitro instances to get host aliases The Agent now leverage DMI information on Unix to get the instance ID on AWS EC2 when the metadata endpoint fails or is not accessible. 
The instance ID is exposed through DMI only on AWS Nitro instances. This will not change the hostname of the Agent upon upgrading but will add it to the list of host aliases. * [CWS] add inode to pid context to detect exec loss (#14661) * [CWS] add revision to pid context * use inode instead of revision * Fix post rebase * Fix serializer tests flakiness (#15093) * [RCM-632] Add UUID in request (#15088) * Add org uuid field * Add org uuid in request * Remove generate file * Comment exported method * fix the receiver name consistency (#15068) * Add limits to allocated dictionaries, prevent browser cross-site requests (#15067) * pkg/trace/api: Move semantic conventions to separate internal package (#14963) * [pkg/trace/api] Move semantic conventions to separate internal package * Rename to shared * Move tagContainersTags back to API package * Rename package to 'header' * Fix Windows build * Factorize queue code duplicated at two places (#15098) * Factorize the aggregating queue used by the SBOM and container image checks * Mock time functions to make tests more reliable * [single-machine-performance] Push agent containers to SMP ECR (#14438) * [single-machine-performance] Push agent container to SMP ECR This commit is an attempt to introduce pushing containers from Agent CI for single-machine-performance's Regression Detector in our isolated infrastructure. Much like we have done for vectordotdev/vector we intend to run the Regression Detector on Agent changes, giving a reasonable statistical guarantee that a change does or does not modify Agent performance by more than random chance. In order for the Regression Detector to run jobs it must have access to a 'baseline' and 'comparison' target. Baseline in this project would be a container built from current `main` branch, comparison would be a container built from the tip of a PR. 
The main thing demonstrated here is that the team credentials SMP has created for Agent are functional and are able to push up containers, in a way that is acceptable to Agent Platform. I have amended `.docker_build_job_definition` to mirror every created container to single-machine-performance's ECR, noting that the tag now avoids the use of `CI_PIPELINE_ID`. In a later commit we will introduce job submission and will rely on being able to compute the tag of a previous pipeline's container from available Gitlab metadata, specifically `CI_COMMIT_SHA` for the comparison container and whatever metadata maps to the base branch's current SHA, `CI_MERGE_REQUEST_SOURCE_BRANCH_SHA`? There are two outstanding questions regarding this work that I am aware of: * Is there a race condition present between the triggering of this pipeline vs main if users squash commits? * Should we grant the existing CI user permissions into single-machine-performance rather than use an issued bot account as done presently and for vectordotdev/vector? We've successfully demonstrated pushing up containers in a previous iteration of this work, see https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/195939127. Signed-off-by: Brian L. Troutwine <[email protected]> * PR feedback Signed-off-by: Brian L. Troutwine <[email protected]> * trim ECR URL out of destination Signed-off-by: Brian L. Troutwine <[email protected]> * correct job dependency Signed-off-by: Brian L. Troutwine <[email protected]> * drop parallel.matrix Signed-off-by: Brian L. Troutwine <[email protected]> Signed-off-by: Brian L. Troutwine <[email protected]> * Bump github.com/Microsoft/hcsshim from 0.9.4 to 0.9.6 (#14785) Bumps [github.com/Microsoft/hcsshim](https://github.com/Microsoft/hcsshim) from 0.9.4 to 0.9.6. 
- [Release notes](https://github.com/Microsoft/hcsshim/releases) - [Commits](https://github.com/Microsoft/hcsshim/compare/v0.9.4...v0.9.6) --- updated-dependencies: - dependency-name: github.com/Microsoft/hcsshim dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [Windows] implement mapping of pid to service name (#15039) * [Windows] implement mapping of pid to service name Checks the pid against the table of SCM controlled processes. If it's SCM controlled, returns the service information. Because we must enumerate the entire SCM (there doesn't seem to be an api for that), the SCMManager object maintains a cache of objects, and refreshes only when it sees a PID it hasn't seen before. On a machine with high process churn, this could still result in a lot of accesses. However, if process agent queries only when doing the process check (i.e. every 30s), then it should only iterate the list once per 30s. 
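The caching strategy described for the PID-to-service mapping (enumerate everything, refresh only on a PID not seen before) can be sketched in portable Go. The names below (`serviceInfo`, `scmCache`, `lookup`, and the injected `enumerate` function) are hypothetical; a real implementation would enumerate the Windows SCM instead of calling a stub.

```go
package main

import "fmt"

// serviceInfo is a hypothetical, trimmed-down view of a service entry.
type serviceInfo struct {
	Name string
}

// scmCache caches a full enumeration and refreshes it only when asked
// about a PID it has never seen. enumerate is an injected stand-in for
// the expensive "list every SCM-controlled process" call.
type scmCache struct {
	enumerate func() map[uint32]serviceInfo
	services  map[uint32]serviceInfo
	seen      map[uint32]struct{} // every PID looked up or enumerated so far
	refreshes int                 // number of full enumerations performed
}

func newSCMCache(enumerate func() map[uint32]serviceInfo) *scmCache {
	return &scmCache{
		enumerate: enumerate,
		services:  map[uint32]serviceInfo{},
		seen:      map[uint32]struct{}{},
	}
}

// lookup returns the service for pid, if the process is service-controlled.
func (c *scmCache) lookup(pid uint32) (serviceInfo, bool) {
	if _, known := c.seen[pid]; !known {
		// Unknown PID: pay for one full enumeration, then remember it.
		c.services = c.enumerate()
		c.refreshes++
		c.seen[pid] = struct{}{}
		for p := range c.services {
			c.seen[p] = struct{}{}
		}
	}
	info, ok := c.services[pid]
	return info, ok
}

func main() {
	cache := newSCMCache(func() map[uint32]serviceInfo {
		return map[uint32]serviceInfo{100: {Name: "Spooler"}}
	})
	fmt.Println(cache.lookup(100)) // triggers the first enumeration
	fmt.Println(cache.lookup(200)) // unknown PID, triggers a refresh
	fmt.Println(cache.lookup(200)) // already seen, served from cache
	fmt.Println("refreshes:", cache.refreshes)
}
```

Remembering non-service PIDs in `seen` is what keeps repeated lookups for ordinary processes from forcing a refresh each time; as the commit message notes, high process churn still produces many enumerations.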
* ci fixes, add tests * fix test/improper conversion of data buffer * review feedback * more review feedback * Rename structure * Update README.md (#15126) * Update README.md * Update README.md * Update README.md * [CWS] remove useless variable (#15112) * [CWS][SEC-4478] add RC to e2e tests (#14877) * [CWS] add RC to e2e tests * fix host name * check remote-config before file * test embedded policy * use a configmap to have a first policy * make rc configurable * CWS: sync BTFhub constants (#15123) Co-authored-by: paulcacheux <[email protected]> * [CWS] rename json fields to make them less misleading (#15097) * pkg/trace/testutil: improve the randomization in test spans generator (#15108) * DOCS-2215 Add @env variables to datadog.yaml (#10069) Co-authored-by: hestonhoffman <[email protected]> * [CSPM] e2e remote configuration fix (#15130) * Add missing remote_configuration_enabled parameters in CSPM workflow * add other missing parameters * [CWS] chain only different binaries on activity dumps (#15095) * [CWS][SEC-6381] Update template configuration file to add activity dumps and network detection parameters (#14835) * [CWS] remove unsafe usage in `ScopedVariables` (#15134) * Revert Remove CCA_IN_AD flag and related unused code (#15115) * Revert "Remove `CCA_IN_AD` flag and related unused code (#14955)" This reverts commit 394ac59ad9707dfff8009c0dec03320d1df20098. # Conflicts: # cmd/agent/common/autodiscovery.go * Bump github.com/hashicorp/consul/api from 1.13.0 to 1.15.3 (#13978) Bumps [github.com/hashicorp/consul/api](https://github.com/hashicorp/consul) from 1.13.0 to 1.15.3. - [Release notes](https://github.com/hashicorp/consul/releases) - [Changelog](https://github.com/hashicorp/consul/blob/main/CHANGELOG.md) - [Commits](https://github.com/hashicorp/consul/compare/v1.13.0...api/v1.15.3) --- updated-dependencies: - dependency-name: github.com/hashicorp/consul/api dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Lénaïc Huard <[email protected]> * usm: classification: Fixed AMQP publish flakiness (#15071) * usm: classification: Fixed AMQP publish flakiness Publish has no response, so added classification by close-reply and close-ok messages. * Moved closing of the client into the teardown * Fixed test error * Removed wrong bpf debug setup * Set windows-agent as CODEOWNER for systray (#15117) * Fix shutdown deadlock in docker socket tailer (#15138) * fix shutdown deadlock in docker socket tailer * Bump github.com/CycloneDX/cyclonedx-go from 0.6.0 to 0.7.0 (#15081) * Bump github.com/CycloneDX/cyclonedx-go from 0.6.0 to 0.7.0 Bumps [github.com/CycloneDX/cyclonedx-go](https://github.com/CycloneDX/cyclonedx-go) from 0.6.0 to 0.7.0. - [Release notes](https://github.com/CycloneDX/cyclonedx-go/releases) - [Changelog](https://github.com/CycloneDX/cyclonedx-go/blob/master/.goreleaser.yml) - [Commits](https://github.com/CycloneDX/cyclonedx-go/compare/v0.6.0...v0.7.0) --- updated-dependencies: - dependency-name: github.com/CycloneDX/cyclonedx-go dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> * Adapt conversion functions to CycloneDX/cyclonedx-go 0.7.0 Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Lénaïc Huard <[email protected]> * [usm] Fix `ProcessMonitor` termination (#15140) * [usm] Fix `ProcessMonitor` termination * Ensure `refcount` doesn't go below zero * [gitlab] update builderimages for go 1.19.4 builds (#14930) * [gitlab] update builderimages for go 1.19.4 builds * [reno] adding relnote for golang 1.19.4 update * [golang] bumping 1.19.4 in other relevant places * [circleci] updating images * [golangci-lint] fixing issues after 1.19.4 bump * [gitlab] source_test: force external linker when running -race - troubleshooting * [omnibus] devtoolset changes not required w/RHEL when building with centos 7+ * Revert "[gitlab] source_test: force external linker when running -race - troubleshooting" This reverts commit 8a7d7344297554cf0eb42ee7f59ed847787ce217. * [gitlab] source_test: use centos7 based images to enable -race detector * [golangci-lint] address new format issue after merge * Apply suggestions from code review Co-authored-by: Vickenty Fesunov <[email protected]> * [golangci-lint] address missed lints on windows * [gitlab] updating buildimages yet again * [gitlab] setting buildimages after buildimages merged to main Co-authored-by: Vickenty Fesunov <[email protected]> * [Windows] Create SCM and Windows Service utilities Adds a set of SCM and Windows Service utility functions * Revert "[system-probe][NET-2891] Fix tcp retransmit count (#14740)" (#15141) This reverts commit e677798efdfc27223f9c036c1fc2e429b2dd24e7. 
* [process-agent] Remove check singletons (#15121) * [process-agent] Remove check singletons * Address feedback from @just-chillin * Simplified the interface of httptx (#15146) * Simplified the interface of httptx * Fixed error * [CWS] refactor eval options (#15132) * [CWS] refactor eval options * fix windows * [CWS] enable ring buffer by default (#15111) * Admission controller: support injecting multiple libs in the same pod (#14736) * [NDM] Do not send empty tags for snmp.interface.status (#15157) * Do not send empty tags for snmp.interface.status * fix test * [RCM-598] upgrade(remote-config): Use layered gRPC client between trace-agent & core-agent (#15100) * upgrade(remote-config): Bump message size limit to 500MB * fix(size): Size down to 110MB max * refactor(auth): Refactor RC auth * fix(interface): Remove opts * [CWS] AD: drop event if its process lineage is incomplete and add a guard to avoid sending empty dumps (#15013) In addition, two new metrics introduced to trace these new drops * [tasks/licenses] Don't call `open` on dirs (#15161) Co-authored-by: Alexandre Menasria <[email protected]> * [process-agent] Set HintMask in CollectorProc during process checks. (#14759) Adds a new process discovery hint in the process agent when the regular process and container checks run. * [fake-datadog] fix index list numbers (#15154) * Enable orchestrator manifest collection by default (#15094) manifest collection GA * Update CODEOWNERS and JOBOWNERS after team and job renames (#15021) Updates the JOBOWNERS and the GITHUB_SLACK_MAP file to account for recent team and job changes. 
* Changed TSM to USM (#15143) * Network USM : add java TLS support (#14620) Adding support to attach a live java process and send it "agent-usm.jar" runtime agent payload Supporting JVMTI Hotspot mechanism Configuration: DD_SERVICE_MONITORING_CONFIG_ENABLE_JAVA_TLS_SUPPORT = true service_monitoring_config: enable_java_tls_support: true Co-authored-by: Guy Arbitman <[email protected]> * Fix instance-filter error type (#15163) * cmd/trace-agent: set gomemlimit based on cgroups (#14552) * pkg/runtime,cmd/trace-agent: set gomemlimit based on cgroups * search in stderr for expected log line as well (#15167) * Add additional distros/versions to btfhub archive build (#15152) * Tweak system-probe kitchen tests (#15165) * [serverless] feat: add _dd.origin tags for azure and gcp (#15137) * feat: add _dd.origin tags for azure and gcp * add release note * remove serverless release notes from this repo * Bump github.com/Microsoft/go-winio from 0.5.2 to 0.6.0 (#13728) * Bump github.com/Microsoft/go-winio from 0.5.2 to 0.6.0 Bumps [github.com/Microsoft/go-winio](https://github.com/Microsoft/go-winio) from 0.5.2 to 0.6.0. - [Release notes](https://github.com/Microsoft/go-winio/releases) - [Commits](https://github.com/Microsoft/go-winio/compare/v0.5.2...v0.6.0) --- updated-dependencies: - dependency-name: github.com/Microsoft/go-winio dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> * Auto-generate go.sum and LICENSE-3rdparty.csv changes * go mod tidy Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com> Co-authored-by: Brian Floersch <[email protected]> * Factorize all the functions creating a pointer (#15118) * Logging and pipeline changes to address failures to send data to DBM endpoints (#15058) * epforwarder: double dbm-samples defaultBatchMaxContentSize We have been seeing some issues where data fails to send to our dbm samples endpoint. We see far fewer examples of the dbm-activity and dbm-metrics pipelines being blocked or failing to send data, and we suspect that this is partly due to the drastic differences in size between those pipelines and the dbm-samples pipeline. This change doubles the defaultBatchMaxContentSize for the dbm-samples pipeline to try to address this. * batch_strategy: add debug logging for dbm pipelines * http/destination: enrich logmessages about sleep backoff * logs...http/destination: additional logging for err reponses * logs: new metric to capture err status codes from http resp * Update ebpf-manager (#15174) * Send failed report first (#14700) * [RCM] Allow agent clients to specify their k8s cluster (#15148) This allows predicates to be written and configurations to be targeted towards agent clients running on a particular cluster. * Container image metadata check: add layers history (#15150) * Add to permissions.log windows permissions (#14866) Add to permissions.log windows permissions * [process-agent] Fix data race in a test (#15177) * [windows] [system-probe] Split connection gathering between open and closed. (#14164) * Split connection gathering between open and closed. Continue to only poll open connections on 30s intervals. 
Poll closed connections on 30s interval, or when async notification threshold is reached. gofmt Review feedback Change the buffer used for reading from the driver to it's own type rather than generic slice, to make more clear the pointer relationship in function calls where pointers are used. Change resizeDriverBuffer to operate on the buffer type for clarity add better default guessing to new config variable review feedback; wait for closed connections loop to stop before exiting lint update to latest driver for signing Sync driver header file correctly for build. Fixes hang of system probe on exit. Fixes correctly draining full bucket fix typo in install xml Rebase to main (230106) Update to latest test driver (rebased to one change) * Update release.json to 2.3.0 unsigned driver for signing build * update to signed version * USM: new configuration path service_monitoring_config.enable_go_tls_support (#15156) * USM: new configuration path service_monitoring_config.enable_go_tls_support * Fixed CR comment * Added releasenotes about the changed configuration value * Update releasenotes/notes/changing-gotls-configuration-flag-cb003ca4d25472ad.yaml Co-authored-by: May Lee <[email protected]> Co-authored-by: Nicolas PLANEL <[email protected]> Co-authored-by: May Lee <[email protected]> * [CWS] init args/envs for empty data from proc (#15181) * [serializer] Use google's protobuf lib for marshaling SBOMs and Images (#15172) * [serializer] Use google's protobuf lib for marshaling SBOMs * [serializer] Use google's protobuf lib for marshaling Container Images Co-authored-by: Lénaïc Huard <[email protected]> * [CWS] add dump count limiter (#15014) * Revert "Tweak system-probe kitchen tests (#15165)" (#15179) This reverts commit e4aab732d2a9083d832c5a850773832fd551149b. 
* [CSPM] Make sure the failing reports are reported first using a stable sort (#15189) * convert config subcommand to fx (#15178) * Add GlobalTags on basic telemetry when no hostname is detected (#14776) * Auto compute timeWindow on external metrics based on DatadogMetrics maxAge (#14840) * Auto compute timeWindow on external metrics based on DatadogMetrics maxAge * Update releasenotes/notes/add-automatic-time-window-ffed100742f51246.yaml Co-authored-by: Kaylyn <[email protected]> * Review feedback Co-authored-by: Kaylyn <[email protected]> * Add a Stop lifecycle for log to call log.Flush (#15185) kitchen_test_system_probe_xxx are flaky tests. * Use Wixsharp for the Windows Installer (#13459) * [gitlab+golang] updating builders + bumping tooling 1.19.5 (#15164) * [gitlab+golang] updating builders + bumping tooling 1.19.5 * [changelog] updating changelog entry * [circleci] updating runners * [test] small cleanup * [release] PR merged so MACOS_BUILD_VERSION back to master * Add detection for ECS EC2 and start workloadmeta ECS collector accordingly (#14978) * Adding log source latency stats to InfoProvider (#15038) * adding NoProxyNonexactMatchExplicitlySet to ProxyMeta payload within Host Payload to identify if customer did explicitly use/change the param no_proxy_nonexact_match within agent config. * adding testing log source attributes * removing bytes read and adding that attribute into log source instead * adjust type32 to type64 for the agent build * Update pkg/logs/sources/source.go Byte to Bytes Co-authored-by: Dustin J. Mitchell <[email protected]> * new helper function for BytesRead type * Update InfoProvider with Log Latency and getting rid of log latency stats from Source * Changing language and replacing strconv.FormatInt with fmt.sprintf function. Co-authored-by: Dustin J. 
Mitchell <[email protected]> * [omnibus] retrieve freetds tarball using HTTPS instead of FTP (#15159) * cmd/cluster-agent/subcommands/start: fix rcClient instanciation (#15193) * [dca] filter pod annotation during workloadmeta collection (#15089) Co-authored-by: Lénaïc Huard <[email protected]> * introduce root node validator (#15032) * [workloadmeta/containerd] Collect SBOMs with Trivy (#15139) * [go.mod] Add Trivy * [util/kubernetes/apiserver/leaderelection] Adapt to new k8s version * Run inv security-agent.gen-mocks * [docker] Adapt tests and fakes to new version * [util/containerd] Expose raw client * [util] Add Trivy client * [workloadmeta/collectors/containerd] Collect SBOMs with Trivy * [build-tags] Use trivy only in core agent * Add release note * Add missing licenses * [github/codeowners] Assign /pkg/util/trivy to container-integrations * [security-agent] bump security agent policies to v0.43.0 (#15190) * Fix failing test (#15180) rerun pipeline * Fix `sbom` check (#15192) * [DCA] Add tagger-list and workload-list cmds (#15135) * [DCA] Add tagger-list and workload-list cmds * Apply suggestions from code review Co-authored-by: Lénaïc Huard <[email protected]> * Include workload- and tagger-list in DCA flare Co-authored-by: Lénaïc Huard <[email protected]> * fix(cluster-agent): apply cluster-name normalization in ksm-core (#15057) Co-authored-by: Bryce Eadie <[email protected]> Co-authored-by: Xavier Lucas <[email protected]> * Add option to use image mount instead of image export (requires mounting /var/lib and SYS_ADMIN in core Agent) (#15183) * [tasks] Read go unit test reports in utf-8 format (#15202) Explicitly sets the file reading encoding to utf-8 when reading the Go unit test report json file. 
* [CSPM] bump `github.com/open-policy-agent/opa` to v0.48.0 (#15200) * bump `github.com/open-policy-agent/opa` to v0.48.0 * `inv -e generate-licenses` * [corechecks/snmp] Add `id` and `source_type` to Topology Links data (#15184) Co-authored-by: pducolin <[email protected]> * add orchestrator and cws url overwrite when fips is enabled (#15195) * feat(fips): add orchestrator and cws url overwrite when fips is enabled Signed-off-by: Nicolas Guerguadj <[email protected]> * [CWS] Converting SecAgent Start command to fx (#14814) * compliance subcmd using log and config components * runtime using components * fixing rebase clobbers * moving root command subcommand * converting app/start.go and app/app.go to start/command.go (and command.go in previous commit) start start test start start start command var change start cmd import * compliance, runtime, and check commands using command instead of common * moving logs context to command from common, and deleting duplicated logs context code in runtime * param setup * handle case with no config files * squashing more usages of pkg/util/log and pkg/config * moving ConfigParams in compliance check fnc until after cfgpath has been parsed * adding cluster agent bool to check entrypoint * using log component in logs context * fixing no api key error message * adding log.Flush() to one shot funcs. TODO: remove once log component has self-flushing capabilities * release note * release note edits * using lambda fnc to handle different entrypoints to check * release note edit * commit: removing log.Flush() because log component now has a lifecycle hook for flushing: https://github.com/DataDog/datadog-agent/pull/15185/files * fixing check unsupported * fixing import in check unsupported * release note edit * Ensure cloud foundry container tags are unique (#15066) * Ensure collector tags are unique * small refactor * Format file Co-authored-by: NouemanKHAL <[email protected]> * [collector/python] uses ianlancetaylor's cgosymbolizer. 
(#14673) * [collector/python] uses ianlancetaylor's cgosymbolizer. The one we ship and use is outdated and is known to cause hangs. See: https://github.com/golang/go/issues/45558#issuecomment-820764029 * Update cgosymbolizer. * Update license * add linux build tag Co-authored-by: Brian Floersch <[email protected]> * Configure analyzers used for SBOM generation (#15204) * Allow specifying the trivy analyzers * Only scan a few folders when using only os analyzers Co-authored-by: Cedric Lamoriniere <[email protected]> * Network USM : avoid ebpf maps contentions (#15166) TL;DR: all maps with conn_tuple_t as key must be resized by the loader to MaxConnectionTracked (65536 by default). This avoids hash_lru_map contention (+50% system CPU on the user pod) caused by maps that are too small compared to the number of connections. On the staging setup: >8000 sockets running on 16 cores, on the packet receive path (socket/classifier). The main issue is the kernel spinlock on an internal LRU list for evicted elements; this list is shared by all cores. Moved the eBPF maps to the eBPF program that instantiates them, as they are not shared. Instantiate maps only once. * Add test for multiple items (#13319) * Use container_image_collection in config options (#15197) rename `workload.image_collection` config option to `container_image_collection` to be more product feature oriented. 
* [CWS] [SEC-176] Enable CWS in integration tests (#14146) * Enable CWS in integration tests * Remove system probe files at package removal * Display remaining files after package removal * Catch error when parsing 'status' JSON output * Do not test CWS on iot agent flavor * Only enable CWS on supported platforms * Give more time to security-agent to communicate with system-probe * Retry to reach the agent config endpoint * Install policycoreutils-python on CentOS to apply SELinux rules at install time * Add release notes * deleting original MergeConfigurationFiles fnc (#14896) * CWS: sync BTFhub constants (#15215) Co-authored-by: paulcacheux <[email protected]> * Move public IPv4 support to the cloudprovider package * Remove dead code for EC2 local IPv4 This code is no longer used since PR #12971. * Move EC2 imds helpers to their own file * Move EC2 network metadata support to its own file * Adding 'cloud_provider_source' to the inventories payload We now track from where we fetch metadata related to a cloud provider. This only supports AWS EC2 for now. Depending on the instance type, configuration, ... the Agent can use multiple sources to deduce that it is running on EC2. The source used is now sent as 'cloud_provider_source' in inventories. * [kitchen] Fix CWS integration tests on CentOS step-by-step tests (#15218) Follow-up of #14146, applies the SELinux fix (from the install script cookbook) to the step-by-step cookbook, to make sure system-probe works correctly on CentOS 7 (which has SELinux enabled by default). 
* pkg/clusteragent/admission: fix rc tracking annotations (#15219) * Update release.json and Go modules for 6/7.43.0-rc.1 (#15216) * [CWS] force DNS resolver to read `/etc/resolv.conf` (#15220) * [CWS] fix stacktrace in signal/ptrace rules evaluation (#15225) * [CWS] remove dead-lock in AD finalize when resolving tags (#15233) * [workloadmeta/collectors/containerd] Disable sbom correctly when Trivy is not built (#15234) When SBOM collection was enabled in a built without Trivy, the agent was still pushing images to the `imagesToScan` channel. The channel was not initialized, so this was blocking the agent. * Bump snowflake to 2.8.3 and add back installing library (#15207) * Bump snowflake to 2.8.3 and add back installing library * Only include pip change * Add back snowflake bump * Fix version * Changelogs for 7.42.0 release (#15158) (#15237) * Changelogs for 7.42.0 release (#15158) * Changelogs for 7.42.0 release * Update CHANGELOG.rst * Update CHANGELOG.rst * Update CHANGELOG.rst * Add empty space * https: soWatcher, shared_libraries use a pathIdentifier as key of ELF binaries (#13748) Bugfixes to support `network_config.enable_https_monitoring` in a k8s clusters * pathIdentifier is an unique id of and ELF (system-wide) as it contain dev and inode as key. 
* New Unregister path, thanks to processMonitor that recieve process EXIT event and unregister the uprobe (maintained by a refcount) * ebpf UID use pathIdentifier as source of truth and use wider alphabet (base64), specially because the UID is limited (5 chars) Motivation : Follow up on #incident-16860, #incident-18347 * Add logging in max cpu/mem defaulting (#15257) * Fix the invocation of the secret backend from the cluster agent (#15250) * Bump the version of `emicklei/go-restful` (#15252) * Bump the version of `gopkg.in/yaml.v2` (#15253) * [process-agent] Fix nil deref in check cmd (#15254) * [sbom] Store generation duration and report it (#15258) * [workloadmeta/collectors/containerd] Add image scan duration to telemetry * [workloadmeta/collectors/containerd] Store SBOM generation duration * [corechecks/sbom] Process SBOM generation duration * [sbom] Store and report generation time * [CWS] flush upstream kernel btf spec cache after use (#15264) * Fix a bug in workloadmeta containerd collector (#15260) * Update release.json and Go modules for 6/7.43.0-rc.2 (#15243) * Fix system-probe build tags difference (#15268) * [USM] go TLS cleanup debug messages (#15246) * scan existence /proc/pid for 10 ms, it's better to do that in the callback * report golang hooking issue only if it's a golang binary * report only once when we unregister binary * Release BTF cached by cilium/ebpf (#15269) * [process-agent] Fix `Drop Check Payloads` status (#15274) * Workaround lxn/walk issue on Windows 7/2008r2 (#15275) * [CWS] bump security agent policies to v0.43.1 (#15280) * CWS: sync BTFhub constants (#15285) Co-authored-by: paulcacheux <[email protected]> * [CWS] useless lock from AD manager (#15287) * Update last_stable entries in release.json to 6/7.42.0 (#15289) Updates the last_stable entries to 6/7.42.0 on main. 
* [contimage] Split container image metadata in one event per registry (#15292) * [sbom] Split sbom in one event per registry (#15295) * [kitchen] Use official datadog cookbook for initial Agent install in upgrade scenario (#15300) Updates the win-upgrade-rollback kitchen test suite to use the official datadog cookbook for the initial install. * [CWS] always lock AD in the same order (#15290) * Update release.json and Go modules for 6/7.43.0-rc.3 (#15296) * Fix silent mutation of integration.Config in secret decryption (#15298) * Do not run SBOM collection while running `agent check` (#15327) * Lower memory allocated to ring buffer (#15245) * [Windows] Fix the connection established check. (#15301) Fixes the reporting of the established state on windows. Also disables the test for `TCPCollectionDisabled`, as it is a (now) known problem on Windows. * Remove BTF exceptions (#15316) We have these kernels now * Speed up system-probe build by not copying unnecessary files (#15231) * Speed up system-probe build by not copying unnecessary files * Add fallback to find if rsync not available * Fix python lint * Bump Collector dependencies to v1.0.0-RC4/v0.70.0 (#15230) * Bump Collector dependencies to v1.0.0-RC4/v0.70.0 * Add to release notes * Fix format * Update releasenotes/notes/v0.70.0-otel-c59cf4b8673d9497dc27f4d4f38dea2db79e74ed.yaml Co-authored-by: Bryce Eadie <[email protected]> * Upgrade opentelemetry-collector-contrib version --------- Co-authored-by: Bryce Eadie <[email protected]> * Address VPA fixes caught in QA (#15328) * Address fixes caught in QA * Commit to retrigger build * protocols: Uses alpine based images as they are slimmer (#15310) * system-probe: re-ordered protocols into directories (#15308) * protocol classifications: tests: Restructure tests to cut 50% runtime (#14987) * protocol classifications: tests: Restructure tests to cut 50% runtime * Fixed cr comment * Bump go.uber.org/zap from 1.23.0 to 1.24.0 in /pkg/otlp/model (#14597) Bumps 
[go.uber.org/zap](https://github.com/uber-go/zap) from 1.23.0 to 1.24.0. - [Release notes](https://github.com/uber-go/zap/releases) - [Changelog](https://github.com/uber-go/zap/blob/master/CHANGELOG.md) - [Commits](https://github.com/uber-go/zap/compare/v1.23.0...v1.24.0) --- updated-dependencies: - dependency-name: go.uber.org/zap dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * java tls: remove requirement for runtime compilation (#15276) * [usm] Fix concurrency issue in test (#15273) * [system-probe] Handle stats overflows (#15282) * [system-probe] make sure stats are reported as soon a module registration happens (#15329) When modules are registered in the system-probe, a process starts that updates stats in the global `l` field every 15 seconds. Those stats are pulled in by the `agent status` command, which users can run to see what's running in the agent. The loop was written as: ``` for now := <-ticker { // update stuff } ``` This meant that for the first 15 seconds of the agent's life, stats would come back incomplete. This PR changes the stats loop to compute stats *then* wait for 15 seconds. QA instructions (meant to run on a vanilla Linux host): * in one terminal window, run `watch "sudo -u dd-agent /opt/datadog-agent/bin/agent/agent status | head -n10"` * in another terminal window, restart the system-probe * you should *not* see errors about systemprobe.tmpl failing to render. Running these steps on 7.42 and below shows the following for the first 15 seconds: ``` Getting the status from the agent. template: /systemprobe.tmpl:11:34: executing "/systemprobe.tmpl" at <.updated_at>: invalid value; expected float64 ``` You should *not* see these errors in the first 15 seconds of the system-probe running (or ever).
* Bump resource limits for circleci unit_tests job (#15342) * usm: Added usm to agent status command (#15232) * usm: Added usm to agent status command * Added releasenotes * Rename releasenotes file due to a CI linter * Fixed CR comment * Fixed release notes * Fixed cr comment * Fixed tests * Reducing duplication * Fixed tests * [system-probe] Add process monitoring and USM tagging (#12280) * tracer: protocol classification: Adding a workaround to handle hitting instructions limits on socket filter (#15343) * tracer: protocol classification: Adding a workaround to handle hitting instructions limits on socket filter * Fixed condition * fix(RCM): Fix timeout of client bypass (#15326) Before this commit, when the previous request was still pending, the client bypass wouldn't timeout until it managed to send its request, meaning that the effective maximum timeout for a new client was (request TTL) + (bypass TTL). This was made more visible when the refresh interval went from 60s to 5s minimum. Now, the client bypass timeout takes the previous request into account as well, so that a client doesn't wait for more than 2s on its first request * [CWS Agent] fix signature of NewRuleFilterModel on non-linux platforms * [CWS] fix size of args (#15323) * [CWS] split ResolveFields (#15261) * pkg/clusteragent/admission/patch: make file provider ready for e2e testing (#15221) * pkg/clusteragent/admission/patch: make file provider ready for e2e testing * [USM] old java hotspot need credential (#15278) Hotspot reject connection if credential by checking uid/gid of the connect() SOL_SOCKET/SO_PEERCRED but older hotspot JRE (1.8.0) doesn't accept root and want explicitly uid/gid matching side effect for go, during the connect() syscall we don't want to fork() and stay on the same pthread to avoid side effect of set effective uid/gid. 
* [corechecks/snmp] No local resolution if multiple results (#15262) * [corechecks/snmp] Use dd_id instead of idType ndm (#15265) * [pkg/trace] Embed ptraceotlp.UnimplementedGRPCServer to address future breaking change (#15291) * Moved couple of noisy logs to trace (#15340) * Don't update the configuration if it already exists (#15339) The failures in the unit tests are unrelated. * pkg/trace: Emit APM onboarding events on startup (#14799) Collect trace agent startup errors and successes using instrumentation-telemetry "apm-onboarding-event" messages. * switch account (#14572) * lower log level (#15306) * tracer: Use aliases to string instead of converting types (#15344) * tracer: Use aliases to string instead of converting types * Removed another conversion * [CWS] remove unsafe cache (#15213) * specialize string cache * int and bool caches * remove unused pointer import * better error reporting * fixup some cache get and cleanup * remove `AppendFieldValues` (#15244) * Separate system-probe config from datadog config (#14024) * [serverless] fix: do not try to enable log api for local testing (#15229) * fix: do not try to enable log api for local testing * refactor: move out some code in functions, do not use go routine when not needed * Revert "refactor: move out some code in functions, do not use go routine when not needed" This reverts commit 750aa784895578ef01c7ffbe5bb150542b9b621f. * use go friendly return to avoid extra indent * export one constant rather than reusing string for local test * [omnibus] Upgrade setuptools to 66.1.1, pip to 22.3.1 in Python 3 embedded environment (#15356) - The python3 software definition has been updated to install the versions of pip (==22.0.4) and setuptools (==56.0.0) that are bundled alongside Python 3.8.16, - The pip3 and setuptools3 software definitions have been updated: instead of installing from scratch (using python3 setup.py install), they use the bundled pip to install themselves. 
pip3 has been updated to 22.3.1, setuptools3 to 66.1.1, - pip-tools (installed in the datadog-agent-integrations-py3 software definition) was upgraded from 6.4.0 to 6.12.1, as 6.5.0+ is required for pip 22.x support. Co-authored-by: Lénaïc Huard <[email protected]> * [CWS] remove useless usage of unsafe in SECL registers (#15214) * [corechecks/{containerimage,sbom}] Fix parsing of config (#15355) * [corechecks/containerimage] Fix parsing of config * [corechecks/sbom] Fix parsing of config * [kitchen] Use busser-rspec_datadog gem for tests (#15271) Switches kitchen tests to use the busser-rspec_datadog gem, published to RubyGems by Datadog, from the busser-rspec-datadog fork of busser-rspec. This gem behaves the same way as the upstream busser-rspec gem, except for the bundler version it installs, which is pinned to 2.3.26, to ensure it remains compatible with Ruby 2.5. To do so, all kitchen folders previously named rspec are now named rspec_datadog (as busser uses folder names to guess which gem to install). Removes the workaround introduced in #14851. * [gitlab] Add Windows Agent team to GITHUB_SLACK_MAP (#15203) Add the Windows Agent team to the GITHUB_SLACK_MAP. * [Serverless] fix http + https proxy (#15320) * USM: adding service_monitoring.java_agent_args=string parameter (#15314) USM: adding service_monitoring.java_agent_args=string parameter to pass through injected agent-usm.jar : agentmain(java_agent_args) * [golangci-lint] Upgrade to version 1.50.1 (#15348) * Also increase golangci-lint timeout * Bump requests from 2.28.1 to 2.28.2 in /test/e2e/cws-tests (#15375) Bumps [requests](https://github.com/psf/requests) from 2.28.1 to 2.28.2. 
- [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](https://github.com/psf/requests/compare/v2.28.1...v2.28.2) --- updated-dependencies: - dependency-name: requests dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump datadog-api-client from 2.7.0 to 2.8.0 in /test/e2e/cws-tests (#15374) Bumps [datadog-api-client](https://github.com/DataDog/datadog-api-client-python) from 2.7.0 to 2.8.0. - [Release notes](https://github.com/DataDog/datadog-api-client-python/releases) - [Changelog](https://github.com/DataDog/datadog-api-client-python/blob/master/CHANGELOG.md) - [Commits](https://github.com/DataDog/datadog-api-client-python/compare/2.7.0...2.8.0) --- updated-dependencies: - dependency-name: datadog-api-client dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [CWS] introduce SECLDoc and add secl field examples to documentation (#15109) * [CWS] improve SetFieldValue tests (#15201) * [CWS] fix e2e tests (#15381) * Add history workload-list container-image entity (#15353) * [workloadmeta] Delete unnecessary attrs of ContainerImageMetadata (#15385) * [CWS] bump ebpf-manager fixing a data race (#15382) * [CSPM] [SEC-6966] Allow specifying processes environment variables as rule inputs (#15241) * [CSPM] Allow specifying processes environment variables as rule inputs * [CSPM] Support env variables without "=" * [CSPM] Add omitempty flag for Process Envs param * [tagger/telemetry] Extract subsystem const (#15386) * Bump system probe build image (remove entrypoint) (#15393) * [pkg/otlp/model] Do not send first value for cumulative monotonic sums if start timestamp matches timestamp (#15363) * [pkg/otlp/model] Do not send first value for cumulative monotonic sums if start timestamp matches timestamp * Add to release note * Fix filename lint * Fix release note type * [process-agent] Refactor chunking to use generics (#15318) * First draft of generic chunking * Second draft of generics * All use cases migrated to generics * Fix null ptr return * Add comments * Remove ptr * Fix test * Use SetActiveChunk API * Relnotes * Relnotes update * change flake8 url to github (#15398) * Oracle integration boilerplate * Create dockerpool for testing * Add pkgs * register oracle check * hello world metric --------- Signed-off-by: Brian L. 
Troutwine <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Nicolas Guerguadj <[email protected]> Co-authored-by: Maxime mouial <[email protected]> Co-authored-by: Ahmed Mezghani <[email protected]> Co-authored-by: Katie Hockman <[email protected]> Co-authored-by: Julio Greff <[email protected]> Co-authored-by: Pablo Baeyens <[email protected]> Co-authored-by: Paul Cacheux <[email protected]> Co-authored-by: Rey Abolofia <[email protected]> Co-authored-by: Maxime David <[email protected]> Co-authored-by: AJ Stuyvenberg <[email protected]> Co-authored-by: Derek Brown <[email protected]> Co-authored-by: Lénaïc Huard <[email protected]> Co-authored-by: Olivier G <[email protected]> Co-authored-by: Slavek Kabrda <[email protected]> Co-authored-by: Alexandre Yang <[email protected]> Co-authored-by: Sylvain Afchain <[email protected]> Co-authored-by: modernplumbing <[email protected]> Co-authored-by: Julien Lebot <[email protected]> Co-authored-by: Branden Clark <[email protected]> Co-authored-by: Emma Ferguson <[email protected]> Co-authored-by: Ivan Ilichev <[email protected]> Co-authored-by: Guy Arbitman <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: paulcacheux <[email protected]> Co-authored-by: Jonathan Ribas <[email protected]> Co-authored-by: Vickenty Fesunov <[email protected]> Co-authored-by: maxime mouial <[email protected]> Co-authored-by: Kyle Verhoog <[email protected]> Co-authored-by: Kari Halsted <[email protected]> Co-authored-by: Scott Opell <[email protected]> Co-authored-by: Nicolas PLANEL <[email protected]> Co-authored-by: Paul <[email protected]> Co-authored-by: William Yu <[email protected]> Co-authored-by: Andrew Glaude <[email protected]> Co-authored-by: Brian L. 
Troutwine <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Yoann Ghigoff <[email protected]> Co-authored-by: ruthnaebeck <[email protected]> Co-authored-by: hestonhoffman <[email protected]> Co-authored-by: Brian Floersch <[email protected]> Co-authored-by: Lénaïc Huard <[email protected]> Co-authored-by: Pedro Lambert <[email protected]> Co-authored-by: Jaime Fullaondo <[email protected]> Co-authored-by: Rich L <[email protected]> Co-authored-by: Adam Karpowich <[email protected]> Co-authored-by: Florian Veaux <[email protected]> Co-authored-by: Baptiste Foy <[email protected]> Co-authored-by: David Ortiz <[email protected]> Co-authored-by: Alexandre Menasria <[email protected]> Co-authored-by: daniel-taf <[email protected]> Co-authored-by: pducolin <[email protected]> Co-authored-by: Kangyi LI <[email protected]> Co-authored-by: Kylian Serrania <[email protected]> Co-authored-by: Usama Saqib <[email protected]> Co-authored-by: Bryce Kahle <[email protected]> Co-authored-by: alexgallotta <[email protected]> Co-authored-by: dependabot[bot] <dependabot[bot]@users.noreply.github.com> Co-authored-by: Sylvain Baubeau <[email protected]> Co-authored-by: Kyle Ames <[email protected]> Co-authored-by: Len Gamburg <[email protected]> Co-authored-by: May Lee <[email protected]> Co-authored-by: Guillaume Fournier <[email protected]> Co-authored-by: Pierre Guilleminot <[email protected]> Co-authored-by: Vincent Boulineau <[email protected]> Co-authored-by: Kaylyn <[email protected]> Co-authored-by: Duong (Yoon) <[email protected]> Co-authored-by: Dustin J. 
Mitchell <[email protected]> Co-authored-by: David du Colombier <[email protected]> Co-authored-by: Cedric Lamoriniere <[email protected]> Co-authored-by: Bryce Eadie <[email protected]> Co-authored-by: Xavier Lucas <[email protected]> Co-authored-by: Nicolas Guerguadj <[email protected]> Co-authored-by: Sarah Witt <[email protected]> Co-authored-by: NouemanKHAL <[email protected]> Co-authored-by: Rémy Mathieu <[email protected]> Co-authored-by: Kaden Wilkinson <[email protected]> Co-authored-by: Kacper <[email protected]> Co-authored-by: Andrew Zhang <[email protected]> Co-authored-by: Corrina Sivak <[email protected]> Co-authored-by: Yang Song <[email protected]> Co-authored-by: Joshua Lineaweaver <[email protected]> Co-authored-by: Hasan Mahmood <[email protected]> Co-authored-by: Lee Avital <[email protected]> Co-authored-by: paullegranddc <[email protected]> Co-authored-by: Misha Badov <[email protected]> Co-authored-by: alexbarksdale <[email protected]>
Motivation
Additional Notes
Possible Drawbacks / Trade-offs
Describe how to test/QA your changes
Take a previous build. Install IIS.
Make a connection via loopback interface.
Use the debug endpoint to dump the HTTP transactions.
Note that the port is off.
Note that the traffic is split into two transactions, some with correct tags and some without.
Test again with fix; above should be remedied.
Reviewer's Checklist
- `Triage` milestone is set.
- `major_change` label if your change either has a major impact on the code base, is impacting multiple teams, or is changing important well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer-facing change, use a release note.
- `changelog/no-changelog` label has been applied.
- `qa/skip-qa` label is not applied.
- `team/..` label has been applied, indicating the team(s) that should QA this change.
- `need-change/operator` and `need-change/helm` labels have been applied.
- `k8s/<min-version>` label, indicating the lowest Kubernetes version compatible with this feature.