Releases: DataDog/datadog-agent
7.49.0
Agent
Prelude
Release on: 2023-11-02
- Refer to the 7.49.0 tag on integrations-core for the list of changes on the core checks
New Features
-
Add --use-unconnected-udp-socket flag to agent snmp walk command.
-
Add support for image pull metrics in the containerd check.
-
Add kubelet stats.summary check (kubernetes_core.kubelet.*) to the Agent's core checks to replace the old kubernetes.kubelet check generated from Python.
-
APM: [BETA] Adds peer_tags configuration to allow for more tags in APM stats that can add granularity and clarity to a peer.service. To set this config, use DD_APM_PEER_TAGS='["aws.s3.bucket", "db.instance", ...]' or apm_config.peer_tags: ["aws.s3.bucket", "db.instance", ...] in datadog.yaml. Please note that DD_APM_PEER_SERVICE_AGGREGATION or apm_config.peer_service_aggregation must also be set to true.
-
Introduces a new Windows crash detection check. Upon initial check run, sends a Datadog event if it is determined that the machine has rebooted due to a system crash.
-
Install the Aerospike integration on ARM platforms for Python 3
-
CWS: Detect patterns in process and file paths to improve accuracy of anomaly detections.
-
Add Dynamic Instrumentation diagnostics proxy endpoint to the trace-agent http server.
At present, diagnostics are forwarded through the debugger endpoint on the trace-agent server to logs. Since Dynamic Instrumentation also allows adding dynamic metrics and dynamic spans, we want to remove the dependency on logs for diagnostics - the new endpoint uploads diagnostic messages on a dedicated track.
-
Adds a configurable jmxfetch telemetry check that collects additional data on the running jmxfetch JVM in addition to data about the JVMs jmxfetch is monitoring. The check can be configured by enabling the jmx_telemetry_enabled option in the Agent.
-
[NDM] Collect diagnoses from SNMP devices.
-
Adding support for Oracle 12.2.
-
Add support for Oracle 18c.
-
CWS now computes hashes for all the files involved in the generation of a Security Profile and an Anomaly Detection Event
-
[Beta] Cluster agent supports APM Single Step Instrumentation for Kubernetes. It can be enabled in a Kubernetes cluster by setting DD_APM_INSTRUMENTATION_ENABLED=true. Single Step Instrumentation can be turned on in specific namespaces using the environment variable DD_APM_INSTRUMENTATION_ENABLED_NAMESPACES, and turned off in specific namespaces using the environment variable DD_APM_INSTRUMENTATION_DISABLED_NAMESPACES.
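The peer_tags note above passes the tag list through an environment variable as a JSON array. The following is a minimal sketch of building those two variables for a launcher script; the helper name peer_tag_env is hypothetical and not part of the Agent.

```python
import json


def peer_tag_env(peer_tags):
    """Build the environment variables named in the peer_tags note (hypothetical helper)."""
    return {
        # The tag list is encoded as a JSON array inside a string.
        "DD_APM_PEER_TAGS": json.dumps(list(peer_tags)),
        # The note states that peer service aggregation must also be set to true.
        "DD_APM_PEER_SERVICE_AGGREGATION": "true",
    }


env = peer_tag_env(["aws.s3.bucket", "db.instance"])
for name, value in env.items():
    print(f"{name}={value}")
```

Encoding the list with json.dumps avoids the unbalanced-quote mistakes that are easy to make when the array is typed by hand.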
Enhancement Notes
-
Moving the Orchestrator Explorer pod check from the process agent to the core agent. In the following release we will be removing the process agent check and defaulting to the core agent check. If you want to migrate ahead of time you can set orchestrator_explorer.run_on_node_agent = true in your configuration.
-
Add new GPU metrics in the KSM Core check:
- kubernetes_state.node.gpu_capacity tagged by node, resource, unit and mig_profile.
- kubernetes_state.node.gpu_allocatable tagged by node, resource, unit and mig_profile.
- kubernetes_state.container.gpu_limit tagged by kube_namespace, pod_name, kube_container_name, node, resource, unit and mig_profile.
-
Tag container entity with image_id tag.
-
max_message_size_bytes can now be configured in logs_config. This allows the default message content limit of 256,000 bytes to be increased up to 1MB. If a log line is larger than this byte limit, the overflow bytes will be truncated.
-
APM: Add regex support for filtering tags by apm_config.filter_tags_regex or environment variables DD_APM_FILTER_TAGS_REGEX_REQUIRE and DD_APM_FILTER_TAGS_REGEX_REJECT.
-
Agents are now built with Go 1.20.10.
-
CWS: Support fentry/fexit eBPF probes which provide lower overhead than kprobe/kretprobes (currently disabled by default and supported only on Linux kernel 5.10 and later).
-
CWS: Improved username resolution in containers and handle their creation and deletion at runtime.
-
CWS: Apply policy rules on processes already present at startup.
-
CWS: Reduce memory usage of BTF symbols.
-
Remote Configuration for Cloud Workload Security detection rules is enabled if Remote Configuration is globally enabled for the Datadog Agent. Remote Configuration for Cloud Workload Security can be disabled while Remote Configuration is globally enabled by setting the runtime_security_config.remote_configuration.enabled value to false. Remote Configuration for Cloud Workload Security cannot be enabled if Remote Configuration is not globally enabled.
-
Add gce-container-declaration to default GCE excluded host tags. See exclude_gce_tags configuration settings for more.
-
Add metrics for the workloadmeta extractor to process-agent status output.
-
Add a heartbeat mechanism for SBOM collection to avoid having to send the whole SBOM if it has not changed since the last computation. The default interval for the host SBOM has changed from 24 hours to 1 hour.
-
Prefix every entry in the log file with details about the database server and port to distinguish log entries originating from different databases.
-
JMXFetch internal telemetry is now included in the agent status output when the verbose flag is included in the request.
-
Sensitive information is now scrubbed from pod annotations.
-
The image_id tag no longer includes the docker-pullable:// prefix when using Kubernetes with Docker as runtime.
-
Improve SQL text collection for self-managed installations. The Agent selects text from V$SQL instead of V$SQLSTATS. If it isn't possible to query the text, the Agent tries to identify the context, such as parsing or closing cursor, and put it in the SQL text.
-
Improve the Oracle check example configuration file.
-
Collect Oracle execution plans by default.
-
Add global custom queries to Oracle checks.
-
Add connection refused handling.
-
Add the hosting-type tag, which can have one of the following values: self-managed, RDS, or OCI.
-
Add a hidden parameter to log unobfuscated execution plan information.
-
Adding real_hostname tag.
-
Add sql_id and plan_hash_value to obfuscation error message.
-
Add Oracle pga_over_allocation_count_metric.
-
Add information about missing privileges with the link to the grant commands.
-
Add TCPS configuration to conf.yaml.example.
-
The container check reports two new metrics, container.memory.page_faults and container.memory.major_page_faults, to report the page fault counters per container.
-
prometheus_scrape: Adds support for multiple OpenMetrics V2 features in the prometheus_scrape.checks[].configurations[] items:
- exclude_metrics_by_labels
- raw_line_filters
- cache_shared_labels
- use_process_start_time
- hostname_label
- hostname_format
- telemetry
- ignore_connection_errors
- request_size
- log_requests
- persist_connections
- allow_redirects
- auth_token
For a description of each option, refer to the sample configuration in https://github.com/DataDog/integrations-core/blob/master/openmetrics/datadog_checks/openmetrics/data/conf.yaml.example.
-
The SBOM check now reports the status of scans and any errors directly to Datadog, which streamlines error management and resolution.
-
Separate init-containers from containers in the KubernetesPod structure of workloadmeta.
-
Improve marshalling performance in the system-probe -> process-agent path. This improves memory footprint when NPM and/or USM are enabled.
-
Raise the default logs_config.open_files_limit to 500 on Windows.
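The max_message_size_bytes note above says that bytes beyond the configured limit are truncated. The sketch below only illustrates that byte-limit rule; the function name is hypothetical, and it does not model how the Agent treats the dropped remainder or multi-byte characters that straddle the limit.

```python
DEFAULT_MAX_MESSAGE_SIZE_BYTES = 256_000  # default limit stated in the release note


def truncate_message(line: bytes, limit: int = DEFAULT_MAX_MESSAGE_SIZE_BYTES) -> bytes:
    """Keep at most `limit` bytes of a log line and drop the overflow (illustrative only)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return line[:limit]


short_line = truncate_message(b"x" * 100)
long_line = truncate_message(b"x" * 300_000)
print(len(short_line), len(long_line))
```

A line shorter than the limit passes through unchanged, while a 300,000-byte line is cut down to the 256,000-byte default.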
Deprecation Notes
- service_monitoring_config.enable_go_tls_support is deprecated and replaced by service_monitoring_config.tls.go.enabled. network_config.enable_https_monitoring is deprecated and replaced by service_monitoring_config.tls.native.enabled.
Security Notes
- APM: The Agent now obfuscates the entire Memcached command by default. You can revert to the previous behavior where only the values were obfuscated by setting DD_APM_OBFUSCATION_MEMCACHED_KEEP_COMMAND=true or apm_config.obfuscation.memcached.keep_command: true in datadog.yaml.
- Fix CVE-2023-39325.
- Bump golang.org/x/net to v0.17.0 to fix CVE-2023-44487.
Bug Fixes
- Fix Agent Flare not including Trace Agent's expvar output.
- Fixes a panic that occurs when the Trace Agent receives an OTLP payload during shutdown
- Fixes a crash upon receiving an OTLP Exponential Histogram with no buckets.
- CWS: Scope network context to DNS events only as it may not be available to all events.
- CWS: Fix a bug that caused security profiles of already running workloads to be empty.
...
7.48.1
Prelude
Release on: 2023-10-17
- Please refer to the 7.48.1 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
- Upgraded Python 3.9 to Python 3.9.18
Security Notes
- Bump embedded curl version to 8.4.0 to fix CVE-2023-38545 and CVE-2023-38546
- Updated the version of OpenSSL used by Python on Windows to 1.1.1w; addressed CVE-2023-4807, CVE-2023-3817, and CVE-2023-3446
Bug Fixes
- On some slow drives, when the Agent shuts down suddenly the Logs Agent registry file can become corrupt. This means that when the Agent starts again the registry file can't be read and therefore the Logs Agent reads logs from the beginning again. With this update, the Agent now attempts to update the registry file atomically to reduce the chances of a corrupted file.
7.48.0
Agent
Prelude
Release on: 2023-10-10
- Please refer to the 7.48.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
-
The EventIDs logged to the Windows Application Event Log by the Agent services have been normalized and now have the same meaning across Agent services. Some EventIDs have changed and the rendered message may be incorrect if you view an Event Log from a host that uses a different version of the Agent than the host that created the Event Log. To ensure you see the correct message, choose "Display information for these languages" when exporting the Event Log from the host. This does not affect Event Logs collected by the Datadog Agent's Windows Event Log integration, which renders the event messages on the originating host. The EventIDs and messages used by the Agent services can be viewed in pkg/util/winutil/messagestrings/messagestrings.mc.
-
The datadog-connectivity and metadata-availability subcommands no longer exist, and their diagnoses are reported in a more general and structured way. Diagnostics previously reported via the datadog-connectivity subcommand are now reported as part of the connectivity-datadog-core-endpoints suite. Correspondingly, diagnostics previously reported via the metadata-availability subcommand are now reported as part of the connectivity-datadog-autodiscovery suite.
-
Streamlined settings by renaming workloadmeta.remote_process_collector.enabled and process_config.language_detection.enabled to language_detection.enabled.
-
The command line arguments to the Datadog Agent Trace Agent trace-agent have changed from single-dash arguments to double-dash arguments. For example, -config must now be provided as --config. Additionally, subcommands have been added; these may be listed with the --help switch. For backward-compatibility reasons the old CLI arguments will still work in the foreseeable future but may be removed in future versions.
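For wrapper scripts that still pass single-dash arguments to trace-agent, the change from -config to --config can be applied mechanically. The sketch below is a hypothetical helper, not an Agent tool. Only -config is named in the note above, and the file path used here is an illustrative assumption.

```python
def to_double_dash(args):
    """Rewrite legacy single-dash long options (such as -config) as --config."""
    migrated = []
    for arg in args:
        is_legacy_flag = (
            arg.startswith("-")
            and not arg.startswith("--")  # already in the new form
            and len(arg) > 2              # leave short flags such as -h alone
            and arg[1].isalpha()          # leave negative numbers such as -10 alone
        )
        migrated.append("-" + arg if is_legacy_flag else arg)
    return migrated


# The path below is only an example value.
print(to_double_dash(["-config", "/etc/datadog-agent/datadog.yaml"]))
```

Arguments that already use two dashes pass through untouched, so the helper is safe to apply more than once.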
New Features
-
Added the kubernetes_state.pod.tolerations metric to the KSM core check
-
Grab, base64 decode, and attach trace context from message attributes passed through SNS->SQS->Lambda
-
Add kubelet healthz check (check_run.kubernetes_core.kubelet.check) to the Agent's core checks to replace the old kubernetes.kubelet.check generated from Python.
-
Tag the aws.lambda span generated by the datadog-extension with a language tag based on runtime information in dotnet and java cases
-
Extended the "agent diagnose" CLI command to allow the easy addition of new diagnostics for diverse and dispersed Agent code.
-
Add support for the otlp_config.metrics.sums.initial_cumulative_monotonic_value setting.
-
[BETA] Adds Golang language and version detection through the system probe. This beta feature can be enabled by setting system_probe_config.language_detection.enabled to true in your system-probe.yaml.
-
Add new kubelet corecheck, which will eventually replace the existing kubelet check.
-
Add custom queries to Oracle monitoring.
-
Adding new configuration setting otlp_config.logs.enabled to enable/disable logs support in the OTLP ingest endpoint.
-
Add logsagentexporter, which is used in OTLP agent to translate ingested logs and forward them to logs-agent
-
Flush in-flight requests and pending retries to disk at shutdown when disk-based buffering of metrics is enabled (for example, when forwarder_storage_max_size_in_bytes is set).
-
Added a new collector in the process agent in workloadmeta. This collector allows for collecting processes when the process_config.process_collection.enabled is false and language_detection.enabled is true. The interval at which this collector collects processes can be adjusted with the setting workloadmeta.local_process_collector.collection_interval.
-
Tag lambda cold starts and proactive initializations on the root aws.lambda span
-
APM - This change improves the acceptance and queueing strategy for trace payloads sent to the Trace Agent. These changes create a system of backpressure in the Trace Agent, causing it to reject payloads when it cannot keep up with the rate of traffic, rather than buffering and causing OOM issues.
This change has been shown to increase overall throughput in the Trace Agent while decreasing peak resource usage. Existing configurations for CPU and memory work at least as well, and often better, with these changes compared to previous Agent versions. This means users do not have to adjust their configuration to take advantage of these changes, and they do not experience performance degradation as a result of upgrading.
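The backpressure idea in the APM note above (reject new payloads when the receiver cannot keep up, rather than buffering without bound) can be illustrated with a bounded queue. This is a generic sketch of the technique, not the Trace Agent's implementation; the class and its names are hypothetical.

```python
import queue


class PayloadReceiver:
    """Accept payloads into a bounded queue and reject them once it is full."""

    def __init__(self, max_pending: int):
        self._pending = queue.Queue(maxsize=max_pending)
        self.rejected = 0

    def offer(self, payload) -> bool:
        """Return True if the payload was queued, False if it was rejected."""
        try:
            self._pending.put_nowait(payload)
            return True
        except queue.Full:
            # Rejecting here keeps memory bounded; the sender decides whether to retry.
            self.rejected += 1
            return False

    def take(self):
        """Remove and return the oldest pending payload."""
        return self._pending.get_nowait()


receiver = PayloadReceiver(max_pending=2)
results = [receiver.offer(f"trace-{i}") for i in range(3)]
print(results, receiver.rejected)
```

Because the queue has a fixed capacity, peak memory depends on max_pending rather than on the incoming traffic rate.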
Enhancement Notes
- When jmx_use_container_support is enabled you can use jmx_max_ram_percentage to set a maximum JVM heap size based off a percentage of the total container memory.
- SNMP profile detection now updates the SNMP profile for a given IP if the device at that IP changes.
- Add Process Language Detection Enabled in the output of the Agent Status command under the Process Agent section.
- Improve the agent diagnose command so that it is executed in the context of the running Agent process.
- Agents are now built with Go 1.20.7. This version of Golang fixes CVE-2023-29409.
- Added the container.memory.usage.peak metric to the container check. It shows the maximum memory usage recorded since the container started.
- Unified agent diagnose CLI command by removing all, datadog-connectivity, and metadata-availability subcommands. These separate subcommands became one of the diagnose suites. The all subcommand became unnecessary.
- APM: Improved performance and memory consumption in obfuscation, both halved on average.
- Agents are now built with Go 1.20.8.
- The processor frequency sent in metadata is now a decimal value on Darwin and Windows, as it already is on Linux. The precision of the value is increased on Darwin.
- CPU metadata which failed to be collected is no longer sent as empty values on Windows.
- Platform metadata which failed to be collected is no longer sent as empty values on Windows.
- Filesystem metadata is now collected without running the df binary on Unix.
- Adds language detection support for JRuby, which is detected as Ruby.
- Add the oracle.can_connect metric.
- Add duration to the plan payload.
- Increasing the collection interval for all the checks except for activity samples from 10s to 60s.
- Collect the number of CPUs and physical memory.
- Improve Oracle query metrics algorithm and the fetching time for execution plans.
- OTLP ingest pipeline panics no longer stop the Datadog Agent and instead only shut down this pipeline. The panic is now available in the OTLP status section.
- During the process check, collect the command name from /proc/[pid]/comm. This allows more accurate language detection of processes.
- Change how SNMP trap variables with bit enumerations are resolved to hexadecimal strings prefixed with "0x" (previously base64 encoded strings).
- The Datadog agent container image is now using Ubuntu 23.04 lunar as the base image.
- Upgraded JMXFetch to 0.47.10 <https://github.com/DataDog/jmxfetch/releases/0.47.10>. This version improves how JMXFetch communicates with the Agent, and fixes a race condition where an exception is thrown if the Agent hasn't finished initializing before JMXFetch starts to shut down.
- Added collector.worker_utilization to the telemetry. This metric represents the amount of time that a runner worker has been running checks.
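The SNMP trap note above changes bit-enumeration values from base64 strings to hexadecimal strings prefixed with "0x". The sketch below contrasts the two encodings for the same raw bytes. The sample bytes and the lowercase hex digits are assumptions, since the note does not specify letter case.

```python
import base64


def bits_as_hex(raw: bytes) -> str:
    """Render raw bit-enumeration bytes as a "0x"-prefixed hexadecimal string."""
    return "0x" + raw.hex()


def bits_as_base64(raw: bytes) -> str:
    """Render the same bytes as base64, the previous representation."""
    return base64.b64encode(raw).decode("ascii")


raw = bytes([0xC0, 0x80])  # example value with bits 0, 1 and 8 set
print(bits_as_hex(raw))
print(bits_as_base64(raw))
```

The hexadecimal form lets a reader identify the set bits directly, which the base64 form does not.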
Deprecation Notes
- The command line arguments to the Datadog Agent Trace Agent trace-agent have changed from single-dash arguments to double-dash arguments. For example, -config must now be provided as --config. For backward-compatibility reasons the old CLI arguments will still work in the foreseeable future but may be removed in future versions.
Security Notes
-
APM: In order to improve the default customer experience regarding sensitive data, the Agent now obfuscates database statements within span metadata by default. This includes MongoDB queries, ElasticSearch request bodies, and raw commands from Redis and MemCached. Previously, this setting was off by default. This update could have performance implications, or obfuscate data that is not sensitive, and can be disabled or configured through the obfuscation options within the apm_config, or with the environment variables prefixed with DD_APM_OBFUSCATION. Please read the [Data Security documentation for full details](https://docs.datadoghq.com/tracing/configure_data_security/#trace-obfuscation).
-
This update ensures the sql.query tag is always obfuscated by the Datadog Agent even if this tag was already set by a tracer or manually by a user. This is to prevent potentially sensitive data from being sent to Datadog. If you wish to have a raw, unobfuscated query within a span, then manually add a span tag of a different name (for example, sql.rawquery).
-
Fix CVE-2023-39320, CVE-2023-39318, CVE-2023-39319, and CVE-2023-39321.
-
Update OpenSSL from 3.0.9 to 3.0.11. This addresses CVEs CVE-202...
7.47.1
Prelude
Release on: 2023-09-21
Bug Fixes
- Fixes issue with NPM driver restart failing with "File Not Found" error on Windows.
- APM: The DD_APM_REPLACE_TAGS environment variable and apm_config.replace_tags setting now properly look for tags with numeric values.
- Fix the issue introduced in 7.47.0 that causes the SE_DACL_AUTO_INHERITED flag to be removed from the installation drive directory when the installer fails and rolls back.
7.47.0
Agent
Prelude
Release on: 2023-08-31
- Please refer to the 7.47.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
- Embedded Python 3 interpreter is upgraded to 3.9.17 in both Agent 6 and Agent 7. Embedded OpenSSL is upgraded to 3.0.9 in Agent 7 on Linux and macOS. On Windows, Python 3.9 in Agent 7 is still compiled with OpenSSL 1.1.1.
New Features
-
Add ability to send an Agent flare from the Datadog Application for Datadog support team troubleshooting. This feature requires enabling Remote Configuration.
-
Added workloadmeta remote process collector to collect process metadata from the Process-Agent and store it in the core agent.
-
Added new parameter workloadmeta.remote_process_collector.enabled to enable the workloadmeta remote process collector.
-
Added a new tag collector to datadog.agent.workloadmeta_remote_client_errors.
-
APM: Added support for obfuscating all Redis command arguments. For any Redis command, all arguments will be replaced by a single "?". Configurable using config variable apm_config.obfuscation.redis.remove_all_args and environment variable DD_APM_OBFUSCATION_REDIS_REMOVE_ALL_ARGS. Both accept a boolean value with default value false.
-
Added an experimental setting process_config.language_detection.enabled. This enables detecting languages for processes. This feature is WIP.
-
Added an experimental gRPC server to process-agent in order to expose process entities with their detected language. This feature is WIP and controlled through the process_config.language_detection.enabled setting.
-
The Agent now sends its configuration to Datadog by default to be displayed in the Agent Configuration section of the host detail panel. See https://docs.datadoghq.com/infrastructure/list/#agent-configuration for more information. The Agent configuration is scrubbed of any sensitive information and only contains configuration you’ve set using the configuration file or environment variables. To disable this feature set inventories_configuration_enabled to false.
-
The Windows installer can now send a report to Datadog in case of installation failure.
-
The Windows installer can now send APM telemetry.
-
Add support for Oracle Autonomous Database (Oracle Cloud Infrastructure).
-
Add shared memory (a.k.a. system global area - SGA) metric for Oracle databases: oracle.shared_memory.size
-
With this release, remote_config.enabled is set to true by default in the Agent configuration file. This causes the Agent to request configuration updates from the Datadog site. To receive configurations from Datadog, you still need to enable Remote Configuration at the organization level and enable Remote Configuration capability on your API Key from the Datadog application. If you don't want the Agent to request configurations from Datadog, set remote_config.enabled to false in the Agent configuration file.
-
DD_SERVICE_MAPPING can be used to rename Serverless inferred spans' service names.
-
Adds a new agent command stream-event-platform to stream the event platform payloads being generated by the agent. This will help diagnose issues with payload generation, and should ease validation of payload changes.
Enhancement Notes
-
Add two new initContainer metrics to the Kubernetes State Core check: kubernetes_state.initcontainer.waiting and kubernetes_state.initcontainer.restarts.
-
Add the following sysmetrics to improve DBA/SRE/SE perspective:
avg_synchronous_single_block_read_latency,
active_background_on_cpu, active_background,
branch_node_splits, consistent_read_changes,
consistent_read_gets, active_sessions_on_cpu, os_load,
database_cpu_time_ratio, db_block_changes, db_block_gets,
dbwr_checkpoints, enqueue_deadlocks, execute_without_parse,
gc_current_block_received, gc_average_cr_get_time,
gc_average_current_get_time, hard_parses,
host_cpu_utilization, leaf_nodes_splits, logical_reads,
network_traffic_volume, pga_cache_hit, parse_failures,
physical_read_bytes, physical_read_io_requests,
physical_read_total_io_requests, physical_reads_direct_lobs,
physical_read_total_bytes, physical_reads_direct,
physical_write_bytes, physical_write_io_requests,
physical_write_total_bytes, physical_write_total_io_requests,
physical_writes_direct_lobs, physical_writes_direct,
process_limit, redo_allocation_hit_ratio, redo_generated,
redo_writes, row_cache_hit_ratio, soft_parse_ratio,
total_parse_count, user_commits
-
Pause containers from the new Kubernetes community registry (registry.k8s.io/pause) are now excluded by default for containers and metrics collection.
-
[corechecks/snmp] Add forced type rate as an alternative to counter.
-
[corechecks/snmp] Add symbol level metric_type for table metrics.
-
Adds support for including the span.kind tag in APM stats aggregations.
-
Allow ad_identifiers to be used in file based logs integration configs in order to collect logs from disk.
-
Agents are now built with Go 1.20.5.
-
Agents are now built with Go 1.20.6. This version of Golang fixes CVE-2023-29406.
-
Improve error handling in External Metrics query logic by running queries with errors individually with retry and backoff, and batching only queries without errors.
-
CPU metadata is now collected without running the sysctl binary on Darwin.
-
Memory metadata is now collected without running the sysctl binary on Darwin.
-
Always send the swap size value in metadata as an integer in kilobytes.
-
Platform metadata is now collected without running the uname binary on Linux and Darwin.
-
Add new metrics for resource aggregation to the Kubernetes State Core check:
- kubernetes_state.node.<cpu|memory>_capacity.total
- kubernetes_state.node.<cpu|memory>_allocatable.total
- kubernetes_state.container.<cpu|memory>_requested.total
- kubernetes_state.container.<cpu|memory>_limit.total
-
The kube node name is now reported as a host tag kube_node.
-
[pkg/netflow] Collect flow_process_nf_errors_count metric from goflow2.
-
APM: Bind apm_config.obfuscation.* parameters to new obfuscation environment variables. In particular, bind:
- apm_config.obfuscation.elasticsearch.enabled to DD_APM_OBFUSCATION_ELASTICSEARCH_ENABLED: It accepts a boolean value with default value false.
- apm_config.obfuscation.elasticsearch.keep_values to DD_APM_OBFUSCATION_ELASTICSEARCH_KEEP_VALUES: It accepts a list of strings of the form ["id1", "id2"].
- apm_config.obfuscation.elasticsearch.obfuscate_sql_values to DD_APM_OBFUSCATION_ELASTICSEARCH_OBFUSCATE_SQL_VALUES: It accepts a list of strings of the form ["key1", "key2"].
- apm_config.obfuscation.http.remove_paths_with_digits to DD_APM_OBFUSCATION_HTTP_REMOVE_PATHS_WITH_DIGITS: It accepts a boolean value with default value false.
- apm_config.obfuscation.http.remove_query_string to DD_APM_OBFUSCATION_HTTP_REMOVE_QUERY_STRING: It accepts a boolean value with default value false.
- apm_config.obfuscation.memcached.enabled to DD_APM_OBFUSCATION_MEMCACHED_ENABLED: It accepts a boolean value with default value false.
- apm_config.obfuscation.mongodb.enabled to DD_APM_OBFUSCATION_MONGODB_ENABLED: It accepts a boolean value with default value false.
- apm_config.obfuscation.mongodb.keep_values to DD_APM_OBFUSCATION_MONGODB_KEEP_VALUES: It accepts a list of strings of the form ["id1", "id2"].
- apm_config.obfuscation.mongodb.obfuscate_sql_values to DD_APM_OBFUSCATION_MONGODB_OBFUSCATE_SQL_VALUES: It accepts a list of strings of the form ["key1", "key2"].
- apm_config.obfuscation.redis.enabled to DD_APM_OBFUSCATION_REDIS_ENABLED: It accepts a boolean value with default value false.
- apm_config.obfuscation.remove_stack_traces to DD_APM_OBFUSCATION_REMOVE_STACK_TRACES: It accepts a boolean value with default value false.
- apm_config.obfuscation.sql_exec_plan.enabled to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_ENABLED: It accepts a boolean value with default value false.
- apm_config.obfuscation.sql_exec_plan.keep_values to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_KEEP_VALUES: It accepts a list of strings of the form ["id1", "id2"].
- apm_config.obfuscation.sql_exec_plan.obfuscate_sql_values to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_OBFUSCATE_SQL_VALUES: It accepts a list of strings of the form ["key1", "key2"].
- apm_config.obfuscation.sql_exec_plan_normalize.enabled to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_NORMALIZE_ENABLED: It accepts a boolean value with default value false.
- apm_config.obfuscation.sql_exec_plan_normalize.keep_values to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_NORMALIZE_KEEP_VALUES: It accepts a list of strings of the form ["id1", "id2"].
- apm_config.obfuscation.sql_exec_plan_normalize.obfuscate_sql_values to DD_APM_OBFUSCATION_SQL_EXEC_PLAN_NORMALIZE_OBFUSCATE_SQL_VALUES: It accepts a ...
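The bindings listed above follow a visible naming pattern: drop the apm_config. prefix, upper-case the rest, replace dots with underscores, and prepend DD_APM_. The sketch below encodes that observed pattern. The function is hypothetical, and the pattern is inferred from the listed pairs rather than documented as a general rule.

```python
def obfuscation_env_var(config_key: str) -> str:
    """Derive the environment variable name for an apm_config.obfuscation.* key."""
    prefix = "apm_config."
    if not config_key.startswith(prefix + "obfuscation."):
        raise ValueError(f"not an apm_config.obfuscation key: {config_key}")
    # Drop "apm_config.", upper-case the remainder, and turn dots into underscores.
    return "DD_APM_" + config_key[len(prefix):].upper().replace(".", "_")


for key in (
    "apm_config.obfuscation.elasticsearch.enabled",
    "apm_config.obfuscation.http.remove_query_string",
    "apm_config.obfuscation.remove_stack_traces",
):
    print(key, "->", obfuscation_env_var(key))
```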
7.46.0
Agent
Prelude
Release on: 2023-07-10
- Please refer to the 7.46.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
-
Refactor the SBOM collection parameters from:
conf.d/container_lifecycle.d/conf.yaml existence (A)  # to schedule the container lifecycle long running check
conf.d/container_image.d/conf.yaml existence (B)      # to schedule the container image metadata long running check
conf.d/sbom.d/conf.yaml existence (C)                 # to schedule the SBOM long running check

Inside datadog.yaml:

container_lifecycle:
  enabled: (D)                         # Used to control the start of the container_lifecycle forwarder but has been decommissioned by #16084 (7.45.0-rc)
  dd_url:                              # \
  additional_endpoints:                # |
  use_compression:                     # |
  compression_level:                   # > generic parameters for the generic EVP pipeline
  …                                    # |
  use_v2_api:                          # /

container_image:
  enabled: (E)                         # Used to control the start of the container_image forwarder but has been decommissioned by #16084 (7.45.0-rc)
  dd_url:                              # \
  additional_endpoints:                # |
  use_compression:                     # |
  compression_level:                   # > generic parameters for the generic EVP pipeline
  …                                    # |
  use_v2_api:                          # /

sbom:
  enabled: (F)                         # control host SBOM collection and do **not** control container-related SBOM since #16084 (7.45.0-rc)
  dd_url:                              # \
  additional_endpoints:                # |
  use_compression:                     # |
  compression_level:                   # > generic parameters for the generic EVP pipeline
  …                                    # |
  use_v2_api:                          # /
  analyzers: (G)                       # trivy analyzers used for host SBOM collection
  cache_directory: (H)
  clear_cache_on_exit: (I)
  use_custom_cache: (J)
  custom_cache_max_disk_size: (K)
  custom_cache_max_cache_entries: (L)
  cache_clean_interval: (M)

container_image_collection:
  metadata:
    enabled: (N)                       # Controls the collection of the container image metadata in workload meta
  sbom:
    enabled: (O)
    use_mount: (P)
    scan_interval: (Q)
    scan_timeout: (R)
    analyzers: (S)                     # trivy analyzers used for containers SBOM collection
    check_disk_usage: (T)
    min_available_disk: (U)
to:
conf.d/{container_lifecycle,container_image,sbom}.d/conf.yaml no longer needs to be created. A default version is always shipped with the Agent Docker image with an underscore-prefixed ad_identifier that will be synthesized by the agent at runtime based on config {container_lifecycle,container_image,sbom}.enabled parameters.

Inside datadog.yaml:

container_lifecycle:
  enabled: (A)                         # Replaces the need for creating a conf.d/container_lifecycle.d/conf.yaml file
  dd_url:                              # \
  additional_endpoints:                # |
  use_compression:                     # |
  compression_level:                   # > unchanged generic parameters for the generic EVP pipeline
  …                                    # |
  use_v2_api:                          # /

container_image:
  enabled: (B)                         # Replaces the need for creating a conf.d/container_image.d/conf.yaml file
  dd_url:                              # \
  additional_endpoints:                # |
  use_compression:                     # |
  compression_level:                   # > unchanged generic parameters for the generic EVP pipeline
  …                                    # |
  use_v2_api:                          # /

sbom:
  enabled: (C)                         # Replaces the need for creating a conf.d/sbom.d/conf.yaml file
  dd_url:                              # \
  additional_endpoints:                # |
  use_compression:                     # |
  compression_level:                   # > unchanged generic parameters for the generic EVP pipeline
  …                                    # |
  use_v2_api:                          # /
  cache_directory: (H)
  clear_cache_on_exit: (I)
  cache:                               # Factorize all settings related to the custom cache
    enabled: (J)
    max_disk_size: (K)
    max_cache_entries: (L)
    clean_interval: (M)
  host:                                # for host SBOM parameters that were directly below `sbom` before.
    enabled: (F)                       # sbom.host.enabled replaces sbom.enabled
    analyzers: (G)                     # sbom.host.analyzers replaces sbom.analyzers
  container_image:                     # sbom.container_image replaces container_image_collection.sbom
    enabled: (O)
    use_mount: (P)
    scan_interval: (Q)
    scan_timeout: (R)
    analyzers: (S)                     # trivy analyzers used for containers SBOM collection
    check_disk_usage: (T)
    min_available_disk: (U)
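The letter markers (F) through (U) in the two layouts above define a key-by-key renaming, which can be expressed as a lookup over dotted key paths. The sketch below is a hypothetical migration helper working on a flat dictionary of dotted keys; it is not an Agent tool. It does not cover (A), (B) and (C), which replace the existence of conf.d files, and it passes through unchanged any key with no listed counterpart, such as container_image_collection.metadata.enabled (N).

```python
# Old dotted key -> new dotted key, following the letter markers in the note.
RENAMED_KEYS = {
    "sbom.enabled": "sbom.host.enabled",                                    # (F)
    "sbom.analyzers": "sbom.host.analyzers",                                # (G)
    "sbom.use_custom_cache": "sbom.cache.enabled",                          # (J)
    "sbom.custom_cache_max_disk_size": "sbom.cache.max_disk_size",          # (K)
    "sbom.custom_cache_max_cache_entries": "sbom.cache.max_cache_entries",  # (L)
    "sbom.cache_clean_interval": "sbom.cache.clean_interval",               # (M)
}
OLD_CONTAINER_PREFIX = "container_image_collection.sbom."  # (O) through (U)
NEW_CONTAINER_PREFIX = "sbom.container_image."


def migrate_sbom_keys(flat_config):
    """Return a copy of a flat dotted-key config with the SBOM keys renamed."""
    migrated = {}
    for key, value in flat_config.items():
        if key in RENAMED_KEYS:
            key = RENAMED_KEYS[key]
        elif key.startswith(OLD_CONTAINER_PREFIX):
            key = NEW_CONTAINER_PREFIX + key[len(OLD_CONTAINER_PREFIX):]
        migrated[key] = value
    return migrated


old = {
    "sbom.enabled": True,
    "sbom.cache_directory": "/tmp/sbom-cache",  # illustrative value
    "container_image_collection.sbom.use_mount": False,
}
print(migrate_sbom_keys(old))
```

After the rename, the new sbom.enabled key has a different meaning (it replaces the conf.d/sbom.d/conf.yaml file), so it has to be set separately rather than carried over from the old value.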
New Features
-
This change adds support for ingesting information such as database settings and schemas as database "metadata"
-
Add the capability for the security-agent compliance module to export detailed Kubernetes node configurations.
-
Add unsafe-disable-verification flag to skip TUF/in-toto verification when downloading and installing wheels with the integrations install command
-
Add container.memory.working_set metric on Linux (computed as Usage - InactiveFile) and Windows (mapped to Private Working Set)
-
Enabling dogstatsd_metrics_stats_enable will now enable dogstatsd_logging_enabled. When enabled, dogstatsd_logging_enabled generates dogstatsd log files at:
- For Windows: c:\programdata\datadog\logs\dogstatsd_info\dogstatsd-stats.log
- For Linux: /var/log/datadog/dogstatsd_info/dogstatsd-stats.log
- For MacOS: /opt/datadog-agent/logs/dogstatsd_info/dogstatsd-stats.log
These log files are also automatically attached to the flare.
-
You can adjust the dogstatsd-stats logging configuration by using:
- dogstatsd_log_file_max_size: SizeInBytes (default: dogstatsd_log_file_max_size:"10Mb")
- dogstatsd_log_file_max_rolls: Int (default: dogstatsd_log_file_max_rolls:3)
-
The network_config.enable_http_monitoring configuration has changed to service_monitoring_config.enable_http_monitoring.
-
Add Oracle execution plans
-
Oracle query metrics
-
Add support for Oracle RDS multi-tenant
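As a rough illustration of the dogstatsd-stats logging options described in the list above, a datadog.yaml fragment might look like the following. The two rotation values are simply the defaults quoted in the note; placing all three keys at the top level of datadog.yaml is an assumption based on their names.

```yaml
# datadog.yaml: a sketch, assuming these are top-level keys
dogstatsd_metrics_stats_enable: true    # per the note, this also enables dogstatsd_logging_enabled
dogstatsd_log_file_max_size: "10Mb"     # default quoted in the note
dogstatsd_log_file_max_rolls: 3         # default quoted in the note
```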
Enhancement Notes
- agent status -v now shows verbose diagnostic information. Added tailer-specific stats to the verbose status page with improved auto multi-line detection information.
- The health command from the Agent and Cluster Agent now has a configurable timeout (60 seconds by default).
- Add two new metrics to the Kubernetes State Core check: kubernetes_state.configmap.count and kubernetes_state.secret.count.
- The metadata payload containing the status of every integration run by the Agent is now sent one minute after startup and then every ten minutes after that, as before. This means that the integration status will be visible in the app one minute after the Agent starts instead of ten minutes. The payload waits for a minute so the Agent has time to run every configured integration twice and collect an accurate status.
- Adds the ability to generate an Oracle SQL trace for Agent queries
- APM: The disable_file_logging setting is now respected.
- Collect conditions for a variety of Kubernetes resources.
- Documents the max_recv_msg_size_mib option and DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_MAX_RECV_MSG_SIZE_MIB environment variable in the OTLP config. This variable is used to configure the maximum size of messages accepted by the OTLP gRPC endpoint.
- Agents are now built with Go 1.19.10
- Inject container tags in instrumentation telemetry payloads
- Extract the task_arn tag from container tags and add it as its own header.
- [pkg/netflow] Add flush_timestamp to payload.
- [pkg/netflow] Add sequence metrics.
- [netflow] Upgrade goflow2 to v1.3.3.
- Add Oracle sysmetrics, pga process memory usag...
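For the OTLP gRPC message-size option mentioned in the list above, a configuration sketch could look like this. The nesting under otlp_config is inferred from the DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_MAX_RECV_MSG_SIZE_MIB variable name and should be treated as an assumption; the value 10 is purely hypothetical.

```yaml
# datadog.yaml: nesting inferred from the environment variable name (assumption)
otlp_config:
  receiver:
    protocols:
      grpc:
        max_recv_msg_size_mib: 10   # hypothetical value, in MiB
```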
7.45.1
Prelude
Release on: 2023-06-27
Security Notes
- Bump ncurses to 6.4 in the Agent embedded environment. Fixes CVE-2023-29491.
- Updated the version of OpenSSL used by Python to 1.1.1u; addressed CVE-2023-2650, CVE-2023-0466, CVE-2023-0465 and CVE-2023-0464.
7.45.0
Agent
Prelude
Release on: 2023-06-05
- Please refer to the 7.45.0 tag on integrations-core for the list of changes on the Core Checks
New Features
- Add Topology data collection with CDP.
- APM: Addition of configuration to add peer.service to trace stats exported by the Agent.
- APM: Addition of configuration to compute trace stats on spans based on their span.kind value.
- APM: Added a new endpoint in the trace-agent API /symdb/v1/input that acts as a reverse proxy forwarding requests to Datadog. The feature using this is currently in development.
- Add support for confluent-kafka.
- Add support for XCCDF benchmarks in CSPM. A new configuration option, 'compliance_config.xccdf.enabled', disabled by default, has been added for enabling XCCDF benchmarks.
- Add arguments to module load events
- Oracle DBM monitoring with activity sampling. The collected samples form the foundation for database load profiling. With Datadog GUI, samples can be aggregated and filtered to identify bottlenecks.
- Add reporting of container.{cpu|memory|io}.partial_stall metrics based on PSI Some values when host is running with cgroupv2 enabled (Linux only). This metric provides the wall time (in nanoseconds) during which at least one task in the container has been stalled on the given resource.
- Adding a new option secret_backend_remove_trailing_line_break to remove trailing line breaks from secrets returned by secret_backend_command. This makes it easier to use secret management tools that automatically add a line break when exporting secrets through files.
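Two of the options named in the list above can be sketched in datadog.yaml as follows. The key names are taken from the notes; the helper path is hypothetical, and both `true` values are illustrative rather than recommended.

```yaml
# datadog.yaml: a sketch under the assumptions stated above
compliance_config:
  xccdf:
    enabled: true                         # disabled by default, per the note
secret_backend_command: /usr/local/bin/fetch-secrets   # hypothetical helper path
secret_backend_remove_trailing_line_break: true        # strips trailing line breaks from returned secrets
```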
Enhancement Notes
-
Cluster Agent: User config, cluster Agent deployment and node Agent daemonset manifests are now added to the flare archive, when the Cluster Agent is deployed with Helm (version 3.23.0+).
-
Datadog Agent running as a systemd service can optionally read environment variables from a text file /etc/datadog-agent/environment containing newline-separated variable assignments. See https://www.freedesktop.org/software/systemd/man/systemd.exec.html#Environment
-
Add ability to filter Kubernetes containers based on autodiscovery annotation. Containers in a pod can now be omitted by setting ad.datadoghq.com/<container_name>.exclude as an annotation on the pod. Logs can now be omitted by setting ad.datadoghq.com/<container_name>.logs_exclude as an annotation on the pod.
-
Added support for custom resource definitions metrics: crd.count and crd.condition.
-
- Remove BadgerDB cache for Trivy.
- Add new custom LRU cache for Trivy backed by BoltDB and parametrized by:
- Periodically delete unused entries from the custom cache.
- Add telemetry metrics to monitor the cache:
  - sbom.cached_keys: Number of cache keys stored in memory.
  - sbom.cache_disk_size: Total size, in bytes, of the database as reported by BoltDB.
  - sbom.cached_objects_size: Total size, in bytes, of cached SBOM objects on disk. Limited by sbom.custom_cache_max_disk_size.
  - sbom.cache_hits_total: Total number of cache hits.
  - sbom.cache_misses_total: Total number of cache misses.
  - sbom.cache_evicts_total: Total number of cache evicts.
-
Added DD_ENV to the SBOMPayload in the SBOM check.
-
Added kubernetes_state.hpa.status_target_metric and kubernetes_state.deployment.replicas_ready metrics part of the kubernetes_state_core check.
-
Add support for emitting resources on metrics from tags in the format dd.internal.resource:type,name.
-
APM: Dynamic instrumentation logs and snapshots can now be shipped to multiple Datadog logs intakes.
-
Adds support for OpenTelemetry span links to the Trace Agent OTLP endpoint when converting OTLP spans (span links are added as metadata to the converted span).
-
Agents are now built with Go 1.19.9.
-
Make Podman DB path configurable for rootless environment. Now we can set $HOME/.local/share/containers/storage/libpod/bolt_state.db.
-
Add ownership information for containers to the container-lifecycle check.
-
Add Pod exit timestamp to container-lifecycle check.
-
The Agent now uses the ec2_metadata_timeout value when fetching EC2 instance tags with AWS SDK. The Agent fetches instance tags when collect_ec2_tags is set to true.
-
Upgraded JMXFetch to 0.47.8 which has improvements aimed to help large metric collections drop fewer payloads.
-
Kubernetes State Metrics Core: Adds collection of Kubernetes APIServices metrics
-
Add support for URLs with the http|https scheme in the dd_url or logs_dd_url parameters when configuring endpoints. Also automatically detects SSL needs, based on the scheme when it is present.
-
[pkg/netflow] Add NetFlow Exporter to NDM Metadata.
-
SUSE RPMs are now built with RPM 4.14.3 and have SHA256 digest headers.
-
observability_pipelines_worker can now be used in place of the vector config options.
-
Add an option and an annotation to skip kube_service tags on Kubernetes pods.
When the selector of a service matches a pod and that pod is ready, its metrics are decorated with a kube_service tag.
When the readiness of a pod flips, so does the kube_service tag. This could create visual artifacts (spikes when the tag flips) on dashboards where the queries are missing .fill(null).
If many services target a pod, the total number of tags attached to its metrics might exceed a limit that causes the whole metric to be discarded.
In order to mitigate these two issues, it's now possible to set the kubernetes_ad_tags_disabled parameter to kube_config to globally remove the kube_service tags on all pods:
  kubernetes_ad_tags_disabled:
    - kube_service
It's also possible to add a tags.datadoghq.com/disable: kube_service annotation on only the pods for which we want to remove the kube_service tag.
Note that kube_service is the only tag that can be removed via this parameter and this annotation.
-
Support OTel semconv 1.17.0 in OTLP ingest endpoint.
-
When otlp_config.metrics.histograms.send_aggregation_metrics is set to true, the OTLP ingest pipeline will now send min and max metrics for delta OTLP Histograms and OTLP Exponential Histograms when available, in addition to count and sum metrics.
The deprecated option otlp_config.metrics.histograms.send_count_sum_metrics now also sends min and max metrics when available.
-
OTLP: Use minimum and maximum values from cumulative OTLP Histograms. Values are used only when we can assume they are from the last time window or otherwise to clamp estimates.
-
The OTLP ingest endpoint now supports the same settings and protocol as the OpenTelemetry Collector OTLP receiver v0.75.0.
-
Secrets with ENC[] notation are now supported for proxy settings from environment variables. For more information, refer to the [Secrets Management](https://docs.datadoghq.com/agent/guide/secrets-management/) and [Agent Proxy Configuration](https://docs.datadoghq.com/agent/proxy/) documentation.
-
[corechecks/snmp] Adds ability to send constant metrics in SNMP profiles.
-
[corechecks/snmp] Adds ability to map metric tag value to string in SNMP profiles.
-
[corechecks/snmp] Add support to format bytes into ip_address
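The pod-level annotations described in the notes above can be combined on a single pod, roughly as in the manifest below. The annotation keys are taken from the notes; the pod name, container names, images, and the "true" values on the exclude annotations are assumptions made for illustration.

```yaml
# Kubernetes Pod manifest: a sketch; names, images and "true" values are assumptions
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  annotations:
    ad.datadoghq.com/sidecar.exclude: "true"     # omit the "sidecar" container entirely
    ad.datadoghq.com/app.logs_exclude: "true"    # omit only the logs of the "app" container
    tags.datadoghq.com/disable: kube_service     # drop the kube_service tag for this pod
spec:
  containers:
    - name: app
      image: example/app:1.0
    - name: sidecar
      image: example/sidecar:1.0
```

The annotation route scopes the change to individual pods, whereas kubernetes_ad_tags_disabled removes the kube_service tag globally.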
Deprecation Notes
- APM OTLP: Field UsePreviewHostnameLogic is deprecated, and usage of this field has been removed. This is done in preparation to graduate the exporter.datadog.hostname.preview feature gate to stable.
- The Windows Installer NPM feature option, used in ADDLOCAL=NPM and REMOVE=NPM, no longer controls the install state of NPM components. The NPM components are now always installed, but will only run when enabled in the agent configuration. The Windows Installer NPM feature option still exists for backwards compatibility purposes, but has no effect.
- Deprecate otlp_config.metrics.histograms.send_count_sum_metrics in favor of otlp_config.metrics.histograms.send_aggregation_metrics.
- Removed the --info flag in the Process Agent, which has been replaced by the status command since 7.35.
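For the OTLP histogram deprecation listed above, a migration in datadog.yaml might look like the following sketch. Both key names come from the notes; the `true` value is illustrative.

```yaml
# datadog.yaml: a sketch of moving off the deprecated option
otlp_config:
  metrics:
    histograms:
      # send_count_sum_metrics: true    # deprecated option, shown commented out
      send_aggregation_metrics: true    # replacement option
```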
Security Notes
- Handle the return value of Close() for writable files in pkg/forwarder
- Fixes CWE-703. Handle the return value of Close() for writable files and forces writes to disks in system-probe
Bug Fixes
- APM: Setting apm_config.receiver_port: 0 now allows enabling UNIX Socket or Windows Pipes listeners.
- APM: OTLP: Ensure that container tags are set globally on the payload so that they can be picked up as primary tags in the app.
- APM: Fixes a bug with how stats are calculated when using single span sampling along with other sampling configurations.
- APM: Fixed the issue where not all trace stats are flushed on trace-agent shutdown.
- Fix an issue on the pod collection where the cluster name would not be consistently RFC1123 compliant.
- Make the agent able to detect it is runn...
7.44.1
Prelude
Release on: 2023-05-16
Enhancement Notes
- Agents are now built with Go 1.19.8.
- Added optional config flag process_config.cache_lookupid to cache calls to user.LookupId in the process Agent. Use to minimize the number of calls to user.LookupId and avoid potential leak.
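A minimal sketch of enabling the lookup cache mentioned above, assuming the flag is a boolean nested under process_config as its dotted name suggests:

```yaml
# datadog.yaml: assumes a boolean flag; the nesting follows the dotted name in the note
process_config:
  cache_lookupid: true   # cache results of user.LookupId in the process Agent
```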
Bug Fixes
- Fixes the inclusion of the security-agent.yaml file in the flare.
7.44.0
Agent
Prelude
Release on: 2023-04-27
- Please refer to the 7.44.0 tag on integrations-core for the list of changes on the Core Checks
New Features
- Added HTTP/2 parsing logic to Universal Service Monitoring.
- Adding Universal Service Monitoring to the Agent status check. Now Datadog has visibility into the status of Universal Service Monitoring. Startup failures appear in the status check.
- In the agent.log, a DEBUG, WARN, and ERROR log have been added to report how many file handles the core Agent process has open. The DEBUG log reports the info, the WARN log appears when the core Agent is over 90% of the OS file limit, and the ERROR log appears when the core Agent has reached 100% of the OS file limit. In the Agent status command, fields CoreAgentProcessOpenFiles and OSFileLimit have been added to the Logs Agent section. This feature is currently for Linux only.
- APM: Collect trace agent startup errors and successes using instrumentation-telemetry "apm-onboarding-event" messages.
- APM OTLP: Introduce OTLP Ingest probabilistic sampling, configurable via otlp_config.traces.probabilistic_sampler.sampling_percentage.
- The Datadog Admission Controller can inject the .NET APM library into Kubernetes containers for auto-instrumentation.
- Enable CWS Security Profiles by default.
- Support the config additional_endpoints for Data Streams monitoring.
- Added support for collecting container image metadata when using Docker.
- Added Kafka parsing logic to system-probe
- Allow writing SECL rules against container creation time through the new container.created_at field, similar to the existing process.container_at field. The container creation time is also reported in the sent events.
- [experimental] CWS generates an SBOM for any running workload on the machine.
- [experimental] CWS events are enriched with SBOM data.
- [experimental] CWS activity dumps are enriched with SBOM data.
- Enable OTLP endpoint for receiving traces in the Datadog Lambda Extension.
- On Windows, when service inference is enabled, process_context tags can now be populated by the service name in the SCM. This feature can be controlled by either the service_monitoring_config.process_service_inference.enabled config setting in the user's datadog.yaml config file, or it can be configured via the DD_SYSTEM_PROBE_PROCESS_SERVICE_INFERENCE_USE_WINDOWS_SERVICE_NAME environment variable. This setting is enabled by default.
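The OTLP probabilistic sampling option from the list above could be set along these lines. The key path is the one given in the note; the value 50 is hypothetical and is assumed here to be a percentage of traces to keep.

```yaml
# datadog.yaml: key path from the note; the value is a hypothetical percentage
otlp_config:
  traces:
    probabilistic_sampler:
      sampling_percentage: 50
```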
Enhancement Notes
-
Added kubernetes_state.hpa.status_target_metric and kubernetes_state.deployment.replicas_ready metrics part of the kubernetes_state_core check.
-
The status page now includes a Status render errors section to highlight errors that occurred while rendering it.
-
APM:
  - Run the /debug/* endpoints in a separate server which uses port 5012 by default and only listens on 127.0.0.1. The port is configurable through apm_config.debug.port and DD_APM_DEBUG_PORT, set it to 0 to disable the server.
  - Scrub the content served by the expvar endpoint.
-
APM: apm_config.features is now configurable from the Agent configuration file. It was previously only configurable via DD_APM_FEATURES.
-
Agents are now built with Go 1.19.7.
-
The OTLP ingest endpoint now supports the same settings and protocol as the OpenTelemetry Collector OTLP receiver v0.71.0.
-
Collect Kubernetes Pod conditions.
-
Added the "availability-zone" tag to the Fargate integration. This matches the tag emitted by other AWS infrastructure integrations.
-
Allow to report all gathered data in case of partial failure of container metrics retrieval.
-
Upgraded JMXFetch to 0.47.8 which has improvements aimed to help large metric collections drop fewer payloads.
-
JMXFetch upgraded to 0.47.5 which now supports pulling metrics from javax.management.openmbean.TabularDataSupport. Also contains a fix for pulling metrics from javax.management.openmbean.TabularDataSupport when no tags are specified.
-
Updated chunking util and use cases to use generics. No behavior change.
-
[corechecks/snmp] Add interface_configs to override interface speed.
-
No longer increments TCP retransmit count when the retransmit fails.
-
The OTLP ingestion endpoint now supports the same settings and protocols as the OpenTelemetry Collector OTLP receiver v0.70.0.
-
Changes the retry mechanism of starting workloadmeta collectors so that instead of retrying every 30 seconds, it retries following an exponential backoff with initial interval of 1s and max of 30s. In general, this should help start sooner the collectors that failed on the first try.
-
Added the "pull_duration" metric in the workloadmeta telemetry. It measures the time that it takes to pull from the collectors.
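For the separate trace-agent debug server described in the APM note above, a configuration sketch might be the following. The key and the default port are from the note; the choice to show the default rather than another port is only for illustration.

```yaml
# datadog.yaml: a sketch based on the APM note above
apm_config:
  debug:
    port: 5012   # default per the note; set to 0 to disable the debug server
```

The same setting is exposed through the DD_APM_DEBUG_PORT environment variable, per the note.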
Deprecation Notes
- Marked the "availability_zone" tag as deprecated for the Fargate integration, in favor of "availability-zone".
- Configuration enable_sketch_stream_payload_serialization is now deprecated.
Security Notes
- The Agent now checks containerd containers Spec size before parsing it. Any Spec exceeding 2MB will not be parsed and a warning will be emitted. This impacts the container_env_as_tags feature and %%hostname%% variable resolution for environments based on containerd outside of Kubernetes.
Bug Fixes
- APM: Fix issue where dogstatsd proxy would not work when bind address was set to localhost on MacOS. APM: Fix issue where setting bind_host to "::1" would break runtime metrics for the trace-agent.
- APM: Trace Agent not printing critical init errors.
- Fixes a bug where ignored container files (that were not tailed) were incorrectly counted against the total open files.
- Fixes the configuration parsing of the "container_lifecycle" check. Custom config values were not being applied.
- Corrects dogstatsd metric message validation to support all current (and some future) dogstatsd features
- Avoid panic in kubernetes_state_core check with specific Ingress objects configuration.
- Fixes a divide-by-zero panic when sketch serialization fails on the last metric of a given batch
- Fix issue introduced in 7.43 that prevents the Datadog Agent Manager application from executing from the checkbox at the end of the Datadog Agent installation when the installer is run by a non-elevated administrator user.
- Fixes a problem with USM and IIS on Windows Server 2022 due to a change in the way Microsoft reports IIS connections.
- Fixes the labelsAsTags parameter of the kube-state metrics core check. Tags were not properly formatted when they came from a label on one resource type (for example, namespace) and turned into a tag on another resource type (for example, pod).
- The OTLP ingest endpoint does not report the first cumulative monotonic sum value if the start timestamp of the timeseries matches its timestamp.
- Prevent disallowlisting on empty command line for processes in the Process Agent when encountering a failure to parse, use exe value instead.
- Make SNMP Listener support all authProtocol.
- Fix an issue where agent status would show incorrect system-probe status for 15 seconds as the system-probe started up.
- Fix partial loss of NAT info in system-probe for pre-existing connections.
- Replace ; with & in the URL to open GUI to follow golang.org/issue/25192.
- Workloadmeta now avoids concurrent pulls from the same collector. This bug could lead to incorrect or missing data when the collectors were too slow pulling data.
- Fixes a bug that prevents the containerd workloadmeta collector from starting sometimes when container_image_collection.metadata.enabled is set to true.
- Fixed a bug in the SBOM collection feature. In certain cases, some SBOMs were not collected.
Other Notes
- The logs_config.cca_in_ad setting has been removed.
Datadog Cluster Agent
New Features
- Add conditions to Vertical Pod Autoscalers
- Experimental: Support Ruby library injection through the Admission Controller on Kubernetes.
Enhancement Notes
- Add new metrics for the KSM Core check for extended resources:
  - Pod requests and limits of the network bandwidth extended resource: kubernetes_state.container.network_bandwidth_limit, kubernetes_state.container.network_bandwidth_requested
  - The capacity and allocatable network bandwidth extended resource of a node: kubernetes_state.node.network_bandwidth_allocatable, kubernetes_state.node.network_bandwidth_capacity
- Admission Controller: Add telemetry around auto-instrumentation via remote config.
- The UDS socket volume when using the Admission Controller is now mounted in readOnly mode.