Releases: DataDog/datadog-agent
Datadog Agent 7.24.0
7.24.0
Prelude
Released on: 2020-12-03
- Please refer to the 7.24.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
- tcp_queue_length check: the previous metrics reported by this check (tcp_queue.rqueue.size, tcp_queue.rqueue.min, tcp_queue.rqueue.max, tcp_queue.wqueue.size, tcp_queue.wqueue.min, tcp_queue.wqueue.max) were generating too much data because there was one time series generated per TCP connection.
  Those metrics have been replaced by tcp_queue.read_buffer_max_usage_pct and tcp_queue.write_buffer_max_usage_pct, which aggregate all the connections of a container.
  These metrics report the maximum usage in percent (amount of data divided by the queue capacity) of the busiest buffer.
  Additionally, the only_count_nb_context option from the tcp_queue_length check configuration has been removed and will be ignored from now on.
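As an illustration of the aggregation described in the tcp_queue_length note, here is a minimal Python sketch of "maximum usage in percent of the busiest buffer" for one container. The Connection record and the max_usage_pct function are hypothetical names used only for this example; the actual check collects its data through eBPF and may differ in detail.

```python
from collections import namedtuple

# Hypothetical record for one TCP connection of a container.
Connection = namedtuple("Connection", ["queued_bytes", "capacity_bytes"])


def max_usage_pct(connections):
    """Return the usage, in percent, of the busiest buffer among the connections."""
    usages = [
        100.0 * c.queued_bytes / c.capacity_bytes
        for c in connections
        if c.capacity_bytes > 0  # skip buffers with no known capacity
    ]
    return max(usages, default=0.0)


# Three connections of one container yield a single value instead of three series.
read_buffers = [Connection(512, 4096), Connection(3072, 4096), Connection(0, 8192)]
print(max_usage_pct(read_buffers))  # 75.0
```

Reporting one value per container rather than one per connection is what keeps the number of time series bounded.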
New Features
- Added new configuration flag, system_probe_config.enable_conntrack_all_namespaces, false by default. When set to true, this will allow system probe to monitor conntrack entries (for NAT info) in all namespaces that are peers of the root namespace.
- Added JMX version and Java runtime version to agent status page
- kubernetes_pod_annotations_as_tags (DD_KUBERNETES_POD_ANNOTATIONS_AS_TAGS) now supports regex wildcards: '{"*":"<PREFIX>_%%annotation%%"}' can be used as value to collect all pod annotations as tags.
  kubernetes_node_labels_as_tags (DD_KUBERNETES_NODE_LABELS_AS_TAGS) now supports regex wildcards: '{"*":"<PREFIX>_%%label%%"}' can be used as value to collect all node labels as tags.
  Note: kubernetes_pod_labels_as_tags (DD_KUBERNETES_POD_LABELS_AS_TAGS) supports this already.
- Listening for conntrack updates from all network namespaces (system_probe_config.enable_conntrack_all_namespaces flag) is now turned on by default
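To make the wildcard template for annotations and labels more concrete, the following Python sketch expands a '{"*":"<PREFIX>_%%annotation%%"}' style mapping into tags. It is an illustration only: the annotations_to_tags function is hypothetical, it handles just the catch-all "*" key rather than general patterns, and the rule that an exact key wins over "*" is an assumption, not documented Agent behavior.

```python
def annotations_to_tags(annotations, mapping):
    """Expand a {"*": "<PREFIX>_%%annotation%%"} style mapping into tags."""
    tags = []
    for name, value in sorted(annotations.items()):
        # Assumption: an exact entry takes precedence over the "*" entry.
        template = mapping.get(name, mapping.get("*"))
        if template is None:
            continue  # no rule applies to this annotation
        tag_name = template.replace("%%annotation%%", name)
        tags.append(f"{tag_name}:{value}")
    return tags


pod_annotations = {"team": "payments", "owner": "alice"}
print(annotations_to_tags(pod_annotations, {"*": "k8s_%%annotation%%"}))
# ['k8s_owner:alice', 'k8s_team:payments']
```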
Enhancement Notes
- Expand pause container image filter
- Adds misconfig check for hidepid=2 option on proc mount.
- It's possible to ignore auto_conf.yaml configuration files using ignore_autoconf or DD_IGNORE_AUTOCONF.
  Example: DD_IGNORE_AUTOCONF="redisdb kubernetes_state"
- APM: The trace-agent now automatically sets the GOMAXPROCS value in Linux containers to match allocated CPU quota, as opposed to matching the entire node's quota.
- APM: Lowered CPU usage when using analytics.
- APM: Move UTF-8 validation from the span normalizer to the trace decoder, which reduces the number of times each distinct string will be validated to once, which is beneficial when the v0.5 trace format is used.
- Add the config forwarder_retry_queue_payloads_max_size which defines the maximum size in bytes of all the payloads in the forwarder's retry queue.
- When enabled, JMXFetch now logs to its own log file. Defaults to jmxfetch.log in the default agent log directory, and can be configured with jmx_log_file.
- Added UDS support for JMXFetch. JMXFetch upgraded to 0.40.3.
- dogstatsd_mapper_profiles may now be defined as an environment variable DD_DOGSTATSD_MAPPER_PROFILES formatted as JSON
- Add orchestrator explorer related section into DCA Status
- Added byte count per log source and display it on the status page.
- APM: refactored the SQL obfuscator to be significantly more efficient.
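The DD_IGNORE_AUTOCONF example in the list above suggests a space-separated list of integration names. The Python sketch below shows one way such a value could be interpreted; the function names are hypothetical and the parsing rule is an assumption drawn from that single example, not a description of the Agent's own code.

```python
import os


def ignored_integrations(environ=None):
    """Return integration names listed in DD_IGNORE_AUTOCONF (space-separated)."""
    environ = os.environ if environ is None else environ
    return set(environ.get("DD_IGNORE_AUTOCONF", "").split())


def should_load(integration, filename, ignored):
    """Skip only the auto_conf.yaml files of ignored integrations."""
    return not (filename == "auto_conf.yaml" and integration in ignored)


ignored = ignored_integrations({"DD_IGNORE_AUTOCONF": "redisdb kubernetes_state"})
print(should_load("redisdb", "auto_conf.yaml", ignored))  # False
print(should_load("redisdb", "conf.yaml", ignored))       # True
print(should_load("etcd", "auto_conf.yaml", ignored))     # True
```

In this sketch only auto_conf.yaml files are affected, so an explicit conf.yaml for the same integration would still be loaded.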
Deprecation Notes
- IO check: device_blacklist_re has been deprecated in favor of device_exclude_re.
- The config options tracemalloc_whitelist and tracemalloc_blacklist have been deprecated in favor of tracemalloc_include and tracemalloc_exclude.
Bug Fixes
- APM: Fix a bug where non-float64 numeric values in apm_config.analyzed_spans would disable this functionality.
- Disable stack protector on system-probe to make it buildable in environments where stack protector is enabled by default. Some Linux distributions like Alpine Linux enable stack protector by default, which is not available on eBPF.
- Fix a panic in containerd if retrieved ociSpec is nil
- Fix random panic in Kubelet searchPodForContainerID due to concurrent modification of pod.Status.AllContainers
- Add retries to Kubernetes host tags retrievals, to minimize the chance of missing/changing host tags every 30 minutes
- Fix rtloader build on strict POSIX environments, e.g. musl libc on Alpine Linux.
- Allows system_probe to be enabled without enabling network performance monitoring.
  Set network_config.enabled=false in your system-probe.yaml when running the system-probe without networks enabled.
- Fixes truncated output for status of compliance checks in Security Agent.
- Under some circumstances, the Agent would delete all tags for a workload if they were collected from different sources, such as the kubelet and docker, but deleted from only one of them. Now, the agent keeps track of tags per collector correctly.
Other Notes
- The utilities provided by the sysstat package have been removed: the iostat, mpstat, pidstat, sar, sadf, cifsiostat and nfsiostat-sysstat binaries have been removed from the packaged Agent. This has no effect on the behavior of the Agent and official integrations, but your custom checks may be affected if they rely on these embedded binaries.
- Activate security-agent service by default in the Linux packages of the Agent (RPM/DEB). The security-agent won't be started if the file /etc/datadog-agent/security-agent.yaml does not exist.
Datadog Agent 6.24.0
6.24.0 ships the same features as 7.24.0 except for the Python versions it supports.
Please refer to the 7.24.0 changelog.
Datadog Cluster Agent 1.9.1
Release Notes
1.9.1
Prelude
Released on: 2020-10-21
Pinned to datadog-agent v7.23.1: CHANGELOG
Bug Fixes
- Support of secrets in JSON environment variables, added in 7.23.0, is reverted due to a side effect (e.g. a string value of "-" would be loaded as a list). This feature will be fixed and added again in a future release.
7.23.1
Release Notes
7.23.1
Prelude
Released on: 2020-10-21
Bug Fixes
- The ec2_prefer_imdsv2 parameter was ignored when fetching EC2 tags from the metadata endpoint. This fixes a misleading warning log that was logged even when ec2_prefer_imdsv2 was left disabled in the Agent configuration.
- Support of secrets in JSON environment variables, added in 7.23.0, is reverted due to a side effect (e.g. a string value of "-" would be loaded as a list). This feature will be fixed and added again in a future release.
- The Windows installer can now install on domains where the domain name is different from the Netbios name.
6.23.1
Datadog Cluster Agent 1.9.0
Release Notes
1.9.0
Prelude
Pinned to datadog-agent v7.23.0: CHANGELOG.
New Features
- Collect the node and cluster resource in Kubernetes for the Orchestrator Explorer (#6297).
- Add resolve option to the endpoint checks (#5918).
- Add health command (#6144).
- Add options to configure the External Metrics Server (#6406).
Enhancement Notes
- Fill DatadogMetric AutoscalerReferences field to ease usage/investigation of DatadogMetrics (#6367).
- Only run compliance checks on the Cluster Agent leader (#6311).
- Add orchestrator_explorer configuration to enable the cluster-id ConfigMap creation and Orchestrator Explorer instantiation (#6189).
Bug Fixes
7.23.0
Release Notes
7.23.0
Prelude
Released on: 2020-10-06
- Please refer to the 7.23.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
- Network monitoring: enable DNS stats collection by default.
New Features
- APM: Decoding errors reported by the datadog.trace-agent.receiver.error and datadog.trace_agent.normalizer.traces_dropped contain more detailed reason tags in case of EOFs and timeouts.
- Running the agent flare with the -p flag now includes profiles for the trace-agent.
- APM: An SQL query obfuscation cache was added under the feature flag DD_APM_FEATURES=sql_cache. In most cases where SQL queries are repeated or prepared, this can significantly reduce CPU work.
- Secret handles are now supported inside JSON values set through environment variables. For example, setting a secret in a list:
  DD_FLARE_STRIPPED_KEYS='["ENC[auth_token_name]"]' datadog-agent run
- Add basic support for UTF16 (BE and LE) encoding. It should be manually enabled in a log configuration using encoding: utf-16-be or encoding: utf-16-le; other values are unsupported and ignored by the agent.
Enhancement Notes
- Add new configuration parameter to allow 'GroupExec' permission on the secret-backend command. Set the new parameter 'secret_backend_command_allow_group_exec_perm' to 'true' to activate it.
- Add a map from DNS rcode to count of replies received with that rcode
- Enforces a size limit of 64MB to uncompressed sketch payloads (distribution metrics). Payloads above this size will be split into smaller payloads before being sent.
- APM: Span normalization speed has been increased by 15%.
- Improve the kubelet check error reporting in the output of agent status in the case where the agent cannot properly connect to the kubelet.
- Add space_id, space_name, org_id and org_name as tags to both autodiscovered containers as well as checks found through autodiscovery on Cloud Foundry/Tanzu.
- Improves compliance check status view in the security-agent status command.
- Include compliance benchmarks from github.com/DataDog/security-agent-policies in the Agent packages and the Cluster Agent image.
- Windows Docker image is now based on Windows Server Nano instead of Windows Server Core.
- Allow sending the GCP project ID under the project_id: host tag key, in addition to the project: host tag key, with the gce_send_project_id_tag config setting.
- Add kubeconfig to GCE excluded host tags (used on GKE)
- The cluster name can now be longer than 40 characters, however the combined length of the host name and cluster name must not exceed 254 characters.
- When requesting EC2 metadata, you can use IMDSv2 by turning on a new configuration option (ec2_prefer_imdsv2).
- When tailing logs from containers in a Kubernetes environment, long lines (>16kB usually) that got split by the container runtime (docker & containerd at least) are now reassembled, provided they do not exceed the upper message length limit (256kB).
- Move the cluster-id ConfigMap creation, and Orchestrator Explorer controller instantiation behind the orchestrator_explorer config flag to avoid it failing and generating error logs.
- Add caching for sending kubernetes resources for live containers
- Agent log format improvement: logs can have kv-pairs as context to make it easier to get all logs for a given context. Sample:
  2020-09-17 12:17:17 UTC | CORE | INFO | (pkg/collector/runner/runner.go:327 in work) | check:io | Done running check
- The CRI check now supports container exclusion based on container name, image and kubernetes namespace.
- Added a network_config config to the system-probe that allows the network module to be selectively enabled/disabled. Also added a corresponding DD_SYSTEM_PROBE_NETWORK_ENABLED env var. The network module will only be disabled if the network_config exists and has enabled set to false, or if the env var is set to false. To maintain compatibility with previous configs, the network module will be enabled in all other cases.
- Log a warning when a log file is rotated before the Agent has finished tailing it.
- The NTP check now uses the cloud provider's recommended NTP servers by default, if the Agent detects that it's running on said cloud provider.
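The size limit on uncompressed sketch payloads mentioned in the list above amounts to batching items under a byte budget. The Python sketch below shows a generic greedy version of that idea; the split_payloads function is hypothetical, and raising an error for a single oversized item is an assumption, since the notes do not say how the Agent treats that case.

```python
def split_payloads(items, max_bytes, size_of=len):
    """Group items into batches whose total size stays within max_bytes."""
    batches, current, current_size = [], [], 0
    for item in items:
        item_size = size_of(item)
        if item_size > max_bytes:
            # Assumption: a single item above the limit cannot be sent at all.
            raise ValueError("single item exceeds the payload size limit")
        if current and current_size + item_size > max_bytes:
            batches.append(current)  # close the batch before it overflows
            current, current_size = [], 0
        current.append(item)
        current_size += item_size
    if current:
        batches.append(current)
    return batches


print(split_payloads([b"aaaa", b"bbb", b"cc", b"ddddd"], max_bytes=7))
# [[b'aaaa', b'bbb'], [b'cc', b'ddddd']]
```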
Deprecation Notes
- process_config.orchestrator_additional_endpoints and process_config.orchestrator_dd_url are deprecated in favor of orchestrator_explorer.orchestrator_additional_endpoints and orchestrator_explorer.orchestrator_dd_url.
Bug Fixes
- Fixed an issue where the Datadog Agent would improperly filter all remaining traces in a payload after a trace matching an ignore_resources filter was matched.
- Allow agent integration install to work even if the datadog agent configuration file doesn't exist. This is typically the case when this command is run from a Dockerfile in order to build a custom image from the datadog official one.
- Implement variable interpolation in the tagger when inferring the standard tags from the DD_ENV, DD_SERVICE and DD_VERSION environment variables
- Fix a bug where checks and logs were not picked up for containers targeted by container-image-based autodiscovery, or were picked up for containers that were not targeted by it. This happened when several image names were pointing to the same image digest.
- APM: Allow digits in SQL literal identifiers (e.g. 1sad123jk)
- Fixes an issue with not always reporting ECS Fargate task_arn tag due to a race condition in the tag collector.
- The SUSE SysVInit service now correctly starts the Agent as the dd-agent user instead of root.
- APM: Allow double-colon operator in SQL obfuscator.
- UDP packets can be sent in two ways. In the "connected" way, a connect call is made first to assign the remote/destination address, and then packets get sent with the send function or the sendto function with the destination address set to NULL. In the "unconnected" way, packets get sent using the sendto function with a non-NULL destination address. This fix addresses a bug where network stats were not being generated for UDP packets sent using the "unconnected" way.
- Fix the Windows systray not appearing sometimes (bug introduced with 6.20.0).
- The Chocolatey package now uses a fixed URL to the MSI installer.
- Fix logs tagging inconsistency for restarted containers.
- On macOS, in Agent v6, the unversioned python binaries in /opt/datadog-agent/embedded/bin (example: python, pip) now correctly point to the Python 2 binaries.
- Fix truncated cgroup name on copy with bpf_probe_read_str in OOM kill and TCP queue length checks.
- Use double-precision floats for metric values submitted from Python checks.
- On Windows, the ddtray executable now has a digital signature
- Updates the logs package to get the short image name from Kubernetes ContainerSpec, rather than ContainerStatus. This works around a known issue where the image name in the ContainerStatus may be incorrect.
- On Windows, the Agent now responds to control signals from the OS and shuts down gracefully. Coincidentally, a Windows Agent Container will now gracefully stop when receiving the stop command.
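The two ways of sending UDP packets described in the bug fix above can be reproduced with the standard Python socket module. This is only an illustration of the "connected" and "unconnected" patterns on the loopback interface; it does not show how system-probe observes them.

```python
import socket

# Local receiver standing in for the remote endpoint.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
receiver.settimeout(2.0)
destination = receiver.getsockname()

# "Connected" way: connect() fixes the destination, then send() omits it.
connected = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
connected.connect(destination)
connected.send(b"connected")

# "Unconnected" way: every sendto() call carries the destination address.
unconnected = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
unconnected.sendto(b"unconnected", destination)

# Sort the datagrams so the result does not depend on arrival order.
received = sorted(receiver.recv(64) for _ in range(2))
print(received)

for sock in (connected, unconnected, receiver):
    sock.close()
```

Both datagrams reach the same receiver; the difference lies only in whether the kernel or the caller supplies the destination on each send.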
Other Notes
7.22.1
Prelude
Released on: 2020-09-17
- Please refer to the 7.22.1 tag on integrations-core for the list of changes on the Core Checks
Bug Fixes
- Define a default logs file (security-agent.log) for the security-agent.
- Fix segfault when listing Garden containers that are in error state.
- Do not activate security-agent service by default in the Linux packages of the Agent (RPM/DEB).
The security-agent was already properly starting and exiting if not activated in configuration.
6.22.1
7.22.0
Release Notes
7.22.0
Prelude
Released on: 2020-08-25
- Please refer to the 7.22.0 tag on integrations-core for the list of changes on the Core Checks
New Features
- Implements agent-side compliance rule evaluation in security agent using expressions.
- Add IO operations monitoring for Docker check (docker.io.read/write_operations)
- Track TCP connection churn on system-probe
- The new Runtime Security Agent collects file integrity monitoring events. It is disabled by default and only available for Linux for now.
- Make security-agent part of automatically started agents in RPM/DEB/etc. packages (will do nothing and exit 0 by default)
- Add support for receiving and processing SNMP traps, and forwarding them as logs to Datadog.
- APM: A new trace ingestion endpoint was introduced at /v0.5/traces which supports a more compact payload format, greatly improving resource usage. The spec for the new wire format can be viewed here. Tracers supporting this change will automatically use the new endpoint.
Enhancement Notes
- Adds a gauge for system.mem.slab_reclaimable. This is part of slab memory that might be reclaimed (i.e. caches). Datadog 7.x adds SReclaimable memory, if available on the system, to the system.mem.cached gauge by default. This may lead to inconsistent metrics for clients migrating from Datadog 5.x, where system.mem.cached didn't include SReclaimable memory. Adding a gauge for system.mem.slab_reclaimable allows inverse calculation to remove this value from the system.mem.cached gauge.
- Expand GCR pause container image filter
- Kubernetes events for pods, replicasets and deployments now have tags that match the metrics metadata. Namely, pod_name, kube_deployment, kube_replicas_set.
- Enabled the collection of the kubernetes resource requirements (requests and limits) by bumping the agent-payload dep. and collecting the resource requirements.
- Implements resource fallbacks for complex compliance check assertions.
- Add system.cpu.num_cores metric with the number of CPU cores (windows/linux)
- compliance: Add support for Go custom compliance checks and implement two for CIS Kubernetes
- Make DSD Mapper also map metrics that already contain tags.
- If the retrieval of the AWS EC2 instance ID or hostname fails, previously-retrieved values are now sent, which should mitigate host aliases flapping issues in-app.
- Increase default timeout on AWS EC2 metadata endpoints, and make it configurable with ec2_metadata_timeout
- Add container incl./excl. lists support for ECS Fargate (process-agent)
- Adds support for a heap profile and cpu profile (of configurable length) to be created and included in the flare.
- Upgrade embedded Python 3 to 3.8.5. Link to Python 3.8 changelog: https://docs.python.org/3/whatsnew/3.8.html
  Note that the Python 2 version shipped in Agent v6 continues to be version 2.7.18 (unchanged).
- Upgrade pip to v20.1.1. Link to pip 20.1.1 changelog: https://pip.pypa.io/en/stable/news/#id54
- Upgrade pip-tools to v5.3.1. Link to pip-tools 5.3.1 changelog: https://github.com/jazzband/pip-tools/blob/master/CHANGELOG.md
- Introduces support for resolving pathFrom in File and Audit checks.
- On Windows, always add the user to the required groups during installation.
- APM: A series of changes to internal algorithms were made which reduced CPU usage between 20-40% based on throughput.
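The "inverse calculation" mentioned in the system.mem.slab_reclaimable note above is a plain subtraction. The Python sketch below spells it out; the function name is hypothetical, the sample values are made up for illustration, and both inputs are assumed to be reported in the same unit.

```python
def cached_without_sreclaimable(mem_cached, slab_reclaimable):
    """Approximate an Agent 5.x style cached value from Agent 7.x gauges.

    mem_cached stands for system.mem.cached and slab_reclaimable for
    system.mem.slab_reclaimable, both assumed to share the same unit.
    """
    return mem_cached - slab_reclaimable


# Made-up sample values: 2048 units cached, of which 256 are SReclaimable.
print(cached_without_sreclaimable(2048, 256))  # 1792
```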
Bug Fixes
- Allow integration commands to work for pre-release versions.
- [Windows] Ensure PYTHONPATH variable is ignored correctly when initializing the Python runtime.
- Enable listening for conntrack info from all namespaces in system probe
- Fix cases where the resolution of secrets in integration configs would not be performed for autodiscovered containers.
- Fixes submission of containers blkio metrics that may modify array after being already used by aggregator. Can cause missing tags on containerd.* metrics
- Restore support of JSON-formatted lists for configuration options passed as environment variables.
- Don't allow pressing the disable button on checks twice.
- Fix container_include_metrics support for all container checks
- Fix a bug where the Agent disables collecting tags when the cluster checks advanced dispatching is enabled in the Daemonset Agent.
- Fixes a bug where the ECS metadata endpoint V2 would get queried even though it was not configured with the configuration option cloud_provider_metadata.
- Fix a bug where the tagger did not update a Kubernetes job that exited after some time, even though its state had changed.
- Fixes the Agent failing to start on sysvinit on systems with dpkg >= 1.19.3
- The agent was collecting docker container logs (metrics) even if they matched DD_CONTAINER_EXCLUDE_LOGS (resp. DD_CONTAINER_EXCLUDE_METRICS) when the containers were started before the agent. This is now fixed.
- Fix a bug where the Agent would not remove tags for pods that no longer exist, potentially causing unbounded memory growth.
- Fix pidfile support on security-agent
- Fixed system-probe not working on CentOS/RHEL 8 due to our custom SELinux policy. We now install the custom policy only on CentOS/RHEL 7, where the system-probe is known not to work with the default. On other platforms the default will be used.
- Stop sending payload for Cloud Foundry applications containers that have no container_name tag attached to avoid them showing up in the UI with empty name.
Other Notes
- APM: datadog.trace_agent.receiver.* metrics are now also tagged by endpoint_version