Releases: DataDog/datadog-agent
6.13.0
Prelude
Released on: 2019-07-24
- Please refer to the 6.13.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
- The `port` option in the NTP check configuration is now parsed as an integer instead of a string.
New Features
- APM: add support for Unix Domain Sockets by means of the `apm_config.receiver_socket` configuration. It is off by default. When set, it must point to a valid sock file.
- APM: API emitted metrics now have a lang_vendor tag when the Datadog-Meta-Lang-Vendor HTTP header is sent by clients.
- APM: Resource-based rate limiting in the API can now be completely disabled by setting `apm_config.max_memory` and/or `apm_config.max_cpu_percent` to the value 0.
- Add support for environment variables in checks' config files using the format "%%env_XXXX%%".
- Add new systemd integration to monitor systemd itself and the units managed by systemd.
- The total number of bytes received by dogstatsd is now reported by the `dogstatsd-udp/Bytes` and `dogstatsd-uds/Bytes` expvar.
- Adds the ability to use `DD_TAGS` to set global tags in Fargate.
- Added support for the new pod log directory pattern introduced in version 1.14 of Kubernetes to make sure the agent keeps on collecting logs after upgrade of a Kubernetes cluster.
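The "%%env_XXXX%%" format listed in the New Features above can be pictured as a template substitution over configuration values. The following is a minimal Python sketch, not the Agent's actual resolver: the helper name `resolve_env_templates` is hypothetical, and replacing unset variables with an empty string is an assumption rather than documented behaviour.

```python
import os
import re

# Matches template variables of the form %%env_NAME%%.
_ENV_TEMPLATE = re.compile(r"%%env_([A-Za-z0-9_]+)%%")


def resolve_env_templates(value, environ=None):
    """Replace each %%env_NAME%% in `value` with the value of NAME.

    Hypothetical helper for illustration only. Unset variables are
    replaced with an empty string, which is an assumption and not
    documented Agent behaviour.
    """
    env = os.environ if environ is None else environ
    return _ENV_TEMPLATE.sub(lambda m: env.get(m.group(1), ""), value)
```

For example, with `APP_HOST=db` in the environment mapping, `resolve_env_templates("http://%%env_APP_HOST%%:8080", {"APP_HOST": "db"})` yields `http://db:8080`.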
Enhancement Notes
- Add a kube_cronjob tag in the tagger. It applies to container metrics, autodiscovery metrics and logs.
- Change the prefix of entity IDs to make it easier to query the tagger without knowing what the container runtime is.
- APM: reduce memory usage in high traffic by up to 10x.
- APM: Services are no longer aggregated in the agent, nor written to the Datadog API. Instead, they are now automatically extracted on the backend based on the received traces.
- APM: The default interval at which the agent watches its resource usage has been reduced from 20s to 10s.
- APM: Improved processing concurrency and as a result, CPU usage decreased by 20% in some scenarios.
- APM: Queued sender was rewritten to improve performance around scenarios where network problems are present.
- APM: Code clean up around configuration and writer.
- The `datadog-agent version` command now prints the version of Golang the agent was compiled with.
- Display Go version in output of status command
- Upgraded JMXFetch to 0.30.0. See https://github.com/DataDog/jmxfetch/releases/tag/0.30.0
- APM: the trace agent now lets through a wider variety of traces, automatically correcting some malformed traces instead of dropping them. The following fields are now replaced with reasonable defaults if invalid or empty and truncated if exceeding max length: `span.service`, `span.name`, `span.resource`, `span.type`. `span.duration=0` is now allowed. Missing span start date now defaults to `duration - now`. The `datadog.trace_agent.receiver.traces_dropped` metric is now tagged with a `reason` tag explaining the reason it was dropped. There is a new `datadog.trace_agent.receiver.spans_malformed` metric also tagged by `reason` explaining how the span was malformed.
- Refactored permissions check in the integration command.
- Support Python 3 for the integration command.
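The "correct instead of drop" behaviour described for the trace agent in the list above can be sketched as follows. This is an illustration under stated assumptions, not the trace-agent's code: the default values, the length limit `MAX_LEN`, and the reason strings are all hypothetical, and the missing start date is read here as the current time minus the duration.

```python
import time

MAX_LEN = 100  # assumed limit for illustration; the real limits are not given here
# Hypothetical defaults; the release note only says "reasonable defaults".
DEFAULTS = {
    "service": "unnamed-service",
    "name": "unnamed-operation",
    "resource": "unnamed-resource",
    "type": "custom",
}


def normalize_span(span, now_ns=None):
    """Return (normalized_span, reasons) instead of dropping a malformed span."""
    now_ns = time.time_ns() if now_ns is None else now_ns
    out = dict(span)
    reasons = []  # hypothetical reason labels, one per correction applied
    for field, default in DEFAULTS.items():
        value = out.get(field)
        if not isinstance(value, str) or not value:
            out[field] = default
            reasons.append(f"{field}_empty")
        elif len(value) > MAX_LEN:
            out[field] = value[:MAX_LEN]
            reasons.append(f"{field}_truncate")
    duration = out.get("duration", 0)
    if not isinstance(duration, int) or duration < 0:
        duration = 0
        reasons.append("invalid_duration")
    out["duration"] = duration  # a duration of 0 is accepted as valid
    if not out.get("start"):
        # Assumption: default the start to "now minus duration".
        out["start"] = now_ns - duration
        reasons.append("invalid_start_date")
    return out, reasons
```

The returned reasons could then be used as the value of a `reason` tag on a malformed-span counter.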
Deprecation Notes
- APM: The presampler has been rebranded as a "rate limiter" to avoid confusing it with other sampling mechanisms.
- APM: The `datadog.trace_agent.presampler_rate` metric has been deprecated in favor of `datadog.trace_agent.receiver.ratelimit`.
Security Issues
- On Windows, quote the service name when registering service. Mitigates CVE-2014-5455. Note that since the Agent is not running as admin, even a successful attack would not give admin rights as specified in the CVE.
Bug Fixes
- Fix the `tagger` behavior returning `None` when no tags are present for the `kubelet` and `fargate` integration.
- APM: metrics generated by the processing function (such as `*.traces_priority`) now contain language specific tags.
- APM: Memory spikes when retry queue grows have been fixed.
- Fix 'vcruntime140.dll is being held in use by the following process'.
- System-probe s6 services: ensure that the system-probe binary is bundled before trying to run it / stop it. This is to ensure that the s6-services definitions will be backward compatible with older builds that didn't have the system-probe yet.
- Fix a bug in the log scanning logic of the JMXFetch wrapper that would make JMXFetch hang if it logged a very large log entry
- Fixed an issue where logs collected from kubernetes using '/var/log/pods' would show up with a wrong format '{"log":"x","stream":"y","time":"z"}' on the logs explorer when using docker as container runtime.
- Fix a TLS connection handshake that could hang forever, leaving the whole logs pipeline stuck so that logs were no longer tailed and file descriptors were not closed.
- On Windows, fixes bug in which Agent can't start if the Go runtime can't determine the ddagentuser's profile directory. This information isn't used, so shouldn't cause a failure
- The External Metrics Setter no longer stops trying to get metrics after 3 failed attempts. Instead, it will retry indefinitely.
- Removes an unused duplicate copy of the `system-probe` binary from the Linux packages
- The NTP check now properly uses the `port` configuration option.
Other Notes
- Logs informing about check runs and payload submission are now displayed once every 500 events instead of every 20 events.
6.12.2
Prelude
Release on: 2019-07-03
This release is only available on Windows and contains all the changes introduced in 6.12.0 and 6.12.1.
- Please refer to the 6.12.2 tag on integrations-core for the list of changes on the Core Checks
6.12.1
Prelude
Release on: 2019-06-28
This release is not available on Windows.
- Please refer to the 6.12.1 tag on integrations-core for the list of changes on the Core Checks
Bug Fixes
- Fixed a bug in the kubelet and fargate integrations preventing the collection of the `kubernetes.cpu.*` and `kubernetes.memory.*` metrics.
6.12.0
Known Issues
Some metrics from the kubernetes and kubelet integrations (`kubernetes.cpu.*` and `kubernetes.memory.*`) are missing for certain configurations.
A fix will be released in v6.12.1. Meanwhile, if downgrading to 6.11.3 is not an option, we recommend using the runtime metrics (ex: `docker.cpu.*`, `docker.mem.*`, `containerd.cpu.*`, ...).
Prelude
Release on: 2019-06-26
This release is not available on Windows.
- Please refer to the 6.12.0 tag on integrations-core for the list of changes on the Core Checks
Upgrade Notes
- APM: Log throttling is now automatically enabled by default when `log_level` differs from `debug`. A maximum of 10 error messages every 10 seconds will be displayed. If you had it enabled before, it can now be removed from the config file.
- On Windows, the path of the embedded `python.exe` binary has changed from `%ProgramFiles%\Datadog\Datadog Agent\embedded\python.exe` to `%ProgramFiles%\Datadog\Datadog Agent\embedded2\python.exe`. If you use this path from your provisioning scripts, please update it accordingly. Note: on Windows, to call the embedded pip directly, please use `%ProgramFiles%\Datadog\Datadog Agent\embedded2\python.exe -m pip`.
- Logs: Breaking Change for Kubernetes log collection - In version 6.11.2, logic was added in the Agent to first look for K8s container files if `/var/log/pods` was not available and then to go for the Docker socket. This created some permission issues, as `/var/log/pods` can be a symlink in some configurations and the Agent also needed access to the symlink directory. This logic is reverted to its prior behaviour, which prioritises the Docker socket for container log collection. It is still possible to force the agent to go for the K8s log files even if the Docker socket is mounted by using the `logs_config.k8s_container_use_file` or `DD_LOGS_CONFIG_K8S_CONTAINER_USE_FILE` parameter. This is recommended when more than 10 containers are running on the same pod.
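The limit of 10 error messages every 10 seconds described for APM log throttling above can be illustrated with a small rate gate. This is a sketch only: the class name `ErrorThrottle` is hypothetical, and the fixed-window strategy is an assumption, since the notes do not say how the trace-agent counts messages.

```python
import time


class ErrorThrottle:
    """Allow at most `limit` messages per `interval` seconds (fixed window).

    Hypothetical illustration; the fixed-window approach is an assumption.
    """

    def __init__(self, limit=10, interval=10.0, clock=time.monotonic):
        self.limit = limit
        self.interval = interval
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self):
        """Return True if the next message may be logged."""
        now = self.clock()
        if now - self.window_start >= self.interval:
            # A new window begins: reset the counter.
            self.window_start = now
            self.count = 0
        self.count += 1
        return self.count <= self.limit
```

A logger wrapper would call `allow()` before emitting each error and silently skip the message when it returns `False`.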
New Features
- A count named `datadog.agent.started` is now sent with a value of 1 when the agent starts.
- APM: Maximum allowed CPU percentage usage is now configurable via DD_APM_MAX_CPU_PERCENT.
- Node Agent can now perform checks on kubernetes service endpoints. It consumes the check configs from the Cluster Agent API via the endpointschecks config provider. Versions 1.3.0+ of the Cluster Agent are required for this feature.
- Logs can now be collected from init and stopped containers (possibly short-lived).
- Allow tracking pod labels and annotations value change to update labels/annotations_as_tags. Make the explicit tagging feature dynamic (introduced in #3024).
Enhancement Notes
- APM: the writer will now flush based on an estimated number of bytes in accumulated buffer size, as opposed to a maximum number of spans.
- APM: traces are no longer dropped because of rate limiting due to performance issues. Instead, the trace is kept in a queue awaiting to be processed.
- Logs the docker container ID at DEBUG level when parsing an invalid docker log.
- Set the User-Agent string to include the agent name and version string.
- Adds host tags in the Hostname section of the agent status command and the status tab of the GUI.
- Expose the number of logs processed and sent to the agent status
- Added a warning message on agent status command and status gui tab when ntp offset is too large and may result in metrics ignored by Datadog.
- APM: minor improvements to CPU performance.
- APM: improved trace writer performance by introducing concurrent writing.
- APM: the stats writer now writes concurrently to the Datadog API, improving resource usage and processing speed of the trace-agent.
- Extends the docker check to accommodate the kernel memory usage metric. This metric shows the cgroup current kernel memory allocation.
- Ask confirmation before overwriting the output file while using the dogstatsd-stats command.
- Do not ship autotools within the Agent package.
- The `datadog-agent integration` subcommand is now capable of installing prereleases of official integration wheels
- Upgraded JMXFetch to 0.29.1. See https://github.com/DataDog/jmxfetch/releases/tag/0.28.0, https://github.com/DataDog/jmxfetch/releases/tag/0.29.0 and https://github.com/DataDog/jmxfetch/releases/tag/0.29.1
- Added validity checks to NTP responses
- Allow the '--check_period' flag of jmxfetch to be overridden by the DD_JMX_CHECK_PERIOD environment variable.
- Ship integrations and their dependencies on Python 3 in Omnibus.
- Added a warning about unknown keys in datadog.yaml.
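The change of the APM writer from a span-count limit to a byte-size limit, noted in the list above, can be sketched as a buffer that flushes once its accumulated size crosses a threshold. The class name `ByteBoundedBuffer`, the callback shape, and the use of raw payload length as the size estimate are assumptions made for illustration.

```python
class ByteBoundedBuffer:
    """Accumulate payloads and flush when their estimated size reaches a limit.

    Hypothetical sketch; not the trace-agent's writer.
    """

    def __init__(self, max_bytes, flush):
        self.max_bytes = max_bytes
        self.flush = flush  # callback that receives the list of buffered items
        self.items = []
        self.size = 0

    def add(self, payload: bytes):
        """Buffer a payload and flush if the size limit has been reached."""
        self.items.append(payload)
        self.size += len(payload)  # size estimate: raw payload length
        if self.size >= self.max_bytes:
            self.flush_now()

    def flush_now(self):
        """Hand the buffered items to the callback and start a new buffer."""
        if self.items:
            self.flush(self.items)
            self.items, self.size = [], 0
```

With this shape, a few large traces trigger a flush as readily as many small ones, which a span-count limit cannot guarantee.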
Deprecation Notes
- APM: the yaml setting `apm_config.trace_writer.max_spans_per_payload` is no longer in use; writes are now based solely on accumulated byte size.
Bug Fixes
- Updated the DataDog/gopsutil library to include changes related to excessive DEBUG logging in the process agent
- The computeMem is only called in the check when we ensure that it does not get passed with an empty pointer. But if someone was to reuse it without checking for the nil pointer it could cause a segfault. This PR moves the nil checking logic inside the function to ensure it is safe.
- APM: Fixed a bug where normalize tag would not truncate tags correctly in some situations.
- APM: Fixed a small issue with normalizing tags that contained the unicode replacement character.
- APM: fixed a bug where modulo operators caused SQL obfuscation to fail.
- Fix issue on process agent for DD_PROCESS_AGENT_ENABLED where 'false' did not turn off process/container collection.
- Fix an error when adding a custom check config through the GUI when the folder where the config will reside does not exist yet.
- APM: on macOS, trace-agent is now enabled by default, and, similarly to other platforms, can be enabled/disabled with the `apm_config.enabled` config setting or the `DD_APM_ENABLED` env var
- Fix a bug where a misconfigured log agent would temporarily hog resources after being killed
- Fix a potential crash when doing a `configcheck` while the agent was not properly initialized yet.
- Fix a crash that could occur when having trouble connecting to the Kubelet.
- Fix nil pointer access for container without memory cgroups.
- Improved credentials scrubbing logic.
- The `datadog-agent integration show` subcommand now properly accepts only Datadog integrations as argument
- Fix incorrectly reported IO metrics when OS counters wrap in Linux.
- Fixed JMXFetch process not being terminated on Windows in certain cases.
- Empty logs could appear when collecting Docker logs in addition to the actual container logs. This was due to the way the Agent handles the header Docker adds to the logs. The process has been changed to make sure that no empty logs are generated.
- Fix a bug where, when a docker container terminated, its last logs were missing and only partially recovered after a restart.
- Properly move configuration files for wheels installed locally via the `integration` command.
- Reduced memory usage of the flare command
- Use a custom patch for a costly regex in PyYAML, see yaml/pyyaml#301.
- On Windows, restore the `system.mem.pagefile.pct_free` metric
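The fix for IO metrics when OS counters wrap, listed above, concerns computing a rate from a counter that restarts from zero after reaching its maximum. A minimal sketch of a wrap-aware difference is shown below; the function name `counter_delta` and the 32-bit default width are assumptions, and the Agent's actual fix is not reproduced here.

```python
def counter_delta(previous, current, width_bits=32):
    """Difference between two readings of a monotonically increasing counter
    that wraps around at 2**width_bits.

    Hypothetical helper; the counter width is an assumption and must match
    the counter being read. Only a single wrap between readings is handled.
    """
    modulus = 1 << width_bits
    # Modular subtraction stays non-negative even when `current` has wrapped.
    return (current - previous) % modulus
```

A naive `current - previous` would report a large negative value across the wrap, which is the kind of incorrect reading the fix addresses.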
Other Notes
- The 'integration freeze' cli subcommand now only displays datadog packages instead of the complete result of the 'pip freeze' command.
6.11.3 / 2019-06-04
Prelude
Release on: 2019-06-04
- Please refer to the 6.11.3 tag on process-agent (https://github.com/DataDog/datadog-process-agent/releases/tag/6.11.3) for the list of changes on the Process Agent.
Upgrade Notes
- Upgrade JMXFetch to 0.27.1
Bug Fixes
- APM: fixed a bug where secrets in environment variables were ignored.
6.11.2 / 2019-05-23
Prelude
Release on: 2019-05-23
Enhancement Notes
- Add option `cf_os_hostname_aliasing` to send the OS hostname as an alias when using the BOSH agent on Cloud Foundry.
Bug Fixes
- Fixes problem in which Windows Agent wouldn't install on non-English machines due to assumption that "Performance Monitor Users" didn't need to be localized.
- Windows Installer is now more resilient to missing domain controller.
6.11.1 / 2019-05-06
Release on: 2019-05-06
- Please refer to the 6.11.0 tag on integrations-core for the list of changes on the Core Checks.
- Please refer to the 6.11.1 tag on process-agent for the list of changes on the Process Agent.
Upgrade Notes
- Change the prioritization between the two mechanisms used to collect logs on Kubernetes. The Agent now first attempts to collect logs from '/var/log/pods' and falls back to the docker socket if the initialization fails.
Bug Fixes
- Fix a bug where short image name wouldn't be properly set on old docker versions
- Properly handle docker container logs in multiline mode in case of infrequent log messages, log file rotations or agent restart
6.11.0
Important: `6.11.0` is not marked as latest for Windows: we are investigating some cases where `6.11.0` is not installing correctly on Windows. Downloading `datadog-agent-6-latest.amd64.msi` will give you version `6.10.1`.
Prelude
Release on: 2019-04-17
- Please refer to the `6.11.0` tag on integrations-core https://github.com/DataDog/integrations-core/blob/master/AGENT_CHANGELOG.md#datadog-agent-version-6110 for the list of changes on the Core Checks.
- Please refer to the `6.11.0` tag on process-agent https://github.com/DataDog/datadog-process-agent/releases/tag/6.11.0 for the list of changes on the Process Agent.
Upgrade Notes
- APM: move flush notifications from level "INFO" to "DEBUG"
- APM: logging format has been changed to match the format of the core agent.
- Metrics coming through dogstatsd with the following internal prefixes: `activemq`, `activemq_58`, `cassandra`, `jvm`, `presto`, `solr`, `tomcat`, `kafka`, `datadog.trace_agent`, `datadog.process`, `datadog.agent`, `datadog.dogstatsd` are no longer affected by the `statsd_metric_namespace` option.
- Removed the internal ability to send logs to a specific logset at agent level.
- On Windows, the Datadog Agent now runs as a non-privileged user (ddagentuser by default) rather than LOCAL_SYSTEM. Please refer to our dedicated docs for more information
- The Windows installer will no longer allow direct downgrades; if a downgrade is required, the user must uninstall the newer version and install the older version.
New Features
- Secrets beta feature is now available on windows allowing users to pull secrets from secret management services.
- APM: JSON logging is now supported using the `log_format_json: true` setting.
- Collect container thread count and thread limit
- JMXFetch upgraded to 0.27.0. See `0.27.0` (https://github.com/DataDog/jmxfetch/releases/tag/0.27.0) for more details.
- The agent now ignores pods that exited more than 15 minutes ago to reduce its resource footprint when pods are not garbage-collected. This is configurable with the kubernetes_pod_expiration_duration option.
- Now support CRI-O container runtime for log collection on Kubernetes.
- Automatically add a "dirname" tag representing the directory of logs tailed from a wildcard path.
Enhancement Notes
- AutoDiscovery can now monitor unready pods. It looks for a new pod annotation "ad.datadoghq.com/tolerate-unready" which, if set to `true`, will make AutoDiscovery monitor that pod regardless of its readiness state.
- Add the ability for the `datadog-agent check` command to have Python checks start an interactive debugging session.
- Change the logging format to include the name of the logging agent instead of appending it in the agent container logs.
- Add /metrics to the bare endpoints the agent can access. This is required to support querying endpoints protected by RBAC, by kube-rbac-proxy for instance.
- APM: errors reported by the receiver's HTTP server are now shown in the logs.
- APM: slightly improved normalization error logs.
- On Windows, allows Agent to be installed to nonstandard directories. Uses APPLICATIONDATADIRECTORY to set the root of the configuration file tree, and PROJECTLOCATION to set the root of the binary tree. Please refer to the docs for more details
- To decrease the number of DCA API requests, the Agent now uses a different API endpoint that retrieves the Pods metadata with a single call to the DCA's API.
- Host metadata payloads are now zlib-compressed
- Log file size and number of rotations are now configurable.
- Add a command `dogstatsd-stats` to the agent to get basic stats about the processed metrics.
- Support JSON arrays within environment variables, in addition to space separated values.
- On Google Compute Engine, the Agent now reports `<instance_name>.<project_id>` as a host alias instead of `<hostname_prefix>.<prefix_id>`, which improves the uniqueness and relevance of the host alias when the GCE instance has a custom hostname.
- The import command doesn't stop anymore when there is no `conf.d` or `auto_conf` directory.
- Kubernetes event collection timeout can now be configured.
- Improve status page by splitting errors and warnings from the Logs agent
- Secrets are no longer decrypted in agent command when it's not needed (commands like hostname, launchgui, configuration ...). This reduces the number of times the 'secret_command_backend' executable will be called.
- Improved memory efficiency on hosts sending very high numbers of metrics.
- Resolve once the DNS name given by docker and try the associated IP to reach the kubelet. Prioritize HTTPS over HTTP to connect to kubelet. Prioritize communication using IPs over hostnames to spare DNS servers across the cluster.
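The support for JSON arrays within environment variables, alongside space separated values, noted in the list above, can be illustrated with a small parser. This is a sketch under assumptions: the function name `parse_list_env` is hypothetical, and falling back to whitespace splitting when the JSON is invalid is a choice made for this example, not documented Agent behaviour.

```python
import json


def parse_list_env(raw):
    """Parse a list-valued environment variable.

    Accepts either a JSON array ('["a:1", "b:2"]') or space-separated
    values ('a:1 b:2'). Hypothetical helper for illustration only.
    """
    text = raw.strip()
    if text.startswith("["):
        try:
            parsed = json.loads(text)
        except ValueError:
            parsed = None  # assumption: fall back to whitespace splitting
        if isinstance(parsed, list):
            return [str(item) for item in parsed]
    return text.split()
```

The JSON form is useful when individual values contain spaces, which the space separated form cannot express.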
Deprecation Notes
- Removal of largely unused go SNMP check. SNMP support still provided by the python variant.
Bug Fixes
- Fix an auto-discovery annotation value parsing limitation in version 6 compared to version 5. Now, `ad.datadoghq.com/*.instances` annotation key supports value like `[[{"foo":"bar1"}, {"foo":"bar2"}], {"name":"bar3"}]`
- The agent container will now output valid JSON when using JSON log format.
- APM: Multiple value "Content-Type" headers are now parsed correctly for media type in the HTTP receiver.
- APM: always reply with correct Content-Type in API responses.
- APM: when a span's resource is empty, the error "`Resource` can not be empty" will be returned instead of the wrong "`Resource` is invalid UTF-8".
- APM: sensitive information is now scrubbed from logs.
- APM: Fix issue with `--version` flag when API key is unset.
- APM: Ensure UTF-8 characters are not cut mid-way when truncating span fields.
- Metrics coming through dogstatsd with the following internal prefixes: `activemq`, `activemq_58`, `cassandra`, `jvm`, `presto`, `solr`, `tomcat`, `kafka`, `datadog.trace_agent`, `datadog.process`, `datadog.agent`, `datadog.dogstatsd` are no longer affected by the `statsd_metric_namespace` option.
- Fixes ec2 tags collection when datadog agent is deployed into a kubernetes cluster along with kube2iam.
- Fixes bug in which upgrading from agent5 doesn't correctly import the configuration
- Fix a race condition in gohai that could make the Agent crash while collecting the host's filesystem metadata
- Hostnames containing characters that are invalid for a filename no longer prevent the agent from generating a flare.
- Allow macOS users to invoke the `datadog-agent integration` command as root since the installation directory is owned by root.
- Change to a randomized exponential backoff in case of connection failure
- Ignore empty logs_dd_url to fall back on default config for logs agent.
- Detect and handle Docker logs with only header and empty content
- To mitigate issues with the hostname detection on AKS, hostnames gathered from the metadata endpoints of AWS, GCE, Azure, and Alibaba cloud are no longer considered valid if their length exceeds 255 characters.
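The randomized exponential backoff mentioned in the list above can be sketched as follows. The parameters `base` and `cap`, the function name `backoff_delay`, and the "full jitter" strategy (a uniform draw between zero and the exponential bound) are assumptions; the notes do not say which variant the Agent uses.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Delay in seconds before retry number `attempt` (0-based).

    The upper bound doubles with every attempt up to `cap`, and the
    actual delay is drawn uniformly between 0 and that bound.
    Hypothetical helper; all parameter values are assumptions.
    """
    upper = min(cap, base * (2 ** attempt))
    return rng() * upper
```

Randomizing the delay spreads reconnection attempts from many agents over time instead of having them retry in lockstep after a shared outage.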
Other Notes
- Bump embedded Python to 2.7.16
6.10.2
6.10.1
Prelude
Release on: 2019-03-07
Bug Fixes
- APM: Mixing cases in `apm_config.analyzed_spans` and `apm_config.analyzed_rate_by_service` entries is now allowed. Service names and operation names will be treated as case insensitive.
- Refactor the `ContainerdUtil` so that each call to the `containerd` api has a dedicated timeout.