Releases: scalyr/scalyr-agent-2
Endora
Features:
- Ability to upload logs to different Scalyr team accounts by specifying different API keys for different log files. See RELEASE_NOTES for more details.
- New configuration option `default_workers_per_api_key`, which creates more than one session with the Scalyr servers to increase upload throughput. This may be set using the `SCALYR_DEFAULT_WORKERS_PER_API_KEY` environment variable.
- New configuration option `use_multiprocess_copying_workers`, which uses separate processes for each upload session, thereby providing more CPU resources to the agent. This may be set using the `SCALYR_USE_MULTIPROCESS_COPYING_WORKERS` environment variable.
Improvements:
- Linux system metrics monitor now ignores the following special mount points by default: `/sys/*`, `/dev*`, `/run*`. If you still want to capture `df.*` metrics for those mount points, please refer to RELEASE_NOTES.
- Update `url_monitor` so it sends a correct `User-Agent` header which identifies requests as originating from the agent.
Misc:
- The default value for the `k8s_cri_query_filesystem` Kubernetes monitor config option (set via the `SCALYR_K8S_CRI_QUERY_FILESYSTEM` environment variable) has changed to `True`. This means that by default when in CRI mode, the monitor will only query the filesystem for the list of active containers, rather than first querying the Kubelet API. If you wish to revert to the original default and prefer the Kubelet API, set the `SCALYR_K8S_CRI_QUERY_FILESYSTEM` environment variable to "false" for the Scalyr Agent daemonset.
- New `global_monitor_sample_interval_enable_jitter` config option has been added, which is enabled by default. When this option is enabled, each monitor sleeps for a random time between 2/10 and 8/10 of its configured sample gather interval before gathering its first sample. This ensures that sample gathering for all the monitors doesn't run at the same time. This is useful when running an agent configured with many monitors on lower powered devices, since it spreads the load spike from sample gathering across a longer time frame.
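The jitter described above can be illustrated with a minimal sketch. This is not the agent's implementation; the function name `initial_jitter_delay` and the `rng` parameter are hypothetical and exist only for this example.

```python
import random


def initial_jitter_delay(sample_interval_secs, rng=random):
    """Return a random start-up delay between 2/10 and 8/10 of the interval."""
    low = 0.2 * sample_interval_secs
    high = 0.8 * sample_interval_secs
    return rng.uniform(low, high)


# Three monitors sharing a 30 second interval each draw their own delay,
# so their first samples are unlikely to be gathered at the same moment.
delays = [initial_jitter_delay(30.0) for _ in range(3)]
```

Only the first sample is delayed in this sketch; later samples would follow the normal interval, which keeps the monitors offset from one another.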
Bug fixes:
- Fix to make sure we don't expect a valid Docker socket when running Kubernetes monitor in CRI mode. This fixes an issue preventing the K8s monitor from running in CRI mode if Docker is not available.
- Fix line grouping code and make sure we don't throw if line data contains a bad or partial unicode escape sequence.
- Fix `scalyr_agent/run_monitor.py` script so it also works correctly out of the box when using source code installation.
- Update Windows System Metrics monitor to better handle a situation when disk io counters are not available.
- Docker monitor has been fixed so that when running in "API mode" (`docker_raw_logs: false`) it also correctly ingests logs from container `stderr`. Previously only logs from `stdout` were ingested.
Hydrus
Features:
- Add new `initial_stopped_container_collection_window` configuration option to the Kubernetes monitor, which can be configured by setting the `SCALY_INITIAL_STOPPED_CONTAINER_COLLECTION_WINDOW` environment variable. By default, the Scalyr Agent does not collect the logs from any pods stopped before the agent was started. To override this, set this parameter to the number of seconds the agent will look in the past (before it was started). It will collect logs for any pods that were started and stopped during this window. This can be useful in autoscaling environments to ensure all pod logs are captured since node creation, even if the Scalyr Agent daemonset starts just after other pods.
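As a rough illustration of the window semantics described above, the check can be written as a comparison of timestamps. This is one reading of the text, not the Kubernetes monitor's code; the function name and its arguments are hypothetical, and all times are assumed to be seconds since the epoch.

```python
def in_collection_window(pod_started_at, pod_stopped_at,
                         agent_started_at, window_secs):
    """Return True if a pod both started and stopped inside the look-back
    window that ends at the moment the agent started."""
    window_start = agent_started_at - window_secs
    return window_start <= pod_started_at <= pod_stopped_at <= agent_started_at


# Agent started at t=1000 with a 300 second window (covers t=700..1000).
collected = in_collection_window(800, 900, 1000, 300)   # inside the window
skipped = in_collection_window(600, 900, 1000, 300)     # started too early
```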
Improvements:
- Improve logging in the Kubernetes monitor.
- On agent start up we now also log the locale (language code and encoding) used by the agent process. This will make it easier to troubleshoot issues which are related to the agent process not using UTF-8 encoding.
- Default value for the `tcp_buffer_size` Syslog monitor config option has been increased from 2048 to 8192 bytes.
- New `message_size_can_exceed_tcp_buffer` config option has been added to the Syslog monitor. When set to True, the monitor will support messages which are larger than `tcp_buffer_size` bytes, and the `tcp_buffer_size` config option will specify how many bytes we try to read from the socket at once / in a single recv() call. For backward compatibility reasons, it defaults to False.
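To show why a message can be larger than a single recv() call, here is a minimal sketch of a reader that accumulates chunks until a full message is available. It assumes newline-delimited framing and is not the Syslog monitor's actual code; `read_messages` is a hypothetical name.

```python
import socket


def read_messages(sock, tcp_buffer_size=8192):
    """Yield newline-terminated messages, even ones larger than one recv()."""
    pending = b""
    while True:
        chunk = sock.recv(tcp_buffer_size)  # reads at most tcp_buffer_size bytes
        if not chunk:
            break  # peer closed the connection
        pending += chunk
        while b"\n" in pending:
            message, pending = pending.split(b"\n", 1)
            yield message
    if pending:
        yield pending  # trailing data without a final newline


# A 20 byte message is read through an 8 byte buffer in several recv() calls.
left, right = socket.socketpair()
left.sendall(b"a" * 20 + b"\n" + b"short\n")
left.close()
messages = list(read_messages(right, tcp_buffer_size=8))
right.close()
```

In this sketch the buffer size only controls how much is read per call; the message boundary comes from the delimiter, which is the behaviour the new option describes.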
Bug fixes:
- Fix a bug / race-condition in Docker monitor which could cause, under some scenarios, when monitoring containers running on the same host, logs to stop being ingested after the container restart. There was a relatively short time window when this could happen and it was more likely to affect containers which take longer to stop / start.
- Update code for all the monitors to correctly use UTC timezone everywhere. Previously some of the code incorrectly used local server time instead of UTC. This means some of those monitors could exhibit incorrect / undefined behavior when running the agent on a server which has local time set to something other than UTC.
- Fix `docker_raw_logs: false` functionality in the Docker monitor which has been broken for a while now.
- Update Windows System Metrics monitor to better handle a situation when disk io counters are not available.
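The UTC fix above concerns a common pitfall: timestamps derived from local server time shift with the server's timezone. A minimal sketch of timezone-aware handling follows; the helper names are hypothetical and this is not taken from the agent's code.

```python
from datetime import datetime, timedelta, timezone


def utc_now():
    """Return the current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


def utc_timestamp_string(moment):
    """Format an aware datetime as a UTC timestamp, whatever its own zone."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# 02:00 in a UTC+2 zone is the same instant as 00:00 UTC.
local_moment = datetime(2021, 1, 1, 2, 0, 0, tzinfo=timezone(timedelta(hours=2)))
formatted = utc_timestamp_string(local_moment)
```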
Celaeno
Bug fixes:
- Fix `scalyr-agent-2 status` command non-fatal error when running status command multiple times concurrently or in a short time frame.
- Fix `scalyr-agent-status` command to not log config override warning to stdout since it may interfere with consumers of the status command output.
- Fix merging of active-checkpoints.json and checkpoints.json checkpoint file data. Previously, data from the active checkpoints file was not correctly merged into the full checkpoint data file. As a result, under some scenarios (e.g. the agent crashed after the active checkpoint file was written, but before the full checkpoint file was written), data which was already sent to the server could be sent twice. The actual time window when this could happen was relatively small, since full checkpoint data is written out every 60 seconds by default.
- Fix Postgres monitor error when specifying the Postgres `database_port` in the agent config.
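The checkpoint merge fix can be pictured with a small sketch. The data layout here is an assumption made for illustration only: each file is modelled as a mapping from log path to a dict with hypothetical `offset` and `time` keys, which may not match the agent's real checkpoint format.

```python
def merge_checkpoints(full, active):
    """Merge two checkpoint mappings, keeping the newer entry per log path."""
    merged = dict(full)
    for path, state in active.items():
        current = merged.get(path)
        # Prefer the active entry when it is at least as recent as the full one.
        if current is None or state["time"] >= current["time"]:
            merged[path] = state
    return merged


full = {"/var/log/app.log": {"offset": 100, "time": 10.0}}
active = {"/var/log/app.log": {"offset": 250, "time": 55.0}}
merged = merge_checkpoints(full, active)
```

Resuming from the merged offset rather than the stale one is what avoids re-sending bytes that were already uploaded.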
Betelgeuze
- Upgrade `psutil` dependency which incorporates many critical fixes. As part of the change, Windows Server 2003/XP is no longer supported.
- Small fix for the `pywin32` library which is used in the Windows version.
Aqua
Features:
- Add new `win32_max_open_fds` configuration option which allows the user to override the maximum open file limit on Windows for the Scalyr Agent process.
Bug fixes:
- Fix bug in packaging which would cause agent to sometimes crash on Windows when using windows event log monitor.
Alcor
Bug fixes:
- Fix formatting of the "Health Check:" line in `scalyr-agent-2 status -v` command output and make sure the value is left padded and consistent with other lines.
- Fix reporting of "Last successful communication with Scalyr" line value in the `scalyr-agent-2 status -v` command output if we never successfully establish a connection with the Scalyr API.
- Fix a regression in `scalyr-agent-2-config --upgrade-windows` functionality which would sometimes throw an exception, depending on the configuration values.
Security fixes and improvements:
- Fix a bug with the agent not correctly validating that the hostname which is stored inside the certificate returned by the server matches the one the agent is trying to connect to (`scalyr_config` option). This would open up a possibility for MITM attack in case the attacker was able to spoof or control the DNS.
- Fix a bug with the agent not correctly validating the server certificate and hostname when using `scalyr-agent-2-config --upgrade-windows` functionality under Python < 2.7.9. This would open up a possibility for MITM attack in case the attacker was able to spoof or control the DNS.
- When connecting to the Scalyr API, agent now explicitly requests TLS v1.2 and aborts connection if the server doesn't support it or tries to use an older version. Recently Scalyr API deprecated support for TLS v1.1 which allows us to implement this change which makes the agent more robust against potential downgrade attacks. Due to lack of required functionality in older Python versions, this is only true when running the agent under Python >= 2.7.9.
- When connecting to the Scalyr API, the agent now sends a SNI header which matches the host specified in the agent config. Due to lack of required functionality in older Python versions, this is only true when running the agent under Python >= 2.7.9.
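The three client-side protections listed above (certificate and hostname validation, a TLS v1.2 floor, and SNI) can be sketched with the Python 3 standard library `ssl` module. This is an illustration under the assumption of Python 3.7 or later, not the agent's own connection code; both function names are hypothetical.

```python
import socket
import ssl


def make_strict_context():
    """Build a client context that verifies the certificate and hostname
    and refuses anything older than TLS v1.2."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED and check_hostname=True
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def open_verified_connection(host, port=443, timeout=10.0):
    """Connect to host:port; server_hostname is used for both SNI and the
    hostname check against the server certificate."""
    ctx = make_strict_context()
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
```

With this setup a handshake fails if the certificate does not match `host` or if the server only offers an older protocol version.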
Ursa
Bug fixes:
- Fixed a regression in Scalyr Windows Agent cmdlet script (`ScalyrShell.cmd`) which prevents the agent from starting.
Titan
Features:
- The `status -v` command now contains health check information, and will have a return code of `2` if the health check has failed. A new optional flag for the `status` CLI command, `-H`, returns a short status with only health check info. A new configuration option, `healthy_max_time_since_last_copy_attempt`, defines how many seconds the Agent may go without attempting to upload logs before the health check fails. It defaults to `60.0`. For more information, please refer to the release notes document.
- Kubernetes yaml has been updated to include a liveness check based on the new health check info, which will cause a pod restart if the agent is considered unhealthy.
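The health check rule described above reduces to a threshold comparison. The sketch below is illustrative rather than the agent's implementation: the function name is hypothetical, and the exit code `0` for a healthy agent is an assumption (the notes only state `2` for a failed check).

```python
import time


def health_check_exit_code(last_copy_attempt_time, now=None,
                           healthy_max_time_since_last_copy_attempt=60.0):
    """Return 2 when the last copy attempt is older than the allowed maximum,
    otherwise 0 (assumed success code)."""
    if now is None:
        now = time.time()
    elapsed = now - last_copy_attempt_time
    return 2 if elapsed > healthy_max_time_since_last_copy_attempt else 0


healthy = health_check_exit_code(100.0, now=150.0)    # 50s since last attempt
unhealthy = health_check_exit_code(100.0, now=161.0)  # 61s since last attempt
```

A liveness check can then treat any non-zero return code as a signal to restart the pod.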
Bug fixes:
- Fixed race condition in pipelined requests which could lead to duplicate log upload, especially for systems with a large number of inactive log files. Log files would be reuploaded from their start over a short period of time (seconds to minutes). This bug is triggered when pipelining is enabled, either by explicitly setting the `pipeline_threshold` config option or by using a Scalyr Agent release >= 2.1.6 (pipelining was turned on by default in 2.1.6).
- Fixed the misconfiguration in the Windows packager which caused some of the monitors to not be included in the Windows version. This generated import errors when attempting to use monitors like the syslog or shell monitor.
Misc:
- `compression_level` configuration option now defaults to `6` when using `deflate` `compression_type` (`deflate` is the default value for the `compression_type` configuration option). `6` offers the best trade-off between compression ratio and CPU usage. For more information, please refer to the release notes document.
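To get a feel for what the compression level controls, the standard library `zlib` module (which implements deflate) can compress the same payload at several levels. This is a sketch for experimentation, not the agent's upload path; `compress_payload` and the sample payload are made up for the example.

```python
import zlib


def compress_payload(payload, compression_level=6):
    """Compress bytes with deflate at the given level (1 = fastest, 9 = smallest)."""
    return zlib.compress(payload, compression_level)


# A repetitive, log-like payload compressed at three different levels.
sample = b'{"ts": 1, "message": "GET /index.html 200"}\n' * 500
sizes = {level: len(compress_payload(sample, level)) for level in (1, 6, 9)}
```

Comparing the entries in `sizes`, together with timing each call on representative log data, is one way to judge the ratio versus CPU trade-off for a particular workload.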
Serenity
Features:
- New configuration feature `k8s_logs` allows configuring of Kubernetes logs similarly to the `logs` configuration but matches based on Kubernetes pod, namespace, and container name. Please see the RELEASE_NOTES for more details.
Bug fixes:
- Fixed race condition that sometimes resulted in duplicated K8s logs being uploaded on agent restart or configuration update.
Misc:
- The Windows package is now built using `pyInstaller` instead of `py2exe`. As part of the change, we are no longer supporting 32-bit Windows systems. Nothing else should change due to the move to `pyInstaller`.
Rama
Features:
- New configuration option `max_send_rate_enforcement` allows setting a limit on the rate at which the Agent will upload log bytes to Scalyr. You may wish to set this if you are worried about bursts of log data from problematic files and want to avoid getting charged for these bursts.
- New default overrides for a number of configuration parameters that will result in a higher throughput for the Agent. If you were relying on the lower throughput as a makeshift rate limiter we recommend setting the new `max_send_rate_enforcement` configuration option to an acceptable rate or "legacy" to maintain the current behavior. See the RELEASE_NOTES for more details.
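One common way to enforce a send rate limit like the one described above is a token bucket. The sketch below shows that general technique only; it is not how the agent implements `max_send_rate_enforcement`, and the class name, the one second burst allowance, and the injectable `clock` are all assumptions made for the example.

```python
import time


class SendRateLimiter:
    """Token bucket that caps the average number of bytes sent per second."""

    def __init__(self, max_bytes_per_sec, clock=time.monotonic):
        self._rate = float(max_bytes_per_sec)
        self._capacity = float(max_bytes_per_sec)  # allow up to 1s of burst
        self._tokens = self._capacity
        self._clock = clock
        self._last = clock()

    def try_send(self, num_bytes):
        """Return True and consume budget if num_bytes may be sent now."""
        now = self._clock()
        refill = (now - self._last) * self._rate
        self._tokens = min(self._capacity, self._tokens + refill)
        self._last = now
        if num_bytes <= self._tokens:
            self._tokens -= num_bytes
            return True
        return False
```

A caller that gets `False` would wait and retry later, which smooths out bursts from a noisy log file instead of uploading them all at once.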
Minor updates:
- Default value for `max_line_size` has been raised to 49900. If you set this value in your configuration, you may wish to remove it so that the new default applies.
Bug fixes:
- Fixed Syslog monitor issue causing the monitor to write binary strings to the log file.