This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
Version microsoft/oms:3.1.4 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:3.1.4 (linux)
Version microsoft/oms:win-3.1.5 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-3.1.5 (windows)
- Windows Agent
- Downgrade telegraf to version 1.24.2 due to Telegraf Data Collector Service error
Note: Starting 03/01/2023 we have moved to the semver versioning system for naming image tags with the release 3.1.4
Version microsoft/oms:3.1.4 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:3.1.4 (linux)
Version microsoft/oms:win-3.1.4 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-3.1.4 (windows)
- Common
- Move to semver versioning system for naming imageTags
- Fix bug in overwriting prometheus namespace
- Fix kube events timestamp using metadata timestamp
- Moved syslog telemetry from AI to pod inventory
- Update telegraf to version 1.25.2
- Fix bug wherein sidecar container was not muted by default
- Removed labelSelector from ama-logs.yaml and helm charts
- Update fluent-bit to version 2.0.5
- Enabled multiline support for ContainerLogV2 via configmap
- Linux Agent
- Update ruby to version 3.1.3
- Windows Agent
- Fix podName missing bug in ContainerLogV2
Note : The agent version(s) below has dates (ciprod<mmddyyyy>), which indicate the agent build dates (not release dates)
Version microsoft/oms:ciprod01182023-095c864a Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod01182023-095c864a (linux)
Version microsoft/oms:win-ciprod01182023-095c864a Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod01182023-095c864a (windows)
- Linux Agent
- Replace ifconfig.co url with mcr.microsoft.com for network connectivity test
- Add xvda in devices so that telegraf can fetch the Disk IO data from xvda disk as well
- Common
- Added the config option to ignore proxy settings for the AMPLS + Proxy environment
- Changes to the MSI Onboarding ARM templates for handling DCR naming and handling the scenario where DCR name is more than 64 characters and also removing the spaces in the workspace region if user inputs the workspace region name with spaces
- Add specific master/control-plane toleration key to fix the node upgrade issue in Arc extension.
- AKS, Arc K8s and Provisioned cluster template updates for Data collection settings.
- Implementation of Data collection settings for Cost optimization feature
- Telemetry to track data collection settings enablement and settings
- Agent rename changes for the Azure Arc Provisioned cluster onboarding templates
- Update for troubleshooting script for collecting syslog data
Version microsoft/oms:ciprod12032022-c9f3dc30 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod12032022-c9f3dc30 (linux)
Version microsoft/oms:win-ciprod12032022-c9f3dc30 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod12032022-c9f3dc30 (windows)
- Linux Agent
- Enable syslog support
- Windows Agent
- Use ghrelease as gcc download path
- Move fluent-bit conf update before fluent-bit start
- Make containerd and secure port as default for windows same as linux
- Common (Linux & Windows Agent)
- Add FluentBit Telegraf Plugin buffer and chunk config options to configmap
- Remove references to healthservice in troubleshooting script
- Add pods namespace label name for all telegraf conf
- Update telegraf version from 1.23.2 to 1.24.2
- Address Computer Name mismatch in KubeEvents
Version microsoft/oms:ciprod10042022-3c05dd1b Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod10042022-3c05dd1b (linux)
Version microsoft/oms:win-ciprod10042022-3c05dd1b Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod10042022-3c05dd1b (windows)
- Common
- Rename resource names from omsagent based to ama-logs based #823
- Chart major version bumped to 3.0.0 (no auto update)
- Change image tag format from ciprod<mmddyyyy> to ciprod<mmddyyyy>-<short-commit-id> where commit-id is the id of the last commit to be included in the release
- Add gatekeeper-system to exlude namespace #833
- flag to ignore proxy settings which are automatically set by arc k8s extension #824
- Added new rule group and rule properties #818
- fix: Update DCR creation to Clusters resource group instead of workspace (ARM Template update) #819
- Support LCM #820
- Bug fixes
Version microsoft/oms:ciprod08102022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod08102022 (linux)
Version microsoft/oms:win-ciprod08102022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod08102022 (windows)
-
Linux Agent
- Consume ruby from RVM instead of brightbox ppa
-
Common
- Update to Ruby 3.1.1
- Update telegraf to 1.23.2
- Updates fluentd to 1.14.6
- Use default JSON gem instead of yajl-json
- Consume tomlrb as a gem instead of committed source code
- Move from beta.kubernetes.io to kubernetes.io
- Bug fixes
- Fix bug in processing fractional memory limits
- Fix log loss due to inode reuse
Version microsoft/oms:ciprod06272022-hotfix Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06272022-hotfix (linux)
- Fixes for sending the proper node allocatable cpu and memory value for the container which does not specify limits.
Version microsoft/oms:ciprod06272022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06272022 (linux)
- Fixes for following bugs in ciprod06142022 which are caught in AKS Canary region deployment
- Fix the exceptions related to file write & read access of the MDM inventory state file
- Fix for missing Node GPU allocatable & capacity metrics for the clusters which are whitelisted for AKS LargeCluster Private Preview feature
Version microsoft/oms:ciprod06142022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06142022 (linux)
Version microsoft/oms:win-ciprod06142022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod06142022 (windows)
- Linux Agent
- Prometheus sidecar memory optimization
- Fix for issue of Telegraf connecting to FluentD Port 25228 during container startup
- Add integration for collecting Subnets IP usage metrics for Azure CNI (turned OFF by default)
- Replicaset Agent improvements related to supporting of 5K Node cluster scale
- Common (Linux & Windows Agent)
- Make custom metrics endpoint configurable to support edge environments
- Misc
- Moved Trivy image scan to Azure Pipeline
Version microsoft/oms:ciprod05192022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod05192022 (linux)
Version microsoft/oms:win-ciprod05192022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05192022 (windows)
- Linux Agent
- PodReadyPercentage metric bug fix
- add cifs & fuse file systems to ignore list
- CA Cert Fix for Mariner Hosts in Air Gap
- Disk usage metrics will no longer be collected for the paths "/mnt/containers" and "/mnt/docker"
- Windows Agent
- Ruby version upgrade from 2.6.5.1 to 2.7.5.1
- Added Support for Windows Server 2022
- Multi-Arch Image to support both Windows 2019 and Windows 2022
- Common (Linux & Windows Agent)
- Telegraf version update from 1.20.3 to 1.22.2 to fix the vulnerabilitis
- Removal of Health feature as part of deprecation plan
- AAD Auth MSI feature support for Arc K8s (not usable externally yet)
- MSI onboarding ARM template updates for both AKS & Arc K8s
- Fixed the bug related to windows metrics in MSI mode for AKS
- Configmap updates for log collection settings for v2 schema
- Misc
- Improvements related to CI/CD Multi-arc image
- Do trivy rootfs checks
- Disable push to ACR for PR and PR updates
- Enable batch builds
- Scope Dev/Prod pipelines to respective branches
- Shorten datetime component of image tag
- Troubleshooting script updates for MSI onboarding
- Instructions for testing of agent in MSI auth mode
- Add CI Windows Build to MultiArch Dev pipeline
- Updates related to building of Multi-arc image for windows in Build Pipeline and local dev builds
- Test yamls to test container logs and prometheus scraping on both WS2019 & WS2022
- Arc K8s conformance test updates
- Script to collect the Agent logs for troubleshooting
- Force run trivy stage for Linux
- Fix docker msi download link in windows install-build-pre-requisites.ps1 script
- Added Onboarding templates for legacy auth for internal testing
- Update the Build pipelines to have separate phase for Windows
- Improvements related to CI/CD Multi-arc image
Version microsoft/oms:ciprod03172022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod03172022 (linux)
Version microsoft/oms:win-ciprod03172022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod03172022 (windows)
- Linux Agent
- Multi-Arch Image to support both AMD64 and ARM64
- Ruby upgraded to version 2.7 from 2.6
- Fix Telegraf Permissions
- Fix ADX bug with database name
- Vulnerability fixes
- MDSD updated to 1.17.0
- HTTP Proxy support
- Retries for Log Analytics Ingestion
- ARM64 support
- Memory leak fixes for network failure scenario
- Windows Agent
- Bug fix for FluentBit stdout and stderr log filtering
- Common
- Upgrade Go lang version from 1.14.1 to 1.15.14
- MSI onboarding ARM template update
- AKS HTTP Proxy support
- Go packages upgrade to address vulnerabilities
Version microsoft/oms:ciprod01312022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod01312022 (linux)
Version microsoft/oms:win-ciprod01312022 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod01312022 (windows)
- Linux Agent
- Configurable DB name via configmap for ADX (default DB name:containerinsights)
- Default to cAdvisor port to 10250 and container runtime to Containerd
- Update AgentVersion annotation in yamls (omsagent and chart) with released MDSD agent version
- Incresing windows agent CPU limits from 200m to 500m
- Ignore new disk path that comes from containerd starting with k8s version >= 1.19.x, which was adding unnecessary InsightsMetrics logs and increasing cost
- Route the AI SDK logs to log file instead of stdout
- Telemetry to collect ContainerLog Records with empty Timestamp
- FluentBit version upgrade from 1.6.8 to 1.7.8
- Windows Agent
- Update to use FluentBit for container log collection and removed FluentD dependency for container log collection
- Telemetry to track if any of the variable fields of windows container inventory records has field size >= 64KB
- Add windows os check in in_cadvisor_perf plugin to avoid making call in MDSD in MSI auth mode
- Bug fix for placeholder_hostname in telegraf metrics
- FluentBit version upgrade from 1.4.0 to 1.7.8
- Common
- Upgrade FluentD gem version from 1.12.2 to 1.14.2
- Upgrade Telegraf version from 1.18.0 to 1.20.3
- Fix for exception in node allocatable
- Telemetry to track nodeCount & containerCount
- Other changes
- Updates to Arc K8s Extension ARM Onboarding templates with GA API version
- Added ARM Templates for MSI Based Onboarding for AKS
- Conformance test updates relates to sidecar container
- Troubelshooting script to detect issues related to Arc K8s Extension onboarding
- Remove the dependency SP for CDPX since configured to use MSI
- Linux Agent Image build improvements
- Update msys2 version to fix windows agent build
- Add explicit exit code 1 across all the PS scripts
Version microsoft/oms:ciprod10132021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod10132021 (linux)
Version microsoft/oms:win-ciprod10132021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod10132021 (windows)
- Linux Agent
- MDSD Proxy support for non-AKS
- log rotation for mdsd log files {err,warn, info & qos}
- Onboarding status
- AAD Auth MSI changes (not usable externally yet)
- Upgrade k8s and adx go packages to fix vulnerabilities
- Fix missing telegraf metrics (TelegrafMetricsSentCount & TelegrafMetricsSendErrorCount) in mdsd route
- Improve fluentd liveness probe checks to handle both supervisor and worker process
- Fix telegraf startup issue when endpoint is unreachable
- Windows Agent
- Windows liveness probe optimization
- Common
- Add new metrics to MDM for allocatable % calculation of cpu and memory usage
- Other changes
- Helm chart updates for removal of rbac api version and deprecation of.Capabilities.KubeVersion.GitVersion to .Capabilities.KubeVersion.Version
- Updates to build and release ev2
- Scripts to collect troubleshooting logs
- Unit test tooling
- Yaml updates in parity with aks rp yaml
- upgrade golang version for windows in pipelines
- Conformance test updates
Version microsoft/oms:ciprod08052021-1 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod08052021-1 (linux)
- Bumping image tag for some tooling (no code changes except the IMAGE_TAG environment variable)
Version microsoft/oms:ciprod08052021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod08052021 (linux)
- Linux Agent
- Fix for CPU spike which occurrs at around 6.30am UTC on every day because of unattended package upgrades
- Update MDSD build which has fixes for the following issues
- Undeterministic Core dump issue because of the non 200 status code and runtime exception stack unwindings
- Reduce the verbosity of the error logs for OMS & ODS code paths.
- Increase Timeout for OMS Homing service API calls from 30s to 60s
- Fix for Azure/AKS#2457
- In replicaset, tailing of the mdsd.err log file to agent telemetry
Version microsoft/oms:win-ciprod06112021-2 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod06112021-2 (windows)
- Hotfix for fixing NODE_IP environment variable not set issue for non sidecar mode
Version microsoft/oms:ciprod06112021-1 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06112021-1 (linux)
Version microsoft/oms:win-ciprod06112021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod06112021 (windows)
- Hotfix for crash in clean_cache in in_kube_node_inventory plugin
- We didn't rebuild windows container, so the image version for windows container stays the same as last release (ciprod:win-ciprod06112021) before this hotfix
Version microsoft/oms:ciprod06112021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06112021 (linux)
Version microsoft/oms:win-ciprod06112021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod06112021 (windows)
- Linux Agent
- Removal of base omsagent dependency
- Using MDSD version 1.10.1 as base agent for all the supported LA data types
- Ruby version upgrade to 2.6 i.e. same version as windows agent
- Upgrade FluentD gem version to 1.12.2
- All the Ruby Fluentd Plugins upgraded to v1 as per Fluentd guidance
- Fluent-bit tail plugin Mem_Buf_limit is configurable via ConfigMap
- Windows Agent
- CA cert changes for airgapped clouds
- Send perf metrics to MDM from windows daemonset
- FluentD gem version upgrade from 1.10.2 to 1.12.2 to make same version as Linux Agent
- Doc updates
- README updates related to OSM preview release for Arc K8s
- README updates related to recommended alerts
Version microsoft/oms:ciprod05202021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod05202021 (linux)
- Telegraf now waits 30 seconds on startup for network connections to complete (Linux only)
- Change adding telegraf to the liveness probe reverted (Linux only)
Version microsoft/oms:ciprod05122021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod05122021 (linux)
- Upgrading oneagent to version 1.8 (only for Linux)
- Enabling oneagent for container logs for East US 2
Version microsoft/oms:ciprod04222021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod04222021 (linux)
Version microsoft/oms:win-ciprod04222021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod04222021 (windows)
- Bug fixes for metrics cpuUsagePercentage and memoryWorkingSetPercentage for windows nodes
- Added metrics for threshold violation
- Made Job completion metric configurable
- Udated default buffer sizes in fluent-bit
- Updated recommended alerts
- Fixed bug where logs written before agent starts up were not collected
- Fixed bug which kept agent logs from being rotated
- Bug fix for Windows Containerd container log collection
- Bug fixes
- Doc updates
- Minor telemetry changes
Version microsoft/oms:ciprod03262021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod03262021 (linux)
Version microsoft/oms:win-ciprod03262021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod03262021 (windows)
- Started collecting new metric - kubelet running pods count
- Onboarding script fixes to add explicit json output
- Proxy and token updates for ARC
- Doc updates for Microsoft charts repo release
- Bug fixes for trailing whitespaces in enable-monitoring.sh script
- Support for higher volume of prometheus metrics by scraping metrics from sidecar
- Update to get new version of telegraf - 1.18
- Add label and field selectors for prometheus scraping using annotations
- Support for OSM integration
- Removed wireserver calls to get CA certs since access is removed
- Added liveness timeout for exec for linux containers
Version microsoft/oms:ciprod02232021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod02232021 (linux)
Version microsoft/oms:win-ciprod02232021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod02232021 (windows)
- ContainerLogV2 schema support for LogAnalytics & ADX (not usable externally yet)
- Fix nodemetrics (cpuusageprecentage & memoryusagepercentage) metrics not flowing. This is fixed upstream for k8s versions >= 1.19.7 and >=1.20.2.
- Fix cpu & memory usage exceeded threshold container metrics not flowing when requests and/or limits were not set
- Mute some unused exceptions from going to telemetry
- Collect containerimage (repository, image & imagetag) from spec (instead of runtime)
- Add support for extension MSI for k8s arc
- Use cloud specific instrumentation keys for telemetry
- Picked up newer version for apt
- Add priority class to daemonset (in our chart only)
Version microsoft/oms:ciprod01112021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod01112021 (linux)
Version microsoft/oms:win-ciprod01112021 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod01112021 (windows)
- Fixes for Linux Agent Replicaset Pod OOMing issue
- Update fluentbit (1.14.2 to 1.6.8) for the Linux Daemonset
- Make Fluentbit settings: log_flush_interval_secs, tail_buf_chunksize_megabytes and tail_buf_maxsize_megabytes configurable via configmap
- Support for PV inventory collection
- Removal of Custom metric region check for Public cloud regions and update to use cloud environment variable to determine the custom metric support
- For daemonset pods, add the dnsconfig to use ndots: 3 from ndots:5 to optimize the number of DNS API calls made
- Fix for inconsistency in the collection container environment variables for the pods which has high number of containers
- Fix for disabling of std{out;err} log_collection_settings via configmap issue in windows daemonset
- Update to use workspace key from mount file rather than environment variable for windows daemonset agent
- Remove per container info logs in the container inventory
- Enable ADX route for windows container logs
- Remove logging to termination log in windows agent liveness probe
- Fix for duplicate windows metrics
Version microsoft/oms:ciprod10272020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod10272020 (linux)
Version microsoft/oms:win-ciprod10272020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod10272020 (windows)
- Activate oneagent in few AKS regions (koreacentral,norwayeast)
- Disable syslog
- Fix timeout for Windows daemonset liveness probe
- Make request == limit for Windows daemonset resources (cpu & memory)
- Schema v2 for container log (ADX only - applicable only for select customers for piloting)
Version microsoft/oms:ciprod10052020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod10052020 (linux)
Version microsoft/oms:win-ciprod10052020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod10052020 (windows)
- Health CRD to version v1 (from v1beta1) for k8s versions >= 1.19.0
- Collection of PV usage metrics for PVs mounted by pods (kube-system pods excluded by default)(doc-link-needed)
- Zero fill few custom metrics under a timer, also add zero filling for new PV usage metrics
- Collection of additional Kubelet metrics ('kubelet_running_pod_count','volume_manager_total_volumes','kubelet_node_config_error','process_resident_memory_bytes','process_cpu_seconds_total','kubelet_runtime_operations_total','kubelet_runtime_operations_errors_total'). This also includes updates to 'kubelet' workbook to include these new metrics
- Collection of Azure NPM (Network Policy Manager) metrics (basic & advanced. By default, NPM metrics collection is turned OFF)(doc-link-needed)
- Support log collection when docker root is changed with knode. Tracked by this issue
- Support for Pods in 'Terminating' state for nodelost scenarios
- Fix for reduction in telemetry for custom metrics ingestion failures
- Fix CPU capacity/limits metrics being 0 for Virtual nodes (VK)
- Add new custom metric regions (eastus2,westus,australiasoutheast,brazilsouth,germanywestcentral,northcentralus,switzerlandnorth)
- Enable strict SSL validation for AppInsights Ruby SDK
- Turn off custom metrics upload for unsupported cluster types
- Install CA certs from wire server for windows (in certain clouds)
Note: This agent release targetted ONLY for non-AKS clusters via Azure Monitor for containers HELM chart update
Version microsoft/oms:ciprod09162020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod09162020 (linux)
Version microsoft/oms:win-ciprod09162020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod09162020 (windows)
- Collection of Azure Network Policy Manager Basic and Advanced metrics
- Add support in Windows Agent for Container log collection of CRI runtimes such as ContainerD
- Alertable metrics support Arc K8s cluster to parity with AKS
- Support for multiple container log mount paths when docker is updated through knode
- Bug fix related to MDM telemetry
Version microsoft/oms:ciprod08072020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod08072020 (linux)
Version microsoft/oms:win-ciprod08072020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod08072020 (windows)
- Collection of KubeState metrics for deployments and HPA
- Add the Proxy support for Windows agent
- Fix for ContainerState in ContainerInventory to handle Failed state and collection of environment variables for terminated and failed containers
- Change /spec to /metrics/cadvisor endpoint to collect node capacity metrics
- Disable Health Plugin by default and can enabled via configmap
- Pin version of jq to 1.5+dfsg-2
- Bug fix for showing node as 'not ready' when there is disk pressure
- oneagent integration (disabled by default)
- Add region check before sending alertable metrics to MDM
- Telemetry fix for agent telemetry for sov. clouds
Version microsoft/oms:ciprod07152020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod07152020 (linux)
Version microsoft/oms:win-ciprod05262020-2 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05262020-2 (windows)
- Following hotfixes which are applicable only for Linux agent
- Fix the issue related to collection of multi-containers in pod for the ContainerInventory table
- Fix the containerhostname field value to have podname rather than nodename in ContainerInventory table
- Fix OOM issue during container startup if there are high number of pods or containers on the node
- Fix the ContainerName field value same as before in ContainerInventory table
- We didn't rebuild windows container, so the image version for windows container stays the same as last release (ciprod:win-ciprod05262020-2) before this hotfix
Version microsoft/oms:ciprod06302020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06302020 (linux)
Version microsoft/oms:win-ciprod05262020-2 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05262020-2 (windows)
- Hotfix for nested JSON log parsing bug (applicable only to Linux Daemonset)
- We didn't rebuild windows container, so the image version for windows container stays the same as last release (ciprod:win-ciprod05262020-2) before this hotfix
Version microsoft/oms:win-ciprod05262020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05262020-2 (windows)
- Update application insights instrumentation key for windows image to point to the production instance
Version microsoft/oms:ciprod05222020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod05222020 (linux)
Version microsoft/oms:win-ciprod05222020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:win-ciprod05222020 (windows)
- Windows Daemonset - Collection of Windows std/stderr logs
- More Alerable Metrics (going to Metrics Store/custom metrics - see Customer Impact section below for metrics list)
- Fix OOM-ing at high prometheus scrape volume
- Update fluentbit (0.14.4 to 1.4.2)
- Drop non-numeric metrics thru Telegraf
- Reduce Health exception (when API server response is nil)
- Add 'Computer' dimension to all telemetry (internal use)
- Support for specifiying HTTP & HTTPS Proxy for outbound/egress (applicable only for non-AKS clusters)
- Move to rbac.authorization.k8s.io/v1 for ClusterRole & ClusterRoleBinding
- Move to apiextensions.k8s.io/v1 for Health CRD
- Windows Logs - Customers will see agent automatically start collecting windows container STDOUT/STDERR logs sending them to same loganaytics workspace (containerlogs table)
- Alertable metrics - Customers will see the below metrics & namespaces in 'Metrics' TOC for AKS clusters
- Metrics
- diskUsagePercentage
- completedJobsCount
- oomKilledContainerCount
- podReadyPercentage
- restartingContainerCount
- cpuExceededPercentage
- memoryRssExceededPercentage
- memoryWorkingSetExceededPercentage
- Metric Namespaces
- insights.container/containers
- Metrics
- HTTP/S Proxy support - For non-AKS clusters, proxy can be configured when installing thru HELM. Please see documentation for more details
Note: This agent release targetted ONLY for non-AKS clusters via Azure Monitor for containers HELM chart update
Version microsoft/oms:ciprod04162020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod04162020
- Add support for rate limiting
- Add support for Container Runtime Interface compatible container runtime(s) like CRI-O and ContainerD
- cAdvisor APIs are used to collect the container inventory for Docker/Moby and CRI runtime K8s environments
- Based on the container runtime, corresponding container log FluentBit parser(docker/cri) selected
- Ingestion will throttle the workspaces if the agent on the cluster sending the beyond Log Analytics Workspace throttling limits i.e. 500 MB/s
- On Docker runtime environments, Inventory of the containers obtained earlier via Docker REST API. Agent now uses the cAdvisor APIs to get the inventory of the containers for Docker and non-Docker container runtime environments.
Version microsoft/oms:ciprod03022020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod03022020
- Collection of GPU metrics as InsightsMetrics
- Enable config map settings to enable collection of 'Normal' kube events
- Fix kubehealth exceptions to handle empty/nil kube api responses
- Get resource limits for health and MDM from kubelet instead of kube api
- Bug fix for windows node image collection where image name contains multiple slashes
- Exclude ARO master node for data collection
- Telemetry for kube events flushed
- Changes to support msi for mdm if service principal doesnt exist
- Changes for AKS telemetry to ping ods endpoint first and then network check
- KubeEvents bug fix for KubeEvent type
- Providing capability for customers to collect 'Normal' kube events using config map
- Metrics for GPU are collected and ingested to customers workspace if they have GPU enabled nodes
- Bug fix for windows container image collection allows customers to get the right data in the ContainerInventory table for windows containers.
Version microsoft/oms:ciprod01072020 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod01072020
- Switch between 10255(old) and 10250(new) ports for cadvisor for older and newer versions of kubernetes
- Node cpu, node memory, container cpu and container memory metrics were obtained earlier by querying kubelet readonly port(http://$NODE_IP:10255). Agent now supports getting these metrics from kubelet port(https://$NODE_IP:10250) as well. During the agent startup, it checks for connectivity to kubelet port(https://$NODE_IP:10250), and if it fails the metrics source is defaulted to readonly port(http://$NODE_IP:10255).
Version microsoft/oms:ciprod12042019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod12042019
- Fix scheduler for all input plugins
- Fix liveness probe
- Reduce chunk sizes for all fluentD buffers to support larger clusters (nodes & pods)
- Chunk Kubernetes API calls (pods,nodes,events)
- Use HTTP.start instead of HTTP.new
- Merge KubePerf into KubePods & KubeNodes
- Merge KubeServices into KubePod
- Use stream based yajl for JSON parsing
- Health - Query only kube-system pods
- Health - Use keep_if instead of select
- Container log enrichment (turned OFF by default for ContainerName & ContainerImage)
- Application Insights Telemetry - Async
- Fix metricTime to be batch time for all metric input plugins
- Close socket connections properly for DockerAPIClient
- Fix top un handled exceptions in Kubernetes API Client and pod inventory
- Fix retries, wait between retries, chunk size, thread counts to be consistent for all FluentD workflows
- Back-off for containerlog enrichment K8S API calls
- Add new regions (3) for Azure Monitor Custom metrics
- Increase the cpu(1 core) & memory(750Mi) limits for replica-set to support larger clusters (nodes & pods)
- Move to Ubuntu 18.04 LTS
- Support for Kubernetes 1.16
- Use ifconfig for detecting network connectivity issues
- Collect eventType != Normal
Version microsoft/oms:ciprod10112019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod10112019
- Update prometheus config scraping capability to restrict collecting metrics from pods in specific namespaces.
- Feature to send custom configuration/prometheus scrape errors to KubeMonAgentEvents table in customer's workspace.
- Bug fix to collect data for init containers for Container Logs, KubePodInventory and Perf.
- Bug fix for empty array being a valid setting in custom config in configmap.
- Restrict kubelet_docker_operations and kubelet_docker_operations_errors to create_containers, remove_containers and pull_image operations.
- Fix top exceptions in telemetry
Version microsoft/oms:ciprod08222019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod08222019
- Cluster Health Private Preview based on config map setting
- Update resource requests for replicaset to 110m and 250Mi
- Update custom metrics supported regions
- Fix for promethus config map telemetry
- Telemetry for controller kind
- Update url to use one of the whitelisted urls for cp monitor telemetry
- Configmap with clusterid for AKS to be used by Application Insights
Version microsoft/oms:ciprod07092019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod07092019
- Prometheus custom metric collection using config map allowing omsagent to
- Scrape metrics from user defined urls
- Scrape kubernetes pods with prometheus annotations
- Scrape metrics from kubernetes services
- Exception fixes in daemonset and replicaset
- Container Inventory plugin changes to get image id from the repo digest and populate repository for image with only image digest
- Remove telegraf errors from being sent to ApplicationInsights and instead log it to stderr to provide visibility for customers
- Bug fixes for region names with spaces being processed incorrectly while sending mdm metrics
- Add log size in telemetry
- Remove buffer chunk size and buffer max size from fluentbit configuration
Version microsoft/oms:ciprod06142019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod06142019
- MDM pod metrics bug fixes - MDM rejecting pod metrics due to nodename or controllername dimensions being empty
- Prometheus metrics collection by default in every node for kubelet docker operations and kubelet docker operation errors
- Telegraf metric collection for diskio and networkio metrics
- Agent Configuration/ Settings for data collection
- Cluster level log collection enable/disable option
- Ability to enable/disable stdout and/or stderr logs collection per namespace
- Cluster level environment variable collection enable/disable option
- Config file version & config schema version
- Pod annotation for supported config schema version(s)
- Log collection optimization/tuning for better performance
- Derive k8s namespaces from log file name (instead of making call to k8s api service)
- Do not tail log files for containers in the excluded namespace list (if excluded both in stdout & stderr)
- Limit buffer size to 1M and flush logs more frequently [every 10 secs (instead of 30 secs)]
- Tuning of several other fluent bit settings
- Increase requests
- Replica set memory request by 75M (100M to 175M)
- Daemonset CPU request by 25m (50m to 75m)
- Will be pushing image only to MCR ( no more Docker) starting this release. AKS-engine will also start to pull our agent image from MCR
Version microsoft/oms:ciprod043232019 Version mcr.microsoft.com/azuremonitor/containerinsights/ciprod:ciprod04232019
- Windows node monitoring (metrics & inventory)
- Telegraf integration (Telegraf metrics to LogAnalytics)
- Node Disk usage metrics (used, free, used%) as InsightsMetrics
- Resource stamping for all types (inventory, metrics (perf), metrics (InsightsMetrics), logs) [Applicable only for AKS clusters]
- Upped daemonset memory request (not limit) from 150Mi to 225 Mi
- Added liveness probe for fluentbit
- Fix for MDM filter plugin when kubeapi returns non-200 response
- Fix for closing response.Body in outoms
- Update Mem_Buf_Limit to 5m for fluentbit
- Tail only files that were modified since 5 minutes
- Remove some unwanted logs that are chatty in outoms
- Fix for MDM disablement for AKS-Engine
- Fix for Pod count metric (same as container count) in MDM
- Container logs enrichment optimization
- Get container meta data only for containers in current node (vs cluster before)
- Update fluent bit 0.13.7 => 0.14.4
- This fixes the escaping issue in the container logs
- Mooncake cloud support for agent (AKS only)
- Ability to disable agent telemetry
- Ability to onboard and ingest to mooncake cloud
- Add & populate 'ContainerStatusReason' column to KubePodInventory
- Alertable (custom) metrics (to AzureMonitor - only for AKS clusters)
- Cpuusagenanocores & % metric
- MemoryWorkingsetBytes & % metric
- MemoryRssBytes & % metric
- Podcount by node, phase & namespace metric
- Nodecount metric
- ContainerNodeInventory_CL to fixed type
- Omsagent - 1.8.1.256 (nov 2018 release)
- Persist fluentbit state between container restarts
- Populate 'TimeOfCommand' for agent ingest time for container logs
- Get node cpu usage from cpuusagenanoseconds (and convert to cpuusgaenanocores)
- Container Node Inventory - move to fluentD from OMI
- Mount docker.sock (Daemon set) as /var/run/host
- Add omsagent user to docker group
- Move to fixed type for kubeevents & kubeservices
- Disable collecting ENV for our oms agent container (daemonset & replicaset)
- Disable container inventory collection for 'sandbox' containers & non kubernetes managed containers
- Agent telemetry - ContainerLogsAgentSideLatencyMs
- Agent telemetry - PodCount
- Agent telemetry - ControllerCount
- Agent telemetry - K8S Version
- Agent telemetry - NodeCoreCapacity
- Agent telemetry - NodeMemoryCapacity
- Agent telemetry - KubeEvents (exceptions)
- Agent telemetry - Kubenodes (exceptions)
- Agent telemetry - kubepods (exceptions)
- Agent telemetry - kubeservices (exceptions)
- Agent telemetry - Daemonset , Replicaset as dimensions (bug fix)
- Disable Container Image inventory workflow
- Kube_Events memory leak fix for replica-set
- Timeout (30 secs) for outOMS
- Reduce critical lock duration for quicker log processing (for log enrichment)
- Disable OMI based Container Inventory workflow to fluentD based Container Inventory
- Moby support for the new Container Inventory workflow
- Ability to disable environment variables collection by individual container
- Bugfix - No inventory data due to container status(es) not available
- Agent telemetry cpu usage & memory usage (for DaemonSet and ReplicaSet)
- Agent telemetry - log generation rate
- Agent telemetry - container count per node
- Agent telemetry - collect container logs from agent (DaemonSet and ReplicaSet) as AI trace
- Agent telemetry - errors/exceptions for Container Inventory workflow
- Agent telemetry - Container Inventory Heartbeat
- Fix for containerID being 00000-00000-00000
- Move from fluentD to fluentbit for container log collection
- Seg fault fixes in json parsing for container inventory & container image inventory
- Telemetry enablement
- Remove ContainerPerf, ContainerServiceLog, ContainerProcess fluentd-->OMI workflows
- Update log level for all fluentD based workflows
- Changes for node lost scenario (roll-up pod & container statuses as Unknown)
- Discover unscheduled pods
- KubeNodeInventory - delimit multiple true node conditions for node status
- UTF Encoding support for container logs
- Container environment variable truncated to 200K
- Handle json parsing errors for OMI provider for docker
- Test mode enablement for ACS-engine testing
- Latest OMS agent (1.6.0-163)
- Latest OMI (1.4.2.5)
- Remove node-0 dependency
- Remove passing WSID & Key as environment variables and pass them as kubernetes secret (for non-AKS; we already pass them as secret for AKS)
- Please note that if you are manually deploying thru yaml you need to -
- Provide workspaceid & key as base64 encoded strings with in double quotes (.yaml has comments to do so as well)
- Provide cluster name twice (for each container – daemonset & replicaset)
- Kubernetes RBAC enablement
- Latest released omsagent (1.6.0-42)
- Bug fix so that we do not collect kube-system namespace container logs when kube api calls fail occasionally (Bug #215107)
- .yaml changes (for RBAC)