
Fluent-bit v1.8.15 and v1.9.3 output azure on Windows fails to connect to Log Analytics #5381

Closed
desek opened this issue Apr 28, 2022 · 8 comments


desek commented Apr 28, 2022

Bug Report

Describe the bug
When running fluent-bit 1.8.15 or 1.9.3 in Windows containers (Windows Server 2019 Datacenter 10.0.17763.2686, containerd://1.6.1) with Kubernetes 1.22.8, the azure output plugin fails with a connection error and does not send data to Log Analytics.

The same error appears with both the servercore and nanoserver images, and when running as either the ContainerUser or the ContainerAdministrator user.

To Reproduce

  • Example log message:
[2022/04/28 15:16:25] [debug] [input chunk] update output instances with new chunk size diff=1027
[2022/04/28 15:16:25] [debug] [task] created task=0000029E74F3BF80 id=0 OK
[2022/04/28 15:16:25] [error] [tls] error: unexpected EOF
[2022/04/28 15:16:25] [debug] [upstream] connection #1116 failed to 3863cb67-6c46-4780-854d-5737842a4d18.ods.opinsights.azure.com:443
[2022/04/28 15:16:25] [debug] [out flush] cb_destroy coro_id=0
[2022/04/28 15:16:25] [debug] [retry] new retry created for task_id=0 attempts=1
[2022/04/28 15:16:25] [ warn] [engine] failed to flush chunk '5548-1651158984.511061600.flb', retry in 6 seconds: task_id=0, input=tail.0 > output=azure.0 (out_id=0)
[2022/04/28 15:16:26] [debug] [input chunk] update output instances with new chunk size diff=1027
[2022/04/28 15:16:26] [debug] [task] created task=0000029E74F3B180 id=1 OK
[2022/04/28 15:16:26] [error] [tls] error: unexpected EOF
[2022/04/28 15:16:26] [debug] [upstream] connection #1152 failed to 3863cb67-6c46-4780-854d-5737842a4d18.ods.opinsights.azure.com:443
[2022/04/28 15:16:26] [debug] [out flush] cb_destroy coro_id=1
[2022/04/28 15:16:26] [debug] [retry] new retry created for task_id=1 attempts=1
[2022/04/28 15:16:26] [ warn] [engine] failed to flush chunk '5548-1651158986.5265500.flb', retry in 9 seconds: task_id=1, input=tail.0 > output=azure.0 (out_id=0)
[2022/04/28 15:16:27] [debug] [input chunk] update output instances with new chunk size diff=1027
[2022/04/28 15:16:27] [debug] [task] created task=0000029E74F3BD00 id=2 OK
[2022/04/28 15:16:27] [error] [tls] error: unexpected EOF
  • Steps to reproduce the problem:
  1. Start a plain windows server core or nanoserver container
  2. Download and install fluent-bit from zip-file
  3. Run fluent-bit with the configuration below (a minimal invocation is sketched after these steps)
  4. Error appears
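
For step 3, a minimal invocation looks roughly like the following, assuming the zip was extracted to C:\fluent-bit and the configuration from Additional context was saved as fluent-bit.conf (the paths are illustrative, not taken from the report):

    C:\fluent-bit\bin\fluent-bit.exe -c C:\fluent-bit\conf\fluent-bit.conf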

Expected behavior

Output plugin should successfully send data to Log Analytics.

Your Environment

  • Version used: 1.8.15 and 1.9.3
  • Configuration: see Additional context below
  • Environment name and version (e.g. Kubernetes? What version?):
    • Kubernetes 1.22.8
    • Windows Server 2019 Datacenter 10.0.17763.2686 containerd://1.6.1
  • Server type and version: N/A
  • Operating System and version:
    • mcr.microsoft.com/windows/nanoserver:1809 runtime container
    • mcr.microsoft.com/windows/servercore:1809 runtime container
  • Filters and plugins:
    • Input: tail
    • Filter: kubernetes
    • Output: azure
    • Parser: cri

Additional context
Config:

    [SERVICE]
        Flush         1
        Log_Level     trace
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              C:\\var\\log\\containers\\fluent-bit*.log
        Parser            cri
        DB                C:\\var\\flb\\tail_cri.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Kube_CA_File        C:\\var\\run\\secrets\\kubernetes.io\\serviceaccount\\ca.crt
        Kube_Token_File     C:\\var\\run\\secrets\\kubernetes.io\\serviceaccount\\token
        Kube_Tag_Prefix     kube.C.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
    [OUTPUT]
        Name        azure
        Match       *
        tls         on
        tls.debug   4
        Customer_ID 3863cb67-6c46-4780-854d-5737842a4d18
        Shared_Key  <redacted>
    [PARSER]
        # http://rubular.com/r/tjUt3Awgg4
        Name cri
        Format regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
@bryangardner

I am seeing a similar issue when trying to use the cloudwatch_logs output in a Windows-based Fluent Bit container on Kubernetes. I see the same [tls] error: unexpected EOF when trying to connect to AWS STS and CloudWatch.

I suspect something is going wrong when trying to negotiate TLS in Windows containers.

FluentBit Version: 1.9.1
Windows OS (K8s node): Server 2019
Image base: mcr.microsoft.com/windows/servercore:ltsc2019


desek commented May 3, 2022

As a workaround I configured the output with tls.verify Off. Not optimal, but it gets the job done for now.
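
Applied to the [OUTPUT] section from the configuration above, the workaround amounts to the sketch below. Note that tls.verify Off skips certificate verification of the Log Analytics endpoint entirely, so it weakens transport security and is best treated as temporary:

    [OUTPUT]
        Name        azure
        Match       *
        tls         on
        tls.debug   4
        # workaround: skip certificate verification (reduces security)
        tls.verify  off
        Customer_ID 3863cb67-6c46-4780-854d-5737842a4d18
        Shared_Key  <redacted>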


github-actions bot commented Aug 2, 2022

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions bot added the Stale label Aug 2, 2022
@bryangardner

Please remove stale

github-actions bot removed the Stale label Aug 5, 2022

xulfiqar1 commented Aug 11, 2022

> As a workaround I configured the output with tls.verify Off. Not optimal, but it gets the job done for now.

@desek and @bryangardner I am facing a similar issue on a Windows node in an EKS cluster. The Fluent Bit logs are similar to #4727.

I tried 'tls.verify Off' in the output but the errors persist. Any suggestions for working around this?

{"log":"[2022/08/11 07:18:33] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with Kubelet...\r\n","stream":"stderr","time":"2022-08-11T07:18:33.5290575Z"}
{"log":"[2022/08/11 07:18:33] [debug] [filter:kubernetes:kubernetes.0] Send out request to Kubelet for pods information.\r\n","stream":"stderr","time":"2022-08-11T07:18:33.5296845Z"}
{"log":"[2022/08/11 07:18:34] [error] [tls] C:\src\src\tls\mbedtls.c:390 NET - Sending information through the socket failed\r\n","stream":"stderr","time":"2022-08-11T07:18:34.5385134Z"}
{"log":"[2022/08/11 07:18:34] [debug] [upstream] connection #792 failed to 127.0.0.1:10250\r\n","stream":"stderr","time":"2022-08-11T07:18:34.5385134Z"}
{"log":"[2022/08/11 07:18:34] [error] [filter:kubernetes:kubernetes.0] kubelet upstream connection error\r\n","stream":"stderr","time":"2022-08-11T07:18:34.5385134Z"}
{"log":"[2022/08/11 07:18:34] [ warn] [filter:kubernetes:kubernetes.0] could not get meta for POD fluent-bit-windows-92pjh\r\n","stream":"stderr","time":"2022-08-11T07:18:34.5385134Z"}
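
Note that the failing upstream in this log is the kubelet at 127.0.0.1:10250, i.e. the connection made by the kubernetes filter itself, so tls.verify Off on the [OUTPUT] block does not apply to it. A hedged sketch of relaxing verification on the filter instead is below; it assumes the filter's own tls.* options cover the kubelet connection, and Use_Kubelet / Kubelet_Port only reflect what the log implies, not a confirmed configuration:

    [FILTER]
        Name                kubernetes
        Match               kube.*
        # the log above shows kubelet lookups on port 10250
        Use_Kubelet         On
        Kubelet_Port        10250
        # assumption: as with the output workaround, disabling verification is
        # only a diagnostic step and reduces security
        tls.verify          Off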

@github-actions

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions bot added the Stale label Nov 10, 2022
@github-actions

This issue was closed because it has been stalled for 5 days with no activity.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Nov 16, 2022
@thebridge90

Please re-open; I am seeing the same issue when using 2.0.5 on Windows 2022 nodes in AKS.
Disabling tls.verify works and logs are pushed to LAW (the Log Analytics workspace).

Kubernetes version: 1.24.6
Node image: AKSWindows-2022-containerd-20348.1131.221019
image: ghcr.io/fluent/fluent-bit/staging:windows-2022-2.0.5
