
multiline parser stops working after Pod restarts #5256

Closed
irizzant opened this issue Apr 7, 2022 · 14 comments
Labels
waiting-for-user Waiting for more information, tests or requested changes

Comments

@irizzant

irizzant commented Apr 7, 2022

Bug Report

Describe the bug
I have Fluent Bit deployed in my Kubernetes cluster, configured to join Java stack traces using the multiline parser.

Here is the configuration I use:

[SERVICE]
    Daemon Off
    Flush 1
    Log_Level info
    Parsers_File parsers.conf
    Parsers_File custom_parsers.conf
    HTTP_Server On
    HTTP_Listen 0.0.0.0
    HTTP_Port 2020
    Health_Check On
    storage.path /tmp/flb-storage/
    storage.backlog.mem_limit 500M

[INPUT]
    Name tail
    Path /var/log/containers/*.log
    Exclude_Path /var/log/containers/*fluent*.log
    multiline.parser cri
    Tag kube.*
    Mem_Buf_Limit 50MB
    Buffer_Chunk_Size 100MB
    Buffer_Max_Size 200MB
    Skip_Long_Lines On
    storage.type filesystem
[INPUT]
    Name systemd
    Tag host.*
    Systemd_Filter _SYSTEMD_UNIT=kubelet.service
    Read_From_Tail On

[FILTER]
    name                  multiline
    match                 kube.*
    buffer                on
    multiline.key_content log
    multiline.parser      multiline-java
    emitter_storage.type  filesystem
    emitter_mem_buf_limit 200MB
[FILTER]
    Name kubernetes
    Match kube.*
    Merge_Log On
    Keep_Log Off
    K8S-Logging.Parser On
    K8S-Logging.Exclude On
[FILTER]
    Name nest
    Match kube.*
    Operation lift
    Wildcard pod_name*
    Wildcard namespace_name*
    Wildcard host*
    Nested_under kubernetes
[FILTER]
    Name record_modifier
    Match kube.*
    Allowlist_key log
    Allowlist_key pod_name
    Allowlist_key namespace_name
    Allowlist_key host    

[OUTPUT]
    Name stdout
    Match *

Parser configuration:

[PARSER]
    Name docker_no_time
    Format json
    Time_Keep Off
    Time_Key time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
[MULTILINE_PARSER]
    name          multiline-java
    type          regex
    flush_timeout 1000
    #
    # Regex rules for multiline parsing
    # ---------------------------------
    #
    # configuration hints:
    #
    #  - first state always has the name: start_state
    #  - every field in the rule must be inside double quotes
    #
    # rules |   state name  | regex pattern                  | next state
    # ------|---------------|--------------------------------------------
    rule      "start_state"   "/(\d{8}_\d{6}_\d{3})(.*)/"      "cont"
    rule      "cont"          "/^[^\d].*/"                      "cont"

When the Pod first starts, everything seems to be working fine:

[58] kube.var.log.containers.wms-5d8dfff84-94rkr_wms_wms-94af0bc11387e5980942ee6da5fe67f4d8c05a4d296e5a0b3a687b436299d6a4.log: [1649318345.423328754, {"log"=>"20220407_095905_418 ERROR #[[ServerService Thread Pool -- 622@srv=wms-5d8dfff84-94rkr]]# #[[it.sdb.jee.init.modules.ModuleRTC]]# Impossibile avviare integrazione con RTC: java.lang.RuntimeException: Unable to lookup service
        at it.sdb.apps.crmtrk.eng.rtc.RTCServiceFactories.getRtcConnection(RTCServiceFactories.java:33) [IntegrationService-RTC-API-7.0.1-SNAPSHOT.jar:]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_322]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_322]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_322]
        at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_322]
... 

But if I restart the Pod, the multiline parser stops working:

[55] kube.var.log.containers.wms-859b94b47-vpbqm_wms_wms-1ae71c0dd10ed85802c7a3a18223b27c4b87db90ca46444e55d9fe7c9dc01cf3.log: [1649318612.231703657, {"log"=>"20220407_100332_226 ERROR #[[ServerService Thread Pool -- 288@srv=wms-859b94b47-vpbqm]]# #[[it.sdb.jee.init.modules.ModuleRTC]]# Impossibile avviare integrazione con RTC: java.lang.RuntimeException: Unable to lookup service", "pod_name"=>"wms-859b94b47-vpbqm", "namespace_name"=>"wms", "host"=>"k3d-test-server-0"}]
[56] kube.var.log.containers.wms-859b94b47-vpbqm_wms_wms-1ae71c0dd10ed85802c7a3a18223b27c4b87db90ca46444e55d9fe7c9dc01cf3.log: [1649318612.231735453, {"log"=>" at it.sdb.apps.crmtrk.eng.rtc.RTCServiceFactories.getRtcConnection(RTCServiceFactories.java:33) [IntegrationService-RTC-API-7.0.1-SNAPSHOT.jar:]", "pod_name"=>"wms-859b94b47-vpbqm", "namespace_name"=>"wms", "host"=>"k3d-test-server-0"}]
[57] kube.var.log.containers.wms-859b94b47-vpbqm_wms_wms-1ae71c0dd10ed85802c7a3a18223b27c4b87db90ca46444e55d9fe7c9dc01cf3.log: [1649318612.231739087, {"log"=>" at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_322]", "pod_name"=>"wms-859b94b47-vpbqm", "namespace_name"=>"wms", "host"=>"k3d-test-server-0"}]

To Reproduce

Steps to reproduce the problem:

  • deploy Fluent Bit with the Helm chart, using this values file: fluentbit.yaml

Expected behavior

The multiline parser should keep working after a Pod restart.

Your Environment

  • Version used: 1.8.14
  • Configuration: see above
  • Environment name and version: k3d v5.2.1, k3s v1.21.7-k3s1 (default)
@chenlingmin

chenlingmin commented Apr 8, 2022

#5245
I have the same problem.

You can also try scaling up the number of replicas of the Pod instead of restarting it; I expect you would get the same result as above.

@irizzant
Author

Scaling the replicas doesn't change the end result.

@trallnag

I have the same issue and I can reliably reproduce the problem.

  1. Pod x is running
  2. Fluent Bit is deployed
  3. Logs of pod x are merged
  4. Pod x is deleted
  5. The Deployment starts a new pod x
  6. Logs of pod x are no longer merged

Your Environment

  • Chart version 0.19.23
  • Image aws-for-fluent-bit version 2.23.3 (basically 1.8.15)

@lecaros
Contributor

lecaros commented Apr 22, 2022

Hi,
Do you have reproduction steps that we can simply run to replicate this?
Have you used the built-in java parser?
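
For reference, a minimal sketch of that variant (assuming the same tail input, tag, and log key as in your config; java is the name of the multiline parser that ships built in with Fluent Bit):

[FILTER]
    name                  multiline
    match                 kube.*
    multiline.key_content log
    multiline.parser      java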

@lecaros lecaros added waiting-for-user Waiting for more information, tests or requested changes and removed status: waiting-for-triage labels Apr 22, 2022
@irizzant
Author

@lecaros I've already added the detailed configuration and example inputs here; what else do you need as a reproducer?

@trallnag

@lecaros, yes, I have used the built-in java parser.

@dalcouffe

I have this exact same issue running Fluent Bit version 1.9.3.

@ehazan

ehazan commented May 8, 2022

Same issue for me on 1.9.3. Fluent Bit starts up fine, but when I scale the Deployment replicas to 0 and then back to 1, it breaks: the multiline parsing stops functioning.

@milen-simeonov

Same issue for me on 1.9.3 and 1.9.4.

@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@github-actions github-actions bot added the Stale label Sep 29, 2022
@irizzant
Author

irizzant commented Sep 29, 2022

Not stale!

@github-actions github-actions bot removed the Stale label Sep 30, 2022
@kwkgaya

kwkgaya commented Oct 7, 2022

Can you re-test with 1.9.6? It seems the issue might be fixed.

@milen-simeonov

Yes, I can confirm: the issue has been fixed since 1.9.6.

@irizzant
Author

irizzant commented Oct 7, 2022

Closing the issue since it's fixed in 1.9.6.

@irizzant irizzant closed this as completed Oct 7, 2022