Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fluent-bit: JSON records are incorrectly un-escaped before writing to Loki #3245

Closed
sbaier1 opened this issue Jan 27, 2021 · 1 comment · Fixed by #3240
Closed

fluent-bit: JSON records are incorrectly un-escaped before writing to Loki #3245

sbaier1 opened this issue Jan 27, 2021 · 1 comment · Fixed by #3240

Comments

@sbaier1
Copy link
Contributor

sbaier1 commented Jan 27, 2021

Describe the bug
When wrapping log records with additional labels (for example when adding all k8s metadata to the record or adding other information using FLB filters), the resulting log record JSON is no longer valid, as newlines within the log field are being replaced with the corresponding control character, yielding invalid JSON and thus no longer being process-able.

This appears to be a regression due to the fix introduced for this issue: #2246

The original poster doesn't provide a proper reproducer as their fluent-bit log record output already passed through the kubernetes filter and has been parsed into labels, but from the logs and screenshots they show i am guessing they are using JSON logging and expecting the parsed logs to be displayed in un-escaped manner directly. Judging from this, here's an approximate reproducer for that original log line:

apiVersion: v1
kind: Pod
metadata:
  labels:
    run: busybox
  name: test-logger
spec:
  containers:
  - image: busybox
    name: test-logger
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
    command:
      - /bin/sh
      - -c
    args:
      - |
        while true; do
          echo '{"app":"spring.application.name_IS_UNDEFINED","ts":"2020-06-21T11:45:31.031Z","logger":"org.zalando.problem.spring.web.advice.AdviceTrait","level":"ERROR","class":"org.zalando.problem.spring.web.advice.AdviceTrait","method":"log","file":"AdviceTrait.java","line":246,"thread":"http-nio-8088-exec-3","stack":"com.intellecteu.miml.ug.annotationsui.controller.errors.InternalServerErrorException: Could not fetch collections from database\n            at com.intellecteu.miml.ug.annotationsui.utils.IOUtils.convertStreamIntoCollection(IOUtils.java: 91)\n            at com.intellecteu.miml.ug.annotationsui.service.GridFsServiceImpl.lambda$fetchCollections$1(GridFsServiceImpl.java: 100)\n            at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java: 193)\n            at java.util.Iterator.forEachRemaining(Iterator.java: 116)\n            at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java: 1801)\n            at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java: 482)\n            at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java: 472)\n            at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java: 708)\n            at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java: 234)\n            at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java: 499)\n            at com.intellecteu.miml.ug.annotationsui.service.CollectionServiceImpl.getCollections(CollectionServiceImpl.java: 167)"}'
          sleep 1
        done
  dnsPolicy: ClusterFirst
  restartPolicy: Never

The log generated by this does indeed print \n characters when reading them from Loki without the fix proposed in that issue, but this is actually correct. After all we're putting JSON into Loki, within which newlines need to be escaped. I'm not sure why this fix was implemented, when the actual proper way to process the log record the poster proposes is something like this IMHO, to parse and then implicitly unescape the JSON when reading it with a query:
{instance="test-logger"} | json | stack != "" | line_format "{{ .stack }}"
(or, alternatively: Pretty printing the nested JSON in grafana's explore tab somehow. The fix surely can't be breaking JSON encoding when writing it to Loki IMO.)

To Reproduce
From the PR:
An easy reproducer would be:

  1. Deploy fluent-bit using the community chart, and keep the kubernetes key by removing it from this list: https://github.com/grafana/helm-charts/blob/main/charts/fluent-bit/values.yaml#L20
  2. Query the generated logs in Grafana explore tab and pass them through the json step
  3. Many of the log lines (that contain newlines) will fail to be parsed (returning a JSONParserErr) because they are, in fact, not stored in Loki as valid JSON anymore, as the stdout of the container has been unescaped yielding an invalid JSON string.

Expected behavior
All log records that contain multiple fields (either because the log emitted from the application is JSON or because high-cardinality "parsed fields"(?) are added as wrapped information in a fluent-bit stage to be parsed only when reading the logs instead of having them indexed) should be possible to parse by LogQL's json stage. With this misbehavior, it is impossible to work with container logs that contain wrapped meta-information.

The only reason the current default config of the fluent-bit chart works is because the kubernetes and stream labels are dropped, leading to no parsed fields being attached to the records and the record being put as plaintext as there is no wrapping JSON.

Environment:

  • Infrastructure: K8s, on EKS, applications that generate multi-line logs (containerd's line breaks at the end of each message suffice)
  • Deployment tool: helm, Fluent bit chart but with the grafana/fluent-bit-plugin-loki image, version 2.1.0

Screenshots, Promtail config, or terminal output
A screenshot of a broken log record as it is returned in the .data.result[].values[] array of records, unescaped from Loki's JSON wrapping, showing the reason why Loki's JSON parser is throwing an error: an actual newline control character is printed in the log, thus breaking the JSON encoding.
Bildschirmfoto 2021-01-27 um 14 17 07

@sbaier1 sbaier1 changed the title fluent-bit: JSON records are incorrectly un-escaped fluent-bit: JSON records are incorrectly un-escaped before writing to Loki Jan 27, 2021
@sbaier1
Copy link
Contributor Author

sbaier1 commented Jan 27, 2021

@cyriltovena would love if you could explain if i'm misunderstanding something here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant