parsers: Add containerd log parser #873

Closed
wants to merge 1 commit into from

karlskewes commented Oct 31, 2018

Kubernetes 1.11 with the containerd runtime.

Example Fluent Bit log lines on disk:

root@fluent-bit-29xjh:/# more /var/log/containers/fluent-bit-29xjh_dev-eeva-logging_fluent-bit-0db128860717a93c7d219425c9d49f4445c2d6c6e5fed3f4dc4298f03c582378.log 
2018-10-31T21:54:45.45487617Z stderr F Fluent-Bit v0.14.4
2018-10-31T21:54:45.454938595Z stderr F Copyright (C) Treasure Data
2018-10-31T21:54:45.454946545Z stderr F 
2018-10-31T21:54:45.455512949Z stderr F [2018/10/31 21:54:45] [ info] [engine] started (pid=1)
2018-10-31T21:54:48.837184585Z stderr F [2018/10/31 21:54:48] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
2018-10-31T21:54:48.837218858Z stderr F [2018/10/31 21:54:48] [ info] [filter_kube] local POD info OK
2018-10-31T21:54:48.837226848Z stderr F [2018/10/31 21:54:48] [ info] [filter_kube] testing connectivity with API server...
2018-10-31T21:54:49.036009069Z stderr F [2018/10/31 21:54:49] [ info] [filter_kube] API server connectivity OK
2018-10-31T21:54:49.036783868Z stderr F [2018/10/31 21:54:49] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020

Example Elasticsearch JSON document:

{
  "_index": "logstash-2018.10.31",
  "_type": "flb_type",
  "_id": "x3AjzGYBOSwObgwf-NGc",
  "_score": 1,
  "_source": {
    "time": "2018-10-31T21:54:45.454Z",
    "stream": "stderr",
    "partial_flag": "F",
    "message": "\u001b[1mFluent-Bit v0.14.4\u001b[0m",
    "kubernetes": {
      "pod_name": "fluent-bit-29xjh",
      "namespace_name": "dev-logging",
      "pod_id": "91a8b310-dd57-11e8-a780-b637f5d65632",
      "labels": {
        "controller-revision-hash": "1952590354",
        "k8s-app": "fluent-bit-logging",
        "kubernetes_io/cluster-service": "true",
        "pod-template-generation": "1",
        "version": "v1"
      },
      "annotations": {
        "prometheus_io/path": "/api/v1/metrics/prometheus",
        "prometheus_io/port": "2020",
        "prometheus_io/scrape": "true"
      },
      "host": "10.63.56.200",
      "container_name": "fluent-bit",
      "docker_id": "0db128860717a93c7d219425c9d49f4445c2d6c6e5fed3f4dc4298f03c582378"
    }
  },
  "fields": {
    "time": [
      "2018-10-31T21:54:45.454Z"
    ]
  }
}

@karlskewes
Author

karlskewes commented Oct 31, 2018

This could be improved by handling multi-line messages (the P/F flag?), UTF-8, or other cases.
See the example at the bottom of https://github.com/fluent-plugins-nursery/fluent-plugin-concat/blob/master/README.md
and the discussion in kubernetes/kubernetes#44976 (comment)

@StevenACoffman
Contributor

StevenACoffman commented Nov 1, 2018

@kskewes The containerd log format appears to be the same as CRI-O's?

The current fluent-bit crio parser does not support the tag as a dedicated field, as the CRI log format changed a month after it was merged.

Please see #876 with an identical goal.

I wonder if the name of crio parser should be altered?

It appears that original CRI-O example logs looked like this:

2016-02-17T00:04:05.931087621Z stdout [info:2016-02-16T16:04:05.930-08:00] Some log text here

Since that time, they have been revised (note the F tag difference):

2016-02-17T00:04:05.931087621Z stdout F [info:2016-02-16T16:04:05.930-08:00] Some log text here

For reference, I am looking at this file: fluentd-es-configmap.yaml

The current crio parser uses this regex:

/^(?<time>.+)\b(?<stream>stdout|stderr)\b(?<log>.*)$/

You have proposed this regex in #873:

^(?<time>.+) (?<stream>stdout|stderr) (?<partialflag>F|P) (?<message>.*)$

In #876 the regex proposal is:

/^(?<time>.+)\b(?<stream>stdout|stderr)( (?<logtag>[A-Z]))?\b(?<log>.*)$/

If you look at the regex fluentd uses here (line 128), it is currently:

/^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
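
For comparison, the four candidates can be run against one of the sample containerd lines from the PR description. This is only a sketch in Python, whose re module spells named groups (?P<name>...) rather than the (?<name>...) form used in this thread, so the patterns below are transliterated. The CANDIDATES labels and the parse_all helper are made up for illustration.

```python
import re

# Sample line copied from the PR description above.
LINE = (
    "2018-10-31T21:54:45.455512949Z stderr F "
    "[2018/10/31 21:54:45] [ info] [engine] started (pid=1)"
)

# The four patterns from this thread, with (?<name>) rewritten as (?P<name>)
# because Python's re module requires that spelling. Labels are illustrative.
CANDIDATES = {
    "current crio": r"^(?P<time>.+)\b(?P<stream>stdout|stderr)\b(?P<log>.*)$",
    "#873": r"^(?P<time>.+) (?P<stream>stdout|stderr) (?P<partialflag>F|P) (?P<message>.*)$",
    "#876": r"^(?P<time>.+)\b(?P<stream>stdout|stderr)( (?P<logtag>[A-Z]))?\b(?P<log>.*)$",
    "fluentd": r"^(?P<time>.+) (?P<stream>stdout|stderr) [^ ]* (?P<log>.*)$",
}


def parse_all(line):
    """Return {label: named groups or None} for every candidate pattern."""
    results = {}
    for label, pattern in CANDIDATES.items():
        match = re.match(pattern, line)
        results[label] = match.groupdict() if match else None
    return results


if __name__ == "__main__":
    for label, fields in parse_all(LINE).items():
        print(label, fields)
```

On this input, the two \b-based patterns leave a trailing space on time and a leading space on log, while the two space-delimited patterns capture both fields without surrounding whitespace.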

Can you speak to which would best meet your understanding, use case, and experience?

@karlskewes
Author

Timing with #876 was uncanny.

Good work putting all that together.

Here is what I found and considered:

  • The existing crio parser, but that is for a different runtime.
  • The \b separator didn't work for me.
  • %:z for Zulu time? That didn't work for me either.
  • I wanted to add the tag as a field, though I didn't realise it might be extended.
  • I prefer message to log as the key. It matches Elastic and other parsers, but it may be better to follow the runtime or Kubernetes naming convention.
  • The fluentd implementation as a possible base, especially regarding the tag.
  • That old Kubernetes issue.

It would be nice to use just one parser for Kubernetes if all runtimes now agree on the format.

@StevenACoffman
Contributor

StevenACoffman commented Nov 2, 2018

  • I like your use of message; it is more consistent.
  • If we are not using the partial flag and tags to stitch multi-line messages together, we should probably discard them with the [^ ]* from the fluentd regex.
  • \b is incorrect because it also matches at commas and other non-word character boundaries.
  • I'm not sure anyone is using the current crio parser, so we can probably just update it.

So how about:

^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<message>.*)$
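
As a quick sanity check, the proposed pattern can be tried against the sample lines from the PR description. The sketch below uses Python, so the named groups are rewritten as (?P<name>...), and the CRI_PATTERN and parse_line names are hypothetical.

```python
import re

# Proposed pattern from the comment above, transliterated to Python's
# (?P<name>...) named-group syntax.
CRI_PATTERN = re.compile(
    r"^(?P<time>.+) (?P<stream>stdout|stderr) [^ ]* (?P<message>.*)$"
)

# Lines copied from the PR description; the third has an empty message.
SAMPLE = [
    "2018-10-31T21:54:45.45487617Z stderr F Fluent-Bit v0.14.4",
    "2018-10-31T21:54:45.454938595Z stderr F Copyright (C) Treasure Data",
    "2018-10-31T21:54:45.454946545Z stderr F ",
]


def parse_line(line):
    """Return a dict with time, stream and message, or None if no match."""
    match = CRI_PATTERN.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for line in SAMPLE:
        print(parse_line(line))
```

One caveat worth testing before relying on it: because time is matched with a greedy .+, a message that itself contains " stderr " or " stdout " followed by further space-separated words moves the split point to that later occurrence.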

@StevenACoffman
Contributor

StevenACoffman commented Nov 2, 2018

AFAICT there is no documentation or formal spec, but Kubernetes's own Go log parser is the source of truth:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/logs/logs.go#L125-L169

The constants define a : log tag delimiter that is currently unused, reserved in case more log tags are added in the future. Currently, only the P and F log tags are generated, with no delimiter.
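
To make the tag handling concrete, here is a rough Python sketch of delimiter-based parsing in the spirit of that description: split on the first three spaces, then split the tag field on the : delimiter. It is not a port of the Kubernetes code. The function name parse_cri_line and the returned dict layout are invented for illustration, and it assumes the revised format in which the tag field is always present.

```python
def parse_cri_line(line):
    """Split a CRI log line into timestamp, stream, tags and content.

    Hypothetical helper: assumes the layout
    "<timestamp> <stream> <tags> <content>" seen in this thread, with
    tags optionally joined by ':'. Lines in the older tag-less format
    are not handled.
    """
    parts = line.split(" ", 3)
    if len(parts) < 3:
        raise ValueError("not a CRI log line: %r" % line)
    timestamp, stream, tag_field = parts[0], parts[1], parts[2]
    if stream not in ("stdout", "stderr"):
        raise ValueError("unexpected stream: %r" % stream)
    content = parts[3] if len(parts) == 4 else ""
    tags = tag_field.split(":")
    return {
        "time": timestamp,
        "stream": stream,
        "partial": tags[0] == "P",  # first tag is P (partial) or F (full)
        "tags": tags,
        "message": content,
    }


if __name__ == "__main__":
    print(parse_cri_line("2018-10-31T21:54:45.45487617Z stderr F Fluent-Bit v0.14.4"))
```

Splitting on a fixed number of delimiters keeps the message intact even when it contains spaces or the words stdout or stderr.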

AFAICT these are the two places logs are written:

Although I don't see any examples that actively use the : log tag delimiter, the discussion in #44976 includes lines like:

2016-10-06T00:17:09.669794202Z stdout P:TAG1:TAG2 log content 1

I think that's just for the future. If we want to stitch multiple partial lines back into one multi-line string, I think regex is probably not the way to go, and we'll need to lean on a C or Go implementation, preferably maintained upstream by the CRI people (sig-node).
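
As a rough illustration of what stitching involves beyond a regex, the Python sketch below buffers P (partial) lines per stream and emits one joined message when the closing F (full) line arrives. It assumes the "<timestamp> <stream> <tag> <content>" layout shown in this thread and that partial chunks are simply concatenated. The function name join_partials is hypothetical, and none of this reflects an upstream implementation.

```python
def join_partials(lines):
    """Yield (time, stream, message) tuples, merging P lines into the next F line.

    Assumptions (not taken from any spec): chunks are concatenated with no
    separator, each stream is buffered independently, and the timestamp of
    the first chunk is kept.
    """
    pending = {}  # stream -> (first_timestamp, [chunks])
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3:
            continue  # skip lines that do not look like CRI log lines
        timestamp, stream, tag_field = parts[:3]
        content = parts[3] if len(parts) == 4 else ""
        first_time, chunks = pending.pop(stream, (timestamp, []))
        chunks.append(content)
        if tag_field.split(":")[0] == "P":
            pending[stream] = (first_time, chunks)  # keep waiting for F
        else:
            yield first_time, stream, "".join(chunks)


if __name__ == "__main__":
    demo = [
        "2016-10-06T00:17:09.669794202Z stdout P log content 1",
        "2016-10-06T00:17:10.113242941Z stdout F log content 2",
    ]
    for record in join_partials(demo):
        print(record)
```

The state kept between lines (the pending buffer) is the part a single-line regex parser cannot express, which is why a dedicated filter or upstream library looks like the better home for it.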

@jlpettersson

@kskewes Both containerd and CRI-O implement the same Container Runtime Interface (CRI), so I think only one parser is needed for this.

@StevenACoffman
Contributor

@jlpettersson Thanks. Is there a better name for this logging format that implies both?

@karlskewes
Author

karlskewes commented Nov 2, 2018

We could call it cri?
I also like the suggestion of logtag instead of partial flag, to match the upstream code. Good find.
The crio PR has a good revision of the regex.
Shall we close this PR in favour of that one, plus a commit with the name change?
Or I can raise a new, clean PR for cri... Meta work. :)

@edsiper
Member

edsiper commented Nov 2, 2018

A new PR is suggested; you can reference this one.

@karlskewes
Author

Sure, will do. Thanks for the quick response, and thanks to Steven as well.

StevenACoffman added a commit to StevenACoffman/fluent-bit that referenced this pull request Nov 2, 2018
@StevenACoffman
Contributor

StevenACoffman commented Nov 2, 2018

See #876 (comment)

I opened a new PR, #881, that unifies this discussion and the feedback from @chlunde.

@karlskewes
Author

Thanks! Closing.

@karlskewes karlskewes closed this Nov 2, 2018
@StevenACoffman
Contributor

StevenACoffman commented Nov 2, 2018

Thank you so much as well! There's still more work to support multi-line with the new filter_join_partial module. Please see #852 for the json logger work.

A new filter for appending log events. Solves issue #821
Currently only implemented for Docker json-log format, but can support CRI-O logs with little work.
This plugin is supposed to be used early in the filter-chain, before kubernetes-filter, e.g. right after tail-input.

edsiper pushed a commit that referenced this pull request Nov 6, 2018
* Support CRI-O and containerd

See #876 and #873

Signed-off-by: Steve Coffman <[email protected]>

* Fix based on @kskewes from #881

Signed-off-by: Steve Coffman <[email protected]>
rawahars pushed a commit to rawahars/fluent-bit that referenced this pull request Oct 24, 2022
Signed-off-by: Youssef Rizkalla <[email protected]>