Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[proposal/discussion] specify Falco additional outputs in the configuration file #3235

Closed
Tracked by #3254
LucaGuerra opened this issue Jun 4, 2024 · 14 comments · Fixed by #3308
Closed
Tracked by #3254

[proposal/discussion] specify Falco additional outputs in the configuration file #3235

LucaGuerra opened this issue Jun 4, 2024 · 14 comments · Fixed by #3308
Assignees
Milestone

Comments

@LucaGuerra
Copy link
Contributor

Motivation

There are several issues open for providing more flexibility for the Falco output format:
see:

And I might be missing some.

Basically, the ask is:

  • Add more things to the rule output in the message (e.g. add "cluster name: %some_value"). This of course is specific by Falco setup
  • Add more things to the rule output in the json fields but not in the message, so it won't pollute the output string but would be parseable by any log aggregator
  • Ability to select which source (syscall/plugin) and/or which rule tag is affected by a specific change

Feature

I'm thinking about how to solve this and I came up with this option in Falco.yaml

add_output:
  - source: syscall
    tag: persistence
    format: "gparent=%proc.aname[2] ggparent=%proc.aname[3] gggparent=%proc.aname[4]"
  - source: k8s_audit
    fields: ["ka.verb", "ka.uri"]
    format: "with response code %ka.response.code

What I do like about this is that it allows to do it all, select only specific tags, specify both format and fields per source. What I don't like too much is that it makes it hard to specify it on the command line with -o. Comments welcome.

@LucaGuerra
Copy link
Contributor Author

Re. milestone I'll put the next major as all similar issues

/milestone 0.39.0

@poiana poiana added this to the 0.39.0 milestone Jun 4, 2024
@leogr
Copy link
Member

leogr commented Jun 4, 2024

cc @sboschman

@leogr
Copy link
Member

leogr commented Jun 4, 2024

Hey @LucaGuerra

I like the spirit of this proposal. I'd brainstorm a bit more to eventually find a way that works better with -o. I also think it's important to find a simple way to allow users to override a predefined configuration.

@sboschman
Copy link
Contributor

Perhaps also allow targeting individual rules to expand the output fields:

add_output:
  - rule: <rule name>
    fields: [ ... ]

Another approach could be to allow rule overrides to target multiple rules (based on source and/or tag instead of the rule name) and introduce a new output_fields key.

- source: syscall
  tag: persistence
  output: "gparent=%proc.aname[2] ggparent=%proc.aname[3] gggparent=%proc.aname[4]"
  override:
    output: append
- source: k8s_audit
  output: "with response code %ka.response.code"
  output_fields: ["ka.verb", "ka.uri"]
  override:
    output: append
    output_fields: append

@Andreagit97
Copy link
Member

I know we are talking about outputs here, but this issue could bring additional thought to the future design maybe #2805. In this case we are talking about a sort -p but on the rule condition

@Andreagit97
Copy link
Member

Add more things to the rule output in the json fields but not in the message, so it won't pollute the output string but would be parseable by any log aggregator.

Sorry for my lack of context here but is this really an issue? why don't we unify the normal output, with the JSON one. So everything in the normal output is also exposed as a JSON field. I expect alerts to be not so frequent in time, so is saving some bytes in the output a crucial point? If not I think this could simplify life for everyone...

Coming to the design part (not sure if we have technical blockers but in a ideal world i would like to have something similar to this 👇 )

As a user, I would expect a syntax similar to the override we use on single rules.
In Falco, we already have this config

rules:
  - disable:
      rule: "*"
  - enable:
      rule: Netcat Remote Code Execution in Container
  - enable:
      rule: Delete or rename shell history

What about something like this?

rules:
  - disable:
      rule: "*"
  - enable:
      rule: Netcat Remote Code Execution in Container
  # maybe append is better here since i suppose we don't want to admit a global replace...but override name keeps the same UX    
  - override:
      source: syscall
      # this should be valid both for json output and for plain output
      output: "gparent=%proc.aname[2] ggparent=%proc.aname[3] gggparent=%proc.aname[4]"
      # maybe we don't want to support it immediately but at least we have already a solution for it.
      condition: "container.name != my-cool-pod"
  - override:
      source: k8saudit
      output: "..."
      condition: "..."  			

As with disable/enable the override should be enabled in order (even if it shouldn't change so much probably).
I would like to keep everything related to the rules behind this key and with the same evaluation order...

Just a side point:
Maybe in the future, we could unify everything under the rules key, so for example rules_files could become rules.files and rule_matching could become rules.matching. WDYT?

@sboschman
Copy link
Contributor

Add more things to the rule output in the json fields but not in the message, so it won't pollute the output string but would be parseable by any log aggregator.

Sorry for my lack of context here but is this really an issue? why don't we unify the normal output, with the JSON one. So everything in the normal output is also exposed as a JSON field. I expect alerts to be not so frequent in time, so is saving some bytes in the output a crucial point? If not I think this could simplify life for everyone...

This has nothing to do with saving bytes tbh. The additional fields exposed as output_fields are for example to provide metadata/context for the alert (the output message), so e.g. it can be traced back to its origin once forwarded to a central aggregation system.

Just to give you an idea of the workarounds used atm with the k8s_audit source:

  • falco is started with -p (since 0.38, before it was awk/sed voodoo on the rule files to append the output of every rule) to add a suffix to every rule output
    -p " || cluster_name=%jevt.value[/annotations/cluster_name]"

  • this forces falco to add an output_field named jevt.value[/annotations/cluster_name] , as this is now used in the output

  • as the cluster name is now available as output_field (jevt.value...), we can instruct falcosidekick to use it

templatedfields:
  cluster_name: '{{ or (index . "jevt.value[/annotations/cluster_name]") "unknown" }}'
loki:
  extralabels: "cluster_name"
  • use a loki processing pipeline to remove the || cluster_name=... stuff from the alert output/message

Note: luckily I am able to use loki here to drop the || cluster_name=... stuff before ingestion, but not every falcosidekick output destination might have a trick like this to cleanup the alert/output message.

Result: the alert / output / message as defined in the rules, labeled (has metadata/context) with the k8s cluster of origin.

@Andreagit97
Copy link
Member

ok, thank you for the additional info!

Note: luckily I am able to use loki here to drop the || cluster_name=... stuff before ingestion, but not every falcosidekick output destination might have a trick like this to cleanup the alert/output message.

So this is more an issue of verbosity, in case you add many fields you don't want to have them in the alert message but just as a context. Is it true?

@MprBol
Copy link

MprBol commented Jun 21, 2024

Hi ! I just wanted to +1 this; im running in the same issues and would like to add a request for implementation detail. Note that im using json output;

I would like to add mandatory output fields for all rules as a global setting. In this case I know for certain that e.g. user.uid and user.loginuid are set in output_fields. Not only for my custom rules, but also all default rules provided upstream.

It should still be possible to add additional fields per rule to output_fields as requested above.

Implementing this, we can have a human-readable 'output' with the additional context in output_fields. This enables me to use the 'output' string directly as the Alert-name in our SIEM solution, and provide the context details in output_fields as extra context.

Note that 'output' currently automatically prepends the timestamp + severity. For my use case that's not needed and I have to strip it off. Perhaps this can also be a setting, at least when someone uses json_output.

@LucaGuerra
Copy link
Contributor Author

Thank you all for the comments. It seems that we want to customize Falco's output by both adding and/or removing stuff. I would focus on adding things here:

We wish to add the following information:

  • generic output (like -p)
  • output fields that won't appear in the output but only in json, which can be Falco outputs or something static or even an environment variable

This stuff can be added to everything or only to specific sources, rules or tags. In this case I would probably stick with the original proposal, calling it append_output

append_output:
  - source: syscall
    tag: persistence
    rule: some rule name
    format: "gparent=%proc.aname[2] ggparent=%proc.aname[3] gggparent=%proc.aname[4]"
  - source: k8s_audit
    fields: ["ka.verb", "ka.uri"]
    format: "with response code %ka.response.code

If you don't specify source/tag/rule the append will happen to everything, and the three fields will all restrict the scope. One thing that I am not sure about is how to add @incertum 's suggestion about having fields that are not falco fields, such as a static string or an environment variable, maybe with a different key static_fields but I'm not a big fan 🤔

Re. -o one option could be to allow json syntax to specify an object, e.g. -o 'add_output[]={"source": "syscall", "format": "some format..."}'.

Also, we want to remove the timestamp but I think that's a separate issue.

@LucaGuerra
Copy link
Contributor Author

LucaGuerra commented Aug 28, 2024

I'm proposing a patch to add this behavior. While implementing it I changed the format slightly to accommodate @incertum 's suggestion about having strings that are not pure filterchecks in the json event fields, so this is what I came up with (see the linked PR)

append_output:
  - source: syscall
    tag: persistence
    rule: some rule name
    format: "gparent=%proc.aname[2] ggparent=%proc.aname[3] gggparent=%proc.aname[4]"

  - source: k8s_audit
    fields: 
      - ka.verb: "%ka.verb"
      - home_directory: "${HOME}"
      - my_field: "this is event number %evt.num"
    format: "with response code %ka.response.code"

As you can see now the fields have maximum flexibility in terms of output, coming at the slight disadvantage of being strictly cast to string, even if you only specify proc.some_int: "%proc.some_int". I toyed with the idea of also allowing a string which only has the field name but it sounded a bit overkill.

@leogr
Copy link
Member

leogr commented Aug 28, 2024

@LucaGuerra, thank you!

I like your proposal; however, I suggest overcoming the limitation of strictly casting to string any added field.
We already have fields in the JSON output that are not strings, for example:

{"hostname": "x86",...," proc.pname":"zsh","proc.tty":34819,"user.loginuid":1000,"user.name":"leogr","user.uid":1000}}

So, I'd prefer to preserve the original type for added fields. My proposal would be:

append_output:
  - source: syscall
    fields: 
      - user.uid
      - home_directory: "${HOME}"
    format: "..."

👇

{..., "user.uid":1000, "home_directory":"/home/leogr"}

Furthermore, note that - user.uid is nicer than - user.uid: "%user.uid", but it may make the -o syntax worse (since it should translate to something like -o 'add_output[]={"user.uid": null, "home_directory": "${HOME}"}')

wdyt?

@incertum
Copy link
Contributor

Adding thoughts to the mix 🙃 -- agree with overcoming the string casting limitation. Having that limitation imposed for static / custom fields seems fine though.

Lastly, @LucaGuerra and @leogr not sure if here is the best place to mention that we still need #3277 for the first degree JSON keys / wrapper fields aka not nested under output_fields ... supporting all of that should give maximum flexibility. In fact the end user should be allowed to compose all high-level JSON keys. There is a closed PR that references that ask https://github.com/falcosecurity/falco/pull/2670/files.

Maybe now is the time to come up with a truly universal solution?

@LucaGuerra
Copy link
Contributor Author

@leogr @incertum : thanks for your feedback about type casting, I can implement both ways to leave flexibility for both use cases, it's not hard to do. Just need to figure out how to make it play well with the JSON schema 😅

@incertum : I can't really think of a universal way of customizing both things. In fact I believe that's a separate issue (as you correctly mentioned in #3277 , I know that @leogr maybe had different thoughts). Think about it: the current issue that we're discussing is about adding fields or pieces of output specific to events by rule, source and tag (or all of them if you want). Depending on the rule and the source you will have different fields you can use and this setting lets you customize that. The top level JSON keys differs from these because they only affect json (while setting fields affects json, grpc and possibly others) and also have no additional information about any event as in that case you don't want to tie them to specific events but rather would exist to make parsing easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants