Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

processor.regex mixing non capture groups and named capture groups gives incorrect result #14028

Closed
richardeaxon opened this issue Sep 30, 2023 · 5 comments · Fixed by #14084
Closed
Labels
bug unexpected problem or unintended behavior

Comments

@richardeaxon
Copy link

Relevant telegraf.conf

[[processors.regex]]
  order = 20
  namepass = ["arista"]
  metricpass = "fields.process == 'Bgp'"

  [[processors.regex.fields]]
    # extract peer
    key = "message"
    #pattern = '.*(peer|neighbor) (?P<ip_addr>\d+\.\d+\.\d+\.\d+).*'     # this does not work
    pattern = '.*neighbor (?P<ip_addr>\d+\.\d+\.\d+\.\d+).*'    # this works

Logs from Telegraf

N/A

System info

Telegraf nightly build

Docker

No response

Steps to reproduce

Using nightly build to test #13971 I get unexpected results when mixing non capture and named capture groups.

Using this input line:

received from neighbor 1.1.1.1 (VRF default AS 4200099061) 6/5 (Cease/connection rejected) 0 bytes

And telegraf.conf using regex pattern '.(peer|neighbor) (?P<ip_addr>\d+.\d+.\d+.\d+).' is get this:

arista process="Bgp",event="BGP-3-NOTIFICATION",message="",source="lab" 1696045222031424000

The message is replaced with "". Note the above regex works correctly in regex101 tester.

Expected behavior

Should result in:

arista source="lab",process="Bgp",event="BGP-3-NOTIFICATION",message="received from neighbor 1.1.1.1 (VRF default AS 4200099061) 6/5 (Cease/connection rejected) 0 bytes",ip_addr="1.1.1.1",vrf="default",resource_index="4200099061" 1696045740257530000

Actual behavior

Orignal source field is replaced with "".

Additional info

Possibly an additional code check is required to check if the 'replacement = ' config option is defined. If not, assume named capture groups used and don't fall through into replacement mode with empty key.

@richardeaxon richardeaxon added the bug unexpected problem or unintended behavior label Sep 30, 2023
@richardeaxon
Copy link
Author

richardeaxon commented Sep 30, 2023

Managed to fix it by adding ?: to the non capture group in the regex:

pattern = '.*(?:peer|neighbor) (?P<ip_addr>\d+\.\d+\.\d+\.\d+).*'

There is a subtle hint to this in the documentations example. Can I suggest adding a debug message when this condition in the code is met:

Something that reads "Non-named capture groups cannot be mixed with named capture groups. Prefix non-named capture groups with ?:"

Will save future me and others debugging time.

EDIT: I now think this should just throw a config error as you will get unexpected results.

@srebhan
Copy link
Member

srebhan commented Oct 4, 2023

@richardeaxon the documentation clearly states that for "named group batch processing" ALL groups need to be named or be non-capturing groups.

In your case both

[[processors.regex]]
  order = 20
  namepass = ["arista"]
  metricpass = "fields.process == 'Bgp'"

  [[processors.regex.fields]]
    # extract peer
    key = "message"
    pattern = '.*(peer|neighbor) (?P<ip_addr>\d+\.\d+\.\d+\.\d+).*'
    replacement = "${ip_addr}"
    result_key = "ip_addr"

or the making the first group a non-captured group (as you also found out) will work. So I don't think we need to add more complexity.

@srebhan srebhan added the waiting for response waiting for response from contributor label Oct 4, 2023
@srebhan
Copy link
Member

srebhan commented Oct 4, 2023

To add to this: We need to be backward compatible with old configurations, so outputting an error or silently accepting the named groups cannot be done as people might (mis)use this to delete e.g. tags by setting no (i.e. empty) replacement and no result_key... Maybe we could add a warning!?!? What do you think @richardeaxon?

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Oct 4, 2023
@srebhan srebhan added the waiting for response waiting for response from contributor label Oct 4, 2023
@richardeaxon
Copy link
Author

To add to this: We need to be backward compatible with old configurations, so outputting an error or silently accepting the named groups cannot be done as people might (mis)use this to delete e.g. tags by setting no (i.e. empty) replacement and no result_key... Maybe we could add a warning!?!? What do you think @richardeaxon?

A warning would be great. Not being a go coder, could the above variables not start as nil and if they are then not set you can assume an error?

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Oct 4, 2023
@srebhan
Copy link
Member

srebhan commented Oct 10, 2023

@richardeaxon is PR #14084 what you had in mind?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants