Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide opt-in flag to avoid fields name clash when log format is json #15969

Merged
merged 14 commits into from
Apr 17, 2024

Conversation

andsel
Copy link
Contributor

@andsel andsel commented Feb 22, 2024

Release notes

Exposes log.format.json.fix_duplicate_message_fields flag to avoid collision of field names in log lines when log.format is JSON.

What does this PR do?

Adds log.format.json.fix_duplicate_message_fields feature flag to rename the clashing fields when json logging format (log.format) is selected.
In case two message fields clashes on structured log message, then the second is renamed attaching _1 suffix to the field name.
By default the feature is disabled and requires user to explicitly enable the behaviour.
The PR provides description of the flag only in the throuble shooting section, and not in general description of all the command line flags and settings (

).
In this way the flag can be deprecated or dropped more easily and the behaviour enabled by default.

Why is it important/What is the impact to the user?

A user that enables json log format for their Logstash's logs could stumble on a problem to have two message fields in the same json document. Despite this is a valid json, is not common practice and could lead to confusion: which is the effective log message body and which is the field?
With this PR the user can choose to enable a stricter behaviour when encounter such problem.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • road test

How to test this PR locally

  1. run logstash with
bin/logstash -e "input {stdin{codec => json}} output{stdout{}}" --log.format json --log.format.json.fix_duplicate_message_fields true
  1. type some invalid input to trigger https://github.com/logstash-plugins/logstash-codec-json/blob/d2b10edf9a63646e17e60de8c77b51ca81614c73/lib/logstash/codecs/json.rb#L84
{"name": [}
  1. verify in console the json logs contains both message and message_1 fields.
{
   "level":"WARN",
   "loggerName":"logstash.codecs.jsonlines",
   "timeMillis":1710838609569,
   "thread":"[main]<stdin",
   "logEvent":{
      "message":"JSON parse error, original data now in message field",
      "message_1":"Unexpected close marker '}': expected ']' (for Array starting at [Source: (String)\"{\"name\": [}\"; line: 1, column: 10])\n at [Source: (String)\"{\"name\": [}\"; line: 1, column: 12]",
      "exception":"LogStash::Json::ParserError",
      "data":"{\"name\": [}"
   }
}

Related issues

Use cases

Screenshots

Logs

@andsel
Copy link
Contributor Author

andsel commented Mar 6, 2024

Notes

After discussion with team, the proposal is to avoid adding an extra layer, which potentially could be a breaking change.
Instead, postfix duplicated field names, so that message is the core message field, if there is a second clashing message field as log parameter, then rename it as message_1.
Make this feature enabled by feature flag.

@andsel andsel marked this pull request as ready for review March 22, 2024 08:19
@andsel andsel changed the title Fix/json logger message field clash Provide opt-in flag to avoid fields name clash when log format is json logger Mar 22, 2024
@andsel andsel changed the title Provide opt-in flag to avoid fields name clash when log format is json logger Provide opt-in flag to avoid fields name clash when log format is json Mar 22, 2024
Copy link
Member

@robbavey robbavey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions on naming

Plus, can we add the setting, and what it does to the logstash.yml and command-line docs pages?

docs/static/troubleshoot/ts-logstash.asciidoc Show resolved Hide resolved
logstash-core/lib/logstash/environment.rb Outdated Show resolved Hide resolved
logstash-core/lib/logstash/environment.rb Outdated Show resolved Hide resolved
docker/data/logstash/env2yaml/env2yaml.go Outdated Show resolved Hide resolved
@andsel
Copy link
Contributor Author

andsel commented Apr 12, 2024

@robbavey do you think we should also add the setting in the logstash.yml that we ship with the distribution:

# log.format: plain

Like we do for log.format ?

@robbavey
Copy link
Member

do you think we should also add the setting in the logstash.yml that we ship with the distribution:

Yes, that's a great idea.

@andsel andsel requested a review from robbavey April 12, 2024 16:43
@andsel andsel force-pushed the fix/json_logger_message_field_clash branch from 706dd26 to da53b5c Compare April 17, 2024 07:23
Copy link
Member

@robbavey robbavey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, other than a couple of doc nits

docs/static/troubleshoot/ts-logstash.asciidoc Outdated Show resolved Hide resolved
docs/static/troubleshoot/ts-logstash.asciidoc Outdated Show resolved Hide resolved
Co-authored-by: Rob Bavey <[email protected]>
Copy link

Quality Gate passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No Coverage information No data about Coverage
0.0% 0.0% Duplication on New Code

See analysis details on SonarQube

@@ -94,6 +96,16 @@ private void writeStructuredMessage(StructuredMessage message, JsonGenerator gen
}
}

private static String renameParamNameIfClashingWithMessage(Map.Entry<Object, Object> entry) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it's very unlikely to happen, wouldn't the static name message_1 be prone to value overrides in case some log, containing the message_1 key, is logged? for example: logger.info('this is test', :message => 'bar', :message_1' => 'foo') or logger.info('this is test', hash_with_messages_keys).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logger.info('this is test', :message => 'bar', :message_1' => 'foo')

It's possibile but little bit unrealistic given that those logger invocations come from code the inside Logstash code base.

logger.info('this is test', hash_with_messages_keys)

This could be a problem, where hash_with_messages_keys id something like

{ "message": "blalba", "message_1": "blabla 1" }

and it's out of code control.

In general, whatever policy we apply to avoid collision, for example move the duplicate field in some nested map, it is subject to this latest problem you are exposing, because maps can overlap on root names, but also on nested structures.

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @andsel

@andsel andsel merged commit 830733d into elastic:main Apr 17, 2024
7 checks passed
@andsel
Copy link
Contributor Author

andsel commented Apr 17, 2024

@logstashmachine backport 8.14

github-actions bot pushed a commit that referenced this pull request Apr 17, 2024
#15969)

Adds log.format.json.fix_duplicate_message_fields feature flag to rename the clashing fields when json logging format (log.format) is selected.
In case two message fields clashes on structured log message, then the second is renamed attaching _1 suffix to the field name.
By default the feature is disabled and requires user to explicitly enable the behaviour.

Co-authored-by: Rob Bavey <[email protected]>
(cherry picked from commit 830733d)
andsel added a commit that referenced this pull request Apr 17, 2024
#15969) (#16094)

Adds log.format.json.fix_duplicate_message_fields feature flag to rename the clashing fields when json logging format (log.format) is selected.
In case two message fields clashes on structured log message, then the second is renamed attaching _1 suffix to the field name.
By default the feature is disabled and requires user to explicitly enable the behaviour.

Co-authored-by: Rob Bavey <[email protected]>
(cherry picked from commit 830733d)

Co-authored-by: Andrea Selva <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

json logging can log duplicate message fields
5 participants