Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify Key Value processor to support string literal grouping #4599

Merged
merged 7 commits into from
Jun 4, 2024

Conversation

kkondaka
Copy link
Collaborator

@kkondaka kkondaka commented Jun 3, 2024

Description

Adds the following new features

  1. Supports string literal grouping when valueGrouping option is enabled. String literal grouping allows text inside double quotes OR single quotes to be treated as one entity. When combined with drop_keys_with_no_value such text can be dropped from the output
  2. Support strict_grouping option where a group with the starting group character and missing end group character does not throw error and instead considered as the group till the end of the text
  3. Fix value_grouping to strictly apply grouping to values and not to keys and other parts of input.

Issues Resolved

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • [X ] New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • [ X] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Krishna Kondaka added 3 commits May 30, 2024 02:03
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
Krishna Kondaka added 3 commits June 3, 2024 23:02
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
@dlvenable dlvenable added this to the v2.9 milestone Jun 4, 2024
this.stringLiteralCharacter = keyValueProcessorConfig.getStringLiteralCharacter();
if (stringLiteralCharacter != null) {
if (!keyValueProcessorConfig.getValueGrouping()) {
LOG.warn("string literal character config is ignored becuase value grouping is disabled");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we fail on this instead? We could add an @AssertTrue statement in the config class.

LOG.warn("string literal character config is ignored becuase value grouping is disabled");
}
if (stringLiteralCharacter != '\"' && stringLiteralCharacter != '\'') {
throw new RuntimeException("Only single quote and double quote are supported as string literal character");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw new RuntimeException("Only single quote and double quote are supported as string literal character");
throw new RuntimeException("Only single quote (') or double quote (") are supported as string literal character");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, you should move this validation into the config class. This will help with our goals for more dynamic pipeline configurations.

@JsonProperty("strict_grouping")
private boolean strictGrouping = false;

@JsonProperty("string_literal_character")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should name this string_literal_characters and make it a String. But only allow a certain sieze.

Suggested change
@JsonProperty("string_literal_character")
@Size(min = 0, max = 1, message = "field_split_characters may only have character")
@JsonProperty("string_literal_characters")

@@ -340,7 +367,7 @@ public Collection<Record<Event>> doExecute(final Collection<Record<Event>> recor
continue;
}

final String groupsRaw = recordEvent.get(keyValueProcessorConfig.getSource(), String.class);
String groupsRaw = recordEvent.get(keyValueProcessorConfig.getSource(), String.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not look like you modify this. You can keep the final modifier.

@@ -489,10 +516,19 @@ private Map<String, Object> createNonRecursedMap(String[] groups) {
List<Object> valueList;

for(final String group : groups) {
// If a group starts and ends with stringLiteralCharacter,
// treat the entire group as key with null as the value
if (stringLiteralCharacter != null && group.charAt(0) == stringLiteralCharacter && group.charAt(group.length()-1) == stringLiteralCharacter) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe refactor this into a method with a clear name to explain what it is:

private boolean isIgnoredGroup(String group) {
  return stringLiteralCharacter != null && group.charAt(0) == stringLiteralCharacter && group.charAt(group.length()-1) == stringLiteralCharacter;
}

@@ -187,6 +188,7 @@ void testDropKeysWithNoValue() {
@MethodSource("getKeyValueGroupingTestdata")
void testMultipleKvToObjectKeyValueProcessorWithValueGrouping(String fieldDelimiters, String input, Map<String, Object> expectedResultMap) {
lenient().when(mockConfig.getValueGrouping()).thenReturn(true);
lenient().when(mockConfig.getStringLiteralCharacter()).thenReturn('\"');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to think we should test these both with and without the string literal character.

You could include that as an argument in the test dynamically. Or split the test into two.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified the Test testStringLiteralCharacter to test with both characters instead of this test.

Signed-off-by: Krishna Kondaka <[email protected]>
@@ -261,11 +264,14 @@ public int skipGroup(final String str, int idx, final Character endChar) {
i++;
continue;
} else if (str.charAt(i) == endChar) {
return i-1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a previous PR did this but this method doesn't need to be public

@kkondaka kkondaka merged commit 530be53 into opensearch-project:main Jun 4, 2024
41 of 46 checks passed
@kkondaka kkondaka deleted the kv-fixes branch July 30, 2024 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants