Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add truncate string processor #3924

Merged
merged 10 commits into from
Jan 11, 2024

Conversation

kkondaka
Copy link
Collaborator

@kkondaka kkondaka commented Jan 8, 2024

Description

Add new processor to mutate string family of processors, called truncate processor, which truncates the string values to the. configured length.

Resolves #3925

Issues Resolved

Resolves #3925

Check List

  • [ X] New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • [ X] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Krishna Kondaka <[email protected]>
import org.opensearch.dataprepper.model.processor.Processor;

/**
* This processor takes in a key and changes its value to a string with the leading and trailing spaces trimmed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Javadoc does not match the behavior shown below.

@Override
public Stream<? extends Arguments> provideArguments(ExtensionContext context) {
return Stream.of(
Arguments.arguments("hello,world,no-truncate", 100, "hello,world,no-truncate"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to add a static import on Arguments.arguments such that these lines just look like:

arguments("hello,world,no-truncate", 100, "hello,world,no-truncate")

final Record<Event> record = createEvent(message);
final List<Record<Event>> truncatedRecords = (List<Record<Event>>) truncateStringProcessor.doExecute(Collections.singletonList(record));
assertThat(truncatedRecords.get(0).getData().get("message", Object.class), notNullValue());
assertThat(truncatedRecords.get(0).getData().get("message", Object.class), equalTo(truncatedMessage));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test is simple enough to use a mocked Event.

Then you just verify:

verify(truncatedRecords.get(0)).put("message", truncatedMessage);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is simple that we do not have to mock the event. I think it's better to not mock when the alternative is not hard.

when(expressionEvaluator.evaluateConditional(truncateWhen, record.getData())).thenReturn(false);
final List<Record<Event>> truncatedRecords = (List<Record<Event>>) truncateStringProcessor.doExecute(Collections.singletonList(record));

assertThat(truncatedRecords.get(0).getData().toMap(), equalTo(record.getData().toMap()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think a mock here is easy:

verify(truncatedRecords(0).getData(), never()).put(anyString(), any());

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

graytaylor0
graytaylor0 previously approved these changes Jan 8, 2024
Signed-off-by: Krishna Kondaka <[email protected]>
recordEvent.put(entry.getSource(), value.substring(0, entry.getLength()));
int startIndex = entry.getStartAt() == null ? 0 : entry.getStartAt();
Integer length = entry.getLength();
String truncatedValue = (length == null || startIndex+length >= value.length()) ? value.substring(startIndex) : value.substring(startIndex, startIndex + length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we catch IndexOutOfBoundsException so that this doesn't cause the pipeline to crash?

graytaylor0
graytaylor0 previously approved these changes Jan 8, 2024
dlvenable
dlvenable previously approved these changes Jan 8, 2024
@kkondaka kkondaka dismissed stale reviews from dlvenable and graytaylor0 via 49aa0eb January 8, 2024 23:03
dlvenable
dlvenable previously approved these changes Jan 8, 2024
Signed-off-by: Krishna Kondaka <[email protected]>
graytaylor0
graytaylor0 previously approved these changes Jan 9, 2024
public Entry() {}
}

private List<@Valid Entry> entries;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are moving away from the existing string processor, should we carry this convention with it?

Perhaps just make the usage:

- truncate:
    source: my_key
    length: 100

With multiple values:

- truncate:
    source: my_key1
    length: 100
- truncate:
    source: my_key2
    length: 100

This seems simpler than:

- truncate:
    entries:
    - source: my_key1
       length: 100
    - source: my_key2
       length: 100

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will remove entries field

Signed-off-by: Krishna Kondaka <[email protected]>
record_type: "event"
format: "json"
processor:
- trucate_string:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still says truncate_string here, but there's also a typo in trucate

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding the typo. Will fix it.

Krishna Kondaka added 2 commits January 10, 2024 00:38
graytaylor0
graytaylor0 previously approved these changes Jan 10, 2024
@kkondaka kkondaka merged commit c3fe808 into opensearch-project:main Jan 11, 2024
71 of 74 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 11, 2024
* Add truncate string processor

Signed-off-by: Krishna Kondaka <[email protected]>

* Addressed review comments

Signed-off-by: Krishna Kondaka <[email protected]>

* Added check for negative numbers in the config input

Signed-off-by: Krishna Kondaka <[email protected]>

* Fixed checkstyle error

Signed-off-by: Krishna Kondaka <[email protected]>

* Modified to make truncate processor a top level processor, not specific to strings

Signed-off-by: Krishna Kondaka <[email protected]>

* Addressed review comments

Signed-off-by: Krishna Kondaka <[email protected]>

* Updated documentation with correct configuration

Signed-off-by: Krishna Kondaka <[email protected]>

* Fixed typos in the documentation

Signed-off-by: Krishna Kondaka <[email protected]>

* Modified to allow more than one source keys in the config

Signed-off-by: Krishna Kondaka <[email protected]>

* Modified to allow multiple entries under configuration

Signed-off-by: Krishna Kondaka <[email protected]>

---------

Signed-off-by: Krishna Kondaka <[email protected]>
Co-authored-by: Krishna Kondaka <[email protected]>
(cherry picked from commit c3fe808)
@kkondaka kkondaka deleted the truncate-processor branch May 13, 2024 05:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add string truncate processor to the family of mutate string processor
3 participants