-
Notifications
You must be signed in to change notification settings - Fork 207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add truncate string processor #3924
Add truncate string processor #3924
Conversation
Signed-off-by: Krishna Kondaka <[email protected]>
import org.opensearch.dataprepper.model.processor.Processor; | ||
|
||
/** | ||
* This processor takes in a key and changes its value to a string with the leading and trailing spaces trimmed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Javadoc does not match the behavior shown below.
@Override | ||
public Stream<? extends Arguments> provideArguments(ExtensionContext context) { | ||
return Stream.of( | ||
Arguments.arguments("hello,world,no-truncate", 100, "hello,world,no-truncate"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to add a static import on Arguments.arguments
such that these lines just look like:
arguments("hello,world,no-truncate", 100, "hello,world,no-truncate")
final Record<Event> record = createEvent(message); | ||
final List<Record<Event>> truncatedRecords = (List<Record<Event>>) truncateStringProcessor.doExecute(Collections.singletonList(record)); | ||
assertThat(truncatedRecords.get(0).getData().get("message", Object.class), notNullValue()); | ||
assertThat(truncatedRecords.get(0).getData().get("message", Object.class), equalTo(truncatedMessage)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test is simple enough to use a mocked Event
.
Then you just verify:
verify(truncatedRecords.get(0)).put("message", truncatedMessage);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is simple that we do not have to mock the event. I think it's better to not mock when the alternative is not hard.
when(expressionEvaluator.evaluateConditional(truncateWhen, record.getData())).thenReturn(false); | ||
final List<Record<Event>> truncatedRecords = (List<Record<Event>>) truncateStringProcessor.doExecute(Collections.singletonList(record)); | ||
|
||
assertThat(truncatedRecords.get(0).getData().toMap(), equalTo(record.getData().toMap())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think a mock here is easy:
verify(truncatedRecords(0).getData(), never()).put(anyString(), any());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
Signed-off-by: Krishna Kondaka <[email protected]>
recordEvent.put(entry.getSource(), value.substring(0, entry.getLength())); | ||
int startIndex = entry.getStartAt() == null ? 0 : entry.getStartAt(); | ||
Integer length = entry.getLength(); | ||
String truncatedValue = (length == null || startIndex+length >= value.length()) ? value.substring(startIndex) : value.substring(startIndex, startIndex + length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we catch IndexOutOfBoundsException
so that this doesn't cause the pipeline to crash?
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
…ic to strings Signed-off-by: Krishna Kondaka <[email protected]>
public Entry() {} | ||
} | ||
|
||
private List<@Valid Entry> entries; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we are moving away from the existing string processor, should we carry this convention with it?
Perhaps just make the usage:
- truncate:
source: my_key
length: 100
With multiple values:
- truncate:
source: my_key1
length: 100
- truncate:
source: my_key2
length: 100
This seems simpler than:
- truncate:
entries:
- source: my_key1
length: 100
- source: my_key2
length: 100
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will remove entries field
Signed-off-by: Krishna Kondaka <[email protected]>
record_type: "event" | ||
format: "json" | ||
processor: | ||
- trucate_string: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still says truncate_string here, but there's also a typo in trucate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for finding the typo. Will fix it.
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
Signed-off-by: Krishna Kondaka <[email protected]>
* Add truncate string processor Signed-off-by: Krishna Kondaka <[email protected]> * Addressed review comments Signed-off-by: Krishna Kondaka <[email protected]> * Added check for negative numbers in the config input Signed-off-by: Krishna Kondaka <[email protected]> * Fixed checkstyle error Signed-off-by: Krishna Kondaka <[email protected]> * Modified to make truncate processor a top level processor, not specific to strings Signed-off-by: Krishna Kondaka <[email protected]> * Addressed review comments Signed-off-by: Krishna Kondaka <[email protected]> * Updated documentation with correct configuration Signed-off-by: Krishna Kondaka <[email protected]> * Fixed typos in the documentation Signed-off-by: Krishna Kondaka <[email protected]> * Modified to allow more than one source keys in the config Signed-off-by: Krishna Kondaka <[email protected]> * Modified to allow multiple entries under configuration Signed-off-by: Krishna Kondaka <[email protected]> --------- Signed-off-by: Krishna Kondaka <[email protected]> Co-authored-by: Krishna Kondaka <[email protected]> (cherry picked from commit c3fe808)
Description
Add new processor to mutate string family of processors, called truncate processor, which truncates the string values to the. configured length.
Resolves #3925
Issues Resolved
Resolves #3925
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.