OpenSearch Index DateTime Pattern conversion #1045
Conversation
Thanks, this looks good. I made a few suggestions that I think will help improve the code.
private String convertIndexDateTimePattern(LogstashAttribute logstashAttribute) {
    final String logstashIndexAttributeValue = logstashAttribute.getAttributeValue().getValue().toString();
    final Pattern pattern = Pattern.compile(REGEX_EXTRACTING_DATETIME_PATTERN);
This pattern can be compiled in the default constructor. Then you can make pattern a field and re-use it each time. It won't have a huge impact, but could help some if there are two or more elasticsearch outputs.
Also, a better name might be dateTimePattern to be clear what the pattern is covering.
That's a good suggestion; I can look into it. For the name, later on line 52 I'm making a dateTimePattern string as: final String dateTimePattern = dateTimePatternMatcher.group(0);
Yea, pattern means two things here: 1) regex and 2) a date-time pattern. Maybe you can rename it to dateTimeRegexPattern to be clear in what it provides.
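A minimal sketch of that suggestion, assuming the regex constant quoted in the diff above; the standalone class name and simplified method signature are illustrative, not the actual mapper:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexDateTimePatternConverter {  // illustrative class name, not the real mapper
    private static final String REGEX_EXTRACTING_DATETIME_PATTERN = "%\\{(.*?)\\}";

    // Compiled once and re-used for every elasticsearch/opensearch output that is mapped.
    private final Pattern dateTimeRegexPattern = Pattern.compile(REGEX_EXTRACTING_DATETIME_PATTERN);

    String convertIndexDateTimePattern(final String logstashIndexAttributeValue) {
        final Matcher dateTimePatternMatcher = dateTimeRegexPattern.matcher(logstashIndexAttributeValue);
        if (!dateTimePatternMatcher.find()) {
            return logstashIndexAttributeValue;  // no date-time pattern, nothing to convert
        }
        // ... Joda-to-Java-time conversion of the matched group would go here ...
        return logstashIndexAttributeValue;
    }
}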
.../main/java/org/opensearch/dataprepper/logstash/mapping/OpenSearchPluginAttributesMapper.java
class OpenSearchPluginAttributesMapper extends AbstractLogstashPluginAttributesMapper {
    protected static final String LOGSTASH_OPENSEARCH_INDEX_ATTRIBUTE_NAME = "index";
    protected static final String REGEX_EXTRACTING_DATETIME_PATTERN = "%\\{(.*?)\\}";
Should the + be in the pattern? If it is not a date-time pattern in the Logstash configuration, this would still match. Then we'd make conversions which are not quite accurate.
That's a good point, I will add the + in the regex.
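For reference, a hedged sketch of the tightened regex — requiring the leading + so that only Logstash date-time patterns such as %{+YYYY.MM.dd} match; the exact constant value is an assumption:

protected static final String REGEX_EXTRACTING_DATETIME_PATTERN = "%\\{\\+(.*?)\\}";  // assumed form; matches only %{+...} patterns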
assertThat(pluginSettings, notNullValue());
assertThat(pluginSettings.size(), equalTo(1));
assertThat(pluginSettings, hasKey(dataPrepperIndexAttribute));
Checking for the key is good. Should you also verify the value after this line?
assertThat(pluginSettings.get(dataPrepperIndexAttribute), equalTo(value));
Will add, I missed it in this test, thanks for catching this.
@Test
void convert_logstash_index_date_time_pattern() {
    final String dataPrepperIndexAttribute = "index";
You can make this a constant in the class and use it elsewhere. It will make the code more readable since it will be clear that it is a constant.
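Something along these lines (the constant name is only a suggestion):

private static final String DATA_PREPPER_INDEX_ATTRIBUTE = "index";  // suggested constant, reusable across tests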
}

@Test
void convert_logstashIndexPattern_joda_year_to_dataPrepperIndexPattern_java_year() {
I think that this test and the next three are the same except for the input and output. You can make this a JUnit 5 parameterized test and this would be clearer to use.
Here is somewhat of a starting point for it.
static class JodaToJava8IndicesArgumentsProvider implements ArgumentsProvider {
    @Override
    public Stream<? extends Arguments> provideArguments(final ExtensionContext extensionContext) {
        return Stream.of(
                Arguments.arguments("logstash-%{+yyyy.MM.dd}", "logstash-%{uuuu.MM.dd}"),
                Arguments.arguments("logstash-index-%{+YYYY.MM.dd}", "logstash-index-%{yyyy.MM.dd}")
                // repeat for each
        );
    }
}
You can see this in use at:
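A hedged sketch of how that provider could be wired up with JUnit 5 (@ParameterizedTest and @ArgumentsSource from the junit-jupiter-params module); the test method name and the helper calls mentioned in the comments are illustrative:

@ParameterizedTest
@ArgumentsSource(JodaToJava8IndicesArgumentsProvider.class)
void convert_logstashIndexPattern_to_dataPrepperIndexPattern(final String logstashIndex, final String expectedIndex) {
    // Build a LogstashAttribute for logstashIndex (e.g. with a createLogstashIndexAttribute helper),
    // run mapAttributes, and then assert:
    // assertThat(pluginSettings.get("index"), equalTo(expectedIndex));
}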
I had thought of parameterized tests but went ahead with separate tests to clarify each input and output we're expecting. It does seem redundant though so I will look into updating these.
}

@Test
void convert_logstash_index_date_time_pattern() {
Also, I think that this test could be part of the parameterized tests as well.
protected static final String REGEX_EXTRACTING_DATETIME_PATTERN = "%\\{(.*?)\\}";

@Override
protected void mapCustomAttributes(List<LogstashAttribute> logstashAttributes, LogstashAttributesMappings logstashAttributesMappings, Map<String, Object> pluginSettings) {
All parameters should be marked as final. Please implement elsewhere.
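That is, the signature above would become:

@Override
protected void mapCustomAttributes(final List<LogstashAttribute> logstashAttributes,
                                   final LogstashAttributesMappings logstashAttributesMappings,
                                   final Map<String, Object> pluginSettings) {
    // method body unchanged
}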
import java.util.Collections;
import java.util.Map;

class OpenSearchPluginAttributesMapperTest {
Let's add a test for when there is no pattern and the index is just "logstash". This should catch a bug with the current implementation.
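A rough sketch of such a test, assuming the helper names used elsewhere in this test class (createLogstashIndexAttribute, createObjectUnderTest, logstashAttributesMappings) and assuming a plain index should pass through unchanged:

@Test
void convert_index_without_date_time_pattern_returns_value_unchanged() {
    final LogstashAttribute plainIndexAttribute = createLogstashIndexAttribute("logstash");

    final Map<String, Object> pluginSettings = createObjectUnderTest()
            .mapAttributes(Collections.singletonList(plainIndexAttribute), logstashAttributesMappings);

    assertThat(pluginSettings.get("index"), equalTo("logstash"));
}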
final LogstashAttribute logstashIndexAttribute = logstashAttributes.stream()
        .filter(a -> a.getAttributeName().equals(LOGSTASH_OPENSEARCH_INDEX_ATTRIBUTE_NAME))
        .findFirst()
        .orElseThrow(() -> new LogstashConfigurationException("Index attribute not found"));
The elasticsearch output plugin and the opensearch output plugin both have default index patterns. We should handle those scenarios instead of throwing an exception.
@sshivanii , The AbstractLogstashPluginAttributesMapper class will call mapCustomAttributes whether or not the index is defined. Thus, this exception is invalid as @cmanning09 pointed out.
If the index is not supplied, then there is no custom attribute to process here. I'd recommend changing orElseThrow to ifPresent and then move lines 39 and 41 in there.
Also, this is a good unit test case. Provide an empty logstashAttributes as mocked input. Then verify that the PluginSettings is empty - it should have no index key.
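A hedged sketch of the ifPresent version; how the converted value is keyed into pluginSettings is an assumption based on the surrounding snippets, not the exact code:

logstashAttributes.stream()
        .filter(attribute -> attribute.getAttributeName().equals(LOGSTASH_OPENSEARCH_INDEX_ATTRIBUTE_NAME))
        .findFirst()
        .ifPresent(indexAttribute -> {
            // Assumed key: the mapped Data Prepper attribute name ("index" in the tests above).
            pluginSettings.put(LOGSTASH_OPENSEARCH_INDEX_ATTRIBUTE_NAME,
                    convertIndexDateTimePattern(indexAttribute));
        });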
I see, looks like I did not update this in the new revision, will do now. Thanks @cmanning09 @dlvenable
@sshivanii, the current approach is not handling the default values for the elasticsearch output plugin or the opensearch output plugin. We need to support a migration of these indices.
Regarding the opensearch output plugin: we need to dive deep here to understand what the default values are. The documentation is not clear at this time.
@cmanning09 Regarding the default values for index in the Logstash ES output plugin, it's dependent on the ecs_compatibility attribute, which we do not support in our sink plugin, so it will not be added to the converted .yaml file. Since we don't support ECS compatibility, I believe these default values should not be added.
I dug deeper into the OpenSearch code and found that they only have a default index depending on ecs being enabled/disabled. Otherwise they throw an exception: Unsupported ECS compatibility.
https://github.com/opensearch-project/logstash-output-opensearch/blob/main/lib/logstash/outputs/opensearch.rb#L417
I'd like to suggest that when the index name is not supplied, Data Prepper uses the index name for when ECS is disabled. ECS compatibility is a broader topic than this PR is covering, and I don't think it should hinder this PR.
Thus, if the index is not present, the converter can just set the index to logstash-%{uuuu.MM.dd} (assuming I converted that string correctly). The current two tests which have no index value would both need to change to verify this new behavior.
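A minimal sketch of that fallback; the default value comes from this discussion, while the Optional-based wiring is an assumption:

final String dataPrepperIndexValue = logstashAttributes.stream()
        .filter(attribute -> attribute.getAttributeName().equals(LOGSTASH_OPENSEARCH_INDEX_ATTRIBUTE_NAME))
        .findFirst()
        .map(this::convertIndexDateTimePattern)
        .orElse("logstash-%{uuuu.MM.dd}");  // assumed default for the ECS-disabled case, per the discussion above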
Also, please verify that this is also the default of the amazon_es plugin.
I think the best solution here is to include a defaultAttributes property in the Logstash configuration mapping YAML. Other plugins may need to have default values where the Logstash default is different from the Data Prepper default.
As this problem exists in the current converter and this change introduces no regressions, I'd rather we wrap up this PR and support this more generically in a second PR.
Checked the default index attribute value in the amazon_es plugin and it is logstash-%{+YYYY.MM.dd}, so the converted value will be logstash-%{yyyy.MM.dd}. This is different from the elasticsearch output plugin default, and the plugin name is not known at this time.
Reference: https://github.com/awslabs/logstash-output-amazon_es/blob/master/lib/logstash/outputs/amazon_es/common_configs.rb#L17
if (character == '+') {
    continue;
} else if (character == 'y') {
    updatedDateTimePattern.append('u');
} else if (character == 'Y') {
    updatedDateTimePattern.append('y');
} else if (character == 'x') {
    updatedDateTimePattern.append('Y');
} else {
    updatedDateTimePattern.append(character);
}
I think we can simplify this logic by using a map. It is more maintainable and reduces the number of conditional checks required.
We can declare a static map for this class:

private static Map<Character, Character> PATTERN_CONVERTER_MAP;
static {
    PATTERN_CONVERTER_MAP = new HashMap<>();
    PATTERN_CONVERTER_MAP.put('y', 'u');
    PATTERN_CONVERTER_MAP.put('Y', 'y');
    PATTERN_CONVERTER_MAP.put('x', 'Y');
}

And then retrieve the converted characters here:

if (character != '+') {
    final char convertedChar = PATTERN_CONVERTER_MAP.getOrDefault(character, character);
    updatedDateTimePattern.append(convertedChar);
}
final StringBuilder updatedDateTimePattern = new StringBuilder();
String updatedIndexAttributeValue = "";
As a best practice, variables should be minimally scoped to the block they are used in, and we should avoid re-assignment where possible. We can achieve this by leveraging an additional private method to handle the match-and-conversion part:

return dateTimePatternMatcher.find() ? matchAndConvertDateTimePattern(dateTimePatternMatcher, logstashIndexAttributeValue) : logstashIndexAttributeValue;

New method:

private String matchAndConvertDateTimePattern(final Matcher dateTimePatternMatcher, final String logstashIndexAttributeValue) {
    final String dateTimePattern = dateTimePatternMatcher.group(0);
    final StringBuilder updatedDateTimePattern = new StringBuilder();
    for (final char character : dateTimePattern.toCharArray()) {
        ... // conversion logic from below.
    }
    return logstashIndexAttributeValue.replace(dateTimePattern, updatedDateTimePattern);
}

The current implementation also has a bug when a pattern is not provided, because we are assigning updatedIndexAttributeValue to "" and returning "" if a pattern does not exist.
Thanks for catching this. It does substitute "" for plain string index names. I will update this and add a test for this use case.
Codecov Report
@@            Coverage Diff             @@
##               main    #1045      +/-  ##
============================================
- Coverage     91.36%   91.32%   -0.05%
  Complexity      683      683
============================================
  Files            84       84
  Lines          1993     1994       +1
  Branches        168      168
============================================
  Hits           1821     1821
  Misses          132      132
- Partials         40       41       +1
Continue to review full report at Codecov.
...ation/src/main/resources/org/opensearch/dataprepper/logstash/mapping/opensearch.mapping.yaml
final LogstashAttribute logstashAttribute = mock(LogstashAttribute.class);
final LogstashAttributeValue logstashAttributeValue = mock(LogstashAttributeValue.class);

when(logstashAttribute.getAttributeName()).thenReturn("");
This test is testing what happens when index is null. This is a good thing to test for sure.
But, these four lines are confusing. At first glance it looks like you are testing what happens when index is empty (""). These four lines are actually unnecessary and misleading.
We should also have a test for when the index is an empty string (""). You can test this using createLogstashIndexAttribute("") and don't need to copy these four lines around for it.
Understood, added two testcases separating the two scenarios. Thanks for pointing it out.
Thanks. Please change these values from empty strings to random strings. It will make the intent clearer.
final Map<String, Object> pluginSettings = createObjectUnderTest()
        .mapAttributes(Collections.singletonList(logstashAttribute), logstashAttributesMappings);

assertThat(pluginSettings.size(), equalTo(1));
I don't understand these assertions. There is one plugin setting here. But, it is not the index. What is it?
pluginSettings is creating this: pluginSettings: {null=}, where AttributeName is still index since that is a constant, but AttributeValue is "" so it's empty.
@dlvenable I've further clarified the test cases, I hope it's clearer now.
- The createLogstashIndexAttribute(final String indexAttributeValue) method sets a constant "index" for logstashAttribute.getAttributeName() and takes in a param for indexAttributeValue.
- So I've now mocked the logstashAttribute and logstashAttributeValue to be "" in the test case and asserted that pluginSettings is empty.
I believe this PR handles the current tasking and is ready to be merged. Please create a new GitHub issue for handling the default values for OpenSearch, Elasticsearch, and amazon_es index names.
@sshivanii, please create an issue to track the delayed work for default values.
Here's the issue for adding default values: #1080
Description
- Converts year, year-of-era, week-year patterns in index from Joda to Java-time
- Removes + from DateTime patterns in Logstash config
Issues Resolved
Resolves #854
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.