Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenSearch Index DateTime Pattern conversion #1045

Merged
merged 12 commits into from
Feb 22, 2022

Conversation

sshivanii
Copy link
Contributor

@sshivanii sshivanii commented Feb 16, 2022

Description

  • Adds a custom AttributeMapper to converts year, year-of-era, week-year patterns in index from Joda to Java-time:
  • Removes + from DateTime patterns in Logstash config
  • Adds test cases

Issues Resolved

Resolves #854

Check List

  • New functionality includes testing.
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@sshivanii sshivanii requested a review from a team as a code owner February 16, 2022 06:40
Copy link
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this loos good. I made a few suggestions that I think will help improve the code.

private String convertIndexDateTimePattern(LogstashAttribute logstashAttribute) {

final String logstashIndexAttributeValue = logstashAttribute.getAttributeValue().getValue().toString();
final Pattern pattern = Pattern.compile(REGEX_EXTRACTING_DATETIME_PATTERN);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern can be compiled in the default constructor. Then you can make pattern a field and re-use it each time. It won't have a huge impact, but could help some if there are two or more elasticsearch outputs.

Also, a better name might be dateTimePattern to be clear what the pattern is covering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good suggestion I can look into. For the name, later on line 52 I'm making a dateTimePattern string as: final String dateTimePattern = dateTimePatternMatcher.group(0);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, pattern means two things here: 1) regex and 2) a date-time pattern.

Maybe you can rename it to dateTimeRegexPattern to be clear in what it provides.


class OpenSearchPluginAttributesMapper extends AbstractLogstashPluginAttributesMapper {
protected static final String LOGSTASH_OPENSEARCH_INDEX_ATTRIBUTE_NAME = "index";
protected static final String REGEX_EXTRACTING_DATETIME_PATTERN = "%\\{(.*?)\\}";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the + be in the pattern?

If it is not a date-time pattern in the Logstash configuration, this would still match. Then we'd make the conversions which are not quite accurate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, I will add the + in the Regex.


assertThat(pluginSettings, notNullValue());
assertThat(pluginSettings.size(), equalTo(1));
assertThat(pluginSettings, hasKey(dataPrepperIndexAttribute));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for the key is good. Should you also verify the value after this line?

assertThat(pluginSettings.get(dataPrepperIndexAttribute), equalTo(value));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add, I missed it in this test, thanks for catching this.


@Test
void convert_logstash_index_date_time_pattern() {
final String dataPrepperIndexAttribute = "index";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this a constant in the class and use it elsewhere. It will make the code more readable since it will be clear that it is a constant.

}

@Test
void convert_logstashIndexPattern_joda_year_to_dataPrepperIndexPattern_java_year() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this test and the next three are the same except for the input and output. You can make this a JUnit 5 parameterized test and this would be clearer to use.

Here is somewhat of a starting point for it.

    static class JodaToJava8IndicesArgumentsProvider implements ArgumentsProvider {
        @Override
        public Stream<? extends Arguments> provideArguments(final ExtensionContext extensionContext) {
            return Stream.of(
                    Arguments.arguments("logstash-%{+yyyy.MM.dd}", "logstash-%{uuuu.MM.dd}"),
                    Arguments.arguments("logstash-index-%{+YYYY.MM.dd}", "logstash-index-%{yyyy.MM.dd}"),
                    // repeat for each
            );
        }
    }

You can see this in use at:

https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-logstash-configuration/src/test/java/org/opensearch/dataprepper/logstash/LogstashConfigConverterIT.java#L52-L54

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought of parameterized tests but went ahead with separate tests to clarify each input and output we're expecting. It does seem redundant though so I will look into updating these.

}

@Test
void convert_logstash_index_date_time_pattern() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think that this test could be part of the parameterized tests as well.

protected static final String REGEX_EXTRACTING_DATETIME_PATTERN = "%\\{(.*?)\\}";

@Override
protected void mapCustomAttributes(List<LogstashAttribute> logstashAttributes, LogstashAttributesMappings logstashAttributesMappings, Map<String, Object> pluginSettings) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All parameters should be marked as final. Please implement elsewhere.

import java.util.Collections;
import java.util.Map;

class OpenSearchPluginAttributesMapperTest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a test for when there is no pattern and the index is just "logstash". This should catch a bug with the current implementation.

final LogstashAttribute logstashIndexAttribute = logstashAttributes.stream()
.filter(a -> a.getAttributeName().equals(LOGSTASH_OPENSEARCH_INDEX_ATTRIBUTE_NAME))
.findFirst()
.orElseThrow(() -> new LogstashConfigurationException("Index attribute not found"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The elasticsearch output plugin and the opensearch output plugin both have default index patterns. We should handle those scenarios instead of throwing an exception.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sshivanii , The AbstractLogstashPluginAttributesMapper class will call mapCustomAttributes whether or not the index is defined. Thus, this exception is invalid as @cmanning09 pointed out.

If the index is not supplied, then there is no custom attribute to process here. I'd recommend changing orElseThrow to ifPresent and then move lines 39 and 41 in there.

Also, this is a good unit test case. Provide an empty logstashAttributes as mocked input. Then verify that the PluginSettings is empty - it should have no index key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, looks like I did not update this in the new revision, will do now. Thanks @cmanning09 @dlvenable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sshivanii, the current approach is not handling the default values for the elasticsearch output plugin or the opensearch output plugin. We need to support a migration of these indices.

Regarding the opensearch output plugin: we need to dive deep here to understand what the default values our. The documentation is not clear at this time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmanning09 Regarding the default values for index in Logstash ES output plugin, it's dependent on the ecs_compatibility attribute which we do not support in our sink plugin so it will not be added to the converted .yaml file. Since we don't support ecs compatibility, I believe these default values should not be added.

I dug deeper into the OpenSearch code to find they only have a default index depending on ecs being enabled/disabled. Otherwise they throw an exception Unsupported ECS compatibility
https://github.com/opensearch-project/logstash-output-opensearch/blob/main/lib/logstash/outputs/opensearch.rb#L417

Copy link
Member

@dlvenable dlvenable Feb 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to suggest that when the index name is not supplied, Data Prepper uses the index name for when ECS is disabled. ECS compatibility is a broader topic than this PR is covering, and I don't think it should hinder this PR.

Thus, if the index is not present, the converter can just set the index to logstash-%{uuuu.MM.dd} (assuming I converted that string correctly). The current two tests which have no index value would both need to change to verify this new behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please verify that this is also the default of the amazon_es plugin.

Copy link
Member

@dlvenable dlvenable Feb 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best solution here is to include a defaultAttributes property to the Logstash configuration mapping YAML. Other plugins may need to have default values where the Logstash default is different from the Data Prepper default.

As this problem exists in the current converter and this change introduces no regressions, I'd rather we wrap up this PR and support this more generically in a second PR.

Copy link
Contributor Author

@sshivanii sshivanii Feb 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked the default index attribute value in amazon_es plugin and it is logstash-%{+YYYY.MM.dd} so the converted value will be logstash-%{yyyy.MM.dd}. This is different from the elasticsearch output plugin default, and the plugin name is not known at this time.
Reference: https://github.com/awslabs/logstash-output-amazon_es/blob/master/lib/logstash/outputs/amazon_es/common_configs.rb#L17

Comment on lines 55 to 65
if (character == '+') {
continue;
} else if (character == 'y') {
updatedDateTimePattern.append('u');
} else if (character == 'Y') {
updatedDateTimePattern.append('y');
} else if (character == 'x') {
updatedDateTimePattern.append('Y');
} else {
updatedDateTimePattern.append(character);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can simplify this logic by using a map. It is more maintainable and reduces the number of conditional checks required.

We can declare a static map for this class

    private static Map<Character, Character> PATTERN_CONVERTER_MAP;
    static {
        PATTERN_CONVERTER_MAP = new HashMap<>();
        PATTERN_CONVERTER_MAP.put('y', 'u');
        PATTERN_CONVERTER_MAP.put('Y', 'y');
        PATTERN_CONVERTER_MAP.put('x', 'Y');
    }

And then retrieve the converted characters here:

            if (character != '+') {
                final char convertedChar = PATTERN_CONVERTER_MAP.getOrDefault(character, character);
                updatedDateTimePattern.append(convertedChar);
            }

Comment on lines 48 to 49
final StringBuilder updatedDateTimePattern = new StringBuilder();
String updatedIndexAttributeValue = "";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a best practice, variables should be minimally scoped to the block they are used and we should avoid re-assignment where possible. We can achieve this by leveraging an additional private method to handle the match and conversion part.

return dateTimePatternMatcher.find() ? matchAndConvertDateTimePattern(dateTimePatternMatcher, logstashIndexAttributeValue) : logstashIndexAttributeValue;

New method:

private String matchAndConvertDateTimePattern(final Matcher dateTimePatternMatcher, final String logstashIndexAttributeValue) {

        final String dateTimePattern = dateTimePatternMatcher.group(0);
        final StringBuilder updatedDateTimePattern = new StringBuilder();
        for (final char character: dateTimePattern.toCharArray()) { 
           ... // conversion logic from below.
        }
        return logstashIndexAttributeValue.replace(dateTimePattern, updatedDateTimePattern);
    }

The current implementation also has a bug when a pattern is not provided because we are assigning the updatedIndexAttributeValue to "" and returning "" if a pattern does not exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. It does substitute "" for plain Strings index names. I will update this and add a test for this usecase.

@codecov-commenter
Copy link

codecov-commenter commented Feb 17, 2022

Codecov Report

Merging #1045 (4f504aa) into main (e5434f6) will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff              @@
##               main    #1045      +/-   ##
============================================
- Coverage     91.36%   91.32%   -0.05%     
  Complexity      683      683              
============================================
  Files            84       84              
  Lines          1993     1994       +1     
  Branches        168      168              
============================================
  Hits           1821     1821              
  Misses          132      132              
- Partials         40       41       +1     
Impacted Files Coverage Δ
...m/amazon/dataprepper/model/event/JacksonEvent.java 98.41% <0.00%> (-0.79%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e5434f6...4f504aa. Read the comment docs.

final LogstashAttribute logstashIndexAttribute = logstashAttributes.stream()
.filter(a -> a.getAttributeName().equals(LOGSTASH_OPENSEARCH_INDEX_ATTRIBUTE_NAME))
.findFirst()
.orElseThrow(() -> new LogstashConfigurationException("Index attribute not found"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sshivanii , The AbstractLogstashPluginAttributesMapper class will call mapCustomAttributes whether or not the index is defined. Thus, this exception is invalid as @cmanning09 pointed out.

If the index is not supplied, then there is no custom attribute to process here. I'd recommend changing orElseThrow to ifPresent and then move lines 39 and 41 in there.

Also, this is a good unit test case. Provide an empty logstashAttributes as mocked input. Then verify that the PluginSettings is empty - it should have no index key.

@sshivanii sshivanii requested review from dlvenable and cmanning09 and removed request for cmanning09 February 18, 2022 21:24
final LogstashAttribute logstashAttribute = mock(LogstashAttribute.class);
final LogstashAttributeValue logstashAttributeValue = mock(LogstashAttributeValue.class);

when(logstashAttribute.getAttributeName()).thenReturn("");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is testing what happens when index is null. This is a good thing to test for sure.

But, these four lines are confusing. At first glance it looks like you are testing what happens when index is empty (""). These four lines are actually unnecessary and misleading.

We should also have a test for when the index is an empty string (""). You can test this using createLogstashIndexAttribute("") and don't need to copy these four lines around for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, added two testcases separating the two scenarios. Thanks for pointing it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Please change these values from empty strings to random strings. It will make the intent clearer.

final Map<String, Object> pluginSettings = createObjectUnderTest()
.mapAttributes(Collections.singletonList(logstashAttribute), logstashAttributesMappings);

assertThat(pluginSettings.size(), equalTo(1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand these assertions. There is one plugin setting here. But, it is not the index. What is it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pluginSettings is creating this: pluginSettings: {null=} where AttributeName is still index since that is a constant but AttributeValue is "" so it's empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlvenable
I've further clarified the test cases, I hope it's more clear now.

  • The createLogstashIndexAttribute(final String indexAttributeValue) method is setting a constant "index" for logstashAttribute.getAttributeName() and takes in param for indexAttributeValue
  • So I've now mocked the logstashAttribute and logstashAttributeValue to be "" in the test case and asserted pluginSettings is empty.

Signed-off-by: Shivani Shukla <[email protected]>
Signed-off-by: Shivani Shukla <[email protected]>
@sshivanii sshivanii requested a review from dlvenable February 20, 2022 22:06
Signed-off-by: Shivani Shukla <[email protected]>
Copy link
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this PR handles the current tasking and is ready to be merged. Please create a new GitHub issue for handling the default values for OpenSearch, Elasticsearch, and amazon_es index names.

Copy link
Contributor

@cmanning09 cmanning09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sshivanii, please create an issue to track the delayed work for default values.

@sshivanii
Copy link
Contributor Author

@sshivanii, please create an issue to track the delayed work for default values.

Here's the issue for adding default values. #1080

@sshivanii sshivanii merged commit d775287 into opensearch-project:main Feb 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LogstashConfigConverter: converting index with a date-time pattern
4 participants