added ANTLR visitor for parsing Logstash configuration #506

Merged
merged 6 commits into from
Nov 5, 2021


asifsmohammed
Collaborator

Description

  • Added Logstash visitor which populates model POJOs
  • Added Exception classes for converter
  • Updated checkstyle file to suppress checks for ANTLR generated files

Issues Resolved

Resolve #465

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@asifsmohammed asifsmohammed changed the title Visitor added ANTLR visitor to converter Nov 1, 2021
@asifsmohammed asifsmohammed changed the title added ANTLR visitor to converter added ANTLR visitor for parsing Logstash configuration Nov 1, 2021
package org.opensearch.dataprepper.logstash.exception;

/**
* Exception for visitor when it's unable to convert into Logstash models
Contributor

I'd like the javadoc to be more descriptive about what is Visitor? You can update this to LogstashVisitor and in that mention what the visitor class does.

private final Map<String, Object> hashEntries = new LinkedHashMap<>();

@Override
public Object visitConfig(LogstashParser.ConfigContext ctx) {
Contributor

can we rename this to LogstashParser.ConfigContext configContext?

Collaborator Author

Yeah, I can rename the parameters.

@sshivanii
Contributor

Sidenote: The first commit for this PR e56cf3ee9fd2beb3e1234731d0b4b4749941f1d5 is a Merge commit. Based on the coding guidance we discussed, we're opting to Rebase instead of Merge to avoid any unnecessary Merge commits.


/**
* Exception thrown when {@link org.opensearch.dataprepper.logstash.parser.LogstashVisitor} is unable to convert
* * Logstash configuration into Logstash model objects
Contributor

nit: there is an extra *

Comment on lines 48 to 60
switch (pluginSectionContext.plugin_type().getText()) {
case "input":
logstashPluginType = LogstashPluginType.INPUT;
break;
case "filter":
logstashPluginType = LogstashPluginType.FILTER;
break;
case "output":
logstashPluginType = LogstashPluginType.OUTPUT;
break;
default:
throw new LogstashParsingException("only input, filter and output plugin sections are supported.");
}
Contributor

This logic belongs in LogstashPluginType enum as it is related to the creation of an enum. We can also eliminate the need for a case statement by using a map.

I would recommend updating the enum like so:

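import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public enum LogstashPluginType {
    INPUT("input"),
    FILTER("filter"),
    OUTPUT("output");

    private final String value;

    // Lookup table from the section keyword ("input", "filter", "output") to its constant.
    private static final Map<String, LogstashPluginType> VALUES_MAP = Arrays.stream(LogstashPluginType.values())
            .collect(Collectors.toMap(LogstashPluginType::toString, Function.identity()));

    LogstashPluginType(final String value) {
        this.value = value;
    }

    @Override
    public String toString() {
        return value;
    }

    // Returns null when the value does not match any plugin type.
    public static LogstashPluginType getByValue(final String value) {
        return VALUES_MAP.get(value.toLowerCase());
    }
}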

Then you can simplify this to:

final LogstashPluginType logstashPluginType = LogstashPluginType.getByValue(pluginSectionContext.plugin_type().getText());
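
One behavioural difference is worth noting: the original switch throws for an unsupported section name, whereas a plain map lookup returns null. The following is a minimal sketch of a lookup that keeps the exception. It assumes a stand-in LogstashParsingException (the real class and its constructor signature are not shown in this thread) and a hypothetical null check inside getByValue.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Stand-in for the PR's exception class; the constructor signature is an assumption.
class LogstashParsingException extends RuntimeException {
    LogstashParsingException(final String message) {
        super(message);
    }
}

enum LogstashPluginType {
    INPUT("input"),
    FILTER("filter"),
    OUTPUT("output");

    // Lookup table from the section keyword to its constant, built once at class load.
    private static final Map<String, LogstashPluginType> VALUES_MAP = Arrays.stream(values())
            .collect(Collectors.toMap(LogstashPluginType::toString, Function.identity()));

    private final String value;

    LogstashPluginType(final String value) {
        this.value = value;
    }

    @Override
    public String toString() {
        return value;
    }

    // Hypothetical variant: throws instead of returning null, mirroring the switch's default branch.
    static LogstashPluginType getByValue(final String value) {
        final LogstashPluginType type = value == null ? null : VALUES_MAP.get(value.toLowerCase());
        if (type == null) {
            throw new LogstashParsingException("only input, filter and output plugin sections are supported.");
        }
        return type;
    }
}
```

With this variant the caller stays a one-liner and an unknown section name still fails fast instead of surfacing later as a null key in the plugin sections map.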


filler: (COMMENT | WS | NEWLINE)*;

plugin_section: plugin_type filler '{'
Contributor

Let's use camel case for all our grammar definitions (for example, pluginSection). This would result in method signatures with a single casing:

public Object visitPluginSection(LogstashParser.PluginSectionContext pluginSectionContext) {

Member

I think this raises a question as to what convention we should use in the Grammar - follow the convention from Logstash, which would make it easier to map back to it; or use conventions which make the Java code nicer. This code should be used exclusively by the Logstash configuration framework, so I'm fine with using the Logstash conventions.

Contributor

Ahh okay. Given that this maps directly to the Logstash grammar and the mapping is a manual process, I am on board with keeping this as is to assist in that mapping. Thanks for clarifying.

Comment on lines 28 to 47
private List<LogstashPlugin> logstashPluginList;
private final Map<LogstashPluginType, List<LogstashPlugin>> pluginSections = new LinkedHashMap<>();
private final Map<String, Object> hashEntries = new LinkedHashMap<>();

@Override
public Object visitConfig(LogstashParser.ConfigContext configContext) {
for(int i = 0; i < configContext.plugin_section().size(); i++) {
visitPlugin_section(configContext.plugin_section().get(i));
}
return LogstashConfiguration.builder()
.pluginSections(pluginSections)
.build();
}

@Override
public Object visitPlugin_section(LogstashParser.Plugin_sectionContext pluginSectionContext) {
Contributor

These lists and maps don't need to be declared globally. We can use the methods as intended and return the constructed lists over mutating and reassigning the objects. This will simplify the code:

    @Override
    public Object visitConfig(final LogstashParser.ConfigContext configContext) {

        final Map<LogstashPluginType, List<LogstashPlugin>> pluginSections = new LinkedHashMap<>();

        // Build the map locally from each section's return value instead of mutating a field.
        configContext.plugin_section().forEach(pluginSection -> {
            final LogstashPluginType logstashPluginType = LogstashPluginType.getByValue(pluginSection.plugin_type().getText());
            final List<LogstashPlugin> logstashPluginList = (List<LogstashPlugin>) visitPlugin_section(pluginSection);
            pluginSections.put(logstashPluginType, logstashPluginList);
        });
        return LogstashConfiguration.builder()
                .pluginSections(pluginSections)
                .build();
    }

    @Override
    public Object visitPlugin_section(final LogstashParser.Plugin_sectionContext pluginSectionContext) {

        // Return the list of visited plugins for this section.
        return pluginSectionContext.branch_or_plugin().stream()
                .map(this::visitBranch_or_plugin)
                .collect(Collectors.toList());
    }

(see the comment on visitHashEntries() below)

String pluginName = pluginContext.name().getText();
List<LogstashAttribute> logstashAttributeList = new ArrayList<>();

for (int i = 0; i < pluginContext.attributes().attribute().size(); i++) {
Contributor

This is strange. I am not sure if there is a way around it but attributes is an AttributeContext while attribute is a list of AttributeContexts...

Is this an issue with our grammar?

Collaborator Author

So in the grammar, attributes contains a list of attribute entries:

attributes

attribute 1
attribute 2

We store each attribute from the configuration in a list of type LogstashAttribute.


hashentries: hashentry (WS hashentry)*;

hashentry: hashname filler '=>' filler value;
Contributor

hashEntry

Comment on lines 151 to 147
@Override
public Object visitHashentries(LogstashParser.HashentriesContext hashentriesContext) {
for (int i = 0; i < hashentriesContext.hashentry().size(); i++) {
visitHashentry((hashentriesContext.hashentry().get(i)));
}
return hashEntries;
}

@Override
public Object visitHashentry(LogstashParser.HashentryContext hashentryContext) {

if (hashentryContext.value().getChild(0) instanceof LogstashParser.ArrayContext)
hashEntries.put(hashentryContext.hashname().getText(), visitArray(hashentryContext.value().array()));

else
hashEntries.put(hashentryContext.hashname().getText(), hashentryContext.value().getText());

return hashEntries;
}
Contributor

Extending off what I said earlier to eliminate the use of the shared class value:

    @Override
    public Object visitHashentries(LogstashParser.HashentriesContext hashentriesContext) {

        final Map<String, Object> hashEntries = new LinkedHashMap<>();

        hashentriesContext.hashentry().forEach(hashentryContext -> {
            final String key = hashentryContext.hashname().getText();
            final Object value = visitHashentry(hashentryContext);
            hashEntries.put(key, value);
        });
        return hashEntries;
    }

    @Override
    public Object visitHashentry(LogstashParser.HashentryContext hashentryContext) {

        if (hashentryContext.value().getChild(0) instanceof LogstashParser.ArrayContext)
            return visitArray(hashentryContext.value().array());

        return hashentryContext.value().getText();
    }
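
The same idea can be shown without the generated ANTLR classes. The sketch below is only an illustration: Entry is a hypothetical stand-in for HashentryContext, and the two static methods play the roles of visitHashentry and visitHashentries. Each step returns its result and nothing is written to a shared field.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a parsed hash entry; not the ANTLR-generated HashentryContext.
final class Entry {
    final String name;
    final Object rawValue; // a String, or a List standing in for an array value

    Entry(final String name, final Object rawValue) {
        this.name = name;
        this.rawValue = rawValue;
    }
}

final class HashEntryFolding {
    // Plays the role of visitHashentry: returns the entry's value and touches no shared state.
    static Object visitEntry(final Entry entry) {
        if (entry.rawValue instanceof List) {
            return entry.rawValue;
        }
        return String.valueOf(entry.rawValue);
    }

    // Plays the role of visitHashentries: builds a fresh map on every call.
    static Map<String, Object> visitEntries(final List<Entry> entries) {
        final Map<String, Object> result = new LinkedHashMap<>();
        for (final Entry entry : entries) {
            result.put(entry.name, visitEntry(entry));
        }
        return result;
    }
}
```

Because every call builds its own map, entries from one hash cannot leak into the next one. With a shared hashEntries field, that would happen unless the field is cleared between visits.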

Comment on lines 33 to 89
void visit_config_test() {
final LogstashParser.ConfigContext configContextMock = mock(LogstashParser.ConfigContext.class);
final LogstashParser.Plugin_sectionContext pluginSectionMock = mock(LogstashParser.Plugin_sectionContext.class);
final LogstashParser.Plugin_typeContext pluginTypeContextMock = mock(LogstashParser.Plugin_typeContext.class);
final LogstashParser.Branch_or_pluginContext branchOrPluginContextMock = mock(LogstashParser.Branch_or_pluginContext.class);

given(configContextMock.plugin_section()).willReturn(Collections.singletonList(pluginSectionMock));
given(pluginSectionMock.plugin_type()).willReturn(pluginTypeContextMock);
given(pluginTypeContextMock.getText()).willReturn("input");
given(pluginSectionMock.branch_or_plugin()).willReturn(Collections.singletonList(branchOrPluginContextMock));

LogstashVisitor logstashVisitor = createObjectUnderTest();
Mockito.doReturn(TestDataProvider.pluginWithOneArrayContextAttributeData()).when(logstashVisitor).visitBranch_or_plugin(branchOrPluginContextMock);

LogstashConfiguration actualLogstashConfiguration = (LogstashConfiguration) logstashVisitor.visitConfig(configContextMock);
LogstashConfiguration expectedLogstashConfiguration = TestDataProvider.configData();

assertThat(actualLogstashConfiguration.getPluginSection(LogstashPluginType.INPUT).size(),
equalTo(expectedLogstashConfiguration.getPluginSection(LogstashPluginType.INPUT).size()));
Mockito.verify(logstashVisitor, Mockito.times(1)).visitPlugin_section(pluginSectionMock);
}
Contributor

It appears you are mocking the same objects over and over again in each test. I would encourage you to use the @Mock annotation and create class variables to reuse. I would also encourage you to create a LogstashVisitor class variable and assign it in a @BeforeEach function. This will help reduce the duplicate code you have.

}

@Test
void visit_config_test() {
Contributor

This name is not very descriptive. What are we testing for here?

import java.util.ArrayList;

/**
* Class to populate Logstash configuration model objects using ANTLR
Contributor

Can you add the following to the documentation? I am not familiar with ANTLR and how this whole process works. It took me a little while to understand where the LogstashBaseVisitor is coming from. I didn't realize I had to build my code to generate the class.

  • What is a LogstashBaseVisitor?
  • How is a LogstashBaseVisitor generated?
  • Where do all the methods come from?

Member

While this is confusing in a PR, this is all part of the build process and I think detailed documentation here is not quite appropriate. If anything, I'd keep it as simple as "... using ANTLR library classes and generated code."

Contributor

That works. Anything to point the reader in the right direction.

*
* @since 1.2
*/

Member

Please remove extra lines between Javadocs and their classes/methods. I found this one, but there may be others.

*/

@SuppressWarnings("rawtypes")
public class LogstashVisitor extends LogstashBaseVisitor {
Member

This should be package private so that it is not used outside of this work.

import java.util.LinkedHashMap;


public class TestDataProvider {
Member

This class should also be package private.


package org.opensearch.dataprepper.logstash.exception;

/**
* Exception thrown when {@link org.opensearch.dataprepper.logstash.parser.LogstashVisitor} is unable to convert
Member

I'm a big fan of {@link}, but I suggest removing it here. The LogstashVisitor is an internal implementation detail. Also, exceptions should generally not declare which classes use them. It goes the other way: classes specify which exceptions they throw.

I suggest rewording this somewhat.
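
One possible shape for that rewording, as a sketch only: the exception's Javadoc describes the failure itself, and the class that throws it documents the relationship with @throws. The SectionNames class and its normalize method are hypothetical, and the exception's superclass and constructor are assumptions, since neither is shown in this thread.

```java
/**
 * Exception thrown when a Logstash configuration cannot be converted
 * into Logstash model objects.
 */
class LogstashParsingException extends RuntimeException {
    LogstashParsingException(final String message) {
        super(message);
    }
}

// Hypothetical caller, used only to show where the relationship is documented.
class SectionNames {
    /**
     * Normalizes a plugin section keyword to lower case.
     *
     * @param keyword the section keyword from the configuration
     * @return the lower-cased keyword
     * @throws LogstashParsingException if the keyword is not input, filter or output
     */
    static String normalize(final String keyword) {
        final String lowered = keyword.toLowerCase();
        if (!lowered.equals("input") && !lowered.equals("filter") && !lowered.equals("output")) {
            throw new LogstashParsingException("unsupported plugin section: " + keyword);
        }
        return lowered;
    }
}
```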

dlvenable
dlvenable previously approved these changes Nov 4, 2021
class LogstashVisitor extends LogstashBaseVisitor {

@Override
public Object visitConfig(LogstashParser.ConfigContext configContext) {
Contributor

We can mark every parameter for all methods implemented in this class as final

value = attributeContext.value().getText().replaceAll("^\"|\"$|^'|'$", "");
}

LogstashAttributeValue logstashAttributeValue = LogstashAttributeValue.builder()
Contributor

final

*
* @since 1.2
*/
public class LogstashGrammarException extends LogstashParsingException {
Contributor

This exception is never used. Will it be used in the future?

Collaborator Author

Yes. The intention was to use it in the future.

Signed-off-by: Asif Sohail Mohammed <[email protected]>
added logstash visitor populate logstash model POJO's

Signed-off-by: Asif Sohail Mohammed <[email protected]>
updated javadoc for public classes

Signed-off-by: Asif Sohail Mohammed <[email protected]>
Signed-off-by: Asif Sohail Mohammed <[email protected]>
@asifsmohammed asifsmohammed merged commit 5458558 into opensearch-project:main Nov 5, 2021
graytaylor0 referenced this pull request in graytaylor0/data-prepper Nov 5, 2021
added visitor to populate Logstash model objects using ANTLR library

Signed-off-by: Asif Sohail Mohammed <[email protected]>

Successfully merging this pull request may close these issues.

Implement Logstash Configuration Parsing
4 participants