Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored the PipelineParser/PipelineConfiguration to use the PipelineDataFlowModel #1680

Conversation

dlvenable
Copy link
Member

@dlvenable dlvenable commented Aug 19, 2022

Description

Prior to this PR, Data Prepper had split approaches for serializing configuration YAML from deserializing YAML. The PipelineDataFlowModel is currently being used for serialization. And deserialization is handled by the PipelineParser and PipelineConfiguration classes. This PR fixes this split-brain issue. Deserialization now uses PipelineDataFlowModel as well. This helps ensure that deserialization is consistent with serialization.

This change also starts to remove the legacy prepper: configuration in YAML. I wasn't intending to do this, but it would be harder to keep that behavior and we plan to remove it in #619.

This change is important for supporting conditional routing of sinks.

Issues Resolved

None, but contributes toward #619 and #1337

Check List

  • New functionality includes testing.
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…pelineDataFlowModel for deserializing the pipeline YAML. This makes the deserialization consistent with serialization. It is also necessary for the upcoming conditional routing on sinks work.

Signed-off-by: David Venable <[email protected]>
@dlvenable dlvenable requested a review from a team as a code owner August 19, 2022 22:22
@codecov-commenter
Copy link

Codecov Report

Merging #1680 (879f4ef) into main (fabbb17) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main    #1680      +/-   ##
============================================
+ Coverage     93.54%   93.58%   +0.04%     
+ Complexity     1313     1307       -6     
============================================
  Files           172      172              
  Lines          3826     3820       -6     
  Branches        303      301       -2     
============================================
- Hits           3579     3575       -4     
+ Misses          177      175       -2     
  Partials         70       70              
Impacted Files Coverage Δ
.../opensearch/dataprepper/parser/PipelineParser.java 88.88% <100.00%> (+0.31%) ⬆️
...ataprepper/parser/model/PipelineConfiguration.java 97.95% <100.00%> (-2.05%) ⬇️
...prepper/parser/PipelineConfigurationValidator.java 71.18% <0.00%> (+1.69%) ⬆️
...rwarder/discovery/AwsCloudMapPeerListProvider.java 95.12% <0.00%> (+2.43%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

final PipelineModel pipelineModel = mock(PipelineModel.class);
when(pipelineModel.getSource()).thenReturn(source);
when(pipelineModel.getProcessors()).thenReturn(processors);
when(pipelineModel.getPreppers()).thenReturn(null);
Copy link
Contributor

@sshivanii sshivanii Aug 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still have preppers for this when statement?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be able to remove this line. I originally had to set this to null for preppers to work. But, then some other issues came up, so I removed any support for prepper: in the PipelineConfiguration class.

TestDataProvider.validMultipleConfigurationOfSizeOne(),
null, null);
void testOnlySourceAndSink() {
sinks = TestDataProvider.validMultipleConfigurationOfSizeOne();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not continuing to use validMultipleConfiguration for sinks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matters. But, the original test was using size of one. So, I kept it that way. I wanted to keep the tests as similar as possible to their original intent.

}
@Test
void testNoSourceConfiguration() {
final PipelineModel pipelineModel = mock(PipelineModel.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to pass null for source here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mockito's default behavior is to provide null when not mocked (it is different for collections and primitives).

Copy link
Contributor

@sshivanii sshivanii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the serialization/deserialization consistency by using PipelineDataFlowModel is a good change. I just have few minor clarifying comments.

@dlvenable dlvenable merged commit 255f89d into opensearch-project:main Aug 23, 2022
@dlvenable dlvenable deleted the parse-pipeline-with-data-flow-model branch September 8, 2022 00:20
engechas pushed a commit to engechas/data-prepper that referenced this pull request Sep 12, 2022
…pelineDataFlowModel for deserializing the pipeline YAML. This makes the deserialization consistent with serialization. It is also necessary for the upcoming conditional routing on sinks work. (opensearch-project#1680)

Signed-off-by: David Venable <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants