Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grok Prepper Matching Functionality #324

Merged
merged 10 commits into from
Oct 1, 2021

Conversation

graytaylor0
Copy link
Member

Signed-off-by: graytaylor0 [email protected]

Description

  • Grok matching functionality (single/mutli field, single/multi pattern)
  • Support for named_captures_only
  • Support for keep_empty_captures
  • Unit tests for Grok Prepper and Integration tests using java-grok library

Issues Resolved

#304

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@graytaylor0 graytaylor0 linked an issue Sep 29, 2021 that may be closed by this pull request
try (MockedStatic<GrokCompiler> grokCompilerMockedStatic = mockStatic(GrokCompiler.class)) {

capture = new HashMap<>();
capture.put("field_capture_1", "value_capture_1");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should leave the capture as an empty map in this setup. Tests which want specific values should set them directly. So, I'd recommend removing these three lines from here.

Additionally, you can make @Nested tests to help combine tests that should be more similar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, this could give a test in which there are no matches. That test would have an empty capture map.

Copy link
Collaborator

@chenqi0805 chenqi0805 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just minor comments.

String testData = "{\"message\":\"127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] \\\"GET /apache_pb.gif HTTP/1.0\\\" 200 2326\"}";
Record<String> record = new Record<>(testData);

String resultData = "{\"message\":\"127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] \\\"GET /apache_pb.gif HTTP/1.0\\\" 200 2326\","
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick: we could construct it as map to avoid redundant deserialization.

pluginSetting.getSettings().put(GrokPrepperConfig.MATCH, matchConfig);
grokPrepper = new GrokPrepper(pluginSetting);

String testData = "{\"message\":\"127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] \\\"GET /apache_pb.gif HTTP/1.0\\\" 200 2326\"}";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick: It seems to me the testData does not include %{DATA:rawrequest}. From the context I infer it is to cover empty_match = False? If so, we should better comment on this field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah rawrequest being null makes it easier to test keep_empty_captures and to test merging maps with null values to make sure no NPE's can occur. Can you elaborate on what you mean by "better comment on this field"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am correct, rawrequest key is not expected to appear in the keys of result data in this test cases while it appears in the grokking pattern. So we could call that out in the comment (maybe above result data)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will put a comment somewhere for this

Signed-off-by: graytaylor0 <[email protected]>
@@ -43,10 +69,35 @@ public GrokPrepper(final PluginSetting pluginSetting) {
*
* @param records Input records that will be modified/processed
* @return Record modified output records
*/
*/
@Override
public Collection<Record<String>> doExecute(Collection<Record<String>> records) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: parameters should be final

for (Map.Entry<String, List<Grok>> entry : fieldToGrok.entrySet()) {
for (Grok grok : entry.getValue()) {
if (recordMap.containsKey(entry.getKey())) {
Match match = grok.match(recordMap.get(entry.getKey()).toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: final

Comment on lines 61 to 63
for (Map.Entry<String, List<String>> entry : grokPrepperConfig.getMatch().entrySet()) {
fieldToGrok.put(entry.getKey(), entry.getValue().stream().map(item -> grokCompiler.compile(item, grokPrepperConfig.isNamedCapturesOnly())).collect(Collectors.toList()));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very hard to read and I am not sure what is happening here. Let's pull this out into a separate function. A separate function will allow us to provide some context on what operation we are performing. Breaking up this line into multiple lines with each line focused on a concrete operation with help readability and not hide functionality.

Match match = grok.match(recordMap.get(entry.getKey()).toString());
match.setKeepEmptyCaptures(grokPrepperConfig.isKeepEmptyCaptures());

mergeUpdateWithOriginalMap(recordMap, match.capture());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't read very well. Might I recommend: "mergeCapturesWithRecordMap"


## Notes on Patterns

The Grok Prepper internally uses the [java-grok Library](https://github.com/thekrakken/java-grok), so any patterns compatible there are compatible with Grok Prepper.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments here, but let's defer that to a separate PR where we can focus on the documentation for our users later. I don't want to block this functionality right now.

  • "so any patterns compatible there are compatible with Grok Prepper." - this is passive and "there" refers to the object of the sentence instead of the subject which is a little confusing. Might I recommend:
    "The Grok Prepper uses the the java-grok Library internally and supports all java-grok Library compatible patterns."

  • We should link to the default patterns as well.

@@ -71,20 +68,20 @@ public GrokPrepper(final PluginSetting pluginSetting) {
* @return Record modified output records
*/
@Override
public Collection<Record<String>> doExecute(Collection<Record<String>> records) {
public Collection<Record<String>> doExecute(final Collection<Record<String>> records) {
final List<Record<String>> recordsOut = new LinkedList<>();

for (Record<String> record : records) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: final

Signed-off-by: graytaylor0 <[email protected]>
@graytaylor0 graytaylor0 merged commit f82db2b into opensearch-project:main Oct 1, 2021
@graytaylor0 graytaylor0 deleted the GrokPrepper branch November 1, 2021 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Grok Prepper Basic Matching
4 participants