Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load AggregateAction, implement AggregateAction.handleEvent in AggregateProcessor, add GroupStateManager and AggregateIdentificationKeysHasher classes for use in AggregateProcessor #931

Merged
merged 3 commits into from
Jan 25, 2022

Conversation

graytaylor0
Copy link
Member

@graytaylor0 graytaylor0 commented Jan 20, 2022

Signed-off-by: Taylor Gray [email protected]

Description

  • Loads the configured AggregateAction to the AggregateProcessor
  • Implements the calling of AggregateAction.handleEvent in the AggregateProcessor
  • Creates a GroupStateManager class that is used in AggregateProcessor. This class handles retrieving the requested groupState from allGroupStates, and creates a new empty groupState if the requested groupState doesn't exist.
  • Creates an AggregateIdentifcationKeysHasher class that is used in AggregateProcessor. This class handles creating the identificationKeysHash from an Event. This identificationKeysHash will act as the key that uniquely identifies a groupState.

Issues Resolved

Closes #871

Check List

  • New functionality includes testing.
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ateProcessor, add GroupStateManager and AggregateIdentificationKeysHasher classes for use in AggregateProcessor

Signed-off-by: Taylor Gray <[email protected]>
@graytaylor0 graytaylor0 requested a review from a team as a code owner January 20, 2022 23:33
Copy link
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is looking good. I made a few comments that I'd like to see changed and few suggestions that I leave up to you.

I added GitHub tasks for the ones I'd really like done. I think it will be interesting to see if they help track when the work is ready for another review. (Plus it will help clarify which comments I'm requesting changes for).


}

Map<Object, Object> createIdentificationKeyHashFromEvent(final Event event, final List<String> identificationKeys) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The identificationKeys never changes for any given AggregateProcessor. You could make identificationKeys a constructor parameter in AggregateIdentificationKeysHasher. Then you can remove identificationKeys as a parameter from this function.


final AggregateActionResponse handleEventResponse = aggregateAction.handleEvent(event, groupStateForEvent);

final Optional<Event> aggregateActionResponseEvent = Optional.ofNullable(handleEventResponse.getEvent());
Copy link
Member

@dlvenable dlvenable Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is much value in wrapping this into an Optional just to use ifPresent. I think using an if is better here:

Event responseEvent = handleEventResponse.getEvent()
if(responseEvent != null)
  recordsOut.add(new Record<>(responseEvent, record.getMetadata()));
  • Remove unnecessary Optional

allGroupStates.put(identificationKeysHash, groupState);
}

Map<Object, Map<Object, Object>> getAllGroupStates() {
Copy link
Member

@dlvenable dlvenable Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to remove this. I'll add notes on an approach to testing in the test code.

  • Remove getAllGroupStates()

}

@Test
void getGroupState_with_non_existing_group_state_creates_and_returns_new_group_state_and_adds_to_allGroupStates() {
Copy link
Member

@dlvenable dlvenable Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better test would be to test the desired interfaces. Clients of this class do not care that it is available to getAllGroupStates() per se. What they really care about is that if the group state is created, it will be available on a subsequent call to getGroupState().

I'd probably testing this by calling getGroupState() twice and verifying that they are the same instance.

Starting at line 52:

final Map<Object, Object> emptyGroupState = groupStateManager.getGroupState(identificationKeysHash);
assertThat(emptyGroupState, equalTo(Collections.emptyMap()));

final Map<Object, Map<Object, Object>> secondGroupState = groupStateManager.getGroupState(identificationKeysHash);
assertThat(secondGroupState, is(sameInstance(emptyGroupState)));

import static org.hamcrest.CoreMatchers.equalTo;
import static org.hamcrest.Matchers.hasKey;

public class GroupStateManagerTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be good to have a test case that verifies that the Map returned by getGroupState is mutable. Clients do expect to write to this map. And it is an important part of the contract.

@Mock
private AggregateActionResponse aggregateActionResponse;

private AggregateProcessor aggregateProcessor;
Copy link
Member

@dlvenable dlvenable Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this field variable. Using local variables in tests is more maintainable.

  • Remove unnecessary testing field for object under test.


@Test
void handleEvent_returing_with_no_event_does_not_add_event_to_records_out() {
aggregateProcessor = createObjectUnderTest();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to name my test variable objectUnderTest. It's not necessary, but it helps with readability because you can quickly know that this is what we are actually testing.

}

@Test
void handleEvent_returing_with_no_event_does_not_add_event_to_records_out() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your test cases here.


}

Map<Object, Object> getGroupState(final Map<Object, Object> identificationKeysHash) {
Copy link
Member

@dlvenable dlvenable Jan 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm of the opinion that the caller doesn't really care that this is a Map<Object, Object>. I can accept the PR without this change, but I'd probably create a new class to abstract this.

public static class IdentificationHash {
  private final Map<Object, Object> hash;

  @Override
  public equals(Object other) {
    return this.hash.equals(other);
  }

  @Override
  public int hashCode() {
    return hash.hashCode();
  }
}

Then:

IdentificationHash createIdentificationKeyHashFromEvent(...) {
  ...
  return new IdentificationHash(identificationKeysHash);
}

And:

Map<Object, Object> getGroupState(final IdentificationHash identificationHash) {
  ...
}

You could change how you hash without affecting clients.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it would be nested under AggregateIdentificationKeysHasher. It could also be a normal class and un-nested. Either way is fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of abstracting out IdentificationHash, but I don't believe getGroupState should return an IdentificationKeyHash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Soon I plan on creating a GroupState class that will be returned in getGroupState

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, that was a copy-and-paste mistake on my part because my comment is here on getGroupState(). The createIdentificationKeyHashFromEvent method would return an IdentificationHash. And then you pass it into getGroupState.

I'll update my original comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@graytaylor0 ,

Soon I plan on creating a GroupState class that will be returned in getGroupState

I very much like the idea here. But, I think the name should be something else. I'd like to keep "group state" as the name of the map which AggregateProcessor provides to AggregateActions.

I'd like to use a different term for the object which holds all of this information. Right now, GroupMetadata comes to mind. e.g.

public class GroupMetadata {
  private Map<Object, Object> groupState;
  private Instant windowStart;
}

Copy link
Member Author

@graytaylor0 graytaylor0 Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the Map isn't really Metadata. I kind of like the name GroupState here actually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the Map isn't Metadata. I believe that we should continue to call the Map the "group state."

What I'm trying to avoid is calling two different things "group state." Let's call the Map "group state" and any class which holds the "group state" some other name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. What if we just name the class AggregateGroup?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlvenable

I am strongly for a group state class.

@cmanning09 , The discussion here is about is an "Identification Hash" class, not the group state.

I should have quoted the line I was referring to:

Soon I plan on creating a GroupState class that will be returned in getGroupState

@dlvenable
Copy link
Member

dlvenable commented Jan 21, 2022

Sadly, the tasks in comments don't show up in the task count. :(

GitHub reports that all tasks are done.

Comment on lines 34 to 36
void setGroupStateForIdentificationKeysHash(final Map<Object, Object> identificationKeysHash, final Map<Object, Object> groupState) {
allGroupStates.put(identificationKeysHash, groupState);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this? I only see it used in test functions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think this can go away. The test which uses it feels a little redundant to me, as there is another that calls getGroupState to create a new GroupState, and then calls getGroupState again to check that it retrieves the created GroupState


}

Map<Object, Object> getGroupState(final Map<Object, Object> identificationKeysHash) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am strongly for a group state class. It's very difficult to read the code as is with all the Map<Object, Object> types. I like the suggestion.

Copy link
Member

@dlvenable dlvenable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! These are great changes!


}

public Map<Object, Object> getGroupState() {
Copy link
Contributor

@cmanning09 cmanning09 Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried we have created a class for the groupState but are still exposing the underlying data structure and therefore coupling the data to the processor and action.

@dlvenable & @graytaylor0 , have we thought about providing methods / interface to alter the group state without exposing the underlying dats structure?

(I don't want to block this PR but I think this is good conversation to have given our community will be interacting with the state as they write aggregate actions.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a valid point. I have thought about it, and I just don't think we want to restrict the AggregateAction creators like that. It's going to be more work on their part, as instead of getting a java Map that they likely already understand and know how to interact with, they will get some Java class that we've made, that they will have to both read about to understand how to use, and may also find themselves limited in the control that they have over it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely don't want to give them control of the AggregateGroup class, so the option is to extract the groupState map into a GroupState class that wraps the Map. However, this doesn't seem to add much value as we would just be adding functions for put, get, putAll, etc. that just call the actual Map functions, and there is nothing that I am aware of that we would really want to restrict the AggregateAction creator from calling on the Map. As far as validation goes, everything put into the Map will be turned into an Event, so the validation should happen there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a one way door decision that will couple the group state to the map interface. Maybe not completely one way door but it would require a breaking change to extend group state beyond the map interface. I just want to clarify, I do not want to limit the creator of an Aggregate Action from updating the group state as the desire. My biggest concern in raising this point is extensibility.

I don't want to debate the design here and I think others across the group may be interested in this conversation. I will raise an issue for this.

@graytaylor0 graytaylor0 merged commit f7cdbd7 into opensearch-project:main Jan 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Load aggregate action and implement the calling of handleEvent in Aggregate Processor for Aggregate Actions
3 participants