Thread Synchronization with `concludeGroup` and resetting of an AggregateGroup
#938
Comments
Can you explain why you lock and then immediately unlock in the second code snippet, please?
Sure! This is to keep threads from handling new Events after a thread says that it wants to conclude a group. After the
I am missing a few things from this design. Can you highlight the following in your proposal:
These locks are meant to be mutex-like, and the Java

For your other questions, I think the supplied pseudo-code answers all of them. It shows how each lock is going to be used. I only put a comment in place of what actually happens in each critical section, but I feel that this is enough to portray what happens there. The code shown is also what I believe will keep deadlocks and race conditions from occurring. Synchronized code like this allows many execution orderings, and if any scenario results in the stated requirements not being met, it should be called out.

Edit: After further reading on the fairness option for Locks, I think it makes the most sense not to use it at the moment. It has the potential to lower throughput, and starvation is unlikely with the current design, because the AggregateGroups that Events belong to will vary widely. If a need arises to handle Events more closely in the order in which they arrive and try to grab the Lock, this could become a configurable option in the
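To make the fairness trade-off concrete, here is a minimal sketch of how such an option might look. It assumes the locks are `java.util.concurrent.locks.ReentrantLock` instances (the thread does not confirm which `Lock` implementation is used), and `GroupLockFactory` and its `fair` parameter are hypothetical names invented for illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical factory; neither the class nor the flag exists in the proposal.
class GroupLockFactory {
    // fair == true asks ReentrantLock to favor the longest-waiting thread,
    // which can lower throughput; fair == false matches the no-argument
    // ReentrantLock constructor.
    static ReentrantLock newGroupLock(boolean fair) {
        return new ReentrantLock(fair);
    }
}
```

With this shape, leaving the flag at `false` would keep the behavior described in the edit above, and turning it on would be a one-line configuration change.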
The RFC for Stateful Aggregation outlines two functions for an `AggregateAction` to perform, `concludeGroup` and `handleEvent`. When `doExecute` is run by a thread in the `AggregateProcessor`, it will get all of the groups that should be concluded from the `AggregateGroupManager`, and it will try to conclude all of them before moving on to handle the batch of Events.

The Aggregate Processor will need to support multiple worker threads, and a single instance of AggregateProcessor will contain state (groupState) that is shared between worker threads. The threading synchronization has the following requirements:

- After a thread signals that it wants to `concludeGroup`, no threads should be allowed to start handling events. This is achieved in the pseudo-code below using a Turnstile synchronization pattern, where the `concludeGroupLock` is locked and immediately unlocked before trying to handle an Event.
- `concludeGroup` should wait for in-process events to be handled before concluding.

In order to meet all of these requirements, each `AggregateGroup` will contain two Locks, which will be called `concludeGroupLock` and `handleEventForGroupLock`. The following pseudocode shows the synchronization between handling events and concluding groups.

Additional Information
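For reference, the two-lock turnstile described in the requirements can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the code from the proposal: only the lock names `concludeGroupLock` and `handleEventForGroupLock` come from the issue, while the `AggregateGroupSketch` class, the use of `ReentrantLock`, the `String` event type, and the list standing in for groupState are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for an AggregateGroup.
class AggregateGroupSketch {
    private final Lock concludeGroupLock = new ReentrantLock();
    private final Lock handleEventForGroupLock = new ReentrantLock();
    // Placeholder for the shared groupState.
    private final List<String> groupState = new ArrayList<>();

    void handleEvent(String event) {
        // Turnstile: this blocks while another thread holds concludeGroupLock
        // (i.e. is concluding the group), then releases it right away so
        // other handlers can pass through as well.
        concludeGroupLock.lock();
        concludeGroupLock.unlock();

        handleEventForGroupLock.lock();
        try {
            groupState.add(event); // critical section: update the group state
        } finally {
            handleEventForGroupLock.unlock();
        }
    }

    // Returns the events collected so far and resets the group state.
    List<String> concludeGroup() {
        // Closing the turnstile stops new handlers from starting.
        concludeGroupLock.lock();
        try {
            // Waits for a handler that is currently inside its critical section.
            handleEventForGroupLock.lock();
            try {
                List<String> concluded = new ArrayList<>(groupState);
                groupState.clear(); // reset for the next window
                return concluded;
            } finally {
                handleEventForGroupLock.unlock();
            }
        } finally {
            concludeGroupLock.unlock();
        }
    }
}
```

In this sketch a handler never holds `handleEventForGroupLock` while waiting for `concludeGroupLock`, so the two paths cannot deadlock each other. One ordering worth checking against the requirements: a handler that has passed the turnstile but not yet acquired `handleEventForGroupLock` will apply its event after the reset, so that event lands in the next window.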
The `AggregateProcessor` will not use the `@SingleThread` annotation (which makes a single instance per worker thread), and will refrain from using static variables so as not to share `AggregateGroups` between instances. Because of this, multiple `AggregateProcessors` can be utilized in one pipeline, even if for some reason the `identificationKeys` are the same for both aggregate instances.

Alternative Ideas for Threading
Locking on a single AggregateGroup should not result in performance issues: the Events being processed at any one time will likely be spread across many different AggregateGroups, and a group is concluded only once per window duration. However, if performance does prove to be an issue, some alternatives could be considered to improve it.
A no-lock AggregateProcessor would involve assigning each worker thread to a different section of the shared state. Before handling an event or concluding a group, the current thread would have to look up which thread is assigned the AggregateGroup that it has, and would have to forward that AggregateGroup to the assigned thread. This would take some extensive design and, while possible, may not even improve performance. There is overhead in forwarding the AggregateGroup, and the assigned thread may not be ready to handle the AggregateGroup for a while (think of the scenario where it is in the processor following the AggregateProcessor, and it has to go all the way back to the beginning of the pipeline and come back to the AggregateProcessor before being able to process the AggregateGroup).
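The lookup step of this alternative could be sketched as below. This is only one possible way to assign groups to threads, chosen here for illustration: the `GroupOwnership` helper, its method names, and the hash-modulo assignment are assumptions and do not appear in the proposal, and the forwarding mechanism itself (the part that would need the extensive design) is left out.

```java
// Hypothetical helper for deciding which worker thread owns a group.
final class GroupOwnership {
    private GroupOwnership() {}

    // Maps a group key (for example, the values of the identification keys)
    // to a worker index in [0, workerThreads). floorMod keeps the result
    // non-negative even when hashCode() is negative.
    static int ownerThreadIndex(Object groupKey, int workerThreads) {
        return Math.floorMod(groupKey.hashCode(), workerThreads);
    }

    // True when the current worker does not own the group and would have to
    // hand it off to the owner instead of touching the state itself.
    static boolean mustForward(Object groupKey, int currentThreadIndex, int workerThreads) {
        return ownerThreadIndex(groupKey, workerThreads) != currentThreadIndex;
    }
}
```

Because every thread computes the same owner for a given key, only the owner would ever mutate that group's state, which is what removes the need for locks. The cost is the hand-off latency described above.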