[RFC] Stateful Aggregation #699
For persisted windows, one scenario to consider is the window time when Data Prepper restarts. What should the window time be for existing groups? I see three possible scenarios:

1. Use the initial window start time.
2. Start the window duration over.
3. Attempt to track the remaining time per window before shutting down and use whatever time remains.

Option 1 will not work well for short windows, because the window may have expired during the restart; Data Prepper would immediately close the group, even if new messages are waiting. Option 2 may not work well if window durations are very large: if a window duration is 24 hours and Data Prepper restarts a few times, then windows will never close. Despite this possibility, I suggest that Data Prepper initially take approach 2. We shouldn't assume that Data Prepper will restart very often, even during long windows. We can also make this a future configuration option so that pipeline authors can determine the behavior. But, for simplicity, I think 2 is a good start. Option 3 would be a good balance, but is slightly more complicated since it requires updating the remaining time each time a group is processed. I think it is a good approach, but don't believe the added complexity is necessary for the first iteration.
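To make the contrast between options 2 and 3 concrete, here is a minimal, purely illustrative sketch; the persisted fields and method names are hypothetical:

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative only: how each option would compute a group's new deadline
// after a restart, given hypothetical persisted fields.
final class WindowRestart {
    // Option 2: start the window duration over from the restart time.
    static Instant restartedDeadline(Instant restartTime, Duration windowDuration) {
        return restartTime.plus(windowDuration);
    }

    // Option 3: persist the remaining time at shutdown and resume from it,
    // which requires updating the remaining time as groups are processed.
    static Instant resumedDeadline(Instant restartTime, Duration remainingAtShutdown) {
        return restartTime.plus(remainingAtShutdown);
    }
}
```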
Background and Current Design
Users of Data Prepper often want to aggregate data flowing through Data Prepper.
Two common examples are:

- Combining multiple events into a single event
- Removing duplicate events
These types of operations require more than one event over a period of time. For example, to combine four distinct events into one, Data Prepper needs to retain the first three events. When the fourth event arrives, then the data is combined and sent through the pipeline. Because Data Prepper must maintain previous events, this is stateful aggregation.
This RFC outlines a proposal for supporting stateful aggregation in Data Prepper.
Current Design
Data Prepper currently supports stateful aggregation only for Trace Analytics. Data Prepper can build an application service map using data from traces. There are two major components used for stateful aggregation in this scenario.
peer-forwarder
service-map-stateful
Data Prepper partitions stateful data in multi-node clusters by assigning each node a set of traces to process. Each node need only maintain the state for its set of data. The peer forwarder determines which node should handle a given trace and reroutes it to that node. It determines the dedicated node for a trace using consistent hashing and a hash ring.
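To make the hash-ring mechanics concrete, here is a minimal, illustrative sketch of consistent hashing in Java. The class and method names are hypothetical, not Data Prepper's actual implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Minimal hash ring: each node is placed on the ring at several points
// (virtual nodes); a trace is routed to the first node at or after its hash.
class HashRing {
    private final SortedMap<Long, String> ring = new TreeMap<>();

    HashRing(List<String> nodes, int virtualNodesPerNode) {
        for (String node : nodes) {
            for (int i = 0; i < virtualNodesPerNode; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    // Returns the node responsible for the given traceId.
    String nodeFor(String traceId) {
        long h = hash(traceId);
        SortedMap<Long, String> tail = ring.tailMap(h);
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```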
The current Peer Forwarder takes an `ExportTraceServiceRequest` and splits it into individual spans. It groups these spans by traceId and determines which nodes should operate on the traces. It then re-builds new `ExportTraceServiceRequest` objects from those traces and makes an HTTP request to the OTel Source on the destination node. Traces that are already on the correct node continue through the current pipeline.

The following diagram outlines the current approach for aggregating traces into a service map. (For simplicity, this diagram excludes the raw trace pipeline.) It shows the flow of trace data through a pipeline.
The current approach has limitations which prevent it from being used in situations other than trace analytics.
Data Prepper also has a service-map-stateful plugin which creates a service map from trace data. This plugin uses two windows to maintain state. There is a current window and a previous window. The plugin saves new state data in the current window and loads data from both current and previous. When the window duration ends, it replaces the previous window with the current and creates a fresh current window.
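As an illustration of the two-window mechanic described above, a minimal sketch; the names are hypothetical, not the plugin's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Two-window state: writes go to the current window; reads check both
// windows. When the window duration ends, current becomes previous and
// a fresh current window is created.
class TwoWindowStore<K, V> {
    private Map<K, V> current = new HashMap<>();
    private Map<K, V> previous = new HashMap<>();

    void put(K key, V value) {
        current.put(key, value);
    }

    V get(K key) {
        V value = current.get(key);
        return value != null ? value : previous.get(key);
    }

    // Called when the window duration elapses.
    void rotate() {
        previous = current;
        current = new HashMap<>();
    }
}
```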
Proposed Changes
Data Prepper will include a stateful aggregate processor. Data Prepper will also include peer forwarding as a core feature which the aggregate processor can use. Other plugins could also make use of this feature if they need it.
The following diagram outlines the flow of an Event through a pipeline with the Aggregate processor.
Peer Forwarder Design
The proposed design is to create a more general Peer Forwarder as part of Data Prepper Core. In this design, any plugin can request peer forwarding of events between Data Prepper nodes. The details of the peer forwarder are outlined in #700.
For this design, the aggregate plugin will use the new Peer Forwarder which Data Prepper will provide.
Aggregate Plugin
Data Prepper will have a new processor named `aggregate`. The processor will handle the common aspects of aggregation such as storing state. Because the aggregations will vary between pipelines, users need to configure the actual aggregation logic. For the first iteration, the Aggregate processor will use the plugin framework. Users can provide implementations which they can inject in the pipeline configuration file.

The following example shows how the Aggregate processor could work.
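The original example is missing from this copy; below is a plausible snippet built from the properties named in this RFC (`identification_keys`, `action`, `window_duration`). The exact schema is an assumption:

```yaml
processor:
  - aggregate:
      # Events with the same values for these keys form one group.
      identification_keys: ["sourceIp", "destinationIp"]
      # Pluggable aggregation logic supplied via the plugin framework.
      action:
        combine_events:
      # How long a group accumulates Events before it is concluded.
      window_duration: 30s
```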
Additionally, Data Prepper can include some default actions such as:

- Combining multiple Events into one
- Removing duplicate Events
User-Defined Aggregations
Some Data Prepper users will want their own custom aggregations. The action uses the plugin framework so that users can add custom actions. Users can write these plugins in Java and include them in their Data Prepper installations.
The following class diagram outlines the relevant classes.
The `AggregateProcessor` is the Data Prepper Processor which performs the bulk of the aggregation work. The `AggregateAction` interface is a pluggable type for performing the custom aggregation steps.

Explanation of operations: the `AggregateProcessor` maintains a `Map<Object, Object>` for each group. It persists the map between Events in the same group.

The following interface represents what is necessary for aggregation.
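The interface itself did not survive in this copy. Here is a plausible shape based on the operations described above; the method names follow this RFC, but the signatures and the reduced Event stand-in are assumptions:

```java
import java.util.Map;
import java.util.Optional;

// Stand-in for Data Prepper's Event model, reduced to a map of fields
// for the purposes of these sketches.
record Event(Map<String, Object> data) {}

// A plausible shape for the pluggable action; not the actual interface.
interface AggregateAction {
    // Called for each Event in a group. groupState is the Map<Object, Object>
    // which the AggregateProcessor persists between Events in the same group.
    // Return the Event to pass it downstream now, or empty to hold it.
    Optional<Event> handleEvent(Event event, Map<Object, Object> groupState);

    // Called when the group's window ends or its conclusion condition is met.
    // Returns the aggregated Event to send downstream, if any. The
    // AggregateProcessor clears the group state afterward.
    Optional<Event> concludeGroup(Map<Object, Object> groupState);
}
```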
The following sequence diagram outlines the interactions:
This proposed design moves much of the complexity into the `AggregateProcessor`. It expects that the `AggregateAction` implementations are as straightforward as possible.

An example implementation for combining Events is as follows:
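The original code block is missing; a sketch using the hypothetical interface and Event stand-in above:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Combines all Events in a group into one: each Event's fields are folded
// into the shared group state, and one combined Event is emitted only when
// the group concludes.
class CombineAction implements AggregateAction {
    @Override
    public Optional<Event> handleEvent(Event event, Map<Object, Object> groupState) {
        // Fold this Event's fields into the group state and hold the Event.
        groupState.putAll(event.data());
        return Optional.empty();
    }

    @Override
    public Optional<Event> concludeGroup(Map<Object, Object> groupState) {
        // Emit one combined Event containing every field seen in the group.
        Map<String, Object> combined = new HashMap<>();
        groupState.forEach((k, v) -> combined.put(String.valueOf(k), v));
        return Optional.of(new Event(combined));
    }
}
```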
An example implementation for filtering duplicates:
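Again a sketch under the same hypothetical interface, not the shipped implementation:

```java
import java.util.Map;
import java.util.Optional;

// Emits only the first Event seen for a group; later Events in the same
// group are treated as duplicates and dropped.
class RemoveDuplicatesAction implements AggregateAction {
    private static final String SEEN = "seen";

    @Override
    public Optional<Event> handleEvent(Event event, Map<Object, Object> groupState) {
        if (groupState.putIfAbsent(SEEN, Boolean.TRUE) != null) {
            return Optional.empty(); // duplicate of an Event already passed through
        }
        return Optional.of(event); // first Event for this group: pass it through
    }

    @Override
    public Optional<Event> concludeGroup(Map<Object, Object> groupState) {
        return Optional.empty(); // nothing is held back, so nothing to emit
    }
}
```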
Thread Synchronization
The AggregateProcessor must perform locking so that multiple processor threads can run concurrently. Each group's state map can have its own lock to prevent thread contention across all Events.
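A sketch of per-group locking; the real processor's locking strategy may differ:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// One lock per group: threads processing Events from different groups
// never contend; only Events within the same group serialize.
class GroupLocks {
    private final Map<Object, ReentrantLock> locks = new ConcurrentHashMap<>();

    void withGroupLock(Object groupKey, Runnable work) {
        ReentrantLock lock = locks.computeIfAbsent(groupKey, k -> new ReentrantLock());
        lock.lock();
        try {
            work.run();
        } finally {
            lock.unlock();
        }
    }
}
```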
Conclusion Conditions
Some Events have distinct ending conditions. In these cases, pipeline authors can configure a longer window and close the group early when the condition occurs. When the condition is true, the AggregateProcessor will call the `concludeGroup` action immediately. Additionally, the AggregateProcessor will clear the group state.

If the condition is not reached within the window, then the AggregateProcessor will call `concludeGroup` and clear the state when the window ends.

The conditions will use the same syntax as that proposed by Basic Conditional Logic in Preppers #522.
The following example shows the conclude_when property with an example of closing a network connection.
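The example is missing from this copy; a hypothetical snippet, with the condition written in the expression style proposed in #522 (the exact syntax and field names are assumptions):

```yaml
processor:
  - aggregate:
      identification_keys: ["sourceIp", "destinationIp", "port"]
      action:
        combine_events:
      # A long window, since most groups are expected to conclude early.
      window_duration: 10m
      # Close the group as soon as the connection-closing event arrives.
      conclude_when: '/tcp_flags == "FIN"'
```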
This approach can allow for the following when there is a conclusion condition:

- Pipeline authors can configure a larger `window_duration`, since most groups will conclude early when the condition occurs.
- Groups which never meet the condition within the `window_duration` will send multiple aggregates to the sink. There will be duplicates for these.

Peer Forwarder Integration
This section is based on the Peer Forwarder RFC as detailed in #700.
The aggregate processor will provide the `identification_keys` as the value for the `RequiresPeerForwarding::getCorrelationKeys` method. It should look somewhat like the following.
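The snippet itself is missing here; a sketch of what the implementation might look like, with the shape of `RequiresPeerForwarding` assumed from #700:

```java
import java.util.List;

// From the Peer Forwarder RFC (#700); shape assumed for this sketch.
interface RequiresPeerForwarding {
    List<String> getCorrelationKeys();
}

public class AggregateProcessor implements RequiresPeerForwarding {
    private final List<String> identificationKeys;

    public AggregateProcessor(List<String> identificationKeys) {
        this.identificationKeys = identificationKeys;
    }

    @Override
    public List<String> getCorrelationKeys() {
        // The keys which group Events are also the keys used to route
        // Events to the node that owns the group's state.
        return identificationKeys;
    }
}
```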
Alternatives and Questions
What might a Complete Configuration Look Like?
Here are two example files. One is the pipeline configuration. The second is the Data Prepper configuration file.
`pipelines.yaml`:
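The file contents are missing from this copy; a hypothetical reconstruction consistent with the processor properties discussed above (the source, sink, and all values are illustrative):

```yaml
log-aggregate-pipeline:
  source:
    http:
  processor:
    - aggregate:
        identification_keys: ["sourceIp", "destinationIp"]
        action:
          combine_events:
        window_duration: 30s
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index: aggregated-logs
```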
`data-prepper-config.yaml`:
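Likewise, a hypothetical `data-prepper-config.yaml` showing where core peer-forwarder settings could live; every key here is an assumption, not a shipped schema:

```yaml
ssl: true
peer_forwarder:
  # How this node discovers its peers; values are illustrative.
  discovery_mode: dns
  domain_name: data-prepper-cluster.example.com
  port: 4994
```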
How will the Existing Trace Plugins Change?
The Trace Analytics pipeline currently uses the `ExportTraceServiceRequest` for trace data moving through peer-forwarder and service-map-stateful. The pipeline must be updated such that the specialized work of splitting up the `ExportTraceServiceRequest` happens prior to peer forwarding and building the service map. Each Event in Data Prepper for traces should represent a single span rather than holding batches.

The current service map may be more complicated than the aggregate plugin supports. Refactoring service-map-stateful to use the aggregate plugin is beyond the scope of this RFC.
The service-map-stateful plugin will use the core Peer Forwarder by implementing the `RequiresPeerForwarding` interface.

AggregateAction in Pipeline
An alternate design would be to support defining the aggregate action as code within the pipeline definition. This could be supported by parsing the string as Groovy or Kotlin.

This is a feature which should be considered in the future if users of Data Prepper show much interest in it.

Users value having pre-defined aggregations so that they don't have to re-write similar code. If this is added later, it would complement the proposed design of having a pluggable AggregateAction.
Aggregation Persistence
This RFC only includes in-memory storage of the group state information. A future extension could allow the aggregate plugin to use a configurable store. This can help for groups whose windows last more than a few minutes. Some likely options are local disk, Redis, or DynamoDB. Additionally, Redis or DynamoDB would help rebalance stored group state in scenarios where nodes leave or enter the cluster.
Default Aggregations
Are there any aggregations which are so common that Data Prepper should make them available as part of the Aggregate plugin? This design includes deduplication and merging as possible candidates for defaults. Default implementations would be distributed along with the aggregate plugin.