[RFC] Core Peer Forwarding #700
Comments
The original RFC provides authentication using HTTP Basic. I'd like to suggest that this change to mTLS. We expect that each peer in a Data Prepper cluster is configured to use the same SSL certificate to provide encryption and verification of the server. Building on top of this, each server could also use the same SSL certificate to determine if it trusts the client. So the default behavior for core peer-forwarding would be to use a single certificate and private key. This single certificate/key pair would be used for normal SSL verification on the client and also allow the server to authenticate the clients. A possible configuration might look like the following:
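A notional sketch of such a configuration (the option names are illustrative, not final syntax):

```yaml
peer_forwarder:
  ssl: true
  authentication:
    mutual_tls:
      # One certificate/key pair serves both server-side TLS and
      # client authentication across the whole cluster.
      ssl_certificate_file: /usr/share/data-prepper/peer.crt
      ssl_key_file: /usr/share/data-prepper/peer.key
```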
Additionally, a Data Prepper administrator could disable authentication using a configuration along the lines of the following.
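For example (again, option names are illustrative):

```yaml
peer_forwarder:
  authentication:
    unauthenticated:
```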
(The exact syntax might change, but this should at least convey the basic concept.) In future versions of Data Prepper, we could permit other authentication schemes. But I propose that this be the initial solution.
Core Peer Forwarder is implemented in Data Prepper 2.0.
Background
The background for this change is explained in #699.
Proposal
Data Prepper will include peer forwarding as a core feature which any plugin can use. The aggregate plugin defined in #699 will use this new feature.
Design
The proposed design is to create a more general Peer Forwarder as part of Data Prepper Core. In this design, any plugin can request peer forwarding of events between Data Prepper nodes. Peer Forwarder takes Events, groups these by the plugin-defined correlation values, and then sends them to the correct Data Prepper node. It continues to use the existing hash ring approach for determining the destination.
The following diagram shows the flow of Events with the proposed Peer Forwarder.
Peer Forwarder Configuration
The user will configure Peer Forwarder in the existing data-prepper-config.yaml file. Below is a snippet depicting how a user can configure peer-forwarding and what options are available. For brevity, the example does not show all the existing configurations related to peer discovery.

This design allows for one peer-forwarder in Data Prepper. See the Alternatives and Questions below for a discussion on supporting multiple peer-forwarders.
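A notional snippet (the keys shown are illustrative; actual option names may differ):

```yaml
peer_forwarder:
  port: 4994
  discovery_mode: dns
  domain_name: "data-prepper-cluster.example.com"
  ssl: true
  # ... existing peer-discovery and TLS options elided for brevity
```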
Service Discovery Configuration
The core Peer Forwarder will use the existing service discovery options. Presently, peers can be discovered via:
- a static list of IP addresses or domain names
- DNS lookup
- AWS Cloud Map
Security Configuration
The peer-forwarder will support authentication and TLS. For TLS encryption, peer-forwarder can utilize the work which is planned for unifying certificate loading #364.
For authentication, peer-forwarder can use the same mechanism for securing its endpoint as was provided in #464. Additionally, it will need a new concept for authenticating requests when it is the client. This could be based on the server's authentication configuration so that the username and password need not be repeated.
Here is a possible secured configuration.
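A notional secured configuration combining TLS encryption with mTLS client authentication (option names are illustrative):

```yaml
peer_forwarder:
  port: 4994
  ssl: true
  ssl_certificate_file: /usr/share/data-prepper/peer.crt
  ssl_key_file: /usr/share/data-prepper/peer.key
  authentication:
    mutual_tls:
```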
Peer Forwarder Communication
Peer Forwarder will send batches of Event objects. It will send them over HTTP/2 to a user-configurable port.
The model for communication is loosely defined as:
Each event is a string. It is the serialized JSON for that event.
The Peer Forwarder also specifies the destination plugin. It must do this so that multiple aggregate plugins can use one shared peer-forwarder.
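A notional request body for one batch (field names are illustrative, not a defined protocol):

```json
{
  "destination_plugin": "aggregate",
  "events": [
    "{\"traceId\":\"abc123\",\"spanId\":\"1\"}",
    "{\"traceId\":\"abc123\",\"spanId\":\"2\"}"
  ]
}
```

Each entry in events is the serialized JSON string for one Event; destination_plugin tells the receiving node which plugin the batch belongs to.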
Peer Forwarder Implementation
The peer forwarder will continue to use consistent hashing and a hash ring to determine the destination node. One significant implementation change is that it will now support multiple keys for determining the hash. Peer Forwarder will perform this by appending the values together into a single string or byte array value.
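A sketch of the multi-key approach described above: the values for each correlation key are appended into one string, which is then hashed for placement on the hash ring. The class name, delimiter, and hash function here are illustrative, not Data Prepper's actual implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: combine multiple correlation-key values into a
// single value before hashing, as the RFC proposes.
class CorrelationHash {
    static String combinedKey(Map<String, String> event, List<String> keys) {
        StringBuilder builder = new StringBuilder();
        for (String key : keys) {
            // A delimiter keeps ("ab","c") distinct from ("a","bc").
            builder.append(event.get(key)).append('\u001F');
        }
        return builder.toString();
    }

    static int hashForRing(Map<String, String> event, List<String> keys) {
        // Stand-in for the consistent-hash function used by the hash ring.
        byte[] bytes = combinedKey(event, keys).getBytes(StandardCharsets.UTF_8);
        int hash = 0;
        for (byte b : bytes) {
            hash = 31 * hash + b;
        }
        return hash;
    }
}
```

Two events with identical values for every correlation key always produce the same combined key, so they land on the same node regardless of their other fields.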
Peer Forwarder Plugins
Plugins requiring peer-forwarding must implement the following interface. Data Prepper will detect plugins which implement this interface and configure the peer-forwarder for that plugin.
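A notional sketch of such an interface, using the getCorrelationKeys() method named later in this RFC. The interface name and the example processor are hypothetical; the shipped interface may differ.

```java
import java.util.List;

// Hypothetical interface a plugin implements to opt in to peer forwarding.
interface RequiresPeerForwarding {
    // Names of the Event fields whose values determine the destination node.
    List<String> getCorrelationKeys();
}

// Hypothetical Trace Analytics processor: all spans sharing a traceId
// are routed to the same Data Prepper node.
class ServiceMapProcessor implements RequiresPeerForwarding {
    @Override
    public List<String> getCorrelationKeys() {
        return List.of("traceId");
    }
}
```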
Data Prepper will wrap the plugin with a peer-forwarder. With this, plugins will not need to write code to route to peer-forwarder or receive from peer-forwarder. The Data Prepper pipeline will resolve the peer-forwarding.
The plugin only needs to implement the getCorrelationKeys() method. The plugin will return a list of key names which the peer-forwarder will use to determine the node. For example, in Trace Analytics, the correlation key could be the trace ID.

Alternatives and Questions
How will the Peer Forwarder Migrate?
This proposal is to refactor the current peer-forwarder plugin to support generic peer forwarding. Until the next major release (2.0), it must remain as a plugin and should be left unchanged.
What Plugin Types can use Peer Forwarding?
The initial implementation will allow peer-forwarding only on Processor plugins. If you need a Source or Sink to peer-forward, please create a new GitHub issue to expand the functionality.
Multiple Peer Forwarders
Data Prepper could support multiple peer forwarders. Users would assign names so that different aggregate plugins could specify which to use. Below is a small example.
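A small notional example (syntax is illustrative):

```yaml
peer_forwarders:
  trace_forwarder:
    port: 4994
  log_forwarder:
    port: 4995
```

A plugin's configuration would then reference the peer-forwarder it uses by name.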
This could be confusing for users and there may not be a need for it. If you know of a specific use-case that would require this, please comment and explain in the issue.
Distinct Plugins
This RFC proposes core support for peer-forwarding and is based on #699. One alternative I considered is keeping peer-forwarder as a distinct plugin which must run prior to the aggregate plugin.
Here is a notional pipeline definition (the details are left out for brevity).
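A notional pipeline for this alternative, with peer_forwarder configured as a separate processor placed before aggregate (plugin names and options are illustrative):

```yaml
aggregate-pipeline:
  source:
    otel_trace_source:
  processor:
    - peer_forwarder:
        correlation_keys: ["traceId"]
    - aggregate:
  sink:
    - opensearch:
```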
Pros to proposed solution:
Pros to alternative solution:
Peer Forwarder as Processor and Source
Another solution would be to create a Peer Forwarder Source and a Peer Forwarder Processor. In this approach, a pipeline author must configure the pipeline to have both the source and processor.
Here is a notional pipeline definition (the details are left out for brevity).
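A simplified notional pipeline for this alternative: the processor forwards events out to peers, while the source receives events forwarded in from peers (plugin names are illustrative; the pipeline's original data source is omitted here for brevity):

```yaml
aggregate-pipeline:
  source:
    peer_forwarder_source:
  processor:
    - peer_forwarder_processor:
        correlation_keys: ["traceId"]
    - aggregate:
  sink:
    - opensearch:
```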
Pros to the proposed solution:
Pros to the alternative solution:
Peer Forwarding gRPC
The Peer Forwarder can use gRPC for communication instead of raw HTTP. This may not be necessary since Peer Forwarder can use HTTP/2 and binary messages. However, the protocol must not change within a major version since this would make two Data Preppers of the same major version incompatible with each other.
Tasks