Skip to content

Commit

Permalink
Added documentation for core peer forwarder (#1797)
Browse files Browse the repository at this point in the history
* Added documentation for core peer forwarder

Signed-off-by: Asif Sohail Mohammed <[email protected]>
  • Loading branch information
asifsmohammed authored Sep 26, 2022
1 parent e036dca commit 63307d9
Show file tree
Hide file tree
Showing 2 changed files with 125 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/core_apis.md
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,10 @@ authentication:
unauthenticated:
```
### Peer Forwarder
Peer forwarder can be configured to enable stateful aggregation across multiple Data Prepper nodes. For more information on configuring Peer Forwarder, see [Peer Forwarder Configuration](https://github.com/opensearch-project/data-prepper/blob/main/docs/peer_forwarder.md).
It is supported by `service_map_stateful`, `otel_trace_raw` and `aggregate` processors.

### Shutdown Timeouts
When the DataPrepper `shutdown` API is invoked, the sink and processor `ExecutorService`'s are given time to gracefully shutdown and clear any in-flight data. The default graceful shutdown timeout for these `ExecutorService`'s is 10 seconds. This can be configured with the following optional parameters:

Expand Down
121 changes: 121 additions & 0 deletions docs/peer_forwarder.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@

## Peer Forwarder
An HTTP service which performs peer forwarding of `Event` between Data Prepper nodes for aggregation. Currently, supported by `aggregate`, `service_map_stateful`, `otel_trace_raw` processors.

Peer Forwarder groups events based on the identification keys provided the processors.
For `service_map_stateful` and `otel_trace_raw` it's `traceId` by default and can not be configured.
It's configurable for `aggregate` processor using `identification_keys` configuration option. You can find more information about identification keys [here](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#identification_keys).

---

Presently peer discovery is provided by either a static list or by a DNS record lookup or AWS Cloudmap.

### Static discovery mode
Static discover mode allows Data Prepper node to discover nodes using a list of IP addresses or domain names.
```yaml
peer_forwarder:
discovery_mode: static
static_endpoints: ["data-prepper1", "data-prepper2"]
```
### DNS lookup discovery mode
We recommend using DNS discovery over static discovery when scaling out a Data Prepper cluster. The core concept is to configure a DNS provider to return a list of Data Prepper hosts when given a single domain name.
This is a [DNS A Record](https://www.cloudflare.com/learning/dns/dns-records/dns-a-record/) which indicates a list of IP addresses of a given domain.
```yaml
peer_forwarder:
discovery_mode: dns
domain_name: "data-prepper-cluster.my-domain.net"
```
### AWS Cloud Map discovery mode
[AWS Cloud Map](https://docs.aws.amazon.com/cloud-map/latest/dg/what-is-cloud-map.html) provides API-based service discovery as well as DNS-based service discovery.
Peer forwarder can use the API-based service discovery. To support this you must have an existing
namespace configured for API instance discovery. You can create a new one following the instructions
provided by the [Cloud Map documentation](https://docs.aws.amazon.com/cloud-map/latest/dg/working-with-namespaces.html).
Your Data Prepper configuration needs to include:
* `aws_cloud_map_namespace_name` - Set to your Cloud Map Namespace name
* `aws_cloud_map_service_name` - Set to the service name within your specified Namespace
* `aws_region` - The AWS region where your namespace exists.
* `discovery_mode` - Set to `aws_cloud_map`

Your Data Prepper configuration can optionally include:
* `aws_cloud_map_query_parameters` - Key/value pairs to filter the results based on the custom attributes attached to an instance. Only instances that match all the specified key-value pairs are returned.

Example configuration:

```yaml
peer_forwarder:
discovery_mode: aws_cloud_map
aws_cloud_map_namespace_name: "my-namespace"
aws_cloud_map_service_name: "data-prepper-cluster"
aws_cloud_map_query_parameters:
instance_type: "r5.xlarge"
aws_region: "us-east-1"
```

The Data Prepper must also be running with the necessary permissions. The following
IAM policy shows the necessary permissions.

```json
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "CloudMapPeerForwarder",
"Effect": "Allow",
"Action": "servicediscovery:DiscoverInstances",
"Resource": "*"
}
]
}
```
---
## Configuration

* `port`(Optional): An `int` between 0 and 65535 represents the port peer forwarder server is running on. Default value is `21892`.
* `request_timeout`(Optional): Duration - An `int` representing the request timeout in milliseconds for Peer Forwarder HTTP server. Default value is `10000`.
* `server_thread_count`(Optional): An `int` representing number of threads used by Peer Forwarder server. Defaults to `200`.
* `client_thread_count`(Optional): An `int` representing number of threads used by Peer Forwarder client. Defaults to `200`.
* `maxConnectionCount`(Optional): An `int` representing maximum number of open connections for Peer Forwarder server. Default value is `500`.
* `discovery_mode`(Optional): A `String` representing the peer discovery mode to be used. Allowable values are `local_node`, `static`, `dns`, and `aws_cloud_map`. Defaults to `local_node` which processes events locally.
* `static_endpoints`(Optional): A `list` containing endpoints of all Data Prepper instances. Required if `discovery_mode` is set to `static`.
* `domain_name`(Optional): A `String` representing single domain name to query DNS against. Typically, used by creating multiple [DNS A Records](https://www.cloudflare.com/learning/dns/dns-records/dns-a-record/) for the same domain. Required if `discovery_mode` is set to `dns`.
* `aws_cloud_map_namespace_name`(Optional) - A `String` representing the Cloud Map namespace when using AWS Cloud Map service discovery. Required if `discovery_mode` is set to `aws_cloud_map`.
* `aws_cloud_map_service_name`(Optional) - A `String` representing the Cloud Map service when using AWS Cloud Map service discovery. Required if `discovery_mode` is set to `aws_cloud_map`.
* `aws_cloud_map_query_parameters`(Optional): A `Map` of Key/value pairs to filter the results based on the custom attributes attached to an instance. Only instances that match all the specified key-value pairs are returned.
* `buffer_size`(Optional): An `int` representing max number of unchecked records the buffer accepts (num of unchecked records = num of records written into the buffer + num of in-flight records not yet checked by the Checkpointing API). Default is `512`.
* `batch_size`(Optional): An `int` representing max number of records the buffer returns on read. Default is `48`.
* `aws_region`(Optional) : A `String` represents the AWS region to use `ACM`, `S3` or `AWS Cloud Map`. Required if `use_acm_certificate_for_ssl` is set to `true` or `ssl_certificate_file` and `ssl_key_file` is `AWS S3` path or if `discovery_mode` is set to `aws_cloud_map`.

### SSL
The SSL configuration for setting up trust manager for peer forwarding client to connect to other Data Prepper instances.

* `ssl`(Optional) : A `boolean` that enables TLS/SSL. Default value is `false`.
* `ssl_certificate_file`(Optional) : A `String` representing the SSL certificate chain file path or AWS S3 path. S3 path example `s3://<bucketName>/<path>`. Required if `ssl` is set to `true`.
* `ssl_key_file`(Optional) : A `String` represents the SSL key file path or AWS S3 path. S3 path example `s3://<bucketName>/<path>`. Required if `ssl` is set to `true`.
* `ssl_insecure_disable_verification`(Optional) : A `boolean` that disables the verification of server's TLS certificate chain. Default value is `false`.
* `use_acm_certificate_for_ssl`(Optional) : A `boolean` that enables TLS/SSL using certificate and private key from AWS Certificate Manager (ACM). Default is `false`.
* `acm_certificate_arn`(Optional) : A `String` represents the ACM certificate ARN. ACM certificate take preference over S3 or local file system certificate. Required if `use_acm_certificate_for_ssl` is set to `true`.
* `acm_certificate_timeout_millis`(Optional) : An `int` representing the timeout in milliseconds for ACM to get certificates. Default value is `120000`.
* `aws_region`(Optional) : A `String` represents the AWS region to use `ACM`, `S3` or `AWS Cloud Map`. Required if `use_acm_certificate_for_ssl` is set to `true` or `ssl_certificate_file` and `ssl_key_file` is `AWS S3` path or if `discovery_mode` is set to `aws_cloud_map`.

```yaml
peer_forwarder:
ssl: true
ssl_certificate_file: "<cert-file-path>"
ssl_key_file: "<private-key-file-path>"
```

### Authentication

* `authentication`(Optional) : A `Map` that enables mTLS. It can either be `mutual_tls` or `unauthenticated`. Default value is `unauthenticated`.
```yaml
peer_forwarder:
authentication:
mutual_tls:
```


0 comments on commit 63307d9

Please sign in to comment.