Add Peer Forwarder to doc website repo #2373

Merged
merged 49 commits into from
Feb 3, 2023
Changes from 39 commits
30586d3
Add Peer forwarder to doc site repo.
carolxob Nov 16, 2022
73fab9f
Minor updates.
carolxob Nov 16, 2022
f8feef6
Minor updates to header section.
carolxob Nov 16, 2022
24d4a5e
Minor copyedits and heading adjustements.
carolxob Nov 21, 2022
7fb9b33
Minor copyedits and heading updates. Additional of Optional section t…
carolxob Nov 21, 2022
e709ce9
Minor copyedits and heading adjustments.
carolxob Nov 22, 2022
74d1ed3
Changed capitalization in title.
carolxob Dec 9, 2022
5f3fb82
Minor changes from doc review feedback.
carolxob Dec 9, 2022
f32c31b
Update _data-prepper/peer-forwarder.md
carolxob Dec 9, 2022
01c4fb0
Update _data-prepper/peer-forwarder.md
carolxob Dec 9, 2022
d4b1fae
Update _data-prepper/peer-forwarder.md
carolxob Dec 9, 2022
73cc07e
Update _data-prepper/peer-forwarder.md
carolxob Dec 9, 2022
c9c3727
Update _data-prepper/peer-forwarder.md
carolxob Dec 9, 2022
db822ee
Update _data-prepper/peer-forwarder.md
carolxob Dec 9, 2022
491db10
Trying to commit file.
carolxob Dec 9, 2022
a877610
Trying to push file.
carolxob Dec 9, 2022
ab8edae
Trying to push file again.
carolxob Dec 9, 2022
f83dffd
Trying to push file one more time after rebasing.
carolxob Dec 9, 2022
0ee7850
Minor change.
carolxob Dec 9, 2022
2cb064d
Minor edits.
carolxob Dec 9, 2022
0609ee4
Converted optional configuration section to table.
carolxob Dec 9, 2022
2d1cf02
Minor adjustmenets.
carolxob Dec 9, 2022
cb5134a
Minor adjustments again.
carolxob Dec 9, 2022
0b4a055
Updates from doc review feedback.
carolxob Dec 12, 2022
850625f
Made changes based on doc review feedback.
carolxob Dec 12, 2022
588da3f
Made minor heading adjustements.
carolxob Dec 13, 2022
81e08ed
Made edits based on doc review feedback.
carolxob Dec 13, 2022
9f5313e
Made updates based on editorial feedback.
carolxob Dec 15, 2022
156ddc7
Made extensive changes based on editorial feedback.
carolxob Dec 16, 2022
5c640ca
Incorporated minor editorial feedback changes.
carolxob Dec 20, 2022
a0eb2ec
Incorporated more editorial feedback.
carolxob Jan 3, 2023
069ef71
Minor changes.
carolxob Jan 4, 2023
f4cc653
Committed existing Peer Forwarder file to new PR to simplify feedback.
carolxob Jan 10, 2023
ccf12e7
Incorporated doc review feedback.
carolxob Jan 19, 2023
2cbdd7b
Minor updates.
carolxob Jan 19, 2023
05e7a06
Minor changes.
carolxob Jan 23, 2023
ae8070f
Updates specifically to table formatting.
carolxob Jan 24, 2023
f1b4f1f
Minor updates.
carolxob Jan 24, 2023
b1dc95d
Made minor edit to the end of the Configuration table.
carolxob Jan 27, 2023
5ae8f0c
Minor edits from doc review feedback.
carolxob Jan 31, 2023
30d51e5
Minor updates from doc review.
carolxob Feb 1, 2023
320cc45
Incorporated more editorial feedback.
carolxob Feb 1, 2023
0a4fa3b
Minor update to line 112.
carolxob Feb 1, 2023
e85fec4
Made minor edits to references to S3. Changed to Amazon S3 where appl…
carolxob Feb 1, 2023
b3a5b03
Minor edits.
carolxob Feb 1, 2023
8aab515
Minor updates to text.
carolxob Feb 2, 2023
d949d1f
Made updates based on SME and doc review feedback.
carolxob Feb 2, 2023
ccc1ab1
Minor change to one word.
carolxob Feb 3, 2023
f165005
Removed a comma.
carolxob Feb 3, 2023
181 changes: 181 additions & 0 deletions _data-prepper/peer-forwarder.md
@@ -0,0 +1,181 @@
---
layout: default
title: Peer Forwarder
nav_order: 12
---

# Peer Forwarder

Peer Forwarder is an HTTP service that performs peer forwarding of an `event` between Data Prepper nodes for aggregation. This HTTP service uses a hash-ring approach to aggregate events and determine which Data Prepper node should handle a given trace before rerouting it to that node. Currently, Peer Forwarder is supported by the `aggregate`, `service_map_stateful`, and `otel_trace_raw` processors.

Peer Forwarder groups events based on the identification keys provided by the processors. For `service_map_stateful` and `otel_trace_raw`, the identification key is `traceId` by default and cannot be configured. For the `aggregate` processor, use the `identification_keys` configuration option to specify which keys Peer Forwarder uses. See the [Aggregate Processor page](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-plugins/aggregate-processor#identification_keys) for more information about identification keys.

Peer discovery allows Data Prepper to find other nodes that it will communicate with. Currently, peer discovery is provided by a static list, a DNS record lookup, or AWS Cloud Map.

## Discovery modes

The following sections provide information about discovery modes.

### Static

Static discovery mode allows a Data Prepper node to discover nodes using a list of IP addresses or domain names. See the following YAML file for an example of static discovery mode:

```yaml
peer_forwarder:
  discovery_mode: static
  static_endpoints: ["data-prepper1", "data-prepper2"]
```
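
Because static discovery accepts IP addresses as well as domain names, the same configuration could also be sketched with addresses. The addresses below are placeholders from the reserved documentation range and are assumptions for illustration only, not values from a real cluster:

```yaml
peer_forwarder:
  discovery_mode: static
  # Placeholder IP addresses (assumed for illustration); replace with your own node addresses.
  static_endpoints: ["192.0.2.10", "192.0.2.11"]
```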

### DNS lookup

DNS discovery is preferred over static discovery when scaling out a Data Prepper cluster. DNS discovery configures a DNS provider to return a list of Data Prepper hosts when given a single domain name. This list is made up of [DNS A records](https://www.cloudflare.com/learning/dns/dns-records/dns-a-record/), which together provide the IP addresses for a given domain. See the following YAML file for an example of DNS lookup:

```yaml
peer_forwarder:
  discovery_mode: dns
  domain_name: "data-prepper-cluster.my-domain.net"
```

### AWS Cloud Map

[AWS Cloud Map](https://docs.aws.amazon.com/cloud-map/latest/dg/what-is-cloud-map.html) provides API-based service discovery as well as DNS-based service discovery.

Peer Forwarder can use the API-based service discovery. To support this, you must have an existing namespace configured for API instance discovery. You can create a new one by following the instructions provided by the [AWS Cloud Map documentation](https://docs.aws.amazon.com/cloud-map/latest/dg/working-with-namespaces.html).

Your Data Prepper configuration needs to include the following:
* `aws_cloud_map_namespace_name` – Set to your AWS Cloud Map namespace name.
* `aws_cloud_map_service_name` – Set to the service name within your specified namespace.
* `aws_region` – Set to the AWS Region where your namespace exists.
* `discovery_mode` – Set to `aws_cloud_map`.

Your Data Prepper configuration can optionally include the following:
* `aws_cloud_map_query_parameters` – Key-value pairs are used to filter the results based on the custom attributes attached to an instance. Results include only those instances that match all of the specified key-value pairs.

#### Example configuration

See the following YAML file for an example of AWS Cloud Map configuration:

```yaml
peer_forwarder:
  discovery_mode: aws_cloud_map
  aws_cloud_map_namespace_name: "my-namespace"
  aws_cloud_map_service_name: "data-prepper-cluster"
  aws_cloud_map_query_parameters:
    instance_type: "r5.xlarge"
  aws_region: "us-east-1"
```

### IAM policy with necessary permissions

Data Prepper must also be running with the necessary permissions. The following AWS Identity and Access Management (IAM) policy shows the necessary permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudMapPeerForwarder",
      "Effect": "Allow",
      "Action": "servicediscovery:DiscoverInstances",
      "Resource": "*"
    }
  ]
}
```


## Configuration

The following table provides optional configuration values.


| Value | Type | Description |
| ---- | --- | ----------- |
| `port` | Integer | Represents the port that the Peer Forwarder server runs on. Valid values are between 0 and 65535. Default value is `4994`. |
| `request_timeout` | Integer | Represents the request timeout duration in milliseconds for the Peer Forwarder HTTP server. Default value is `10000`. |
| `server_thread_count` | Integer | Represents the number of threads used by the Peer Forwarder server. Default value is `200`.|
| `client_thread_count` | Integer | Represents the number of threads used by the Peer Forwarder client. Default value is `200`.|
| `max_connection_count` | Integer | Represents the maximum number of open connections for the Peer Forwarder server. Default value is `500`. |
| `discovery_mode` | String | Represents the peer discovery mode to be used. Allowable values are `local_node`, `static`, `dns`, and `aws_cloud_map`. Defaults to `local_node`, which processes events locally. |
| `static_endpoints` | List | Contains the endpoints of all Data Prepper instances. Required if `discovery_mode` is set to `static`. |
| `domain_name` | String | Represents the single domain name to query DNS against. Typically used by creating multiple [DNS A records](https://www.cloudflare.com/learning/dns/dns-records/dns-a-record/) for the same domain. Required if `discovery_mode` is set to `dns`. |
| `aws_cloud_map_namespace_name` | String | Represents the AWS Cloud Map namespace when using AWS Cloud Map service discovery. Required if `discovery_mode` is set to `aws_cloud_map`. |
| `aws_cloud_map_service_name` | String | Represents the AWS Cloud Map service when using AWS Cloud Map service discovery. Required if `discovery_mode` is set to `aws_cloud_map`. |
| `aws_cloud_map_query_parameters` | Map | Key-value pairs used to filter the results based on the custom attributes attached to an instance. Only instances that match all the specified key-value pairs are returned. |
| `buffer_size` | Integer | Represents the maximum number of unchecked records the buffer accepts (the number of unchecked records equals the number of records written into the buffer plus the number of records that are still processing and not yet checked by the Checkpointing API). Default is `512`. |
| `batch_size` | Integer | Represents the maximum number of records the buffer returns on read. Default is `48`. |
| `aws_region` | String | Represents the AWS Region to use with ACM, Amazon S3, or AWS Cloud Map. Required when any of the following conditions are met:<br> - The `use_acm_certificate_for_ssl` setting is `true`. <br> - Either `ssl_certificate_file` or `ssl_key_file` specifies an Amazon S3 URI (for example, `s3://mybucket/path/to/public.cert`).<br> - The `discovery_mode` setting is `aws_cloud_map`. |
| `drain_timeout` | Duration | Represents the amount of time that Peer Forwarder waits to finish processing data before shutting down. |
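
As a minimal sketch of how several of these options fit together, the following example combines DNS discovery with a few tuning values. The numeric values simply restate the defaults from the table, and the domain name is the same placeholder used in the DNS lookup example, so treat this as an illustration rather than a recommended configuration:

```yaml
peer_forwarder:
  discovery_mode: dns
  domain_name: "data-prepper-cluster.my-domain.net"  # placeholder domain
  port: 4994                 # default port
  request_timeout: 10000     # milliseconds
  server_thread_count: 200
  client_thread_count: 200
  buffer_size: 512           # maximum number of unchecked records
  batch_size: 48             # maximum number of records returned on read
```

Any option omitted from the configuration falls back to the default listed in the table.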

## SSL configuration

The following table provides optional SSL configuration values, which allow you to set up a trust manager for the peer forwarding client to connect to other Data Prepper instances.

| Value | Type | Description |
| ----- | ---- | ----------- |
| `ssl` | Boolean | Enables TLS/SSL. Default value is `true`. |
| `ssl_certificate_file`| String | Represents the SSL certificate chain file path or Amazon Simple Storage Service (Amazon S3) path. The following is an example of an Amazon S3 path: `s3://<bucketName>/<path>`. Defaults to the default certificate file, `config/default_certificate.pem`. See [Default Certificates](https://github.com/opensearch-project/data-prepper/tree/main/examples/certificates) for more information about how the certificate is generated. |
| `ssl_key_file`| String | Represents the SSL key file path or Amazon S3 path. The following is an example of an Amazon S3 path: `s3://<bucketName>/<path>`. Defaults to the default private key file, `config/default_private_key.pem`. See [Default Certificates](https://github.com/opensearch-project/data-prepper/tree/main/examples/certificates) for more information about how the private key file is generated. |
| `ssl_insecure_disable_verification` | Boolean | Disables the verification of the server's TLS certificate chain. Default value is `false`. |
| `ssl_fingerprint_verification_only` | Boolean | Disables the verification of the server's TLS certificate chain and instead verifies only the certificate fingerprint. Default value is `false`. |
| `use_acm_certificate_for_ssl` | Boolean | Enables TLS/SSL using the certificate and private key from AWS Certificate Manager (ACM). Default value is `false`. |
| `acm_certificate_arn`| String | Represents the ACM certificate Amazon Resource Name (ARN). The ACM certificate takes precedence over Amazon S3 or the local file system certificate. Required if `use_acm_certificate_for_ssl` is set to `true`. |
| `acm_private_key_password` | String | Represents the ACM private key password that will be used to decrypt the private key. If it's not provided, a random password will be generated. |
| `acm_certificate_timeout_millis` | Integer | Represents the timeout in milliseconds for ACM to get certificates. Default value is `120000`. |
| `aws_region` | String | Represents the AWS Region to use with ACM, Amazon S3, or AWS Cloud Map. Required if `use_acm_certificate_for_ssl` is set to `true`, if `ssl_certificate_file` or `ssl_key_file` is set to an Amazon S3 path, or if `discovery_mode` is set to `aws_cloud_map`. |

#### Example configuration

The following YAML file provides an example configuration:

```yaml
peer_forwarder:
  ssl: true
  ssl_certificate_file: "<cert-file-path>"
  ssl_key_file: "<private-key-file-path>"
```
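
For comparison, a configuration that takes the certificate from ACM instead of local files might look like the following sketch. It uses only options from the SSL configuration table; the ARN placeholder and the Region value are assumptions for illustration:

```yaml
peer_forwarder:
  ssl: true
  use_acm_certificate_for_ssl: true
  # Placeholder (assumed); required when use_acm_certificate_for_ssl is true.
  acm_certificate_arn: "<acm-certificate-arn>"
  # Example Region (assumed); aws_region is required when ACM is used.
  aws_region: "us-east-1"
```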

## Authentication

The `authentication` option is optional and is a `Map` that enables mutual TLS (mTLS). It can be either `mutual_tls` or `unauthenticated`. The default value is `unauthenticated`. The following YAML file provides an example of authentication:

```yaml
peer_forwarder:
  authentication:
    mutual_tls:
```
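
Mutual TLS presumably relies on the certificate settings described in the SSL configuration section, so a fuller configuration might combine the two. The following is a sketch under that assumption, with placeholder file paths:

```yaml
peer_forwarder:
  ssl: true
  ssl_certificate_file: "<cert-file-path>"      # placeholder path
  ssl_key_file: "<private-key-file-path>"       # placeholder path
  authentication:
    mutual_tls:
```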

## Metrics

Core Peer Forwarder introduces the following custom metrics. All the metrics are prefixed by `core.peerForwarder`.

### Timer

Peer Forwarder's timer capability provides the following information:

- `requestForwardingLatency`: Measures latency of requests forwarded by the Peer Forwarder client.
- `requestProcessingLatency`: Measures latency of requests processed by the Peer Forwarder server.

### Counter

The following table provides counter metric options.

| Value | Description |
| ----- | ----------- |
| `requests`| Measures the total number of forwarded requests. |
| `requestsFailed`| Measures the total number of failed requests. Applies to requests with an HTTP response code other than `200`. |
| `requestsSuccessful`| Measures the total number of successful requests. Applies to requests with HTTP response code `200`. |
| `requestsTooLarge`| Measures the total number of requests that are too large to be written to the Peer Forwarder buffer. Applies to requests with HTTP response code `413`. |
| `requestTimeouts`| Measures the total number of requests that time out while writing content to the Peer Forwarder buffer. Applies to requests with HTTP response code `408`. |
| `requestsUnprocessable`| Measures the total number of requests that fail due to an unprocessable entity. Applies to requests with HTTP response code `422`. |
| `badRequests`| Measures the total number of requests with a bad request format. Applies to requests with HTTP response code `400`. |
| `recordsSuccessfullyForwarded`| Measures the total number of successfully forwarded records. |
| `recordsFailedForwarding`| Measures the total number of records that failed to be forwarded. |
| `recordsToBeForwarded` | Measures the total number of records to be forwarded. |
| `recordsToBeProcessedLocally` | Measures the total number of records to be processed locally. |
| `recordsActuallyProcessedLocally`| Measures the total number of records actually processed locally. This value is the sum of `recordsToBeProcessedLocally` and `recordsFailedForwarding`. |
| `recordsReceivedFromPeers`| Measures the total number of records received from remote peers. |

### Gauge

`peerEndpoints`: Measures the number of dynamically discovered peer Data Prepper endpoints. For `static` mode, the size is fixed.