Merge branch 'main' into tweak-compatiblity-chart
Signed-off-by: Naarcha-AWS <[email protected]>
Naarcha-AWS authored Mar 19, 2024
2 parents d3b5d96 + c83584b commit 88e2975
Showing 19 changed files with 240 additions and 25 deletions.
2 changes: 1 addition & 1 deletion _api-reference/document-apis/index-document.md
@@ -43,7 +43,7 @@ Parameter | Type | Description | Required
&lt;_id&gt; | String | A unique identifier to attach to the document. To automatically generate an ID, use `POST <target>/doc` in your request instead of PUT. | No
if_seq_no | Integer | Only perform the index operation if the document has the specified sequence number. | No
if_primary_term | Integer | Only perform the index operation if the document has the specified primary term.| No
op_type | Enum | Specifies the type of operation to complete with the document. Valid values are `create` (create the index if it doesn't exist) and `index`. If a document ID is included in the request, then the default is `index`. Otherwise, the default is `create`. | No
op_type | Enum | Specifies the type of operation to complete with the document. Valid values are `create` (index a document only if it doesn't exist) and `index`. If a document ID is included in the request, then the default is `index`. Otherwise, the default is `create`. | No
pipeline | String | Route the index operation to a certain pipeline. | No
routing | String | Value used to assign the index operation to a specific shard. | No
refresh | Enum | If true, OpenSearch refreshes shards to make the operation visible to searching. Valid options are `true`, `false`, and `wait_for`, which tells OpenSearch to wait for a refresh before executing the operation. Default is false. | No
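
The `op_type` rules in the table above can be sketched with a small in-memory model. This is an illustration only, not OpenSearch code: the `index_document` function, the dictionary used as a store, and the `DocumentExistsError` exception are hypothetical names chosen for this example.

```python
import uuid


class DocumentExistsError(Exception):
    """Hypothetical error raised when op_type=create targets an existing ID."""


def index_document(store, doc, doc_id=None, op_type=None):
    """Toy model of the op_type rules in the table; not the OpenSearch implementation."""
    if op_type is None:
        # Per the table: default is `index` when an ID is supplied, `create` otherwise.
        op_type = "index" if doc_id is not None else "create"
    if op_type not in ("create", "index"):
        raise ValueError(f"unsupported op_type: {op_type}")
    if doc_id is None:
        # Stand-in for an automatically generated ID (assumption for this sketch).
        doc_id = uuid.uuid4().hex
    if op_type == "create" and doc_id in store:
        # `create` indexes a document only if it does not already exist.
        raise DocumentExistsError(doc_id)
    result = "updated" if doc_id in store else "created"
    store[doc_id] = doc
    return doc_id, result
```

In this model, repeating a request with the same ID and the default `op_type` replaces the stored document, while passing `op_type="create"` for an existing ID raises an error instead.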
2 changes: 1 addition & 1 deletion _clients/OpenSearch-dot-net.md
@@ -400,7 +400,7 @@ internal class Program
FirstName = "Paulo",
LastName = "Santos",
Gpa = 3.93,
GradYear = 2021 };v
GradYear = 2021 };
var response = client.Index<StringResponse>("students", "100",
PostData.Serializable(student));
Console.WriteLine(response.Body);
3 changes: 3 additions & 0 deletions _clients/index.md
@@ -35,6 +35,9 @@ OpenSearch provides clients for the following programming languages and platform
* [OpenSearch .NET clients]({{site.url}}{{site.baseurl}}/clients/dot-net/)
* **Rust**
* [OpenSearch Rust client]({{site.url}}{{site.baseurl}}/clients/rust/)
* **Hadoop**
* [OpenSearch Hadoop client](https://github.com/opensearch-project/opensearch-hadoop)


For a client compatibility matrix, see the COMPATIBILITY.md file in the client's repository.
{: .note}
2 changes: 1 addition & 1 deletion _data-prepper/managing-data-prepper/core-apis.md
@@ -83,4 +83,4 @@ processorShutdownTimeout: "PT15M"
sinkShutdownTimeout: 30s
```

The values for these parameters are parsed into a `Duration` object through the [Data Prepper Duration Deserializer](https://github.com/opensearch-project/data-prepper/tree/main/data-prepper-core/src/main/java/org/opensearch/dataprepper/parser/DataPrepperDurationDeserializer.java).
The values for these parameters are parsed into a `Duration` object through the [Data Prepper Duration Deserializer](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-pipeline-parser/src/main/java/org/opensearch/dataprepper/pipeline/parser/DataPrepperDurationDeserializer.java).
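
As a rough illustration of how values such as `PT15M` and `30s` could be turned into durations, the following Python sketch parses two notations. It is not the Data Prepper deserializer: the `parse_duration` function is hypothetical, and the accepted formats (an ISO 8601 `PT…` subset with hours, minutes, and seconds, plus a simple `<number>s` or `<number>ms` form) are assumptions made for this example.

```python
import re
from datetime import timedelta

# Assumed formats for this sketch; the real deserializer may accept more or fewer.
_ISO = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")
_SIMPLE = re.compile(r"^(\d+)(ms|s)$")


def parse_duration(text):
    """Parse a duration string into a timedelta (illustrative only)."""
    text = text.strip()
    simple = _SIMPLE.match(text)
    if simple:
        amount, unit = int(simple.group(1)), simple.group(2)
        if unit == "ms":
            return timedelta(milliseconds=amount)
        return timedelta(seconds=amount)
    iso = _ISO.match(text)
    if iso and any(iso.groups()):
        hours, minutes, seconds = iso.groups()
        return timedelta(
            hours=int(hours or 0),
            minutes=int(minutes or 0),
            seconds=float(seconds or 0),
        )
    raise ValueError(f"unrecognized duration: {text!r}")
```

With this sketch, the two values from the configuration above map to 15 minutes and 30 seconds, respectively.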
@@ -60,5 +60,5 @@ And then you run the `add_entries` processor using the example pipeline, it adds
{"message": "hello", "newMessage": 3}
```

> If `newMessage` already exists, its existing value is overwritten with a value of `3`.
If `newMessage` already exists, its existing value is overwritten with a value of `3`.

@@ -19,7 +19,7 @@ You can configure the `copy_values` processor with the following options.
| `entries` | Yes | A list of entries to be copied in an event. |
| `from_key` | Yes | The key of the entry to be copied. |
| `to_key` | Yes | The key of the new entry to be added. |
| `overwrite_if_key_exists` | No | When set to `true`, the existing value is overwritten if `key` already exists in the event. The default value is `false`. |
| `overwrite_if_to_key_exists` | No | When set to `true`, the existing value is overwritten if `to_key` already exists in the event. The default value is `false`. |
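
The effect of these options can be modeled in a few lines of Python. This is a sketch of the behavior described in the table, not Data Prepper code: the `copy_values` function is hypothetical, events are represented as plain dictionaries, and skipping an entry whose source key is missing is an assumption.

```python
def copy_values(event, entries):
    """Toy model of the `copy_values` options; not the Data Prepper implementation."""
    for entry in entries:
        from_key = entry["from_key"]
        to_key = entry["to_key"]
        overwrite = entry.get("overwrite_if_to_key_exists", False)
        if from_key not in event:
            continue  # assumption: nothing to copy when the source key is absent
        if to_key in event and not overwrite:
            continue  # default (`false`): keep the existing destination value
        event[to_key] = event[from_key]
    return event
```

For example, copying `message` to an existing `newMessage` key leaves `newMessage` untouched unless `overwrite_if_to_key_exists` is set to `true` for that entry.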

## Usage

@@ -8,7 +8,7 @@ nav_order: 75

# otel_trace

The `otel_trace` processor completes trace-group-related fields in all incoming Data Prepper span records by state caching the root span information for each `tradeId`.
The `otel_trace` processor completes trace-group-related fields in all incoming Data Prepper span records by state caching the root span information for each `traceId`.

## Parameters

@@ -41,4 +41,4 @@ The following table describes common [Abstract processor](https://github.com/ope
The `otel_trace` processor includes the following custom metrics:

* `traceGroupCacheCount`: The number of trace groups in the trace group cache.
* `spanSetCount`: The number of span sets in the span set collection.
* `spanSetCount`: The number of span sets in the span set collection.
178 changes: 178 additions & 0 deletions _ingest-pipelines/processors/community_id.md
@@ -0,0 +1,178 @@
---
layout: default
title: Community ID
parent: Ingest processors
nav_order: 55
---

# Community ID processor

The `community_id` processor is used to generate the community ID flow hash for network flow tuples. The community ID flow hash algorithm is defined in the [community ID specification](https://github.com/corelight/community-id-spec). The processor-generated hash value can be used to correlate all related network events so that you can filter the network flow data by the hash value or generate statistics by aggregating on the hash field. The processor supports the TCP, UDP, SCTP, ICMP, and IPv6-ICMP network protocols. The SHA-1 hash algorithm is used to generate the hash value.
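
To make the hashing step more concrete, the following Python sketch computes a version 1 community ID for port-based protocols (TCP, UDP, and SCTP), based on a reading of the community ID specification linked above. It is not the OpenSearch implementation: the `community_id_v1` function name is hypothetical, and ICMP handling, which maps message types and codes to port-like values, is deliberately left out.

```python
import base64
import hashlib
import ipaddress
import struct


def community_id_v1(src_ip, src_port, dst_ip, dst_port, protocol, seed=0):
    """Sketch of the Community ID v1 hash for TCP, UDP, and SCTP flows."""
    src = ipaddress.ip_address(src_ip).packed
    dst = ipaddress.ip_address(dst_ip).packed
    # Order the endpoints so that both directions of a flow hash identically.
    if (src, src_port) > (dst, dst_port):
        src, dst = dst, src
        src_port, dst_port = dst_port, src_port
    payload = (
        struct.pack("!H", seed)                   # 16-bit seed, network byte order
        + src                                     # packed address of the first endpoint
        + dst                                     # packed address of the second endpoint
        + struct.pack("!BB", protocol, 0)         # protocol number and one padding byte
        + struct.pack("!HH", src_port, dst_port)  # 16-bit ports, network byte order
    )
    digest = hashlib.sha1(payload).digest()
    # The "1:" prefix identifies version 1 of the hash format.
    return "1:" + base64.b64encode(digest).decode("ascii")
```

Because the endpoints are ordered before hashing, calling the function with the source and destination swapped returns the same string, which is what allows both directions of a connection to be correlated by a single value.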

The following is the `community_id` processor syntax:

```json
{
"community_id": {
"source_ip_field": "source_ip",
"source_port_field": "source_port",
"destination_ip_field": "destination_ip",
"destination_port_field": "destination_port",
"iana_protocol_number_field": "iana_protocol_number",
"target_field": "community_id"
}
}
```
{% include copy-curl.html %}

## Configuration parameters

The following table lists the required and optional parameters for the `community_id` processor.

Parameter | Required/Optional | Description |
|-----------|-----------|-----------|
`source_ip_field` | Required | The name of the field containing the source IP address. |
`source_port_field` | Optional | The name of the field containing the source port address. If the network protocol is TCP, UDP, or SCTP, then the field is required. Otherwise, it is not required.|
`destination_ip_field` | Required | The name of the field containing the destination IP address. |
`destination_port_field` | Optional | The name of the field containing the destination port address. If the network protocol is TCP, UDP, or SCTP, then the field is required. Otherwise, it is not required. |
`iana_protocol_number_field` | Optional | The name of the field containing the protocol number defined by the Internet Assigned Numbers Authority (IANA). The supported values are 1 (ICMP), 6 (TCP), 17 (UDP), 58 (IPv6-ICMP), and 132 (SCTP). |
`protocol_field` | Optional | The name of the field containing the protocol name. If `iana_protocol_number_field` is not set, then the field is required. Otherwise, it is not required. |
`icmp_type_field` | Optional | The name of the field containing the ICMP message type. Required when the protocol is ICMP or IPv6-ICMP. |
`icmp_code_field` | Optional | The name of the field containing the ICMP message code. For certain ICMP message types that do not have a code, the field is optional. Otherwise, it is required. |
`seed` | Optional | The seed for generating the community ID hash. The value must be between 0 and 65535. |
`target_field` | Optional | The name of the field in which to store the community ID hash value. Default target field is `community_id`. |
`ignore_missing` | Optional | Specifies whether the processor should exit quietly if one of the required fields is missing. Default is `false`. |
`description` | Optional | A brief description of the processor. |
`if` | Optional | A condition for running the processor. |
`ignore_failure` | Optional | If set to `true`, then failures are ignored. Default is `false`. |
`on_failure` | Optional | A list of processors to run if the processor fails. |
`tag` | Optional | An identifier tag for the processor. Useful for debugging in order to distinguish between processors of the same type. |
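
The conditional requirements in the table can be summarized as a small check. The following Python sketch is illustrative only: the `missing_fields` function is hypothetical, it assumes the document field names used in the examples on this page (`source_ip`, `source_port`, and so on) plus a hypothetical `icmp_type` field, and it only handles the case in which the protocol is given as an IANA number.

```python
PORT_PROTOCOLS = {6, 17, 132}  # TCP, UDP, SCTP
ICMP_PROTOCOLS = {1, 58}       # ICMP, IPv6-ICMP


def missing_fields(doc):
    """Return the fields that the table marks as required but that `doc` lacks."""
    required = ["source_ip", "destination_ip"]
    protocol = doc.get("iana_protocol_number")
    if protocol in PORT_PROTOCOLS:
        # Ports are required for TCP, UDP, and SCTP.
        required += ["source_port", "destination_port"]
    elif protocol in ICMP_PROTOCOLS:
        # The ICMP message type is required for ICMP and IPv6-ICMP.
        required += ["icmp_type"]
    return [name for name in required if name not in doc]
```

An empty result means that the document carries every field the table lists as required for its protocol. With `ignore_missing` left at `false`, a non-empty result corresponds to the case in which the processor does not exit quietly.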

## Using the processor

Follow these steps to use the processor in a pipeline.

**Step 1: Create a pipeline**

The following query creates a pipeline named `community_id_pipeline` that uses the `community_id` processor to generate a hash value for the network flow tuple:

```json
PUT /_ingest/pipeline/commnity_id_pipeline
{
"description": "generate hash value for the network flow tuple",
"processors": [
{
"community_id": {
"source_ip_field": "source_ip",
"source_port_field": "source_port",
"destination_ip_field": "destination_ip",
"destination_port_field": "destination_port",
"iana_protocol_number_field": "iana_protocol_number",
"target_field": "community_id"
}
}
]
}
```
{% include copy-curl.html %}

**Step 2 (Optional): Test the pipeline**

It is recommended that you test your pipeline before ingesting documents.
{: .tip}

To test the pipeline, run the following query:

```json
POST _ingest/pipeline/commnity_id_pipeline/_simulate
{
"docs": [
{
"_index": "testindex1",
"_id": "1",
"_source": {
"source_ip": "66.35.250.204",
"source_port": 80,
"destination_ip": "128.232.110.120",
"destination_port": 34855,
"iana_protocol_number": 6
}
}
]
}
```
{% include copy-curl.html %}

#### Response

The following example response confirms that the pipeline is working as expected:

```json
{
"docs": [
{
"doc": {
"_index": "testindex1",
"_id": "1",
"_source": {
"community_id": "1:LQU9qZlK+B5F3KDmev6m5PMibrg=",
"destination_ip": "128.232.110.120",
"destination_port": 34855,
"source_port": 80,
"iana_protocol_number": 6,
"source_ip": "66.35.250.204"
},
"_ingest": {
"timestamp": "2024-03-11T02:17:22.329823Z"
}
}
}
]
}
```

**Step 3: Ingest a document**

The following query ingests a document into an index named `testindex1`:

```json
PUT testindex1/_doc/1?pipeline=commnity_id_pipeline
{
"source_ip": "66.35.250.204",
"source_port": 80,
"destination_ip": "128.232.110.120",
"destination_port": 34855,
"iana_protocol_number": 6
}
```
{% include copy-curl.html %}

#### Response

The request indexes the document into the `testindex1` index:

```json
{
"_index": "testindex1",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
```

**Step 4 (Optional): Retrieve the document**

To retrieve the document, run the following query:

```json
GET testindex1/_doc/1
```
{% include copy-curl.html %}
2 changes: 1 addition & 1 deletion _ingest-pipelines/processors/convert.md
@@ -32,7 +32,7 @@ The following table lists the required and optional parameters for the `convert`
Parameter | Required/Optional | Description |
|-----------|-----------|-----------|
`field` | Required | The name of the field containing the data to be converted. Supports [template snippets]({{site.url}}{{site.baseurl}}/ingest-pipelines/create-ingest/#template-snippets). |
`type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, `ip`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `true` (ignoring case) and to `false` if the field value is a string `false` (ignoring case). If the value is not one of the allowed values, an error will occur. |
`type` | Required | The type to convert the field value to. The supported types are `integer`, `long`, `float`, `double`, `string`, `boolean`, and `auto`. If the `type` is `boolean`, the value is set to `true` if the field value is a string `true` (ignoring case) and to `false` if the field value is a string `false` (ignoring case). If the value is not one of the allowed values, an error will occur. |
`description` | Optional | A brief description of the processor. |
`if` | Optional | A condition for running the processor. |
`ignore_failure` | Optional | Specifies whether the processor continues execution even if it encounters errors. If set to `true`, failures are ignored. Default is `false`. |
2 changes: 2 additions & 0 deletions _ingest-pipelines/processors/index-processors.md
@@ -30,6 +30,7 @@ Processor type | Description
:--- | :---
`append` | Adds one or more values to a field in a document.
`bytes` | Converts a human-readable byte value to its value in bytes.
`community_id` | Generates a community ID flow hash for network flow tuples.
`convert` | Changes the data type of a field in a document.
`copy` | Copies an entire object in an existing field to another field.
`csv` | Extracts CSVs and stores them as individual fields in a document.
@@ -52,6 +53,7 @@ Processor type | Description
`lowercase` | Converts text in a specific field to lowercase letters.
`pipeline` | Runs an inner pipeline.
`remove` | Removes fields from a document.
`remove_by_pattern` | Removes fields from a document by field pattern.
`script` | Runs an inline or stored script on incoming documents.
`set` | Sets the value of a field to a specified value.
`sort` | Sorts the elements of an array in ascending or descending order.
2 changes: 1 addition & 1 deletion _install-and-configure/install-dashboards/plugins.md
@@ -9,7 +9,7 @@ redirect_from:

# Managing OpenSearch Dashboards plugins

OpenSearch Dashboards provides a command line tool called `opensearch-plugin` for managing plugins. This tool allows you to:
OpenSearch Dashboards provides a command line tool called `opensearch-dashboards-plugin` for managing plugins. This tool allows you to:

- List installed plugins.
- Install plugins.
2 changes: 2 additions & 0 deletions _install-and-configure/install-opensearch/index.md
@@ -120,3 +120,5 @@ Property | Description
`opensearch.xcontent.fast_double_writer=[true|false]` | By default, OpenSearch serializes floating-point numbers using the default implementation provided by the Java Runtime Environment. Set this value to `true` to use the Schubfach algorithm, which is faster but may lead to small differences in precision. Default is `false`. |
`opensearch.xcontent.name.length.max=<value>` | By default, OpenSearch does not impose any limits on the maximum length of the JSON/YAML/CBOR/Smile field names. To protect your cluster against potential DDoS or memory issues, you can set the `opensearch.xcontent.name.length.max` system property to a reasonable limit (the maximum is 2,147,483,647), for example, `-Dopensearch.xcontent.name.length.max=50000`. |
`opensearch.xcontent.depth.max=<value>` | By default, OpenSearch does not impose any limits on the maximum nesting depth for JSON/YAML/CBOR/Smile documents. To protect your cluster against potential DDoS or memory issues, you can set the `opensearch.xcontent.depth.max` system property to a reasonable limit (the maximum is 2,147,483,647), for example, `-Dopensearch.xcontent.depth.max=1000`. |
`opensearch.xcontent.codepoint.max=<value>` | By default, OpenSearch limits the maximum size of YAML documents to `52428800` code points. To protect your cluster against potential DDoS or memory issues, you can set the `opensearch.xcontent.codepoint.max` system property to a reasonable limit (the maximum is 2,147,483,647), for example, `-Dopensearch.xcontent.codepoint.max=5000000`. |
