Getting started data prepper #1480

Closed
wants to merge 31 commits into from
Changes from 5 commits
Commits
31 commits
e896071
Added cluster permissions to cluster permissions list.
carolxob Oct 3, 2022
12edb4a
Revert "Added cluster permissions to cluster permissions list."
carolxob Oct 3, 2022
d118fd6
Merged data prepper git started content
carolxob Oct 7, 2022
0269b9e
Added review feedback.
carolxob Oct 10, 2022
1efda4e
Apply suggestions from code review
carolxob Oct 10, 2022
12ae356
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
b7cd816
Apply suggestions from code review
carolxob Nov 8, 2022
ed3d70c
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
73a3a7b
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
9706254
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
fbf9115
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
b116941
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
6d2d027
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
c49be0e
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
69bd81c
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
bb43259
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
25ecd54
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
8c39d43
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
1053971
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
4bab8e5
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
32b8951
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
5b8a566
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
b82397e
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
7839247
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
67115b5
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
597ed55
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
1039837
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
42b4bb2
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
60fa7e1
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
a72627a
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
3f8fd64
Update _clients/data-prepper/get-started.md
carolxob Nov 8, 2022
86 changes: 81 additions & 5 deletions _clients/data-prepper/get-started.md
@@ -9,15 +9,42 @@ nav_order: 1

Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages.

-## 1. Install Data Prepper
If you are migrating from Open Distro Data Prepper, visit the [Migrating from Open Distro]({{site.url}}{{site.baseurl}}/opensearch/clients/data-prepper/migrate-open-distro/) page.

-To use the Docker image, pull it like any other image:
## 1. Installing Data Prepper

-```bash
There are two ways to install and run Data Prepper:

1. Run the Docker image
2. Build from source

Member

It is possible to install from tar.gz now. (Our repo is out-of-date on this).

I'm not sure we want to have "Build from source" in the opensearch.org documentation. It is mostly useful in certain situations (bleeding-edge, contributions).

Contributor Author

@dlvenable Do you have a link to the tar.gz file?

Member

The tar.gz changes with every build. I think the best solution is to point the customer to the download page: https://opensearch.org/downloads.html#data-prepper

That will have the latest version available.

Contributor Author

@Naarcha-AWS FYI, there is some new information here.

Contributor Author

@dlvenable Noted - I am going to make a separate issue to address the tar.gz installation instructions.

The easiest way to use Data Prepper is by running the Docker image. We suggest
you use this approach if you have [Docker](https://www.docker.com) available.

You can pull the Docker image:

```
docker pull opensearchproject/data-prepper:latest
```

-## 2. Define a pipeline
If you need to build from source, or if you want to contribute, see the [Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md).

## 2. Configuring Data Prepper

You must configure Data Prepper with a pipeline before running it.

You will configure two files:

* `data-prepper-config.yaml`
* `pipelines.yaml`
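
Before starting the container, it can help to confirm that both files are present in the directory you plan to mount from. The following is a minimal sketch and is not part of Data Prepper; the function name `check_config_files` is hypothetical, and it assumes both files sit in the same directory.

```shell
# Hypothetical helper (not shipped with Data Prepper): verify that the two
# configuration files named above exist in a given directory.
check_config_files() {
  dir="${1:-.}"   # directory to check; defaults to the current directory
  missing=0
  for f in data-prepper-config.yaml pipelines.yaml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  # Exit status 0 when both files exist, 1 otherwise.
  return "$missing"
}
```

For example, `check_config_files "$PWD" && echo "ready to run"` prints the message only when both files are found.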

Depending on your use case, the following guides explain how to configure Data Prepper:
* [Trace Analytics]({{site.url}}{{site.baseurl}}/opensearch/clients/data-prepper/trace-analytics/) - Learn how to set up Data Prepper for trace observability.
* [Log Ingestion]({{site.url}}{{site.baseurl}}/opensearch/clients/data-prepper/log-analytics/) - Learn how to set up Data Prepper for log observability.
* [Simple Pipeline]({{site.url}}{{site.baseurl}}/opensearch/clients/data-prepper/simple-pipelines) - Learn the basics of Data Prepper pipelines with some simple configurations.

## 3. Defining a Pipeline

Create a Data Prepper pipeline file, `pipelines.yaml`, with the following configuration:

@@ -31,7 +58,7 @@ simple-sample-pipeline:
- stdout:
```

-## 3. Start Data Prepper
## 4. Running Data Prepper

Run the following command with your pipeline configuration YAML.

@@ -61,3 +88,52 @@ After starting Data Prepper, you should see log output and some UUIDs after a fe
e51e700e-5cab-4f6d-879a-1c3235a77d18
b4ed2d7e-cf9c-4e9d-967c-b18e8af35c90
```
The remainder of this page shows examples for running from the Docker image. If you
built from source, refer to the [Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) for more information.

However you configure your pipeline, you will run Data Prepper the same way. You run the Docker
image and supply both the `pipelines.yaml` and `data-prepper-config.yaml` files.

For Data Prepper 2.0 or later, use the following command:

```
docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml opensearchproject/data-prepper:latest
```

For Data Prepper versions earlier than 2.0, use the following command:

```
docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml opensearchproject/data-prepper:1.x
```
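
The two commands differ in the image tag and in where the files are mounted inside the container. The mount-path selection can be sketched as follows; `mount_args` is a hypothetical helper name, and it only prints the `-v` options rather than running Docker.

```shell
# Hypothetical helper: print the -v options for a Data Prepper major version.
# Container paths are copied from the two docker run commands above.
mount_args() {
  major="$1"          # Data Prepper major version, for example 1 or 2
  src="${2:-$PWD}"    # host directory that holds both files
  base=/usr/share/data-prepper
  if [ "$major" -ge 2 ]; then
    pipelines="$base/pipelines/pipelines.yaml"
    config="$base/config/data-prepper-config.yaml"
  else
    pipelines="$base/pipelines.yaml"
    config="$base/data-prepper-config.yaml"
  fi
  printf '%s\n' "-v $src/pipelines.yaml:$pipelines -v $src/data-prepper-config.yaml:$config"
}
```

A call such as `docker run --name data-prepper -p 4900:4900 $(mount_args 2) opensearchproject/data-prepper:latest` relies on shell word splitting, so it assumes the host path contains no spaces.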

Once Data Prepper is running, it processes data until it is shut down. When you are done, shut it down with the following command:

```
curl -X POST http://localhost:4900/shutdown
```
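
The port in the URL is the host side of the `-p 4900:4900` mapping in the `docker run` command. If you publish the port differently, for example `-p 5000:4900`, the shutdown request must target the host port you chose. A small sketch, assuming a hypothetical helper named `shutdown_url`:

```shell
# Hypothetical helper: build the shutdown URL for a given host and host port.
# The /shutdown path and the default port 4900 come from the curl command above.
shutdown_url() {
  host="${1:-localhost}"
  port="${2:-4900}"
  printf 'http://%s:%s/shutdown\n' "$host" "$port"
}
```

With the alternative mapping, the request would be `curl -X POST "$(shutdown_url localhost 5000)"`.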
### Additional Configurations

For Data Prepper 2.0 or later, the Log4j 2 configuration file is read from `config/log4j2.properties` in the application's home directory.
By default, it uses `log4j2-rolling.properties` in the *shared-config* directory.

For Data Prepper versions earlier than 2.0, you can pass a custom Log4j 2 properties file by adding `"-Dlog4j.configurationFile=config/log4j2.properties"` to the command. If no properties file is provided, Data Prepper defaults to the `log4j2.properties` file in the *shared-config* directory.
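
For Data Prepper 2.0 or later running from the Docker image, one possible way to supply a custom file is to mount it over `config/log4j2.properties`. The sketch below assumes that the application's home directory in the image is `/usr/share/data-prepper`, which matches the mount paths in the `docker run` commands but is not stated on this page. Both `log4j_mount` and `my-log4j2.properties` are hypothetical names. The function only prints the extra `-v` option.

```shell
# Assumption: the image's application home is /usr/share/data-prepper.
# my-log4j2.properties is a hypothetical custom Log4j 2 properties file.
log4j_mount() {
  props="${1:-$PWD/my-log4j2.properties}"   # host path of the custom file
  printf '%s\n' "-v $props:/usr/share/data-prepper/config/log4j2.properties"
}
```

The printed option would be added alongside the other `-v` options in the 2.0 `docker run` command.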

## Next Steps

All Data Prepper instances expose a few APIs. The [API documentation]({{site.url}}{{site.baseurl}}/opensearch/clients/data-prepper/api/) outlines these APIs and
how to configure the server.

Trace analytics is an important Data Prepper use case. If you haven't yet configured it,
please visit the [Trace Analytics documentation]({{site.url}}{{site.baseurl}}/opensearch/clients/data-prepper/trace-analytics/).

Contributor Author

Suggested change
-Trace Analytics is an important Data Prepper use case. If you haven't yet configure it,
+Trace analytics is an important Data Prepper use case. If you haven't yet configured it,

Log ingestion is also an important Data Prepper use case. To learn more, visit the [Log Ingestion Documentation]({{site.url}}{{site.baseurl}}/opensearch/clients/data-prepper/log-analytics/).

Contributor Author

Suggested change
-Log Ingestion is also an important Data Prepper use case. To learn more, visit the [Log Ingestion Documentation]({{site.url}}{{site.baseurl}}/opensearch/clients/data-prepper/log-analytics/).
+Log ingestion is also an important Data Prepper use case. To learn more, visit the [Log Ingestion Documentation]({{site.url}}{{site.baseurl}}/opensearch/clients/data-prepper/log-analytics/).


To run Data Prepper with a Logstash configuration, please visit the [Logstash Migration Guide]({{site.url}}{{site.baseurl}}/opensearch/clients/data-prepper/logstash-migration-guide/).

To monitor Data Prepper, please read the [Monitoring]({{site.url}}{{site.baseurl}}/opensearch/clients/data-prepper/monitoring/) page.

## Other Examples

For other Docker examples that allow you to run Data Prepper in different scenarios, see [examples](https://github.com/opensearch-project/data-prepper/tree/main/examples/).

Contributor Author
carolxob Nov 7, 2022

Suggested change
-We have other several Docker [examples](https://github.com/opensearch-project/data-prepper/tree/main/examples/)
+For other Docker examples that allow you to run Data Prepper in different scenarios, see [examples](https://github.com/opensearch-project/data-prepper/tree/main/examples/).