If you are migrating from Open Distro Data Prepper, visit the Migrating from Open Distro page. Otherwise, please continue.
There are two ways to install Data Prepper for running.
- Run the Docker image
- Build from source
The easiest way to use Data Prepper is by running the Docker image. We suggest you use this approach if you have Docker available.
You can pull the Docker image:
docker pull opensearchproject/data-prepper:latest
If you have special requirements that require you build from source, or if you want to contribute, please see the Developer Guide.
You must configure Data Prepper with a pipeline before running it.
You will configure two files:
data-prepper-config.yaml
pipelines.yaml
Depending on what you want to do, we have a few different guides to configuring Data Prepper.
- Trace Analytics - Learn how to setup Data Prepper for trace observability
- Log Ingestion - Learn how to setup Data Prepper for log observability
- Simple Pipeline - Learn the basics of Data Prepper pipelines with some simple configurations.
The remainder of this page shows examples for running from the Docker image. If you built from source, refer to Developer Guide for more information.
However you configure your pipeline, you will run Data Prepper the same way. You run the Docker
image and supply both the pipelines.yaml
and data-prepper-config.yaml
files.
For Data Prepper 2.0 or above, use this command:
docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml opensearchproject/data-prepper:latest
For Data Prepper before version 2.0:
docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml opensearchproject/data-prepper:latest
Once Data Prepper is running, it will process data until it is shutdown. Once you are done, shut it down with
curl -X POST http://localhost:4900/shutdown
For Data Prepper 2.0 or above, Log4j 2 configuration file is read from config/log4j2.properties
in the application's home directory.
By default, it's using log4j2-rolling.properties
in the shared-config directory.
For Data Prepper before version 2.0, optionally add "-Dlog4j.configurationFile=config/log4j2.properties"
to the command if you would
like to pass a custom log4j2 properties file. If no properties file is provided, Data Prepper will default to the log4j2.properties file in the shared-config directory.
All Data Prepper instances expose a few APIs. The API documentation outlines these APIs and how to configure the server.
Trace Analytics is an important Data Prepper use case. If you haven't yet configure it, please visit the Trace Analytics documentation.
Log Ingestion is also an important Data Prepper use case. To learn more, visit the Log Ingestion Documentation.
To run Data Prepper with a Logstash configuration, please visit the Logstash Migration Guide.
To monitor Data Prepper, please read the Monitoring page.
We have other several Docker examples that allow you to run Data Prepper in different scenarios.