Log analytics getting started #573

3 changes: 3 additions & 0 deletions docs/getting_started.md
@@ -35,6 +35,7 @@ You will configure two files:
Depending on what you want to do, we have a few different guides to configuring Data Prepper.

* [Trace Analytics](trace_analytics.md) - Learn how to set up Data Prepper for trace observability
* [Log Ingestion](log_analytics.md) - Learn how to set up Data Prepper for log observability
* [Simple Pipeline](simple_pipelines.md) - Learn the basics of Data Prepper pipelines with some simple configurations.

## Running
@@ -67,6 +68,8 @@ how to configure the server.
Trace Analytics is an important Data Prepper use case. If you haven't yet configured it,
please visit the [Trace Analytics documentation](trace_analytics.md).

Log Ingestion is also an important Data Prepper use case. To learn more, visit the [Log Ingestion Documentation](log_analytics.md).

To monitor Data Prepper, please read the [Monitoring](monitoring.md) page.

## Other Examples
Binary file removed docs/images/Components.jpg
Binary file not shown.
Binary file added docs/images/LogAnalyticsComponents.png
Binary file modified docs/images/Log_Ingestion_FluentBit_DataPrepper_OpenSearch.jpg
Binary file added docs/images/TraceAnalyticsComponents.png
128 changes: 128 additions & 0 deletions docs/log_analytics.md
@@ -0,0 +1,128 @@
# Log Analytics

## Introduction

Data Prepper is an extendable, configurable, and scalable solution for log ingestion into OpenSearch and Amazon OpenSearch Service.
Currently, Data Prepper is focused on receiving logs from [FluentBit](https://fluentbit.io/) via the
[Http Source](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/http-source/README.md), and processing those logs with a [Grok Prepper](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-prepper/README.md) before ingesting them into OpenSearch through the [OpenSearch sink](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/README.md).

Here are all of the components for log analytics with FluentBit, Data Prepper, and OpenSearch:
<br />
<br />
![Log Analytics Components](images/LogAnalyticsComponents.png)
<br />
<br />

In your application environment you will have to run FluentBit.
FluentBit can be containerized through Kubernetes, Docker, or Amazon ECS.
It can also be run as an agent on EC2.
You should configure the [FluentBit http output plugin](https://docs.fluentbit.io/manual/pipeline/outputs/http) to export log data to Data Prepper.
You will then have to deploy Data Prepper as an intermediate component and configure it to send
the enriched log data to your OpenSearch cluster or Amazon OpenSearch Service domain. From there, you can
use OpenSearch Dashboards to perform more intensive visualization and analysis.

## Log Analytics Pipeline

Log analytic pipelines in Data Prepper are extremely customizable. A simple pipeline is shown below.

![](images/Log_Ingestion_FluentBit_DataPrepper_OpenSearch.jpg)

## Http Source

The [Http Source](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/http-source/README.md) accepts log data from FluentBit.
More specifically, this source accepts log data in a JSON array format.
This source supports industry-standard encryption in the form of TLS/HTTPS and HTTP basic authentication.
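
To make the payload shape concrete, here is a minimal Python sketch of what a client could send to this source. It assumes Data Prepper is listening locally on port 2021 at `/log/ingest` with SSL disabled, as in the configuration examples in this guide. The `build_log_request` and `send` helpers are hypothetical names used only for illustration and are not part of Data Prepper.

```python
import json
import urllib.request


def build_log_request(lines, host="localhost", port=2021, path="/log/ingest"):
    """Wrap raw log lines in a JSON array of {"log": ...} objects and build a POST request.

    The host, port, and path defaults are assumptions taken from this guide's examples.
    """
    records = [{"log": line} for line in lines]
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5):
    """Send the request. Only call this when Data Prepper is actually running."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    sample = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326'
    req = build_log_request([sample])
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the JSON array before anything goes over the network.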

## Preppers

The Data Prepper 1.2 release will come with a [Grok Prepper](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-prepper/README.md).
The Grok Prepper can be an invaluable tool to structure and extract important fields from your logs in order to make them more queryable.

The Grok Prepper comes with a wide variety of [default patterns](https://github.com/thekrakken/java-grok/blob/master/src/main/resources/patterns/patterns) that match against common log formats like Apache logs or syslogs,
but it can also accept custom patterns that cater to your specific log format.

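Grok has many more advanced features that are not discussed here. If you are interested, please read the [Grok Prepper documentation](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/grok-prepper/README.md).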
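
As a rough illustration of what grok-style extraction does, the Python sketch below pulls named fields out of an Apache common log line. The regular expression is a simplified approximation written for this example and is not the actual `COMMONAPACHELOG` pattern definition. The `parse_apache_common` helper is hypothetical, and the field names are assumed to mirror the ones the grok pattern typically produces.

```python
import re

# Simplified stand-in for a grok pattern: each named group becomes a field.
APACHE_COMMON = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>[A-Z]+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def parse_apache_common(record):
    """Return a copy of the record with extracted fields added next to the "log" key.

    Records that do not match are returned unchanged (as a copy).
    """
    enriched = dict(record)
    match = APACHE_COMMON.match(record.get("log", ""))
    if match is not None:
        enriched.update(match.groupdict())
    return enriched


if __name__ == "__main__":
    line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    for key, value in parse_apache_common({"log": line}).items():
        print(f"{key}: {value}")
```

Once fields such as `clientip` and `response` are stored separately, they can be queried and aggregated individually instead of searching the raw log string.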

## OpenSearch sink

The generic [OpenSearch sink](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/README.md) writes data to OpenSearch as the destination. It has configuration options related to an OpenSearch cluster, such as the endpoint, SSL and username settings, index name, index template, and index state management.

## Pipeline Configuration

Create a file, `pipeline.yaml`. Paste the following example pipeline in there.

```yaml
log-pipeline:
  source:
    http:
      # This is a simple http source with SSL disabled.
      # However, you can easily enable SSL and http basic authentication for this source
      ssl: false
      # The default port that will listen for incoming logs
      port: 2021
  prepper:
    - grok:
        match:
          # This will match logs with a "log" key against the COMMONAPACHELOG pattern (ex: { "log": "actual apache log..." } )
          # You should change this to match what your logs look like. See the grok documentation to get started.
          log: [ "%{COMMONAPACHELOG}" ]
  sink:
    - opensearch:
        hosts: [ "https://localhost:9200" ]
        # Change to your credentials
        username: "admin"
        password: "admin"
        # Add a certificate file if you are accessing an OpenSearch cluster with a self-signed certificate
        #cert: /path/to/cert
        # If you are connecting to an Amazon OpenSearch Service domain without
        # Fine-Grained Access Control, enable these settings. Comment out the
        # username and password above.
        #aws_sigv4: true
        #aws_region: us-east-1
        # Since we are grok matching for apache logs, it makes sense to send them to an OpenSearch index named apache_logs.
        # You should change this to correspond with how your OpenSearch indices are set up.
        index: apache_logs
```

This pipeline configuration is an example of Apache log ingestion. Don't forget that you can easily configure the Grok Prepper for your own custom logs.

You will need to modify the configuration above for your OpenSearch cluster.

The main changes you will need to make are:

* `hosts` - Set to your hosts
* `index` - Change this to the OpenSearch index you want to send logs to
* `username` - Provide your OpenSearch username
* `password` - Provide your OpenSearch password
* `aws_sigv4` - If you use Amazon OpenSearch Service with AWS signing, set this to true. It will sign requests with the default AWS credentials provider.
* `aws_region` - If you use Amazon OpenSearch Service with AWS signing, set this value to your region.
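
The settings listed above can be sanity-checked before starting Data Prepper. The Python sketch below is a hypothetical helper, not something Data Prepper provides. It assumes the `opensearch` sink section has already been loaded into a plain dictionary, and it only encodes the guidance from this page: set `hosts` and `index`, and do not combine `aws_sigv4` with a username and password.

```python
def check_sink_settings(sink):
    """Return a list of human-readable problems found in an opensearch sink mapping."""
    problems = []
    if not sink.get("hosts"):
        problems.append("hosts is empty; set it to your OpenSearch hosts")
    if not sink.get("index"):
        problems.append("index is not set; choose the index to send logs to")
    uses_sigv4 = bool(sink.get("aws_sigv4"))
    has_basic_auth = "username" in sink or "password" in sink
    if uses_sigv4 and has_basic_auth:
        problems.append("aws_sigv4 is enabled together with username/password; remove one of them")
    return problems


if __name__ == "__main__":
    example_sink = {
        "hosts": ["https://localhost:9200"],
        "username": "admin",
        "password": "admin",
        "index": "apache_logs",
    }
    for problem in check_sink_settings(example_sink):
        print(problem)
```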

## FluentBit

You will have to run FluentBit in your service environment. You can find the FluentBit installation guide [here](https://docs.fluentbit.io/manual/installation/getting-started-with-fluent-bit).
Configure the [FluentBit http output plugin](https://docs.fluentbit.io/manual/pipeline/outputs/http) to send logs to your Data Prepper Http Source. Below is an example `fluent-bit.conf` that tails a log file named `test.log` and forwards it to the http source of a locally running Data Prepper, which listens on port 2021 by default. Adjust the file `path` and the output `Host` and `Port` according to how and where you have FluentBit and Data Prepper running.

```
[INPUT]
    name                  tail
    refresh_interval      5
    path                  test.log
    read_from_head        true

[OUTPUT]
    Name    http
    Match   *
    Host    localhost
    Port    2021
    URI     /log/ingest
    Format  json
```
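
To try the configuration above, the tailed file needs some content. The Python sketch below is one hypothetical way to append Apache-style lines to `test.log`. The helper names and the sample values are assumptions made for this example, and the timestamp layout follows the Apache common log format.

```python
from datetime import datetime, timezone


def format_apache_common(client_ip, method, path, status, size, when=None):
    """Format one line in Apache common log format (ident and user are left as "-")."""
    when = when or datetime.now(timezone.utc)
    timestamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    return f'{client_ip} - - [{timestamp}] "{method} {path} HTTP/1.1" {status} {size}'


def append_log_line(log_path, line):
    """Append a single line to the file that FluentBit is tailing."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")


if __name__ == "__main__":
    append_log_line("test.log", format_apache_common("127.0.0.1", "GET", "/index.html", 200, 2326))
```

Each appended line should be picked up by the `tail` input and forwarded to Data Prepper, provided the `path` in `fluent-bit.conf` points at the same file.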

## Next Steps

Follow the [Log Ingestion Demo Guide](../examples/log-ingestion/log_ingestion_demo_guide.md) to get a specific example of Apache log ingestion from `FluentBit -> Data Prepper -> OpenSearch` running through Docker.

In the future, Data Prepper will contain additional sources and preppers which will make more complex log analytic pipelines available. Check out our [Roadmap](https://github.com/opensearch-project/data-prepper/projects/1) to see what is coming.

If there is a specific source, prepper, or sink that you would like to include in your log analytic workflow, and it is not currently on the Roadmap, please bring it to our attention by making a GitHub issue. Additionally, if you
are interested in contributing, see our [Contributing Guidelines](../CONTRIBUTING.md) as well as our [Developer Guide](developer_guide.md) and [Plugin Development Guide](plugin_development.md).
4 changes: 2 additions & 2 deletions docs/trace_analytics.md
@@ -10,10 +10,10 @@ The transformed trace data is the visualized using the
[Trace Analytics OpenSearch Dashboards plugin](https://opensearch.org/docs/monitoring-plugins/trace/ta-dashboards/),
which provides at-a-glance visibility into your application performance, along with the ability to drill down on individual traces.

Here is how all the components work in trace analytics,
Here is how all the components work in trace analytics:
<br />
<br />
![Trace Analytics Pipeline](images/Components.jpg)
![Trace Analytics Pipeline](images/TraceAnalyticsComponents.png)
<br />
<br />

73 changes: 73 additions & 0 deletions examples/log-ingestion/docker-compose.yaml
@@ -0,0 +1,73 @@
version: '3'
services:
  fluent-bit:
    container_name: fluent-bit
    image: fluent/fluent-bit
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
      - ./test.log:/var/log/test.log
    networks:
      - opensearch-net
  opensearch-node1:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600 # required for Performance Analyzer
    networks:
      - opensearch-net
  opensearch-node2:
    image: opensearchproject/opensearch:latest
    container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_master_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net
  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:latest
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      OPENSEARCH_HOSTS: '["https://opensearch-node1:9200","https://opensearch-node2:9200"]'
    networks:
      - opensearch-net

volumes:
  opensearch-data1:
  opensearch-data2:

networks:
  opensearch-net:
13 changes: 13 additions & 0 deletions examples/log-ingestion/fluent-bit.conf
@@ -0,0 +1,13 @@
[INPUT]
    name                  tail
    refresh_interval      5
    path                  /var/log/test.log
    read_from_head        true

[OUTPUT]
    Name    http
    Match   *
    Host    data-prepper
    Port    2021
    URI     /log/ingest
    Format  json