# Performance Test Results

Following the release of Data Prepper 1.3, performance tests were run for each of the new processors that were added, as well as for the equivalent Logstash filters. In the test results discussed below, the test environments and configurations are identical, except where the same option is not available in both applications.

## Environment Details
### Data Prepper Environment

![](../../docs/images/PerformanceTestEnvironment.png)

### Logstash Environment

![](../../docs/images/PerformanceTestEnvironmentLogstash.png)
To compare the performance of Data Prepper 1.3 against Logstash 7.13.2, the test was configured to simulate 10 clients sending requests as frequently as possible. Each request contained a batch of 200 logs. The test ran for 20 minutes, measuring latency and throughput.
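For reference, the shape of such a load test in Gatling's Scala DSL is sketched below. This is an illustrative sketch, not the repository's actual `FixedClientSimulation`; the endpoint, request path, and payload here are assumptions based on the configuration shown later in this document.

```scala
import scala.concurrent.duration._

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class TenClientSimulation extends Simulation {

  // Assumed endpoint: a local Data Prepper http source listening on port 2021.
  val httpProtocol = http.baseUrl("http://localhost:2021")

  // One request body: a JSON array of 200 identical Apache common log entries.
  val batch: String = List.fill(200)(
    """{"log": "127.0.0.1 - frank [11/Dec/2021:10:00:00 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326"}"""
  ).mkString("[", ",", "]")

  // Each virtual user posts batches in a tight loop for the whole run.
  val scn = scenario("Fixed client count")
    .forever(
      exec(
        http("Send batch")
          .post("/log/ingest")
          .header("Content-Type", "application/json")
          .body(StringBody(batch))
      )
    )

  // 10 concurrent clients for 20 minutes, mirroring the test described above.
  setUp(scn.inject(atOnceUsers(10)))
    .protocols(httpProtocol)
    .maxDuration(20.minutes)
}
```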
### AWS Resource Details

| Name                              | EC2 Instance Type | Instance Count | vCPU | Memory (GiB) | JVM Memory Limit (GiB) |
|-----------------------------------|:------------------|---------------:|-----:|-------------:|-----------------------:|
| Data Prepper                      | m5.xlarge         | 1              | 4    | 16           | 8                      |
| Data Prepper Prometheus + Grafana | m5.xlarge         | 1              | 4    | 16           |                        |
| Data Prepper OpenSearch Cluster   | i3.xlarge         | 3              | 4    | 30.5         |                        |
| Logstash                          | m5.xlarge         | 1              | 4    | 16           | 8                      |
| Logstash Prometheus + Grafana     | m5.xlarge         | 1              | 4    | 16           |                        |
| Logstash OpenSearch Cluster       | i3.xlarge         | 3              | 4    | 30.5         |                        |
| Gatling                           | m5.2xlarge        | 1              | 8    | 32           |                        |

## Configurations

The following log was sent to the `http` source of Data Prepper and Logstash in batches of 200.
```json
{
  "log": "127.0.0.1 - frank [11/Dec/2021:10:00:00 -0700] \"GET /apache_pb.gif?paramKey1=paramValue1&paramKey2=paramValue2&paramKey3=paramValue3 HTTP/1.0\" 200 2326"
}
```
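To send a single such log by hand as a smoke test, a request along the following lines can be used. This assumes Data Prepper's `http` source is listening locally on port 2021 with its default `/log/ingest` path; adjust the host and path for your setup.

```
curl -s -H "Content-Type: application/json" \
  -d '[{"log": "127.0.0.1 - frank [11/Dec/2021:10:00:00 -0700] \"GET /apache_pb.gif?paramKey1=paramValue1&paramKey2=paramValue2&paramKey3=paramValue3 HTTP/1.0\" 200 2326"}]' \
  http://localhost:2021/log/ingest
```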
Ingesting this log through Data Prepper results in the following document being sent to OpenSearch.
```json
{
  "_index": "data_prepper",
  "_type": "_doc",
  "_id": "BEz2BYABkvwoydO0vqsG",
  "_version": 1,
  "_score": null,
  "_source": {
    "log": "127-0-0-1 - FRANK [11/DEC/2021:10:00:00 -0700] \"GET /APACHE_PB-GIF?PARAMKEY1=PARAMVALUE1&PARAMKEY2=PARAMVALUE2&PARAMKEY3=PARAMVALUE3 HTTP/1-0\" 200 2326",
    "request": [
      "/apache_pb.gif",
      "paramKey1=paramValue1&paramKey2=paramValue2&paramKey3=paramValue3"
    ],
    "auth": "frank",
    "ident": "-",
    "response": "200",
    "bytes": "2326",
    "clientip": "127.0.0.1",
    "verb": "get",
    "httpversion": "1.0",
    "timestamp": "11/Dec/2021:10:00:00 -0700",
    "@timestamp": "2022-04-07T21:36:44.710Z",
    "query_params": {
      "paramKey2": "paramValue2",
      "paramKey3": "paramValue3",
      "paramKey1": "paramValue1"
    },
    "copy_key": "127-0-0-1 - FRANK [11/DEC/2021:10:00:00 -0700] \"GET /APACHE_PB-GIF?PARAMKEY1=PARAMVALUE1&PARAMKEY2=PARAMVALUE2&PARAMKEY3=PARAMVALUE3 HTTP/1-0\" 200 2326"
  }
}
```

The configuration files include the following sources, processors, and sinks.
### Sources
* http

### Processors
* grok
* date
* substitute_string
* uppercase_string
* lowercase_string
* trim_string
* split_string
* key_value
* add_entries
* rename_keys
* copy_values
* delete_entries

### Sinks
* opensearch

### Data Prepper pipeline configuration
```yaml
log-pipeline:
  workers: 12
  delay: 50
  source:
    http:
      ssl: false
      port: 2021
      thread_count: 4
      max_connection_count: 2000
      request_timeout: 10000
  buffer:
    bounded_blocking:
      buffer_size: 2000000
      batch_size: 5000
  processor:
    - grok:
        match:
          log: [ "%{COMMONAPACHELOG}" ]
    - date:
        from_time_received: true
        destination: "@timestamp"
    - substitute_string:
        entries:
          - source: "log"
            from: '\.'
            to: "-"
    - uppercase_string:
        with_keys:
          - "log"
    - trim_string:
        with_keys:
          - "log"
    - split_string:
        entries:
          - source: "request"
            delimiter: "?"
    - key_value:
        source: "/request/1"
        field_split_characters: "&"
        value_split_characters: "="
        destination: "query_params"
    - lowercase_string:
        with_keys:
          - "verb"
    - add_entries:
        entries:
          - key: "entry1"
            value: "entry1value"
          - key: "entry2"
            value: "entry2value"
          - key: "entry3"
            value: "entry3value"
    - rename_keys:
        entries:
          - from_key: "entry1"
            to_key: "renameEntry1"
          - from_key: "entry2"
            to_key: "renameEntry2"
          - from_key: "entry3"
            to_key: "renameEntry3"
    - copy_values:
        entries:
          - from_key: "log"
            to_key: "copy_key"
    - delete_entries:
        with_keys: ["renameEntry1", "renameEntry2", "renameEntry3"]

  sink:
    - opensearch:
        hosts: [ "myOpenSearchEndpoint" ]
        index: "data_prepper"
        username: "username"
        password: "password"
```
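One way to run Data Prepper 1.x with this pipeline is via its Docker image, mounting the pipeline file into the container. The image tag and mount path below follow the Data Prepper 1.x image layout and should be treated as assumptions to verify against the docs for your version.

```
docker run --name data-prepper -p 2021:2021 \
  -v $(pwd)/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml \
  opensearchproject/data-prepper:1.3.0
```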
### Logstash configuration
```
input {
  http {
    ssl => false
    port => 2021
    codec => "json"
    threads => 4
  }
}

filter {
  grok {
    match => { "log" => "%{COMMONAPACHELOG}" }
  }

  mutate {
    gsub => ["log", "\.", "-"]
  }

  mutate {
    uppercase => [ "log" ]
  }

  mutate {
    strip => [ "log" ]
  }

  mutate {
    split => { "request" => "?" }
  }

  kv {
    field_split => "&"
    value_split => "="
    source => "[request][1]"
    target => "query_params"
  }

  mutate {
    lowercase => [ "verb" ]
  }

  mutate {
    add_field => {
      "entry1" => "entry1value"
      "entry2" => "entry2value"
      "entry3" => "entry3value"
    }
  }

  mutate {
    rename => {
      "entry1" => "renameEntry1"
      "entry2" => "renameEntry2"
      "entry3" => "renameEntry3"
    }
  }

  mutate {
    copy => {
      "log" => "copy_key"
    }
  }

  mutate {
    remove_field => ["renameEntry1", "renameEntry2", "renameEntry3"]
  }
}

output {
  opensearch {
    hosts => ["myOpenSearchEndpoint"]
    user => "user"
    password => "password"
    ssl => true
    ssl_certificate_verification => false
    index => "logstash"
  }
}
```
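With this pipeline saved to a file (for example `pipeline.conf`, a name assumed here), Logstash can be started against it using its standard `-f` flag:

```
bin/logstash -f pipeline.conf
```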
The following changes were made to the `logstash.yml` file in order to replicate the Data Prepper configuration as closely as possible.

```yaml
pipeline.workers: 12
pipeline.batch.size: 5000
```
## Results

### Data Prepper

Gatling reported the following results for the Data Prepper test. They show that `118,104` requests were successfully sent to Data Prepper. Throughput can be computed as `118,104 requests * 200 logs/request = 23,620,800 logs; 23,620,800 logs / 1,200 s = 19,684 logs/s`. The Gatling report also shows the latency distribution on the `http` source side, with an average latency of `101 ms`.

![](../../docs/images/DataPrepperGatling.png)
### Logstash

Gatling reported the following results for the Logstash test. They show that `75,709` requests were successfully sent to Logstash. Throughput can be computed as `75,709 requests * 200 logs/request = 15,141,800 logs; 15,141,800 logs / 1,200 s = 12,618 logs/s`. The Gatling report also shows the latency distribution on the `http` source side, with an average latency of `158 ms`.

![](../../docs/images/LogstashGatling.png)

Comparing these measurements side by side, the test shows that with this configuration, Data Prepper has a `56%` higher throughput than Logstash (`19,684 / 12,618 ≈ 1.56`) and an average latency roughly `36%` lower than Logstash's (`101 ms` vs. `158 ms`; equivalently, Logstash's average latency is `56%` higher).
## Running tests

If you wish to run performance tests against your own Data Prepper configuration, you can replicate this environment. Run the Gatling performance test suite from this repository on a separate EC2 instance, and point it at your running Data Prepper instance with the following command:
```
./gradlew --rerun-tasks :performance-test:gatlingRun-org.opensearch.dataprepper.test.performance.FixedClientSimulation -Dhost="<target_url>" -DlogFile="/path/to/logFile"
```

After all tests have completed, HTML reports are created in `data-prepper/performance-test/build/reports/gatling/<simulation-name>-<unix-timestamp>/index.html`. Further instructions on running performance tests and Gatling are available in the repository README.