
Vulnerability detection state index stress testing #31

Closed
5 tasks done
Tracked by #5
havidarou opened this issue Sep 21, 2023 · 7 comments
@havidarou
Member

havidarou commented Sep 21, 2023

Description

We want to ensure our default indexer settings provide good performance for the new vulnerability index. We need to:

  • Measure the scenarios on the minimum hardware requirements described in the documentation
  • Measure the performance gain in the same scenarios when the hardware capacity is increased
  • Measure the queries that the vulnerabilities dashboard will execute against the index

Scenarios

The following scenarios are our estimates of the index size for each deployment. We do not need to deploy these environments; we only need to test an index of the corresponding size for each one:

  • Scenario 1: a small deployment with:
    • 1 manager
    • 2K agents
    • 1K packages/agent
    • ~100 CVEs/agent
      will require an index with 200K indexed CVEs.
  • Scenario 2: a big deployment with:
    • 25 managers
    • 2K agents/node
    • 1K packages/agent
    • ~100 CVEs/agent
      will require an index with 5M indexed CVEs.
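The estimated sizes above follow directly from multiplying the per-deployment figures; a quick sanity check in shell:

```shell
# Sanity check of the estimated index sizes:
# documents = managers * agents_per_node * CVEs_per_agent
small=$((1 * 2000 * 100))    # Scenario 1
big=$((25 * 2000 * 100))     # Scenario 2
echo "Scenario 1: $small documents"   # 200000 (200K)
echo "Scenario 2: $big documents"     # 5000000 (5M)
```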

Tasks

  • Create a test dataset of 200K CVEs for scenario 1
  • Create a test dataset of 5M CVEs for scenario 2
  • On a single indexer with 8 GB of RAM, measure each query from the vulnerability dashboard
  • On two indexers with 8 GB of RAM each, measure each query from the vulnerability dashboard
  • Compare the results

During the measurement process, we must take notes on what performance tuning we can apply to increase query speed.

@havidarou havidarou added level/task Task issue type/enhancement Enhancement issue labels Sep 21, 2023
@gdiazlo gdiazlo mentioned this issue Sep 21, 2023
5 tasks
@wazuhci wazuhci moved this to Triage in Release 4.8.0 Sep 21, 2023
@wazuhci wazuhci moved this from Triage to Backlog in Release 4.8.0 Sep 25, 2023
@wazuhci wazuhci moved this from Backlog to In progress in Release 4.8.0 Sep 26, 2023
@AlexRuiz7
Member

AlexRuiz7 commented Sep 26, 2023

Both scenarios continue the work performed in #6 (index-template, wazuh-states-vulnerabilities index and sample events for testing).

Dataset generation

To generate the dataset, I re-used the sample events in #6 (comment) and created a script to load the required number of documents into the index. The sample events have been converted to bulk format for better indexing performance.

push_n_events.sh

#!/bin/bash

# Define the OpenSearch URL
OPENSEARCH_URL="https://indexer:9200"

# Define index name
# INDEX="wazuh-states-vulnerabilities" # Unused, already present in the sample files

# Define the OpenSearch credentials
ADMIN="admin"
PASSWORD="admin"

# Sample CVE files
SAMPLE_FILES=(
    'event_00.json'
    'event_01.json'
    'event_02.json'
)

BATCH_SIZE=1000
BULK_FILE="bulk.json"

# We need:
#   - Scenario 1) 200K documents
#   - Scenario 2) 5M documents

N=$1

if [ -z "$N" ]; then
    echo "Usage: $0 <number of documents>"
    exit 1
fi

# Truncate the bulk file (a leading blank line would break the bulk request)
: > "$BULK_FILE"

# Iterate N / BATCH_SIZE times
# On each iteration, create a bulk operation with BATCH_SIZE documents
# and send it to OpenSearch
echo "Inserting $N CVEs ..."
for i in $(seq 1 $BATCH_SIZE $N); do

    # Build a bulk operation with BATCH_SIZE documents
    for j in $(seq 1 $BATCH_SIZE); do
        # Pick a random sample file
        FILE=${SAMPLE_FILES[RANDOM%${#SAMPLE_FILES[@]}]}
        cat "$FILE" >> "$BULK_FILE"
    done

    # Insert the CVE
    curl -s -k -u $ADMIN:$PASSWORD -X POST "$OPENSEARCH_URL/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary @$BULK_FILE > /dev/null

    # Clear the bulk file
    : > "$BULK_FILE"

    progress=$((i*100/N))
    echo -ne "\rProgress : $progress%"
    echo -ne '\r'
done
echo -ne '\n'
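After the script finishes, the number of indexed documents can be checked against the expected scenario size. A minimal sketch using the _count API, assuming the same endpoint and demo credentials as push_n_events.sh; parse_count and count_docs are helpers introduced here for illustration:

```shell
# Extract the "count" value from a _count API response
parse_count() {
    grep -o '"count":[0-9]*' | cut -d: -f2
}

# Query the indexer for the number of documents in the index
# (same URL and demo credentials as push_n_events.sh)
count_docs() {
    curl -s -k -u "admin:admin" \
        "https://indexer:9200/wazuh-states-vulnerabilities/_count" | parse_count
}

# Expected: 200000 for scenario 1, 5000000 for scenario 2
```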

event_00.json

{"index":{"_index":"wazuh-states-vulnerabilities"}}
{"@timestamp": "2023-09-26T15:51:23.715447Z", "agent": {"build": {"original": "build9249"}, "id": "agent38", "name": "Agent78", "version": "v3-stable", "ephemeral_id": "57206", "type": "filebeat"}, "ecs": {"version": "1.7.0"}, "event": {"action": "unarchive", "agent_id_status": "missing", "category": "storage", "code": "31424", "created": "2023-10-03T12:46:51.077715Z", "dataset": "file", "duration": 26341, "end": "2023-09-28T15:55:44.789603Z", "hash": "-1216889152242699264", "id": "80912", "ingested": "2023-10-03T05:22:02.899712Z", "kind": "event", "module": "authentication", "original": "original83884", "outcome": "success", "provider": "dns", "reason": "This event happened due to reason84127", "reference": "https://system.example.com/event/#45634", "risk_score": 5.9, "risk_score_norm": 6.1, "sequence": 0, "severity": 0, "start": "2023-09-27T04:44:10.913422Z", "timezone": "MST", "type": "denied", "url": "http://mysystem.example.com/alert/40926"}, "host": {"os": {"family": "RHEL", "full": "RHEL 98.4", "kernel": "98.4kernel92", "name": "RHEL 98.4", "platform": "RHEL", "type": "windows", "version": "98.4"}}, "labels": {"label1": "label74", "label2": "label14"}, "message": "message72386", "package": {"architecture": "arm64", "build_version": "build7849", "checksum": "checksum3829", "description": "description2321", "install_scope": "user", "installed": "2023-10-01T03:04:23.654146Z", "license": "license7", "name": "name4", "path": "/path/to/package34", "reference": "package-reference-66", "size": 38918, "type": "rar", "version": "v1-stable"}, "tags": ["tag86", "tag47", "tag86", "tag70", "tag12", "tag54", "tag22", "tag2"], "vulnerability": {"category": "custom", "classification": ["classification2132"], "description": "description6537", "enumeration": "CVE", "id": "CVE-2976", "reference": "https://mycve.test.org/cgi-bin/cvename.cgi?name=2976", "report_id": "report-4004", "scanner": {"vendor": "vendor-5"}, "score": {"base": 1.0, "environmental": 7.2, "temporal": 1.3, "version": 9.9}, "severity": "high"}}

event_01.json

{"create":{"_index":"wazuh-states-vulnerabilities"}}
{"@timestamp": "2023-09-29T17:31:57.118137Z", "agent": {"build": {"original": "build4573"}, "id": "agent28", "name": "Agent1", "version": "v2-stable", "ephemeral_id": "76694", "type": "linux"}, "ecs": {"version": "1.7.0"}, "event": {"action": "disconnect", "agent_id_status": "auth_metadata_missing", "category": "network", "code": "77507", "created": "2023-09-30T14:09:40.313282Z", "dataset": "tls", "duration": 47608, "end": "2023-09-27T13:21:01.232258Z", "hash": "2436137543235819361", "id": "86238", "ingested": "2023-09-28T00:29:37.484334Z", "kind": "enrichment", "module": "http", "original": "original92144", "outcome": "success", "provider": "socket", "reason": "This event happened due to reason32096", "reference": "https://system.example.com/event/#1881", "risk_score": 0.2, "risk_score_norm": 6.7, "sequence": 10, "severity": 4, "start": "2023-09-30T15:15:10.683293Z", "timezone": "MST", "type": "group", "url": "http://mysystem.example.com/alert/59430"}, "host": {"os": {"family": "macos", "full": "macos 18.96", "kernel": "18.96kernel39", "name": "macos 18.96", "platform": "macos", "type": "unix", "version": "18.96"}}, "labels": {"label1": "label96", "label2": "label87"}, "message": "message73369", "package": {"architecture": "x86", "build_version": "build5859", "checksum": "checksum5607", "description": "description4616", "install_scope": "user", "installed": "2023-10-01T21:49:58.408379Z", "license": "license7", "name": "name78", "path": "/path/to/package76", "reference": "package-reference-38", "size": 18793, "type": "tar.sz", "version": "v0-stable"}, "tags": ["tag19", "tag75"], "vulnerability": {"category": "package", "classification": ["classification5028"], "description": "description8755", "enumeration": "CVE", "id": "CVE-5076", "reference": "https://mycve.test.org/cgi-bin/cvename.cgi?name=5076", "report_id": "report-3561", "scanner": {"vendor": "vendor-6"}, "score": {"base": 7.1, "environmental": 6.5, "temporal": 1.6, "version": 9.1}, "severity": "low"}}

event_02.json

{"create":{"_index":"wazuh-states-vulnerabilities"}}
{"@timestamp": "2023-09-30T06:37:31.485114Z", "agent": {"build": {"original": "build4689"}, "id": "agent3", "name": "Agent36", "version": "v2-stable", "ephemeral_id": "50351", "type": "windows"}, "ecs": {"version": "1.7.0"}, "event": {"action": "upload", "agent_id_status": "mismatch", "category": "network", "code": "56834", "created": "2023-10-02T09:18:01.384042Z", "dataset": "process", "duration": 38111, "end": "2023-09-28T03:08:56.910567Z", "hash": "-4255256760789246518", "id": "72908", "ingested": "2023-09-29T00:21:33.818744Z", "kind": "alert", "module": "process", "original": "original40608", "outcome": "success", "provider": "file", "reason": "This event happened due to reason96369", "reference": "https://system.example.com/event/#66781", "risk_score": 8.7, "risk_score_norm": 7.6, "sequence": 3, "severity": 3, "start": "2023-09-27T00:41:32.617077Z", "timezone": "EDT", "type": "protocol", "url": "http://mysystem.example.com/alert/30043"}, "host": {"os": {"family": "macos", "full": "macos 37.31", "kernel": "37.31kernel65", "name": "macos 37.31", "platform": "macos", "type": "unix", "version": "37.31"}}, "labels": {"label1": "label16", "label2": "label19"}, "message": "message40015", "package": {"architecture": "x86", "build_version": "build2729", "checksum": "checksum1732", "description": "description7855", "install_scope": "system", "installed": "2023-09-29T22:45:43.453405Z", "license": "license7", "name": "name45", "path": "/path/to/package3", "reference": "package-reference-78", "size": 54269, "type": "deb", "version": "v5-stable"}, "tags": ["tag63", "tag50", "tag43", "tag93", "tag79", "tag35", "tag95"], "vulnerability": {"category": "os", "classification": ["classification9128"], "description": "description8521", "enumeration": "CVE", "id": "CVE-4916", "reference": "https://mycve.test.org/cgi-bin/cvename.cgi?name=4916", "report_id": "report-4781", "scanner": {"vendor": "vendor-0"}, "score": {"base": 3.6, "environmental": 5.8, "temporal": 6.5, "version": 4.9}, "severity": "critical"}}

Performance testing

The _search API returns not only the search results, but also the time the search took to execute.
This time is reported in the took field of the JSON response, measured in milliseconds.
We'll test the performance of the requests by executing them several times, watching the took field and calculating the average time.

For that, we'll use the following scripts and files.

The JSON files contain the queries to be executed, which are the queries used by the visualizations composing the new vulnerabilities dashboard. The run.sh script executes all of the visualizations' queries in parallel by invoking the launcher.sh script, which runs a single query several times and calculates the average time.

There are two cases:

  • Uncached data: when the data is accessed for the first time, the response time is significantly higher.
  • Cached data: subsequent requests are much faster, so much so that they sometimes take less than a millisecond and return 0 ms. This leads to imprecise measurements.

There are two ways to avoid this problem:

  • Use the profiler: it returns more information about the query, including the time spent in each phase of the query execution. These measurements use nanoseconds, which allows for more precise results, but the profiler significantly slows down the query execution.
  • Clear the cache before each query execution. This is the solution I've chosen, as it focuses on the worst-case scenario (uncached data).
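In OpenSearch, an index's caches can be dropped through its _cache/clear endpoint. A minimal sketch, assuming the same indexer URL and demo credentials as launcher.sh; clear_cache is an illustrative helper, not part of the scripts below:

```shell
# Same endpoint and index as launcher.sh (assumption)
INDEXER_URL="https://indexer:9200"
INDEX_NAME="wazuh-states-vulnerabilities"

# POST to the _cache/clear endpoint to drop the index caches
# before each measurement run
clear_cache() {
    curl -s -k -u "admin:admin" -XPOST "$INDEXER_URL/$INDEX_NAME/_cache/clear"
}
```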
run.sh

#!/bin/bash

script=./launcher.sh
requests=100
visualizations=(
    'critical.json'
    'high.json'
    'medium.json'
    'low.json'
    'inventory.json'
    'top-10-endpoints.json'
    'top-10-vulnerabilities.json'
)

# Run in parallel
for i in "${visualizations[@]}"
do
    $script $i $requests &
done

# Wait for all processes to finish
wait

launcher.sh

#!/bin/bash

# Constants
USERNAME="admin"
PASSWORD="admin"
INDEXER_URL="https://indexer:9200"
INDEX_NAME="wazuh-states-vulnerabilities"

# Function to display usage instructions
usage() {
    echo "Usage: $0 <query_filename> <number_of_times>"
    exit 1
}

# Check the number of arguments
if [ $# -ne 2 ]; then
    usage
fi

# Input arguments
QUERY_FILENAME="$1"
NUMBER_OF_TIMES="$2"

# Output file naming
output_file="./results/$(basename "$QUERY_FILENAME" .json)_${NUMBER_OF_TIMES}.csv"
mkdir -p ./results
: > "$output_file"  # Truncate the results file (avoid a leading blank line)

# Function to measure query execution time
measure_time() {
    local acum=0
    for i in $(seq 1 "$NUMBER_OF_TIMES"); do
        took=$(curl -s -u "$USERNAME:$PASSWORD" -k -XGET "$INDEXER_URL/$INDEX_NAME/_search" -H "Content-Type: application/json" -d "@$QUERY_FILENAME" | jq -r ".took")
        acum=$((acum + took))
        sleep 1
        echo "Req $i, $took" >> "$output_file"
    done
    echo "Avg, $((acum / NUMBER_OF_TIMES))" >> "$output_file"
    echo "Average execution time: $((acum / NUMBER_OF_TIMES)) milliseconds"
}

# Main function
main() {
    measure_time
}

main
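The per-query CSVs under ./results can also be re-aggregated afterwards. A small sketch that recomputes the average from the "Req" rows of a results file; avg_from_csv is a helper introduced here for illustration:

```shell
# Recompute the average from the "Req <i>, <took>" rows of a launcher.sh CSV
avg_from_csv() {
    awk -F', ' '/^Req/ { sum += $2; n++ } END { if (n) print int(sum / n) }' "$1"
}
```

For example, `avg_from_csv ./results/critical_100.csv` should match the "Avg" row written by launcher.sh.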

critical.json

{
  "aggs": {
    "2": {
      "filters": {
        "filters": {
          "Critical Severity Alerts": {
            "bool": {
              "must": [],
              "filter": [
                {
                  "bool": {
                    "should": [
                      {
                        "match_phrase": {
                          "vulnerability.severity": "critical"
                        }
                      }
                    ],
                    "minimum_should_match": 1
                  }
                }
              ],
              "should": [],
              "must_not": []
            }
          }
        }
      }
    }
  },
  "size": 0,
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "event.created",
      "format": "date_time"
    },
    {
      "field": "event.end",
      "format": "date_time"
    },
    {
      "field": "event.ingested",
      "format": "date_time"
    },
    {
      "field": "event.start",
      "format": "date_time"
    },
    {
      "field": "package.installed",
      "format": "date_time"
    }
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2023-07-07T15:31:39.927Z",
              "lte": "2023-10-05T15:31:39.927Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}
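high.json, medium.json and low.json below differ from critical.json only in the severity value and the filter label (plus slightly different timestamp ranges). For reference, they could be generated from critical.json with a simple substitution; gen_severity_queries is an illustrative helper, not part of the test set:

```shell
# Generate the high/medium/low query files from a critical.json-style source
# by swapping the severity value ("critical") and the filter label ("Critical")
gen_severity_queries() {
    local src="$1"
    for label in High Medium Low; do
        sev=$(echo "$label" | tr 'A-Z' 'a-z')
        sed -e "s/critical/$sev/g" -e "s/Critical/$label/g" "$src" > "$sev.json"
    done
}
```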

high.json

{
  "aggs": {
    "2": {
      "filters": {
        "filters": {
          "High Severity Alerts": {
            "bool": {
              "must": [],
              "filter": [
                {
                  "bool": {
                    "should": [
                      {
                        "match_phrase": {
                          "vulnerability.severity": "high"
                        }
                      }
                    ],
                    "minimum_should_match": 1
                  }
                }
              ],
              "should": [],
              "must_not": []
            }
          }
        }
      }
    }
  },
  "size": 0,
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "event.created",
      "format": "date_time"
    },
    {
      "field": "event.end",
      "format": "date_time"
    },
    {
      "field": "event.ingested",
      "format": "date_time"
    },
    {
      "field": "event.start",
      "format": "date_time"
    },
    {
      "field": "package.installed",
      "format": "date_time"
    }
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2023-07-07T15:31:40.540Z",
              "lte": "2023-10-05T15:31:40.540Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

medium.json

{
  "aggs": {
    "2": {
      "filters": {
        "filters": {
          "Medium Severity Alerts": {
            "bool": {
              "must": [],
              "filter": [
                {
                  "bool": {
                    "should": [
                      {
                        "match_phrase": {
                          "vulnerability.severity": "medium"
                        }
                      }
                    ],
                    "minimum_should_match": 1
                  }
                }
              ],
              "should": [],
              "must_not": []
            }
          }
        }
      }
    }
  },
  "size": 0,
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "event.created",
      "format": "date_time"
    },
    {
      "field": "event.end",
      "format": "date_time"
    },
    {
      "field": "event.ingested",
      "format": "date_time"
    },
    {
      "field": "event.start",
      "format": "date_time"
    },
    {
      "field": "package.installed",
      "format": "date_time"
    }
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2023-07-07T15:43:56.977Z",
              "lte": "2023-10-05T15:43:56.977Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

low.json

{
  "aggs": {
    "2": {
      "filters": {
        "filters": {
          "Low Severity Alerts": {
            "bool": {
              "must": [],
              "filter": [
                {
                  "bool": {
                    "should": [
                      {
                        "match_phrase": {
                          "vulnerability.severity": "low"
                        }
                      }
                    ],
                    "minimum_should_match": 1
                  }
                }
              ],
              "should": [],
              "must_not": []
            }
          }
        }
      }
    }
  },
  "size": 0,
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "event.created",
      "format": "date_time"
    },
    {
      "field": "event.end",
      "format": "date_time"
    },
    {
      "field": "event.ingested",
      "format": "date_time"
    },
    {
      "field": "event.start",
      "format": "date_time"
    },
    {
      "field": "package.installed",
      "format": "date_time"
    }
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2023-07-07T15:43:56.943Z",
              "lte": "2023-10-05T15:43:56.943Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

inventory.json

{
  "aggs": {
    "2": {
      "terms": {
        "field": "package.name",
        "order": {
          "_count": "desc"
        },
        "size": 5
      },
      "aggs": {
        "3": {
          "terms": {
            "field": "package.version",
            "order": {
              "_count": "desc"
            },
            "size": 5
          },
          "aggs": {
            "4": {
              "terms": {
                "field": "package.architecture",
                "order": {
                  "_count": "desc"
                },
                "size": 5
              },
              "aggs": {
                "5": {
                  "terms": {
                    "field": "vulnerability.severity",
                    "order": {
                      "_count": "desc"
                    },
                    "size": 5
                  },
                  "aggs": {
                    "6": {
                      "terms": {
                        "field": "vulnerability.id",
                        "order": {
                          "_count": "desc"
                        },
                        "size": 5
                      },
                      "aggs": {
                        "7": {
                          "terms": {
                            "field": "vulnerability.score.version",
                            "order": {
                              "_count": "desc"
                            },
                            "size": 5
                          },
                          "aggs": {
                            "8": {
                              "terms": {
                                "field": "vulnerability.score.base",
                                "order": {
                                  "_count": "desc"
                                },
                                "size": 5
                              }
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0,
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "event.created",
      "format": "date_time"
    },
    {
      "field": "event.end",
      "format": "date_time"
    },
    {
      "field": "event.ingested",
      "format": "date_time"
    },
    {
      "field": "event.start",
      "format": "date_time"
    },
    {
      "field": "package.installed",
      "format": "date_time"
    }
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2023-07-07T15:43:57.050Z",
              "lte": "2023-10-05T15:43:57.050Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

top-10-endpoints.json

{
  "aggs": {
    "2": {
      "terms": {
        "field": "agent.id",
        "order": {
          "_count": "desc"
        },
        "size": 10
      },
      "aggs": {
        "3": {
          "terms": {
            "field": "agent.id",
            "order": {
              "_count": "desc"
            },
            "size": 5
          }
        }
      }
    }
  },
  "size": 0,
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "event.created",
      "format": "date_time"
    },
    {
      "field": "event.end",
      "format": "date_time"
    },
    {
      "field": "event.ingested",
      "format": "date_time"
    },
    {
      "field": "event.start",
      "format": "date_time"
    },
    {
      "field": "package.installed",
      "format": "date_time"
    }
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2023-07-07T15:43:56.961Z",
              "lte": "2023-10-05T15:43:56.961Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

top-10-vulnerabilities.json

{
  "aggs": {
    "2": {
      "terms": {
        "field": "vulnerability.id",
        "order": {
          "_count": "desc"
        },
        "size": 10
      },
      "aggs": {
        "3": {
          "terms": {
            "field": "vulnerability.id",
            "order": {
              "_count": "desc"
            },
            "size": 5
          }
        }
      }
    }
  },
  "size": 0,
  "stored_fields": [
    "*"
  ],
  "script_fields": {},
  "docvalue_fields": [
    {
      "field": "@timestamp",
      "format": "date_time"
    },
    {
      "field": "event.created",
      "format": "date_time"
    },
    {
      "field": "event.end",
      "format": "date_time"
    },
    {
      "field": "event.ingested",
      "format": "date_time"
    },
    {
      "field": "event.start",
      "format": "date_time"
    },
    {
      "field": "package.installed",
      "format": "date_time"
    }
  ],
  "_source": {
    "excludes": []
  },
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "match_all": {}
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2023-07-07T15:43:56.990Z",
              "lte": "2023-10-05T15:43:56.990Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

Scenario 1

Environment setup

Installed OpenSearch 2.10.0 following the documentation. The machine runs RHEL 7 and has 8 GB of RAM. Modified /etc/opensearch/opensearch.yml and /etc/opensearch/jvm.options as stated, and kept the demo certificates, so I skipped the certificate configuration steps.

Vagrantfile

Vagrant.configure("2") do |config|

    config.vm.define "indexer" do |indexer|
        indexer.vm.box = "generic/rhel7"
        indexer.vm.synced_folder ".", "/vagrant"
        indexer.vm.network "private_network", ip: "192.168.56.10", name: "vboxnet0"
        indexer.vm.hostname = "indexer"

        indexer.vm.provider "virtualbox" do |vb|
            vb.memory = "8192"
            vb.cpus = "4"
        end

        indexer.vm.provision "shell", inline: <<-SHELL
            systemctl stop firewalld
            systemctl disable firewalld

            curl -SL https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/opensearch-2.x.repo -o /etc/yum.repos.d/opensearch-2.x.repo
            yum clean all
            yum install opensearch -y

            # Configure OpenSearch to listen on all network interfaces.
            sed -i 's/#network.host: 192.168.0.1/network.host: 0.0.0.0/g' /etc/opensearch/opensearch.yml
            echo "discovery.type: single-node" >> /etc/opensearch/opensearch.yml

            # Specify initial and maximum JVM heap sizes.
            sed -i 's/Xms1g/Xms4g/g' /etc/opensearch/jvm.options
            sed -i 's/Xmx1g/Xmx4g/g' /etc/opensearch/jvm.options

            # Define JDK path
            echo "OPENSEARCH_JAVA_HOME=/usr/share/opensearch/jdk" >> /etc/environment

            # Enable and start the service
            systemctl enable opensearch
            systemctl start opensearch
            curl -X GET https://localhost:9200 -u 'admin:admin' --insecure || exit 1

            # Configure performance analyzer plugin
            sed -i 's/#webservice-bind-host =/webservice-bind-host = 0.0.0.0/g' /etc/opensearch/opensearch-performance-analyzer/performance-analyzer.properties

            # Enable and start the service
            systemctl enable opensearch-performance-analyzer.service
            systemctl start opensearch-performance-analyzer.service

            # Start Performance Analyzer RCA
            curl -XPOST https://localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k
        SHELL
    end

    config.vm.define "dashboard" do |dashboard|
        dashboard.vm.box = "generic/rhel7"
        dashboard.vm.synced_folder ".", "/vagrant"
        dashboard.vm.network "private_network", ip: "192.168.56.11", name: "vboxnet0"
        dashboard.vm.hostname = "dashboard"

        dashboard.vm.provider "virtualbox" do |vb|
            vb.memory = "2048"
            vb.cpus = "2"
        end

        dashboard.vm.provision "shell", inline: <<-SHELL
            systemctl stop firewalld
            systemctl disable firewalld

            curl -SL https://artifacts.opensearch.org/releases/bundle/opensearch-dashboards/2.x/opensearch-dashboards-2.x.repo -o /etc/yum.repos.d/opensearch-dashboards-2.x.repo
            yum clean all
            yum install opensearch-dashboards -y

            # Add OpenSearch to hosts
            echo "192.168.56.10  indexer" >> /etc/hosts

            # Configure OpenSearch Dashboards to connect to OpenSearch
            sed -i 's/localhost:/indexer:/g' /etc/opensearch-dashboards/opensearch_dashboards.yml

            # Enable and start the service
            systemctl enable opensearch-dashboards
            systemctl start opensearch-dashboards
        SHELL
    end
end
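Before loading the dataset, it's worth confirming the node is up via the _cluster/health endpoint. A minimal sketch, assuming the indexer IP and demo credentials from the Vagrantfile above; parse_status and cluster_status are helpers introduced here for illustration:

```shell
# Extract the cluster status (green/yellow/red) from a _cluster/health response
parse_status() {
    grep -o '"status":"[a-z]*"' | cut -d'"' -f4
}

# Check the health of the single-node cluster (demo credentials)
cluster_status() {
    curl -s -k -u "admin:admin" "https://192.168.56.10:9200/_cluster/health" | parse_status
}
```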

Dataset

  1. Load the index-template
  2. Insert 200k CVEs into the indexer
./push_n_events.sh $((2*10**5))

Scenario 2

Environment setup

Same as in scenario 1, but duplicating the indexer node.

Vagrantfile

Vagrant.configure("2") do |config|

    config.vm.define "indexer_1" do |indexer_1|
        indexer_1.vm.box = "generic/rhel7"
        indexer_1.vm.synced_folder ".", "/vagrant"
        indexer_1.vm.network "private_network", ip: "192.168.56.9", name: "vboxnet0"
        indexer_1.vm.hostname = "indexer-1"

        indexer_1.vm.provider "virtualbox" do |vb|
            vb.memory = "8192"
            vb.cpus = "4"
        end

        indexer_1.vm.provision "shell", inline: <<-SHELL
            systemctl stop firewalld
            systemctl disable firewalld

            curl -SL https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/opensearch-2.x.repo -o /etc/yum.repos.d/opensearch-2.x.repo
            yum clean all
            yum install opensearch -y

            # Configure OpenSearch to listen on all network interfaces.
            sed -i 's/#network.host: 192.168.0.1/network.host: 0.0.0.0/g' /etc/opensearch/opensearch.yml
            sed -i 's/#cluster.name: my-application/cluster.name: opensearch-cluster/g' /etc/opensearch/opensearch.yml
            sed -i 's/#node.name: node-1/node.name: indexer-1/g' /etc/opensearch/opensearch.yml
            sed -i 's/#discovery.seed_hosts: ["host1", "host2"]/discovery.seed_hosts: ["indexer-1", "indexer-2"]/g' /etc/opensearch/opensearch.yml

            # Specify initial and maximum JVM heap sizes.
            sed -i 's/Xms1g/Xms4g/g' /etc/opensearch/jvm.options
            sed -i 's/Xmx1g/Xmx4g/g' /etc/opensearch/jvm.options

            # Define JDK path
            echo "OPENSEARCH_JAVA_HOME=/usr/share/opensearch/jdk" >> /etc/environment

            # Add indexer-2 to /etc/hosts
            echo "192.168.56.10 indexer-2" >> /etc/hosts

            # Enable and start the service
            systemctl enable opensearch
            systemctl start opensearch
            curl -X GET https://localhost:9200 -u 'admin:admin' --insecure || exit 1
        SHELL
    end

    config.vm.define "indexer_2" do |indexer_2|
        indexer_2.vm.box = "generic/rhel7"
        indexer_2.vm.synced_folder ".", "/vagrant"
        indexer_2.vm.network "private_network", ip: "192.168.56.10", name: "vboxnet0"
        indexer_2.vm.hostname = "indexer-2"

        indexer_2.vm.provider "virtualbox" do |vb|
            vb.memory = "8192"
            vb.cpus = "4"
        end

        indexer_2.vm.provision "shell", inline: <<-SHELL
            systemctl stop firewalld
            systemctl disable firewalld

            curl -SL https://artifacts.opensearch.org/releases/bundle/opensearch/2.x/opensearch-2.x.repo -o /etc/yum.repos.d/opensearch-2.x.repo
            yum clean all
            yum install opensearch -y

            # Configure OpenSearch to listen on all network interfaces.
            sed -i 's/#network.host: 192.168.0.1/network.host: 0.0.0.0/g' /etc/opensearch/opensearch.yml
            sed -i 's/#cluster.name: my-application/cluster.name: opensearch-cluster/g' /etc/opensearch/opensearch.yml
            sed -i 's/#node.name: node-1/node.name: indexer-2/g' /etc/opensearch/opensearch.yml
            sed -i 's/#discovery.seed_hosts: ["host1", "host2"]/discovery.seed_hosts: ["indexer-1", "indexer-2"]/g' /etc/opensearch/opensearch.yml

            # Specify initial and maximum JVM heap sizes.
            sed -i 's/Xms1g/Xms4g/g' /etc/opensearch/jvm.options
            sed -i 's/Xmx1g/Xmx4g/g' /etc/opensearch/jvm.options

            # Define JDK path
            echo "OPENSEARCH_JAVA_HOME=/usr/share/opensearch/jdk" >> /etc/environment

            # Add indexer-1 to /etc/hosts
            echo "192.168.56.9 indexer-1" >> /etc/hosts

            # Enable and start the service
            systemctl enable opensearch
            systemctl start opensearch
            curl -X GET https://localhost:9200 -u 'admin:admin' --insecure || exit 1
        SHELL
    end

    config.vm.define "dashboard" do |dashboard|
        dashboard.vm.box = "generic/rhel7"
        dashboard.vm.synced_folder ".", "/vagrant"
        dashboard.vm.network "private_network", ip: "192.168.56.11", name: "vboxnet0"
        dashboard.vm.hostname = "dashboard"

        dashboard.vm.provider "virtualbox" do |vb|
            vb.memory = "2048"
            vb.cpus = "2"
        end

        dashboard.vm.provision "shell", inline: <<-SHELL
            systemctl stop firewalld
            systemctl disable firewalld

            curl -SL https://artifacts.opensearch.org/releases/bundle/opensearch-dashboards/2.x/opensearch-dashboards-2.x.repo -o /etc/yum.repos.d/opensearch-dashboards-2.x.repo
            yum clean all
            yum install opensearch-dashboards -y

            # Add the indexer nodes to /etc/hosts
            echo "192.168.56.9   indexer-1" >> /etc/hosts
            echo "192.168.56.10  indexer-2" >> /etc/hosts

            # Configure OpenSearch Dashboards to connect to OpenSearch
            echo "opensearch.hosts: [\"https://indexer-1:9200\", \"https://indexer-2:9200\"]" >> /etc/opensearch-dashboards/opensearch_dashboards.yml

            # Enable and start the service
            systemctl enable opensearch-dashboards
            systemctl start opensearch-dashboards
        SHELL
    end
end

Dataset

  1. Load the index-template
  2. Insert 5M CVEs into the indexer
    ./push_n_events.sh $((5*10**6))
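The dataset can be generated by any script that emits `_bulk`-formatted NDJSON. A minimal sketch in Python, assuming a hypothetical index name and field layout (the real index template defines the actual mapping):

```python
# gen_events.py -- sketch of a synthetic CVE dataset generator for the _bulk API.
# The index name and field names below are assumptions for illustration only.
import json
import random

def bulk_lines(n_events, index="wazuh-states-vulnerabilities"):
    """Yield _bulk action/document line pairs for n_events synthetic CVEs."""
    for i in range(n_events):
        yield json.dumps({"index": {"_index": index}})
        yield json.dumps({
            "agent": {"id": f"{i % 2000:03d}"},
            "package": {"name": f"package-{i % 1000}"},
            "vulnerability": {
                "id": f"CVE-2023-{random.randrange(10000):04d}",
                "severity": random.choice(["Low", "Medium", "High", "Critical"]),
            },
        })

with open("events.ndjson", "w") as f:
    for line in bulk_lines(10):
        f.write(line + "\n")

# Load with, e.g.:
#   curl -sk -u admin:admin https://indexer-1:9200/_bulk \
#        -H 'Content-Type: application/x-ndjson' --data-binary @events.ndjson
```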

@AlexRuiz7 commented:

Performance Analyzer

  • Included by default in OpenSearch and wazuh-indexer.
  • Configured as described in the documentation.
Details

curl -X GET http://indexer:9600/_plugins/_performanceanalyzer/metrics/units
{
    "Disk_Utilization": "%",
    "Cache_Request_Hit": "count",
    "ClusterManager_PendingQueueSize": "count",
    "Refresh_Time": "ms",
    "ThreadPool_QueueLatency": "count",
    "Merge_Time": "ms",
    "ClusterApplierService_Latency": "ms",
    "PublishClusterState_Latency": "ms",
    "Cache_Request_Size": "B",
    "LeaderCheck_Failure": "count",
    "ThreadPool_QueueSize": "count",
    "Sched_Runtime": "s/ctxswitch",
    "Disk_ServiceRate": "MB/s",
    "Heap_AllocRate": "B/s",
    "Indexing_Pressure_Current_Limits": "B",
    "Sched_Waittime": "s/ctxswitch",
    "ShardBulkDocs": "count",
    "Thread_Blocked_Time": "s/event",
    "VersionMap_Memory": "B",
    "IO_TotThroughput": "B/s",
    "Indexing_Pressure_Current_Bytes": "B",
    "Indexing_Pressure_Last_Successful_Timestamp": "ms",
    "Net_PacketRate6": "packets/s",
    "Cache_Query_Hit": "count",
    "IO_ReadSyscallRate": "count/s",
    "Net_PacketRate4": "packets/s",
    "Cache_Request_Miss": "count",
    "ThreadPool_RejectedReqs": "count",
    "Net_TCP_TxQ": "segments/flow",
    "IO_WriteSyscallRate": "count/s",
    "IO_WriteThroughput": "B/s",
    "Refresh_Event": "count",
    "Flush_Time": "ms",
    "Heap_Init": "B",
    "Indexing_Pressure_Rejection_Count": "count",
    "CPU_Utilization": "cores",
    "Cache_Query_Size": "B",
    "Merge_Event": "count",
    "ClusterManager_Task_Queue_Time": "ms",
    "Cache_FieldData_Eviction": "count",
    "IO_TotalSyscallRate": "count/s",
    "Net_Throughput": "B/s",
    "Paging_RSS": "pages",
    "AdmissionControl_ThresholdValue": "count",
    "Indexing_Pressure_Average_Window_Throughput": "count/s",
    "Cache_MaxSize": "B",
    "IndexWriter_Memory": "B",
    "Net_TCP_SSThresh": "B/flow",
    "IO_ReadThroughput": "B/s",
    "LeaderCheck_Latency": "ms",
    "FollowerCheck_Failure": "count",
    "HTTP_RequestDocs": "count",
    "Net_TCP_Lost": "segments/flow",
    "GC_Collection_Event": "count",
    "Sched_CtxRate": "count/s",
    "AdmissionControl_RejectionCount": "count",
    "Heap_Max": "B",
    "ClusterManager_ThrottledPendingTasksCount": "count",
    "ClusterApplierService_Failure": "count",
    "PublishClusterState_Failure": "count",
    "Merge_CurrentEvent": "count",
    "Indexing_Buffer": "B",
    "Bitset_Memory": "B",
    "Net_PacketDropRate4": "packets/s",
    "Heap_Committed": "B",
    "Net_PacketDropRate6": "packets/s",
    "Thread_Blocked_Event": "count",
    "GC_Collection_Time": "ms",
    "Cache_Query_Miss": "count",
    "Latency": "ms",
    "Shard_State": "count",
    "Thread_Waited_Event": "count",
    "CB_ConfiguredSize": "B",
    "ThreadPool_QueueCapacity": "count",
    "CB_TrippedEvents": "count",
    "Disk_WaitTime": "ms",
    "Data_RetryingPendingTasksCount": "count",
    "ClusterManager_Task_Run_Time": "ms",
    "AdmissionControl_CurrentValue": "count",
    "Flush_Event": "count",
    "Net_TCP_RxQ": "segments/flow",
    "Shard_Size_In_Bytes": "B",
    "Thread_Waited_Time": "s/event",
    "HTTP_TotalRequests": "count",
    "ThreadPool_ActiveThreads": "count",
    "Paging_MinfltRate": "count/s",
    "Net_TCP_SendCWND": "B/flow",
    "Cache_Request_Eviction": "count",
    "Segments_Total": "count",
    "FollowerCheck_Latency": "ms",
    "Heap_Used": "B",
    "CB_EstimatedSize": "B",
    "Indexing_ThrottleTime": "ms",
    "Cache_FieldData_Size": "B",
    "Paging_MajfltRate": "count/s",
    "ThreadPool_TotalThreads": "count",
    "ShardEvents": "count",
    "Net_TCP_NumFlows": "count",
    "Election_Term": "count"
}
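The metrics endpoint accepts comma-separated metric names, one aggregation per metric, optional dimensions, and a node filter. A small helper to compose such queries (a sketch; host and port 9600 are the Performance Analyzer defaults):

```python
# pa_metrics_url.py -- build a Performance Analyzer metrics query URL.
from urllib.parse import urlencode

def pa_metrics_url(host, metrics, aggs, dims=(), nodes="all"):
    """Compose a /_plugins/_performanceanalyzer/metrics query string."""
    params = {"metrics": ",".join(metrics), "agg": ",".join(aggs), "nodes": nodes}
    if dims:
        params["dim"] = ",".join(dims)
    # keep commas unescaped, since the API expects comma-separated lists
    query = urlencode(params, safe=",")
    return f"http://{host}:9600/_plugins/_performanceanalyzer/metrics?{query}"

print(pa_metrics_url("indexer-1", ["Latency", "CPU_Utilization"],
                     ["avg", "max"], dims=["ShardID"]))
```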

AlexRuiz7 commented Oct 3, 2023

Root Cause Analysis (RCA)

  • Included by default in OpenSearch and wazuh-indexer.
  • Configured as described in the documentation.
Details

curl -X GET http://localhost:9600/_plugins/_performanceanalyzer/rca
{
    "ThreadMetricsRca": [
        {
            "rca_name": "ThreadMetricsRca",
            "timestamp": 1696330524683,
            "state": "healthy"
        }
    ],
    "QueueRejectionRca": [
        {
            "rca_name": "QueueRejectionRca",
            "timestamp": 1696330489607,
            "state": "healthy"
        }
    ],
    "ShardRequestCacheRca": [
        {
            "rca_name": "ShardRequestCacheRca",
            "timestamp": 1696330489585,
            "state": "healthy"
        }
    ],
    "HighCpuRca": [
        {
            "rca_name": "HighCpuRca",
            "timestamp": 1696330489603,
            "state": "healthy"
        }
    ],
    "HighHeapUsageOldGenRca": [
        {
            "rca_name": "HighHeapUsageOldGenRca",
            "timestamp": 1696330489606,
            "state": "healthy"
        }
    ],
    "FieldDataCacheRca": [
        {
            "rca_name": "FieldDataCacheRca",
            "timestamp": 1696330489609,
            "state": "healthy"
        }
    ],
    "HotShardRca": [
        {
            "rca_name": "HotShardRca",
            "timestamp": 1696330489612,
            "state": "healthy"
        }
    ],
    "AdmissionControlClusterRca": [
        {
            "rca_name": "AdmissionControlClusterRca",
            "timestamp": 1696330489627,
            "state": "healthy"
        }
    ],
    "QueueRejectionClusterRca": [
        {
            "rca_name": "QueueRejectionClusterRca",
            "timestamp": 1696330489641,
            "state": "healthy"
        }
    ],
    "ShardRequestCacheClusterRca": [
        {
            "rca_name": "ShardRequestCacheClusterRca",
            "timestamp": 1696330489641,
            "state": "healthy"
        }
    ],
    "HotNodeRca": [
        {
            "rca_name": "HotNodeRca",
            "timestamp": 1696330489642,
            "state": "healthy"
        }
    ],
    "FieldDataCacheClusterRca": [
        {
            "rca_name": "FieldDataCacheClusterRca",
            "timestamp": 1696330489643,
            "state": "healthy"
        }
    ],
    "HotShardClusterRca": [
        {
            "rca_name": "HotShardClusterRca",
            "timestamp": 1696330489645,
            "state": "healthy"
        }
    ],
    "HighHeapUsageClusterRca": [
        {
            "rca_name": "HighHeapUsageClusterRca",
            "timestamp": 1696330489650,
            "state": "healthy"
        }
    ],
    "CpuUtilDimensionTemperatureRca": [
        {
            "rca_name": "CpuUtilDimensionTemperatureRca",
            "timestamp": 1696330524700,
            "state": "unknown",
            "NodeLevelDimensionalSummary": [
                {
                    "dimension": "CPU_Utilization",
                    "mean": 0,
                    "total": 0.0107409692080986,
                    "numShards": 1,
                    "NodeLevelZoneSummary": [
                        {
                            "zone": "HOT",
                            "all_shards": []
                        },
                        {
                            "zone": "WARM",
                            "all_shards": []
                        },
                        {
                            "zone": "LUKE_WARM",
                            "all_shards": [
                                {
                                    "index_name": "security-auditlog-2023.10.03",
                                    "shard_id": 0,
                                    "temperature": [
                                        {
                                            "dimension": "CPU_Utilization",
                                            "value": "0"
                                        },
                                        {
                                            "dimension": "Heap_AllocRate",
                                            "value": "0"
                                        }
                                    ]
                                }
                            ]
                        },
                        {
                            "zone": "COLD",
                            "all_shards": []
                        }
                    ]
                }
            ]
        }
    ],
    "HeapAllocRateTemperatureRca": [
        {
            "rca_name": "HeapAllocRateTemperatureRca",
            "timestamp": 1696330524700,
            "state": "unknown",
            "NodeLevelDimensionalSummary": [
                {
                    "dimension": "Heap_AllocRate",
                    "mean": 0,
                    "total": 2764064.42816729,
                    "numShards": 1,
                    "NodeLevelZoneSummary": [
                        {
                            "zone": "HOT",
                            "all_shards": []
                        },
                        {
                            "zone": "WARM",
                            "all_shards": []
                        },
                        {
                            "zone": "LUKE_WARM",
                            "all_shards": [
                                {
                                    "index_name": "security-auditlog-2023.10.03",
                                    "shard_id": 0,
                                    "temperature": [
                                        {
                                            "dimension": "CPU_Utilization",
                                            "value": "0"
                                        },
                                        {
                                            "dimension": "Heap_AllocRate",
                                            "value": "0"
                                        }
                                    ]
                                }
                            ]
                        },
                        {
                            "zone": "COLD",
                            "all_shards": []
                        }
                    ]
                }
            ]
        }
    ]
}

Available RCAs are:

  • ThreadMetricsRca
  • QueueRejectionRca
  • ShardRequestCacheRca
  • HighCpuRca
  • HighHeapUsageOldGenRca
  • FieldDataCacheRca
  • HotShardRca
  • AdmissionControlClusterRca
  • QueueRejectionClusterRca
  • ShardRequestCacheClusterRca
  • HotNodeRca
  • FieldDataCacheClusterRca
  • HotShardClusterRca
  • HighHeapUsageClusterRca
  • CpuUtilDimensionTemperatureRca
  • HeapAllocRateTemperatureRca

Details about them can be checked here. Implementation details might be found inside the store folder.

Each of these RCAs reports a state, which can be healthy, unhealthy or unknown.
For those in an unhealthy state, the RCA decider framework provides a set of actions towards a possible solution.
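For triage during the stress tests, the RCA response shown above is easy to filter programmatically. A small helper (a sketch against the response shape above; not part of the plugin itself):

```python
# rca_triage.py -- list every RCA whose state is not "healthy" from a
# /_plugins/_performanceanalyzer/rca response (parsed JSON).
def non_healthy_rcas(rca_response):
    """Return {rca_name: state} for each RCA in a non-healthy state."""
    flagged = {}
    for entries in rca_response.values():
        for entry in entries:
            if entry.get("state") != "healthy":
                flagged[entry["rca_name"]] = entry["state"]
    return flagged

sample = {
    "HighCpuRca": [{"rca_name": "HighCpuRca", "timestamp": 0, "state": "healthy"}],
    "HotShardRca": [{"rca_name": "HotShardRca", "timestamp": 0, "state": "unhealthy"}],
}
print(non_healthy_rcas(sample))
```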

AlexRuiz7 commented Oct 3, 2023

PerfTop

  • Not included by default in OpenSearch and wazuh-indexer. Install instructions in the repo.
  • PerfTop includes 5 built-in dashboards. Creating custom dashboards is also possible.
Details

Usage

$ ./opensearch-perf-top-linux --help
usage: global.js [-h] --dashboard DASHBOARD [--endpoint ENDPOINT]
                 [--nodename NODENAME] [--logfile LOGFILE] [--mode MODE]
                 [--legacy LEGACY]
                 

For "Getting Started" guide and documentation, visit https://docs-beta.opensearch.org/

Optional arguments:
  -h, --help            Show this help message and exit.
  --dashboard DASHBOARD
                        Relative path to the dashboard configuration JSON. To 
                        load preset dashboard, this may also be: (1) 
                        ClusterOverview, (2) ClusterNetworkMemoryAnalysis, 
                        (3) ClusterThreadAnalysis, (4) NodeAnalysis, or (5) 
                        TemperatureAnalysis (e.g. "--dashboard 
                        ClusterOverview")
  --endpoint ENDPOINT   Endpoint URL for the Performance Analyzer queries. 
                        This can also be defined in the JSON. Protocol is 
                        "http" by default, unless "https" specified in the 
                        URL.
  --nodename NODENAME   Value to replace "#nodeName" in the JSON.
  --logfile LOGFILE     File to redirect STDERR to. If undefined, redirect to 
                        "/dev/null".
  --mode MODE           The mode perftop is on. This can be: (1) 
                        metrics(default), (2) rca
  --legacy LEGACY       Set legacy flag as true to run perfTop in the legacy 
                        mode.

[Screenshot 1: ClusterOverview dashboard]

[Screenshot 2: ClusterThreadAnalysis dashboard]

@AlexRuiz7 commented:

After some testing, none of these tools is directly useful for this use case: they analyze the performance of the cluster, not of individual queries. They can, however, serve as complementary tools during performance testing, to help pin down the cause of slow queries.

Instead, query performance will be analyzed by measuring execution times over repeated runs and computing the average.
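A sketch of that measurement loop, assuming a hypothetical endpoint and query body (the real dashboard queries should be substituted in). The `took` field of a search response reports server-side execution time in milliseconds:

```python
# measure_query.py -- average the server-side 'took' time over repeated runs.
import json
import statistics
import urllib.request

def average_took(took_values):
    """Mean of the 'took' times (ms) collected from repeated searches."""
    return statistics.mean(took_values)

def measure(url, body, runs=10):
    """Run the same search `runs` times and return the average 'took' (ms)."""
    took = []
    data = json.dumps(body).encode()
    for _ in range(runs):
        req = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            took.append(json.load(resp)["took"])
    return average_took(took)

# Example call (hypothetical index and query; request_cache=false avoids
# measuring cached results):
# measure("http://indexer-1:9200/wazuh-states-vulnerabilities/_search?request_cache=false",
#         {"query": {"match_all": {}}})
print(average_took([12, 15, 9]))
```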

AlexRuiz7 commented Oct 6, 2023

Query performance measurements

All the results have been imported into the attached spreadsheet.

The results have been summarized in charts. There are several clear conclusions from these results:

  • The inventory query takes considerably more time to fetch the data than any other query.
  • Cluster mode reduces fetch time for large datasets, but increases it for smaller ones, probably due to the coordination overhead of the distributed environment.
  • The same applies to the number of shards: two or more shards reduce fetch time for large datasets, while a single shard is faster for small ones.

[Chart: vuln_detector_queries_cluster_single_nocache_comparison]
[Chart: vuln_detector_queries_comparison]

Attachment: vulnerability-index-query-performance.ods

@wazuhci wazuhci moved this from In progress to Done in Release 4.8.0 Oct 11, 2023