[Segment Replication + Remote Store] GA test planning #8109
Comments
Added an issue to opensearch-benchmark for adding a replication lag metric: opensearch-project/opensearch-benchmark#339. This may require moving the lag metric to node stats, similar to other captured metrics.
These metrics can be published to a different cluster by OSB, but I don't think they are part of the test result.
Splitting this out into individual runs: Here's what I'm thinking in terms of set up -
For the small cluster configuration, let's test with shard counts in multiples of 3 (currently testing with 3, 6, 12, 24). A similar configuration is being used for remote store perf testing as well.
Where can we track that testing? I expect there is overlap here, given node-node replication is not used with remote store?
For the performance testing, I created a dedicated issue, #8874, to track it.
This issue contains a running list of the testing required in preparation for the release of remote storage with segment replication.
Objective
General Hypothesis
Performance Benchmark Plan
Each use case listed below should also be run with docrep and node-node replication and, where possible, added to our nightly runs on https://opensearch.org/benchmarks for easy ongoing comparison.
Metrics that should be captured in addition to what OSB reports:
All clusters should have 3 dedicated cluster manager nodes.
Small cluster = ~3 nodes
Large cluster = ~40 nodes
Use m5.xlarge node type for consistency.
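As a rough illustration of how the runs above could be enumerated, here is a minimal Python sketch of the benchmark matrix. The mode labels, the `BenchmarkRun` type, and `build_matrix` are hypothetical names chosen for this example, not part of OSB or OpenSearch. It also assumes the "~3" and "~40" node counts are exact and refer to nodes other than the 3 dedicated cluster managers, which the plan does not state explicitly.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical labels for the three setups being compared in this plan.
REPLICATION_MODES = ("docrep", "segrep-node-node", "segrep-remote-store")

# Approximate sizes from the plan; treated as exact for illustration only.
CLUSTER_SIZES = {"small": 3, "large": 40}

DEDICATED_CLUSTER_MANAGERS = 3  # "All clusters should have 3 dedicated cluster manager nodes."
INSTANCE_TYPE = "m5.xlarge"     # "Use m5.xlarge node type for consistency."


@dataclass(frozen=True)
class BenchmarkRun:
    replication_mode: str
    cluster_label: str
    nodes: int                  # assumption: excludes cluster manager nodes
    cluster_manager_nodes: int
    instance_type: str


def build_matrix():
    """Return one BenchmarkRun per (replication mode, cluster size) pair."""
    return [
        BenchmarkRun(mode, label, nodes, DEDICATED_CLUSTER_MANAGERS, INSTANCE_TYPE)
        for mode, (label, nodes) in product(REPLICATION_MODES, CLUSTER_SIZES.items())
    ]


if __name__ == "__main__":
    for run in build_matrix():
        print(run)
```

Each workload would then be executed once per entry, so that the remote store results have a docrep and a node-node baseline on identical hardware.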
Additional tests (nice to have):
Longevity testing - a longer-running test with a large cluster.
Architecture comparison (x86 vs. Arm)
Feature Stability and Testing
This is a list of specific test cases that should be performed either through existing or new ITs, or manually.
We are working on enabling S+R across our entire server suite as part of this issue, but are hitting challenges. In the meantime, this is a running list of the most critical ones.
Existing test packages to run with SR enabled in :server:
cluster
gateway
index
indexing
indices
ingest
recovery
remotestore
update