[RFC] Scale Testing and Understanding Distributed Workload Generation #416

Closed
IanHoang opened this issue Nov 15, 2023 · 2 comments
Labels
RFC Request for comment on major changes

IanHoang commented Nov 15, 2023

Overview:

This RFC proposes a study of OpenSearch Benchmark's performance under different conditions to identify its strengths and limitations. Understanding the performance of a distributed system like OpenSearch is a complex task and requires a reliable macro-benchmarking tool. A study that validates OpenSearch Benchmark's performance will add value to the community by identifying any gaps in the tool.

Motivation:

For the past few months, community members and maintainers have been cleaning up the OpenSearch Benchmark (OSB) repository and the OpenSearch Benchmark Workloads repository. During this time, many foundational issues have been resolved. OSB 1.0.0 marked the first version of OSB after all high-priority issues and several medium-priority issues were resolved; since then, OSB 1.1.0 has been released. Before more features are added to OSB, a public study to quality-check OpenSearch Benchmark will greatly benefit the community and help identify pain points and areas of interest that should be added to the roadmap.

This RFC proposes a study plan comprising several areas of interest. The plan examines OSB's mechanics and how it performs under the hood as it runs tests against clusters of varying sizes. It also explores distributed workload generation (DWG), an OSB feature that allows users to generate load from multiple hosts onto the target cluster but is not yet well understood. These studies serve as a long-term investment for the OpenSearch community: they will reveal the system's strengths and weaknesses and lead to the development of better tooling. Findings from both studies will be published on the documentation website so that users can easily reproduce them.

Areas of Interest / Questions to Answer:

OSB has been used to benchmark and stress-test OpenSearch clusters to determine their performance. The purpose of this study is to do the opposite: stress-test OSB itself, observe its mechanics, and determine how it performs under the hood. The study will focus on clarifying the following points:

  • What Are the Limitations of a Single Load Generation Host, and How Does OSB Perform as the Cluster Scales Up?
    • Most users run a single load generation host, a machine with OSB installed and running, to direct load at their OpenSearch clusters. There is speculation that a single load generation host is not enough when targeting a large multi-node OpenSearch cluster, but exactly when that limit is reached remains unclear to users and community members. Tests should be run to determine this point. Although the answer depends on several variables, such as the size of the target cluster, the load size, the desired ingestion and query rates, and the host type, a series of well-documented tests that stress test a single load generation host would be useful for the community.
    • The resulting measurements can be ingested into a datastore and used to create visualizations. These values will come in handy when validating Distributed Workload Generation (DWG).
  • What Parameters are Useful to Tweak to Meet Business Requirements?
    • Users are interested in using OSB to verify that their clusters can meet business requirements. For example, some customers want their ingestion rate to meet a specific Throughput Per Second (TPS) or their query rate to meet a specific Queries Per Second (QPS). All of OpenSearch Benchmark's official workloads come with a set of workload parameters, such as bulk_indexing_clients and search_clients, that allow users to tweak characteristics of the benchmark process. We should first validate that OSB's parameters can be used to achieve a specific TPS and QPS. Next, we should find and document other parameters that can be tweaked to meet these requirements and identify any limitations.
    • It is also important to determine whether a bottleneck is on the OSB side or the cluster side and, if it is on the OSB side, how to mitigate it (for example, by adding clients or extra threads).
    • Parameters of interest (see the invocation sketch after this list):
      • bulk_size
      • bulk_indexing_clients
      • search_clients
      • target_throughput
  • Does Distributed Workload Generation (DWG) Allow More Than 3 LG Hosts?
    • Users have mostly used 3 LG hosts with DWG, but can they use more?
  • Does Distributed Workload Generation (DWG) really help? If so, by how much?
    • After gathering data points on how OSB performs across different cluster setups, it would be beneficial to run the same tests with extra load generation hosts and compare the results (see the DWG and comparison sketches below). These runs will let us determine whether DWG is beneficial and at what point it becomes more useful than a single load generation host. They will also give us the chance to understand DWG's mechanisms.
  • How Can We Improve Setting Up Distributed Workload Generation (DWG)?
    • By the time the testing finishes, we will be familiar with the DWG feature in OSB. We should then look into ways to improve the overall process of setting up and using DWG.
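
As a concrete starting point for the parameter experiments above, the following sketch shows how a single-host test could pass these parameters on the command line. It is illustrative only: the cluster endpoint and all parameter values are placeholders, and the workload-parameter keys must match those declared by the chosen workload.

```sh
# Minimal single-host invocation sweeping the parameters of interest.
# <cluster-endpoint> and all parameter values are placeholders per test run.
opensearch-benchmark execute-test \
  --workload=nyc_taxis \
  --pipeline=benchmark-only \
  --target-hosts=<cluster-endpoint>:9200 \
  --workload-params="bulk_size:5000,bulk_indexing_clients:8,search_clients:4,target_throughput:1000" \
  --telemetry=node-stats
```

Repeating such runs while varying one parameter at a time (clients, bulk size, target throughput) gives the data points needed to see where a single host saturates.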

By this point, the tests will have helped us identify areas of weakness and limitations in OSB. This phase will help us chart a path forward on how to further improve them.
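
For the DWG questions, a distributed run requires the OSB daemon on each load generation host and a coordinator that knows about all of them. The sketch below assumes the opensearch-benchmarkd daemon entry point and the --load-worker-coordinator-hosts option from OSB's distributed load generation documentation; every IP address is a placeholder.

```sh
# On each load generation host (repeat per host with that host's own IP):
opensearch-benchmarkd start --node-ip=10.0.0.11 --coordinator-ip=10.0.0.10

# On the coordinator host, fan the test out across the registered workers:
opensearch-benchmark execute-test \
  --workload=http_logs \
  --pipeline=benchmark-only \
  --target-hosts=<cluster-endpoint>:9200 \
  --load-worker-coordinator-hosts=10.0.0.11,10.0.0.12,10.0.0.13
```

Running the same workload with 1, 2, 3, and more worker hosts while holding everything else fixed is the simplest way to answer both the "more than 3 hosts" and the "does DWG really help" questions.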

Requirements for Tests:

  • Load size: the load needs to be large enough to stress test OpenSearch Benchmark.
  • Testing should be done for both time-series and search data.
  • Testing should be done with both bulk ingestion and continuous ingestion.
  • LG host core sizes: the average seems to be 2-4 cores (c5.xlarge?).
  • Load generation resources to measure (see the sampling sketch after this list): CPU utilization, JVM memory pressure (JVM MP), RAM, and disk utilization.
  • Tests should be run with the node-stats telemetry device enabled.
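
To capture the host-side metrics above at a fixed interval, a simple sampling loop like the one below can run on the load generation host during a test. This is a rough sketch (the 5-second interval and CSV format are arbitrary choices); tools such as pidstat or sar give richer host metrics, and node-stats covers the cluster side.

```sh
# Append a CSV row of (timestamp, %CPU, resident memory in KB) for each
# OSB process on this host, sampled every 5 seconds while a test runs.
while true; do
  ps -o %cpu=,rss= -p "$(pgrep -d, -f opensearch-benchmark)" \
    | awk -v ts="$(date +%s)" '{print ts "," $1 "," $2}' >> lg_host_usage.csv
  sleep 5
done
```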

Testing Setup

  • 4 CPU cores (vCPUs) for load generation hosts (for the single-load-generation-host tests)
  • x86 (start with this, then try ARM later)
  • Instance types: c5.xlarge (x86), c6g.xlarge (ARM)
  • Workloads to use: NYC Taxis (nyc_taxis) and HTTP Logs (http_logs)
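
Once baseline (single-host) and DWG runs have completed with the setup above, OSB's compare subcommand can put two test executions side by side. A sketch; the test execution IDs are placeholders taken from each run's output, and the flag names assume the Rally-style baseline/contender options OSB inherited.

```sh
# Compare a single-host baseline run against a DWG run by test execution ID.
opensearch-benchmark compare \
  --baseline=<single-host-test-id> \
  --contender=<dwg-test-id>
```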
@IanHoang IanHoang added the RFC Request for comment on major changes label Nov 15, 2023
@IanHoang IanHoang changed the title Scale Testing and Understanding Distributed Workload Generation [RFC] Scale Testing and Understanding Distributed Workload Generation Nov 20, 2023
@IanHoang
Collaborator Author

Related META issue #505

@IanHoang
Collaborator Author

Closing this RFC, as an updated one will be uploaded.
