
[RFC] Control plane for submitting benchmark runs. #4231

Open
rishabh6788 opened this issue Nov 21, 2023 · 20 comments

@rishabh6788
Collaborator

rishabh6788 commented Nov 21, 2023

Self-serviceable Performance Benchmark Platform

Purpose

The purpose of this issue is to brainstorm different approaches for building a self-serviceable platform for running ad-hoc benchmark tests. We will go over the current state of affairs and then propose alternatives for achieving that goal. This will let developers move away from manually setting up infrastructure to benchmark their local changes and make the whole experience of running benchmarks seamless and less cumbersome.

Tenets

  1. Open-source: The self-serviceable platform should be publicly accessible for opensearch-project community members to submit benchmark runs.
  2. Usability: The user interface should be intuitive and user-friendly to ensure a positive user experience.
  3. Secure: The proposed solution must have appropriate authentication and authorization mechanisms to control who can and cannot submit ad-hoc benchmark runs.
  4. Scalability: The system should be able to handle an increasing load by efficiently scaling resources.
  5. Modularity: The design should promote modular components for easier maintenance and future enhancements.

Background

In a rapidly changing analytics marketplace where different providers offer broadly similar ingestion and search solutions, what sets them apart is how well they perform against each other and across their own release cycles. While performance has always been at the core of the OpenSearch development cycle, the effort was never centralized and there was no unified platform to track OpenSearch performance across releases.

At the start of this year we undertook the objective of streamlining the performance benchmarking process and creating a centralized portal to view and analyze performance metrics across various versions of OpenSearch. This solved the long-standing problem of consistently running benchmarks on a daily basis against released and in-development versions of OpenSearch and publishing the metrics publicly for anyone to view. We added various enhancements and features to the development lifecycle to get consistent results and reduce variance. We can now run indexing-specific benchmarks to track regressions and improvements in the indexing path, and the same can be done for the search roadmap, where we use data snapshots to test search metrics in isolation without any variance introduced by index writers during the benchmark run.

Opportunity

While we have made tremendous progress in setting up the nightly benchmark runs, it is still not straightforward for developers to do performance runs on an ad-hoc basis. Developers still have to manage their own infrastructure to set up an OpenSearch cluster and then use opensearch-benchmark to run benchmarks against their local changes. Learning to run the opensearch-benchmark tool efficiently is a hurdle in itself.

Even though the nightly benchmark platform supports submitting ad-hoc performance runs against locally developed OpenSearch artifacts (x64 tarball), it is still not possible to open it up for developers to start using, mainly because:

  1. Even though the current benchmark-test job is hosted on the public Jenkins, external maintainers and collaborators cannot submit runs because access is restricted through Amazon's internal Midway platform. It would not be fair to the open-source community to open it only to Amazon-internal developers.
  2. The public Jenkins hosts many other workflows related to building, releasing, and testing OpenSearch and OpenSearch Dashboards. Opening it up for ad-hoc runs would add more pressure on that infrastructure, and simply scaling it might not help since we may hit scaling bottlenecks in the future as new workflows are added frequently.

Recently we have been getting a lot of feedback asking for a platform where developers can submit ad-hoc benchmark runs themselves rather than depending on the engineering-effectiveness team to submit runs on their behalf.

The nightly benchmark platform gives us the opportunity to build upon the existing work and come up with a self-serviceable platform that developers can use to run ad-hoc benchmarks without worrying about setting up infrastructure or learning how the opensearch-benchmark tool works. This way they can leverage all the work we have done to test specific code paths efficiently, i.e. the indexing path or the search path (using data restored from snapshots).

Proposed Solutions

Separate Jenkins Instance as core execution engine (Recommended)

The idea behind this proposal is to set up a dedicated Jenkins instance to orchestrate and execute performance runs. Below are the reasons why:

  • Easy to set up, as we already have the necessary code to stand up new Jenkins infrastructure from scratch.
  • The Jenkins UI is good enough to get things off the ground; users can start using it to submit ad-hoc benchmark jobs and see the results.
  • Easy integration with GitHub as an OAuth provider: just install https://plugins.jenkins.io/github-oauth/. Internally we can use the Jenkins role-based strategy to control who is authorized to submit a run even after a user successfully authenticates. This keeps bad actors from abusing the system.
  • We can basically copy the existing implementation that executes benchmark runs and publishes results; no new code or code changes are required to get started from day one.

[Diagram: jenkins-ui-v2 (drawio)]

Next steps after Jenkins has been set up and starts running benchmarks:

  • Use API Gateway + Lambda along with a Jenkins API library to build automation for submitting benchmark runs from anywhere, e.g. from the command line (see the sketch after this list).
  • This will help build automation for running micro-benchmarks, i.e. triggering a benchmark by merely adding a label or comment on a PR and getting the results published on the same PR.
  • Come up with a UI to search previous run results and compare them against other runs.
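To illustrate, here is a minimal sketch of what such command-line automation could look like. It assumes a dedicated Jenkins instance, a parameterized job named benchmark-test, and placeholder parameter names; none of these are final, they simply show how Jenkins's buildWithParameters endpoint could be driven from a script.

    # submit_benchmark.py -- illustrative sketch only, not the actual implementation
    import requests

    JENKINS_URL = "https://benchmark.example.org"  # hypothetical dedicated Jenkins instance
    JOB = "benchmark-test"                         # hypothetical job name

    def submit_run(user: str, api_token: str, tarball_url: str, workload: str) -> None:
        # Jenkins exposes parameterized builds via the buildWithParameters endpoint.
        resp = requests.post(
            f"{JENKINS_URL}/job/{JOB}/buildWithParameters",
            auth=(user, api_token),                 # Jenkins user + API token
            params={
                "ARTIFACT_URL": tarball_url,        # placeholder parameter names
                "TEST_WORKLOAD": workload,
            },
            timeout=30,
        )
        resp.raise_for_status()
        # Jenkins returns the queued item's URL in the Location header, useful for tracking.
        print("Queued:", resp.headers.get("Location"))

    if __name__ == "__main__":
        submit_run("alice", "<jenkins-api-token>", "https://example.com/opensearch-x64.tar.gz", "nyc_taxis")

A thin API Gateway + Lambda wrapper in front of this call would let us keep Jenkins credentials server-side while exposing a simple REST endpoint to users.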

Pros:

  • Easy to set up, as most of the orchestration work has already been done.
  • Out-of-the-box Jenkins UI to submit jobs and see results.
  • Out-of-the-box support for GitHub OAuth for authentication and authorization.
  • Alerting and monitoring for the Jenkins infrastructure (orchestration layer) is already available in the present implementation.
  • Easy to add new features, such as a new UI to view and compare results, without any rework required on the orchestrator layer.
  • Jenkins API support to build automation.

Cons:

  • The Jenkins infrastructure code will require slight modification to support GitHub OAuth and to restrict this instance to benchmark agents only.

Create Benchmark Orchestrator from Scratch

In this approach we build a new benchmark Orchestrator service from scratch. Below are the high-level components that will be required:

  • A brand new UI to submit jobs and see results. It can be integrated with API Gateway + Lambda to start the execution.
  • GitHub authentication and authorization integrated with the new UI.
  • Since cluster setup and load generation happen in the same workflow, we can use Step Functions with AWS Batch to execute it (a benchmark can run for hours, whereas Lambda has a 15-minute timeout).
  • To handle concurrent submissions we would use SQS to queue the jobs in an orderly manner, with a poller to process each request.
  • For each run a custom test-execution-id needs to be generated (this could be the API Gateway request-id) and mapped to the OSB test-execution-id so results can be referenced later. More can be discussed during the implementation phase. A rough sketch of the submission step follows this list.
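As a rough illustration of the queueing step, here is a boto3 sketch. The queue URL, message shape, and use of a random UUID for the custom test-execution-id are assumptions for discussion, not a finalized design.

    # enqueue_benchmark.py -- illustrative sketch of the SQS submission path
    import json
    import uuid
    import boto3

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/benchmark-submissions"  # placeholder

    def enqueue_submission(artifact_url: str, workload: str, cluster_config: dict) -> str:
        sqs = boto3.client("sqs")
        # Generate a custom test-execution-id; per the proposal this could also be the
        # API Gateway request-id. It is later mapped to the OSB test-execution-id.
        execution_id = str(uuid.uuid4())
        message = {
            "execution_id": execution_id,
            "artifact_url": artifact_url,
            "workload": workload,
            "cluster_config": cluster_config,
        }
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(message))
        return execution_id

    # A poller (for example the Step Functions / AWS Batch worker) would receive these
    # messages in order and kick off cluster setup and the benchmark run.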

[Diagram: custom-control-plane (drawio)]

Pros:

  • Well-established support on the infrastructure side of things, since we would be using AWS.

Cons:

  • Almost all of the component logic would have to be implemented from scratch.
  • Operational overhead of monitoring all the infrastructure components involved.
  • Authentication and authorization components would have to be written and implemented from scratch.
  • In order to allow only authorized users, role-based access would have to be implemented from scratch.
  • Will require advanced UI/UX skills.

Both of the proposed solutions will pave the way for micro-benchmarks: running benchmarks against a PR and updating the results merely by adding a comment or label to the PR.
We need to choose the solution with the least operational overhead.

@rishabh6788 rishabh6788 added enhancement New Enhancement untriaged Issues that have not yet been triaged labels Nov 21, 2023
@zelinh zelinh removed the untriaged Issues that have not yet been triaged label Nov 21, 2023
@zelinh
Member

zelinh commented Nov 21, 2023

[Triage] @rishabh6788 Will you be working on this issue?

@rishabh6788 rishabh6788 changed the title [META] Control plane for submitting benchmark runs. [RFC] Control plane for submitting benchmark runs. Nov 29, 2023
@rishabh6788
Collaborator Author

rishabh6788 commented Nov 29, 2023

@dblock @reta Please review the proposal and provide your comments.
To get background on how the nightly benchmarks are run, please refer to opensearch-project/opensearch-benchmark#102 (comment).
The idea of the self-serviceable platform is to provide a simple interface where the user can specify the cluster configuration they would like, any features to enable, and the sort of benchmark they would like to run. The workflow will handle cluster formation, the benchmark run, and publishing the results.

@rishabh6788 rishabh6788 added proposal Proposal and RFC to the community performance labels Nov 29, 2023
@rishabh6788 rishabh6788 self-assigned this Nov 30, 2023
@dblock
Member

dblock commented Nov 30, 2023

This is a well-thought-through proposal that basically boils down to whether we want to reuse Jenkins or build something new. The cons of building something new easily outweigh any of the advantages, so I agree with the recommendation.

A question about development of features in private. Can I easily reproduce this infrastructure for private setups? For example, if I use open source OpenSearch with an additional private plugin X, how can I reuse this setup in this workflow for testing changes in my version of the product that includes plugin X?

Unrelated to the question above, my typical performance improvement cycle that I see is something like this. Help me understand how I'll interact with what you're proposing?

  1. Baseline performance.
  2. Code change on my branch, including new performance tests.
  3. Ad hoc performance test and comparison with baseline.
  4. Committed code change to main.
  5. Committed new performance tests.
  6. Ensuring the performance improvement is visible in a build with all the tests.
  7. Continuous runs of the tests to avoid regressions.

@rishabh6788
Collaborator Author

rishabh6788 commented Dec 1, 2023

Thank you for the review and feedback @dblock.

Regarding Can I easily reproduce this infrastructure for private setups?

All the components used in the recommended approach, Jenkins (https://github.com/opensearch-project/opensearch-ci) and the benchmark workflow, are completely open source and can be replicated after slight modifications to the environment and secret variables used.

Regarding testing your local development changes with our platform, all you need to provide is a publicly available URL from where your artifact (x64 tarball for now) can be downloaded. At present many developers upload their local tarball to GitHub by creating a dummy release in their fork, attaching the tarball (5 GB limit), and then using that link with opensearch-cluster-cdk to set up a cluster with their changes. Hope this answers your question.

This will be the starting point for us; later we can come up with proper automation where a developer just uploads the artifact to a custom UI along with the other required parameters and is able to run the benchmark.

Here's the response to other points you mentioned:
1 & 3, The baseline will come from the nightly benchmarks that are running across released and in-development versions of OpenSearch, see https://opensearch.org/benchmarks. The idea is to use the same datastore that we are using for nightly benchmark runs so that it becomes easy to generate comparison reports with baseline data.

2 & 5, We are using https://github.com/opensearch-project/opensearch-benchmark-workloads that provides all the benchmark test specifications and data, so in case any new performance tests are getting added this repository they will be automatically picked up by the adhoc or nightly benchmark runs. With respect to changes to your branch, you just need to provide the artifact using a public url.

4, 6 & 7, As mentioned we already have nightly benchmarks running using the same setup for past 8 months across released version of OS, 2.x and main branch, so any new code that is being committed to mainline or 2.x is getting picked up by our nightly benchmark runs and reflected in the dashboards immediately. We use the same dashboard to create alerts and notifications for catching any regression or improvement.

For example, the recent PR opensearch-project/OpenSearch#11390 to mainline and 2.x showed significant improvements to aggregate queries and was picked up by our public dashboards. We can definitely improve our notification delivery to broadcast such events.

See https://tinyurl.com/ukavvvhh for 2.x branch improvement and https://tinyurl.com/2wxw2t89 for mainline.

Hope I was able to answer your queries.

@rishabh6788
Collaborator Author

Tagging @msfroh @peternied @jainankitk @rishabhmaurya to get more feedback.

@reta
Contributor

reta commented Dec 4, 2023

Thanks for the proposal @rishabh6788

+1 in favour of recommended approach

Regarding testing your local development changes with our platform, all you need to provide is a publicly available URL from where your artifact (x64 tarball for now) can be downloaded. At present many developers upload their local tarball to GitHub by creating a dummy release in their fork, attaching the tarball (5 GB limit), and then using that link with opensearch-cluster-cdk to set up a cluster with their changes. Hope this answers your question.

This is a somewhat surprising procedure, considering that those changes should be coming from:

  • pull requests (preferably)
  • feature branches (as an alternative)

I would have imagined that the self-service platform would not accept arbitrary bits but only trusted sources of changes like the above (this could also be automated with opensearch-bot as a future enhancement).

@rishabh6788
Collaborator Author

Thank you for the feedback @reta.
The self-serviceable benchmark solution is being built to provide the following, ordered by priority (high to low):

  1. A universal platform for all developers to submit benchmark runs. This will be really useful for users who do not have the necessary infrastructure to spin up production-grade multi-node clusters and benchmark their changes.
  2. A platform for comparing the performance of their changes with the baseline.
  3. The same platform will be extended in the future to enable micro-benchmarks: submitting benchmark runs on PRs and updating the results on the same PR.

With respect to your concern regarding trusted sources, not everyone will be able to submit benchmark runs. Even if a user is able to log in to our platform using GitHub OIDC, they will not be able to submit a run until the owners of the platform add them to a particular execution role. I believe this will help keep away bad actors who may try to abuse the system. Only valid members of the opensearch-project team will be eligible to be added to the execution role, on a per-request basis.

Hope this answers your query.

@reta
Contributor

reta commented Dec 5, 2023

The self-serviceable benchmark solution is being built to provide the following, ordered by priority (high to low):

Thanks @rishabh6788, I understand the solution but not the reasoning that leads to it. Trusted sources are not only about where the change comes from, but about what the change actually is (== pull request or feature branch): besides just benchmarks, there is an assurance that all other checks have passed as well.

Only valid members of the opensearch-project team will be eligible to be added to execution role, on request basis.

This basically means, as it stands today, only AWS employees (if I am not mistaken about how GitHub teams work, team members must be part of the organization).

@rishabh6788
Collaborator Author

rishabh6788 commented Dec 5, 2023

I think mentioning opensearch-project was not correct; what I wanted to convey is that any valid contributor to the OpenSearch repo, like yourself, will be able to use the self-serviceable platform to submit benchmark runs.

For example, if you are working on a PR and want to run a benchmark against it and compare how it is doing against the baseline, you should be able to do so using this platform, since you are a validated contributor to the repo and we should be able to add you to the execution role. The same goes for any valid contributor to the OpenSearch repository. @reta

In the future, you should just be able to initiate a run by adding a label or comment on your PR.

The goal of this project is to make performance benchmarking integral to most of the work happening in the OpenSearch repo and, at the same time, make it easier and more accessible for all community members who contribute to it. Hope this helps.

@reta
Contributor

reta commented Dec 5, 2023

For example, if you are working on a PR and want to run a benchmark against it and compare how it is doing against the baseline, you should be able to do so using this platform, since you are a validated contributor to the repo and we should be able to add you to the execution role.

This is exactly what I referred to here. It is super straightforward to build the distribution out of a pull request using tooling (this is a Gradle task), and I think this would shorten the onboarding time even more.

@bbarani bbarani moved this from Backlog to In Progress in OpenSearch Engineering Effectiveness Feb 12, 2024
@rishabh6788 rishabh6788 moved this to untriaged in Performance Roadmap Feb 12, 2024
@rishabh6788 rishabh6788 moved this from untriaged to Todo in Performance Roadmap Feb 12, 2024
@getsaurabh02 getsaurabh02 moved this from Todo to Now (This Quarter) in Performance Roadmap Feb 19, 2024
@getsaurabh02 getsaurabh02 moved this from Now (This Quarter) to In Progress in Performance Roadmap May 13, 2024
@rishabh6788
Collaborator Author

This is currently waiting on the Jenkins version upgrade and the splitting of the single Jenkins infrastructure into dedicated instances for the gradle-check, build/test/release, and benchmark use cases.
opensearch-project/opensearch-ci#389
opensearch-project/opensearch-ci#382

Working with @Divyaasm to help complete the above-mentioned prerequisites.

@rishabh6788
Collaborator Author

Given that we are awaiting the split of Jenkins into separate use cases and a security review for integrating GitHub OIDC with Jenkins, we are proposing a slight change in the authentication and authorization mechanism to keep things simple yet secure and to move faster.
The only change from the design proposed above is that instead of using GitHub OIDC for authentication and authorization we will use IAM auth, which is supported out of the box by API Gateway. Another change is that the user does not have to be logged in and mapped to a role inside Jenkins to submit the job; we will use the Generic Webhook Trigger to trigger the performance job.

The only requirement is that the user has a valid AWS account and an IAM user/role. There is no cost for using IAM as the service is free; see https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html. A rough sketch of an IAM-signed submission follows the diagrams below.

Below is the proposed change.
[Diagram: jenkins-cp (drawio, v5)]
[Diagram: control-flow]
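As an illustration of the IAM-auth path (not the final API), a submission could be signed with SigV4 along these lines; the endpoint URL and payload shape are placeholders.

    # sigv4_submit.py -- illustrative sketch of an IAM-authenticated submission to API Gateway
    import json
    import boto3
    import requests
    from botocore.auth import SigV4Auth
    from botocore.awsrequest import AWSRequest

    ENDPOINT = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/submit-benchmark"  # placeholder
    REGION = "us-east-1"

    def submit(payload: dict) -> requests.Response:
        credentials = boto3.Session().get_credentials()
        body = json.dumps(payload)
        # Sign the request with the caller's IAM credentials; API Gateway (IAM auth)
        # verifies the signature and the caller's execute-api permissions.
        aws_request = AWSRequest(method="POST", url=ENDPOINT, data=body,
                                 headers={"Content-Type": "application/json"})
        SigV4Auth(credentials, "execute-api", REGION).add_auth(aws_request)
        return requests.post(ENDPOINT, data=body, headers=dict(aws_request.headers))

    # Example (illustrative payload shape only):
    # submit({"artifact_url": "https://example.com/opensearch-x64.tar.gz", "workload": "nyc_taxis"})

Behind API Gateway, the Generic Webhook Trigger would then start the Jenkins performance job with these parameters.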

@dblock @reta
In a later phase of the project we can switch back to using GitHub OIDC for authentication and authorization if required.

@rishabh6788
Collaborator Author

Another option on the table is to use a Lambda authorizer with API Gateway and GitHub OAuth to authenticate the user, but I do not want just any GitHub user to be able to submit a job. So I have the option of either maintaining an internal database or file that I check before authorizing the user, or creating a team in our GitHub project and checking whether the authenticated user is part of that team.

Let me muck around a bit and get back.
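For reference, a rough sketch of what the team-membership variant could look like; the org and team names are hypothetical, and in practice the membership lookup would likely use a service token with read:org scope rather than the user's own token.

    # github_team_check.py -- illustrative sketch of GitHub-based authorization
    import requests

    ORG = "opensearch-project"     # assumption for illustration
    TEAM_SLUG = "benchmark-users"  # hypothetical team

    def is_authorized(github_token: str) -> bool:
        headers = {"Authorization": f"Bearer {github_token}",
                   "Accept": "application/vnd.github+json"}
        # Validate the token and resolve the login it belongs to.
        user_resp = requests.get("https://api.github.com/user", headers=headers, timeout=10)
        if user_resp.status_code != 200:
            return False
        login = user_resp.json()["login"]
        # Check whether that login is an active member of the team.
        membership = requests.get(
            f"https://api.github.com/orgs/{ORG}/teams/{TEAM_SLUG}/memberships/{login}",
            headers=headers, timeout=10)
        return membership.status_code == 200 and membership.json().get("state") == "active"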

@reta
Contributor

reta commented Jun 12, 2024

Thanks @rishabh6788

The only requirement is that the user has a valid AWS account and an IAM user/role.

So at the moment, none of the external contributors have AWS accounts to access the build infra (Jenkins, etc.). I suspect that with the suggested approach this stays the same, so it is not clear to me how those contributors would benefit from the self-serviceable platform?

@peternied
Member

So I have the option of either maintaining an internal database or file that I check before authorizing the user, or creating a team in our GitHub project and checking whether the authenticated user is part of that team.

@rishabh6788 In an ideal configuration, who would have access to submit these runs?

If that group is maintainers, we've got ways to extract lists that we consider authoritative via an API check that could be done as part of the authorization step; this would allow anyone who is a maintainer to have access.

            const maintainersResponse = await github.request('GET /repos/{owner}/{repo}/collaborators', {
                owner: context.repo.owner,
                repo: context.repo.repo,
                permission: 'maintain',
                affiliation: 'all',
                per_page: 100
            });

From maintainer-approval.yml

If that group is broader, I think GitHub's organization is the best-aligned mechanism, with a team/special permission moderated for this kind of access. Decoupling the OpenSearch-Project org from AWS seems like a prerequisite. @getsaurabh02 do you know if we have any updates? It seems we are hitting another access-management-related issue.

@rishabh6788
Collaborator Author

rishabh6788 commented Jun 13, 2024

Thank you @reta and @peternied for your feedback.
I worked on a solution where I use a Lambda authorizer that validates the user-provided GitHub token against GitHub auth and then allows them to submit benchmark runs.
But we don't want anyone with a GitHub account to be able to submit runs; only legitimate contributors to OpenSearch should be able to.
I thought of using teams for the final authorization on who can submit, but external maintainers and contributors cannot be part of teams.

I am considering using the contributors list of the OpenSearch repo for authorization.
I appreciate any feedback to help provide clarity and move forward.

Why use contributors instead of maintainers:

  1. It provides access to members who actively work on OpenSearch but are not maintainers, and
  2. We plan to use opensearch-ci-bot to submit runs from the PR itself using GitHub Actions; of course, this will go through a thorough design and security review before anything is finalized.

@rishabh6788
Collaborator Author

Okay, the list contributors API fetches users who were part of OpenSearch before it was forked from Elasticsearch, so it cannot be trusted as the authoritative list for granting authorization.
The maintainers list is too small. Do you think having a plain old simple list in a file is a good idea? Anyone needing access can just raise a PR to get their login added, and we can approve or reject it.
Open to other suggestions as well.
@dblock

@dblock
Member

dblock commented Jun 13, 2024

How do you feel about a flow similar to releases?

  1. Anon submits a PR with the changes and a config that they want to run benchmarks on.
  2. Bot comments, labels, etc.
  3. For any new contributor, someone from CODEOWNERS must approve by responding "approved". For any contributor who ran a benchmark in the past, the job is submitted automatically after some bake time.
  4. Bot kicks off job, comments on PR.
  5. Original author, or anyone from CODEOWNERS can cancel job by clicking/commenting "cancel".
  6. Results of the job posted to the PR.
  7. Bot checks back for updates, repeat (3).

@rishabh6788
Collaborator Author

rishabh6788 commented Jun 13, 2024

How do you feel about a flow similar to releases?

  1. Anon submits a PR with the changes and a config that they want to run benchmarks on.
  2. Bot comments, labels, etc.
  3. For any new contributor, someone from CODEOWNERS must approve by responding "approved". For any contributor who ran a benchmark in the past, the job is submitted automatically after some bake time.
  4. Bot kicks off job, comments on PR.
  5. Original author, or anyone from CODEOWNERS can cancel job by clicking/commenting "cancel".
  6. Results of the job posted to the PR.
  7. Bot checks back for updates, repeat (3).

Thanks for the feedback, Db. Yes, this is the expected flow once we have a REST API ready for submitting benchmark runs.
To get the REST API working first, I will go ahead and use GitHub authentication plus a file with a list of authorized users so they can start using our benchmark platform to submit runs.

Once that is out and working as expected, I will start the proposal on how to submit benchmark runs just by commenting on or adding a label to PRs.

@rishabh6788
Collaborator Author

Here is the updated request flow for the proposed changes. For now we will rely on a file containing the GitHub user IDs of authorized personnel, checked after the GitHub user token has been validated. This authorization is expected to be replaced by GitHub teams once we have a final decision on when external maintainers can be added to teams. A sketch of this check follows the diagrams below.
[Diagram: jenkins-cp-github-auth (drawio)]

[Diagram: cp-github-flow]
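As a rough sketch only, the file-based check inside an API Gateway Lambda authorizer could look like the following. The file name, the REST-API token-authorizer event shape, and the bundling of the requests dependency are assumptions, not the final implementation.

    # authorizer.py -- illustrative sketch of the file-backed GitHub-token authorizer
    import requests  # would need to be packaged with the Lambda deployment

    def _allowed_users() -> set:
        # Flat file of authorized GitHub logins, one per line, shipped with the function.
        with open("authorized_users.txt") as f:
            return {line.strip() for line in f if line.strip()}

    def handler(event, context):
        token = event.get("authorizationToken", "")  # REST API "TOKEN" authorizer input
        resp = requests.get("https://api.github.com/user",
                            headers={"Authorization": f"Bearer {token}"}, timeout=10)
        login = resp.json().get("login") if resp.status_code == 200 else None
        effect = "Allow" if login in _allowed_users() else "Deny"
        # Standard Lambda authorizer response: an IAM policy for the requested method.
        return {
            "principalId": login or "anonymous",
            "policyDocument": {
                "Version": "2012-10-17",
                "Statement": [{
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": event["methodArn"],
                }],
            },
        }

Swapping the flat file for a GitHub-teams lookup later would only change the allow-list check, not the policy shape.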
