
DRIVERS-2862: Benchmark Collection and Client BulkWrite #1733

Open · wants to merge 1 commit into `master`
Conversation

@adelinowona (Contributor) commented on Nov 19, 2024:

This PR adds benchmarks for Collection::BulkWrite and Client::BulkWrite with insert-only operations and with mixed operations. The benchmarking spec already has Small doc Bulk Insert and Large doc Bulk Insert benchmarks, which may be sufficient for benchmarking Collection::BulkWrite with insert-only operations. However, those benchmarks are implemented using insertMany, and not all drivers use Collection::BulkWrite in their implementation of insertMany. With that in mind, I have added explicit Collection::BulkWrite benchmarks to be implemented by all drivers that implement Collection::BulkWrite. This ensures we maintain comprehensive coverage of batch-write performance.
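To make the distinction concrete, here is a rough, driver-neutral sketch in Python. The dict-based write-model shape, the field names, and the `perftest.corpus` namespace are illustrative assumptions, not any driver's actual API: Collection::BulkWrite models are implicitly scoped to one collection, while Client::BulkWrite models each name their target namespace, which is what allows mixing operations across collections in a single command.

```python
# Hypothetical, driver-neutral sketch of the two bulk-write shapes.
# Real drivers use typed write models (e.g. an InsertOne class) rather than dicts.

SMALL_DOC = {"field1": 1, "field2": "x"}  # stand-in for one SMALL_DOC document
NUM_COPIES = 10_000

# Collection::BulkWrite: models are implicitly scoped to one collection.
collection_models = [
    {"insertOne": {"document": dict(SMALL_DOC)}}  # no manual _id; driver/server adds it
    for _ in range(NUM_COPIES)
]

# Client::BulkWrite: each model names its target namespace explicitly,
# so a single call can mix operations across databases and collections.
client_models = [
    {"namespace": "perftest.corpus", "insertOne": {"document": dict(SMALL_DOC)}}
    for _ in range(NUM_COPIES)
]
```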

Here are the results of the new benchmarks as implemented in the C# driver:

| Name | MB/sec |
| --- | --- |
| SmallDocClientBulkWriteMixedOps | 1.8424719941 |
| SmallDocCollectionBulkWriteMixedOps | 0.7477088677 |
| LargeDocClientBulkWriteInsert | 94.242845107 |
| LargeDocCollectionBulkWriteInsert | 96.493395532 |
| LargeDocBulkInsert | 96.5527236825 |
| SmallDocClientBulkWriteInsert | 36.6208512962 |
| SmallDocCollectionBulkWriteInsert | 41.0499232896 |
| SmallDocBulkInsert | 42.1881191706 |
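For context on how figures like these are typically produced, here is a sketch of the scoring rule as I understand it from the driver benchmarking spec: a task's score is its dataset size in MB divided by the median wall-clock time of the measured iterations. The function name and the sample numbers below are illustrative, not taken from this PR's runs.

```python
import statistics

def mb_per_sec(task_bytes: int, iteration_secs: list[float]) -> float:
    """Score a benchmark task as dataset size in MB divided by the
    median wall-clock time (seconds) of the measured iterations.
    This mirrors my understanding of the benchmarking spec's scoring;
    treat it as a sketch, not the normative definition."""
    median = statistics.median(iteration_secs)
    return (task_bytes / 1_000_000) / median

# Illustrative: a ~5.5 MB task with a 0.131 s median scores roughly 42 MB/sec.
score = mb_per_sec(5_500_000, [0.132, 0.130, 0.131])
```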


Please complete the following before merging:

  • Update changelog.
  • Test changes in at least one language driver.
  • Test these changes against all server versions and topologies (including standalone, replica set, sharded
    clusters, and serverless).

@adelinowona adelinowona requested a review from a team as a code owner November 19, 2024 16:49
@adelinowona adelinowona requested review from JamesKovacs, BorisDog, dariakp and ShaneHarvey and removed request for a team November 19, 2024 16:49
@BorisDog (Contributor) left a comment:


partial review


| Phase | Description |
| ----- | ----------- |
| Setup | Construct a MongoClient object. Drop the `perftest` database. Load the SMALL_DOC dataset into memory as a language-appropriate document type (or JSON string for C). Make 10,000 copies of the document. DO NOT manually add an `_id` field; leave it to the driver or database. Construct a list of write models with insert, replace and delete operations for each copy of the document. |

There is some inconsistency in specifying the action "Make X copies, DO NOT add `_id`": it's part of the Setup step for bulk insert but part of the Do task in other places. Should this be part of the same step everywhere?

@adelinowona (author) replied:

Yes, thanks! I will make this explicit in the Setup step.


| Phase | Description |
| ----- | ----------- |
| Setup | Construct a MongoClient object. Drop the `perftest` database. Load the SMALL_DOC dataset into memory as a language-appropriate document type (or JSON string for C). Make 10,000 copies of the document. DO NOT manually add an `_id` field; leave it to the driver or database. Construct a list of write models with insert, replace and delete operations for each copy of the document. |

I think we need more details here: ordering and explicit count for each operation kind.

@adelinowona (author) replied:

Yep, will do! In the C# benchmarks, my proposed sequence performs an insert, then a replace, then a delete for each document copy. Both the replace and delete operations use empty filters.
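The ordering described above could be sketched like this (hypothetical, driver-neutral Python; dicts stand in for the drivers' typed write models, and the field names are illustrative):

```python
SMALL_DOC = {"field1": 1}  # stand-in for one SMALL_DOC copy
NUM_COPIES = 10_000

models = []
for _ in range(NUM_COPIES):
    doc = dict(SMALL_DOC)  # fresh copy, no manual _id
    models.append({"insertOne": {"document": doc}})
    # Replace and delete both use an empty filter, matching the first document.
    models.append({"replaceOne": {"filter": {}, "replacement": dict(SMALL_DOC)}})
    models.append({"deleteOne": {"filter": {}}})
# Result: 3 operations per copy, in insert -> replace -> delete order.
```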

@ShaneHarvey (Member) left a comment:

I have two questions:

  1. What is the benefit of adding the mixed-op benchmarks? I don't see much value in adding them.
  2. Do we need to add the "collection bulkWrite insert" benchmarks? It seems like they add little value since we already have the existing insertMany ones.

I'm wary of introducing extra benchmarks because each one has a cost.

@adelinowona (author) replied:

@ShaneHarvey, we're adding these benchmarks as part of the Multi-Doc suite to evaluate batch-write efficiency. The Multi-Doc benchmarks should reflect real-world use cases, including mixed operation benchmarks. Without these, we risk leaving a gap in our performance analysis. Including them now gives us a baseline to track future improvements or regressions in the implementations that support mixed operations in bulkWrite.

Regarding the 'collection bulkWrite insert' benchmarks, I understand that drivers using Collection::BulkWrite for insertMany, or sharing code between these implementations, might question the need for these benchmarks. However, for thorough performance coverage, it's important to benchmark both APIs for batch inserts. Even if their implementations are similar now, they could diverge with future driver updates. Having these benchmarks allows us to identify any current performance differences and provides a baseline for comparison. Additionally, they offer a direct contrast to the 'client bulkWrite insert' benchmarks, facilitating easier performance comparisons between these APIs.

@ShaneHarvey (Member) replied:

Fair points. I'm still skeptical of the mixed benchmarks. To me it seems like all we're measuring is the cost of extra round trips to the server, but I suppose it could catch a regression at some point.
