
DRIVERS-2862: Benchmark Collection and Client BulkWrite #1733

Open · wants to merge 1 commit into `master`
Conversation

@adelinowona (Contributor) commented on Nov 19, 2024:

This PR adds benchmarks for Collection::BulkWrite and Client::BulkWrite with insert-only operations and with mixed operations. The benchmarking spec already has Small doc Bulk Insert and Large doc Bulk Insert benchmarks, which may be sufficient for benchmarking Collection::BulkWrite with insert-only operations. However, those benchmarks are implemented using insertMany, and not all drivers use Collection::BulkWrite in their implementation of insertMany. With that in mind, I have added explicit Collection::BulkWrite benchmarks to be implemented by all drivers that implement Collection::BulkWrite. This ensures we maintain comprehensive coverage of batch-write performance.
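To make the distinction concrete, here is a rough, driver-neutral sketch in Python. The dict-based write-model shape, the field names, and the `perftest.corpus` namespace are illustrative assumptions, not any driver's actual API: Collection::BulkWrite models are implicitly scoped to one collection, while Client::BulkWrite models each name their target namespace, which is what allows mixing operations across collections in a single command.

```python
# Hypothetical, driver-neutral sketch of the two bulk-write shapes.
# Real drivers use typed write models (e.g. an InsertOne class) rather than dicts.

SMALL_DOC = {"field1": 1, "field2": "x"}  # stand-in for one SMALL_DOC document
NUM_COPIES = 10_000

# Collection::BulkWrite: models are implicitly scoped to one collection.
collection_models = [
    {"insertOne": {"document": dict(SMALL_DOC)}}  # no manual _id; driver/server adds it
    for _ in range(NUM_COPIES)
]

# Client::BulkWrite: each model names its target namespace explicitly,
# so a single call can mix operations across databases and collections.
client_models = [
    {"namespace": "perftest.corpus", "insertOne": {"document": dict(SMALL_DOC)}}
    for _ in range(NUM_COPIES)
]
```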

Here are the results of the new benchmarks as implemented in the C# driver:

| Name | MB/sec |
| --- | --- |
| SmallDocClientBulkWriteMixedOps | 1.8424719941 |
| SmallDocCollectionBulkWriteMixedOps | 0.7477088677 |
| LargeDocClientBulkWriteInsert | 94.242845107 |
| LargeDocCollectionBulkWriteInsert | 96.493395532 |
| LargeDocBulkInsert | 96.5527236825 |
| SmallDocClientBulkWriteInsert | 36.6208512962 |
| SmallDocCollectionBulkWriteInsert | 41.0499232896 |
| SmallDocBulkInsert | 42.1881191706 |
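For context on how figures like these are typically produced, here is a sketch of the scoring rule as I understand it from the driver benchmarking spec: a task's score is its dataset size in MB divided by the median wall-clock time of the measured iterations. The function name and the sample numbers below are illustrative, not taken from this PR's runs.

```python
import statistics

def mb_per_sec(task_bytes: int, iteration_secs: list[float]) -> float:
    """Score a benchmark task as dataset size in MB divided by the
    median wall-clock time (seconds) of the measured iterations.
    This mirrors my understanding of the benchmarking spec's scoring;
    treat it as a sketch, not the normative definition."""
    median = statistics.median(iteration_secs)
    return (task_bytes / 1_000_000) / median

# Illustrative: a ~5.5 MB task with a 0.131 s median scores roughly 42 MB/sec.
score = mb_per_sec(5_500_000, [0.132, 0.130, 0.131])
```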


Please complete the following before merging:

  • Update changelog.
  • Test changes in at least one language driver.
  • Test these changes against all server versions and topologies (including standalone, replica set, sharded
    clusters, and serverless).

@adelinowona adelinowona requested a review from a team as a code owner November 19, 2024 16:49
@adelinowona adelinowona requested review from JamesKovacs, BorisDog, dariakp and ShaneHarvey and removed request for a team November 19, 2024 16:49
@BorisDog (Contributor) left a comment:


partial review


| Phase | Description |
| ----- | ----------- |
| Setup | Construct a MongoClient object. Drop the `perftest` database. Load the SMALL_DOC dataset into memory as a language-appropriate document type (or JSON string for C). Make 10,000 copies of the document. DO NOT manually add an `_id` field; leave it to the driver or database. Construct a list of write models with insert, replace and delete operations for each copy of the document. |

There is some inconsistency in specifying the action "Make X copies, DO NOT add `_id`": it's part of the Setup step for bulk insert but part of the Do task in other places. Should this be part of the same step everywhere?

@adelinowona (author) replied:

Yes, thanks! I will make this explicit in the Setup step.


| Phase | Description |
| ----- | ----------- |
| Setup | Construct a MongoClient object. Drop the `perftest` database. Load the SMALL_DOC dataset into memory as a language-appropriate document type (or JSON string for C). Make 10,000 copies of the document. DO NOT manually add an `_id` field; leave it to the driver or database. Construct a list of write models with insert, replace and delete operations for each copy of the document. |

I think we need more details here: ordering and explicit count for each operation kind.

@adelinowona (author) replied:

Yep, will do! In the C# benchmarks, my proposed sequence performs an insert, then a replace, then a delete for each document copy. Both the replace and delete operations use empty filters.
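The ordering described above could be sketched like this (hypothetical, driver-neutral Python; dicts stand in for the drivers' typed write models, and the field names are illustrative):

```python
SMALL_DOC = {"field1": 1}  # stand-in for one SMALL_DOC copy
NUM_COPIES = 10_000

models = []
for _ in range(NUM_COPIES):
    doc = dict(SMALL_DOC)  # fresh copy, no manual _id
    models.append({"insertOne": {"document": doc}})
    # Replace and delete both use an empty filter, matching the first document.
    models.append({"replaceOne": {"filter": {}, "replacement": dict(SMALL_DOC)}})
    models.append({"deleteOne": {"filter": {}}})
# Result: 3 operations per copy, in insert -> replace -> delete order.
```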

@ShaneHarvey (Member) left a comment:

I have two questions:

  1. What is the benefit of adding the mixed-op benchmarks? I don't see much value in adding them.
  2. Do we need to add the "collection bulkWrite insert" benchmarks? It seems like they add little value since we already have the existing insertMany ones.

I'm wary of introducing extra benchmarks because each one has a cost.

@adelinowona (author) replied:

@ShaneHarvey, we're adding these benchmarks as part of the Multi-Doc suite to evaluate batch-write efficiency. The Multi-Doc benchmarks should reflect real-world use cases, including mixed operation benchmarks. Without these, we risk leaving a gap in our performance analysis. Including them now gives us a baseline to track future improvements or regressions in the implementations that support mixed operations in bulkWrite.

Regarding the 'collection bulkWrite insert' benchmarks, I understand that drivers using Collection::BulkWrite for insertMany, or sharing code between these implementations, might question the need for these benchmarks. However, for thorough performance coverage, it's important to benchmark both APIs for batch inserts. Even if their implementations are similar now, they could diverge with future driver updates. Having these benchmarks allows us to identify any current performance differences and provides a baseline for comparison. Additionally, they offer a direct contrast to the 'client bulkWrite insert' benchmarks, facilitating easier performance comparisons between these APIs.

@ShaneHarvey (Member) replied:

Fair points. I'm still skeptical of the mixed benchmarks. To me it seems like all we're measuring is the cost of extra round trips to the server, but I suppose it could catch a regression at some point.
