DRIVERS-2862: Benchmark Collection and Client BulkWrite #1733
partial review
| Phase | Description |
| ----- | ----------- |
| Setup | Construct a MongoClient object. Drop the `perftest` database. Load the SMALL_DOC dataset into memory as a language-appropriate document type (or JSON string for C). Make 10,000 copies of the document. DO NOT manually add an `_id` field; leave it to the driver or database. Construct a list of write models with insert, replace and delete operations for each copy of the document. |
There is some inconsistency in how the "Make X copies, DO NOT add `_id`" action is specified: it is part of the Setup step for bulk insert but part of the Do task in other places. Should this be part of the same step everywhere?
Yes, thanks! I will make this explicit in the Setup step.
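As a rough illustration of the Setup step quoted above, here is a minimal sketch using Python and PyMongo; the spec itself is language-agnostic, and the `small_doc.json` path is a placeholder for wherever a driver keeps the SMALL_DOC dataset. The write-model construction is sketched after the ordering discussion below.

```python
import copy
import json

from pymongo import MongoClient

NUM_COPIES = 10_000

# Construct a MongoClient object and drop the perftest database.
client = MongoClient()
client.drop_database("perftest")

# Load the SMALL_DOC dataset into memory as a language-appropriate
# document type ("small_doc.json" is a hypothetical path).
with open("small_doc.json") as f:
    small_doc = json.load(f)

# Make 10,000 copies; DO NOT manually add an _id field.
copies = [copy.deepcopy(small_doc) for _ in range(NUM_COPIES)]
```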
| Phase | Description |
| ----- | ----------- |
| Setup | Construct a MongoClient object. Drop the `perftest` database. Load the SMALL_DOC dataset into memory as a language-appropriate document type (or JSON string for C). Make 10,000 copies of the document. DO NOT manually add an `_id` field; leave it to the driver or database. Construct a list of write models with insert, replace and delete operations for each copy of the document. |
I think we need more details here: ordering and explicit count for each operation kind.
Yep, will do! In the C# benchmarks, my proposed sequence performs an insert, followed by a replace, and then a delete for each document copy. Both the replace and delete operations use empty filters.
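A minimal sketch of that proposed ordering, again using PyMongo write models purely for illustration (the spec does not mandate a particular driver API, and the stand-in documents below take the place of the real SMALL_DOC copies):

```python
from pymongo import InsertOne, ReplaceOne, DeleteOne

# Stand-in for the 10,000 SMALL_DOC copies from the Setup sketch above.
copies = [{"v": 1} for _ in range(10_000)]

# One insert, one replace, and one delete per document copy, in that
# order (30,000 models in total for 10,000 copies); the replace and
# delete operations use empty filters.
models = []
for doc in copies:
    models.append(InsertOne(doc))
    models.append(ReplaceOne({}, doc))  # empty filter
    models.append(DeleteOne({}))        # empty filter
```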
I have two questions:
- What is the benefit of adding the mixed op benchmarks? I don't see much value in adding them.
- Do we need to add the "collection bulkWrite insert" benchmarks? It seems like they add little value since we already have the existing insertMany ones.

I'm wary of introducing extra benchmarks because each one has a cost.
@ShaneHarvey, we're adding these benchmarks as part of the Multi-Doc suite to evaluate batch-write efficiency. The Multi-Doc benchmarks should reflect real-world use cases, including mixed operation benchmarks. Without these, we risk leaving a gap in our performance analysis. Including them now gives us a baseline to track future improvements or regressions in the implementations that support mixed operations. Regarding the "collection bulkWrite insert" benchmarks, I understand that drivers using `insertMany` may already cover insert-only bulk writes, but not all drivers implement `insertMany` on top of `Collection::BulkWrite`, so the explicit benchmarks keep that path covered for every driver.

Fair points. I'm still skeptical of the mixed benchmarks. To me it seems like all we're measuring is the cost of extra round trips to the server, but I suppose it could find a regression at some point.
The PR adds benchmarks for `Collection::BulkWrite` and `Client::BulkWrite` with insert-only operations and mixed operations. We already have a `Small doc Bulk Insert` and a `Large doc Bulk Insert` benchmark in the benchmarking spec, which may be sufficient for benchmarking `Collection::BulkWrite` with insert-only operations. However, the `Small doc Bulk Insert` and `Large doc Bulk Insert` benchmarks are implemented using `insertMany`, and not all drivers use `Collection::BulkWrite` in their implementation of `insertMany`. With that in mind, I have added explicit `Collection::BulkWrite` benchmarks to be implemented by all drivers that implement `Collection::BulkWrite`. This ensures we maintain comprehensive coverage of batch-write performance.

Here are the results of the new benchmarks as implemented on the C# driver: