Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed bulkcopy metadata query to make use of cached data if available #2231

Merged
merged 2 commits into from
Oct 11, 2023

Conversation

tkyc
Copy link
Member

@tkyc tkyc commented Oct 6, 2023

Prior to this PR, the column metadata query call was made every single time despite data being cached. We now only make the call if no data is cached.

Fixes #2224.

@tkyc tkyc added this to the 12.5.0 milestone Oct 6, 2023
David-Engel
David-Engel previously approved these changes Oct 9, 2023
@lilgreenbird lilgreenbird changed the title Fixed repeated bulkcopy metadata query Fixed bulkcopy metadata query to make use of cached data if available Oct 10, 2023
lilgreenbird
lilgreenbird previously approved these changes Oct 11, 2023
@tkyc tkyc dismissed stale reviews from lilgreenbird and David-Engel via 2084e80 October 11, 2023 00:06
@tkyc tkyc merged commit 5ac4e61 into main Oct 11, 2023
17 checks passed
@tkyc tkyc deleted the bulkcopy-metadata-perf-fix branch October 11, 2023 20:55
@mmimica
Copy link
Contributor

mmimica commented Oct 18, 2023

When is metadata actually being reused?

From what I see, SQLServerBulkCopy#getDestinationMetadata is only ever called once from SQLServerPreparedStatement#execute[Large]Batch, where a new SQLServerBulkCopy object is created for each call.

I think the point of SQLServerBulkCopy#destColumnMetadata is not re-use, but merely having the state around to avoid passing it around as a method argument.

@tkyc
Copy link
Member Author

tkyc commented Oct 18, 2023

When is metadata actually being reused?

The wording on the issue can probably be corrected/improved on my end. Originally, the problem scenario occurs when a user repeatedly calls SQLServerBulkCopy#writeToServer(ISQLServerBulkData sourceData) (for the same SQLServerBulkCopy object). Doing so, the metadata query call would be repeated as well and the destColumnMetadata would be repopulated each time which impacted performance. After this PR fix, the subsequent bulk inserts wouldn't make the metadata query call and would "reuse" the existing destColumnMetadata that was set from a prior writeToServer call. Feel free to Let me know of better phrasing or wording to convey the problem/fix.

From what I see, SQLServerBulkCopy#getDestinationMetadata is only ever called once from SQLServerPreparedStatement#execute[Large]Batch, where a new SQLServerBulkCopy object is created for each call.

Is this on the main branch of the driver project? On my end I see, SQLServerBulkCopy#getDestinationMetadata being called from SQLServerBulkCopy#writeToServer(), which is then called by SQLServerBulkCopy#writeToServer(ISQLServerBulkData sourceData). Let me know if I'm incorrect or missing any details here. Thanks.

@mmimica
Copy link
Contributor

mmimica commented Oct 19, 2023

Are you saying users use SQLServerBulkCopy class directly? In that case it makes sense, yes.

@tkyc
Copy link
Member Author

tkyc commented Oct 19, 2023

Are you saying users use SQLServerBulkCopy class directly?

That's right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants