-
Notifications
You must be signed in to change notification settings - Fork 427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Bulk Copy API for batch insert operation #686
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #686 +/- ##
============================================
+ Coverage 48.03% 48.39% +0.35%
- Complexity 2631 2741 +110
============================================
Files 118 120 +2
Lines 26753 27210 +457
Branches 4493 4589 +96
============================================
+ Hits 12852 13167 +315
- Misses 11773 11873 +100
- Partials 2128 2170 +42
Continue to review full report at Codecov.
|
case java.sql.Types.TIMESTAMP: | ||
case microsoft.sql.Types.DATETIMEOFFSET: | ||
// The precision is just a number long enough to hold all types of temporal data, doesn't need to be exact precision. | ||
columnMetadata.put(positionInTable, new ColumnMetadata(colName, jdbcType, 50, scale, dateTimeFormatter)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the "50", which I'm assuming is the precision, a const so changing it is easier when necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on line 217 explains why this is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Temporal data being < 50 precision might not be true forever. If it's an arbitrarily assigned number than it can be just as arbitrarily changed. And also because it's arbitrary, we might not know what number to look for if we do have a need to update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Precision here should be the one passed by user. I'm assuming 50 is used as precision based on the BulkCopy CSV implementation, it makes sense in case of CSV, because temporal types can be stored in CSV using any of the supported string literal format (https://docs.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql?view=sql-server-2017#supported-string-literal-formats-for-datetime) and we pass it as varchar/nvarchar to Server.
|
||
boolean isAzureDW() throws SQLServerException, SQLException { | ||
if (null == isAzureDW) { | ||
try (Statement stmt = this.createStatement(); ResultSet rs = stmt.executeQuery("SELECT CAST(SERVERPROPERTY('EngineEdition') as INT)");) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't seem to catch/handle any exceptions from this try statement, is there a reason why we're using it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally, when we create a new Statement or ResultSet object, it's good practice to close it after we're done. However, the try-with-resources syntax (which i've used here) allows those objects to get closed automatically. You can read more on this page: https://blogs.oracle.com/weblogicserver/using-try-with-resources-with-jdbc-objects
// Base data type: int | ||
final int ENGINE_EDITION_FOR_SQL_AZURE_DW = 6; | ||
rs.next(); | ||
int engineEdition = rs.getInt(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change block to
isAzureDW = rs.getInt(1) == ENGINE_EDITION_FOR_SQL_AZURE_DW
or even more concise but possibly a little confusing
return (rs.getInt(1) == ENGINE_EDITION_FOR_SQL_AZURE_DW)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think right now is fine, i merely adopted the same part of code from our fx framework anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should at least remove
if (boolean condition) {
bool = true
} else {
bool = false
}
|
||
String destinationTableName = tableName; | ||
// Get destination metadata | ||
try (SQLServerResultSet rs = ((SQLServerStatement) connection.createStatement()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try with resources, but we don't handle any exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment above
|
||
// ignore all comments | ||
if (localUserSQL.substring(0, 2).equalsIgnoreCase("/*")) { | ||
int temp = localUserSQL.indexOf("*/") + 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is done quite a lot. Can we maybe make a private function that takes in a string and returns what we need. So the final result for all these if blocks would be something like
if(localUserSQL.substring(0, 2).equalsIgnoreCase("/*")) { return newPrivateFunction("*/"); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
return isInsert(temp.substring(index)); | ||
} | ||
char c = temp.charAt(0); | ||
if (c != 'i' && c != 'I') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were burned before using this kind of logic in an attempt to parse SQL (in pstmt metadata caching), is this not breakable? What if I put a use statement before my insert, will it not flag my SQL as an insert statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user decides to put a use statement, it won't be flagged as an insert statement, and the old implementation for handling batch statements will get executed instead. This isInsert statement is currently only being used for queries that come through as an execute batch statement - we don't expect the user to run thousands of the same batch query with a use statement in front of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the following lines here to eliminate non-insert queries faster? Ideally, the SQL passed to this method is always an insert, so these lines can be removed?
char c = temp.charAt(0);
if (c != 'i' && c != 'I')
return false;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Insert isn't the only way to use batch - there's many ways we would be calling isInsert on a non-insert query. I don't really think this fast elimination is worth the space, so i'm going to remove it anyways though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only gave this a quick skim through and have a couple of extra thoughts:
- have we tested the performance impact on the normal execution path?
- is this scenario also applicable to SQL Server Parallel Data Warehouse (PDW)?
|
||
boolean isAzureDW() throws SQLServerException, SQLException { | ||
if (null == isAzureDW) { | ||
try (Statement stmt = this.createStatement(); ResultSet rs = stmt.executeQuery("SELECT CAST(SERVERPROPERTY('EngineEdition') as INT)");) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to avoid an extra round trip to the server to get this info? For example, can we get any useful info from the TDS pre-login or login packets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can avoid making a call to get the table metadata, but as it turns out, we were making an exact same FMTONLY call in SQLServerBulkCopy::getDestinationMetadata method (which is called as part of the bulk copy process shortly down the line). So instead of making the FMTONLY call to the database twice in the same fashion, the driver will now store the resultset from the first FMTONLY so it can be re-used in getDestinationMetadata. This means we won't be making any additional trips to the database compared to the way it was working before.
} | ||
|
||
// It shouldn't come here. If we did, something is wrong. | ||
throw new IllegalArgumentException("localUserSQL"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you pick throwing an IllegalArgumentException (for this & the other parse routines)? It's probably better to have a descriptive error message instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to throw an IllegalArgumentException because I thought it best described the symptom. It would be easy to throw an exception detailing at which index the parsing failed if the parsing actually failed at some point, but for these instances where the parsing fails due to getting to the end of the parsing method (which it shouldn't - this means the user's SQL query was incorrect somehow), all we know is that the user's argument was incorrect. I do agree that I could give a better description, though - I'll add that in.
@@ -167,6 +230,61 @@ public void testExecuteBatch1() { | |||
fail(TestResource.getResource("R_executeBatchFailed") + ": " + e.getMessage()); | |||
} | |||
} | |||
|
|||
public void testExecuteBatch1UseBulkCopyAPI() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This applies to all the tests, don't you need to set useBulkCopyForBatchInsert
property to true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
*/ | ||
@Test | ||
@Tag("slow") | ||
public void testStatementPoolingUseBulkCopyAPI() throws SQLException, NoSuchFieldException, SecurityException, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the new tests are just duplicates of the existing tests with minor changes. Can't we refactor this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -167,6 +230,61 @@ public void testExecuteBatch1() { | |||
fail(TestResource.getResource("R_executeBatchFailed") + ": " + e.getMessage()); | |||
} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is almost the exact copy of testExecuteBatch1()
. Please refactor.
int retValue[] = {0, 0, 0}; | ||
int updCountLength = 0; | ||
try { | ||
String sPrepStmt = "update ctstable2 set PRICE=PRICE*20 where TYPE_ID=?"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sPrepStmt
is never closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's closed.
retValue[i++] = rs.getInt(1); | ||
} | ||
|
||
pstmt1.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better move this into a try-with-resources block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's handled at the end of the test anyhoo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The connection itself is available in the class, they're all static objects (don't know why), but why create a new one anywhere else in the class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the modifyConnectionForBulkCopyAPI method modifies the connection object. If we don't create a new connection object (with the try block), the tests that come after this one will be affected by the change I made to the connection (since the only other connection object available is static). Therefore, to prevent that happening, I need to make a new connection object just to test my PR.
Also, they connection object is static because originally the test could just re-use the same connection object that doesn't need to change, but now we need to change our connection object accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's handled at the end of the test anyhoo.
- Not if
executeQuery
fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it'll be closed in the terminateVariation() (this is what I meant by end of the test)
} | ||
} | ||
catch (BatchUpdateException b) { | ||
fail("BatchUpdateException : Call to executeBatch is Failed!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call to executeBatch Failed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same below
fail("BatchUpdateException : Call to executeBatch is Failed!"); | ||
} | ||
catch (SQLException sqle) { | ||
fail("Call to executeBatch is Failed!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These catch blocks make no sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that they could handle exception better here, but this isn't my code. I'm just gonna fix it anyways though.
@@ -111,6 +115,65 @@ public void testAddBatch1() { | |||
fail(TestResource.getResource("R_addBatchFailed") + ": " + e.getMessage()); | |||
} | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments for testExecuteBatch1UseBulkCopyAPI()
apply to this test too.
* | ||
* @throws Exception | ||
*/ | ||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate of Repro47239large
. Please refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@DisplayName("Regression test for using 'large' methods") | ||
public void Repro47239largeUseBulkCopyAPI() throws Exception { | ||
|
||
assumeTrue("JDBC42".equals(Utils.getConfiguredProperty("JDBC_Version")), TestResource.getResource("R_incompatJDBC")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed. assumeTrue("JDBC42".equals(Utils.getConfiguredProperty("JDBC_Version")), TestResource.getResource("R_incompatJDBC"));
error = "RAISERROR ('raiserror level 11',11,1) WITH LOG"; | ||
severe = "RAISERROR ('raiserror level 20',20,1) WITH LOG"; | ||
} | ||
con.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use try-with-resources
} | ||
catch (Exception ignored) { | ||
} | ||
stmt.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try-with-resources.
} | ||
|
||
try { | ||
stmt.executeLargeUpdate("drop table " + tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use Utils.dropTableIfExists()
@@ -267,7 +269,227 @@ public void Repro47239() throws SQLException { | |||
stmt.close(); | |||
conn.close(); | |||
} | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very similar to Repro47239largeUseBulkCopyAPI()
. Please apply the comments to this test too.
@@ -313,7 +317,116 @@ public void batchWithLargeStringTest() throws SQLException { | |||
} | |||
|
|||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate of batchWithLargeStringTest()
. Please refactor.
|
||
@Test | ||
public void batchWithLargeStringTestUseBulkCopyAPI() throws SQLException { | ||
Connection con = DriverManager.getConnection(connectionString + ";useBulkCopyForBatchInsert=true;"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Connection is never closed.
// create a table with two columns | ||
boolean createPrimaryKey = false; | ||
try { | ||
stmt.execute("if object_id('" + testTable + "', 'U') is not null\ndrop table " + testTable + ";"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use Utils.dropTableIfExists()
} | ||
|
||
try { | ||
stmt.executeUpdate("drop table " + tableName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the method from Utils class.
@BeforeEach | ||
public void testSetup() throws TestAbortedException, Exception { | ||
connection = DriverManager.getConnection(connectionString + ";useBulkCopyForBatchInsert=true;"); | ||
stmt = (SQLServerStatement) connection.createStatement(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better move this into a try-with-resources block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the stmt gets cleaned up during the cleanup stage - i think that's okay?
// SERVERPROPERTY('EngineEdition') can be used to determine whether the db server is SQL Azure. | ||
// It should return 6 for SQL Azure DW. This is more reliable than @@version or serverproperty('edition'). | ||
// Reference: http://msdn.microsoft.com/en-us/library/ee336261.aspx | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This topic is no longer available
We're sorry—the topic that you requested is no longer available. Use the search box to find related information.
Improves the batch insert operation performace against Azure DW.
Fixes issue #331.