
[QUESTION] Performance drop SQL Server compatibility #2471

Closed
marianadsalgueiro opened this issue Jul 5, 2024 · 7 comments

@marianadsalgueiro

Question

I have a Java program that reads a CSV file and updates a table (performing deletes and inserts) in batches using java.sql.PreparedStatement.addBatch and java.sql.PreparedStatement.executeBatch; the pattern is roughly the one sketched below. I noticed a significant performance drop when migrating from SQL Server 2012 to SQL Server 2019 and changing the database compatibility level from 110 to 150.

| CSV File Size | Execution Time | SQL Server Version |
|---------------|----------------|--------------------|
| 166 MB        | 10min15s       | 2012               |
| 166 MB        | 1h24min40s     | 2019               |
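
For reference, a minimal sketch of that batching pattern; the table name (dbo.MyTable), column layout, CSV delimiter, and credentials here are illustrative, not the real ones:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CsvBatchLoader {
    private static final int BATCH_SIZE = 1000; // illustrative batch size

    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://localhost:1433;databaseName=XXX;"
                + "selectMethod=cursor;encrypt=true;trustServerCertificate=true;";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             BufferedReader csv = new BufferedReader(new FileReader("data.csv"));
             PreparedStatement delete = con.prepareStatement(
                     "DELETE FROM dbo.MyTable WHERE code = ? AND date = ?");
             PreparedStatement insert = con.prepareStatement(
                     "INSERT INTO dbo.MyTable (code, date, value) VALUES (?, ?, ?)")) {
            con.setAutoCommit(false);
            String line;
            int rows = 0;
            while ((line = csv.readLine()) != null) {
                String[] f = line.split(";"); // delimiter is illustrative

                delete.setString(1, f[0]);
                delete.setString(2, f[1]);
                delete.addBatch();

                insert.setString(1, f[0]);
                insert.setString(2, f[1]);
                insert.setString(3, f[2]);
                insert.addBatch();

                if (++rows % BATCH_SIZE == 0) {
                    delete.executeBatch(); // send the accumulated deletes...
                    insert.executeBatch(); // ...then the accumulated inserts
                    con.commit();
                }
            }
            delete.executeBatch(); // flush the last partial batch
            insert.executeBatch();
            con.commit();
        }
    }
}
```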

It appears that at compatibility level 110 the batches are truly being processed as batches, but at compatibility level 150 the deletes/inserts are being executed row by row? Does that make sense?

Since this Java program is a bit old, we were still using sqljdbc4 4.2. We tried migrating to mssql-jdbc 12.6.2.jre8, but the problem persists.

The connection string is:
jdbc:sqlserver://localhost:1433;databaseName=XXX;selectMethod=cursor;encrypt=true;trustServerCertificate=true;

Is there anything missing on the connection string? Is this a known problem?

@typhoon2k

Please try adding useBulkCopyForBatchInsert=true to the connection string.

@marianadsalgueiro
Author

marianadsalgueiro commented Jul 8, 2024

Thanks for the update!

The execution time has gotten a bit better, but it is still slower compared to compatibility level 110.

| CSV File Size | Execution Time | SQL Server Version           |
|---------------|----------------|------------------------------|
| 166 MB        | 10min15s       | 2012                         |
| 166 MB        | 1h24min40s     | 2019                         |
| 166 MB        | 41min40s       | 2019 (new connection string) |

New connection string: jdbc:sqlserver://localhost:1433;databaseName=XXX;selectMethod=cursor;encrypt=true;trustServerCertificate=true;useBulkCopyForBatchInsert=true;

Do you have any documentation on why useBulkCopyForBatchInsert=true helps (at least a little)?
Is there anything else you recommend?

CSV files are generated for this program every day with different sizes, and I'm concerned that larger files will result in significantly longer execution times.

Thanks again.

@typhoon2k

You can read about this property here. It enables use of the bulk copy API when a prepared statement is used. If you have the possibility, you can also try the bulk copy API directly and check whether the batch size property in SQLServerBulkCopyOptions allows you to achieve better performance; see the sketch below.
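
A minimal sketch of using the bulk copy API directly with SQLServerBulkCopyOptions, assuming an illustrative destination table, CSV layout, and credentials; the batch size and timeout values are only examples to tune:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

import com.microsoft.sqlserver.jdbc.SQLServerBulkCSVFileRecord;
import com.microsoft.sqlserver.jdbc.SQLServerBulkCopy;
import com.microsoft.sqlserver.jdbc.SQLServerBulkCopyOptions;

public class BulkCopyLoader {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://localhost:1433;databaseName=XXX;"
                + "encrypt=true;trustServerCertificate=true;";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(con)) {

            // Describe the CSV source (file, encoding, delimiter, header row present).
            SQLServerBulkCSVFileRecord fileRecord =
                    new SQLServerBulkCSVFileRecord("data.csv", "UTF-8", ";", false);
            // Column positions are 1-based; names and types are illustrative.
            fileRecord.addColumnMetadata(1, "code", Types.NVARCHAR, 50, 0);
            fileRecord.addColumnMetadata(2, "date", Types.NVARCHAR, 8, 0);
            fileRecord.addColumnMetadata(3, "value", Types.DECIMAL, 18, 2);

            SQLServerBulkCopyOptions options = new SQLServerBulkCopyOptions();
            options.setBatchSize(5000);      // rows sent per batch; tune this
            options.setBulkCopyTimeout(600); // seconds

            bulkCopy.setDestinationTableName("dbo.MyTable");
            bulkCopy.setBulkCopyOptions(options);
            bulkCopy.writeToServer(fileRecord);
        }
    }
}
```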

@Jeffery-Wasty
Contributor

Jeffery-Wasty commented Jul 8, 2024

I can't explain the performance difference between the SQL Server versions; have you checked with the SQL Server team?

Can you give us the results comparing SQL Server 2012 to 2019 when using the same driver version and the same compatibility level?

Additionally, can you supply us with the execution plans for the two runs? We can then see whether there are any major differences that would explain the performance impact.

We could also take a look at the driver logs (though the above execution plan and test results are more helpful).

For improving performance in general, here are our tips:

  • Since you are using prepared statements, you can try setting prepareMethod=prepare (see the connection-string sketch after this list).


  • The performance impact, while using bulk copy for batch insert, could also be because of metadata not being cached correctly (see Connection level bulk copy metadata caching #2464). You can confirm this for yourself by trying the 12.7.1 preview (which has the fix for proper caching). The next stable release to include this fix will be our 12.8.0 GA scheduled for July 31.

  • If you are using batch inserts without bulkcopy, look into Prepared Statement Metadata caching.
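
If it helps, here is one way these settings could be combined; a minimal sketch, where the pooling cache size, database name, and credentials are illustrative values rather than recommendations:

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class ConnectionExample {
    public static void main(String[] args) throws Exception {
        // prepareMethod=prepare, plus disableStatementPooling/statementPoolingCacheSize
        // to enable prepared statement (metadata) caching in the driver.
        String url = "jdbc:sqlserver://localhost:1433;databaseName=XXX;"
                + "encrypt=true;trustServerCertificate=true;"
                + "useBulkCopyForBatchInsert=true;"
                + "prepareMethod=prepare;"
                + "disableStatementPooling=false;statementPoolingCacheSize=50;";
        try (Connection con = DriverManager.getConnection(url, "user", "password")) {
            // run the batched deletes/inserts as before
        }
    }
}
```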

@Jeffery-Wasty Jeffery-Wasty self-assigned this Jul 8, 2024
@Jeffery-Wasty Jeffery-Wasty added the Question label Jul 8, 2024
@marianadsalgueiro
Author

Thanks for the feedback!

I'm currently working on some tests based on your suggestions and will get back to you as soon as possible.

@marianadsalgueiro
Author

marianadsalgueiro commented Jul 15, 2024

Hi!
After a while (and lots of testing), I've finally found the problem.

Here are the results on different driver versions and compatibility levels:

| CSV File Size | Compatibility Level | Driver                         | Execution Time |
|---------------|---------------------|--------------------------------|----------------|
| 166 MB        | 110                 | sqljdbc4 4.2                   | 10min15s       |
| 166 MB        | 150                 | sqljdbc4 4.2                   | 1h29min49s     |
| 166 MB        | 110                 | mssql-jdbc 12.6.2.jre8         | 10min          |
| 166 MB        | 150                 | mssql-jdbc 12.6.2.jre8         | 1h24min40s     |
| 166 MB        | 110                 | mssql-jdbc 12.7.1.jre8-preview | 11min25s       |
| 166 MB        | 150                 | mssql-jdbc 12.7.1.jre8-preview | 1h5min41s      |

As we can see, all driver versions are affected at compatibility level 150.

While running SQL Server Profiler on both compatibility levels, I observed that for a specific code and date ('7985', '20160701'), the duration of the DELETEs became significantly worse.

[Screenshot: SQL Server Profiler trace showing the increased DELETE durations]

Since Jeffery requested the execution plans for the two runs, I remembered that I had once enabled Query Store on SQL Server 2019 (at compatibility level 150) while the program was running, and I noticed that the plan was changing mid-run.

[Screenshot: Query Store showing the plan changing mid-run at compatibility level 150]

At compatibility level 110, the plan did not change.

[Screenshot: Query Store showing a single, stable plan at compatibility level 110]

While researching, I found that starting with SQL Server 2016 the Database Engine changed: the dynamic auto-update statistics threshold that Trace Flag 2371 used to enable is now the default behavior (for databases at compatibility level 130 or higher), so statistics on large tables are refreshed much more often. So I decided to check whether statistics were changing mid-run, which would explain why the DELETE durations got worse and why the plan was changing.

[Screenshot: statistics on the affected table being updated mid-run at compatibility level 150]
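
If anyone wants to reproduce this check, one way is to poll sys.dm_db_stats_properties while the load is running; a minimal sketch via JDBC, with an illustrative table name and connection details:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StatsWatcher {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://localhost:1433;databaseName=XXX;"
                + "encrypt=true;trustServerCertificate=true;";
        String sql = "SELECT s.name, p.last_updated, p.rows, p.modification_counter "
                + "FROM sys.stats AS s "
                + "CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS p "
                + "WHERE s.object_id = OBJECT_ID('dbo.MyTable')";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(sql)) {
            for (int i = 0; i < 120; i++) { // poll once a minute for up to two hours
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.printf("%s last_updated=%s rows=%d modifications=%d%n",
                                rs.getString(1), rs.getTimestamp(2),
                                rs.getLong(3), rs.getLong(4));
                    }
                }
                Thread.sleep(60_000);
            }
        }
    }
}
```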

And voilà.
So basically, updated statistics are generating new plans, and those plans are worse.
I just don't understand why they're worse... but ok.

So, it has nothing to do with the driver but with the database engine's changes.

Anyway, thank you!

@Jeffery-Wasty
Contributor

Very interesting, thank you for looking into this. We'll go ahead and close this issue, but I'm hoping that if anyone else has this same issue in the future, they come across this thread.
