MySQL performance and connectivity improvements #444

Closed
wants to merge 1 commit into from

noslowerdna

We have identified several improvements for Zipkin deployments that use a MySQL database for storage:

  • Automatic reconnect for dropped connections so that the collector and query services do not need to be restarted
  • Automatic retry for transient/recoverable exceptions so that web app users and clients sending traces to the collector service are not affected if seamless recovery is possible
  • Proper back-pressure for maintaining stability under heavy load conditions instead of unbounded resource (Java heap, native threads) growth
  • Database connection pooling for significantly improved collector service performance under heavy load conditions
  • Prevent constraint violation errors if duplicate traces are sent
  • Document recommended indexes for optimal performance for MySQL deployments
  • Query service performance improved significantly for getServiceNames, getSpanNames, getTraceIdsByName, and getTraceIdsByAnnotation - includes both updated SQL and table schema additions
  • Add timers for all SQL executions that can be viewed using the admin API
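
The automatic-retry item above could be sketched roughly as follows. This is only an illustration, not the code in this pull request: the class `TransientRetry`, its method names, and the back-off parameters are hypothetical. It assumes the JDBC driver signals recoverable failures with the standard `SQLTransientException` and `SQLRecoverableException` types, and that the operation obtains a fresh connection on every attempt (for example from a pool), since a recoverable failure usually means the old connection is unusable.

```java
import java.sql.SQLRecoverableException;
import java.sql.SQLTransientException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an operation when JDBC reports a transient failure. */
final class TransientRetry {
    private TransientRetry() {}

    /** True for the exception types that JDBC marks as worth retrying. */
    static boolean isRetryable(Throwable t) {
        return t instanceof SQLTransientException
            || t instanceof SQLRecoverableException;
    }

    /**
     * Runs op up to maxAttempts times, doubling the delay after each
     * retryable failure. Non-retryable exceptions propagate immediately.
     * op should acquire its own connection on each call.
     */
    static <T> T withRetry(int maxAttempts, long baseDelayMillis, Callable<T> op)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = baseDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!isRetryable(e) || attempt >= maxAttempts) {
                    throw e; // permanent failure, or out of attempts
                }
                Thread.sleep(delay);
                delay *= 2; // simple exponential back-off
            }
        }
    }
}
```

A caller would wrap each storage call, e.g. `TransientRetry.withRetry(3, 50L, () -> store(span))`, where `store` is a placeholder for whatever performs the SQL. Bounding the number of attempts keeps a persistent outage from turning into unbounded queued work.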

@codefromthecrypt
Member

@henrikno @IceCreamYou @Trundle @eirslett cursory glance looks ok (although in general, this stuff should have tests). If I don't hear back otherwise, I'll merge by Friday.

@codefromthecrypt
Member

@postwait you might also be interested in this.

@eirslett
Contributor

eirslett commented Jul 1, 2015

How about using HikariCP?

@mosesn
Contributor

mosesn commented Jul 2, 2015

If you really want it to scream on mysql, I'd suggest checking out finagle-mysql. It's pretty basic, but it's implemented from the ground-up to be asynchronous, so it plays more nicely with the existing zipkin abstractions.

@noslowerdna
Author

How about using HikariCP?

Yes, HikariCP does look good. I'll try to run some comparison tests with it this afternoon.

I'd suggest checking out finagle-mysql

Thanks for the feedback. Doesn't look quite as easy as HikariCP to switch over to, but I'll check it out if I have time.

@mosesn
Contributor

mosesn commented Jul 2, 2015

Yeah, we wouldn't be able to just switch, either–finagle-mysql implements the mysql protocol, so it only supports mysql, unlike anorm, which also supports things like sqlite, so it would have to be a totally separate mysql-only implementation.

@noslowerdna
Author

HikariCP was trivial to integrate [1], but its performance was essentially identical to the original selection of Apache Commons DBCP. Performance of DBCP was actually slightly better in our benchmark testing although that's probably just natural variance. I think it's a safe conclusion that the pool overhead is negligible, dwarfed by the I/O of the database transaction.

Commons DBCP

collector.pushback: 367
jvm_thread_peak_count: 62
storeSpan_msec: (average=639, count=5879, maximum=2586, minimum=21, p50=349, p90=1568, p95=1733, p99=1915, p999=2339, p9999=2586, sum=3759050)

HikariCP

collector.pushback: 381
jvm_thread_peak_count: 64
storeSpan_msec: (average=661, count=5861, maximum=3158, minimum=26, p50=349, p90=1733, p95=1915, p99=2116, p999=2586, p9999=3158, sum=3875527)

[1] https://gist.github.com/noslowerdna/9b975aed5c502a10efad
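
As a quick cross-check of the figures above, the reported averages can be recomputed from the `sum` and `count` fields and compared. The sketch below does only that arithmetic; the class and method names are made up for illustration, and the inputs are copied from the two `storeSpan_msec` lines. The means come out a few percent apart, with identical p50 values, which fits the "essentially identical" reading.

```java
/** Hypothetical helper that recomputes the storeSpan_msec averages quoted above. */
final class StoreSpanComparison {
    private StoreSpanComparison() {}

    /** Mean latency in milliseconds, derived from the reported sum and count. */
    static double meanMillis(long sumMillis, long count) {
        return (double) sumMillis / count;
    }

    /** Signed difference of candidate relative to baseline, in percent. */
    static double relativeDiffPercent(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // sum and count copied from the Commons DBCP and HikariCP results.
        double dbcp = meanMillis(3759050L, 5879L);
        double hikari = meanMillis(3875527L, 5861L);

        System.out.printf("Commons DBCP mean: %.1f ms%n", dbcp);
        System.out.printf("HikariCP mean:     %.1f ms%n", hikari);
        System.out.printf("Relative difference: %.1f%%%n",
            relativeDiffPercent(dbcp, hikari));
    }
}
```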

@eirslett
Contributor

eirslett commented Jul 2, 2015

Interesting... So they're performing approximately the same? ( @brettwooldridge is this a normal case for HikariCP? I haven't used it myself but I've heard its performance is a quite dramatic improvement over other connection pools.)

@codefromthecrypt
Member

fyi we've two options on this:

  • merge with DBCP and convert later if a compelling case exists for HikariCP - this allows the bulk of the change, which so far seems uncontested, to merge.
  • wait until others get on board with DBCP vs HikariCP - might result in higher performance, but delays the change

Historically we've struggled with delayed change; it sometimes forces people into forks, or delays their move out of forks. Anecdotally, delay seems at odds with engagement, which is something we are trying to turn around. For this reason alone, I suggest we merge.

@eirslett wdyt?

@brettwooldridge

@eirslett @noslowerdna @adriancole Thanks for the Cc.

7pm here in Tokyo. I'm on the way home, and formulating a more detailed comment. I'll update in about 4 hours (after sending daughter to dreamland).


Thanks for taking a look at HikariCP. While HikariCP is probably best known for being fast, we have actually spent more effort on achieving that speed within the constraints of providing the highest reliability possible. We are confident that HikariCP is not slightly more reliable, but substantially more reliable than currently available pools.

Many (most?) available pools (including DBCP) default to a mode of operation where performance is prioritized over reliability. In contrast, HikariCP has no "unsafe" operational modes -- no way to disable "correct" behavior.

The benchmark cited on our page is actually extremely generous to other pools. It is run against a JDBC stub-driver in which every operation is an empty method. When a real driver is put into the loop instead, the difference in results strains believability, but believe it. We should probably beat this drum more loudly, but...

So, what is unfair about the typical comparison? I'm going to talk about HikariCP, Apache DBCP2, and Tomcat DBCP here, first about speed and then about reliability. I ran HikariCP-benchmark on the three pools against a real database (MySQL) instead of a stub.

First, all three pools in default configuration (+ autocommit=false):

Benchmark                                             (jdbcUrl)  (pool)   Mode      Score   Units
ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  hikari  thrpt  45289.116  ops/ms
ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  tomcat  thrpt   2329.692  ops/ms
ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  dbcp2   thrpt     21.750  ops/ms

DBCP2

What is not visible here is that DBCP2 is generating ~3MB/sec of traffic to the DB, because rollbackOnReturn defaults to true and it is unconditionally rolling back. It gets bonus points for defaulting on the side of safety. Unfortunately, it is not validating connections on borrow.

HikariCP

HikariCP is generating zero traffic to the DB. HikariCP also defaults to "rollback on return" (it can't be turned off because that is the correct behavior for a pool), but it additionally tracks transaction state and does not roll back if the SQL has already been committed (or no SQL was run). HikariCP also defaults to "test on borrow" (it can't be turned off...), but employs an optimization that says, "If a connection had activity within the past 1000ms, bypass connection validation."
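
To make the two optimizations just described concrete, here is a minimal sketch of a pooled-connection wrapper that (a) rolls back on return only when uncommitted work exists and (b) skips validation on borrow when the connection was active within the last 1000 ms. It is not HikariCP's implementation: `TrackedConnection`, the `Backend` interface, and all method names are hypothetical, the clock is passed in explicitly to keep the example deterministic, and treating a return to the pool as "activity" is an assumption of this sketch.

```java
/** Hypothetical pooled-connection wrapper; not HikariCP's actual code. */
final class TrackedConnection {

    /** Stand-in for a real JDBC connection; each call is one DB round trip. */
    interface Backend {
        boolean ping();   // validation query
        void rollback();  // ROLLBACK statement
    }

    static final long VALIDATION_BYPASS_MILLIS = 1000L;

    private final Backend backend;
    private boolean dirty;          // SQL has run since the last commit/rollback
    private long lastAccessMillis;  // time of the most recent activity

    TrackedConnection(Backend backend, long nowMillis) {
        this.backend = backend;
        this.lastAccessMillis = nowMillis;
    }

    /** Records that a statement ran inside the current transaction. */
    void execute(long nowMillis) {
        dirty = true;
        lastAccessMillis = nowMillis;
    }

    /** Records a commit; nothing is left to roll back afterwards. */
    void commit(long nowMillis) {
        dirty = false;
        lastAccessMillis = nowMillis;
    }

    /** Test on borrow, bypassed when the connection was recently active. */
    boolean validateOnBorrow(long nowMillis) {
        if (nowMillis - lastAccessMillis < VALIDATION_BYPASS_MILLIS) {
            return true; // recent activity: no round trip needed
        }
        boolean alive = backend.ping();
        if (alive) {
            lastAccessMillis = nowMillis;
        }
        return alive;
    }

    /** Rollback on return, skipped when the transaction is already clean. */
    void resetOnReturn(long nowMillis) {
        if (dirty) {
            backend.rollback();
            dirty = false;
        }
        lastAccessMillis = nowMillis; // assumption: a return counts as activity
    }
}
```

In a tight borrow/commit/return loop like the benchmark's, neither `ping()` nor `rollback()` is ever reached in this sketch, which is consistent with the "zero traffic" observation above. A pool that lacks the `dirty` flag has to issue the rollback on every return.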

Tomcat

Tomcat DBCP is also generating zero traffic to the DB, but for a different reason. It simply is not validating connections at all, nor is it rolling back on return.


Now, let's try to level the playing field a little. For Tomcat and DBCP, we need to enable connection validation.

Benchmark                                             (jdbcUrl)  (pool)   Mode      Score   Units
ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  hikari  thrpt  45289.116  ops/ms
ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  tomcat  thrpt   2133.992  ops/ms
ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  dbcp2   thrpt      5.296  ops/ms

DBCP2

DBCP2 took a hit here, because it does not have a validation optimization like HikariCP. It is still generating ~3MB/sec of traffic to the DB.

Tomcat

Tomcat DBCP does support a similar optimization to HikariCP; the config goes something like this:

setTestOnBorrow(true)
setValidationInterval(1000)
setValidator( validator )

But we forgot "rollback on return" for Tomcat:

ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  tomcat  thrpt      20.706  ops/ms

And there goes the performance. Tomcat is now generating ~3MB/sec of traffic to the DB. It does not track transaction state and therefore must unconditionally roll back. I thought maybe enabling "ConnectionState tracking" might help, but it does not.

There is a lot more that HikariCP is doing, not covered here ... guarding against network partitions, checking SQLExceptions for vendor disconnect codes, resetting auto-commit, transaction isolation, catalog, network timeout, tracking open Statements (and closing them), etc. All while keeping the performance levels you see above.

@codefromthecrypt
Member

@brettwooldridge awesome. good luck with dadops!

@codefromthecrypt
Member

fyi I can't merge this for technical reasons right now. Hopefully, they'll resolve by Monday.

@codefromthecrypt
Member

technical issues resolved.. merging

import org.specs._
Member

dependency not configured.. fixing before merge

@codefromthecrypt
Member

closed via 2f5f5f5

@codefromthecrypt
Member

Thanks for all the help, folks. Particularly those interested in performance should scroll up to read the update from @brettwooldridge on db pools (github doesn't notify on edit). This topic might be best pulled into a separate issue/pull request, which we could quickly address before releasing 1.2.

@codefromthecrypt
Member

fwiw I think @brettwooldridge makes a strong case, but up to y'all to decide to raise a PR for switcheroo or not.
