MySQL performance and connectivity improvements #444

noslowerdna · 2015-07-01T16:37:02Z

We have identified several improvements for Zipkin deployments that use a MySQL database for storage:

Automatic reconnect for dropped connections so that the collector and query services do not need to be restarted
Automatic retry for transient/recoverable exceptions so that web app users and clients sending traces to the collector service are not affected if seamless recovery is possible
Proper back-pressure for maintaining stability under heavy load conditions instead of unbounded resource (Java heap, native threads) growth
Database connection pooling for significantly improved collector service performance under heavy load conditions
Prevent constraint violation errors if duplicate traces are sent
Document recommended indexes for optimal performance for MySQL deployments
Query service performance improved significantly for getServiceNames, getSpanNames, getTraceIdsByName, and getTraceIdsByAnnotation - includes both updated SQL and table schema additions
Add timers for all SQL executions that can be viewed using the admin API
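
As an illustration of the automatic-retry item in the list above, here is a minimal Java sketch. It is not the code from this pull request: the class name TransientRetry, the method withRetry, and the linear back-off are assumptions made for the example. Only java.sql.SQLTransientException and the other JDK types are real.

```java
import java.sql.SQLTransientException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not from the Zipkin change): retries an operation
// when it fails with a transient, recoverable SQL exception.
final class TransientRetry {
    private TransientRetry() {}

    static <T> T withRetry(Callable<T> op, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLTransientException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (SQLTransientException e) {
                last = e; // recoverable: wait a little longer each time, then retry
                if (attempt < maxAttempts) {
                    sleep(backoffMillis * attempt);
                }
            } catch (Exception e) {
                // anything else is treated as non-recoverable and surfaces immediately
                throw new IllegalStateException("non-transient failure", e);
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during back-off", ie);
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated store operation that fails twice before succeeding.
        String result = withRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new SQLTransientException("connection dropped");
            }
            return "stored";
        }, 5, 10L);
        System.out.println(result + " after " + calls.get() + " attempts");
    }
}
```

Capping the number of attempts keeps a persistent outage from turning into unbounded retries, which would work against the back-pressure goal in the same list.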

codefromthecrypt · 2015-07-01T19:05:41Z

@henrikno @IceCreamYou @Trundle @eirslett cursory glance looks ok (although in general, this stuff should have tests). If I don't hear back otherwise, I'll merge by Friday.

codefromthecrypt · 2015-07-01T19:17:30Z

@postwait you might also be interested in this.

eirslett · 2015-07-01T21:32:08Z

How about using HikariCP?

mosesn · 2015-07-02T01:15:39Z

If you really want it to scream on mysql, I'd suggest checking out finagle-mysql. It's pretty basic, but it's implemented from the ground-up to be asynchronous, so it plays more nicely with the existing zipkin abstractions.

noslowerdna · 2015-07-02T18:05:26Z

> How about using HikariCP?

Yes, HikariCP does look good. I'll try to run some comparison tests with it this afternoon.

> I'd suggest checking out finagle-mysql

Thanks for the feedback. Doesn't look quite as easy as HikariCP to switch over to, but I'll check it out if I have time.

mosesn · 2015-07-02T18:45:20Z

Yeah, we wouldn't be able to just switch, either: finagle-mysql implements the MySQL protocol, so it only supports MySQL, unlike anorm, which also supports things like SQLite. It would have to be a totally separate MySQL-only implementation.

noslowerdna · 2015-07-02T20:45:39Z

HikariCP was trivial to integrate [1], but its performance was essentially identical to the original selection of Apache Commons DBCP. Performance of DBCP was actually slightly better in our benchmark testing although that's probably just natural variance. I think it's a safe conclusion that the pool overhead is negligible, dwarfed by the I/O of the database transaction.

Commons DBCP

collector.pushback: 367
jvm_thread_peak_count: 62
storeSpan_msec: (average=639, count=5879, maximum=2586, minimum=21, p50=349, p90=1568, p95=1733, p99=1915, p999=2339, p9999=2586, sum=3759050)

HikariCP

collector.pushback: 381
jvm_thread_peak_count: 64
storeSpan_msec: (average=661, count=5861, maximum=3158, minimum=26, p50=349, p90=1733, p95=1915, p99=2116, p999=2586, p9999=3158, sum=3875527)

[1] https://gist.github.com/noslowerdna/9b975aed5c502a10efad
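
To put a number on "essentially identical", the means can be recomputed from the sum and count fields reported above. The following Java sketch does only that arithmetic; the class and method names are made up for the example, and the input figures are copied from the two storeSpan_msec lines.

```java
// Figures are copied from the storeSpan_msec lines above;
// the class and method names are hypothetical.
final class PoolComparison {
    private PoolComparison() {}

    // Mean latency in milliseconds from the reported sum and count.
    static double mean(long sum, long count) {
        return (double) sum / count;
    }

    // Relative change of `other` versus `baseline`, in percent.
    static double percentChange(double baseline, double other) {
        return (other - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double dbcp = mean(3_759_050L, 5_879L);
        double hikari = mean(3_875_527L, 5_861L);
        System.out.printf("DBCP mean: %.1f ms%n", dbcp);
        System.out.printf("HikariCP mean: %.1f ms%n", hikari);
        System.out.printf("Difference: %.1f%%%n", percentChange(dbcp, hikari));
    }
}
```

The recomputed means match the reported averages (639 and 661 ms), a gap of roughly 3%, and the p50 values are the same (349 ms) in both runs.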

eirslett · 2015-07-02T21:02:05Z

Interesting... So they're performing approximately the same? ( @brettwooldridge is this a normal case for HikariCP? I haven't used it myself but I've heard its performance is a quite dramatic improvement over other connection pools.)

codefromthecrypt · 2015-07-03T09:22:41Z

fyi we've two options on this:

merge with DBCP and convert later if a compelling case exists for HikariCP - this allows the bulk of the change, which so far seems uncontested, to merge.
wait until others get on board with DBCP vs HikariCP - might result in higher performance, but delays the change

Historically we've struggled with delayed change, it sometimes forces people into forks, or delays their move out of forks. Anecdotally, delay seems at odds with engagement, which is something we are trying to turn around. For this reason alone, I suggest we merge.

@eirslett wdyt?

brettwooldridge · 2015-07-03T10:01:33Z

@eirslett @noslowerdna @adriancole Thanks for the Cc.

7pm here in Tokyo. I'm on the way home, and formulating a more detailed comment. I'll update in about 4 hours (after sending daughter to dreamland).

Thanks for taking a look at HikariCP. While HikariCP is probably best known for being fast, we have actually spent more effort on achieving that speed within the constraints of providing the highest reliability possible. We are confident that HikariCP is not slightly more reliable, but substantially more reliable than currently available pools.

Many (most?) available pools (including DBCP) default to a mode of operation where performance is prioritized over reliability. In contrast, HikariCP has no "unsafe" operational modes -- no way to disable "correct" behavior.

The benchmarks cited on our page are actually extremely generous to other pools. They are run against a JDBC stub-driver in which every operation is an empty method. When a real driver is put into the loop instead, the difference in results strains believability -- but believe them. We should probably beat this drum more loudly, but...

So, what is unfair about the typical comparison? I'm going to talk about HikariCP, Apache DBCP2, and Tomcat DBCP here, first covering speed and then bringing in the reliability pieces. I ran HikariCP-benchmark on the three pools against a real database (MySQL) instead of a stub.

First, all three pools in default configuration (+ autocommit=false):

Benchmark                       (jdbcUrl)                        (pool)  Mode       Score  Units
ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  hikari  thrpt  45289.116  ops/ms
ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  tomcat  thrpt   2329.692  ops/ms
ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  dbcp2   thrpt     21.750  ops/ms

DBCP2

What is not visible here is that DBCP2 is generating ~3MB/sec of traffic to the DB, because rollbackOnReturn defaults to true and it is unconditionally rolling back. It gets bonus points for defaulting on the side of safety. Unfortunately, it is not validating connections on borrow.

HikariCP

HikariCP is generating zero traffic to the DB. HikariCP also defaults to "rollback on return" (it can't be turned off because that is the correct behavior for a pool), but it additionally tracks transaction state and does not roll back if the SQL has already been committed (or no SQL was run). HikariCP also defaults to "test on borrow" (it can't be turned off...), but employs an optimization that says, "If a connection had activity within the past 1000ms, bypass connection validation."
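
The two optimizations just described can be modeled in a few lines of Java. This is a simplified sketch under stated assumptions, not HikariCP's actual implementation: the class TrackedLease, its fields, and its methods are hypothetical, no real JDBC connection is involved, and the counters only stand in for round trips that would reach the database. The 1000 ms window is the figure quoted above.

```java
// Simplified model of "rollback only if dirty" and "skip validation if
// recently active". NOT HikariCP's code; every name here is hypothetical.
final class TrackedLease {
    // Window quoted in the comment above.
    static final long VALIDATION_BYPASS_MILLIS = 1000;

    private boolean dirty;            // SQL has run since the last commit/rollback
    private long lastActivityMillis;  // last time the connection was known to work
    int rollbacksSent;                // rollbacks that would be sent to the database
    int validationsSent;              // validation queries that would be sent

    TrackedLease(long createdAtMillis) {
        this.lastActivityMillis = createdAtMillis;
    }

    // Borrow from the pool: validate only if the connection has been idle too long.
    void borrow(long nowMillis) {
        if (nowMillis - lastActivityMillis >= VALIDATION_BYPASS_MILLIS) {
            validationsSent++;
        }
        lastActivityMillis = nowMillis;
    }

    // Running SQL marks the transaction as dirty.
    void execute(long nowMillis) {
        dirty = true;
        lastActivityMillis = nowMillis;
    }

    // Committing clears the dirty flag, so no rollback is needed on return.
    void commit(long nowMillis) {
        dirty = false;
        lastActivityMillis = nowMillis;
    }

    // Return to the pool: roll back only when uncommitted work exists.
    void giveBack(long nowMillis) {
        if (dirty) {
            rollbacksSent++;
            dirty = false;
        }
        lastActivityMillis = nowMillis;
    }

    public static void main(String[] args) {
        TrackedLease lease = new TrackedLease(0L);
        // Mimic the benchmark loop: borrow and return without running SQL.
        for (long t = 1; t <= 1000; t++) {
            lease.borrow(t);
            lease.giveBack(t);
        }
        System.out.println("rollbacks=" + lease.rollbacksSent
                + " validations=" + lease.validationsSent);
    }
}
```

In this model a tight borrow/return loop that runs no SQL produces no rollbacks and no validations, which is consistent with the "zero traffic" observation; a pool that rolls back unconditionally would instead pay one round trip per cycle.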

Tomcat

Tomcat DBCP is also generating zero traffic to the DB, but for a different reason. It simply is not validating connections at all, nor is it rolling back on return.

Now, let's try to level the playing field a little. For Tomcat and DBCP, we need to enable connection validation.

Benchmark                       (jdbcUrl)                        (pool)  Mode       Score  Units
ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  hikari  thrpt  45289.116  ops/ms
ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  tomcat  thrpt   2133.992  ops/ms
ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  dbcp2   thrpt      5.296  ops/ms

DBCP2

DBCP2 took a hit here, because it does not have a validation optimization like HikariCP. It is still generating ~3MB/sec of traffic to the DB.

Tomcat

Tomcat DBCP does support a similar optimization to HikariCP. The config goes something like this:

setTestOnBorrow(true)
setValidationInterval(1000)
setValidator( validator )

But we forgot "rollback on return" for Tomcat:

ConnectionBench.cycleCnnection  jdbc:mysql://192.168.20.15/test  tomcat  thrpt      20.706  ops/ms

And there goes the performance. Tomcat is now generating ~3MB/sec of traffic to the DB. It does not track transaction state and therefore must roll back unconditionally. I thought maybe enabling "ConnectionState tracking" might help, but it does not.

There is a lot more that HikariCP is doing, not covered here ... guarding against network partitions, checking SQLExceptions for vendor disconnect codes, resetting auto-commit, transaction isolation, catalog, network timeout, tracking open Statements (and closing them), etc. All while keeping the performance levels you see above.

codefromthecrypt · 2015-07-03T10:08:33Z

@brettwooldridge awesome. good luck with dadops!

codefromthecrypt · 2015-07-03T15:06:01Z

fyi I can't merge this for technical reasons right now. Hopefully, they'll resolve by Monday.

codefromthecrypt · 2015-07-04T06:58:00Z

technical issues resolved.. merging

codefromthecrypt · 2015-07-04T07:17:37Z

zipkin-anormdb/src/test/scala/com/twitter/zipkin/storage/anormdb/DBSpec.scala

+ *
+ */
+
+import org.specs._


dependency not configured.. fixing before merge

codefromthecrypt · 2015-07-04T12:01:56Z

closed via 2f5f5f5

codefromthecrypt · 2015-07-04T12:04:14Z

Thanks for all the help, folks. Particularly those interested in performance should scroll up to read the update from @brettwooldridge on db pools (github doesn't notify on edit). This topic might be best pulled into a separate issue/pull request, which we could quickly address before releasing 1.2.

codefromthecrypt · 2015-07-04T12:06:33Z

fwiw I think @brettwooldridge makes a strong case, but up to y'all to decide to raise a PR for switcheroo or not.

MySQL performance and connectivity improvements

f0f178a

noslowerdna mentioned this pull request Jul 1, 2015

Reduce default query limit from 100 to 10 #443

Merged

codefromthecrypt reviewed Jul 4, 2015

codefromthecrypt closed this Jul 4, 2015

noslowerdna mentioned this pull request Jul 6, 2015

Switch database connection pool from DBCP2 to HikariCP #449

Closed

codefromthecrypt mentioned this pull request Jun 27, 2016

Performance test for integrated span collection #1148

Open
