Cassandra driver's connections per host option #956

drax68 · 2016-02-10T17:43:44Z

Please add a configurable option to manage the Cassandra driver's maxConnectionsPerHost parameter.
The default is quite low (2 for HostDistance.REMOTE) and causes many errors in the collector logs.

codefromthecrypt · 2016-02-11T02:15:53Z

@michaelsembwever @yurishkuro @prat0318 @danchia any reason we shouldn't expose CASSANDRA_MAX_CONNECTIONS?
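To illustrate what exposing `CASSANDRA_MAX_CONNECTIONS` involves, here is a minimal Java sketch that reads the value from the environment and falls back to a default. It is not the code from #958. The class name `CassandraConnectionSettings`, the placeholder default of 8, and the validation rules are all assumptions.

```java
import java.util.Map;

/**
 * Hypothetical helper: resolves the per-host connection ceiling from an
 * environment-style map. The default below is a placeholder, not a value
 * taken from the collector or the driver.
 */
final class CassandraConnectionSettings {
  static final String ENV_NAME = "CASSANDRA_MAX_CONNECTIONS";
  static final int DEFAULT_MAX_CONNECTIONS = 8; // assumed placeholder default

  private CassandraConnectionSettings() {}

  static int maxConnectionsPerHost(Map<String, String> env) {
    String raw = env.get(ENV_NAME);
    if (raw == null || raw.trim().isEmpty()) {
      return DEFAULT_MAX_CONNECTIONS; // unset: keep the default
    }
    int value;
    try {
      value = Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      // Fail fast instead of silently ignoring a typo in the deployment config.
      throw new IllegalArgumentException(ENV_NAME + " must be an integer, was: " + raw, e);
    }
    if (value < 1) {
      throw new IllegalArgumentException(ENV_NAME + " must be >= 1, was: " + value);
    }
    return value;
  }
}
```

At startup, `CassandraConnectionSettings.maxConnectionsPerHost(System.getenv())` yields a value that, with the DataStax Java driver, would typically be passed to `PoolingOptions.setMaxConnectionsPerHost(HostDistance, int)`. Rejecting malformed values instead of quietly using the default is a design choice here, not something the thread decided.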

danchia · 2016-02-11T02:19:32Z

I can't think of any reason not to; it's quite a reasonable tuning option.

Speaking of which, one improvement I had planned to test was batching writes to Cassandra into small, unlogged batches, which I think would improve performance.

prat0318 · 2016-02-11T02:42:37Z

++ would be nice

yurishkuro · 2016-02-11T02:55:51Z

> batching writes to Cassandra into small, unlogged batches, which I think would improve performance.

I've been told quite the opposite by the team with more Cassandra experience. A batch can span multiple nodes, so batching puts more load on the coordinating node, instead of letting a client with a token-aware connection do that work.

So you may want to profile it first, before changing the storage implementation to support batches.

codefromthecrypt · 2016-02-11T03:12:29Z

Here's an impl #958

danchia · 2016-02-11T03:22:00Z

I was basing it on https://dzone.com/articles/efficient-cassandra-write, which suggests there could be some benefit if we have locality.
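One way to read the locality point: if writes are grouped by partition key before batching, each unlogged batch touches a single partition and so targets a single replica set. The Java sketch below shows only that grouping step. `PendingWrite`, `PartitionBatcher`, and the batch-size limit are hypothetical, and whether this actually beats unbatched token-aware writes is exactly what the thread suggests profiling.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pending row; partitionKey stands in for the table's real partition key. */
final class PendingWrite {
  final String partitionKey;
  final String payload;

  PendingWrite(String partitionKey, String payload) {
    this.partitionKey = partitionKey;
    this.payload = payload;
  }
}

final class PartitionBatcher {
  private PartitionBatcher() {}

  /**
   * Groups writes by partition key (keeping first-seen order), then splits each
   * group into chunks of at most maxBatchSize. Every returned batch touches
   * exactly one partition.
   */
  static List<List<PendingWrite>> singlePartitionBatches(List<PendingWrite> writes, int maxBatchSize) {
    if (maxBatchSize < 1) {
      throw new IllegalArgumentException("maxBatchSize must be >= 1");
    }
    Map<String, List<PendingWrite>> byPartition = new LinkedHashMap<>();
    for (PendingWrite w : writes) {
      byPartition.computeIfAbsent(w.partitionKey, k -> new ArrayList<>()).add(w);
    }
    List<List<PendingWrite>> batches = new ArrayList<>();
    for (List<PendingWrite> group : byPartition.values()) {
      for (int i = 0; i < group.size(); i += maxBatchSize) {
        // Copy the slice so callers can't mutate the grouping map through it.
        batches.add(new ArrayList<>(group.subList(i, Math.min(i + maxBatchSize, group.size()))));
      }
    }
    return batches;
  }
}
```

Writes that share no partition end up in separate batches, which is the point: a batch that mixes partitions is the case yurishkuro describes, where the coordinator has to fan the work out.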

I think the other thing that was motivating me to batch was that, AFAIK, we wait for each trace to be stored before processing the next one, at least for Kafka. That greatly reduces potential throughput compared to, say, allowing up to X store operations in flight in the collector.
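The "up to X store operations in flight" idea can be sketched with a `Semaphore` that gates asynchronous writes. This is a hypothetical Java helper, not collector code: the `store` function stands in for an asynchronous Cassandra write, and `maxInFlight` plays the role of X.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/**
 * Caps the number of outstanding store operations. submit() blocks once
 * maxInFlight writes are pending; each completion releases one permit.
 */
final class BoundedInFlightWriter<T> {
  private final Semaphore permits;
  private final Function<T, CompletableFuture<Void>> store; // hypothetical async write

  BoundedInFlightWriter(int maxInFlight, Function<T, CompletableFuture<Void>> store) {
    if (maxInFlight < 1) {
      throw new IllegalArgumentException("maxInFlight must be >= 1");
    }
    this.permits = new Semaphore(maxInFlight);
    this.store = store;
  }

  CompletableFuture<Void> submit(T item) throws InterruptedException {
    permits.acquire(); // back-pressure: wait here rather than buffering without limit
    CompletableFuture<Void> result;
    try {
      result = store.apply(item);
    } catch (RuntimeException e) {
      permits.release(); // the write never started, so return the permit
      throw e;
    }
    // Release on success or failure so a failed write doesn't leak capacity.
    return result.whenComplete((ignored, error) -> permits.release());
  }

  int availablePermits() {
    return permits.availablePermits();
  }
}
```

Because `submit` blocks once X writes are outstanding, a Kafka reader built on it slows down instead of queuing spans in memory without bound, while still keeping several writes in flight.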

yurishkuro · 2016-02-11T04:35:51Z

Thanks for the link, Daniel. It seems the article agrees with what I said: fan-out from the client is more efficient than batching and sending to a coordinator node. But if the collector reads from Kafka and writes to Cassandra one span at a time, then yes, that would be bad and batching might help. It's just that you'd be optimizing the wrong end. We don't use collectors over Kafka yet (we're actually working on it for another use case), but our collectors (in Go) have an internal queue feeding N goroutines over a token-aware connection. The same thing can easily be done in the Scala/Java collectors.
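The Go design described above (an internal queue feeding N goroutines) translates to Java roughly as a bounded `BlockingQueue` drained by N threads. This is a minimal sketch under assumptions: `QueueFedWriters` is a hypothetical name, the `writer` callback stands in for a blocking store call on a token-aware session, and error handling is reduced to skipping the failed item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Bounded queue drained by a fixed pool of writer threads (hypothetical sketch). */
final class QueueFedWriters<T> {
  private final BlockingQueue<T> queue;
  private final List<Thread> workers = new ArrayList<>();
  private volatile boolean running = true;

  QueueFedWriters(int capacity, int workerCount, Consumer<T> writer) {
    if (workerCount < 1) {
      throw new IllegalArgumentException("workerCount must be >= 1");
    }
    this.queue = new ArrayBlockingQueue<>(capacity);
    for (int i = 0; i < workerCount; i++) {
      Thread t = new Thread(() -> drain(writer), "span-writer-" + i);
      t.start();
      workers.add(t);
    }
  }

  private void drain(Consumer<T> writer) {
    // Keep working until shutdown is requested and nothing is left in the queue.
    while (running || !queue.isEmpty()) {
      T item;
      try {
        item = queue.poll(100, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      if (item == null) {
        continue; // timed out; re-check the shutdown condition
      }
      try {
        writer.accept(item);
      } catch (RuntimeException e) {
        // A real collector would log and count the failure; this sketch moves on.
      }
    }
  }

  /** Blocks while the queue is full, pushing back on the reader (e.g. a Kafka consumer). */
  void enqueue(T item) throws InterruptedException {
    queue.put(item);
  }

  /** Call after the last enqueue: workers drain the queue, then exit. */
  void shutdownAndAwait() throws InterruptedException {
    running = false;
    for (Thread t : workers) {
      t.join();
    }
  }
}
```

Each worker issues one write at a time, so N bounds the concurrent requests to the cluster, and the queue capacity bounds how far the reader can run ahead of Cassandra.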

Also worth noting that Cassandra connections are multiplexed, I think up to 128 or 256 streams per connection, but we haven't gone that high in the number of threads.

codefromthecrypt · 2016-02-11T06:16:20Z

@danchia @yurishkuro booting the batch discussion to #961 where it is easier for others to identify

codefromthecrypt · 2016-02-11T06:59:15Z

Fix pending on whether we want two properties (one for local and one for remote hosts), since this issue was raised about remote hosts.

codefromthecrypt · 2016-02-12T02:39:24Z

I wonder: if we addressed this, would we end up only needing to tune the local policy? #967

codefromthecrypt · 2016-02-15T07:41:04Z

here's a PR which allows non-local hosts to be skipped #980
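The reasoning behind skipping non-local hosts is that a host that never gets a connection pool makes its pool size irrelevant, leaving only the local setting to tune. The Java sketch below illustrates just that filtering step with hypothetical `ClusterHost` and `LocalDatacenterFilter` types. It is not the change in #980 and does not use any driver classes.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical description of a Cassandra node; not a driver class. */
final class ClusterHost {
  final String address;
  final String datacenter;

  ClusterHost(String address, String datacenter) {
    this.address = address;
    this.datacenter = datacenter;
  }
}

final class LocalDatacenterFilter {
  private LocalDatacenterFilter() {}

  /**
   * Returns only the hosts in localDc, preserving order. Hosts in other
   * datacenters are dropped, so no remote pools would be opened for them.
   */
  static List<ClusterHost> localOnly(List<ClusterHost> hosts, String localDc) {
    return hosts.stream()
        .filter(h -> localDc.equals(h.datacenter))
        .collect(Collectors.toList());
  }
}
```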

codefromthecrypt mentioned this issue Feb 11, 2016: Adds CASSANDRA_MAX_CONNECTIONS to control pooled connections per host #958 (Merged)

codefromthecrypt mentioned this issue Feb 11, 2016: Should we batch writes to Cassandra into small, unlogged batches? #961 (Closed)

codefromthecrypt closed this as completed in #958 Feb 16, 2016
