Cassandra driver's connections per host option #956

Closed
drax68 opened this issue Feb 10, 2016 · 11 comments

drax68 (Contributor) commented Feb 10, 2016

Please add a configurable option to manage the Cassandra driver's maxConnectionsPerHost parameter. The default is quite low (2 for HostDistance.REMOTE) and causes multiple errors in collector logs.
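
For reference, a minimal sketch of what that tuning looks like via the DataStax Java driver's PoolingOptions; the values and contact point are placeholders, not recommendations:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;

class CassandraConnections {
  // Raise the per-host connection caps above the driver defaults;
  // the numbers and contact point are illustrative only.
  static Cluster buildCluster() {
    PoolingOptions pooling = new PoolingOptions()
        .setMaxConnectionsPerHost(HostDistance.LOCAL, 8)
        .setMaxConnectionsPerHost(HostDistance.REMOTE, 8);

    return Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withPoolingOptions(pooling)
        .build();
  }
}
```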

codefromthecrypt (Member) commented

@michaelsembwever @yurishkuro @prat0318 @danchia any reason we shouldn't expose CASSANDRA_MAX_CONNECTIONS?
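
If we exposed it, the wiring could be roughly this sketch; the environment variable name comes from the comment above, while the fallback default of 8 is an assumption:

```java
import com.datastax.driver.core.HostDistance;
import com.datastax.driver.core.PoolingOptions;

class CassandraPooling {
  // Hypothetical wiring: let CASSANDRA_MAX_CONNECTIONS override the driver default.
  static PoolingOptions poolingFromEnv() {
    String raw = System.getenv("CASSANDRA_MAX_CONNECTIONS");
    int maxConnections = (raw != null) ? Integer.parseInt(raw) : 8; // 8 is a placeholder fallback
    return new PoolingOptions()
        .setMaxConnectionsPerHost(HostDistance.LOCAL, maxConnections)
        .setMaxConnectionsPerHost(HostDistance.REMOTE, maxConnections);
  }
}
```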

danchia commented Feb 11, 2016

I can't think of any reason why you wouldn't want to; it's quite a reasonable tuning option.

Speaking of which, one improvement I had planned to test was batching writes to Cassandra into small, unlogged batches, which I think would improve performance.
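
For concreteness, a sketch of what an unlogged batch looks like with the DataStax Java driver; the storeSpans helper and its arguments are hypothetical:

```java
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Session;
import java.util.List;

class UnloggedBatchWriter {
  // Group a handful of prepared span inserts into one unlogged batch.
  static void storeSpans(Session session, List<BoundStatement> spanInserts) {
    BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
    for (BoundStatement insert : spanInserts) {
      batch.add(insert);
    }
    session.execute(batch);
  }
}
```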

prat0318 commented

++ would be nice

yurishkuro (Contributor) commented

> batching writes to Cassandra into small, unlogged batches, which I think would improve performance

I've been told quite the opposite by a team with more Cassandra experience. A batch can span multiple nodes, so batching puts more load on the coordinating node rather than leaving that work to a client with a token-aware connection.

So you may want to profile it first, before changing the storage implementation to support batches.
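
For context, client-side token-aware routing in the DataStax Java driver is configured roughly like this sketch; the contact point is a placeholder:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

class TokenAwareCluster {
  // Route each statement to a replica that owns its partition, so the
  // fan-out happens on the client rather than on a coordinating node.
  static Cluster build() {
    return Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withLoadBalancingPolicy(
            new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
        .build();
  }
}
```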

codefromthecrypt (Member) commented Feb 11, 2016 via email

danchia commented Feb 11, 2016

I was basing it on https://dzone.com/articles/efficient-cassandra-write, which could mean there is some benefit to be had if we have locality.

I think the other thing motivating me to batch was that, AFAIK, we wait for each trace to be stored before processing the next one, at least for Kafka. This greatly reduces potential throughput compared to, say, allowing up to X store operations in flight in the collector.
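
A rough sketch of bounded in-flight writes using the driver's async API; the BoundedAsyncWriter class and its limit are made up for illustration (and it assumes the bundled Guava supports the two-argument Futures.addCallback):

```java
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import java.util.concurrent.Semaphore;

// Hypothetical helper: allows up to maxInFlight async writes at once,
// instead of waiting for each store call to finish before the next.
final class BoundedAsyncWriter {
  private final Session session;
  private final Semaphore inFlight;

  BoundedAsyncWriter(Session session, int maxInFlight) {
    this.session = session;
    this.inFlight = new Semaphore(maxInFlight);
  }

  void write(BoundStatement statement) throws InterruptedException {
    inFlight.acquire(); // blocks once maxInFlight writes are pending
    ResultSetFuture future = session.executeAsync(statement);
    Futures.addCallback(future, new FutureCallback<ResultSet>() {
      @Override public void onSuccess(ResultSet rows) { inFlight.release(); }
      @Override public void onFailure(Throwable t) { inFlight.release(); }
    });
  }
}
```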

yurishkuro (Contributor) commented

Thanks for the link, Daniel. It seems the article agrees with what I said: fan-out from the client is more efficient than batching and sending to a coordinating node. But if the collector reads from Kafka and writes to Cassandra one span at a time, then yes, that would be bad and batching might help. It's just that you'd be optimizing the wrong end. We don't use collectors over Kafka yet (actually working on it for another use case), but our collectors (in Go) have an internal queue feeding N goroutines over a token-aware connection. The same thing can easily be done in the Scala/Java collectors.

Also worth noting that Cassandra connections are multiplexed, I think up to 128 or 256 streams per connection, but we haven't gone that high in the number of threads.
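
A Java sketch of that queue-plus-workers pattern; the QueuedSpanWriter class, queue size, and thread count are arbitrary examples, not the actual collector code:

```java
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Session;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical collector internals: a bounded queue decouples span intake
// from N writer threads sharing one token-aware Session.
final class QueuedSpanWriter {
  private final BlockingQueue<BoundStatement> queue = new ArrayBlockingQueue<>(1000);

  QueuedSpanWriter(Session session, int writerThreads) {
    for (int i = 0; i < writerThreads; i++) {
      Thread writer = new Thread(() -> {
        try {
          while (!Thread.currentThread().isInterrupted()) {
            session.execute(queue.take()); // one span at a time, N in parallel
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      writer.setDaemon(true);
      writer.start();
    }
  }

  void enqueue(BoundStatement insert) throws InterruptedException {
    queue.put(insert); // blocks when full, providing back-pressure
  }
}
```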

codefromthecrypt (Member) commented

@danchia @yurishkuro moving the batch discussion to #961, where it is easier for others to find.

codefromthecrypt (Member) commented

The fix is pending a decision on whether we want two properties (for local and remote), since this issue was raised about remote.

codefromthecrypt (Member) commented

I wonder, if we addressed #967, whether we'd end up only needing to tune the local policy.

codefromthecrypt (Member) commented

Here's a PR that allows non-local hosts to be skipped: #980
