Cassandra driver's connections per host option #956
Comments
@michaelsembwever @yurishkuro @prat0318 @danchia any reason we shouldn't expose |
I can't think of any reason why you wouldn't want to; it's quite a reasonable tuning option. Speaking of which, one improvement I had planned to test was batching writes to Cassandra into small, unlogged batches, which I think would improve performance. |
++ would be nice |
I've been told quite the opposite by the team with more experience with Cassandra. A batch can span multiple nodes, so batching puts more load on the coordinating node, rather than leaving that work to a client with a token-aware connection. So you may want to profile it first, before changing the storage implementation to support batches. |
Here's an impl #958
|
I was basing it off: https://dzone.com/articles/efficient-cassandra-write, which could mean there is some benefit to be had if we have locality. I think the other thing that was motivating me to batch was that AFAIK we wait for each trace to be stored before processing the next one, at least for Kafka. This greatly reduces potential throughput compared with, say, allowing up to X store operations in flight in the collector. |
Thanks for the link, Daniel. It seems the article agrees with what I said, that fan-out from the client is more efficient than batching and sending to a coordinator node. But if the collector reads from Kafka and writes to Cassandra one span at a time, then yes, it would be bad and batching might help. It's just that you'd be optimizing the wrong end. We don't use collectors over Kafka yet (actually working on it for another use case), but our collectors (in Go) have an internal queue which feeds N goroutines, over a token-aware connection. The same thing can easily be done in Scala/Java collectors. Also worth noting that Cassandra connections are multiplexed, I think up to 128 or 256 streams, but we haven't gone that high in the # of threads. |
@danchia @yurishkuro booting the batch discussion to #961 where it is easier for others to identify |
fix pending on whether we want two properties (for local and remote), as this issue was brought up about remote. |
I wonder whether, if we addressed this, we'd end up only needing to tune the local policy? #967 |
here's a PR which allows non-local hosts to be skipped #980 |
Please add a configurable option to manage the Cassandra driver's maxConnectionPerHost parameter.
The default is quite low (2 for HostDistance.REMOTE) and causes multiple errors in the collector logs.