Jaeger fragile in the face of cassandra failures #767
That sounds like a bug in gocql though, not Jaeger directly.
We have a fixit week coming up and want to try upgrading the driver.
@burmanm whether it is a bug in gocql or not doesn't really matter:
Either way, a bug in this repo is relevant while the defect is present :).
@rbtcollins are you still experiencing this? The driver has been updated as part of #829. Edit: of course you are still experiencing this... we haven't released a version with the fix yet. Would you be able to run from master?
I'll see about that, got a few things up in the air just now. Anything stopping doing a release?
This was released as part of 1.5.0, let us know if it works.
I'm closing this one, but feel free to reopen if you are still experiencing this after 1.5.0.
I'd like to re-open this as I'm seeing the same problem after 1.5.0. Related gocql issue: apache/cassandra-gocql-driver#915. I see there is a setting in gocql
In our use case, it's definitely been triggered while resizing the cluster. Assuming that setting works as advertised, I believe the suggestion would be to add a flag to Jaeger for the Cassandra reconnect interval, and set the default to some reasonable value.
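A rough sketch of what such a flag might look like, using only the Go standard library. The flag name `cassandra.reconnect-interval`, the 60s default, and the `Configuration` struct are all hypothetical and not taken from the Jaeger source; the only piece assumed from gocql is that its `ClusterConfig` has a `ReconnectInterval` field of type `time.Duration`, which the parsed value would be copied into before the session is created.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// Hypothetical flag name and default, chosen for illustration only.
const (
	reconnectIntervalFlag    = "cassandra.reconnect-interval"
	defaultReconnectInterval = 60 * time.Second
)

// Configuration is a stand-in for the Cassandra settings a component would hold.
type Configuration struct {
	// ReconnectInterval is how often to retry hosts that are marked down.
	ReconnectInterval time.Duration
}

// parseConfig reads the reconnect interval from command-line style arguments,
// falling back to the default when the flag is absent.
func parseConfig(args []string) (Configuration, error) {
	var cfg Configuration
	fs := flag.NewFlagSet("cassandra", flag.ContinueOnError)
	fs.DurationVar(&cfg.ReconnectInterval, reconnectIntervalFlag, defaultReconnectInterval,
		"interval at which to retry connecting to Cassandra hosts marked down")
	if err := fs.Parse(args); err != nil {
		return Configuration{}, err
	}
	if cfg.ReconnectInterval < 0 {
		return Configuration{}, fmt.Errorf("%s must not be negative, got %s",
			reconnectIntervalFlag, cfg.ReconnectInterval)
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig([]string{"--" + reconnectIntervalFlag + "=30s"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// With gocql, this value would be assigned to ClusterConfig.ReconnectInterval
	// before creating the session (assumption: not shown here to avoid the dependency).
	fmt.Println("reconnect interval:", cfg.ReconnectInterval)
}
```

Keeping the default non-zero matters here: the point of the change is that reconnection happens without the operator having to opt in.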
Would you like to contribute a patch?
* Fix jaegertracing#767 by enabling gocql setting `ReconnectInterval` to reconnect to down Cassandra hosts at a regular interval.
@jpkrohling Sure 💯 Opened #934
* Make cassandra reconnect down hosts. * Fix #767 by enabling gocql setting `ReconnectInterval` to reconnect to down Cassandra hosts at a regular interval. Signed-off-by: Brendan Shaklovitz <[email protected]> * Add cassandra `ReconnectInterval` test. Signed-off-by: Brendan Shaklovitz <[email protected]>
We've been observing Jaeger components - specifically the collectors and query - fail to recover after Cassandra has any sort of outage. While Cassandra reliability is clearly not a topic for here ;), Jaeger's resilience to issues is.
What we see happen is a stuck process that logs only one thing, even hours after the issue is fixed:
gocql: no hosts available in the pool
What we'd like to see happen is a recovery after a few minutes without manual intervention. This is related to, but distinct from, #562