Add retry policy and fix documentation for Cassandra storage backend #10467
Conversation
fix docs for connection_timeout (force-pushed from b34bcef to fdcab88)
@ncabatoff, Hi! Could you take a look, please?
The enhancement request, along with the accompanying documentation improvements, does seem relevant. Can this be reviewed for release in the near future?
Likely related to #15899; on merge, the user on that issue should be advised to retest before closing that issue too.
lgtm
The changelog check is ok -- I triggered the CI run off another branch and PR, so the check is looking for the wrong changelog entry.
LGTM. We gave it a test run and it all seems good! 👍
In short, this PR fixes two problems:
More details below...
The current documentation says that the default value for `connection_timeout` is 0, so you might think there is no timeout by default. In fact, if we don't set this option, the timeout will be 600ms; see:
vault/physical/cassandra/cassandra.go, lines 127 to 133 at 0e8c6c2. `cluster.Timeout` is not changed anywhere else, so it keeps gocql's default value: vault/vendor/github.com/gocql/gocql/cluster.go, line 49 at 665d668.
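To make the behavior concrete, here is a minimal sketch (not the exact Vault code; the override logic is paraphrased, and the configured value of 5 is hypothetical):

```go
package main

import (
	"fmt"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1")

	// gocql's NewCluster already initializes Timeout to 600ms,
	// so an unset connection_timeout does not mean "no timeout".
	fmt.Println(cluster.Timeout) // 600ms

	// Roughly what the backend does when connection_timeout is set
	// (the value is interpreted as seconds):
	connectionTimeout := 5 // hypothetical configured value
	cluster.Timeout = time.Duration(connectionTimeout) * time.Second
	fmt.Println(cluster.Timeout) // 5s
}
```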
If we have a Cassandra cluster with several nodes, we don't want to get an error when one of the nodes goes down. To support this behavior with gocql (which is used in the Cassandra backend), we must set a RetryPolicy on the cluster: https://github.com/gocql/gocql/blob/5913df4d474e0b2492a129d17bbb3c04537a15cd/policies.go#L158
By default (the current behavior), the cluster uses the "retry on the same connection" policy, so if the current active node is down, the client will get an error:
https://github.com/gocql/gocql/blob/5913df4d474e0b2492a129d17bbb3c04537a15cd/policies.go#L141
We can easily fix this by using SimpleRetryPolicy (retry on another connection) when creating the cluster.
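Below is a minimal sketch of that fix against plain gocql; the NumRetries value of 3 is illustrative, not necessarily the PR's default:

```go
package main

import (
	"fmt"

	"github.com/gocql/gocql"
)

// newCluster builds a cluster config whose failed queries are retried
// on other connections instead of erroring out on the first dead node.
func newCluster(hosts ...string) *gocql.ClusterConfig {
	cluster := gocql.NewCluster(hosts...)

	// Replace the default "retry on the same connection" behavior:
	// with SimpleRetryPolicy, gocql re-runs the query, picking another
	// connection for each attempt.
	cluster.RetryPolicy = &gocql.SimpleRetryPolicy{NumRetries: 3}
	return cluster
}

func main() {
	cluster := newCluster("cassandra1", "cassandra2", "cassandra3")
	fmt.Println(cluster.RetryPolicy)
}
```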
Also, one new option is added to set the timeout for the initial connection.
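For illustration, this presumably maps onto gocql's ConnectTimeout (the dial/handshake timeout for establishing a connection), as opposed to the per-query Timeout above; the 5-second values are illustrative:

```go
package main

import (
	"fmt"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Timeout = 5 * time.Second        // per-query timeout (connection_timeout)
	cluster.ConnectTimeout = 5 * time.Second // timeout for the initial connection
	fmt.Println(cluster.Timeout, cluster.ConnectTimeout)
}
```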
I tried to keep the changes as small as possible.
This PR doesn't change the behavior of current installations.