Problems electing leader in HA mode #1908

Closed
elephantfries opened this issue Sep 21, 2016 · 1 comment · Fixed by #1909

@elephantfries

Hi,

I am having serious problems getting Vault to elect a leader properly in HA mode, and the behavior is inconsistent. At times it elects one leader, but the other nodes don't know who it is even though they stay in standby mode (so they somehow know a leader exists).
At other times, as in the case I see right now, two nodes each elect themselves leader while the third remains in standby, again not knowing who the leader is.

Configuration:
Three Vault 0.6.1 nodes with the etcd backend, each using the following config, where redirect_addr
points to the respective instance (vault1, vault2, vault3):

backend "etcd" {
    address = "http://10.0.31.175:2379,http://10.0.28.147:2379,http://10.0.28.148:2379"
    redirect_addr = "https://vault1.xxxxxx.org:8200"
    path = "/vault"
}

Node 1 log:

Sep 21 07:36:41 atomic1.localdomain docker[21884]: ==> Vault server configuration:
Sep 21 07:36:41 atomic1.localdomain docker[21884]: Backend: etcd (HA available)
Sep 21 07:36:41 atomic1.localdomain docker[21884]: Listener 1: tcp (addr: "0.0.0.0:8200", tls: "enabled")
Sep 21 07:36:41 atomic1.localdomain docker[21884]: Log Level: info
Sep 21 07:36:41 atomic1.localdomain docker[21884]: Mlock: supported: true, enabled: true
Sep 21 07:36:41 atomic1.localdomain docker[21884]: Redirect Address: https://vault1.xxxxxx.org:8200
Sep 21 07:36:41 atomic1.localdomain docker[21884]: Version: Vault v0.6.1
Sep 21 07:36:41 atomic1.localdomain docker[21884]: ==> Vault server started! Log data will stream in below:
Sep 21 07:36:42 atomic1.localdomain docker[21884]: 2016/09/21 14:36:42.796947 [INF] core: vault is unsealed
Sep 21 07:36:42 atomic1.localdomain docker[21884]: 2016/09/21 14:36:42.797117 [INF] core: entering standby mode
Sep 21 07:36:42 atomic1.localdomain bash[21885]: {"sealed":false,"t":1,"n":1,"progress":0,"version":"Vault v0.6.1","cluster_name":"vault-cluster-2e2bf603","cluster_id":"98224978-76
Sep 21 07:36:42 atomic1.localdomain systemd[1]: Started vault service.

curl -sL https://vault1.xxxxxx.org:8200/v1/sys/leader | jq .
{
"ha_enabled": true,
"is_self": false,
"leader_address": ""
}

Node 2 log:

Sep 21 07:26:36 atomic2.localdomain docker[6329]: ==> Vault server configuration:
Sep 21 07:26:36 atomic2.localdomain docker[6329]: Backend: etcd (HA available)
Sep 21 07:26:36 atomic2.localdomain docker[6329]: Listener 1: tcp (addr: "0.0.0.0:8200", tls: "enabled")
Sep 21 07:26:36 atomic2.localdomain docker[6329]: Log Level: info
Sep 21 07:26:36 atomic2.localdomain docker[6329]: Mlock: supported: true, enabled: true
Sep 21 07:26:36 atomic2.localdomain docker[6329]: Redirect Address: https://vault2.xxxxxx.org:8200
Sep 21 07:26:36 atomic2.localdomain docker[6329]: Version: Vault v0.6.1
Sep 21 07:26:36 atomic2.localdomain docker[6329]: ==> Vault server started! Log data will stream in below:
Sep 21 07:26:37 atomic2.localdomain docker[6329]: 2016/09/21 14:26:37.283070 [INF] core: vault is unsealed
Sep 21 07:26:37 atomic2.localdomain docker[6329]: 2016/09/21 14:26:37.283248 [INF] core: entering standby mode
Sep 21 07:26:37 atomic2.localdomain bash[6330]: {"sealed":false,"t":1,"n":1,"progress":0,"version":"Vault v0.6.1","cluster_name":"vault-cluster-2e2bf603","cluster_id":"98224978-76c
Sep 21 07:26:37 atomic2.localdomain systemd[1]: Started vault service.
Sep 21 07:26:37 atomic2.localdomain docker[6329]: 2016/09/21 14:26:37.373143 [INF] core: acquired lock, enabling active operation
Sep 21 07:26:37 atomic2.localdomain docker[6329]: 2016/09/21 14:26:37.464273 [INF] core: post-unseal setup starting
Sep 21 07:26:37 atomic2.localdomain docker[6329]: 2016/09/21 14:26:37.467159 [INF] core: successfully mounted backend type=generic path=secret/
Sep 21 07:26:37 atomic2.localdomain docker[6329]: 2016/09/21 14:26:37.467198 [INF] core: successfully mounted backend type=cubbyhole path=cubbyhole/
Sep 21 07:26:37 atomic2.localdomain docker[6329]: 2016/09/21 14:26:37.467305 [INF] core: successfully mounted backend type=system path=sys/
Sep 21 07:26:37 atomic2.localdomain docker[6329]: 2016/09/21 14:26:37.467386 [INF] rollback: starting rollback manager
Sep 21 07:26:37 atomic2.localdomain docker[6329]: 2016/09/21 14:26:37.474233 [INF] core/startClusterListener: clustering disabled, not starting listeners
Sep 21 07:26:37 atomic2.localdomain docker[6329]: 2016/09/21 14:26:37.474259 [INF] core: post-unseal setup complete

curl -sL https://vault2.xxxxxx.org:8200/v1/sys/leader | jq .
{
"ha_enabled": true,
"is_self": true,
"leader_address": "https://vault2.xxxxxx.org:8200"
}

Node 3 log:

Sep 21 07:30:09 atomic3.localdomain docker[15139]: ==> Vault server configuration:
Sep 21 07:30:09 atomic3.localdomain docker[15139]: Backend: etcd (HA available)
Sep 21 07:30:09 atomic3.localdomain docker[15139]: Listener 1: tcp (addr: "0.0.0.0:8200", tls: "enabled")
Sep 21 07:30:09 atomic3.localdomain docker[15139]: Log Level: info
Sep 21 07:30:09 atomic3.localdomain docker[15139]: Mlock: supported: true, enabled: true
Sep 21 07:30:09 atomic3.localdomain docker[15139]: Redirect Address: https://vault3.xxxxxx.org:8200
Sep 21 07:30:09 atomic3.localdomain docker[15139]: Version: Vault v0.6.1
Sep 21 07:30:09 atomic3.localdomain docker[15139]: ==> Vault server started! Log data will stream in below:
Sep 21 07:30:10 atomic3.localdomain docker[15139]: 2016/09/21 14:30:10.138376 [INF] core: vault is unsealed
Sep 21 07:30:10 atomic3.localdomain docker[15139]: 2016/09/21 14:30:10.138573 [INF] core: entering standby mode
Sep 21 07:30:10 atomic3.localdomain bash[15140]: {"sealed":false,"t":1,"n":1,"progress":0,"version":"Vault v0.6.1","cluster_name":"vault-cluster-2e2bf603","cluster_id":"98224978-76
Sep 21 07:30:10 atomic3.localdomain systemd[1]: Started vault service.
Sep 21 07:36:48 atomic3.localdomain docker[15139]: 2016/09/21 14:36:48.426826 [INF] core: acquired lock, enabling active operation
Sep 21 07:36:48 atomic3.localdomain docker[15139]: 2016/09/21 14:36:48.526453 [INF] core: post-unseal setup starting
Sep 21 07:36:48 atomic3.localdomain docker[15139]: 2016/09/21 14:36:48.529206 [INF] core: successfully mounted backend type=generic path=secret/
Sep 21 07:36:48 atomic3.localdomain docker[15139]: 2016/09/21 14:36:48.529244 [INF] core: successfully mounted backend type=cubbyhole path=cubbyhole/
Sep 21 07:36:48 atomic3.localdomain docker[15139]: 2016/09/21 14:36:48.529353 [INF] core: successfully mounted backend type=system path=sys/
Sep 21 07:36:48 atomic3.localdomain docker[15139]: 2016/09/21 14:36:48.529475 [INF] rollback: starting rollback manager
Sep 21 07:36:48 atomic3.localdomain docker[15139]: 2016/09/21 14:36:48.536051 [INF] core/startClusterListener: clustering disabled, not starting listeners
Sep 21 07:36:48 atomic3.localdomain docker[15139]: 2016/09/21 14:36:48.536090 [INF] core: post-unseal setup complete

curl -sL https://vault3.xxxxxx.org:8200/v1/sys/leader | jq .
{
"ha_enabled": true,
"is_self": true,
"leader_address": "https://vault3.xxxxxx.org:8200"
}
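
The three curl checks above can be automated. The following is a minimal sketch, not part of the original report: it assumes each node answers `/v1/sys/leader` with the three fields shown above, and the function names (`fetch_leader_status`, `self_declared_leaders`, `diagnose`) are hypothetical helpers introduced only for illustration.

```python
import json
import urllib.request


def fetch_leader_status(base_url, timeout=5.0):
    """GET <base_url>/v1/sys/leader and return the decoded JSON body."""
    url = f"{base_url}/v1/sys/leader"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def self_declared_leaders(statuses):
    """Return, sorted, the nodes whose response reports is_self == true."""
    return sorted(node for node, s in statuses.items() if s.get("is_self"))


def diagnose(statuses):
    """Summarise a mapping of node name -> /v1/sys/leader response."""
    leaders = self_declared_leaders(statuses)
    if len(leaders) > 1:
        # More than one node believes it holds the HA lock.
        return "split-brain: " + ", ".join(leaders)
    if not leaders:
        return "no active node"
    # Exactly one leader: list nodes that report an empty leader_address.
    unaware = sorted(n for n, s in statuses.items() if not s.get("leader_address"))
    if unaware:
        return f"leader {leaders[0]}, unknown to: " + ", ".join(unaware)
    return f"healthy: leader {leaders[0]}"


if __name__ == "__main__":
    # Hostnames as redacted in the report; replace with real addresses.
    nodes = [
        "https://vault1.xxxxxx.org:8200",
        "https://vault2.xxxxxx.org:8200",
        "https://vault3.xxxxxx.org:8200",
    ]
    print(diagnose({n: fetch_leader_status(n) for n in nodes}))
```

Fed the three responses shown above, `diagnose` would report a split-brain between vault2 and vault3. The check only reads each node's own view, so it detects the symptom and says nothing about the cause in the etcd lock.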

@jefferai
Member

etcd HA is known to be broken (#1184). We're considering taking the same approach we took with DynamoDB and requiring users to explicitly enable it.

jefferai added a commit that referenced this issue Sep 21, 2016
Fixes #1908

(Doesn't really "fix" it but someone from the community needs to step up
if they want to see this fixed.)