K8S Cluster Deployment Failed with "Flaky agency communication to http+ssl://cluster-agent-1vmeax62.cluster-int.default.svc:8529. Unsuccessful" #539

Closed
IsurangaPerera opened this issue Mar 26, 2020 · 2 comments

IsurangaPerera commented Mar 26, 2020

I tried to deploy kube-arangodb as a cluster on K8S. All agent and primary pods were deployed successfully. However, I cannot use the cluster because the crdn (coordinator) pods are not being initialized properly.

Please find the logs from the crdn pods below.

2020-03-26T10:31:09Z [1] INFO [e52b0] ArangoDB 3.6.0 [linux] 64bit, using jemalloc, build tags/v3.6.0-0-g08785b946a, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.1d  10 Sep 2019
2020-03-26T10:31:09Z [1] INFO [75ddc] detected operating system: Linux version 4.4.0-173-generic (buildd@lgw01-amd64-037) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12) ) #203-Ubuntu SMP Wed Jan 15 02:55:01 UTC 2020
2020-03-26T10:31:09Z [1] WARNING [118b0] {memory} maximum number of memory mappings per process is 65530, which seems too low. it is recommended to set it to at least 512000
2020-03-26T10:31:09Z [1] WARNING [49528] {memory} execute 'sudo sysctl -w "vm.max_map_count=512000"'
2020-03-26T10:31:09Z [1] WARNING [e8b68] {memory} /sys/kernel/mm/transparent_hugepage/defrag is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2020-03-26T10:31:09Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"'
2020-03-26T10:31:09Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/defrag"'
2020-03-26T10:31:09Z [1] INFO [4a3fc] using storage engine rocksdb
2020-03-26T10:31:09Z [1] INFO [3bb7d] {cluster} Starting up with role COORDINATOR
2020-03-26T10:32:59Z [1] INFO [2f181] {agencycomm} Flaky agency communication to http+ssl://cluster-agent-sjafrn4j.cluster-int.default.svc:8529. Unsuccessful consecutive tries: 21 (109.95s). Network checks advised. Server in prepare.
2020-03-26T10:33:05Z [1] INFO [2f181] {agencycomm} Flaky agency communication to http+ssl://cluster-agent-1vmeax62.cluster-int.default.svc:8529. Unsuccessful consecutive tries: 22 (115.94s). Network checks advised. Server in prepare.
2020-03-26T10:35:01Z [1] INFO [2f181] {agencycomm} Flaky agency communication to http+ssl://cluster-agent-1vmeax62.cluster-int.default.svc:8529. Unsuccessful consecutive tries: 21 (109.95s). Network checks advised. Server in prepare.
2020-03-26T10:35:07Z [1] INFO [2f181] {agencycomm} Flaky agency communication to http+ssl://cluster-agent-7gkuvt73.cluster-int.default.svc:8529. Unsuccessful consecutive tries: 22 (115.95s). Network checks advised. Server in prepare.

Please find logs of each agent pod below:

Agent Pod 01

2020-03-26T09:59:57Z [1] INFO [e52b0] ArangoDB 3.6.0 [linux] 64bit, using jemalloc, build tags/v3.6.0-0-g08785b946a, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.1d  10 Sep 2019
2020-03-26T09:59:57Z [1] INFO [75ddc] detected operating system: Linux version 4.4.0-173-generic (buildd@lgw01-amd64-037) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12) ) #203-Ubuntu SMP Wed Jan 15 02:55:01 UTC 2020
2020-03-26T09:59:57Z [1] WARNING [118b0] {memory} maximum number of memory mappings per process is 65530, which seems too low. it is recommended to set it to at least 512000
2020-03-26T09:59:57Z [1] WARNING [49528] {memory} execute 'sudo sysctl -w "vm.max_map_count=512000"'
2020-03-26T09:59:57Z [1] WARNING [e8b68] {memory} /sys/kernel/mm/transparent_hugepage/defrag is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2020-03-26T09:59:57Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"'
2020-03-26T09:59:57Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/defrag"'
2020-03-26T09:59:57Z [1] INFO [144fe] using storage engine 'rocksdb'
2020-03-26T09:59:57Z [1] INFO [3bb7d] {cluster} Starting up with role AGENT
2020-03-26T09:59:57Z [1] INFO [6ea38] using endpoint 'http+ssl://[::]:8529' for ssl-encrypted requests
2020-03-26T09:59:57Z [1] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 1048576, soft limit is 1048576
2020-03-26T09:59:57Z [1] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2020-03-26T09:59:57Z [1] WARNING [ad4b2] found existing lockfile '/data/LOCK' of previous process with pid 1, and that process seems to be still running
2020-03-26T09:59:57Z [1] INFO [e6460] created base application directory '/var/lib/arangodb3-apps/_db'
2020-03-26T09:59:57Z [1] INFO [cf3f4] ArangoDB (version 3.6.0 [linux]) is ready for business. Have fun!
2020-03-26T09:59:57Z [1] INFO [d7476] {agency} Restarting agent from persistence ...
2020-03-26T09:59:57Z [1] INFO [9530f] {agency} Found majority of agents in agreement over active pool. Finishing startup sequence.
2020-03-26T09:59:57Z [1] INFO [79fd7] {agency} Activating agent.

Agent Pod 02

2020-03-26T10:00:50Z [1] INFO [e52b0] ArangoDB 3.6.0 [linux] 64bit, using jemalloc, build tags/v3.6.0-0-g08785b946a, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.1d  10 Sep 2019
2020-03-26T10:00:50Z [1] INFO [75ddc] detected operating system: Linux version 4.4.0-139-generic (buildd@lcy01-amd64-006) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10) ) #165-Ubuntu SMP Wed Oct 24 10:58:50 UTC 2018
2020-03-26T10:00:50Z [1] WARNING [118b0] {memory} maximum number of memory mappings per process is 262144, which seems too low. it is recommended to set it to at least 512000
2020-03-26T10:00:50Z [1] WARNING [49528] {memory} execute 'sudo sysctl -w "vm.max_map_count=512000"'
2020-03-26T10:00:50Z [1] WARNING [e8b68] {memory} /sys/kernel/mm/transparent_hugepage/defrag is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2020-03-26T10:00:50Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"'
2020-03-26T10:00:50Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/defrag"'
2020-03-26T10:00:50Z [1] INFO [144fe] using storage engine 'rocksdb'
2020-03-26T10:00:50Z [1] INFO [3bb7d] {cluster} Starting up with role AGENT
2020-03-26T10:00:50Z [1] INFO [6ea38] using endpoint 'http+ssl://[::]:8529' for ssl-encrypted requests
2020-03-26T10:00:50Z [1] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 1048576, soft limit is 1048576
2020-03-26T10:00:50Z [1] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2020-03-26T10:00:50Z [1] INFO [e6460] created base application directory '/var/lib/arangodb3-apps/_db'
2020-03-26T10:00:50Z [1] INFO [cf3f4] ArangoDB (version 3.6.0 [linux]) is ready for business. Have fun!
2020-03-26T10:00:50Z [1] INFO [7b6f3] {agency} Entering gossip phase ...
2020-03-26T10:00:50Z [1] INFO [95b8d] {agency} Adding PRMR-prm3mcef(ssl://cluster-agent-sjafrn4j.cluster-int.default.svc:8529) to agent pool

Agent Pod 03

2020-03-26T10:00:40Z [1] INFO [e52b0] ArangoDB 3.6.0 [linux] 64bit, using jemalloc, build tags/v3.6.0-0-g08785b946a, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.1d  10 Sep 2019
2020-03-26T10:00:40Z [1] INFO [75ddc] detected operating system: Linux version 4.4.0-171-generic (buildd@lcy01-amd64-018) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12) ) #200-Ubuntu SMP Tue Dec 3 11:04:55 UTC 2019
2020-03-26T10:00:40Z [1] WARNING [118b0] {memory} maximum number of memory mappings per process is 262144, which seems too low. it is recommended to set it to at least 512000
2020-03-26T10:00:40Z [1] WARNING [49528] {memory} execute 'sudo sysctl -w "vm.max_map_count=512000"'
2020-03-26T10:00:40Z [1] WARNING [e8b68] {memory} /sys/kernel/mm/transparent_hugepage/defrag is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2020-03-26T10:00:40Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"'
2020-03-26T10:00:40Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/defrag"'
2020-03-26T10:00:40Z [1] INFO [144fe] using storage engine 'rocksdb'
2020-03-26T10:00:40Z [1] INFO [3bb7d] {cluster} Starting up with role AGENT
2020-03-26T10:00:40Z [1] INFO [6ea38] using endpoint 'http+ssl://[::]:8529' for ssl-encrypted requests
2020-03-26T10:00:40Z [1] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 1048576, soft limit is 1048576
2020-03-26T10:00:40Z [1] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2020-03-26T10:00:40Z [1] INFO [e6460] created base application directory '/var/lib/arangodb3-apps/_db'
2020-03-26T10:00:40Z [1] INFO [cf3f4] ArangoDB (version 3.6.0 [linux]) is ready for business. Have fun!
2020-03-26T10:00:40Z [1] INFO [7b6f3] {agency} Entering gossip phase ...
2020-03-26T10:00:40Z [1] INFO [95b8d] {agency} Adding AGNT-7gkuvt73(ssl://cluster-agent-7gkuvt73.cluster-int.default.svc:8529) to agent pool

@ajanikow @informalict @maierlars I would appreciate your help with this issue.

@IsurangaPerera changed the title from 'Flaky agency communication to http+ssl://cluster-agent-1vmeax62.cluster-int.default.svc:8529. Unsuccessful consecutive tries: 22 (115.94s). Network checks advised. Server in prepare' to 'K8S Cluster Deployment Failed with "Flaky agency communication to http+ssl://cluster-agent-1vmeax62.cluster-int.default.svc:8529. Unsuccessful"' on Mar 26, 2020
@ajanikow ajanikow self-assigned this Mar 26, 2020
@ajanikow (Collaborator)

Hello!

It looks like you have a split in your network. Can you exec into a crdn pod and try to curl any of the agency pods?

Also, please share which network settings you are using for the pods.

Best Regards,
Adam.
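
For anyone hitting the same message, here is a minimal sketch of the reachability check suggested above, written in Python rather than curl. It assumes Python 3 is available in whichever pod you run it from (the ArangoDB image may not ship it, in which case any debug pod in the same namespace should do). The endpoint list is copied from the coordinator log in this issue; `parse_endpoint`, `check_agent`, and `main` are illustrative names, not part of kube-arangodb. It only tests DNS resolution (IPv4), the TCP connection, and the TLS handshake. It does not authenticate or call any ArangoDB API.

```python
import socket
import ssl
from typing import Tuple
from urllib.parse import urlsplit

# Agent endpoints as they appear in the coordinator log of this issue.
AGENT_ENDPOINTS = [
    "http+ssl://cluster-agent-sjafrn4j.cluster-int.default.svc:8529",
    "http+ssl://cluster-agent-1vmeax62.cluster-int.default.svc:8529",
    "http+ssl://cluster-agent-7gkuvt73.cluster-int.default.svc:8529",
]


def parse_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split an endpoint such as 'http+ssl://host:8529' into (host, port)."""
    parts = urlsplit(endpoint)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"cannot parse host and port from {endpoint!r}")
    return parts.hostname, parts.port


def check_agent(endpoint: str, timeout: float = 5.0) -> str:
    """Resolve the host, open a TCP connection, and attempt a TLS handshake."""
    host, port = parse_endpoint(endpoint)
    try:
        addr = socket.gethostbyname(host)  # IPv4 lookup only
    except OSError as exc:
        return f"{host}: DNS lookup failed ({exc})"

    # Certificate verification is disabled on purpose: this sketch tests
    # reachability only and assumes the cluster may use a self-signed CA.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((addr, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return f"{host}: reachable at {addr}:{port}, {tls.version()}"
    except OSError as exc:
        return f"{host}: resolved to {addr} but connection failed ({exc})"


def main() -> None:
    """Print one status line per agent endpoint."""
    for endpoint in AGENT_ENDPOINTS:
        print(check_agent(endpoint))
```

Calling `main()` from inside a crdn pod (or another pod in the same namespace) prints one line per agent. A DNS failure points at the service or cluster DNS setup, while a connection failure after a successful lookup points at pod networking or a network policy.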

@IsurangaPerera (Author)

Thanks, @ajanikow.
Sorry, this was an issue on my side, so I am closing the issue.
