Agents Stuck in loop on Docker For Windows #366

Open
AllySagaWest opened this issue Mar 30, 2019 · 3 comments

@AllySagaWest

I'm attempting to run a Linux container in Docker for Windows with a Kubernetes ArangoDB cluster. It starts up, but the agents get stuck in an endless loop of starting in error, terminating, and then re-initializing. I also noticed that my load balancer is set to localhost instead of an external IP for reaching the pods from outside the cluster. I'm not quite sure what I'm doing wrong; running the same commands on a Linux machine works fine. Any help would be appreciated, and let me know if you need more info. Logs and screenshots are below.

Here is the YAML used to deploy the cluster:
clusterYaml
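
(Since the clusterYaml attachment is not inlined here, the following is a rough reconstruction of the manifest based on the Spec section of the kubectl describe output further down; field names follow the database.arangodb.com/v1alpha API, and anything not shown in that output is an assumption, not the exact file that was applied.)

apiVersion: "database.arangodb.com/v1alpha"
kind: "ArangoDeployment"
metadata:
  name: arango-cluster
  namespace: default
spec:
  mode: Cluster
  environment: Development
  image: arangodb/arangodb:3.4.4
  storageEngine: RocksDB
  externalAccess:
    type: LoadBalancer
  agents:
    count: 3
    resources:
      requests:
        storage: 8Gi
  dbservers:
    count: 3
    resources:
      requests:
        storage: 8Gi
  coordinators:
    count: 3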

Here is a screenshot of the agents terminating:
AgentTerminating

And here they are re-initializing:
AgentInit

The load balancer service:
LoadBalancer
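
(Regarding the localhost address: Docker Desktop's built-in Kubernetes generally publishes the EXTERNAL-IP of a LoadBalancer Service as localhost, so seeing localhost there is expected on that platform. Something like the following shows what the Service actually received; the arango-cluster-ea name is an assumption based on the operator's usual <deployment>-ea naming for the external-access Service:)

kubectl get svc -n default
kubectl get svc arango-cluster-ea -n default -o wide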

And here is the output of kubectl describe on the deployment:
Name: arango-cluster
Namespace: default
Labels:
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"database.arangodb.com/v1alpha","kind":"ArangoDeployment","metadata":{"annotations":{},"name":"arango-cluster","namespace":"...
API Version: database.arangodb.com/v1alpha
Kind: ArangoDeployment
Metadata:
Cluster Name:
Creation Timestamp: 2019-03-29T02:06:59Z
Finalizers:
database.arangodb.com/remove-child-finalizers
Generation: 1
Resource Version: 71178
Self Link: /apis/database.arangodb.com/v1alpha/namespaces/default/arangodeployments/arango-cluster
UID: 560ccde7-51c7-11e9-b4f7-00155d001410
Spec:
Agents:
Count: 3
Resources:
Requests:
Storage: 8Gi
Auth:
Jwt Secret Name: arango-cluster-jwt
Chaos:
Interval: 60000000000
Kill - Pod - Probability: 50
Coordinators:
Count: 3
Resources:
Dbservers:
Count: 3
Resources:
Requests:
Storage: 8Gi
Environment: Development
External Access:
Type: LoadBalancer
Image: arangodb/arangodb:3.4.4
Image Pull Policy: IfNotPresent
License:
Mode: Cluster
Rocksdb:
Encryption:
Single:
Resources:
Requests:
Storage: 8Gi
Storage Engine: RocksDB
Sync:
Auth:
Client CA Secret Name: arango-cluster-sync-client-auth-ca
Jwt Secret Name: arango-cluster-sync-jwt
External Access:
Monitoring:
Token Secret Name: arango-cluster-sync-mt
Tls:
Ca Secret Name: arango-cluster-sync-ca
Ttl: 2610h
Syncmasters:
Resources:
Syncworkers:
Resources:
Tls:
Ca Secret Name: arango-cluster-ca
Ttl: 2610h
Status:
Accepted - Spec:
Agents:
Count: 3
Resources:
Requests:
Storage: 8Gi
Auth:
Jwt Secret Name: arango-cluster-jwt
Chaos:
Interval: 60000000000
Kill - Pod - Probability: 50
Coordinators:
Count: 3
Resources:
Dbservers:
Count: 3
Resources:
Requests:
Storage: 8Gi
Environment: Development
External Access:
Type: LoadBalancer
Image: arangodb/arangodb:3.4.4
Image Pull Policy: IfNotPresent
License:
Mode: Cluster
Rocksdb:
Encryption:
Single:
Resources:
Requests:
Storage: 8Gi
Storage Engine: RocksDB
Sync:
Auth:
Client CA Secret Name: arango-cluster-sync-client-auth-ca
Jwt Secret Name: arango-cluster-sync-jwt
External Access:
Monitoring:
Token Secret Name: arango-cluster-sync-mt
Tls:
Ca Secret Name: arango-cluster-sync-ca
Ttl: 2610h
Syncmasters:
Resources:
Syncworkers:
Resources:
Tls:
Ca Secret Name: arango-cluster-ca
Ttl: 2610h
Arangodb - Images:
Arangodb - Version: 3.4.4
Image: arangodb/arangodb:3.4.4
Image - Id: arangodb/arangodb@sha256:56abd87cc340a29f9d60e61a8941afb962b18a22c072fe6b77397fa24d531c04
Conditions:
Last Transition Time: 2019-03-29T02:07:07Z
Last Update Time: 2019-03-29T02:07:07Z
Status: False
Type: Ready
Current - Image:
Arangodb - Version: 3.4.4
Image: arangodb/arangodb:3.4.4
Image - Id: arangodb/arangodb@sha256:56abd87cc340a29f9d60e61a8941afb962b18a22c072fe6b77397fa24d531c04
Members:
Agents:
Conditions:
Last Transition Time: 2019-03-29T02:14:38Z
Last Update Time: 2019-03-29T02:14:38Z
Reason: Pod Not Ready
Status: False
Type: Ready
Last Transition Time: 2019-03-29T02:15:05Z
Last Update Time: 2019-03-29T02:15:05Z
Reason: Pod Failed
Status: True
Type: Terminated
Last Transition Time: 2019-03-29T02:15:39Z
Last Update Time: 2019-03-29T02:15:39Z
Reason: Pod marked for deletion
Status: True
Type: Terminating
Created - At: 2019-03-29T02:07:01Z
Id: AGNT-bnq8tiee
Initialized: false
Persistent Volume Claim Name: arango-cluster-agent-bnq8tiee
Phase: Created
Pod Name: arango-cluster-agnt-bnq8tiee-128dfb
Recent - Terminations:
2019-03-29T02:08:04Z
2019-03-29T02:08:42Z
2019-03-29T02:09:20Z
2019-03-29T02:09:54Z
2019-03-29T02:10:29Z
2019-03-29T02:11:55Z
2019-03-29T02:13:07Z
2019-03-29T02:15:05Z
Conditions:
Last Transition Time: 2019-03-29T02:14:38Z
Last Update Time: 2019-03-29T02:14:38Z
Reason: Pod Not Ready
Status: False
Type: Ready
Last Transition Time: 2019-03-29T02:15:05Z
Last Update Time: 2019-03-29T02:15:05Z
Reason: Pod Failed
Status: True
Type: Terminated
Last Transition Time: 2019-03-29T02:15:39Z
Last Update Time: 2019-03-29T02:15:39Z
Reason: Pod marked for deletion
Status: True
Type: Terminating
Created - At: 2019-03-29T02:07:01Z
Id: AGNT-co8m3gqw
Initialized: true
Persistent Volume Claim Name: arango-cluster-agent-co8m3gqw
Phase: Created
Pod Name: arango-cluster-agnt-co8m3gqw-128dfb
Recent - Terminations:
2019-03-29T02:07:59Z
2019-03-29T02:08:38Z
2019-03-29T02:09:16Z
2019-03-29T02:09:49Z
2019-03-29T02:10:29Z
2019-03-29T02:11:55Z
2019-03-29T02:13:07Z
2019-03-29T02:15:05Z
Conditions:
Last Transition Time: 2019-03-29T02:14:38Z
Last Update Time: 2019-03-29T02:14:38Z
Reason: Pod Not Ready
Status: False
Type: Ready
Last Transition Time: 2019-03-29T02:15:05Z
Last Update Time: 2019-03-29T02:15:05Z
Reason: Pod Failed
Status: True
Type: Terminated
Last Transition Time: 2019-03-29T02:15:39Z
Last Update Time: 2019-03-29T02:15:39Z
Reason: Pod marked for deletion
Status: True
Type: Terminating
Created - At: 2019-03-29T02:07:01Z
Id: AGNT-z8htk4xo
Initialized: false
Persistent Volume Claim Name: arango-cluster-agent-z8htk4xo
Phase: Created
Pod Name: arango-cluster-agnt-z8htk4xo-128dfb
Recent - Terminations:
2019-03-29T02:08:05Z
2019-03-29T02:08:43Z
2019-03-29T02:09:21Z
2019-03-29T02:09:55Z
2019-03-29T02:10:29Z
2019-03-29T02:11:55Z
2019-03-29T02:13:07Z
2019-03-29T02:15:05Z
Coordinators:
Conditions:
Last Transition Time: 2019-03-29T02:15:05Z
Last Update Time: 2019-03-29T02:15:05Z
Reason: Pod Not Ready
Status: False
Type: Ready
Created - At: 2019-03-29T02:07:01Z
Id: CRDN-fgfmyrni
Initialized: false
Phase: Created
Pod Name: arango-cluster-crdn-fgfmyrni-128dfb
Recent - Terminations:
2019-03-29T02:14:38Z
Conditions:
Last Transition Time: 2019-03-29T02:15:05Z
Last Update Time: 2019-03-29T02:15:05Z
Reason: Pod Not Ready
Status: False
Type: Ready
Created - At: 2019-03-29T02:07:01Z
Id: CRDN-sxgd2w6y
Initialized: false
Phase: Created
Pod Name: arango-cluster-crdn-sxgd2w6y-128dfb
Recent - Terminations:
2019-03-29T02:14:38Z
Conditions:
Last Transition Time: 2019-03-29T02:15:05Z
Last Update Time: 2019-03-29T02:15:05Z
Reason: Pod Not Ready
Status: False
Type: Ready
Created - At: 2019-03-29T02:07:01Z
Id: CRDN-to1bm0c3
Initialized: false
Phase: Created
Pod Name: arango-cluster-crdn-to1bm0c3-128dfb
Recent - Terminations:
2019-03-29T02:14:38Z
Dbservers:
Conditions:
Last Transition Time: 2019-03-29T02:07:59Z
Last Update Time: 2019-03-29T02:07:59Z
Reason: Pod Ready
Status: True
Type: Ready
Created - At: 2019-03-29T02:07:01Z
Id: PRMR-32xzf09n
Initialized: true
Persistent Volume Claim Name: arango-cluster-dbserver-32xzf09n
Phase: Created
Pod Name: arango-cluster-prmr-32xzf09n-128dfb
Recent - Terminations:
Conditions:
Last Transition Time: 2019-03-29T02:07:50Z
Last Update Time: 2019-03-29T02:07:50Z
Reason: Pod Ready
Status: True
Type: Ready
Created - At: 2019-03-29T02:07:01Z
Id: PRMR-cdvjutsj
Initialized: true
Persistent Volume Claim Name: arango-cluster-dbserver-cdvjutsj
Phase: Created
Pod Name: arango-cluster-prmr-cdvjutsj-128dfb
Recent - Terminations:
Conditions:
Last Transition Time: 2019-03-29T02:08:11Z
Last Update Time: 2019-03-29T02:08:11Z
Reason: Pod Ready
Status: True
Type: Ready
Created - At: 2019-03-29T02:07:01Z
Id: PRMR-gdxuutwg
Initialized: true
Persistent Volume Claim Name: arango-cluster-dbserver-gdxuutwg
Phase: Created
Pod Name: arango-cluster-prmr-gdxuutwg-128dfb
Recent - Terminations:
Phase: Running
Plan:
Creation Time: 2019-03-29T02:12:45Z
Group: 4
Id: cJIhbLxsuV4pyc5n
Member ID: CRDN-fgfmyrni
Type: RemoveMember
Creation Time: 2019-03-29T02:12:45Z
Group: 4
Id: U1PGroR95ybUN1Km
Type: AddMember
Secret - Hashes:
Auth - Jwt: eff9b00914cc1a4511de887d056a5c4ef1b324f6b746456ca2443db98bdaeea1
Tls - Ca: d69ded4284c5f832388b3633c3db1453d91514300a07313ddf91f32b31653b45
Service Name: arango-cluster
Events:
Type Reason Age From Message


Normal New Coordinator Added 36h arango-deployment-operator-8579f476cc-fspdz New coordinator CRDN-fgfmyrni added to deployment
Normal New Agent Added 36h arango-deployment-operator-8579f476cc-fspdz New agent AGNT-co8m3gqw added to deployment
Normal New Agent Added 36h arango-deployment-operator-8579f476cc-fspdz New agent AGNT-z8htk4xo added to deployment
Normal New Dbserver Added 36h arango-deployment-operator-8579f476cc-fspdz New dbserver PRMR-32xzf09n added to deployment
Normal New Dbserver Added 36h arango-deployment-operator-8579f476cc-fspdz New dbserver PRMR-cdvjutsj added to deployment
Normal New Dbserver Added 36h arango-deployment-operator-8579f476cc-fspdz New dbserver PRMR-gdxuutwg added to deployment
Normal New Coordinator Added 36h arango-deployment-operator-8579f476cc-fspdz New coordinator CRDN-sxgd2w6y added to deployment
Normal New Coordinator Added 36h arango-deployment-operator-8579f476cc-fspdz New coordinator CRDN-to1bm0c3 added to deployment
Normal New Agent Added 36h arango-deployment-operator-8579f476cc-fspdz New agent AGNT-bnq8tiee added to deployment
Normal Pod Of Dbserver Created 36h arango-deployment-operator-8579f476cc-fspdz Pod arango-cluster-prmr-32xzf09n-128dfb of member dbserver is created
Normal Pod Of Dbserver Created 36h arango-deployment-operator-8579f476cc-fspdz Pod arango-cluster-prmr-cdvjutsj-128dfb of member dbserver is created
Normal Pod Of Dbserver Created 36h arango-deployment-operator-8579f476cc-fspdz Pod arango-cluster-prmr-gdxuutwg-128dfb of member dbserver is created
Normal Pod Of Coordinator Created 36h arango-deployment-operator-8579f476cc-fspdz Pod arango-cluster-crdn-fgfmyrni-128dfb of member coordinator is created
Normal Pod Of Coordinator Created 36h arango-deployment-operator-8579f476cc-fspdz Pod arango-cluster-crdn-sxgd2w6y-128dfb of member coordinator is created
Normal Pod Of Coordinator Created 36h arango-deployment-operator-8579f476cc-fspdz Pod arango-cluster-crdn-to1bm0c3-128dfb of member coordinator is created
Normal Pod Of Agent Created 36h (x2 over 36h) arango-deployment-operator-8579f476cc-fspdz Pod arango-cluster-agnt-co8m3gqw-128dfb of member agent is created
Normal Pod Of Agent Gone 36h arango-deployment-operator-8579f476cc-fspdz Pod arango-cluster-agnt-z8htk4xo-128dfb of member agent is gone
Normal Pod Of Agent Created 36h (x2 over 36h) arango-deployment-operator-8579f476cc-fspdz Pod arango-cluster-agnt-bnq8tiee-128dfb of member agent is created
Normal Pod Of Agent Created 36h (x2 over 36h) arango-deployment-operator-8579f476cc-fspdz Pod arango-cluster-agnt-z8htk4xo-128dfb of member agent is created
Normal Pod Of Agent Gone 36h (x2 over 36h) arango-deployment-operator-8579f476cc-fspdz Pod arango-cluster-agnt-co8m3gqw-128dfb of member agent is gone
Normal Pod Of Agent Gone 36h (x6 over 36h) arango-deployment-operator-8579f476cc-fspdz Pod arango-cluster-agnt-bnq8tiee-128dfb of member agent is gone

@maierlars
Collaborator

If possible, please provide the logs of a terminating agent.
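
(For example, something along these lines should capture them, using one of the agent pod names and the namespace from the describe output above; the --previous flag returns the logs of the last terminated container, which is what matters while the pod keeps restarting:)

kubectl logs arango-cluster-agnt-bnq8tiee-128dfb -n default
kubectl logs arango-cluster-agnt-bnq8tiee-128dfb -n default --previous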

@xelik

xelik commented Nov 8, 2019

I have a similar problem. I created a cluster on k8s (3 worker nodes) and one agent is stuck in a restart loop.

Screenshot_20191108_160504

Operator logs:

2019-11-08T15:26:48Z DBG ...inspected deployment component=deployment deployment=arangodb-cluster interval=1s operator-id=jddlm
2019-11-08T15:26:49Z DBG Inspect deployment... component=deployment deployment=arangodb-cluster operator-id=jddlm
2019-11-08T15:26:50Z DBG Not all agents are ready error="Agent http://arangodb-cluster-agent-abtwb2ag.arangodb-cluster-int.test.svc:8529 is not responding" action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1
2019-11-08T15:26:50Z DBG Action CheckProgress completed abort=false action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1 ready=false
2019-11-08T15:26:51Z DBG ...inspected deployment component=deployment deployment=arangodb-cluster interval=1s operator-id=jddlm
2019-11-08T15:26:52Z DBG Inspect deployment... component=deployment deployment=arangodb-cluster operator-id=jddlm
2019-11-08T15:26:52Z DBG Not all agents are ready error="Agent http://arangodb-cluster-agent-abtwb2ag.arangodb-cluster-int.test.svc:8529 is not responding" action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1
2019-11-08T15:26:52Z DBG Action CheckProgress completed abort=false action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1 ready=false
2019-11-08T15:26:53Z DBG ...inspected deployment component=deployment deployment=arangodb-cluster interval=1s operator-id=jddlm
2019-11-08T15:26:54Z DBG Inspect deployment... component=deployment deployment=arangodb-cluster operator-id=jddlm
2019-11-08T15:26:55Z DBG Not all agents are ready error="Agent http://arangodb-cluster-agent-abtwb2ag.arangodb-cluster-int.test.svc:8529 is not responding" action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1
2019-11-08T15:26:55Z DBG Action CheckProgress completed abort=false action-id=hOXp6q2P2aNuhvh0 action-type=WaitForMemberUp component=deployment deployment=arangodb-cluster group=agent member-id=AGNT-abtwb2ag operator-id=jddlm plan-len=1 ready=false

Logs from the restarting pod:

2019-11-08T16:02:17Z [1] INFO [e52b0] ArangoDB 3.5.0 [linux] 64bit, using jemalloc, build tags/v3.5.0-0-gc42dbe8547, VPack 0.1.33, RocksDB 6.2.0, ICU 58.1, V8 7.1.302.28, OpenSSL 1.1.0k 28 May 2019
2019-11-08T16:02:17Z [1] INFO [75ddc] detected operating system: Linux version 3.10.0-1062.4.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Oct 18 17:15:30 UTC 2019
2019-11-08T16:02:17Z [1] WARNING [118b0] {memory} maximum number of memory mappings per process is 262144, which seems too low. it is recommended to set it to at least 512000
2019-11-08T16:02:17Z [1] WARNING [49528] {memory} execute 'sudo sysctl -w "vm.max_map_count=512000"'
2019-11-08T16:02:17Z [1] WARNING [e8b68] {memory} /sys/kernel/mm/transparent_hugepage/enabled is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2019-11-08T16:02:17Z [1] WARNING [e8b68] {memory} /sys/kernel/mm/transparent_hugepage/defrag is set to 'always'. It is recommended to set it to a value of 'never' or 'madvise'
2019-11-08T16:02:17Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/enabled"'
2019-11-08T16:02:17Z [1] WARNING [f3108] {memory} execute 'sudo bash -c "echo madvise > /sys/kernel/mm/transparent_hugepage/defrag"'
2019-11-08T16:02:17Z [1] DEBUG [63a7a] host ASLR is in use for shared libraries, stack, mmap, VDSO, heap and memory managed through brk()
2019-11-08T16:02:17Z [1] DEBUG [713c0] {authentication} Not creating user manager
2019-11-08T16:02:17Z [1] DEBUG [71a76] {authentication} Setting jwt secret of size 64
2019-11-08T16:02:17Z [1] INFO [144fe] using storage engine rocksdb
2019-11-08T16:02:17Z [1] INFO [3bb7d] {cluster} Starting up with role AGENT
2019-11-08T16:02:17Z [1] INFO [a1c60] {syscall} file-descriptors (nofiles) hard limit is 1048576, soft limit is 1048576
2019-11-08T16:02:17Z [1] DEBUG [f6e04] {config} using default language 'en_US'
2019-11-08T16:02:17Z [1] INFO [3844e] {authentication} Authentication is turned on (system only), authentication for unix sockets is turned on
2019-11-08T16:02:17Z [1] DEBUG [f6e04] {config} using default language 'en_US'
2019-11-08T16:02:23Z [1] INFO [e6460] created base application directory '/var/lib/arangodb3-apps/_db'
2019-11-08T16:02:23Z [1] INFO [6ea38] using endpoint 'http+tcp://[::]:8529' for non-encrypted requests
2019-11-08T16:02:23Z [1] DEBUG [dc45a] bound to endpoint 'http+tcp://[::]:8529'
2019-11-08T16:02:23Z [1] INFO [cf3f4] ArangoDB (version 3.5.0 [linux]) is ready for business. Have fun!
2019-11-08T16:02:23Z [1] INFO [d7476] {agency} Restarting agent from persistence ...
2019-11-08T16:02:23Z [1] INFO [d96f6] {agency} Found active RAFTing agency lead by AGNT-axody2ec. Finishing startup sequence.
2019-11-08T16:02:23Z [1] INFO [fe299] {agency} Constituent::update: setting _leaderID to 'AGNT-axody2ec' in term 9
2019-11-08T16:02:23Z [1] INFO [79fd7] {agency} Activating agent.
2019-11-08T16:02:23Z [1] INFO [29175] {agency} Setting role to follower in term 9
2019-11-08T16:02:29Z [1] INFO [aefab] {agency} AGNT-abtwb2ag: candidating in term 9
2019-11-08T16:02:29Z [1] DEBUG [74339] accept failed: Operation canceled
2019-11-08T16:02:30Z [1] INFO [4bcb9] ArangoDB has been shut down

The memory limit for the agent pod is set to 2Gi.
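
(For reference, that limit lives in the agents group of the ArangoDeployment resource; a minimal sketch of the relevant fragment, where only the 2Gi value is taken from this report, would be:)

spec:
  agents:
    resources:
      limits:
        memory: 2Gi   # the limit mentioned above; a matching requests.memory would sit alongside it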

@iMoses

iMoses commented Jun 10, 2020

Same issue here; has anyone found a solution for it?
It started when I tried to set dbservers.count to 0, and now it's stuck in a restart loop. No matter what I do, I can't get it to stop.
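
(Two things that may help anyone stuck in the same state: the operator's pending actions are recorded under Status.Plan on the custom resource, and spec.dbservers.count can be patched back up. The deployment name and the count of 3 below are placeholders:)

kubectl get arangodeployment <deployment-name> -o jsonpath='{.status.plan}'
kubectl patch arangodeployment <deployment-name> --type merge -p '{"spec":{"dbservers":{"count":3}}}'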
