runtime error: index out of range #1237

Closed

cgouteix opened this issue Feb 16, 2018 · 8 comments
@cgouteix (Author)

Summary

After upgrading to 1.17.0, the agent container is unable to start on some instances, seemingly at random.

Description

I'm using ECS with an EC2 Auto Scaling group. All my containers have a Docker HEALTHCHECK enabled.

I'm using ami-63cbae0c (amzn-ami-2017.09.h-amazon-ecs-optimized) in eu-central-1. I was on ami-eacf5d85 (amzn-ami-2017.09.f-amazon-ecs-optimized) with no such trouble.

After upgrading to 1.17.0 I get this behavior randomly (on some EC2 instances, not all):

  • Tasks could not be started, with this event message: service xxx was unable to place a task because no container instance met all of its requirements. The closest matching container-instance xxx encountered error "AGENT". For more information, see the Troubleshooting section.
  • In the ECS Instances tab, the "Agent Connected" column says false for the previously mentioned instance.
  • After logging in to the instance, I can see that the ecs-agent container is perpetually restarting. Its last log message is panic: runtime error: index out of range (see the trace below).

Environment Details

$ docker info
Containers: 35
 Running: 29
 Paused: 0
 Stopped: 6
Images: 6
Server Version: 17.09.1-ce
Storage Driver: devicemapper
 Pool Name: docker-docker--pool
 Pool Blocksize: 524.3kB
 Base Device Size: 10.74GB
 Backing Filesystem: ext4
 Data file: 
 Metadata file: 
 Data Space Used: 2.904GB
 Data Space Total: 23.33GB
 Data Space Available: 20.42GB
 Metadata Space Used: 2.707MB
 Metadata Space Total: 25.17MB
 Metadata Space Available: 22.46MB
 Thin Pool Minimum Free Space: 2.333GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.77-31.58.amzn1.x86_64
Operating System: Amazon Linux AMI 2017.09
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67GiB
Name: ip-10-93-4-161
ID: YFQZ:3NTN:N2RV:LFFW:UTLS:SWUY:5UES:324C:ZB5I:67LA:TI76:NMAA
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
$ curl http://localhost:51678/v1/metadata
{"Cluster":"xxx","ContainerInstanceArn":"arn:aws:ecs:eu-central-1:xxx","Version":"Amazon ECS Agent - v1.17.0 (761937f7)"}
$ df -h
Filesystem         Size   Used  Avail Use% Mounted on
/dev/xvda1         7.8G   769M  7.0G  10% /
devtmpfs           7.9G   196K  7.9G   1% /dev
tmpfs              7.9G      0  7.9G   0% /dev/shm

Supporting Log Snippets

It is very hard to catch, because the agent container is recreated every few seconds:

$ docker logs -f ecs-agent
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/aws/amazon-ecs-agent/agent/engine.metadataFromContainer(0xc4204f2000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_client.go:792 +0x7b2
github.com/aws/amazon-ecs-agent/agent/engine.(*dockerGoClient).DescribeContainer(0xc420281b80, 0xc4201cb8c0, 0x40, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_client.go:590 +0x101
github.com/aws/amazon-ecs-agent/agent/engine.(*DockerTaskEngine).synchronizeContainerStatus(0xc4202a8000, 0xc42029bb90, 0xc4202a8140)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_task_engine.go:349 +0x5c7
github.com/aws/amazon-ecs-agent/agent/engine.(*DockerTaskEngine).synchronizeState(0xc4202a8000)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_task_engine.go:276 +0x295
github.com/aws/amazon-ecs-agent/agent/engine.(*DockerTaskEngine).Init(0xc4202a8000, 0x1002040, 0xc42001a028, 0x2, 0x410bf7)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_task_engine.go:199 +0xf6
github.com/aws/amazon-ecs-agent/agent/engine.(*DockerTaskEngine).MustInit.func1(0x0, 0x0)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_task_engine.go:227 +0x56
github.com/aws/amazon-ecs-agent/agent/utils.RetryWithBackoffCtx(0x1002040, 0xc42001a028, 0xffda00, 0xc420313500, 0xc4200ebc68, 0xc4200ebc01, 0xc420313500)
	/go/src/github.com/aws/amazon-ecs-agent/agent/utils/utils.go:130 +0x87
github.com/aws/amazon-ecs-agent/agent/utils.RetryWithBackoff(0xffda00, 0xc420313500, 0xc4200ebc68, 0xf, 0xc4203a6200)
	/go/src/github.com/aws/amazon-ecs-agent/agent/utils/utils.go:113 +0x57
github.com/aws/amazon-ecs-agent/agent/engine.(*DockerTaskEngine).MustInit(0xc4202a8000, 0x1002040, 0xc42001a028)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_task_engine.go:223 +0x151
github.com/aws/amazon-ecs-agent/agent/app.(*ecsAgent).doStart(0xc420203e00, 0xc4201fc770, 0x1001180, 0xc420292980, 0x1006780, 0xc4201da0f0, 0x1003c40, 0xc4201f2fc0, 0x1003280, 0xc4201da1e0, ...)
	/go/src/github.com/aws/amazon-ecs-agent/agent/app/agent.go:285 +0x42c
github.com/aws/amazon-ecs-agent/agent/app.(*ecsAgent).start(0xc420203e00, 0xc42001a028)
	/go/src/github.com/aws/amazon-ecs-agent/agent/app/agent.go:199 +0x339
github.com/aws/amazon-ecs-agent/agent/app.Run(0xc42000e180, 0x0, 0x0, 0x0)
	/go/src/github.com/aws/amazon-ecs-agent/agent/app/run.go:64 +0x15b
main.main()
	/go/src/github.com/aws/amazon-ecs-agent/agent/agent.go:29 +0x63
@cgouteix (Author)

It is happening again, in my pre-production environment:

$ docker logs -f ecs-agent
2018-02-16T17:05:16Z [INFO] Loading configuration
2018-02-16T17:05:16Z [INFO] Loading state! module="statemanager"
2018-02-16T17:05:16Z [INFO] Event stream ContainerChange start listening...
2018-02-16T17:05:16Z [INFO] Restored cluster 'cmdpre'
2018-02-16T17:05:16Z [INFO] Restored from checkpoint file. I am running as 'arn:aws:ecs:eu-central-1:559122992834:container-instance/73a4c4aa-a1de-4030-b709-6e2020fc137f' in cluster 'cmdpre'
2018-02-16T17:05:16Z [INFO] Registered container instance with cluster!
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/aws/amazon-ecs-agent/agent/engine.metadataFromContainer(0xc4205de600, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_client.go:792 +0x7b2
github.com/aws/amazon-ecs-agent/agent/engine.(*DockerTaskEngine).synchronizeContainerStatus(0xc4202b2000, 0xc4202856e0, 0xc4202b2640)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_task_engine.go:337 +0x336
github.com/aws/amazon-ecs-agent/agent/engine.(*DockerTaskEngine).synchronizeState(0xc4202b2000)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_task_engine.go:276 +0x295
github.com/aws/amazon-ecs-agent/agent/engine.(*DockerTaskEngine).Init(0xc4202b2000, 0x1002040, 0xc42001a028, 0x2, 0x410bf7)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_task_engine.go:199 +0xf6
github.com/aws/amazon-ecs-agent/agent/engine.(*DockerTaskEngine).MustInit.func1(0x0, 0x0)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_task_engine.go:227 +0x56
github.com/aws/amazon-ecs-agent/agent/utils.RetryWithBackoffCtx(0x1002040, 0xc42001a028, 0xffda00, 0xc4204ab410, 0xc420347c68, 0xc420347c01, 0xc4204ab410)
	/go/src/github.com/aws/amazon-ecs-agent/agent/utils/utils.go:130 +0x87
github.com/aws/amazon-ecs-agent/agent/utils.RetryWithBackoff(0xffda00, 0xc4204ab410, 0xc420347c68, 0xf, 0xc42027dc40)
	/go/src/github.com/aws/amazon-ecs-agent/agent/utils/utils.go:113 +0x57
github.com/aws/amazon-ecs-agent/agent/engine.(*DockerTaskEngine).MustInit(0xc4202b2000, 0x1002040, 0xc42001a028)
	/go/src/github.com/aws/amazon-ecs-agent/agent/engine/docker_task_engine.go:223 +0x151
github.com/aws/amazon-ecs-agent/agent/app.(*ecsAgent).doStart(0xc420276e00, 0xc42019a930, 0x1001180, 0xc42025f720, 0x1006780, 0xc420066c80, 0x1003c40, 0xc4202a81b0, 0x1003280, 0xc420066d70, ...)
	/go/src/github.com/aws/amazon-ecs-agent/agent/app/agent.go:285 +0x42c
github.com/aws/amazon-ecs-agent/agent/app.(*ecsAgent).start(0xc420276e00, 0xc42001a028)
	/go/src/github.com/aws/amazon-ecs-agent/agent/app/agent.go:199 +0x339
github.com/aws/amazon-ecs-agent/agent/app.Run(0xc42000e180, 0x0, 0x0, 0x0)
	/go/src/github.com/aws/amazon-ecs-agent/agent/app/run.go:64 +0x15b
main.main()
	/go/src/github.com/aws/amazon-ecs-agent/agent/agent.go:29 +0x63
$ docker info
Containers: 31
 Running: 20
 Paused: 0
 Stopped: 11
Images: 5
Server Version: 17.09.1-ce
Storage Driver: devicemapper
 Pool Name: docker-docker--pool
 Pool Blocksize: 524.3kB
 Base Device Size: 10.74GB
 Backing Filesystem: ext4
 Data file: 
 Metadata file: 
 Data Space Used: 2.4GB
 Data Space Total: 23.33GB
 Data Space Available: 20.93GB
 Metadata Space Used: 2.392MB
 Metadata Space Total: 25.17MB
 Metadata Space Available: 22.77MB
 Thin Pool Minimum Free Space: 2.333GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.77-31.58.amzn1.x86_64
Operating System: Amazon Linux AMI 2017.09
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.798GiB
Name: ip-10-93-5-190
ID: YX3L:UVEW:F2GM:PW73:ZPH7:C4BC:FB3U:3P22:CUUM:UJU2:YPM2:VXTX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
$ curl http://localhost:51678/v1/metadata
curl: (7) Failed to connect to localhost port 51678: Connection refused

adnxn added the kind/bug label on Feb 16, 2018
@adnxn (Contributor) commented Feb 16, 2018

@cgouteix good catch, thanks for bringing this to our attention.

@adnxn (Contributor) commented Feb 16, 2018

also -

I'm not able to reproduce this error on my end, and some more logging data would be very helpful for debugging this issue.

Do you mind sharing your container image metadata as well? Specifically, I'm curious about the healthcheck command and how its output interacts with the agent. Would you be able to send me logs from the problematic instance? If this is easily reproducible for you, I'd also suggest running the agent with ECS_LOGLEVEL=debug.

You can capture logs using our log collection tool and email them to me directly at adnkha at amazon dot com. If you end up sending the logs to my email, please update this issue so I don't miss them.

Thanks 😄
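
For readers following along, here is a minimal, purely hypothetical Go sketch of the failure mode being discussed: reading the most recent entry of a container's health-check log without first checking that the log is non-empty. The checkResult type and lastCheckOutput function are invented for illustration; they are not the agent's actual code at docker_client.go:792.

package main

import "fmt"

// checkResult is a hypothetical stand-in for one recorded health-check run
// of a container; it is NOT the agent's or Docker's actual struct.
type checkResult struct {
	ExitCode int
	Output   string
}

// lastCheckOutput reads the most recent health-check result by indexing the
// last element of the log slice. If the slice is empty (for example, for a
// stopped container that defines a HEALTHCHECK but never recorded a result),
// the expression log[len(log)-1] panics with "runtime error: index out of
// range", the same class of panic shown in the traces above.
func lastCheckOutput(log []checkResult) string {
	return log[len(log)-1].Output
}

func main() {
	populated := []checkResult{{ExitCode: 0, Output: "healthy"}}
	fmt.Println(lastCheckOutput(populated)) // prints "healthy"

	// var empty []checkResult
	// lastCheckOutput(empty) // would panic: runtime error: index out of range
}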

adnxn self-assigned this on Feb 16, 2018
adnxn added this to the 1.17.1 milestone on Feb 16, 2018
@cgouteix (Author)

I'll try to reproduce it and give you more info. But for now I had to roll back to the previous AMI.

I think maybe it has something to do with Scheduled Tasks. I use them a lot.

@cgouteix (Author)

So it seems that I was right.
It happens with scheduled tasks.
I can't say whether the ecs-agent crashed after the task started or after it ended, but it did crash.
And I was able to get the ecs-agent to restart by deleting the stopped container that had been created for the scheduled task.
I'll send you my ecs logs by email, but I don't think they will help much.
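
For anyone who needs to apply the same workaround across several instances, the sketch below lists exited containers that carry an ECS task label so the offending stopped container can be found and removed. It is an untested convenience sketch that assumes the github.com/fsouza/go-dockerclient package and the com.amazonaws.ecs.task-arn label; the usual docker ps -a plus docker rm on the instance does the same job.

package main

import (
	"fmt"
	"log"

	docker "github.com/fsouza/go-dockerclient"
)

func main() {
	// Talks to the local Docker daemon using the standard DOCKER_* env vars
	// (or the default unix socket).
	client, err := docker.NewClientFromEnv()
	if err != nil {
		log.Fatalf("cannot create Docker client: %v", err)
	}

	// List all containers, including stopped ones, restricted to exited state.
	containers, err := client.ListContainers(docker.ListContainersOptions{
		All:     true,
		Filters: map[string][]string{"status": {"exited"}},
	})
	if err != nil {
		log.Fatalf("cannot list containers: %v", err)
	}

	for _, c := range containers {
		// Containers started by the ECS agent carry task metadata labels
		// (assumed label name; verify with `docker inspect` on your host).
		if taskArn, ok := c.Labels["com.amazonaws.ecs.task-arn"]; ok {
			fmt.Printf("exited ECS container %s %v from task %s\n", c.ID, c.Names, taskArn)
			// Once identified, remove it by hand with `docker rm <id>`, or
			// uncomment the call below to remove it programmatically:
			// _ = client.RemoveContainer(docker.RemoveContainerOptions{ID: c.ID})
		}
	}
}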

@dynaum commented Feb 19, 2018

We are facing the same problem here, but in our case the error occurs when the agent tries to stop a task with two containers.

@garrickstapleszed
+1

I'm seeing a lot of these after upgrading to 1.17.0. I'm anxiously awaiting 1.17.1.

@adnxn (Contributor) commented Feb 23, 2018

The latest release 1.17.1 includes a fix for this issue.
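
For illustration only, and not the actual patch shipped in 1.17.1: a guarded variant of the hypothetical lookup sketched earlier avoids the panic by checking the slice length before indexing.

package main

import "fmt"

// checkResult is the same hypothetical health-check record used in the
// earlier sketch, redefined here so this snippet stands alone.
type checkResult struct {
	ExitCode int
	Output   string
}

// lastCheckOutputSafe returns the most recent health-check output, or
// ok == false when no results have been recorded, instead of panicking
// on an empty log.
func lastCheckOutputSafe(log []checkResult) (output string, ok bool) {
	if len(log) == 0 {
		return "", false
	}
	return log[len(log)-1].Output, true
}

func main() {
	var empty []checkResult
	if out, ok := lastCheckOutputSafe(empty); ok {
		fmt.Println("last health-check output:", out)
	} else {
		fmt.Println("no health-check results recorded; skipping")
	}
}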
