Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5.3] Docker / Queue process unexpectedly 0 exits with no errors logged #17183

Closed
pvormittag opened this issue Jan 6, 2017 · 6 comments
Closed

Comments

@pvormittag
Copy link

  • Laravel Version: 5.3.23 (Lumen 5.3)
  • PHP Version: 7.0.13
  • Database Driver & Version: MySql 5.7.11
  • OS: Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-53-generic x86_64)
  • Docker Version: 1.12.4, build 1564f02

Description:

I have Laravel queue worker processes running inside a Docker container on AWS (EC2 Ubuntu 16.4 instance). The containers will run for 4 to 5 days, then unexpectedly exit with a 0 code. Viewing the logs shows no Exceptions thrown from the app. Because the container is run with a restart policy of on-failure and the process is exiting with 0, Docker will not restart the container.

There are no resource constraints currently on the containers and the EC2 instance is not reporting any high resource utilization.

Interestingly, I have a similar stack as this one but is running Laravel 5.2 and does not suffer from this issue. My assumption is this may be due to the changes made to the Queuing logic in 5.3.

I've removed or replaced any sensitive information with ******.

The docker-entrypoint.sh file, which Docker executes as ENTRYPOINT is:

: "${APP_ENV:=local}"
: "${QUEUE_TRIES:=0}"
: "${QUEUE_SLEEP:=3}"
: "${QUEUE_DELAY:=0}"
: "${MEMORY_LIMIT:=256}"
: "${QUEUE_TIMEOUT:=60}"

if [ -z "$QUEUE" ]; then
  echo >&2 'error: A queue to listen on is not specified '
  echo >&2 'You need to specify QUEUE'
  exit 1
fi

exec php /var/www/html/artisan queue:work \
  --queue=$QUEUE \
  --env=$APP_ENV \
  --tries=$QUEUE_TRIES \
  --delay=$QUEUE_DELAY \
  --sleep=$QUEUE_SLEEP \
  --memory=$MEMORY_LIMIT \
  --timeout=$QUEUE_TIMEOUT \
  --no-interaction

Notable Environment settings.

                "APP_ENV=production",
                "QUEUE=*****",
                "QUEUE_TRIES=1",
                "MEMORY_LIMIT=256",
                "QUEUE_DRIVER=sqs",
                "QUEUE_TIMEOUT=600",
                "APP_DEBUG=false",
                "APP_LOG=errorlog"

Logs from the container and information about the status of the container.

$ docker logs ubuntu_pricing_transactions_worker_5
[2016-12-16 18:15:57] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-16 18:15:57] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-16 18:15:58] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-16 18:15:58] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-16 18:19:43] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-16 18:19:43] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-16 18:19:47] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-16 20:33:00] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-16 20:40:24] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-16 21:15:20] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-20 14:00:16] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-20 14:00:16] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-20 14:00:16] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-20 20:00:18] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-21 18:36:32] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-23 14:00:15] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-23 14:00:15] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-23 14:00:15] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-23 14:00:16] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-23 20:00:14] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-23 20:00:15] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-23 20:00:15] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-23 20:00:16] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-23 20:00:16] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-23 20:00:16] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:15] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:15] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:16] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:16] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:17] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:17] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:18] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:18] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:19] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:19] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:21] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:21] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:21] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:22] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-27 17:42:23] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-28 14:00:14] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-28 20:00:13] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-28 20:00:14] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-29 14:00:13] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-31 20:00:14] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2016-12-31 20:00:18] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2017-01-02 18:44:24] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2017-01-02 18:44:24] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2017-01-02 18:44:25] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2017-01-04 21:35:46] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2017-01-04 21:35:46] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2017-01-04 21:35:47] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2017-01-04 21:35:47] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2017-01-04 21:35:47] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2017-01-04 21:35:47] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2017-01-04 21:35:48] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2017-01-04 21:35:48] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2017-01-04 21:35:49] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
[2017-01-05 14:00:19] Processed: App\Jobs\Multifamily\Pricing\RunTransaction
$ docker inspect ubuntu_pricing_transactions_worker_5
[
    {
        "Id": "be19bcd78db06b66ffa6aa1032e35662b0d9fbf082854866b0c66c0acdda9d2c",
        "Created": "2016-12-16T17:50:26.84270574Z",
        "Path": "docker-entrypoint.sh",
        "Args": [],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2017-01-02T18:44:23.519963601Z",
            "FinishedAt": "2017-01-06T15:03:00.663556969Z"
        },
        "Image": "sha256:d30dfbff04bc9d16e91efb6adb2fc1aaa461b07449690efd22c3c3c742e2c203",
        "ResolvConfPath": "/var/lib/docker/containers/be19bcd78db06b66ffa6aa1032e35662b0d9fbf082854866b0c66c0acdda9d2c/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/be19bcd78db06b66ffa6aa1032e35662b0d9fbf082854866b0c66c0acdda9d2c/hostname",
        "HostsPath": "/var/lib/docker/containers/be19bcd78db06b66ffa6aa1032e35662b0d9fbf082854866b0c66c0acdda9d2c/hosts",
        "LogPath": "/var/lib/docker/containers/be19bcd78db06b66ffa6aa1032e35662b0d9fbf082854866b0c66c0acdda9d2c/be19bcd78db06b66ffa6aa1032e35662b0d9fbf082854866b0c66c0acdda9d2c-json.log",
        "Name": "/ubuntu_pricing_transactions_worker_5",
        "RestartCount": 0,
        "Driver": "aufs",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "ubuntu_default",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "on-failure",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": [],
            "CapAdd": null,
            "CapDrop": null,
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": -1,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Name": "aufs",
            "Data": null
        },
        "Mounts": [],
        "Config": {
            "Hostname": "be19bcd78db0",
            "Domainname": "*****",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "9000/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [],
            "Cmd": null,
            "Image": "*****",
            "Volumes": null,
            "WorkingDir": "/var/www/html",
            "Entrypoint": [
                "docker-entrypoint.sh"
            ],
            "OnBuild": null,
            "Labels": {
                "com.docker.compose.config-hash": "210a06591101da05fb1455db832f30b4ce3d114049352ffb701693becdf320d0",
                "com.docker.compose.container-number": "5",
                "com.docker.compose.oneoff": "False",
                "com.docker.compose.project": "ubuntu",
                "com.docker.compose.service": "pricing_transactions_worker",
                "com.docker.compose.version": "1.9.0"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "a25749da93ae10d83677d9b7c82fcca1d337121d51eb3c312e9c69cf0b58fd0e",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": null,
            "SandboxKey": "/var/run/docker/netns/a25749da93ae",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "ubuntu_default": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": [
                        "be19bcd78db0",
                        "pricing_transactions_worker"
                    ],
                    "NetworkID": "c8a70836ff00d8317da76e241219e0e5281ff66f488fc73438fca73188e8eae9",
                    "EndpointID": "",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": ""
                }
            }
        }
    }
]

Steps To Reproduce:

N/A

@GrahamCampbell
Copy link
Member

This is the expected behaviour. They "restart" when they've used enough memory that they need a reboot. By "restart", I mean that just quit, and they expect that someone else will restart them.

It's normal to use something like supervisor to manage that.

@pvormittag
Copy link
Author

Thanks for the explanation @GrahamCampbell.

They "restart" when they've used enough memory that they need a reboot.

Is there any signaling from the process prior to it exiting that it needs to restart? I would argue a process exhausting its memory allocation – either a hard or soft memory limit – should not exit with a 0 status. The 0 exit is also hard to get around in the case of container-based deployments, as Docker relies on the exit code for its restart policies.

If I use an always restart policy with Docker, I can ensure the queue workers are always running, same as what Supervisor achieves (e.g. Process Management). However, then I do not have a means of "draining" the workers in order to deploy updated container instances without forcefully stoping them, which is not a solution in my case.

Currently in both my 5.3 and 5.2 environments, an automated deployment script does the following:

  1. docker pull the container image – tagged as passing by the CI system.
  2. Call docker exec -ti previous-container-name php artisan queue:restart.
  3. Poll docker until the container reports it has "exited" (docker inspect). Forcefully stop the container if a timeout period is reached.
  4. docker rm container-name to remove the previous container instance.
  5. docker run -d ... container-name to start a new container instance from the new image.

This procedure effectively "drains" any queue worker before updating and restarting.

As it stands I'm not sure of a solution to this problem given the current behavior. Can we reopen this thread for further discussion? Again, I'd argue exiting with 0 is inappropriate.

Any ideas from the community would be appreciated.

@pvormittag
Copy link
Author

After reviewing the source, I do believe this should be addressed. Please reopen the issue.

The daemon() method of the Illuminate\Queue\Worker class creates the daemon loop and if either $this->memoryExceeded($options->memory) or $this->queueShouldRestart($lastRestart)) returns true, the stop() method is called.

stop() is where the issue lies.

stop() simply triggers an event then calls die. As noted by the PHP documentation, die is functionally equivalent to exit(), which explicitly is exit(0).

There is currently no way of changing this behavior and there should be for reasons I mentioned in my previous comment. A few possible solutions are:

  1. Change die to exit(1). This is the simple fix, however changes the current behavior which may cause unexpected results and is not backward compatible.

  2. Add an --exit-code option to the queue:work command which allows setting the exit code. This allows configuration by users which require a different exit code – in most cases being 1. The default can be 0 to maintain current behavior (backward compatible). This requires a change to the stop() method signature in order to pass the exit code value. This method is public, however when searching through files in the src/Illuminate directory for stop(), it doesn't appear to be called from outside the class?

  3. Add a --fail-on-exit flag to the queue:work command which sets the exit code to 1, otherwise the exit code is 0. This is similar to option 2, but assumes the most likely use case and removes decision for the user about what the exit code should be, since its typical convention for any other value than 0 to be considered an error in POSIX-compatible systems.

https://en.wikipedia.org/wiki/Exit_status

I favor option 2 as it as it allows for configuration of the exit code and is backward compatible. Feedback is welcome and I'm happy to provide a PR if we agree on a resolution.

@pvormittag
Copy link
Author

I've realized in 5.2 this is the same behavior – exiting with 0.

Can we reopen this thread and further the discussion @GrahamCampbell @taylorotwell?

Again, I'm happy to contribute a PR if we decide on how to resolve this.

@marcvdm
Copy link
Contributor

marcvdm commented Feb 8, 2017

Just for future reference #17302 fixed this.

@pvormittag
Copy link
Author

Great, thanks @marcvdm!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants