Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bad performance of PHP(+xdebug) after sleep and waking up because of clock changing from TSC to HPET #3455

Closed
2 tasks done
anton-siardziuk opened this issue Jan 12, 2019 · 73 comments

Comments

@anton-siardziuk
Copy link

anton-siardziuk commented Jan 12, 2019

  • I have tried with the latest version of my channel (Stable or Edge)
  • I have uploaded Diagnostics
  • Diagnostics ID: 327AD075-DF8F-491F-AF4D-200FBC3FB830/20190112205307

Expected behavior

Test scenario should always take 30s to complete

Actual behavior

Test scenario takes 30s to complete after Docker for Mac restart, and takes 1m30s-2m to complete after sleep and wakeup.

Information

  • macOS Version: 10.14.1 (18B75)

I have some PHP test scenario for my application I can run to check how good or bad performance is. I also have a linux laptop to compare with. This test in docker on linux completes in 30s. When I restart Docker for Mac, it runs 30s on Docker for Mac as well, but after I put MacBook to sleep and wake it up, the very same test takes 1m30s-2m. And it would take this long until I restart Docker for Mac. After restart, this test takes 30s again. I can reproduce it very reliably every time I try. The very same issue was reported a year ago here: EugenMayer/docker-sync#524

Diagnostic logs

Docker for Mac: Version 2.0.0.0-mac81 (29211)

Steps to reproduce the behavior

  1. Run something with fixed run time, see time X
  2. Put Mac to sleep, wake it up
  3. Run it again, see time 3X
  4. Restart Docker for Mac
  5. Run it again, see time X
@anton-siardziuk
Copy link
Author

I started working on a way to easily reproduce this issue outside of my project. My first call was to try to run some C compilation so I created docker-compose.yml file like this:

version: '3.3'
services:
  benchmark:
    image: php:7.2.14-cli-stretch
    command: docker-php-ext-install json pdo pdo_mysql

And run it like that

time docker-compose up

Where docker-php-ext-install basically compiles those extensions.
With no luck: compile time is the same before and after sleep (~25s).
So it is not CPU who works slower. Will try to investigate further. My next idea is to run some strace and compare results from fast and slow tests, but it might take some time to dig through these results. Any ideas how to pin it down are welcome.

@anton-siardziuk
Copy link
Author

anton-siardziuk commented Jan 13, 2019

More info on this problem. Ran a test php script through strace and the difference between good and bad performance straces is millions (literally) calls of gettimeofday when performance is bad. I also found this clock related logs:

2019-01-13 13:54:39.119934+0300  localhost com.docker.driver.amd64-linux[35411]: proxy << POST /v1.39/exec/a177f7c2bbbd2fc5816387e002da09680c89d4b828e59dc7627513a9e4e4b737/resize?
h=41&w=179 (181.771536ms)
2019-01-13 13:56:40.584949+0300  localhost com.docker.vpnkit[35410]: TCP 192.168.65.3:53910 > 212.211.132.250:80 closing flow due to idle port disconnection
2019-01-13 13:56:40.585113+0300  localhost com.docker.vpnkit[35410]: TCP 192.168.65.3:33558 > 212.211.132.250:80 closing flow due to idle port disconnection
2019-01-13 14:01:47.719443+0300  localhost kernel[0]: PM notification timeout (pid 35411, com.docker.drive)
2019-01-13 14:01:47.719449+0300  localhost kernel[0]: PM notification timeout (pid 35411, com.docker.drive)
2019-01-13 14:06:09.957178+0300  localhost com.docker.hyperkit[35416]: [ 1616.560466] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
2019-01-13 14:06:09.964637+0300  localhost com.docker.hyperkit[35416]: [ 1616.579018] clocksource:                       'hpet' wd_now: 3bd42798 wd_last: 3ad23599 mask: ffffffff
2019-01-13 14:06:09.967614+0300  localhost com.docker.hyperkit[35416]: [ 1616.581360] clocksource:                       'tsc' cs_now: bf689879560 cs_last: b761a0f445c mask: ffffffffffffffff
2019-01-13 14:06:09.972656+0300  localhost com.docker.hyperkit[35416]: [ 1616.587694] Hangcheck: hangcheck value past margin!
2019-01-13 14:06:09.990049+0300  localhost com.docker.hyperkit[35416]: [ 1616.602045] clocksource: Switched to clocksource hpet
2019-01-13 14:06:09.997985+0300  localhost com.docker.driver.amd64-linux[35411]: Requesting time sync
2019-01-13 14:06:10.420056+0300  localhost com.docker.driver.amd64-linux[35411]: cannot read sysevent: EOF

@anton-siardziuk
Copy link
Author

So after a lot of gdb and strace and google, I think I understand what is going on. Linux on wakeup thinks that super fast but not that reliable TSC clock no longer up to date and switches to super reliable but not that fast HPET clock. First one works without any syscalls - in userspace, but HPET works though syscall, hence millions of gettimeofday sys calls when performance is bad. And PHP applications in development mode generate millions calls of current time (because of how debugger in PHP works). That's why this clock switch is very noticeable - everything is 3 times slower after wakeup.

@anton-siardziuk anton-siardziuk changed the title Bad performance after waking up Bad performance of PHP(+xdebug) after sleep and waking up because of clock changing from TSC to HPET Jan 14, 2019
@barat
Copy link

barat commented Jan 23, 2019

Nice catch - I have similar issues with big Symfony (+ES) project. Until sleep - everything seems to be OK, after wakeup - 10x degraded performance. I Use docker-sync to eliminate bind mount issues (thinking about mutagen tho), but this thing will happen even without any mounts. Hope, that someone from dev team will take a closer look at it. Now, Docker Desktop is really problematic software to work it - requires workarounds, constant restarts, has timedrifts ...

@ThHareau
Copy link

I'm having this issue also, on fedora since a while actually (almost 6 months).

@remcoverton
Copy link

remcoverton commented Feb 19, 2019

We are having exactly the same issue and it's blocking us from migrating local development to a docker setup. We are on the latest stable version: 2.0.0.3 (31259), Mac 10.14.3.

For testing purposes: I have created a small container that runs the following https://blog.packagecloud.io/eng/2017/03/08/system-calls-are-much-slower-on-ec2/ example c script:

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int
main(int argc, char *argv[])
{
        struct timeval tv;
        int i = 0;
        for (; i<100; i++) {
                gettimeofday(&tv,NULL);
        }

        return 0;
}
#!/bin/bash

while true; do
    strace -ce gettimeofday /gettimeofdaytest/test
    sleep 10
done

After closing the Mac and waiting a minute or 2 the script starts outputting trace information indicating the calls are now system calls. The source of the issue seems to also be that unstable TSC Clocksource.

linuxkit-025000000001:~# echo 'tsc' > /sys/devices/system/clocksource/clocksourc e0/current_clocksource [ 4483.670063] clocksource: Override clocksource tsc is unstable and not HRT compatible - cannot switch while in HRT/NOHZ mode

When I restart docker desktop the problem is gone again.

@tonivdv
Copy link

tonivdv commented Feb 27, 2019

Was going to create a new support ticket until I saw this one. I exactly have the same issue, and also with a php project with a lot of mounted files. Only a docker for mac restart fixes the issue! Knowing that most mac user put their mac in sleep mode this will be very annoying.

Hope you can find a solution to this.

@traed
Copy link

traed commented Mar 11, 2019

Same here. Hoping for a quick fix!

@dchassin
Copy link

dchassin commented Mar 20, 2019

@m00t: kudos to you. We are experiencing the same problem with gridlabd. A complete validation run normally takes about 400 seconds. After sleep it typically takes around 2000 seconds. Your conclusion is entirely consistent with our findings. After a lot of playing around with linux builds on different VM platforms we decided it had to be in the docker system itself. For example, after the same sleep interval (overnight) centos:7 docker is slow, but a VMware Centos 7 is ok.

The only solution we have now is to restart the docker daemon.

@moacode
Copy link

moacode commented Apr 27, 2019

I've noticed this exact behavior, although I've only just had time to investigate. Typically we run a stack of PHP, MySQL and Apache proxied through Nginx and in my test (which is based off a profiler read-out), I'm using CraftCMS with PHP 7.2.

This is a screenshot of the profiler with Docker daemon restarted AND the container stack freshly recreated:

Screen Shot 2019-04-27 at 11 31 45 PM

~1.5s to load the page (I am using Mutagen for keeping files in sync).

After putting my computer to sleep and waking it up again, the same page reload yielded the following results:

Screen Shot 2019-04-27 at 10 12 40 PM

~6 seconds!?!

Finally, after restarting Docker I was getting the following output:

Screen Shot 2019-04-27 at 10 16 05 PM

~3 seconds... Still not at the original 1.5 seconds.

It wasn't until I restarted the docker daemon and re-created my containers that the page load time came back down to 1.5 seconds (Mutagen was left running the entire time).

@remcoverton
Copy link

I have updated to the latest docker ( edge ) version and the problem still persists in exactly the same manner as described above.

2.0.4.0

  • Engine: 19.03.0-beta3
  • Compose: 1.24.0
  • Kubernetes: v1.14.1
  • Notary: 0.6.1
  • Credential Helper: 0.6.0
  • Machine: 0.16.1

@remcoverton
Copy link

Tested on the latest docker Edge version, it still falls back to hpet.

2.0.5.0

docker-desktop:# [ 450.623931] clocksource: timekeeping watchdog on CPU1: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 450.627869] clocksource: 'hpet' wd_now: cf3bfb7 wd_last: c111b9b mask: ffffffff
[ 450.630547] clocksource: 'tsc' cs_now: 21d88179225 cs_last: 21c9504986d mask: ffffffffffffffff
[ 450.634050] clocksource: Switched to clocksource hpet
docker-desktop:
# cat /sys/devices/system/clocksource/clocksource0/current_clock
source
hpet

@tomasinchoo
Copy link

Until this issue is resolved in stable version, it is impossible to use docker on Mac.

@dchassin
Copy link

We’ve been able to use docker on macs as long as we don’t let the Mac sleep. If it sleeps we have to restart docker.

@tomasinchoo
Copy link

We’ve been able to use docker on macs as long as we don’t let the Mac sleep. If it sleeps we have to restart docker.

Yea, I know, but that is not much of a solution, especially on Mac which sleeps a lot and has limited charges. I don't even use Mac, but we have a team that does and I really can't even understand why. It is so limited and weak.
Unfortunately I'm the one handling docker images for them. :(

@dchassin
Copy link

Indeed. It’s been a real source of frustration for most of us because we use docker alot on laptops. Sleep is very common even when running long jobs. It has to be fixed soon.

@smaudet
Copy link

smaudet commented Jul 11, 2019

Slightly different problem but I think it might be a problem with OSX itself...

Running VBox for a linux VM (so similar situation to Docker) - on OSX sleep and then resume, the whole computer is unusable, requiring a restart...

This is some shoddy OSX resume/restart logic I think. Anyway to downgrade OSX to a version which worked?

@tonivdv
Copy link

tonivdv commented Jul 12, 2019

@smaudet Never had that issue with vbox. I used for a very long time docker toolbox with nfs enabled and I never had the issue described here. So I doubt it's related.

@dchassin
Copy link

Slightly different problem but I think it might be a problem with OSX itself...

If I had to guess, it's an interaction between the three systems, and not something that is exclusively the fault of any one system. It works fine on VMware Linux on OSX, but it doesn't work on Docker Linux on OSX, where Linux and OSX are the same versions for both cases. Perhaps when OSX wakes from a sleep, Linux isn't being told what happened and it doesn't reset the clock stability check properly on Docker, whereas on VMware it is. Since VMware can do it but Docker can't, that suggests OSX isn't to blame alone for this problem.

@smaudet
Copy link

smaudet commented Jul 12, 2019

@dchassin the curious thing is there isn't an issue on the windows or linux host side of things... perhaps it is partly related to the software? I.e. VMWare catches the sleep event while Docker and VBox don't...

Could be hardware too, I've not ruled that out yet (would try to BootCamp windows or linux host and see if the problem persists).

@carlos-granados
Copy link

I have created an issue in the Xdebug repo asking for a way to disable this call to getTimeOfDay as this is just used to timestamp stack entries and I think that 99% of the applications that use Xdebug don't need this info

@joniklas
Copy link

joniklas commented Sep 16, 2019

Thanks @carlos-granados for your initiative. This could decrease the symptoms of the slow clocksource problem.

Since getTimeOfDay is used in many other cases than debugging: It would be great, if this issue on Macs could be solved at the core - so inside docker. (Also there are other low level calls wich are slow in debugging. E.g. getPid.)

@derickr
Copy link

derickr commented Sep 16, 2019

Is there a different clock source that can be used that does not have this problem?

@dochne
Copy link

dochne commented Sep 16, 2019

Is there a different clock source that can be used that does not have this problem?

There are three clock sources available most of the time. tsc, hpet and acpi_pm. If TSC desyncs (causing this issue) then we're left with only hpet and acpi_pm.

I'll try to get around to running some benchmarks with acpi instead, but from my reading about different clock sources I suspect anything non-TSC will end up with the same issue.

As a side note, here's a couple of helpful bash scripts that we've been using internally lately to at least increase visibility of the issue, which may or may not be useful to some others experiencing this issue.

Verifying if this is currently effecting your containers:

YELLOW="\033[0;33m"
RED="\033[31m"
RESET="\033[0m"
DOCKER_CLOCK_SOURCE=$(docker exec -ti {my_docker_container} /bin/bash -c 'cat /sys/devices/system/clocksource/clocksource0/current_clocksource | tr -d "[:space:]"')
if [[ ${DOCKER_CLOCK_SOURCE} != "tsc" ]]; then
    echo -e "${RED}WARNING!${RESET} - ${YELLOW}Clock drift detected. Please restart Docker Desktop to avoid severe performance issues${RESET}"
fi

Automatically killing and restarting Docker for Desktop via CLI:

echo "Killing Docker"
osascript -e 'quit app "Docker"';
echo "Restarting Docker"
open -a Docker
echo "Waiting for Docker to start"
while ! docker system info > /dev/null 2>&1; do printf "."; sleep 1; done; printf "\n";

@dsazup
Copy link

dsazup commented Oct 16, 2019

I'm currently running into the same issue. Test suite which should run for 2 mins max, runs 11 minutes after sleep. After restarting, it's back to 2 minutes.. 🤔

@dchassin
Copy link

Sometime during the last week Centos:7 slowed down significantly when running gridlabd validation tests, which uses time calls a lot. I wonder if it has anything to do with the change to the clock calls.

@patricknelson
Copy link

patricknelson commented Aug 30, 2021

The issue has been open for a few years, but I just wanted to pile on a few more stats since I can tell this will still be a significant source of latency even without Xdebug installed at all. Anecdotally, we noticed a performance hit of anywhere from at least 34% to 61% slower on initial load before OPcache takes over in our tests (see table below). This is still pretty significant, albeit not as grotesque as how badly it affects Xdebug.


More details: We're running a complex PHP application with a large number of files. We started investigating this when our Mac developers were reporting significant slowdowns after their MacBooks went to sleep. We found that even after removing the Xdebug extension from PHP, the developers still had a significant decrease in performance after sleeping (about 34% in latency on initial load, suggesting OPcache optimization is helping on subsequent loads). This tells me that it's not just an issue with Xdebug and therefore shouldn't be considered addressed once people upgrade Xdebug.

Stats for nerds: Where cat /sys/devices/system/clocksource/clocksource0/current_clock returns tsc before sleep and hpet after sleep. Note that the first cluster of tests are with Xdebug installed but not actively debugging.

First load % Slower Second Load % Slower
Xdebug installed (test 1): Before sleep (tsc) 3.3s 1.9s
Xdebug installed (test 1): After sleep (hpet) 13.16s 299% slower 3.8s 100% slower
Xdebug installed (test 2): Before sleep (tsc) 2.34s 488ms
Xdebug installed (test 2): After sleep (hpet) 11.85s 406% slower 2.45s 402% slower
No Xdebug (test 1): Before sleep (tsc) 2.9s 1.7s
No Xdebug (test 1): After sleep (hpet) 3.9s 34% slower 1.9s 12% slower
No Xdebug (test 1): Before sleep (tsc) 1.45s 390ms
No Xdebug (test 1): After sleep (hpet) 2.33s 61% slower 432ms 11% slower
  • Docker 20.10.8, build 3967b7d
  • Docker Desktop for Mac 3.6.0
  • PHP 5.6 w/ Xdebug 2.5.5 (I know we should upgrade, we're working on it...)

@dochne
Copy link

dochne commented Sep 1, 2021

As nobody has mentioned it yet, this doesn't seem to be an issue on M1, presumably because tsc is an x86/x64 clocksource and there's a different one for ARM.

@jfirebaugh
Copy link

Is there some way to force tsc to be used as the clocksource? I don't care about clock drift; there's an NTP daemon running in the VM anyway.

It looks like the kernel accepts clocksource=tsc command line argument but I can't find any docs about how to set that for Docker.

@jfirebaugh
Copy link

jfirebaugh commented Mar 10, 2022

I think I figured out a solution!

Add tsc=reliable to /Applications/Docker.app/Contents/Resources/linuxkit/cmdline. Restart Docker Desktop.

Now my containers continue to use the tsc clocksource even after waking from sleep.

@patricknelson
Copy link

@jfirebaugh I take it you're on MacOS running an older x86 CPU, right?

Unfortunately I'm having a hard time reproducing this bug on my current corp-issued machine since it doesn't sleep anymore (only hibernates).

@kerryj89
Copy link

kerryj89 commented Mar 11, 2022

I think I figured out a solution!

Add tsc=reliable to /Applications/Docker.app/Contents/Resources/linuxkit/cmdline. Restart Docker Desktop.

Now my containers continue to use the tsc clocksource even after waking from sleep.

Wow - that actually worked for me. I put my Intel MBP to sleep overnight and woke up to my app working just as speedy as I left it whereas it's usually incredibly slow until I restart Docker. Thank you so much for sharing!!

@Housik
Copy link

Housik commented Mar 11, 2022

I think I figured out a solution!

Add tsc=reliable to /Applications/Docker.app/Contents/Resources/linuxkit/cmdline. Restart Docker Desktop.

Now my containers continue to use the tsc clocksource even after waking from sleep.

@stephen-turner Should by nice to have it as an option in Docker GUI for Mac :-)

@dchassin
Copy link

Confirmed this morning after an overnight sleep that this fix absolutely remedies the problem. Even git clone is noticeably faster. Thanks to all who contributed to finding the root cause and remedying the problem. This makes a huge difference in our work.

Thank you, thank you, thank you!

@uphlewis
Copy link

uphlewis commented Mar 14, 2022

Add tsc=reliable to /Applications/Docker.app/Contents/Resources/linuxkit/cmdline. Restart Docker Desktop.

This completely solves the issue on my 2019 Intel i7 MBP. Can't believe the fix was just a few keystrokes away all these years 🥲. THANK YOU @jfirebaugh! ❤️

@ddanielou
Copy link

Adding to the choir of flabbergasted xdebug users. This is awesome. Anyone at docker reading this? Some way of setting that option as a checkbox in the GUI (or even something more obscure, an experimental flag, anything) would be more than welcome.

@giraz82
Copy link

giraz82 commented Mar 14, 2022

Thank you @jfirebaugh, this is a life-saving fix!
It works like a charm!

@derickr
Copy link

derickr commented Mar 14, 2022 via email

@patricknelson
Copy link

@derickr credit @jfirebaugh for his discovery 😄

@djs55
Copy link
Contributor

djs55 commented Mar 15, 2022

@jfirebaugh and everyone -- thanks for all the investigation work! I think setting tsc=reliable when using hyperkit makes a lot of sense. I've made an experimental build with this change, if you'd like to try it: https://desktop-stage.docker.com/mac/main/amd64/75926/Docker.dmg

@ziodave
Copy link

ziodave commented Apr 3, 2022

@kerryj89
Copy link

kerryj89 commented Apr 3, 2022

Did anyone try this? https://www.docker.com/blog/speed-boost-achievement-unlocked-on-docker-desktop-4-6-for-mac/

Yes, it works good, but doesn't solve the issue in this topic. You still have to manually add tsc=reliable into Docker's cmdline file. Hopefully that can either be built in or present itself as an option soon.

@pgsimon
Copy link

pgsimon commented Apr 4, 2022

I think I figured out a solution!

Add tsc=reliable to /Applications/Docker.app/Contents/Resources/linuxkit/cmdline. Restart Docker Desktop.

Now my containers continue to use the tsc clocksource even after waking from sleep.

@jfirebaugh Big thanks! Your solution saved a lot of time for our team!

The main question whether when will Docker fix this 3 years old issue?
These kind of issues should be make a lot of sleepless hours for unexperienced developers!

@uphlewis
Copy link

uphlewis commented Apr 8, 2022

BTW - i noticed a significant performance degradation after upgrading docker-for-mac to 4.6.1.

Turns out that it overwrote the change to /Applications/Docker.app/Contents/Resources/linuxkit/cmdline. So, be mindful of that.

Clearly we need an official setting for this.

@stephen-turner
Copy link
Contributor

Good news, this is fixed in Docker Desktop 4.7 (released yesterday). Many thanks to @jfirebaugh for figuring out the solution.

@thijskaspers
Copy link

Is it really fixed?

I used to have a lot of Mac OS crashes when putting my macbook into sleep while Docker for Mac was still running. It would always end up in me opening up my laptop and seeing it had rebooted with the alert "Your computer was restarted because of a problem".

Not sure if it is related to this issue, but I do know that once I added tsc=reliable to the cmdline file it never happened again. Until... I updated to Docker for Mac 4.7 with virtuofs (without the tsc=reliable) in the cmdline file, and this morning the Macbook woke up again rebooted with the alert that a problem had occurred.

@docker-robott
Copy link
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

@docker docker locked and limited conversation to collaborators May 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests