New Docker client returning timeout on EXEC #3999

Closed
agjohnson opened this issue Apr 23, 2018 · 17 comments · Fixed by #5654
Labels
Accepted Accepted issue on our roadmap Bug A bug

Comments

@agjohnson
Contributor

https://sentry.io/read-the-docs/readthedocs-org/issues/533022676/

timeout: timed out
(10 additional frame(s) were not displayed)
...
  File "readthedocs/doc_builder/environments.py", line 472, in run
    return super(BuildEnvironment, self).run(*cmd, **kwargs)
  File "readthedocs/doc_builder/environments.py", line 307, in run
    return self.run_command_class(cls=self.command_class, cmd=cmd, **kwargs)
  File "readthedocs/doc_builder/environments.py", line 478, in run_command_class
    return super(BuildEnvironment, self).run_command_class(*cmd, **kwargs)
  File "readthedocs/doc_builder/environments.py", line 346, in run_command_class
    build_cmd.run()
  File "readthedocs/doc_builder/environments.py", line 234, in run
    output = client.exec_start(exec_id=exec_cmd['Id'], stream=False)

(Build) [astropy:latest] timed out
@SylvainCorlay

Oops yeah, that's a new problem with docker from a deploy last week. I'll close the issue here as we've addressed the docker change in #3999

Note: I see that you closed the other issues, but the builds are still failing.

@stsewd
Member

stsewd commented Apr 23, 2018

@SylvainCorlay yes, they are, the problem isn't solved yet, the team is working on it :)

@SylvainCorlay

gotcha, thanks!

@humitos
Member

humitos commented Apr 23, 2018

I'm not sure why this is happening now.

We did a deploy with a newer version of the docker Python package (3.2.1), but there is nothing in the changelog that talks about timeouts: http://docker-py.readthedocs.io/en/stable/change-log.html and https://github.com/docker/docker-py/milestone/50?closed=1

I've been building astropy in my local instance for more than 10 minutes and it is still building (it's still creating the conda env). No timeout reached.

There is nothing new related to timeout and the only thing that I've found is the timeout for the API calls in the constructor for the APIClient (http://docker-py.readthedocs.io/en/stable/api.html?highlight=APIClient#docker.api.client.APIClient).

However, we are not setting it, and the default is 60 seconds. So, if it applied to exec_start, any build that takes more than 1 minute should fail.

I'm still a little confused. Will keep researching. Also, I was able to run an astropy build for 1031 seconds (it finally failed because the latest branch has a problem --it seems).

Also, the timeout is from socket.recv: https://github.com/docker/docker-py/blob/master/docker/utils/socket.py#L30
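
To illustrate how a read timeout on a socket produces the `timeout: timed out` error seen in the traceback, here is a minimal standard-library sketch. It is not docker-py's code: `recv_with_timeout` and `demo` are hypothetical names, the 0.2 second value is arbitrary, and the silent peer only stands in for a command that prints nothing for a while. Whether the client's timeout actually applies to the exec stream is the open question in this thread.

```python
import socket


def recv_with_timeout(sock, timeout_seconds, nbytes=4096):
    """Read from sock; raises socket.timeout if no data arrives in time."""
    sock.settimeout(timeout_seconds)
    return sock.recv(nbytes)


def demo():
    # socketpair() returns two connected sockets. `quiet` never writes,
    # so the read on `reader` has nothing to return before the deadline.
    reader, quiet = socket.socketpair()
    try:
        try:
            recv_with_timeout(reader, 0.2)
        except socket.timeout as exc:
            return str(exc)
        return "no timeout"
    finally:
        reader.close()
        quiet.close()


if __name__ == "__main__":
    print(demo())
```

If a read timeout like this one did cover the whole `exec_start` call, a step that stays quiet for longer than the timeout (a long `conda env create`, for example) could fail this way even though the command itself is still running.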

@humitos
Member

humitos commented Apr 23, 2018

We noticed that this problem is not present on docker==3.1.3, so we will probably downgrade this package. Also, this only happened on the servers --I wasn't able to reproduce it locally, even when building big projects.

Besides, I noticed that most/all of the errors reported in Sentry are for projects that use conda, and they happen in the conda env create step.

@humitos
Member

humitos commented Apr 23, 2018

There is another Sentry log, for a project that fails at pip install: https://sentry.io/read-the-docs/readthedocs-org/issues/533186433/events/latest/

@boegel

boegel commented Apr 25, 2018

Problem seems to be fixed now for me, thanks!

see https://readthedocs.org/projects/easybuild/builds/7094389/

@SylvainCorlay

Problem seems to be fixed now for me, thanks!

cc @gouarin

@gouarin

gouarin commented Apr 25, 2018

It also works for me now.

Thanks !

@humitos humitos removed the Priority: high High priority label Apr 25, 2018
@humitos
Member

humitos commented Apr 25, 2018

Thanks for your feedback.

I downgraded the docker Python package to 3.1.3 as a temporary solution. That's not the final solution, though, since at the moment we don't know why this happened with 3.2.1 in the first place, and I wasn't able to reproduce it locally either.

So, at the moment, we are going to stay pinned to 3.1.3 while we research what's going on with docker :(

@humitos
Member

humitos commented Apr 26, 2018

A new docker version was released today: https://github.com/docker/docker-py/releases/tag/3.3.0

It says it fixes an issue with the timeout for stop and restart. It may be related to our case...

@stsewd
Member

stsewd commented Jun 14, 2018

@humitos the docker client was updated in #4124, we don't have this problem anymore, right?

@humitos
Member

humitos commented Jun 14, 2018

@stsewd we don't know yet. That PR hasn't been deployed yet.

@humitos humitos added the Accepted Accepted issue on our roadmap label Jun 14, 2018
@humitos
Member

humitos commented Jun 14, 2018

Just deployed and the issue is still present in 3.3.0 :(

I downgraded it to 3.1.3 again.

@humitos
Member

humitos commented Oct 1, 2018

There's not much we can do here for now. Unassigning this. We will need to try a newer version in production in the future :/

@humitos humitos removed their assignment Oct 1, 2018
openstack-gerrit pushed a commit to openstack-archive/stx-config that referenced this issue Nov 23, 2018
The Docker SDK for Python package has recently been up-versioned from
2.4.2 to 3.3.0 to take advantage of the new API with more reliable
exit_code. However, the new Docker client can return a random timeout
on exec which appears to be a known issue:
readthedocs/readthedocs.org#3999

This change entails specifying a timeout value when obtaining a
Docker client.

Tests conducted:
  - verify successful images download
  - verify successful application install

Change-Id: I1676ee835303ab507af187bcce1e1c9be483900f
Story: 2003908
Task: 28013
Signed-off-by: Tee Ngo <[email protected]>
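
The commit message above mentions specifying a timeout when obtaining a Docker client. A rough sketch of what that could look like with the Docker SDK for Python follows. The helper names `client_kwargs` and `make_client` and the 600 second default are assumptions made for illustration, not taken from that commit or from Read the Docs; the only SDK feature relied on is the `timeout` argument of `docker.APIClient`, which the thread notes defaults to 60 seconds.

```python
def client_kwargs(timeout_seconds=600):
    """Keyword arguments for a Docker API client with an explicit timeout.

    The 600 second default is a hypothetical value chosen for this sketch.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    return {"timeout": timeout_seconds}


def make_client(timeout_seconds=600):
    """Build an APIClient whose API calls use the given timeout."""
    # Imported lazily so client_kwargs() can be used without docker installed.
    import docker

    return docker.APIClient(**client_kwargs(timeout_seconds))
```

Calling `make_client(1200)` would give a client with a 20 minute timeout. Raising the timeout only works around the symptom; the thread does not establish why newer client versions timed out on exec in the first place.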
@humitos
Member

humitos commented Feb 12, 2019

docker 3.7.0 has been released. We could upgrade our version and test the new one manually on one of the builders first, before merging and deploying.

@humitos
Member

humitos commented May 2, 2019

I just manually upgraded docker in our build03 to version 3.7.2 and triggered a couple of builds: they passed. Also, Sentry does not report any problem on build03 at the moment.

I think we can test this for a few more days and then update our requirements file to roll this change out to all of our builders.
