Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dial-stdio: handle connections which lack CloseRead method. #1718

Merged
merged 3 commits into from
Mar 13, 2019

Conversation

ijc
Copy link
Contributor

@ijc ijc commented Mar 6, 2019

This happens on Windows when dialing a named pipe (a path which is used by CLI
plugins), in that case some debugging shows:

DEBU[0000] conn is a *winio.win32MessageBytePipe
DEBU[0000] conn is a halfReadCloser: false
DEBU[0000] conn is a halfWriteCloser: true
the raw stream connection does not implement halfCloser                                                                                                                                                                         

In such cases we can simply wrap with a nop function since closing for read
isn't too critical.

Signed-off-by: Ian Campbell [email protected]

My testing on Windows s is limited to

>docker-windows-amd64.exe -D system dial-stdio                                                                                                                                                   
                                                                                                                                                                                                                                
HTTP/1.1 400 Bad Request                                                                                                                                                                                                        
Content-Type: text/plain; charset=utf-8                                                                                                                                                                                         
Connection: close                                                                                                                                                                                                               
                                                                                                                                                                                                                                
400 Bad Request                                                             

Which failed previously.

This is a replacement/alternative for #1710.

@codecov-io
Copy link

codecov-io commented Mar 6, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@bf4a96e). Click here to learn what that means.
The diff coverage is 0%.

@@            Coverage Diff            @@
##             master    #1718   +/-   ##
=========================================
  Coverage          ?   56.13%           
=========================================
  Files             ?      306           
  Lines             ?    21035           
  Branches          ?        0           
=========================================
  Hits              ?    11807           
  Misses            ?     8373           
  Partials          ?      855

@thaJeztah
Copy link
Member

ping @AkihiroSuda @tonistiigi PTAL

@AkihiroSuda
Copy link
Collaborator

looks good

chris-crone pushed a commit to chris-crone/app that referenced this pull request Mar 8, 2019
- Update docker/cli is now pointing to chris-crone/cli.
The change needs the merge of docker/cli#1718
and docker/cli#1690
- Fix issues relative paths in Jenkinsfile and Jenkinsfile.baguette
- Avoid using '--config' in favor of env variable 'DOCKER_CONFIG'

Signed-off-by: Ulysses Souza <[email protected]>
chris-crone pushed a commit to chris-crone/app that referenced this pull request Mar 8, 2019
- Update docker/cli is now pointing to chris-crone/cli.
The change needs the merge of docker/cli#1718
and docker/cli#1690
- Fix issues relative paths in Jenkinsfile and Jenkinsfile.baguette
- Avoid using '--config' in favor of env variable 'DOCKER_CONFIG'

Signed-off-by: Ulysses Souza <[email protected]>
@ijc
Copy link
Contributor Author

ijc commented Mar 8, 2019

Is this (or #1710) ready to be merged? I don't think the change is unit testable hence the coverage result.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll copy @AkihiroSuda's LGTM 👍

ping @tonistiigi ptal

@cpuguy83
Copy link
Collaborator

cpuguy83 commented Mar 8, 2019

Why not add a CloseRead to the pipe implementation?

@cpuguy83
Copy link
Collaborator

cpuguy83 commented Mar 8, 2019

Doing nothing on CloseRead doesn't seem right.

@andrey-ko
Copy link

andrey-ko commented Mar 8, 2019

Not really directly related to theses changes, but it looks like that defer cancel() doesn't always close conn. So we can return from runDialStdio function without waiting or canceling background goroutines, for example if we read non-nil error from stdin2conn channel or anything from conn2stdout. Is it right behaviour? In case if we should cancel all background goroutines and close connection on exit, then nop-closeReader actually is ok, as it will always be followed by connection close call, but all this logic is quite obfuscated though...

Copy link
Contributor

@silvin-lubecki silvin-lubecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@ijc
Copy link
Contributor Author

ijc commented Mar 11, 2019

Why not add a CloseRead to the pipe implementation?

I lack the necessary Windows background to make that happen. /cc @jstarks who added CloseWrite to these things in case he can help.

Doing nothing on CloseRead doesn't seem right.

This was @AkihiroSuda's suggestion in #1710 (comment), it seemed reasonable to me for objects which didn't support CloseRead.

#1710 has now been merged so my immediate problem is solved, I'm happy to bump this and address review around this basic premise etc but I'm unlikely to implement CloseRead on named pipes any time soon.

@ijc ijc force-pushed the dial-stdio-npipe-on-windows branch from 3feab88 to ecd8d86 Compare March 11, 2019 15:46
@ijc
Copy link
Contributor Author

ijc commented Mar 11, 2019

I've rebased (to pickup #1728 e2e fix) so this now includes the revert of #1710 too.

@ijc
Copy link
Contributor Author

ijc commented Mar 12, 2019

It turns out that system dial-stdio is also broken with tcp:// and --tls even on Linux:

$ ./build/docker-linux-amd64 -D -H tcp://18.197.174.153:443 --tls --tlscacert /tmp/tmp.TDp4Lrmnc1/ca.pem system dial-stdio 
DEBU[0000] conn is a *tls.Conn                          
DEBU[0000] conn is a halfReadCloser: false              
DEBU[0000] conn is a halfWriteCloser: true              
the raw stream connection does not implement halfCloser

This PR helps with that too.

@AkihiroSuda
Copy link
Collaborator

Perhaps Close() should be called when both CloseRead() and CloseWrite() are called

@cpuguy83
Copy link
Collaborator

cpuguy83 commented Mar 12, 2019

Go 1.11 should have CloseRead now for tls, as I recall.

@ijc
Copy link
Contributor Author

ijc commented Mar 12, 2019

Go 1.11 should have CloseRead now, as I recall.

Not according to https://golang.org/pkg/crypto/tls/, nor the test shown above which was with g1.11.

Perhaps Close() should be called when both CloseRead() and CloseWrite() are called

Not sure about that, since I don't have enough insight into what all the actual underlying objects might actually be.

OTOH -- shouldn't there be a conn.Close somewhere (perhaps deferred) in this function somewhere? Otherwise what is closing the conn in the end? Are we leaking an fd?

@@ -34,10 +34,17 @@ func runDialStdio(dockerCli command.Cli) error {
if err != nil {
return errors.Wrap(err, "failed to open the raw stream connection")
}
connHalfCloser, ok := conn.(halfCloser)
if !ok {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer conn.Close here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems plausible and WFM in my simple cases, so I've pushed. I think this was just a preexisting leak, but it also help allay concerns about the nopCloseReader here too.

Would be good to get this tested with your ssh:// use case though.

Ian Campbell added 3 commits March 12, 2019 14:52
This happens on Windows when dialing a named pipe (a path which is used by CLI
plugins), in that case some debugging shows:

    DEBU[0000] conn is a *winio.win32MessageBytePipe
    DEBU[0000] conn is a halfReadCloser: false
    DEBU[0000] conn is a halfWriteCloser: true
    the raw stream connection does not implement halfCloser
In such cases we can simply wrap with a nop function since closing for read
isn't too critical.

Signed-off-by: Ian Campbell <[email protected]>
This was leaking the fd.

Signed-off-by: Ian Campbell <[email protected]>
This reverts commit c41c238.

This case is now handled due to the previous commit.

Signed-off-by: Ian Campbell <[email protected]>
@ijc ijc force-pushed the dial-stdio-npipe-on-windows branch from ecd8d86 to 0449ad8 Compare March 12, 2019 14:58
@cpuguy83
Copy link
Collaborator

cpuguy83 commented Mar 12, 2019 via email

Copy link
Collaborator

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The alterntive here would be to always wrap and have an implementation that does

func(c *connWrapper) CloseRead() error {
    if rc, ok := conn.(halfReadCloser) {
      rc.CloseRead()
    }
    return nil
}

I feel like the attach code does something like this.

But LGTM.

@thaJeztah
Copy link
Member

Let's merged 👍

@thaJeztah thaJeztah merged commit d6a2306 into docker:master Mar 13, 2019
@GordonTheTurtle GordonTheTurtle added this to the 19.03.0 milestone Mar 13, 2019
@ijc ijc deleted the dial-stdio-npipe-on-windows branch March 13, 2019 10:51
ijc pushed a commit to ijc/docker-cli that referenced this pull request Mar 18, 2019
So far we've had problems with it not working on Windows (npipe lacked the
`ReadClose` method) and problems with using tcp with tls (the tls connection
also lacks `ReadClose`). Both of these were workedaround in docker#1718 which added a
nop `ReadClose` method.

However I am now seeing hangs (on Windows) where the `system dial-stdio`
subprocess is not exiting (I'm unsure why so far).

I think the 3rd problem found with this is an indication that `dial-stdio` is
not ready for primetime outside of its initial usecase (support for `ssh://`
URLs to connect to remote daemons).

This change simply disables the `dial-stdio` path for all plugins. However I've
left in a trap door environment variable so that those who want to experiment
with this mode have an easier path do doing so.

Signed-off-by: Ian Campbell <[email protected]>
ijc pushed a commit to ijc/docker-cli that referenced this pull request Mar 18, 2019
Since docker#1654 so far we've had problems with it not working on Windows (npipe
lacked the `ReadClose` method) and problems with using tcp with tls (the tls
connection also lacks `ReadClose`). Both of these were workedaround in docker#1718
which added a nop `ReadClose` method.

However I am now seeing hangs (on Windows) where the `system dial-stdio`
subprocess is not exiting (I'm unsure why so far).

I think the 3rd problem found with this is an indication that `dial-stdio` is
not quite ready for wider use outside of its initial usecase (support for
`ssh://` URLs to connect to remote daemons).

This change simply disables the `dial-stdio` path for all plugins. However
rather than completely reverting 891b3d9 ("cli-plugins: use `docker system
dial-stdio` to call the daemon") I've just disabled the functionality at the
point of use and left in a trap door environment variable so that those who
want to experiment with this mode (and perhaps fully debug it) have an easier
path do doing so.

Signed-off-by: Ian Campbell <[email protected]>
ijc pushed a commit to ijc/docker-cli that referenced this pull request Mar 18, 2019
Since docker#1654 so far we've had problems with it not working on Windows (npipe
lacked the `CloseRead` method) and problems with using tcp with tls (the tls
connection also lacks `CloseRead`). Both of these were workedaround in docker#1718
which added a nop `CloseRead` method.

However I am now seeing hangs (on Windows) where the `system dial-stdio`
subprocess is not exiting (I'm unsure why so far).

I think the 3rd problem found with this is an indication that `dial-stdio` is
not quite ready for wider use outside of its initial usecase (support for
`ssh://` URLs to connect to remote daemons).

This change simply disables the `dial-stdio` path for all plugins. However
rather than completely reverting 891b3d9 ("cli-plugins: use `docker system
dial-stdio` to call the daemon") I've just disabled the functionality at the
point of use and left in a trap door environment variable so that those who
want to experiment with this mode (and perhaps fully debug it) have an easier
path do doing so.

Signed-off-by: Ian Campbell <[email protected]>
ijc pushed a commit to ijc/docker-cli that referenced this pull request Mar 18, 2019
Since docker#1654 so far we've had problems with it not working on Windows (npipe
lacked the `CloseRead` method) and problems with using tcp with tls (the tls
connection also lacks `CloseRead`). Both of these were workedaround in docker#1718
which added a nop `CloseRead` method.

However I am now seeing hangs (on Windows) where the `system dial-stdio`
subprocess is not exiting (I'm unsure why so far).

I think the 3rd problem found with this is an indication that `dial-stdio` is
not quite ready for wider use outside of its initial usecase (support for
`ssh://` URLs to connect to remote daemons).

This change simply disables the `dial-stdio` path for all plugins. However
rather than completely reverting 891b3d9 ("cli-plugins: use `docker system
dial-stdio` to call the daemon") I've just disabled the functionality at the
point of use and left in a trap door environment variable so that those who
want to experiment with this mode (and perhaps fully debug it) have an easier
path do doing so.

The e2e test for this case is disabled unless the trap door envvar is set. I
also renamed the test to clarify that it is about cli plugins.

Signed-off-by: Ian Campbell <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants