addprocs(["user@host"]) can not execute command on remote Windows #29702

Closed
yangli-ai opened this issue Oct 18, 2018 · 12 comments

Comments

@yangli-ai

yangli-ai commented Oct 18, 2018

Hi, I am trying to use addprocs to connect to remote workers on Windows. The SSH connection succeeds, but Julia cannot execute commands.

I don't think the problem is with Julia on the remote machine, because I get the same error even when the remote Julia is uninstalled. SSH itself is fine, because it works from cmd.exe.

Here are the errors:

[email protected]'s password: [Here I input password]
        From failed worker startup:     Unable to execute command or shell on remote system: Failed to Execute process.
ERROR: Unable to read host:port string from worker. Launch command exited with error?
error(::String) at .\error.jl:33
read_worker_host_port(::Pipe) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\cluster.jl:273
connect(::Distributed.SSHManager, ::Int64, ::WorkerConfig) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\managers.jl:397
create_worker(::Distributed.SSHManager, ::WorkerConfig) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\cluster.jl:505
setup_launched_worker(::Distributed.SSHManager, ::WorkerConfig, ::Array{Int64,1}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\cluster.jl:451
(::getfield(Distributed, Symbol("##47#50")){Distributed.SSHManager,WorkerConfig})() at .\task.jl:259
Stacktrace:
 [1] sync_end(::Array{Any,1}) at .\task.jl:226
 [2] #addprocs_locked#44(::Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:tunnel, :sshflags, :max_parallel),Tuple{Bool,Cmd,Int64}}}, ::Function, ::Distributed.SSHManager) at .\task.jl:266
 [3] #addprocs_locked at .\none:0 [inlined]
 [4] #addprocs#43(::Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:tunnel, :sshflags, :max_parallel),Tuple{Bool,Cmd,Int64}}}, ::Function, ::Distributed.SSHManager) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\cluster.jl:369
 [5] #addprocs at .\none:0 [inlined]
 [6] #addprocs#251(::Bool, ::Cmd, ::Int64, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Array{String,1}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\managers.jl:118
 [7] addprocs(::Array{String,1}) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.0\Distributed\src\managers.jl:117
 [8] top-level scope at none:0
@StefanKarpinski
Member

  1. Are you able to ssh to the machine and get a shell interactively?
  2. Is the machine configured to only allow running certain commands?

@yangli-ai
Author

yangli-ai commented Oct 18, 2018

  1. I use cmd.exe on the local machine and can ssh to the remote Windows machine, which returns an interactive cmd.exe shell. Just typing "julia" in that shell starts julia.exe.
  2. The remote machine's SSH server allows running commands such as cmd, dir, ls, and sh. sh comes from Cygwin.

The remote machine uses freeSSHd as its SSH service.

I just tried:
addprocs(["[email protected]"],exename="/cygdrive/c/Julia/bin/julia",dir="/cygdrive/c/Desktop")

The remote machine really does start a Julia process, which can be seen in Windows Task Manager. But Julia on the local machine hangs and shows the following:

[email protected]'s password: debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug2: channel 0: rcvd close
debug2: channel 0: output open -> drain
debug2: channel 0: close_read
debug2: channel 0: input open -> closed
debug2: channel 0: obuf empty
debug2: channel 0: close_write
debug2: channel 0: output drain -> closed
debug2: channel 0: almost dead
debug2: channel 0: gc: notify user
debug2: channel 0: gc: user detached
debug2: channel 0: send close
debug2: channel 0: is dead
debug2: channel 0: garbage collecting
debug1: channel 0: free: client-session, nchannels 1
Transferred: sent 2056, received 1352 bytes, in 188.9 seconds
Bytes per second: sent 10.9, received 7.2
debug1: Exit status 1

debug2: we sent a password packet, wait for reply
debug1: Authentication succeeded (password).
Authenticated to 192.168.1.114 ([192.168.1.114]:22).
debug2: fd 4 setting O_NONBLOCK
debug2: fd 5 setting O_NONBLOCK
debug1: channel 0: new [client-session]
debug2: channel 0: send open
debug1: Entering interactive session.
debug1: pledge: network
debug2: callback start
debug2: fd 3 setting TCP_NODELAY
debug2: client_session2_setup: id 0
debug1: Sending command: sh -l -c 'cd -- /cygdrive/c/Users/seanm/Desktop
export JULIA_WORKER_TIMEOUT=60
/cygdrive/e/Programming/Julia/bin/julia --worker'
debug2: channel 0: request exec confirm 1
debug2: callback done
debug2: channel 0: open confirm rwindow 131072 rmax 98304
debug2: channel_input_status_confirm: type 99 id 0
debug2: exec request accepted on channel 0
debug2: channel 0: rcvd adjust 0

@StefanKarpinski
Member

Does running these commands on the remote machine work:

cd -- /cygdrive/c/Users/seanm/Desktop
export JULIA_WORKER_TIMEOUT=60
/cygdrive/e/Programming/Julia/bin/julia --worker

@yangli-ai
Author

Yes, it runs, but it just stops there without doing anything.

Windows Task Manager also shows that a Julia process is running.

@yangli-ai
Author

yangli-ai commented Oct 18, 2018

If I press Ctrl+C to force it to stop, this message is returned:

Internal error: encountered unexpected error in runtime:
InterruptException()

By the way, I am using Julia 1.0.

@yangli-ai
Author

Sorry, a correction: if I press Enter a few more times, it prints: julia_worker:9724#192.168.1.114
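
The `julia_worker:9724#192.168.1.114` line has the shape `julia_worker:<port>#<host>`, which looks like the host:port string that `read_worker_host_port` in the stack trace above is waiting for. A minimal sketch of pulling the two fields out of such a line is below. The function name `parse_worker_announcement` and the regex are illustrative assumptions, not the code used inside Distributed.

```julia
# Hypothetical helper (not part of Distributed): split a worker's
# announcement line of the form "julia_worker:<port>#<host>".
function parse_worker_announcement(line::AbstractString)
    m = match(r"^julia_worker:(\d+)#(.*)$", strip(line))
    # Anything that does not match is treated as "no announcement yet".
    m === nothing && return nothing
    return (host = String(m.captures[2]), port = parse(Int, m.captures[1]))
end

# The line reported in this thread.
info = parse_worker_announcement("julia_worker:9724#192.168.1.114")
println(info)
```

If the master never sees a line of this shape on the worker's standard output, it would plausibly end up with the "Unable to read host:port string from worker" error reported at the top of this issue.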

@StefanKarpinski
Member

Can you confirm that this was on the remote machine, not the local machine?

@yangli-ai
Author

Yes, I confirm

@yangli-ai
Author

yangli-ai commented Oct 18, 2018

On the local machine, I tried:
addprocs(["[email protected]:9724"],exename="/cygdrive/e/Programming/Julia/bin/julia",dir="/cygdrive/c/Users/seanm/Desktop")

The remote Julia returns an error:
ErrorException("Process(1) - Invalid connection credentials sent by remote.")CapturedException(ErrorException("Process(1) - Invalid connection credentials sent by remote."), Any[(error(::String) at error.jl:33, 1), (process_hdr(::TCPSocket, ::Bool) at process_messages.jl:248, 1), (message_handler_loop(::TCPSocket, ::TCPSocket, ::Bool) at process_messages.jl:142, 1), (process_tcp_streams(::TCPSocket, ::TCPSocket, ::Bool) at process_messages.jl:117, 1), ((::##105#106{TCPSocket,TCPSocket,Bool})() at task.jl:259, 1)])
Process(1) - Unknown remote, closing connection.

@vchuravy
Member

Duplicate of #29243,

function launch_on_machine(manager::SSHManager, machine, cnt, params, launched, launch_ntfy::Condition)
assumes that it will run on a POSIX system.

@StefanKarpinski
Member

Right, Windows is unfortunately non-POSIX and therefore not supported.

@mgkuhn
Contributor

mgkuhn commented Jul 10, 2021

For the record: a shell=:wincmd option of Distributed.addprocs() was added in Julia 1.6 for Windows workers where you ssh into cmd.exe. The documentation now also makes clear that the default is for POSIX shells.
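
A rough sketch of what such a call might look like, assuming Julia 1.6 or later and an SSH server on the Windows host that drops into cmd.exe. The helper `windows_worker_kwargs` is hypothetical, and the `exename` and `dir` values are only guesses at Windows-style equivalents of the Cygwin paths used earlier in this thread; substitute your own user, host, and paths.

```julia
using Distributed

# Hypothetical helper: collects the keyword arguments for a Windows worker.
# `shell=:wincmd` asks addprocs to build a cmd.exe command line instead of
# the default POSIX `sh -l -c '...'` one seen in the debug log above.
function windows_worker_kwargs(; exename, dir)
    return (shell = :wincmd, exename = exename, dir = dir)
end

kw = windows_worker_kwargs(
    exename = raw"E:\Programming\Julia\bin\julia.exe",  # assumed install path
    dir     = raw"C:\Users\seanm\Desktop",              # assumed working directory
)

# Uncomment on a machine that can actually reach the remote host:
# addprocs(["user@host"]; kw...)
```

The `addprocs` call is left commented out so the snippet can be pasted without trying to open an SSH connection.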
