run/repro: Execute a command on another machine #1490
Comments
Can this also be related to data processing on a remote Spark cluster? We're using EMR clusters to run our Spark jobs. One way it is done is using … Let me try to simplify. Say that I have a local script …
The problem is that …
@drorata a workaround can be to actively poll the status in the …
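Here is a minimal sketch of the polling workaround. It assumes the remote job exposes some way to query its status. `get_status` is a hypothetical callable standing in for something like an EMR step-status query. It is not a DVC or AWS API, and the status strings are assumptions.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    interval: float = 30.0,
    timeout: float = 6 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a remote job until it reaches a terminal state.

    `get_status` is a hypothetical callable that returns one of
    "PENDING", "RUNNING", "COMPLETED" or "FAILED".
    """
    terminal = {"COMPLETED", "FAILED"}
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(interval)  # the local process stays alive for the whole run
        waited += interval


# Example with a simulated job that finishes on the third status query.
statuses = iter(["PENDING", "RUNNING", "COMPLETED"])
final = wait_for_job(lambda: next(statuses), interval=0.0)
```

The loop keeps the local machine busy until the remote job finishes. That is the drawback raised in the reply below.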
The idea suggested by @shcheklein is indeed a workaround. The first issue that comes to mind is that the local machine (where dvc is running) would have to stay awake during the whole processing, which can take a very long time. I don't have a clear picture of what an asynchronous flow should look like, but it is probably worth discussing. After all, dvc is designed around handling huge data sets, but that is of little use if there's no way to design stages that process the data in a distributed manner, on Spark or Dask clusters for example. In one of my projects I use a notification mechanism: I trigger a process on EMR, and this process emits a message once it completes. A counterpart waits for that message, and once it is received, the second phase kicks in.
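The notification pattern described here can be sketched roughly as follows. An in-process `queue.Queue` stands in for a real message broker, and a background thread stands in for the remote EMR job. `run_two_phase` and `simulated_cluster_job` are hypothetical names, and the message format is an assumption.

```python
import queue
import threading
from typing import Callable, Optional


def run_two_phase(
    submit_job: Callable[[queue.Queue], None],
    second_phase: Callable[[dict], str],
    timeout: Optional[float] = None,
) -> str:
    """Start phase 1, block until it reports back, then run phase 2."""
    inbox: queue.Queue = queue.Queue()
    # Hypothetical launcher: it must put one message on `inbox` when done.
    submit_job(inbox)
    # Blocks without polling; raises queue.Empty if nothing arrives in time.
    message = inbox.get(timeout=timeout)
    if message.get("status") != "COMPLETED":
        raise RuntimeError(f"phase 1 did not complete: {message}")
    return second_phase(message)


def simulated_cluster_job(inbox: queue.Queue) -> None:
    """Stand-in for a remote job: finishes in a background thread, then notifies."""
    def work() -> None:
        # ... the long-running distributed processing would happen here ...
        inbox.put({"status": "COMPLETED", "output": "results/part-0000"})

    threading.Thread(target=work, daemon=True).start()


result = run_two_phase(simulated_cluster_job, lambda msg: msg["output"], timeout=5)
```

In a real setup, the waiting counterpart need not be the machine that submitted the job. That separation is what would let the submitting machine go to sleep.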
Closing as this will be handled by …
I'd vote to leave this one open since there's no solution today, or to mark it as a duplicate if another issue covers it.
@dberenbaum It has been silent for 2 years now, so we could call it stale too. Exp will handle it in iterative/enhancement-proposals#3, so this is an outdated duplicate. Please feel free to reopen if you think this is still useful.
Sometimes you have a dedicated machine with the proper runtime environment to execute your experiments. It would be great to have an option to send inputs to that machine, run a command on it, and retrieve the output files.
Things to look up:
`-d` and `-o` could be used to keep track of which files need to be pushed to and retrieved from the remote host. Maybe this could work alongside setting up SSHFS or NFS.
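This sketch shows how a stage's `-d` dependencies and `-o` outputs could drive the file transfers. Dependencies are pushed to the remote host before the run, and outputs are pulled back afterwards. `transfer_plan` is a hypothetical helper, not part of DVC. It assumes `rsync` over SSH is available and only builds the argv lists without running them. Paths with unusual characters in the remote part would need extra quoting.

```python
import posixpath
from typing import List, Sequence


def transfer_plan(
    host: str,
    remote_dir: str,
    deps: Sequence[str],
    outs: Sequence[str],
) -> List[List[str]]:
    """Build rsync commands: push every `-d` dependency, pull every `-o` output."""
    commands = []
    for dep in deps:
        # -R (--relative) recreates the workspace-relative path under remote_dir.
        commands.append(["rsync", "-aR", dep, f"{host}:{remote_dir}/"])
    for out in outs:
        out = out.rstrip("/")
        # Pull into the parent directory so that a directory output
        # is not nested inside itself (e.g. data/features/features).
        local_parent = (posixpath.dirname(out) or ".") + "/"
        commands.append(["rsync", "-a", f"{host}:{remote_dir}/{out}", local_parent])
    return commands
```

Each returned list could be passed to `subprocess.run(..., check=True)`. Keeping the plan as plain data makes it easy to inspect or test before anything touches the network.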
We could introduce a `--sshlogin` option that receives the URI of the node where the computation needs to run.
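This sketch shows how the proposed `--sshlogin` value could be turned into an `ssh` invocation. `--sshlogin` is only the option proposed here, not an existing DVC flag. The `ssh://[user@]host[:port][/path]` URI shape, the `repro-remote` program name, and the idea of using the URI path as the remote working directory are all assumptions made for illustration.

```python
import argparse
import shlex
from typing import List
from urllib.parse import urlsplit


def build_ssh_command(sshlogin: str, command: str) -> List[str]:
    """Turn an ssh:// URI plus a stage command into an ssh argv list."""
    parts = urlsplit(sshlogin)
    if parts.scheme != "ssh" or not parts.hostname:
        raise ValueError(f"expected ssh://[user@]host[:port][/path], got {sshlogin!r}")
    target = f"{parts.username}@{parts.hostname}" if parts.username else parts.hostname
    argv = ["ssh"]
    if parts.port:
        argv += ["-p", str(parts.port)]
    remote = command
    if parts.path not in ("", "/"):
        # Assumption: the URI path is the remote working directory.
        remote = f"cd {shlex.quote(parts.path)} && {command}"
    return argv + [target, remote]


# Hypothetical CLI wiring for the proposed option.
parser = argparse.ArgumentParser(prog="repro-remote")
parser.add_argument("--sshlogin", required=True)
parser.add_argument("cmd")
args = parser.parse_args(["--sshlogin", "ssh://alice@gpu-box:2222/srv/exp", "python train.py"])
ssh_argv = build_ssh_command(args.sshlogin, args.cmd)
```

Here `ssh_argv` is `['ssh', '-p', '2222', 'alice@gpu-box', 'cd /srv/exp && python train.py']`. Running it would still need the inputs pushed beforehand and the outputs pulled afterwards.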