run/repro: Execute a command on another machine #1490

Closed
ghost opened this issue Jan 11, 2019 · 6 comments
Labels
enhancement (Enhances DVC), feature request (Requesting a new feature), p3-nice-to-have (It should be done this or next sprint)

Comments

@ghost

ghost commented Jan 11, 2019

Sometimes you have a dedicated machine with the proper runtime environment to execute your experiments. It would be great to have an option to send the inputs to that machine, run a command there, and retrieve the output files.

Things to look up:

  • Cancelled or failed runs could leave garbage on the remote hosts that needs to be collected
  • Large input/output files could make transfers inefficient
  • Use -d and -o to keep track of which files need to be pushed to and retrieved from the remote host

Maybe, this could work alongside setting up SSHFS or NFS.

We could introduce an --sshlogin option that takes the URI of the node where the computation should run.
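A minimal sketch of how the proposal above might work, written in Python purely as an illustration. Nothing here is an existing DVC feature: `--sshlogin` is only the option proposed in this issue, and `plan_remote_run`, `run_plan`, and the default remote working directory are hypothetical names. The sketch assumes flat file names (no subdirectories) and plain `scp`/`ssh` for transfers, which does nothing about the inefficient-transfer concern.

```python
import shlex
import subprocess


def plan_remote_run(sshlogin, deps, outs, command, workdir="/tmp/dvc-remote-run"):
    """Build the scp/ssh invocations for one remote run (hypothetical helper).

    deps and outs play the role of the -d and -o lists: deps are pushed
    before the command runs, outs are fetched afterwards.
    """
    wd = shlex.quote(workdir)
    plan = [["ssh", sshlogin, f"mkdir -p {wd}"]]
    for dep in deps:
        plan.append(["scp", dep, f"{sshlogin}:{workdir}/{dep}"])
    plan.append(["ssh", sshlogin, f"cd {wd} && {command}"])
    for out in outs:
        plan.append(["scp", f"{sshlogin}:{workdir}/{out}", out])
    # The last step is always the cleanup of the remote working directory.
    plan.append(["ssh", sshlogin, f"rm -rf {wd}"])
    return plan


def run_plan(plan, runner=subprocess.run):
    """Run every step in order; the cleanup step runs even if a step fails."""
    *steps, cleanup = plan
    try:
        for argv in steps:
            runner(argv, check=True)
    finally:
        runner(cleanup, check=True)
```

Running the cleanup in a `finally` block is one way to address the garbage left behind by cancelled or failed runs, although it would not help if the local process itself is killed.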

@ghost added the enhancement and feature request labels on Feb 4, 2019
@drorata

drorata commented Aug 22, 2019

Can this also be related to data processing on a remote Spark cluster?

We're using EMR clusters to run our Spark jobs. One way to do this is with aws emr add-steps. In this case, the job's code is placed on S3, and by providing a cluster ID we can execute the work.

Let me try to simplify. Say that I have a local script execute_process_on_emr_cluster.sh. The result of running this script is an EMR step running on a predefined EMR cluster. Furthermore, the result of the computation will be persisted to s3://mybucket/great_result.parquet. I would like to be able to do something like:

dvc run -d execute_process_on_emr_cluster.sh [-d some other dependencies maybe] -o s3://mybucket/great_result.parquet execute_process_on_emr_cluster.sh

The problem is that execute_process_on_emr_cluster.sh merely returns the ID of the step submitted to EMR, so dvc will complain that no expected output was found. I guess that some "asynchronous" approach is needed here.

@shcheklein
Member

@drorata a workaround could be to actively poll the status in execute_process_on_emr_cluster.sh and exit when the step is done. I'm not sure I understand what an asynchronous mode would look like. Have you used any notification mechanisms for this before?
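The polling workaround could be sketched roughly as follows, in Python rather than in the shell script itself. `wait_for_step` is a hypothetical helper, and `get_state` is an assumed callable that returns the current step state as a string, for instance by wrapping `aws emr describe-step`. The state names are assumptions about what that call reports.

```python
import time

# Assumed terminal states of an EMR step; adjust to what the service reports.
TERMINAL_OK = {"COMPLETED"}
TERMINAL_BAD = {"CANCELLED", "FAILED", "INTERRUPTED"}


def wait_for_step(get_state, interval=30.0, timeout=None,
                  sleep=time.sleep, clock=time.monotonic):
    """Block until the step reaches a terminal state.

    Returns the final state on success, raises RuntimeError if the step
    ended badly, and raises TimeoutError once `timeout` seconds have passed.
    """
    start = clock()
    while True:
        state = get_state()
        if state in TERMINAL_OK:
            return state
        if state in TERMINAL_BAD:
            raise RuntimeError(f"step ended in state {state}")
        if timeout is not None and clock() - start >= timeout:
            raise TimeoutError(f"step still in state {state} after {timeout}s")
        sleep(interval)
```

Because the call only returns once the step has finished, the wrapping script exits after the output exists, so dvc would find the expected output. The cost is that the local process has to stay alive for the whole run.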

@drorata

drorata commented Aug 23, 2019

The idea suggested by @shcheklein is indeed a workaround. The first issue that comes to mind is that the local machine (where dvc is running) would have to stay awake during the whole processing, which can be very lengthy.

I don't have a clear picture of what an asynchronous flow should look like, but it is probably worth discussing. After all, dvc is designed around handling huge data sets, but that is of little use if there's no way to design stages that process the data in a distributed manner, for example on a Spark or Dask cluster.

I use a notification mechanism in one of my projects where I trigger a process on EMR and this process emits a message once completed. There's a counterpart waiting for that message, and once it is received the second phase kicks in.
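The shape of such a notification flow might look like the sketch below. It is a toy, in-process stand-in only: a `queue.Queue` plays the role of the real message channel, and `first_phase`, `wait_then_second_phase`, and the message fields are hypothetical names, not part of the project described above.

```python
import queue
import threading


def first_phase(done, status="COMPLETED",
                output="s3://mybucket/great_result.parquet"):
    """Stand-in for the remote job: emit one message when the work finishes."""
    done.put({"status": status, "output": output})


def wait_then_second_phase(done, second_phase, timeout=None):
    """Wait for the completion message, then start the second phase."""
    message = done.get(timeout=timeout)  # raises queue.Empty on timeout
    if message["status"] != "COMPLETED":
        raise RuntimeError(f"first phase ended with {message['status']}")
    return second_phase(message["output"])


if __name__ == "__main__":
    done = queue.Queue()
    worker = threading.Thread(target=first_phase, args=(done,))
    worker.start()
    wait_then_second_phase(done, lambda uri: print("second phase on", uri), timeout=5)
    worker.join()
```

With a real message channel, the waiting side would not have to be the machine that submitted the job, which is the main difference from the polling workaround.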

@efiop added the p3-nice-to-have label and removed the p4 label on Sep 30, 2019
@efiop
Contributor

efiop commented May 3, 2021

Closing as this will be handled by dvc exp executors in the future.

@efiop efiop closed this as completed May 3, 2021
@dberenbaum
Collaborator

I'd vote to leave this one open since there's no solution today, or mark it as a duplicate if there's another issue to cover it.

@efiop
Contributor

efiop commented May 3, 2021

@dberenbaum It has been silent for 2 years now, so we could call it stale too. Exp will handle it in iterative/enhancement-proposals#3, so this is an outdated duplicate. Please feel free to reopen if you think this is still useful.
