Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get, get-url: add --jobs option #2122

Merged
merged 6 commits into from
Feb 5, 2021
Merged

get, get-url: add --jobs option #2122

merged 6 commits into from
Feb 5, 2021

Conversation

isidentical
Copy link
Contributor

You may disregard these recommendations if you used the Edit on GitHub button from dvc.org to improve a doc in place.

❗ Please read the guidelines in the Contributing to the Documentation list if you make any substantial changes to the documentation or JS engine.

🐛 Please make sure to mention Fix #issue (if applicable) in the description of the PR. This causes GitHub to close it automatically when the PR is merged.

Please choose to allow us to edit your branch when creating the PR.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

iterative/dvc#5362

@shcheklein shcheklein temporarily deployed to dvc-landing-get-jobs-jz9mt16th January 29, 2021 08:19 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-landing-get-jobs-jz9mt16th February 2, 2021 15:47 Inactive
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-landing-get-jobs-jz9mt16th February 2, 2021 15:57 Inactive
Comment on lines 71 to 72
default is `4`. Note that the default value can be set using the `jobs` config
option with `dvc remote modify`. Using more jobs may speed up the operation.
Copy link
Contributor

@jorgeorpinel jorgeorpinel Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the default value can be set using the jobs config option with dvc remote modify.

Does this apply to get (and import)? I.e. does it respect the .dvc/config settings of the source repo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. does it respect the .dvc/config settings of the source repo?

I believe it should, though might be mistaken on this cc: @efiop

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the source - yes, in the host - no. Meaning that if you run dvc get inside of some repo, its configs won't be used. This is consistent with all of the related commands (list/import/get). There are a few separate tasks for that, we are almost ready to get back to them, as most of the necessary pre-requisites have already been merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop what will be the behavior after those tasks are addressed? I'm trying to understand if we need some additional clarification here (what config is being used).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@efiop ping ^^ I'm not sure I understood the comment about There are a few separate tasks for that, we are almost ready to get back to them, as most of the necessary pre-requisites have already been merged., could you clarify it a bit please?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will be the behavior after those tasks are addressed?

Don't know right now. Maybe we'll try to use host configs too or maybe we'll require explicit --config flag or config flags.

I think the confusion here is that dvc remote modify mentioned is talking about source repo, not the host repo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's confusing. Maybe:

Suggested change
default is `4`. Note that the default value can be set using the `jobs` config
option with `dvc remote modify`. Using more jobs may speed up the operation.
default is `4`. Using more jobs may speed up the operation. Note that the
default value can be set in the source repo using the `jobs` config option of
`dvc remote modify`.

Or even make that last sentence an independent block quote (at the bullet indentation level).

Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 1 Q ☝️

@jorgeorpinel jorgeorpinel temporarily deployed to dvc-landing-get-jobs-jz9mt16th February 2, 2021 16:40 Inactive
@jorgeorpinel

This comment has been minimized.

@@ -132,6 +132,10 @@ $ dvc run -n download_data \
already exist locally and you want to "DVCfy" this state of the project (see
also `dvc commit`).

- `-j <number>`, `--jobs <number>` - parallelism level for DVC to download data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@isidentical just to make sure - it's not affected by any configs at all, right? (I understand that this is not about remotes in it's default mode, but we also have core.jobs or something if I remember it right)

also, what happens if we do:

dvc import-url remote://name syntax and remote specifies a jobs in its config - is it a problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood #2122 (comment) correctly, the configuration of the source repo is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For import-url, it goes with --jobs => remote.jobs => core.jobs => BaseTree.JOBS

This comment was marked as resolved.

This comment was marked as resolved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that not BaseTree.JOBS but more like Tree.JOBS, as defaults are different for different clouds. For ssh it is 4 for the rest it is 4 * NCPU.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to mention this here as we do in the other parts of this PR?

Copy link
Contributor

@efiop efiop Feb 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein It is already mentioned here. Unless I'm misunderstanding your question.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I don't see it. (I don't mean defaults, but rather that it's affected or can be affected by config. We have something like this in the other places)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Sure, could mention here.

@jorgeorpinel jorgeorpinel self-assigned this Feb 3, 2021
jorgeorpinel added a commit that referenced this pull request Feb 3, 2021
@jorgeorpinel jorgeorpinel temporarily deployed to dvc-landing-get-jobs-jz9mt16th February 3, 2021 19:11 Inactive
shcheklein pushed a commit that referenced this pull request Feb 4, 2021
@shcheklein shcheklein temporarily deployed to dvc-landing-get-jobs-jz9mt16th February 5, 2021 10:48 Inactive
@shcheklein shcheklein merged commit a1b50de into master Feb 5, 2021
Comment on lines +77 to +78
default is `4`. Note that the default value can be set using the `jobs` config
option with `dvc remote modify`. Using more jobs may speed up the operation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default value can be set using the jobs config option with dvc remote modify.

This doesn't apply to get-url right? 🙂

Comment on lines +136 to +137
default is `4`. Note that the default value can be set using the `jobs` config
option with `dvc remote modify`. Using more jobs may speed up the operation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here (there are no remotes involved in the source for import-url, except with --to-remote)

@jorgeorpinel
Copy link
Contributor

I addressed my last commends in #2158

@jorgeorpinel jorgeorpinel deleted the get-jobs branch February 7, 2021 03:05
shcheklein pushed a commit that referenced this pull request Feb 7, 2021
* cmd: --jobs option desc in sync refs
per #2122 (review)

* cmd: update get/import-url --jobs desc.
per #2122 (review)
and #2122 (review)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants