-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
get, get-url: add --jobs option #2122
Conversation
076a079
to
20ab141
Compare
default is `4`. Note that the default value can be set using the `jobs` config | ||
option with `dvc remote modify`. Using more jobs may speed up the operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the default value can be set using the
jobs
config option withdvc remote modify
.
Does this apply to get
(and import
)? I.e. does it respect the .dvc/config settings of the source repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e. does it respect the .dvc/config settings of the source repo?
I believe it should, though might be mistaken on this cc: @efiop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the source - yes, in the host - no. Meaning that if you run dvc get
inside of some repo, its configs won't be used. This is consistent with all of the related commands (list/import/get). There are a few separate tasks for that, we are almost ready to get back to them, as most of the necessary pre-requisites have already been merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efiop what will be the behavior after those tasks are addressed? I'm trying to understand if we need some additional clarification here (what config is being used).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efiop ping ^^ I'm not sure I understood the comment about There are a few separate tasks for that, we are almost ready to get back to them, as most of the necessary pre-requisites have already been merged.
, could you clarify it a bit please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what will be the behavior after those tasks are addressed?
Don't know right now. Maybe we'll try to use host configs too or maybe we'll require explicit --config flag or config flags.
I think the confusion here is that dvc remote modify
mentioned is talking about source repo, not the host repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's confusing. Maybe:
default is `4`. Note that the default value can be set using the `jobs` config | |
option with `dvc remote modify`. Using more jobs may speed up the operation. | |
default is `4`. Using more jobs may speed up the operation. Note that the | |
default value can be set in the source repo using the `jobs` config option of | |
`dvc remote modify`. |
Or even make that last sentence an independent block quote (at the bullet indentation level).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just 1 Q ☝️
This comment has been minimized.
This comment has been minimized.
@@ -132,6 +132,10 @@ $ dvc run -n download_data \ | |||
already exist locally and you want to "DVCfy" this state of the project (see | |||
also `dvc commit`). | |||
|
|||
- `-j <number>`, `--jobs <number>` - parallelism level for DVC to download data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@isidentical just to make sure - it's not affected by any configs at all, right? (I understand that this is not about remotes in it's default mode, but we also have core.jobs or something if I remember it right)
also, what happens if we do:
dvc import-url remote://name
syntax and remote specifies a jobs
in its config - is it a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understood #2122 (comment) correctly, the configuration of the source repo is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For import-url
, it goes with --jobs => remote.jobs => core.jobs => BaseTree.JOBS
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that not BaseTree.JOBS
but more like Tree.JOBS
, as defaults are different for different clouds. For ssh it is 4 for the rest it is 4 * NCPU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it make sense to mention this here as we do in the other parts of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shcheklein It is already mentioned here. Unless I'm misunderstanding your question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I don't see it. (I don't mean defaults, but rather that it's affected or can be affected by config. We have something like this in the other places)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Sure, could mention here.
default is `4`. Note that the default value can be set using the `jobs` config | ||
option with `dvc remote modify`. Using more jobs may speed up the operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default value can be set using the
jobs
config option withdvc remote modify
.
This doesn't apply to get-url right? 🙂
default is `4`. Note that the default value can be set using the `jobs` config | ||
option with `dvc remote modify`. Using more jobs may speed up the operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here (there are no remotes involved in the source for import-url, except with --to-remote)
I addressed my last commends in #2158 |
* cmd: --jobs option desc in sync refs per #2122 (review) * cmd: update get/import-url --jobs desc. per #2122 (review) and #2122 (review)
❗ Please read the guidelines in the Contributing to the Documentation list if you make any substantial changes to the documentation or JS engine.
🐛 Please make sure to mention
Fix #issue
(if applicable) in the description of the PR. This causes GitHub to close it automatically when the PR is merged.Please choose to allow us to edit your branch when creating the PR.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
iterative/dvc#5362