Add API to add artifacts #9345
@aguschin Converted to draft since it's WIP. Hope that's okay.
Co-authored-by: Oded Messer <[email protected]>
@aguschin Is this still in draft mode? Is it ready to be reviewed and merged?
@aguschin Let's not include it in this PR, but what about
How would DVCFileSystem implement this? On other APIs, it feels too early. Also, I am not a fan of overloading target here; I'd have preferred it to be unambiguous with
To be honest, I am not seeing much value in this that isn't already possible.
I don't think we should implement it there for the first release: I don't see a clear scenario where it's really needed when there is get/import. If users need it, we can add it later.
Thank you for sharing your thoughts, @skshetry! I understand your point. My thoughts: we use the same approach to reference stages in dvc.yaml, so I think it's OK to reuse it here, since we have already adopted it.
Right now, names must be unique within a selected dvc.yaml, but I assume you meant they should be unique across all dvc.yaml files in the DVC repo? I think that is quite hard to maintain, so I don't really see it as an option.
Sorry, I didn't get you here. Did you mean this PR, or
@aguschin How will users get the revisions they want? For example, I may always want the latest prod version. I think this is more important than supporting the artifact name.
It's okay to assume that names are unique at the repo level without enforcing it, as long as that is documented. There's also an alternative way to download artifacts: by path.
I meant that I don't see much difference between the following two options. Most of the time, an artifact will refer to the same path; paths rarely change.
$ dvc get https://github.com/iterative/example-get-started dvc.yaml:artifact1
$ dvc get https://github.com/iterative/example-get-started path/model
I do see in #9100 that you proposed:
$ dvc get $REPO mymodel@latest
which I do see value in.
@skshetry Agreed that this looks more valuable, but I think it makes sense to merge support for artifact names and follow up later with gto revision support. @aguschin Can we have some kind of revision support in
I still think we should either use a flag or at least get rid of the dvc.yaml path. Requiring too much information is counterproductive, especially when the difference between path and name is already minimal. We are talking about a remote resource, so figuring out where the artifact is defined is even harder.
Codecov Report
Patch coverage:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9345      +/-   ##
==========================================
- Coverage   91.60%   91.35%   -0.25%
==========================================
  Files         488      488
  Lines       38101    38148      +47
  Branches     5464     5469       +5
==========================================
- Hits        34901    34852      -49
- Misses       2637     2712      +75
- Partials      563      584      +21
☔ View full report in Codecov by Sentry.
@@ -28,9 +28,27 @@ def to_dict(self) -> Dict[str, str]:

 @dataclass
-class Artifact(Annotation):
+class Artifact:
What was the problem with inheriting from the Annotation class here (and just adding path: str)?
path is required, so it must come before all other arguments. Simply inheriting therefore doesn't work: it raises an error at runtime. Unfortunately, I didn't find a better solution than copy-pasting this code.
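For readers following the thread, here is a minimal sketch of the ordering problem described above. The `Annotation` class below is a simplified stand-in (its field list is an assumption, not copied from DVC), and `KwOnlyArtifact` is a hypothetical alternative that relies on `kw_only=True`, which only exists in Python 3.10+ and is not what this PR does.

```python
import sys
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Annotation:
    # Simplified stand-in (assumed fields): every field has a default.
    desc: Optional[str] = None
    type: Optional[str] = None
    labels: List[str] = field(default_factory=list)
    meta: Dict[str, str] = field(default_factory=dict)


def define_inheriting_artifact():
    """Try to add a required field in a subclass of a dataclass with defaults."""

    @dataclass
    class Artifact(Annotation):
        path: str  # required, but ordered after the inherited defaulted fields

    return Artifact


try:
    define_inheriting_artifact()
    inherit_error = None
except TypeError as exc:
    # The error is raised when the class is defined, not when it is instantiated.
    inherit_error = str(exc)

print(inherit_error)

if sys.version_info >= (3, 10):
    # Hypothetical alternative: keyword-only fields may be required even
    # when the inherited positional fields all have defaults.
    @dataclass(kw_only=True)
    class KwOnlyArtifact(Annotation):
        path: str

    artifact = KwOnlyArtifact(path="models/model.pkl", desc="demo")
    print(artifact.path, artifact.desc)
```

The keyword-only variant changes how the class is constructed (`path` can no longer be passed positionally), so it is a trade-off rather than a drop-in fix.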
I don't know if there's a nicer solution here.
@skshetry - maybe you have an idea?
Dvc get artifacts upd
def merge(self, ancestor, other, allowed=None):
    raise NotImplementedError

def check_for_nested_dvc_repo(dvcfile: Path):
What's the harm in allowing this?
It's confusing: if you add an artifact like this, you won't find it later, because it is out of scope for this DVC repo. Studio is the only user of this API for now, so if we disabled the check and a user added a model like this, they wouldn't find it in the Studio MR. To me, it makes sense to do this check on the DVC side.
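To make the kind of check under discussion concrete, here is a rough sketch. It is not the PR's `check_for_nested_dvc_repo`; the helper name `find_nested_repo_root`, its `repo_root` parameter, and the rule "a directory containing `.dvc` is a repo root" are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_nested_repo_root(dvcfile: Path, repo_root: Path) -> Optional[Path]:
    """Return the closest directory strictly below repo_root, on the way up
    from dvcfile, that has its own `.dvc` directory; None if there is none."""
    repo_root = repo_root.resolve()
    current = dvcfile.resolve().parent
    # Walk upwards, stopping at the outer repo root (or if we leave it).
    while current != repo_root and repo_root in current.parents:
        if (current / ".dvc").is_dir():
            return current
        current = current.parent
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / ".dvc").mkdir()                      # outer repo
    (root / "sub" / ".dvc").mkdir(parents=True)  # nested repo
    (root / "plain").mkdir()                     # ordinary subdirectory

    nested = find_nested_repo_root(root / "sub" / "dvc.yaml", root)
    top_level = find_nested_repo_root(root / "dvc.yaml", root)
    plain = find_nested_repo_root(root / "plain" / "dvc.yaml", root)
    nested_name = nested.name if nested else None

print(nested_name, top_level, plain)
```

A caller could raise an error when the helper returns a directory; whether DVC should do that at all is exactly what the comments here disagree on.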
As I have said before, DVC does not support nested subrepos, so it does not make sense to add an error message about this.
WIP
- implement `dvc get` artifacts
- assert that `get` is implemented right
- assert it's alright to simply split by `:` to get `dvc.yaml` location and artifact name
- implement same logic for `import`
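The "split by `:`" item in the checklist above can be sketched as follows. This is only an illustration: `split_artifact_target` is a hypothetical helper, not DVC's actual parsing code, and treating a colon-free target as a plain path is an assumption based on the examples in this thread.

```python
from typing import Optional, Tuple


def split_artifact_target(target: str) -> Tuple[Optional[str], str]:
    """Split "<dvc.yaml path>:<artifact name>" on the last colon.

    Returns (dvcfile, name). If there is no colon, returns (None, target)
    and leaves it to the caller to treat the target as a plain path.
    """
    dvcfile, sep, name = target.rpartition(":")
    if not sep:
        return None, target
    return dvcfile, name


print(split_artifact_target("dvc.yaml:artifact1"))
print(split_artifact_target("models/dvc.yaml:artifact1"))
print(split_artifact_target("path/model"))
# A Windows-style path shows why the split needs checking:
print(split_artifact_target("C:\\repo\\model"))
```

The last call returns `("C", "\\repo\\model")`, so a naive split would misread a drive letter as a `dvc.yaml` location. That is one case the "assert it's alright" item would need to cover.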