Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API to add artifacts #9345

Merged
merged 26 commits into from
May 15, 2023
Merged

Add API to add artifacts #9345

merged 26 commits into from
May 15, 2023

Conversation

aguschin
Copy link
Contributor

@aguschin aguschin commented Apr 19, 2023

WIP

  • implement dvc get artifacts
  • add writing new artifacts (used by Studio, DVCLive uses its own way due to simplicity)
  • Remove some name limitations: "Do you know if using a number as the first character is problematic? If not, could we drop the restriction for The first character must be a letter.? The rest of the logic seems pretty straightforward."
  • assert that get is implemented right
    • assert it's alright to simply split by : to get dvc.yaml location and artifact name
  • implement same logic for import
  • add tests

@aguschin aguschin self-assigned this Apr 19, 2023
@aguschin aguschin added the A: artifacts Related to `artifacts` section in `dvc.yaml` label Apr 19, 2023
@dberenbaum dberenbaum marked this pull request as draft April 19, 2023 17:36
@dberenbaum
Copy link
Collaborator

@aguschin Converted to draft since it's WIP. Hope that's okay.

dvc/dvcfile.py Outdated Show resolved Hide resolved
dvc/repo/get.py Outdated Show resolved Hide resolved
@aguschin aguschin requested a review from skshetry April 25, 2023 10:01
dvc/commands/get.py Outdated Show resolved Hide resolved
dvc/dvcfile.py Outdated Show resolved Hide resolved
dvc/repo/artifacts.py Outdated Show resolved Hide resolved
dvc/repo/artifacts.py Outdated Show resolved Hide resolved
dvc/repo/artifacts.py Outdated Show resolved Hide resolved
dvc/repo/artifacts.py Show resolved Hide resolved
dvc/repo/get.py Outdated Show resolved Hide resolved
dvc/repo/get.py Outdated Show resolved Hide resolved
dvc/repo/get.py Outdated Show resolved Hide resolved
dvc/repo/get.py Outdated Show resolved Hide resolved
dvc/commands/get.py Outdated Show resolved Hide resolved
@dberenbaum
Copy link
Collaborator

@aguschin Is this still in draft mode? Is it ready to be reviewed and merged?

@dberenbaum
Copy link
Collaborator

@aguschin Let's not include in this PR, but what about dvc get-url and API methods (dvc.api.open/read and DVCFileSystem)?

@skshetry
Copy link
Member

skshetry commented May 2, 2023

@aguschin Let's not include in this PR, but what about dvc get-url and API methods (dvc.api.open/read and DVCFileSystem)?

How would DVCFileSystem implement this? On other APIs, it feels too early.

Also, I am not a fan of overloading target here, I'd have preferred it to be unambiguous with --model or similar flags. Or, alternatively get rid of dvc.yaml path and assume model names are going to be unique, as there's a way to download using the path unambiguously.

To be honest, I am not seeing much value with this, that's not already possible.

@aguschin
Copy link
Contributor Author

aguschin commented May 2, 2023

@aguschin Let's not include in this PR, but what about dvc get-url and API methods (dvc.api.open/read and DVCFileSystem)?

I don't think we should implement it there for the first release - don't see clearly the scenario when it's really needed when there is get/import. If users will need that, we can add it later.

Also, I am not a fan of overloading target here, I'd have preferred it to be unambiguous with --model or similar flags

Thank you for sharing your thoughts @skshetry! I understand you here. My thoughts: we use the same to reference stages in dvc.yaml, so I think it's OK to reuse same approach, since we already took it.

Or, alternatively get rid of dvc.yaml path and assume model names are going to be unique, as there's a way to download using the path unambiguously.

Rn, names must be unique within a selected dvc.yaml, but I assume you meant they should be unique over all dvc.yaml files in DVC repo? This is quite hard to maintain I think, so don't really see this as an option.

To be honest, I am not seeing much value with this, that's not already possible.

Sorry, didn't get you here. Did you meant this PR, or dvc get-url and API methods (dvc.api.open/read and DVCFileSystem)?

@dberenbaum
Copy link
Collaborator

@aguschin How will users get the revisions they want? For example, I may always want the latest prod version. I think this is more important than supporting the artifact name.

@skshetry
Copy link
Member

skshetry commented May 3, 2023

Rn, names must be unique within a selected dvc.yaml, but I assume you meant they should be unique over all dvc.yaml files in DVC repo? This is quite hard to maintain I think, so don't really see this as an option.

It's okay to make an assumption that they are going to be unique on a repo-level without enforcing it, as long as they are documented. There's an alternative way, from path to download artifacts.

@skshetry
Copy link
Member

skshetry commented May 3, 2023

Sorry, didn't get you here. Did you meant this PR, or dvc get-url and API methods (dvc.api.open/read and DVCFileSystem)?

I meant dvc get. Users can already download using path. We are only making it more confusing by overloading it.

I don't see much difference between either of the following options. Most of the time, the artifact will refer to the same path, they will rarely change.

$ dvc get https://github.com/iterative/example-get-started dvc.yaml:artifact1
$ dvc get https://github.com/iterative/example-get-started path/model

I do see in #9100 that you proposed:

$ dvc get $REPO mymodel@latest

Which I do see value in.

@dberenbaum
Copy link
Collaborator

$ dvc get $REPO mymodel@latest

@skshetry Agreed that this is looks more valuable, but I think it makes sense to merge support for artifact names and follow up later with gto revision support.

@aguschin Can we have some kind of revision support in dvc get for release? I think it's important since otherwise we are still stuck asking users to install and learn gto to deploy.

@skshetry
Copy link
Member

skshetry commented May 3, 2023

Still think of either using a flag or at least getting rid of dvc.yaml path. Requiring too many information is counter-productive especially when the difference is already very minimal between path and name. It’s a remote resource that we are talking about, figuring out where the artifact is defined is even harder.

@codecov
Copy link

codecov bot commented May 7, 2023

Codecov Report

Patch coverage: 93.24% and project coverage change: -0.25 ⚠️

Comparison is base (915dd50) 91.60% compared to head (11f51a6) 91.35%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9345      +/-   ##
==========================================
- Coverage   91.60%   91.35%   -0.25%     
==========================================
  Files         488      488              
  Lines       38101    38148      +47     
  Branches     5464     5469       +5     
==========================================
- Hits        34901    34852      -49     
- Misses       2637     2712      +75     
- Partials      563      584      +21     
Impacted Files Coverage Δ
dvc/schema.py 100.00% <ø> (ø)
dvc/repo/artifacts.py 89.09% <85.71%> (+0.85%) ⬆️
dvc/annotations.py 100.00% <100.00%> (ø)
tests/func/artifacts/test_artifacts.py 100.00% <100.00%> (ø)

... and 23 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@omesser omesser self-requested a review May 7, 2023 12:59
@@ -28,9 +28,27 @@ def to_dict(self) -> Dict[str, str]:


@dataclass
class Artifact(Annotation):
class Artifact:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the problem inheriting from Annotation class here (and just adding path: str)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path is required, thus it should come before all other arguments. Thus simply inheriting doesn't work - it throws errors at running. I didn't find a better solution than simply copy-pasting this code unfortunately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there's a nicer solution here.
@skshetry - maybe you have an idea?

dvc/repo/artifacts.py Show resolved Hide resolved
dvc/repo/artifacts.py Outdated Show resolved Hide resolved
dvc/repo/artifacts.py Outdated Show resolved Hide resolved
dvc/repo/artifacts.py Outdated Show resolved Hide resolved
dvc/repo/artifacts.py Outdated Show resolved Hide resolved
tests/func/artifacts/test_artifacts.py Outdated Show resolved Hide resolved
dvc/repo/artifacts.py Outdated Show resolved Hide resolved
@aguschin aguschin changed the title dvc get artifacts Add API to add artifacts May 8, 2023
dvc/annotations.py Outdated Show resolved Hide resolved
dvc/repo/artifacts.py Outdated Show resolved Hide resolved
dvc/repo/artifacts.py Outdated Show resolved Hide resolved
@aguschin
Copy link
Contributor Author

Updated to your feedback @omesser @skshetry. Looking forward to get another round here and get this merged! 🎉

dvc/repo/artifacts.py Outdated Show resolved Hide resolved
dvc/repo/artifacts.py Outdated Show resolved Hide resolved

def merge(self, ancestor, other, allowed=None):
raise NotImplementedError
def check_for_nested_dvc_repo(dvcfile: Path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the harm in allowing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's confusing, since if you add it like this, you later won't find it (it's out of scope for this DVC repo). Since Studio is the only user for this API, for now if the user adds the model like this (if we disable check), he won't find it in Studio MR. Makes sense to make this check on DVC side to me.

Copy link
Member

@skshetry skshetry May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As i have said before, dvc does not support nested subrepos, so it does not make sense to add an error message about this.

dvc/repo/artifacts.py Outdated Show resolved Hide resolved
@aguschin aguschin enabled auto-merge (squash) May 15, 2023 07:30
@aguschin aguschin merged commit 4518150 into iterative:main May 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: artifacts Related to `artifacts` section in `dvc.yaml`
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants