Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exp run: add experiment name check. #6848

Merged
merged 8 commits into from
Oct 25, 2021

Conversation

karajan1001
Copy link
Contributor

@karajan1001 karajan1001 commented Oct 21, 2021

fix #6674 and fix #6652

  1. Add experiment name check (https://git-scm.com/docs/git-check-ref-format)
  2. Add duplicate exp name check.
  3. Add some unit test for it.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@pmrowla , in #6819, exp name with '/' inside are request. And these types of exp names are stored in a subdirectory. For example in #6843, it is stored in .git/refs/heads/dependabot/pip/pylint-2.11.1. The subdirectory here is used for grouping branches. Currently, our dvc exp did not support this. It involves not only name checking here, maybe we should reopen #6819 and create a new PR for it?

@karajan1001 karajan1001 requested a review from a team as a code owner October 21, 2021 09:37
@karajan1001 karajan1001 requested a review from pmrowla October 21, 2021 09:37
@pmrowla
Copy link
Contributor

pmrowla commented Oct 21, 2021

@pmrowla , in #6819, exp name with '/' inside are request. And these types of exp names are stored in a subdirectory. For example in #6843, it is stored in .git/refs/heads/dependabot/pip/pylint-2.11.1. The subdirectory here is used for grouping branches. Currently, our dvc exp did not support this. It involves not only name checking here, maybe we should reopen #6819 and create a new PR for it?

We should not support slashes in experiment names. Yes, the full ref can contain slashes, and we already use slashes in exp refs to organize them, but the user-readable (short) name should not contain slashes - the same as git branch or tag (short) names.

@karajan1001
Copy link
Contributor Author

We should not support slashes in experiment names. Yes, the full ref can contain slashes, and we already use slashes in exp refs to organize them, but the user-readable (short) name should not contain slashes - the same as git branch or tag (short) names.

Git branches can have slashes inside them, in the case the slashes will be regarded as directory name splits. And the branches will be well organized in groups.
asciicast

@pmrowla
Copy link
Contributor

pmrowla commented Oct 21, 2021

Ah ok, that's my misunderstanding then, I didn't think cgit allowed you to nest anything in refs/heads. I guess that could be re-opened as a feature request, but I'm still not sure we should support it, unless it becomes something that is more frequently requested from users. I'd prefer to just keep everything in refs/exps/... namespaces completely reserved for DVC use for now.

@karajan1001
Copy link
Contributor Author

Ah ok, that's my misunderstanding then, I didn't think cgit allowed you to nest anything in refs/heads. I guess that could be re-opened as a feature request, but I'm still not sure we should support it, unless it becomes something that is more frequently requested from users. I'd prefer to just keep everything in refs/exps/... namespaces completely reserved for DVC use for now.

I will ban it in this PR, and then clarify in doc.

@pmrowla pmrowla added A: experiments Related to dvc exp ui user interface / interaction labels Oct 22, 2021
dvc/command/experiments.py Outdated Show resolved Hide resolved
dvc/command/experiments.py Outdated Show resolved Hide resolved
@dberenbaum
Copy link
Collaborator

I'd prefer to just keep everything in refs/exps/... namespaces completely reserved for DVC use for now.

Is it harder to handle nested refs in refs/exps? Or is there some other reason not to allow slashes? @karajan1001 makes a good point that this could be a helpful way to organize, plus it would solve #6819.

@karajan1001
Copy link
Contributor Author

I'd prefer to just keep everything in refs/exps/... namespaces completely reserved for DVC use for now.

Is it harder to handle nested refs in refs/exps? Or is there some other reason not to allow slashes? @karajan1001 makes a good point that this could be a helpful way to organize, plus it would solve #6819.

The dulwich backend support / in exp name, and actually our full reference is inside the subdirectory named by the HEAD revision. But still some problems on the DVC side, for example currently we use / to separate refname and the last part as the human-readable one, this should be changed. I don't think it is very difficult to support / in ref name, but still need some effort.

@karajan1001 karajan1001 changed the title Add experiment name check. exp run: add experiment name check. Oct 25, 2021
@pmrowla
Copy link
Contributor

pmrowla commented Oct 25, 2021

I'd prefer to just keep everything in refs/exps/... namespaces completely reserved for DVC use for now.

Is it harder to handle nested refs in refs/exps? Or is there some other reason not to allow slashes? @karajan1001 makes a good point that this could be a helpful way to organize, plus it would solve #6819.

There's nothing preventing us from allowing slashes in the exp refs, but once we start allowing them it means we cannot extend the current naming convention of refs/exps/ab/cd1234/.../<exp_name> to include any new information in the ... portion of the ref. My main concern is whether or not we think the current state of exp/refs is stable enough to make this change. I'd prefer not to make the change right now, unless using slashes in names really is a hard requirement.

Regarding user naming conventions, dots . and dashes - are allowed in exp names, and can also be used to generate hierarchies which could be filtered/sorted by users in the same way as using slash.

dvc/repo/experiments/__init__.py Outdated Show resolved Hide resolved
dvc/repo/experiments/__init__.py Outdated Show resolved Hide resolved
tests/unit/repo/experiments/test_utils.py Outdated Show resolved Hide resolved
karajan1001 and others added 2 commits October 25, 2021 11:38
Co-authored-by: Peter Rowlands (변기호) <[email protected]>
Co-authored-by: Peter Rowlands (변기호) <[email protected]>
Comment on lines 503 to 506
name = kwargs.pop("name", None)
self._validate_ref_name(name, **kwargs)

return self._stash_exp(*args, name=name, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks a bit strange to pop a kwarg and then re-use it again in the following call (to _stash_exp

Suggested change
name = kwargs.pop("name", None)
self._validate_ref_name(name, **kwargs)
return self._stash_exp(*args, name=name, **kwargs)
self._validate_ref_name(kwargs.get("name"), **kwargs)
return self._stash_exp(*args, **kwargs)

Alternatively, you could also just make name default to None in _validate_ref_name and call it with **kwargs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that kwargs still contains name argument and this would cause a duplication. If we want to use this method, we need to get name, force, reset and pass them separately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the simplest fix is to just do

def _validate_ref_name(self, name: Optional[str] = None, **kwargs):

and call it from here as

self._validate_ref_name(**kwargs)

Copy link
Contributor

@pmrowla pmrowla Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also change this function to be

def new(self, *args, checkpoint_resume: Optional[str] = None, name: Optional[str] = None, **kwargs):

and then keep the call you have as

self._validate_ref_name(name, **kwargs)
return self._stash_exp(*args, name=name, **kwargs)

The point is just that doing

value = kwargs.pop(key)
func(key=value, **kwargs)

isn't particularly clean code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the simplest fix is to just do

def _validate_ref_name(self, name: Optional[str] = None, **kwargs):
and call it from here as

self._validate_ref_name(**kwargs)

Yes, I also considered it, but I think it is more clear to call it explicitly.

@@ -473,6 +473,17 @@ def _log_reproduced(self, revs: Iterable[str], tmp_dir: bool = False):
"\tdvc exp branch <exp> <branch>\n"
)

def _validate_new_ref(self, exp_ref: ExpRefInfo):
Copy link
Contributor Author

@karajan1001 karajan1001 Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is used for validating a new reference, not only the name. Because we include duplication detect in it. And also the duplication detection is not to the name but to the whole reference.

@pmrowla pmrowla merged commit c075921 into iterative:master Oct 25, 2021
@karajan1001 karajan1001 deleted the exp_name_check branch October 25, 2021 12:01
@dberenbaum
Copy link
Collaborator

I'd prefer not to make the change right now, unless using slashes in names really is a hard requirement.

Sounds good. I don't think there's a hard requirement for now at least, but helpful to have more context about the rationale.

@dberenbaum
Copy link
Collaborator

Looks good! One small request: can we remove "Reproduced" from

"Reproduced experiment conflicts with existing experiment "
?

@jorgeorpinel
Copy link
Contributor

jorgeorpinel commented Dec 2, 2021

From what I understand this fixes bugs so no docs changes are needed (I checked the box).

@karajan1001
Copy link
Contributor Author

From what I understand this fixes bugs so no docs changes are needed (I checked the box).

This PR forbids the duplicate name and name with some illegal character (space, asterisk, etc), in my opinion, better to make an error msg if someone wants to use an illegal name.

@jorgeorpinel
Copy link
Contributor

OK I created iterative/dvc.org#3065 then.

shcheklein pushed a commit to iterative/dvc.org that referenced this pull request Dec 7, 2021
* ref: state exp names shoold be unique

rel. iterative/dvc#6848

* Restyled by prettier (#3066)

Co-authored-by: Restyled.io <[email protected]>

Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com>
Co-authored-by: Restyled.io <[email protected]>
iesahin pushed a commit to iterative/dvc.org that referenced this pull request Apr 11, 2022
* ref: state exp names shoold be unique

rel. iterative/dvc#6848

* Restyled by prettier (#3066)

Co-authored-by: Restyled.io <[email protected]>

Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com>
Co-authored-by: Restyled.io <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: experiments Related to dvc exp ui user interface / interaction
Projects
None yet
4 participants