Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unifying training argument type annotations #17934

Merged
merged 4 commits into from
Jun 30, 2022
Merged

Unifying training argument type annotations #17934

merged 4 commits into from
Jun 30, 2022

Conversation

jannisborn
Copy link
Contributor

What does this PR do?

This PR fixes the incorrect type annotations in the TrainingArguments class. Some arguments can handle complex types like evaluation_strategy: IntervalStrategy. However, when calling from CLI, they can be initialized using a str as well which is not reflected in the type annotations.

Solution: Fix the type annotations to Union[ComplexType, str].

Note that this PR simply ensures consistency between the docstrings and the annotated types. E.g., the docstring for evaluation_strategy is already:

        evaluation_strategy (`str` or [`~trainer_utils.IntervalStrategy`], *optional*, defaults to `"no"`):
            The evaluation strategy...

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Please have a look @sgugger!

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jun 29, 2022

The documentation is not available anymore as the PR was closed or merged.

@jannisborn
Copy link
Contributor Author

I understand that the tests are failing because of:

ValueError: Only `Union[X, NoneType]` (i.e., `Optional[X]`) is allowed for `Union` because
the argument parser only supports one type per argument. Problem encountered in field 'evaluation_strategy'.

Could this issue of not supporting multiple types be fixed in HF itself? I guess no since it's introduced upstream by arparse.add_argument not allowing multiple types? @sgugger

Ultimately, this is inconsistency between the type annotations and the actually possible types is caused by HF. I think it's quite problematic because it's a systematic inconsistency that makes things appear more complex as they are. If the annotation has to be a single type, then should it not be the simplest type that can actually be used, in this case str?
In that way, HF would be minimally invasive wrt downstream packages that indeed have stricter type annotation requirements.

@sgugger
Copy link
Collaborator

sgugger commented Jun 29, 2022

The type is not perfectly exact, but note that:

  1. the argument will be converted to that type in the post-init, so while the init of the dataclass accepts both str and IntervalStrategy (or other enum types), the attribute will always be of the enum type.
  2. having the enum has main type allows us to properly fill the choices part of the parser for CLI help.

So to be able to accept the change in type, we would need some custom code in HfArgumentParser to not only stop erroring on those types, but also properly fill the choices part. If you're interested in exploring this further, those are the missing steps we would need in order to merge this PR.

@BramVanroy
Copy link
Collaborator

Looking at this from the surface, it seems that this PR is (partially?) covered by the PR that was merged earlier today? #17933

There:

  • "Complex" enumtypes like IntervalStrategy (all that subclass ExplicitEnum) now also subclass str. As a consequence, the argparse equivalents also accept any string value
  • HfArgumentParser._parse_dataclass_field (that gives you the Union error) has been updated to allow Union's that include a str, because the str type is never an issue for argparse (as its the default)

@sgugger
Copy link
Collaborator

sgugger commented Jun 29, 2022

We don't have the error anymore, but we are still losing the autofill of "choices" and all the custom logic we had for enums here.

@jannisborn
Copy link
Contributor Author

Thanks a lot @BramVanroy, that's a nice coincidence!!

@sgugger: Could we move up that logic about the autofill to an elif starting at L94?

@sgugger
Copy link
Collaborator

sgugger commented Jun 29, 2022

I think there should be an if at line 94 that replaces the field.dtype by the field.type.__args__ which is not str (like we replace the field.dtype that is not None below line 95 for Optional), then line 105 and the test for enums will be triggered properly.

Basically something like:

if type(None) not in field.type.__args__:
    # filter `str` in Union
    field.type = field.type.__args__[0] if field.type.__args__[1] == str else field.type.__args__[1]
    origin_type = getattr(field.type, "__origin__", field.type)
elif bool not in field.type.__args__:

before and replacing the line

if bool not in field.type.__args__:

@jannisborn
Copy link
Contributor Author

Just did that!

@sgugger
Copy link
Collaborator

sgugger commented Jun 29, 2022

Thanks! Will play a bit with it tomorrow morning to triple-check nothing breaks then it should be good to merge!

@sgugger
Copy link
Collaborator

sgugger commented Jun 30, 2022

All good in my tests, thanks again for your work on this!

@sgugger sgugger merged commit 4f8361a into huggingface:main Jun 30, 2022
viclzhu pushed a commit to viclzhu/transformers that referenced this pull request Jul 18, 2022
* doc: Unify training arg type annotations

* wip: extracting enum type from Union

* blackening
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants