Nested attrs.define dataclasses not serialized correctly when a subfield is also an attrs.define dataclass marked with attrs.field #643

Merged: 8 commits into omni-us:main on Dec 18, 2024

@cody-mar10 cody-mar10 commented Dec 12, 2024

What does this PR do?

I am trying to see if I can convert a bunch of internal dataclasses.dataclass config objects to be defined with attrs.define instead for field validation and conversion.

I've noticed that jsonargparse handles simple cases with attrs-style dataclasses, but it breaks down with nested attrs-style dataclasses ONLY when the subfield is marked with an attrs.field descriptor, such as when using a default factory.

Here is a simple example:

from attrs import define, field
from jsonargparse import ArgumentParser

@define
class SubField:
    x: int = 0
    y: int = 1

@define
class Args:
    a: int
    subfield: SubField = field(factory=SubField)

parser = ArgumentParser()
parser.add_argument("--args", type=Args)

which raises the following traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/ccmartin6/miniconda3/envs/pst/lib/python3.10/site-packages/jsonargparse/_core.py", line 127, in add_argument
    self.add_dataclass_arguments(kwargs.pop("type"), nested_key, **kwargs)
  File "/scratch/ccmartin6/miniconda3/envs/pst/lib/python3.10/site-packages/jsonargparse/_signatures.py", line 476, in add_dataclass_arguments
    self._add_signature_parameter(
  File "/scratch/ccmartin6/miniconda3/envs/pst/lib/python3.10/site-packages/jsonargparse/_signatures.py", line 416, in _add_signature_parameter
    action = container.add_argument(*args, **kwargs)
  File "/scratch/ccmartin6/miniconda3/envs/pst/lib/python3.10/site-packages/jsonargparse/_core.py", line 127, in add_argument
    self.add_dataclass_arguments(kwargs.pop("type"), nested_key, **kwargs)
  File "/scratch/ccmartin6/miniconda3/envs/pst/lib/python3.10/site-packages/jsonargparse/_signatures.py", line 471, in add_dataclass_arguments
    defaults = dataclass_to_dict(default)
  File "/scratch/ccmartin6/miniconda3/envs/pst/lib/python3.10/site-packages/jsonargparse/_signatures.py", line 623, in dataclass_to_dict
    return dataclasses.asdict(value)
  File "/scratch/ccmartin6/miniconda3/envs/pst/lib/python3.10/dataclasses.py", line 1237, in asdict
    raise TypeError("asdict() should be called on dataclass instances")
TypeError: asdict() should be called on dataclass instances
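
The root cause in the traceback can be reproduced without jsonargparse. The following is a minimal sketch that uses only the standard library: `NotADataclass` is a hypothetical stand-in for an `attrs.define` instance (it carries fields but none of the `dataclasses` metadata), and `try_asdict` is an illustrative helper, not part of any library.

```python
import dataclasses


class NotADataclass:
    """Hypothetical stand-in for an attrs instance: fields, but no dataclass metadata."""

    def __init__(self):
        self.x = 0
        self.y = 1


def try_asdict(value):
    """Return the dict, or the TypeError message if value is not a dataclass instance."""
    try:
        return dataclasses.asdict(value)
    except TypeError as err:
        return str(err)


# dataclasses.is_dataclass is False here, which is why asdict refuses the object.
print(dataclasses.is_dataclass(NotADataclass()))
print(try_asdict(NotADataclass()))
```

Because `dataclasses.asdict` only accepts instances created by `dataclasses.dataclass`, a default value built by an attrs factory has to be serialized some other way.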

Thus, it would appear that the only change needed is in jsonargparse._signatures.dataclass_to_dict. More specifically, this pull request makes the following code changes:

  1. After checking if a dataclass-like object is a pydantic dataclass, it will check if the attrs library is available using the flag already computed in jsonargparse._optional
  2. If attrs is available, then it will check if the input dataclass-like object is an attrs.define-style dataclass.
  3. If yes, then use attrs.asdict to serialize the model instead of dataclasses.asdict
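
The steps above might look roughly like the sketch below. This is not the code merged in this pull request: `to_dict_sketch` and `attrs_available` are hypothetical names, the availability flag is recomputed here with `importlib` instead of being read from jsonargparse._optional, and the pydantic check from step 1 is left out for brevity.

```python
import dataclasses
from importlib.util import find_spec

# Hypothetical stand-in for the availability flag that the PR says is
# already computed in jsonargparse._optional.
attrs_available = find_spec("attrs") is not None


def to_dict_sketch(value):
    """Serialize a dataclass-like instance to a plain dict (illustrative only)."""
    if attrs_available:
        import attrs

        # attrs.has() takes a class, so inspect type(value) rather than value.
        if attrs.has(type(value)):
            # attrs.asdict recurses into nested attrs instances by default.
            return attrs.asdict(value)
    # Fall back to the standard-library path for dataclasses.dataclass objects.
    return dataclasses.asdict(value)


@dataclasses.dataclass
class PlainConfig:
    x: int = 0
    y: int = 1


print(to_dict_sketch(PlainConfig()))
```

Checking for attrs before falling back to `dataclasses.asdict` keeps the existing behavior for standard dataclasses unchanged, and the function still works when attrs is not installed.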

Alternatives

There are several alternatives to implementing this pull request that all fall short in some way.

1. jsonargparse.lazy_instance

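This makes the attrs.define-style dataclass work with jsonargparse and is possibly the simplest alternative, but it does NOT allow constructing a usable Args directly in code when subfield is left at its default. The default is a LazyInstance_SubField wrapper rather than a real SubField, so even the repr fails: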

from attrs import define
from jsonargparse import ArgumentParser, lazy_instance

@define
class SubField:
    x: int = 0
    y: int = 1

@define
class Args:
    a: int
    subfield: SubField = lazy_instance(SubField)

>>> parser = ArgumentParser()
>>> parser.add_argument("--args", type=Args)
>>> parser.parse_args(["--args.a", "2"])
Namespace(args=Namespace(a=2, subfield=Namespace(x=0, y=1)))

>>> Args(a=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<attrs generated repr __main__.Args>", line 13, in __repr__
  File "<attrs generated repr __main__.SubField>", line 13, in __repr__
AttributeError: 'LazyInstance_SubField' object has no attribute 'x'

which is not desired when these dataclasses are not just intermediates between command-line inputs and the rest of the code base.

2. Don't make the subfields have default values

This is similar to the above in that jsonargparse can handle this case and will correctly see that SubField has default values. However, it has the same problem: you cannot create an Args object without passing subfield explicitly. There is a further complication with field order, since required fields must come before fields that have defaults.

from attrs import define
from jsonargparse import ArgumentParser

@define
class SubField:
    x: int = 0
    y: int = 1

@define
class Args:
    a: int
    subfield: SubField 

>>> parser = ArgumentParser()
>>> parser.add_argument("--args", type=Args)
>>> parser.parse_args(["-h"])
usage: [-h] [--args CONFIG] --args.a A [--args.subfield CONFIG] [--args.subfield.x X] [--args.subfield.y Y]

options:
  -h, --help            Show this help message and exit.

Method generated by attrs for class Args:
  --args CONFIG         Path to a configuration file.
  --args.a A            (required, type: int)

Method generated by attrs for class SubField:
  --args.subfield CONFIG
                        Path to a configuration file.
  --args.subfield.x X   (type: int, default: 0)
  --args.subfield.y Y   (type: int, default: 1)

>>> Args(a=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Args.__init__() missing 1 required positional argument: 'subfield'

3. Make use of attrs.field converters

This technically works on the code end for constructing default values and when using jsonargparse.CLI, but the fields on the SubField are not shown in the CLI help page, which is not ideal.

from typing import Optional, TypeVar
from functools import partial
from jsonargparse import ArgumentParser, CLI
from attrs import define, field

T = TypeVar("T")

@define
class SubField:
    a: int = 0
    b: str = "1"

def convert_default_sub_dataclasses(value: Optional[T], cls: type[T]) -> T:
    if value is None:
        return cls()
    return value

@define
class Args:
    c: float
    d: Optional[SubField] = field(default=None, converter=partial(convert_default_sub_dataclasses, cls=SubField))

def main(args: Args):
    print(args)
    return args

>>> code_default = Args(c=1.2)
>>> code_default
Args(c=1.2, d=SubField(a=0, b='1'))

>>> cli_default = CLI(main, args=["--args.c=1.2"])
Args(c=1.2, d=SubField(a=0, b='1'))

>>> code_default == cli_default
True

>>> parser = ArgumentParser()
>>> parser.add_argument("--args", type=Args)
>>> parser.parse_args(["-h"])
usage: [-h] [--args CONFIG] --args.c C [--args.d D]

options:
  -h, --help     Show this help message and exit.

Method generated by attrs for class Args:
  --args CONFIG  Path to a configuration file.
  --args.c C     (required, type: float)
  --args.d D     (type: Optional[SubField], default: null)

4. Don't nest attrs.define-style dataclasses

This would obviously avoid the problem without code changes. However, nested dataclasses.dataclass objects are already handled, and this library should not impose that kind of design decision on users, so it is not a good choice. Further, the required changes are minimal.

Before submitting

  • Did you read the contributing guideline?
  • [n/a] Did you update the documentation? (readme and public docstrings)
  • Did you write unit tests such that there is 100% coverage on related code? (required for bug fixes and new features)
  • Did you verify that new and existing tests pass locally?
  • Did you make sure that all changes preserve backward compatibility?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)
    • I added a section, but I was unsure how this might fit into the indicated changes for v4.35.0 since I only tested with the latest available release v4.34.1

cody-mar10 and others added 5 commits December 11, 2024 19:10
- an edge case that is not handled is when nesting `attrs.define` dataclasses where a subfield is marked with `attrs.field`
  - this would typically happen when using a default factory for this field or with custom validation
- the code fix checks if `attrs.asdict` is both available and appropriate to use for serializing a dataclass-like object

@mauvilsa mauvilsa left a comment

Thank you for contributing! Supporting nested attrs defines certainly makes sense.

I added a few comments. Also, do you know why your pull request did not run the GitHub workflows? I can't really merge if I am unable to see everything run.

Resolved review threads: CHANGELOG.rst (1), jsonargparse_tests/test_dataclass_like.py (5)
@mauvilsa

Figured out why the tests were not being run. GitHub has a new "merge experience", which I had active. Apparently it doesn't have a button to approve runs for new contributors.

codecov bot commented Dec 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (8f22f69) to head (960daf2).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #643   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           22        22           
  Lines         6499      6504    +5     
=========================================
+ Hits          6499      6504    +5     

additionally fixed `test_nested_without_default` to use the correct `attrs` model in the test
@cody-mar10

Hi, thanks for responding and taking a look at this. I believe that I have made the changes you requested

@cody-mar10 cody-mar10 requested a review from mauvilsa December 17, 2024 20:27
@cody-mar10

Also, I am not sure why 2 of the tests are failing, since the error messages aren't that helpful

@mauvilsa

> Also, I am not sure why 2 of the tests are failing, since the error messages aren't that helpful

Don't worry about that. It is not related to your changes.

@mauvilsa mauvilsa left a comment

Thanks again for contributing!

def test_nested_without_default(self, parser):
    parser.add_argument("--data", type=AttrsWithNestedDataclassNoDefault)
    cfg = parser.parse_args(["--data.p1=1.23"])
    assert cfg.data == Namespace(p1=1.23, subfield=Namespace(p1="-", p2=0))

This looks somewhat unexpected. I would have guessed that the parsed value would be Namespace(p1=1.23, subfield=None). But anyway, this is not related to this pull request so I would say to keep it like this for now. Probably this behavior will need to change for #287.

@mauvilsa mauvilsa merged commit 56ab4d6 into omni-us:main Dec 18, 2024
27 of 29 checks passed
@mauvilsa mauvilsa added the bug Something isn't working label Dec 18, 2024