Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-121141: add support for copy.replace to AST nodes #121162

Merged
merged 25 commits into from
Jul 4, 2024

Conversation

picnixz
Copy link
Contributor

@picnixz picnixz commented Jun 29, 2024

cc @JelleZijlstra @serhiy-storchaka

I eventually had enough time for this one, but I had a question on the implementation and efficiency. Assume that I have the following:

node = ast.Name(id='a')
node.foo = 1
node.bar = 2

I cannot do ast.Name(id='a', foo=...) in 3.15 (it's deprecated) and this forbids me to call the constructor with the keyword arguments passed to copy.replace. So, copy.replace(node, id='b', foo=0) with kwargs = {'id': '5', 'foo': 0} is implemented as follows:

  1. Create node2 = ast.Name(id='b', ctx=node.ctx) (this pops 'id' from kwargs, or more generally, all fields known to the AST class, but not those extra fields).
  2. Do node2.__dict__ |= kwargs (i.e., you make the changes on the extra fields).
  3. Now node2.id == 'b' and node2.foo = 0 but node2 has no attribute bar yet because the constructor did not carry this information.
  4. So, I need again to do something like PyDict_Merge(node2.__dict__, node.__dict__, 0) so that all remaining values from node.__dict__ that were neither obtained in 1 or 2 are not set.

Ideally, I'd like to create a function that can add me whatever attributes I want without warning me but I'm not sure of the refactoring. So for now, I left that logic separate. There is a separate routine that checks whether you can replace a field or not (e.g., copy.replace(node, unknown=1) would fail because the node has no attributes named unknown for now).


I also fixed ast.compare because I may lack attributes or fields, and comparing two objects that lack attributes or fields would still be correct IMO. If you want me to make another PR for that, I can cherry pick the commit (it's the first one).


EDIT: simple wording question but should it be "add support for [...] to AST nodes" or "for AST nodes"? (or whatever else?)


📚 Documentation preview 📚: https://cpython-previews--121162.org.readthedocs.build/

@serhiy-storchaka
Copy link
Member

I think that it should only accept fields known to the AST class.

@picnixz
Copy link
Contributor Author

picnixz commented Jun 29, 2024

I think that it should only accept fields known to the AST class.

I can live with that assumption but I need to update the docs then. However, what about constructions that add parent's back references. I would expect such nodes to be able to copy themselves but only change one single attribute (like a sibling) so that's why I only allowed the fields in __dict__ to be mutated.

By the way, is there something I should have taken care of because of singletons such as ast.Load? Now that I see my tests, I'm not sure if a special treatement is needed.

@serhiy-storchaka
Copy link
Member

AFAIK, all other __replace__ implementations only allow to change the fixed set of fields, not arbitrary instance attributes.

@picnixz
Copy link
Contributor Author

picnixz commented Jun 29, 2024

Actually, I assumed that AST nodes were somewhat similar to simple namespaces, which allows for arbitrary addition I think:

if (kwargs) {
if (PyDict_Update(((_PyNamespaceObject*)result)->ns_dict, kwargs) < 0) {
Py_DECREF(result);
return NULL;
}
}

But if you think we should restrict to fields known to the class, I can easily remove the code.

@JelleZijlstra
Copy link
Member

I feel AST nodes are more like dataclasses. For dataclasses, unrecognized arguments are rejected by copy.replace and non-field attributes on the original object are lost.

>>> from dataclasses import dataclass
>>> 
>>> @dataclass
... class A:
...     a: int
...     
>>> from copy import replace
>>> a = A(1)
>>> a.b = 42
>>> a2 = replace(a, a=2)
>>> a2
A(a=2)
>>> a2.b
Traceback (most recent call last):
  File "<python-input-8>", line 1, in <module>
    a2.b
AttributeError: 'A' object has no attribute 'b'
>>> a.b
42
>>> replace(a, b=5)
Traceback (most recent call last):
  File "<python-input-10>", line 1, in <module>
    replace(a, b=5)
    ~~~~~~~^^^^^^^^
  File "/Users/jelle/py/cpython/Lib/copy.py", line 294, in replace
    return func(obj, **changes)
  File "/Users/jelle/py/cpython/Lib/dataclasses.py", line 1630, in _replace
    return self.__class__(**changes)
           ~~~~~~~~~~~~~~^^^^^^^^^^^
TypeError: A.__init__() got an unexpected keyword argument 'b'

@picnixz
Copy link
Contributor Author

picnixz commented Jun 29, 2024

Just after I sent my comment, I actually thought that maybe I should reject them since we anyway reject arbitrary keyword arguments in the constructor, maybe it's indeed closer to dataclasses.

I convinced myself as well and since we reached a consensus, I'll remove this. I'll extend the validate function (good thing I did this).

@serhiy-storchaka
Copy link
Member

SimpleNamespace constructor accepts arbitrary keyword arguments, so __replace__() does the same. All names are valid for SimpleNamespace, but other objects are more picky.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the implementation can be simpler.

  • Get fields (PyObject_GetOptionalAttr((PyObject*)Py_TYPE(self), state->_fields, &fields)).
  • Iterate them, and get corresponding values (PyDict_GetItemRef(dict, name, &value)).
  • Add corresponding name-value pair to the new dictionary.
  • Update that dictionary with the keyword arguments dictionary.
  • Create the object of the same type by passing a new dictionary as keyword arguments.

Validation is performed in the constructor.

Equivalent Python code:

def __replace__(self, /, **kwargs):
    newkwargs = {}
    for name in self._fields:
        value = getattr(self, name)
        newkwargs[name] = value
    newkwargs.update(kwargs)
    return type(self)(**kwargs)

@picnixz
Copy link
Contributor Author

picnixz commented Jun 30, 2024

I think that the implementation can be simpler.

Yes, that's what I decided after our discussion from yesterday. Currently the construction only warns about extra keyword arguments. For now, I'll leave that warning but this also means that the fields are replaced, even though the behaviour would disappear. In particular, in 3.15 and later, the tests should be changed, so I added some comments but is there a more rigourous way?

Parser/asdl_c.py Outdated Show resolved Hide resolved
Parser/asdl_c.py Outdated Show resolved Hide resolved
Parser/asdl_c.py Outdated Show resolved Hide resolved
Parser/asdl_c.py Outdated Show resolved Hide resolved
Parser/asdl_c.py Outdated Show resolved Hide resolved
Parser/asdl_c.py Outdated Show resolved Hide resolved
Lib/ast.py Outdated Show resolved Hide resolved
Parser/asdl_c.py Outdated Show resolved Hide resolved
Parser/asdl_c.py Outdated Show resolved Hide resolved
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

You even made the changes I thought about but didn't write so as not to be too picky. We think alike.

@picnixz
Copy link
Contributor Author

picnixz commented Jul 2, 2024

I added two commits in which the optional attributes are checked. I have other questions as well and I'm not sure if we should first patch them before continuing this.

  • The allowed attributes are hard-coded and thus even if I specify them in sub-classes, I get a warning. Is it fine?

    >>> import ast
    >>> class A(ast.AST):
    ...     _attributes = ('a',)
    ...     a = None
    ...
    >>> A(a=1)
    <python-input-6>:1: DeprecationWarning: A.__init__ got an unexpected keyword argument 'a'. Support for arbitrary keyword arguments is deprecated and will be removed in Python 3.15.
      A(a=1)
    <__main__.A object at 0x7f021ddb08d0>
  • Because of this behaviour, I still have a warning being emitted. However, copy.replace correctly raises exceptions whenever a field that is not declared explicitly on the class is missing or not recognized.

  • Because attributes are sometimes optional and sometimes not, I do not have a way to check whether they should be supplied or not easily. I have a little trick where I try to find the expected attributes on the current instance, and if they exist, I assume that they do not need to be supplied again. In addition, it appears that "required attributes" cannot be deleted:

    >>> ast.Expr(1).end_lineno is None
    True
    >>> del ast.Expr(1).end_lineno
    Traceback (most recent call last):
    File "<python-input-11>", line 1, in <module>
      del ast.Expr(1).end_lineno
          ^^^^^^^^^^^^^^^^^^^^^^
    AttributeError: 'Expr' object has no attribute 'end_lineno'
  • Finally, is there a way to simplify my code? I don't know all the functions that exist, nor do I know all the _Py* functions so I'd be happy if you could tell me if there's a shortcut function for ', '.join(map(repr, sorted(x))) ....

Copy link
Contributor Author

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with this implementation we raise exceptions whenever it is needed (except at one point where a warning is emitted, but I think it's a separate issue; see my previous comment).

Anyway I hope that I have made the requested changes; please review again.

Parser/asdl_c.py Outdated Show resolved Hide resolved
Parser/asdl_c.py Outdated Show resolved Hide resolved
Parser/asdl_c.py Show resolved Hide resolved
@picnixz
Copy link
Contributor Author

picnixz commented Jul 3, 2024

Ah the bot did not catch the fact that I said that I was done with the changes.

I have made the requested changes; please review again.

@bedevere-app
Copy link

bedevere-app bot commented Jul 3, 2024

Thanks for making the requested changes!

@serhiy-storchaka, @JelleZijlstra: please review the changes made to this pull request.

Parser/asdl_c.py Outdated Show resolved Hide resolved
Parser/asdl_c.py Outdated Show resolved Hide resolved
Parser/asdl_c.py Outdated Show resolved Hide resolved
Parser/asdl_c.py Show resolved Hide resolved
Parser/asdl_c.py Outdated Show resolved Hide resolved
Parser/asdl_c.py Show resolved Hide resolved
Parser/asdl_c.py Outdated Show resolved Hide resolved
@JelleZijlstra JelleZijlstra added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Jul 3, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @JelleZijlstra for commit bddcb97 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Jul 3, 2024
@picnixz
Copy link
Contributor Author

picnixz commented Jul 3, 2024

(I hope I didn't leak things. The previous implementation didn't but maybe this one does). I'll be leaving for today, my defense is tomorrow so I'll patch if afterwards if needed.

@JelleZijlstra
Copy link
Member

Just running the refleak bots because it's good practice to do that when there's a big pile of new C code.

Good luck with your defense!

@JelleZijlstra JelleZijlstra merged commit 9728ead into python:main Jul 4, 2024
52 of 54 checks passed
@picnixz picnixz deleted the ast-copy-replace branch July 4, 2024 07:00
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants