Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace setattr and getattr calls with __dict__ manipulations #905

Merged
merged 8 commits into from
Nov 15, 2023

Conversation

Numeri
Copy link
Contributor

@Numeri Numeri commented Nov 8, 2023

Describe the changes

Change the setattr and getattr calls in pyfakefs/mox3_stubout.py to directly access the objects __dict__ attribute. I did this because I was having trouble using pyfakefs in a large codebase that also used PyTorch – which I tracked down to custom __setattr__ methods found here (among other places).

I saw the comment in pyfakefs/mox3_stubout.py mentioning that a switch to manipulating the __dict__ attribute directly had been considered, and that it would even resolve the special case with static methods losing their staticmethod wrapper when accessing them with getattr, so I went ahead and made that change throughout the file.

I'm not sure if this will break pyfakefs for other use cases, but it fixed it for me, so I figured I'd make a pull request :)

Tasks

  • Unit tests added that reproduce the issue or prove feature is working
  • Fix or feature added
  • Entry to release notes added
  • Pre-commit CI shows no errors
  • Unit tests passing
  • For documentation changes: The Read the Docs preview builds and looks as expected

@mrbean-bremen
Copy link
Member

Sorry - looks like I accidentally removed myself from the watchers for this repo (or have been removed somehow), so I missed this.

I'm not quite sure about this - this part has been taken from the mox project...
getattr() finds attributes that are not found by __dict__ (like attributes from __slots__, or attributes defines by the descriptor protocol), but I'm unsure if we need to patch these. Probably not, at least so far I haven't seen a usage for that.

So I'm inclined to merge this after the failing test is fixed. Also please add an entry to the release notes.

@Numeri
Copy link
Contributor Author

Numeri commented Nov 13, 2023

@mrbean-bremen I could always define separate get_attribute_safely and set_attribute_safely methods, that look something like:

class AccessMethod(Enum):
    GETATTR = auto()
    DICT = auto()
    SLOT = auto()


def get_attribute_safely(obj, attr_name):
    try:
        attr_value = getattr(obj, attr_name)
        access_method = AccessMethod.GETATTR
    except AttributeError as e:
        if hasattr(obj, '__dict__') and attr_name in obj.__dict__:
            attr_value = obj.__dict__[attr_name]
            access_method = AccessMethod.DICT
        elif hasattr(obj, '__slot__') and attr_name in obj.__slot__:
            attr_value = obj.__slot__[attr_name]
            access_method = AccessMethod.SLOT
        else:
            raise e

    return attr_value, access_method

allowing us to store the access_method alongside the old value in self.stubs, so that it can be restored using the same method, using a corresponding set_attribute_safely(obj, attr_name, attr_value, access_method).

I think that would cover all possible edge cases and be quite safe, unless I'm missing something. On the other hand, if you think that the current approach is safe enough, that works, too.

@mrbean-bremen
Copy link
Member

I could always define separate get_attribute_safely and set_attribute_safely

I like this. It may be a bit overkill, but it would not change the current behavior and performance in almost all cases, and as you wrote, we'd be on the secure side.

@mrbean-bremen
Copy link
Member

mrbean-bremen commented Nov 14, 2023

Ok, I thought a bit more about this, and I think that your original approach might be sufficient, given that we are looking for specific file system functions that are certainly not slots. It may also have a performance benefit (though I still have to measure that).

So, please ignore my last comment - if you add an entry to the release notes mentioning the change and what it fixes, I will merge this as is, thanks!

@Numeri
Copy link
Contributor Author

Numeri commented Nov 14, 2023

Woops, missed your comment there. Everything should be reverted now. This is probably for the best anyway – I was working on writing unit tests for all the __slots__ stuff, and it was becoming clear that there were a lot more corner cases to think of than I was originally aware of.

Copy link
Member

@mrbean-bremen mrbean-bremen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good (apart from that typo that pre-commit found)!

@Numeri
Copy link
Contributor Author

Numeri commented Nov 15, 2023

Ahh, "compatibility" always gets me. Fixed!

@mrbean-bremen mrbean-bremen merged commit 1d52e3f into pytest-dev:main Nov 15, 2023
63 checks passed
@davidlbaird
Copy link
Collaborator

Sorry I didn't reply earlier. Why is setting a default to getattr not working instead of examining the object internals?
https://docs.python.org/3/library/functions.html#getattr

@davidlbaird
Copy link
Collaborator

davidlbaird commented Nov 15, 2023

Or, calling hasattr seems like a better approach, rather than looking at the internals of __dict__.
https://docs.python.org/3/library/functions.html#hasattr

@mrbean-bremen
Copy link
Member

Why is setting a default to getattr not working instead of examining the object internals?

It is usually working, but there are cases where it is overwritten and may cause problems/unintended side effects (in this case, with pytorch), thus this PR. My first thought was about using it as the default as before (see my first response), but I changed my mind after thinking about it a bit more. We actually don't want the extra functionality that getattr/setattr have compared with a simple __dict__ lookup, as we neither want to look at slots, or at class attributes, or at custom functionality, and we don't want to trigger any additional behavior while setting the property back. Also while __dict__ may be an internal, it is well documented and relied upon, so there is no way it will change in the foreseable future. Obviously the creators of the original mox had similar thoughts, thus the (now removed) todo comment about using __dict__ access.

One unintended consequence is also a noticable performance improvement, at least in Python versions before 3.12 (in 3.12 the performance of getattr / setattr seem to have improved a lot, so the change is not noticable there).

As always, I may have overlooked something, of course - crossing fingers that no regression bugs are incoming after the release...

@davidlbaird
Copy link
Collaborator

davidlbaird commented Nov 17, 2023

mox itself advises against using it, and mox3 support ends at Python 3.8. Once that version of Python is deprecated, I hope this mox3_stubout.py module can also be removed. Is pytorch compatible with mock?

@mrbean-bremen
Copy link
Member

mrbean-bremen commented Nov 17, 2023

mox itself advises against using it

Ok, didn'y know that.

mox3 support ends at Python 3.8

True, though I'm not sure if mock will now work. My last attempt to replace mox3 by mock (before we integrated the relevant part of mox3) failed, and I haven't looked at it since then. So no, I don't know if the problem is since gone.
Something to investigate, I guess...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants