Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Traps in extension default behaviour #1587

Closed
wejradford opened this issue Nov 16, 2017 · 5 comments
Closed

Traps in extension default behaviour #1587

wejradford opened this issue Nov 16, 2017 · 5 comments
Labels
docs Documentation and website enhancement Feature requests and improvements

Comments

@wejradford
Copy link

I've found a few traps in how extension default values behave.

Unable to set a default of None

This would be useful to indicate unintialised values (see below), but currently raises an anonymous-looking AssertionError.

import spacy
from spacy.tokens import Doc
nlp = spacy.load('en')
Doc.set_extension('_data', default=None)

Mutable default values trap

The default argument behaves like a standard python default function parameter. One side-effect is that Docs will appear to share extension data if you've inadvertently used something like a dict as a default value.

import spacy
from spacy.tokens import Doc
nlp = spacy.load('en')

# Immutable defaults are fine.
Doc.set_extension('_data', default='')

doc_a = nlp('hello world')
doc_a._._data = 'foo'
assert doc_a._._data == 'foo'

doc_b = nlp('hello kitty')
assert doc_b._._data == ''

# Mutable defaults are a trap.
Doc.set_extension('_data', default={})

doc_a = nlp('hello world')
doc_a._._data['foo'] = 'bar'
assert doc_a._._data == {'foo': 'bar'}

doc_b = nlp('hello kitty')
assert doc_b._._data == {}  # This is actually {'foo': 'bar'}

It's not a massive deal, but could be a bit dangerous in this context.

To fix, it might be worth:

  • Drawing attention to this risk in the already-very-good-docs.
  • Allowing you to pass a factory function that is called to create the default value (like the default_factory parameter in defaultdict?)
Doc.set_extension('_data', default=dict)
# Or 
Doc.set_extension('_data', default=lambda: {'a': '1'})

Your Environment

  • spaCy version: 2.0.0
  • Platform: Darwin-17.2.0-x86_64-i386-64bit
  • Python version: 3.6.3
  • Models: en
@honnibal honnibal added docs Documentation and website enhancement Feature requests and improvements labels Nov 16, 2017
@ines
Copy link
Member

ines commented Nov 16, 2017

Thanks for the detailed report!

While I generally like the factory solution, I do think it'd kinda ruin the simplicity of setting the default keyword argument. So maybe adding a note to the docs would be the best solution, at least for now. I think a good policy would be to only recommend the default argument for immutable defaults, and a getter and setter for dictionaries, lists etc. (It's should probably also be considered an antipattern to have an attribute take different types – e.g. a string and a list.)

If a user does want to do it differently, they obviously still can – they just have to be careful. It's a pretty common Python gotcha after all.

@wejradford
Copy link
Author

No worries @ines. I agree simplicity is probably best in this case and the docs will help clarify it.

@ines ines closed this as completed in c3051e9 Nov 17, 2017
@wejradford
Copy link
Author

Awesome, thanks @ines.

@tokestermw
Copy link
Contributor

Doc.set_extension('_data', default=None)

Looks like the assertion error happens when we only have one non-None value.

assert nr_defined == 1

So the example in the docs will not work (though Spans don't have the same restriction). https://spacy.io/usage/processing-pipelines#custom-components-attributes

Doc.set_extension('hello', getter=get_hello_value, setter=set_hello_value)

I have a use case where I'd like to keep a list and a default=None would work best.

def __init__(self):
    Doc.set_extension('data', default=None)

...

def __call__(self, doc):
    doc._._data = []

...

doc._._data.append(5)

Would this be possible with getters and setters?

@lock
Copy link

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 8, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
docs Documentation and website enhancement Feature requests and improvements
Projects
None yet
Development

No branches or pull requests

4 participants