-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-134 Cannot encode pandas NA objects #118
Conversation
bindings/python/pymongoarrow/api.py
Outdated
"""A custom type codec for Pandas NA objects.""" | ||
|
||
python_type = NA.__class__ # type:ignore[assignment] | ||
bson_type = None # type:ignore[assignment] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, why the type ignores?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My editor had flagged them because the base class uses read-only properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see mypy does not raise this issue. This does look like a bug in pyright though:
/Users/shane/git/mongo-python-driver/ignore.py:15:19 - error: Expression of type "python_type" cannot be assigned to declared type "property"
"Type[type]" is incompatible with "Type[property]" (reportGeneralTypeIssues)
Looks like microsoft/pyright#2601 which they closed as won't fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that pyright issue has @classmethod
whereas we just use @property
so I think it would be worth opening a new pyright issue for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codec_options = collection.codec_options | ||
if DataFrame is not None: | ||
type_registry = TypeRegistry([_PandasNACodec()]) | ||
codec_options = codec_options.with_options(type_registry=type_registry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's ideal to completely replace the collection's type_registry. What if the app already configured a type_registry for other types they want to encode?
Would it be possible to keep the existing registry but add the _PandasNACodec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but we'd have to use private APIs from TypeRegistry
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah my mistake for assuming TypeRegistry would have the ability to add/edit a type. You know, like a registry. Could you open an ARROW ticket for this and backlog it? I'd say it's low priority.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codec_options = collection.codec_options | ||
if DataFrame is not None: | ||
type_registry = TypeRegistry([_PandasNACodec()]) | ||
codec_options = codec_options.with_options(type_registry=type_registry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah my mistake for assuming TypeRegistry would have the ability to add/edit a type. You know, like a registry. Could you open an ARROW ticket for this and backlog it? I'd say it's low priority.
bindings/python/pymongoarrow/api.py
Outdated
"""A custom type codec for Pandas NA objects.""" | ||
|
||
python_type = NA.__class__ # type:ignore[assignment] | ||
bson_type = None # type:ignore[assignment] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see mypy does not raise this issue. This does look like a bug in pyright though:
/Users/shane/git/mongo-python-driver/ignore.py:15:19 - error: Expression of type "python_type" cannot be assigned to declared type "property"
"Type[type]" is incompatible with "Type[property]" (reportGeneralTypeIssues)
Looks like microsoft/pyright#2601 which they closed as won't fix.
bindings/python/pymongoarrow/api.py
Outdated
"""A custom type codec for Pandas NA objects.""" | ||
|
||
python_type = NA.__class__ # type:ignore[assignment] | ||
bson_type = None # type:ignore[assignment] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that pyright issue has @classmethod
whereas we just use @property
so I think it would be worth opening a new pyright issue for it.
bindings/python/test/test_pandas.py
Outdated
val = outgoing[name] | ||
if str(val.dtype) in ["object", "float64"]: | ||
val = val.astype(col.dtype) | ||
pd.testing.assert_series_equal(col, val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is using "val" as the name of a column idomatic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
bindings/python/test/test_pandas.py
Outdated
for name in incoming.columns: | ||
col = incoming[name] | ||
val = outgoing[name] | ||
if str(val.dtype) in ["object", "float64"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment here or a method docstring would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return datetime.datetime(1970, 1, 1) | ||
return k.replace(tzinfo=None) | ||
|
||
data["datetime"] = data.apply(lambda row: new_replace(row["datetime"]), axis=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That pandas ticket seems to be about .time() not working on NA types but we don't use .time() anywhere. Can you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, meant to link to pandas-dev/pandas#16248. Also added context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still confused since this code is changing the data before passing it to any pymongoarrow methods. What happens if the user actually calls write() with a NaT datetime? And why are we clamping the datetime to datetime.datetime(1970, 1, 1)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They would see the same error as https://jira.mongodb.org/browse/FREE-165786, for which this is a workaround. I just chose a random valid date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, 2 follow up questions.
- Can we change this to something like?:
if isinstance(k, Nat):
return None
return k
- this looks like something we should ideally be working around in write() itself. Can you open a ticket for making write() encode NaT datetimes as BSON null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we return None
we get the original error, so I had to return the workaround datetime object. I filed https://jira.mongodb.org/browse/ARROW-136.
No description provided.