DataFrame.groupby doesn't preserve _metadata #29442
Comments
FYI, this is likely fixed as part of #28394, where I'm calling __finalize__ in more places. |
Ok. I'll check it out. Are you saying that
SubclassedDataFrame.groupby.sum would now return a SubclassedDataFrame
instead of a regular DataFrame? That's the only way I could see the metadata
being preserved.
|
Fyi, this is still an issue on pandas 1.0.3. @TomAugspurger I think there is a relatively quick fix that I'm using in a personal project... in DataFrameGroupBy, you have access to self.obj. So you can write a class decorator to implement the following logic at the end of every method in DataFrameGroupBy:
    if isinstance(method_result, DataFrame) and issubclass(self.obj.__class__, DataFrame):
        return self.obj.__class__(method_result)
    elif isinstance(method_result, Series) and issubclass(self.obj.__class__, Series):
        return self.obj.__class__(method_result)
etc. |
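The decorator idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `recast_to_caller_type`, `TaggedFrame` and `GroupedProxy` are hypothetical names, and `GroupedProxy` is a small stand-in used here instead of patching pandas' own `DataFrameGroupBy`.

```python
import functools

import pandas as pd


def recast_to_caller_type(cls):
    """Class decorator: re-wrap DataFrame/Series results in type(self.obj)."""

    def wrap(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            result = method(self, *args, **kwargs)
            obj_type = type(self.obj)
            if isinstance(result, pd.DataFrame) and issubclass(obj_type, pd.DataFrame):
                return obj_type(result)
            if isinstance(result, pd.Series) and issubclass(obj_type, pd.Series):
                return obj_type(result)
            return result  # anything else is passed through untouched

        return wrapper

    # Wrap only the public methods defined directly on the decorated class.
    for name, attr in list(vars(cls).items()):
        if callable(attr) and not name.startswith("_"):
            setattr(cls, name, wrap(attr))
    return cls


class TaggedFrame(pd.DataFrame):  # hypothetical DataFrame subclass
    @property
    def _constructor(self):
        return TaggedFrame


@recast_to_caller_type
class GroupedProxy:  # hypothetical stand-in for DataFrameGroupBy
    def __init__(self, obj, by):
        self.obj = obj
        # Group a plain DataFrame so the undecorated result type is not the subclass.
        self._grouped = pd.DataFrame(obj).groupby(by)

    def sum(self):
        return self._grouped.sum()

    def size(self):
        return self._grouped.size()


df = TaggedFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})
proxy = GroupedProxy(df, "key")
total = proxy.sum()
print(type(total).__name__)
```

Note that re-wrapping like this restores the type only; the values of any `_metadata` attributes are not copied by the constructor call, so they would still need to be carried over separately.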
The PR I linked to earlier has stalled a bit. It'd be best to address
calling groupby in a dedicated pull request if you're interested.
|
Sure. I’d be happy to. But I’d like to make sure that my solution is
acceptable at a high level before I start coding. Some might consider the
decorator approach to be somewhat of a hack. Alternatively, I could
manually change every “return DataFrame” into “return self.obj.__class__”.
Please let me know what you think.
|
I'm not sure offhand.
|
This bug is a regression in v1.1.0 and was introduced by the fix for GH-34214 in commit [6f065b]. The underlying cause is that the `*Splitter` classes do not use the `._constructor` property and do not call `__finalize__`. Please note that the method name used for `__finalize__` calls was my best guess, since documentation for the value has been hard to find.
[6f065b]: pandas-dev@6f065b6
heads up: moved off 1.1.1 milestone |
Added testcase `test_groupby_sum_with_custom_metadata` for the functionality exercised in GH-29442. The testcase fails on current code.
In order to propagate metadata fields, the `__finalize__` method must be called on the resulting DataFrame with a reference to the input. By implementing this in `_GroupBy._agg_general`, this is performed as late as possible for the `.sum()` (and similar) code-paths. Fixes GH-29442
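To illustrate what calling `__finalize__` "with a reference to input" does, here is a minimal user-side sketch. It is not the change made in `_GroupBy._agg_general`; `finalize_like` is a hypothetical helper, and the class, column names and values are assumptions chosen for the example.

```python
import pandas as pd


class SubclassedDataFrame(pd.DataFrame):
    # Names listed in _metadata are the attributes __finalize__ copies over.
    _metadata = ["added_property"]

    @property
    def _constructor(self):
        return SubclassedDataFrame


def finalize_like(result, source):
    """Hypothetical helper: re-wrap `result` in source's type, then copy metadata."""
    return type(source)(result).__finalize__(source)


df = SubclassedDataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})
df.added_property = "hello"

# Whatever type groupby returns, the helper yields the subclass with metadata set.
summed = finalize_like(df.groupby("key").sum(), df)
print(type(summed).__name__, summed.added_property)
```

Doing the same inside the aggregation path, as the commit message describes, means callers would not need a helper like this.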
moved off 1.1.2 milestone (imminent release) as associated PR #35688 not yet ready. |
…DataFrame.groupby doesn't preserve _metadata
…esn't preserve _metadata (#37122) Co-authored-by: Janus <[email protected]>
I'm following the 0.25.3 documentation on using _metadata.
The above output produces the result:
AttributeError: 'DataFrame' object has no attribute 'added_property'
added_property is not being "passed to manipulation results" as described in the documentation.
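A minimal sketch of the kind of snippet that shows this behaviour, modelled on the subclassing pattern in the documentation. It is not the code from the original report; the column names and the property value are assumptions, and `hasattr` is used so the script runs instead of raising.

```python
import pandas as pd


class SubclassedDataFrame(pd.DataFrame):
    # Attributes named here are meant to be passed on to manipulation results.
    _metadata = ["added_property"]

    @property
    def _constructor(self):
        return SubclassedDataFrame


df = SubclassedDataFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})
df.added_property = "property"

result = df.groupby("key").sum()
# Report the result type and whether the attribute survived the groupby.
print(type(result).__name__, hasattr(result, "added_property"))
```

On an affected version the attribute is missing from the result, which is what leads to the AttributeError quoted above when it is accessed directly.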