[BREAKING] Multicolumn transformations for GroupedDataFrame #2481
OK, this PR is not yet cleaned up, tested, or documented, but the promised functionality seems to work (remember: not tested 😄), so please feel free to experiment with it and comment if you catch anything surprising. Thank you!
@pdeffebach: with this PR we ensure the following invariant (which I think is relevant for the DataFramesMeta.jl design). The result of:
is always the same as the result of
(errors included: if one errors, the other errors too) if
Wait, why would we want that? What if I want to perform the operation
Right, I forgot to add that
Oh good. You scared me! Yes, having the same with
I still have a lot of tests to write (although the PR should now pass the test suite) and documentation to update. (Everyone else: if you are interested, feel free to have a look at docs/src/man/split_apply_combine.md, as it specifies how the new rules work.)
Co-authored-by: Milan Bouchet-Valat <[email protected]>
@nalimilan: part 2 (and hopefully the final part) of the documentation updates is pushed here. I adjusted the manual according to your comments and refactored the docstrings, so they are now consistent with the manual and reuse the same template. I believe this has three benefits:
Another review of the documentation would be appreciated. Once it is done, I will review the tests of the functionality to make sure we cover everything, and then the PR will be ready for a final review.
I have added more tests checking the correctness of the functionality we expose. The PR should now be ready for a full code review.
This PR is holding the implementation of
EDIT: I managed to get it working without needing this, see #2496.
@nalimilan: I have incorporated all the recommended changes except syncing the manual and the docstrings; I will do that once the manual is finalized. It should be ready for another round of review. Thank you!
is undefined (typically they follow the order of appearance of the
respective values in the grouping columns, but a notable exception is when the
columns are `PooledVector`s, in which case they are ordered according to the `pool`
field in these vectors)
I don't think that's true: in general the order is really undefined due to the dict-like grouping fallback. And the pool of PooledDataArray isn't exposed to users, so it's not really useful to give them this information.
EDIT: I was wrong; for some reason I hadn't realized that the fallback grouping method uses the order of appearance.
I have updated the description to make it more precise.
src/abstractdataframe/selection.jl
(not all keyword arguments are supported in all cases; in general they are allowed
in situations when they are meaningful, see the documentation of the specific functions
for details):
But how about making this part specific to each function's docstring so that we never mention an argument that a function doesn't support?
(BTW I don't see the change to "signature".)
@nalimilan: I have incorporated the comments. Let me know if it is OK to move the descriptions from the manual to the docstrings. Thank you!
TODO: add a NEWS.md entry (once @nalimilan approves moving the manual text into the docstrings).
All suggestions are applied, the manual and docstrings are synchronized, and NEWS.md is updated.
Thank you!
Fixes #2410
I am opening this PR now to make sure that the 0.22 branch of a13db50 does not get lost.
This PR is not finished. I have implemented everything (in particular, allowing functions to return multiple columns) except handling `AsTable` and multiple `Symbol`s as destination columns (this still needs to be implemented, as it requires new logic).
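For readers unfamiliar with the two destination forms mentioned in the PR description, here is a minimal sketch of how they are written, assuming DataFrames.jl 0.22 or later (where this functionality was released). The data frame, the helper `lohi`, and the column names `g`, `x`, `lo`, `hi`, `min_x`, `max_x` are hypothetical, chosen only for illustration.

```julia
using DataFrames

# Hypothetical toy data: two groups keyed by :g
df = DataFrame(g = [1, 1, 2, 2], x = [1, 2, 3, 4])
gdf = groupby(df, :g)

# Hypothetical per-group summary returning a NamedTuple of scalars,
# so each group contributes a single row to the result
lohi(x) = (lo = minimum(x), hi = maximum(x))

# AsTable as destination: the NamedTuple fields become the columns :lo and :hi
res_astable = combine(gdf, :x => lohi => AsTable)

# A vector of Symbols as destination: same values, but the output
# column names are given explicitly instead of taken from the fields
res_named = combine(gdf, :x => lohi => [:min_x, :max_x])
```

With `AsTable` the output names come from the returned value itself, while the vector form lets the caller choose them; in the vector form the number of names has to match the number of columns the function returns.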