Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JOSS review - Functionality documentation #14

Closed
craig-willis opened this issue Sep 22, 2022 · 2 comments
Closed

JOSS review - Functionality documentation #14

craig-willis opened this issue Sep 22, 2022 · 2 comments

Comments

@craig-willis
Copy link

craig-willis commented Sep 22, 2022

Review issue: openjournals/joss-reviews#4707
Branch reviewed: release-0.2.4

Problem
It's not immediately clear to me from the documentation which dplyr and tidyr operations are supported by dtrackr or why some are missing. This can be determined only by comparing the list of p_* functions and inspecting the source code comments.

I'm not a regular user of dplyr, but based on comparison to https://dplyr.tidyverse.org/reference/ see the following operations are not supported by dtrackr: count, tally, glimpse, group*, pull, summarize, slice*, bind_cols, union_all, nest_join, rowwise.

Code comments indicate count / tally - not included as summarisation step, not generally part of a dplyr pipeline and group_map/group_walk -- not included as does not return DF.

The pivot_wider and pivot_longer appear to be the only operations supported from tidyr.

As a user, it would be helpful to have a quick reference of supported operations and explanations of why certain operations are/are not included. This might also be an opportunity to highlight the relationship between dplyr::filter and exclude_all, since it's not strictly wrapping the dplyr operation.

@robchallen
Copy link
Contributor

robchallen commented Oct 4, 2022

Updated branch: joss-fixes-0.2.4.9000 (and main)

So I should have addressed this issue first.

As you identified, there were some gaps in functionality support. These were partly pragmatic and partly due to me not using those functions very much. It was also due to the implementation being achieved by sub-classing data.frame (as a trackr_df) and the wrapper functions as being S3 methods for the subclass. This means that mixing the native dplyr and dtrackr functions should work together and can be switched back and forth, and in theory implementing all the dplyr methods is not strictly required.

Anyway in the light of this comment I decided to implement the missing functions in dplyr. This triggered me to also address some of the testing issues and improve test coverage (see #13). I have now included dtrackr functions for slice*, bind_cols, union_all, nest_join.

For documentation of the supported reference I have updated the function reference page to make it more logically organised and providing some detail about the rationale for the scope. This includes an explanation about what other functions are out of scope, and not yet in scope, compared to dplyr.

I updated the vignette and the documentation of exclude_all. There is now a much better description of filter, exclude_all and include_any in the section on filtering in the vignettes

@robchallen
Copy link
Contributor

I don't think there is anything outstanding on this issue so closing for now. Please reopen if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants