Replace dplyr use with data.table (or vice versa) #246

Comments
I'm a data.table native, so happy to be tagged on discussions of "how do I ...", to review PRs aimed in this direction, etc.
I just checked out the dependencies. That being said, I strongly think we want only one design language when it comes to data munging, and as everywhere else that is data.table. We could discuss porting the whole package to data.table.
As well as in #242 (comment), i.e. `prior <- dplyr::full_join(...)`, I've also been using dplyr elsewhere in the package.
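A minimal sketch of the join pattern referenced above, with a rough data.table counterpart for comparison. The table and column names here are hypothetical, not code from the package:

```r
library(dplyr)

# Hypothetical prior tables keyed by parameter name (illustrative only)
default_priors <- data.frame(
  par = c("mu", "sigma"),
  prior = c("normal(0, 1)", "normal(0, 0.5)")
)
user_priors <- data.frame(par = "mu", prior = "normal(0, 5)")

# full_join keeps rows from both tables, matched on `par`
prior <- dplyr::full_join(
  user_priors, default_priors,
  by = "par", suffix = c("", ".default")
)

# A rough data.table counterpart: merge() with all = TRUE
# prior_dt <- merge(
#   data.table::as.data.table(user_priors),
#   data.table::as.data.table(default_priors),
#   by = "par", all = TRUE, suffixes = c("", ".default")
# )
```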
I am happy to either try to learn data.table properly or continue with dplyr. I don't have a great sense of the other aspects of this choice: I've not maintained packages using data.table before. If there aren't objections, I am personally happy enough with dplyr.
This pushes more towards dplyr.
Yes, that and stability. Having said that, @pearsonca, thoughts? FYI I see this as needing to be resolved ahead of the first release.
I'm generally anti-tidyverse. Too tight-knit: pretend modular, not actually modular, though I'll agree they are doing better at coupling (and that, as a whole system, it does fine; I just prefer less vendor lock-in).

For a moment, considering the false-dichotomy problem: how fancy is the munging anyway? Would it just be fine in base R? A cursory glance suggests that the only in-code uses are join, filter, and select; that could be done approximately with base R.

Leaving it in the vignette is fine, probably desirable for the audience, and it just becomes a "Suggests", which is better posture-wise (even if there is a transitive dependency due to other packages; maybe they'll get better at some future time).
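For reference, the three operations named above (join, filter, select) have rough base-R counterparts. This is a sketch on made-up data, not code from the package:

```r
# Toy data, purely illustrative
obs <- data.frame(id = 1:5, delay = c(2, 4, 1, 7, 3))
lookup <- data.frame(id = c(1, 3, 5), group = c("a", "b", "a"))

# join -> merge() (all.x = TRUE gives a left join)
joined <- merge(obs, lookup, by = "id", all.x = TRUE)

# filter -> logical row subsetting (or subset())
kept <- joined[joined$delay > 2, ]

# select -> column indexing by name
kept <- kept[, c("id", "delay")]
```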
I think those are reasonable points @pearsonca, but I think the missing piece is which system is going to be easiest to maintain for the active contributors. Historically that has been data.table.

The counterpoint that is strongest to me is that most of the code is data.table, and it's performant, easy to maintain, and fairly readable. To me the base-R argument doesn't really have that.
I generally use data.table or base R myself. I think there's a balance to be struck between easiest-for-active-maintainers and other engineering objectives (e.g. easiest for new maintainers, which might also cut towards the tidyverse, or reducing direct vs encapsulated dependencies).
* Remove data.table from simulate.R
* Remove data.table from postprocess.R
* Remove data.table from observe.R aside from filter_obs_by_ptime
* Remove data.table from roxygen in latent_individual.R
* Rewrite as_latent_individual to use dplyr
* Rebase
* Use dplyr to subsample
* Using dplyr in epidist vignette
* Need to round here!
* Use dplyr in epidist_diagnostics
* Use dplyr in filter_obs_by_ptime
* Altering creation of latent_individual to correctly work with dplyr (and uncovering bugs here that existed before)
* Working through getting tests to pass
* Two fixes to epidist vignette
* Update FAQ vignette away from dt
* Update approximate inference vignette
* Remove call to arrange which creates bug (add issue for this)
* Remove data.table from Imports
* Rebase
* Rebase
* Lint
* Fix to logo and improve a little
* data.frame not data.table
* Hexsticker fixes
* Don't library data.table
* Removing final uses of data.table using find in files...
* Import runif
* Remove another mention of data.table
* Use dplyr in cmdstan check
* Need to use , with data.frame
* Fix to index being a factor issue
* Perhaps this is needed?
* Typo
* Regenerate globals and namespace
* Remove excess imports in line with R packages (2e) recommendations
* Need higher version of R for data importing
* Missed some stats:: qualifiers
* Remove old test code
* Somethings break when I don't import all of brms (because I'm not importing Stan). I think the saved data
* Run document
* Skip tests with fit on CRAN
* Revert import all of brms
* Resize logo
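One of the commits above ("Use dplyr to subsample") corresponds to a common rewrite. A generic sketch with made-up data, not the package's actual code:

```r
library(dplyr)

set.seed(1)
df <- data.frame(case = 1:100, ptime = runif(100, 0, 25))

# data.table-style subsample, for comparison:
# dt <- data.table::as.data.table(df)
# sub <- dt[sample(.N, 10)]

# dplyr equivalent: sample 10 rows without replacement
sub <- dplyr::slice_sample(df, n = 10)
```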
I am not a data.table native and so when under pressure default to using dplyr. I think this is fine to get things done for the first release of the package, but after that I think I should spend some time more properly learning data.table and going back over the places I've used dplyr in this package and replacing them. Reasons to do this? It's faster, I think. And perhaps it's one less dependency. Arguably it's not a good use of my time to do this, but learning data.table seems useful anyway.
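To make the trade-off concrete, here is the same filter-and-summarise computation in both idioms (illustrative column names, not from this package):

```r
library(dplyr)
library(data.table)

set.seed(1)
df <- data.frame(
  group = rep(c("a", "b"), each = 50),
  delay = rexp(100, rate = 0.2)
)

# dplyr version
out_dplyr <- df |>
  filter(delay < 20) |>
  group_by(group) |>
  summarise(mean_delay = mean(delay), .groups = "drop")

# data.table version of the same computation
dt <- as.data.table(df)
out_dt <- dt[delay < 20, .(mean_delay = mean(delay)), by = group]
```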