Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compatibility with dplyr 1.1.0 #30

Merged
merged 4 commits into from
Dec 19, 2022
Merged

Conversation

lionel-
Copy link
Contributor

@lionel- lionel- commented Dec 14, 2022

In dplyr 1.1.0 the join functions are now stricter with unknown arguments instead of ignoring them. This PR fixes an issue where suffix would be supplied to join functions that don't have this argument, causing an error.

I've also fixed a couple other issues related to tidyselect and I set multiple = "all" in join calls to silence a new warning about multiple matches. That might need adjustment if that's not the expected behaviour.

We plan to release dplyr on January 27. If possible, a pre-emptive fix release would be helpful to our release process. Thanks!

@robchallen
Copy link
Contributor

Thanks for the PR and I'll take a look. I'm supposed to be submitting a new version anyway so will wrap the changes into it.

@robchallen robchallen merged commit 85a61ac into terminological:main Dec 19, 2022
@robchallen
Copy link
Contributor

robchallen commented Dec 19, 2022

@lionel-

I've also fixed a couple other issues related to tidyselect and I set multiple = "all" in join calls to silence a new warning about multiple matches. That might need adjustment if that's not the expected behaviour.

I think as this is a data.frame join specific parameter (and not in the generic part of the API) I should let the user control this and pass through the value for multiple unchanged if the user specifies it in the future.

On a separate issue though:

I notice that the default values of keep are changing from FALSE to NULL, for the new inequality joining with join_by.

Currently dtrackr mirrors the previous dplyr default keep=FALSE (apart I notice from p_inner_join, which is an omission on my part). If I update dtrackr now to keep in step with dplyr 1.1.0 and pass keep=NULL, people on pre-1.1.0 dplyr will get an "Error in !keep : invalid argument type".

I guess I can explicitly check formal parameters for the dplyr join functions and use those, which seems a bit overkill, alternatively I can remove the keep parameter from dtrackr and pass through whatever the user provides as ... parameters but the downside to this is the keep parameter wont be documented in dtrackr.

  • Do you have a view?
  • Are there any other defaults changing in dplyr API that I should be aware of?

Speaking of documentation: since I inherit documentation from dplyr they'll be a potential mismatch between what documentation is inherited when dtrackr is built using devtools::document on my machine and whatever dplyr versions the user is running in the future. I guess this must be a common problem but not one I've encountered.

  • Have you encountered this and is there a good solution?
  • Maybe my inheritance of dplyr documentation is not a good idea. Is there an alternative?

I guess I should try and install the unstable github version of dplyr before building the dtrackr documentation, and anyone on an old version of dplyr will just have to be confused, unless you have advice on how to best do this.

Many thanks in advance!

@lionel- lionel- deleted the dplyr-1.1.0 branch December 19, 2022 15:05
@lionel-
Copy link
Contributor Author

lionel- commented Dec 19, 2022

That's a bit tricky. Maybe the simplest way is to pass ... for now, and then reintroduce keep in the signature after release? cc @DavisVaughan who might have a better idea.

@DavisVaughan
Copy link

Leaving keep = FALSE as the default in your methods should be fine in the short term. keep = NULL is exactly the same as keep = FALSE for equality joins, which is all that dplyr was able to do before 1.1.0. If there are any of the new inequality conditions detected, then keep = NULL becomes keep = TRUE instead, but I figure that is unlikely to affect you in the short term.

Once 1.1.0 is released you can just update to keep = NULL and require dplyr 1.1.0 as the minimum version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants