-
Notifications
You must be signed in to change notification settings - Fork 991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: .SDcols=patterns() #3186
RFC: .SDcols=patterns() #3186
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3186 +/- ##
=========================================
+ Coverage 94.6% 94.6% +<.01%
=========================================
Files 61 61
Lines 11742 11747 +5
=========================================
+ Hits 11108 11113 +5
Misses 634 634
Continue to review full report at Codecov.
|
inst/tests/tests.Rraw
Outdated
data.table(V4 = 3.4, V8 = -0.4)) | ||
|
||
# also with !/- inversion | ||
test(1963.4, DT[ , lapply(.SD, sum), .SDcols = !patterns('^c|i')], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mentioned !/-
but tested only !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the two cases are literally handled side-by-side, there's not as of now any way for the behavior to differ for the two approaches? Is why I only tested the one:
Line 1009 in 87a7bb6
if (is.call(colsub) && deparse(colsub[[1L]], 500L, backtick=FALSE) %chin% c("!", "-")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"as of now", that is how regression happens :) in this case quite unlikely, agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm kicking the can to downstream... if someone wants to treat them differently at some point, add the tests then 😛
The only worry is some upstream change that somehow changes how -
and !
work in R in general? But I think as long as lazy eval remains as is and they both remain functions we'll be fine.
I don't think I'd need more than one pattern at a time, fwiw. Since the behavior is different from melt's For negation with multiple patterns, maybe individual patterns could be negated as well,
"In the intersection" = all conditions hold, but "Not in the intersection" != any conditions hold (right?), so that's not what I'd expect |
Thanks Frank. Will definitely update The point about Agree about I'm not sure what the expected output of |
My 2c: I'm not really a fan of functions that are implicit within DT[, .SD, .SDgrep = list(pattern = c("^V", "9"), invert = c(FALSE, TRUE))]
DT[, .SD, .SDgrep = "^V"] # if atomic just pass to first list argument. Similarly, I think this would be easier to reason about and maintain than trying to interpret |
@HughParsonage FWIW we've had that NSE trick in Interpretation of I'm not sure how useful it is to make this functionality extremely flexible, the more I think about it... probably 80-90% of the target use cases are as simple as your I think the main reason for having this in the first place is the same as the reason why it's built on the implicitly-interpreted function -- convenience/dynamism -- being able to apply the pattern filtering without having to be bothered with referring to I don't really have a rigorous way of evaluating the cost/benefit of adding a new argument to Another thing that came to mind at some point was
|
I feel like this was brought up before, but could there just be a symbol for "self"? So...
In that case, existing functions (grep or other) can still be relied on without inventing new ones. |
I'm not seeing anything like |
IMO if we could refer to |
OK I'm less keen on But I think that your |
do you have any examples you could share? to make things more concrete...
seems like we're making some progress here 😁
…On Sat, Dec 8, 2018, 3:38 PM HughParsonage ***@***.*** wrote:
OK I'm less keen on .SDgrep than I was; I think .SDcols is the right
place to implement this.
But I think that your .GREP idea is better: makes it clearer that the
function should not be called directly. I also think the .and and .but.not
arguments from my select_grep are important: I've used them a lot more
than I thought I would.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3186 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AHQQdUXnEzj2gXXK-soCI4gYWTvXaYgaks5u22xmgaJpZM4ZCQa2>
.
|
Something like
And then |
Thanks for the ping. As a user, what I like about data.table is that it offers conciseness and flexibility at the same time. I definitely agree with @MichaelChirico's comment about avoiding "cognitive overload". When upvoting this feature, |
Closes #1878 (.SDcols=patterns(...))
Closes #3185 (ancillary edge case)
RFC given the popularity of this issue despite how simple the PR is to be sure we can basically agree this is the right API for implementing this feature.
Tagging everyone who participated in #1878: @eantonya, @ksavin, @mbacou, @hannes101, @franknarf1, @HughParsonage, @arunsrinivasan, @skanskan, @therimalaya, @bchen102, @sbsdc as well as @mattdowle and @jangorecki.
As Implemented
Essentially we re-use
patterns
like we do inmelt
; multiple patterns --> intersection of matches:Questions to address before merging:
I think for me the main design decision was this line:
patterns(regex1, regex2, ...)
to mean the user wants only columns matching bothregex1
andregex2
, or is it columns matchingregex1
orregex2
?patterns
is negated? I considered conditioning if.SDcols
is like!patterns(regex1, regex2, ...)
, we take theunion
, otherwise, theintersection
.Other points to clarify:
cols
argument topatterns
? Right now it's undefined behavior, but it will fail/not work as expected. Would take a bit more digging around in the code to get this to work, is it worth it? Should we fail ifcols
is supplied?patterns
the right name? Is it confusing that the behavior ofpatterns
is not exactly the same as it is formelt.data.table
? Should we replacepatterns
with a flexible helper a lalike_any
/like_all
which could handle theintersect
/union
question directly?🙇 in advance and happy
data.table
ing!