Allow ! on LHS of := #4031

Closed
MichaelChirico opened this issue Nov 7, 2019 · 5 comments

Comments

@MichaelChirico
Member

Related to #571, #1710, a few others about allowing NSE on LHS of :=.

Based on the example given here:

#4030 (comment)

keep = c('Species', 'Sepal.Width')
dt[ , setdiff(names(dt), keep) := NULL]
setcolorder(dt, keep)

It would be more natural, less clumsy, and usable in a chain to write:

keep = c('Species', 'Sepal.Width')
dt[ , !keep := NULL] #or maybe, !(keep)
setcolorder(dt, keep)
@moodymudskipper

I've already been confused by the use of ! in data.table, and this one gave me pause too: it's not obvious that we mean "not this column" rather than "negate the values of this variable".

I think this ambiguity can be avoided by having select helpers in the style of https://www.rdocumentation.org/packages/dplyr/versions/0.7.2/topics/select_helpers

We would have :

dt[ , !one_of(keep) := NULL]

And then all other useful select helpers would work as well :

dt[ , starts_with("Petal") := NULL]

In tidy select those helpers return numeric indices, so to combine several helpers we need set functions like intersect(), union() and setdiff(). I'm not sure why this design choice was made; I think logical output makes more sense and allows more compact syntax using | and &.

So for this to work the select helpers should detect the data table context and [.data.table should allow logical indices on the lhs of := in j.

Allowing functions on the rhs would go with this quite well, so we could do things like :

dt[, sapply(.SD, is.factor) := as.character] # or dt[, sapply(.SD, is.factor) := as.character(.)] 
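For comparison, here is a minimal sketch of how the same factor-to-character conversion can be written with syntax that already exists, assuming `dt` is built from the `iris` data set (the thread's examples use its column names); the `cols` helper variable is introduced here purely for illustration.

```r
library(data.table)

dt = as.data.table(iris)

# Names of the factor columns (illustrative helper variable, not from the thread)
cols = names(dt)[sapply(dt, is.factor)]

# A parenthesised LHS is evaluated, so (cols) refers to the columns named in cols;
# .SDcols restricts .SD to those same columns
dt[ , (cols) := lapply(.SD, as.character), .SDcols = cols]
```

The proposal above would fold the column detection and the conversion into a single `:=` call instead of going through an intermediate vector of names.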

@jangorecki
Member

I am not sure about !keep. If names(.SD) works on the LHS, then this would be enough:

keep = c('Species', 'Sepal.Width')
dt[ , setdiff(names(.SD), keep) := NULL]
setcolorder(dt, keep)

@mik3y64

mik3y64 commented Dec 16, 2019

Upvoting this feature. Column selection and deletion are the bread and butter of data manipulation. It is not intuitive to setdiff the column names and then delete the result. It takes two extra steps for the code author and is not easy for colleagues or collaborators to read. It is like typing --1 (a double negative sign) to get 1. I am hoping to see a direct way of selecting columns using reference semantics.

keep = c("Species", "Sepal.Width")
dat[ , keep := KEEP]

or a more general but less direct approach, as proposed by @MichaelChirico. This is also similar to typing --1 to get 1, but the code is much simpler.

dat[ , !keep := NULL]
# or
dat[ , -keep := NULL]

@ColeMiller1
Contributor

Another route would be a helper function on j to combine .SDcols with j. Use cases:

dt[, update.at(is.factor, as.character)]
dt[, update.at(!keep, NULL)]

dt[, delete.at(!keep)]

dt[, select.at(keep)]
dt[, select.at(keep, x + 3)]

@MichaelChirico
Member Author

With names(.SD) available on LHS of :=, this now works:

dt[ , names(.SD) := NULL, .SDcols=!keep]

Closing here, please open other FRs if there's still something missing.
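As a self-contained sketch of the idiom above, assuming `dt` is built from the `iris` data set (which matches the column names used in this thread) and an installed data.table version recent enough to accept names(.SD) on the LHS of `:=`:

```r
library(data.table)

dt = as.data.table(iris)
keep = c('Species', 'Sepal.Width')

# .SDcols = !keep makes .SD hold every column except those in keep,
# so names(.SD) on the LHS names exactly the columns to delete
dt[ , names(.SD) := NULL, .SDcols = !keep]

# Deleting columns keeps the original order, so reorder to match keep
setcolorder(dt, keep)

names(dt)
```

Because the deletion happens by reference inside `[`, it can also be placed in the middle of a chain, which was one of the motivations given in the original request.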
