names(.SD) := ... should work #795

Closed
brodieG opened this issue Sep 2, 2014 · 11 comments · Fixed by #4163
Labels
feature request top request One of our most-requested issues

Comments

@brodieG

brodieG commented Sep 2, 2014

This would allow the following type of code:
DT[, names(.SD) := lapply(.SD, rev), .SDcols = -c(1,3,8)]
This would reverse every column except 1, 3, and 8 in DT by reference. See discussion in #787. Potentially even:
DT[, := lapply(.SD, rev), .SDcols = -c(1,3,8)]
and have the names of the columns to be updated inferred from the names of the return value to lapply.
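
For comparison, here is a minimal sketch of how the same update can be written without the proposed syntax, by resolving the negative positions to column names first. The 8-column table and its V1..V8 names are made up for illustration and are not from the original report.

```r
library(data.table)

# Hypothetical 8-column table; unnamed list elements get the names V1..V8.
DT <- as.data.table(lapply(1:8, function(k) k * 10 + 1:3))

# Resolve the negative positions to column names up front.
cols <- names(DT)[-c(1, 3, 8)]

# Parentheses make := evaluate `cols` rather than create a column called "cols".
DT[, (cols) := lapply(.SD, rev), .SDcols = cols]
```

The cost is that the column selection has to be stored in a variable and repeated on both sides, which is the duplication this request aims to remove.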

@matthieugomez
Contributor

That would be very nice.
Another implementation would be that, when .SDcols is not specified, and .SD is present in j, the LHS of := is understood as .SDcols.

@brodieG
Author

brodieG commented Nov 7, 2014

An alternative is to create a .SDcols object that is usable on the LHS of :=.

Also, using .SD on the LHS of := should probably throw an error instead of only partially working. More discussion on SO.

@jangorecki
Member

The straightforward "workaround" for this would be to use with=FALSE.
Example in: http://jangorecki.github.io/blog/2014-11-07/Data-Anonymization-in-R.html#minimal-script

Still, a .SDcols object usable in j (both LHS and RHS) seems to be the best idea.

@brodieG
Author

brodieG commented Nov 8, 2014

Jan, not sure with=F is necessary, since you can do:

dt <- data.table(a=1:10, b=1:10, c=rep(c(T,F), 5))
cols <- 1:2
dt[, cols:=lapply(.SD, `*`, 2), .SDcols=cols, with=F]

or

dt[, (cols):=lapply(.SD, `*`, 2), .SDcols=cols]  # add parens
dt

equivalently, though this is a good reminder that the with argument can help when using data.table programmatically, which is another issue I've been discussing with Arun.
Also, note that both fail with:

cols <- -3
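
A possible way around the negative-index case, sketched under the assumption that the goal is "every column except the third": use the negative position to subset names(dt) and pass the resulting names to both := and .SDcols. The variable name neg is introduced here for illustration only.

```r
library(data.table)

dt <- data.table(a = 1:10, b = 1:10, c = rep(c(TRUE, FALSE), 5))

# A negative position is valid when subsetting names(), so resolve it there.
neg <- -3
cols <- names(dt)[neg]

# Both the LHS of := and .SDcols now receive plain column names.
dt[, (cols) := lapply(.SD, `*`, 2), .SDcols = cols]
```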

@stefanfritsch

Another current inconsistency is that this works:

A<-data.table(x=1:10,y=10:1,z=rnorm(10))

A[,`:=`(colnames(A),.SD)]

and this doesn't:

A[,`:=`(colnames(.SD),.SD)]

The second fails with:

Error in `[.data.table`(A, , `:=`(colnames(.SD), .SD)) : 
  LHS of := isn't column names ('character') or positions ('integer' or 'numeric')

That precludes some elegant .SDcols syntax.

@franknarf1
Contributor

@MichaelChirico
Member

Unfortunate that this workaround is blocked by :=:

set.seed(23940)
DT = setDT(lapply(integer(10), function(...) sample(1e7, 100)))

DT[ , do.call(`:=`, lapply(.SD, .POSIXct, tz = 'UTC'))]

Error in (function (...) :
Check that is.data.table(DT) == TRUE. Otherwise, := and :=(...) are defined for use in j, once only and in particular ways. See help(":=").

It's unfortunate since the result of lapply is already named, so this would be a shorthand for the `:=`(V1 = .POSIXct(V1, tz = 'UTC'), ...) approach of explicitly naming columns:

DT[ , str(lapply(.SD, .POSIXct, tz = 'UTC'))]
List of 10
 $ V1 : POSIXct[1:100], format: "1970-02-28 08:44:45" "1970-01-26 19:24:38" ...
 $ V2 : POSIXct[1:100], format: "1970-03-14 20:40:49" "1970-01-04 23:53:16" ...
 $ V3 : POSIXct[1:100], format: "1970-01-09 03:32:08" "1970-02-12 06:18:31" ...
 $ V4 : POSIXct[1:100], format: "1970-04-13 04:15:52" "1970-03-17 19:10:23" ...
 $ V5 : POSIXct[1:100], format: "1970-03-22 07:57:03" "1970-02-10 19:42:45" ...
 $ V6 : POSIXct[1:100], format: "1970-01-28 05:56:39" "1970-04-20 11:43:32" ...
 $ V7 : POSIXct[1:100], format: "1970-01-02 03:41:31" "1970-04-01 23:58:52" ...
 $ V8 : POSIXct[1:100], format: "1970-03-05 05:58:29" "1970-03-05 23:27:10" ...
 $ V9 : POSIXct[1:100], format: "1970-04-13 20:29:31" "1970-01-24 12:18:58" ...
 $ V10: POSIXct[1:100], format: "1970-04-22 15:01:36" "1970-03-08 00:33:20" ...
NULL
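
One alternative that avoids do.call altogether is a loop over set(), data.table's function form of assignment by reference. This is only a sketch of a workaround for the example above, not the behaviour requested in this issue.

```r
library(data.table)

set.seed(23940)
DT <- setDT(lapply(integer(10), function(...) sample(1e7, 100)))

# Replace each column in place; set() accepts the column name via `j`.
for (col in names(DT)) {
  set(DT, j = col, value = .POSIXct(DT[[col]], tz = "UTC"))
}
```

Because each column is named explicitly inside the loop, no LHS has to be constructed for :=, at the price of leaving the DT[...] idiom.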

@MichaelChirico
Member

I actually lean towards allowing .SD on the LHS of :=. More concise and I think the intent is clear. We're doing this with NSE so we can just capture .SD --> names(.SD) anyway.

DT[ , .SD := lapply(.SD, rev), .SDcols = -c(1,3,8)]

That & whatever comes out of #3795 would make adding/editing many columns much less clunky

@jangorecki
Member

jangorecki commented Sep 24, 2019

Or eventually the following, which is more like names(.SD):

DT[ , .SDcols := lapply(.SD, rev), .SDcols = -c(1,3,8)]

@grantmcdermott
Contributor

grantmcdermott commented Nov 10, 2021

I've been thinking about this FR again after having to do quite a bit of "manual" LHS creation in a current project. (FWIW my own preferred option is @MichaelChirico's DT[, .SD := ....], but would support any of the proposed solutions.)

Another possible syntax variant — which would involve even less typing if it is feasible to code up — would be to enable := directly in .SDcols. I'm not sure how others would feel about this, though.

DT[ , lapply(.SD, rev), .SDcols := c(1,3,8)]

@MichaelChirico
Member

MichaelChirico commented Nov 11, 2021

It does ~basically read well here, but I would be against that... := semantics are (based on legion user reports/SO Q&A) confusing enough without opening up another API surface for it. Being able to consistently look only at j to know if a table is being updated by reference will keep code more readable than if := could show up in other [ arguments, possibly on other lines, possibly separated from j by dozens of lines.

9 participants