names(.SD) should work #4163

Merged
merged 49 commits into from
Mar 20, 2024

Conversation

ColeMiller1
Contributor

@ColeMiller1 ColeMiller1 commented Jan 8, 2020

Closes #795. Towards #3189.

This is the proposed implementation:

dt[, names(.SD) := lapply(.SD, '*', 5)]
  • Update Reference Semantics Vignette
  • Update .SD Usage Vignette
  • Update assign help
  • Include tests
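
A minimal usage sketch of the proposed syntax, assuming an installed data.table version that already includes this change. The table `dt` and its columns are made up for illustration, and `.SDcols` is used to limit the update to the numeric columns:

```r
library(data.table)

# Hypothetical example table; `id` is left untouched.
dt = data.table(id = c("a", "b", "c"), x = c(1, 2, 3), y = c(10, 20, 30))

# names(.SD) on the LHS resolves to the columns selected by .SDcols,
# so each selected column is overwritten in place by its value times 5.
dt[, names(.SD) := lapply(.SD, `*`, 5), .SDcols = c("x", "y")]

print(dt)
```

The point of the syntax is that the column selection is written once, in `.SDcols`, instead of being repeated on both sides of `:=`.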

TO DO after release:
Update StackOverflow post to mirror updated .SD vignette.

Member

@jangorecki jangorecki left a comment

My vote is for only names(.SD) on the LHS, not .SD.
Even if we decide to allow just .SD on the LHS, names(.SD) should still work as well.

@ColeMiller1
Contributor Author

Thank you, Jan. I am working on better test cases that do not involve logicals. I will update tonight.

I also noticed that #4031 would be partially addressed:

dt = data.table(iris)
keep = c('Species', 'Sepal.Width')
dt[, .SD := NULL, .SDcols = !keep]

I will try to substitute names(.SD) with sdvars. That would allow solutions like your setdiff(names(.SD), keep) to work, and it would also provide the option of paste0(names(.SD), 'max'), so you can update all of the columns without having to use .SDcols or an intermediate character vector.
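
A small sketch of the paste0() idea, again assuming a data.table version that includes this change. The table and the `_max` suffix are illustrative choices, not taken from the PR's tests:

```r
library(data.table)

# Hypothetical example table.
dt = data.table(g = c("a", "b"), x = c(1, 4), y = c(7, 2))

# The LHS expression is built from names(.SD), so no intermediate
# character vector of column names is needed. Each length-1 maximum
# is recycled to fill the new column.
dt[, paste0(names(.SD), "_max") := lapply(.SD, max), .SDcols = c("x", "y")]

print(names(dt))
```

Because the LHS is an ordinary expression over names(.SD), the same pattern should extend to other name transformations such as setdiff().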

@jangorecki jangorecki added the WIP label Jan 8, 2020
@codecov

codecov bot commented Jan 14, 2020

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.50%. Comparing base (958e3dd) to head (8fe60ee).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4163   +/-   ##
=======================================
  Coverage   97.50%   97.50%           
=======================================
  Files          80       80           
  Lines       14884    14884           
=======================================
  Hits        14513    14513           
  Misses        371      371           


@ColeMiller1
Contributor Author

This should be good to go unless we want to support ncol(.SD) or length(.SD) which may be expected to work on the LHS of :=. I could see that being useful for lapply with multiple functions.

Regarding codecov, I am not sure what to do. There are 10 new tests to address 7 lines of new code plus all previous tests using lhs := rhs would also use this code.

Member

@jangorecki jangorecki left a comment

Some minor comments.
Support for length(.SD) and ncol(.SD) would be nice, but that would generally require a different approach, probably a temporary object relating to a subset of self. The current approach is fine for most use cases.
The thing to decide, IMO, is whether we really want to support .SD on the LHS, which breaks consistency about column names there. My vote is to require the extra names(), so we maintain consistency in the API.
We also have a .SD-related vignette, so it would be good to put some examples there, but we can always do that as a follow-up.

@ColeMiller1
Contributor Author

ColeMiller1 commented Jan 20, 2020

Should be good again. The .SD vignette could be updated with a few items such as patterns() and the ability of using functions. I am happy to update the .SD vignette with this PR, but am unsure since they were done outside of this PR.

But there are two examples that this and Michael's recent PR will work great with:

## From
fkt_idx = which(sapply(Teams, is.factor))
Teams[, (fkt_idx) := lapply(.SD, as.character), .SDcols = fkt_idx]
## To
Teams[, names(.SD) := lapply(.SD, as.character), .SDcols = is.factor]

## From
team_idx = grep('team', names(Teams), value = TRUE)
Teams[, (team_idx) := lapply(.SD, factor), .SDcols = team_idx]
## To
Teams[, names(.SD) := lapply(.SD, factor), .SDcols = patterns('team')]

I may have gone too far. There's no use of `(cols) := ...` now, but there is at least a reference to the other vignette.
@MichaelChirico
Member

Friendly ping :) would be great to get this merged!

@ColeMiller1
Contributor Author

I will make changes tonight or by the end of tomorrow night. Thanks for going through everything.

Cole

@ColeMiller1
Contributor Author

I will give the vignettes another look tomorrow and give you a final ping when I'm done.

R/data.table.R Outdated
for (i in 2:length(e)) if (!is.null(e[[i]])) e[[i]] = replace_names_sd(e[[i]], cols)
e
}
lhs = eval(replace_names_sd(lhs, sdvars), parent.frame(), parent.frame())
Member

A simpler implementation might be the following:

e <- copyenv(parent.frame()) # pseudocode
e$.SD <- setNames(logical(length(sdvars)), sdvars) # or vector("list"), or even vector("raw") to really scrimp on storage
lhs = eval(lhs, e, e)

WDYT?

#### -- How can we update multiple existing columns in place using `.SD`?

```{r}
flights[, names(.SD) := lapply(.SD, as.factor), .SDcols = is.character]
```
Member

Something I can imagine will happen soon is a user trying:

x[, names(.SD)[1:5] := ...]

Does that work already? If so, please add a test. If not, no need to handle it until it's requested later unless you see an easy fix.

@MichaelChirico
Member

Looks great! Last thing is checking if we can simplify implementation / possibly make it much more general.

@ColeMiller1
Contributor Author

ColeMiller1 commented Mar 20, 2024

I attempted to make it more general per your suggestion. I copied the implementation below. All the tests pass, and it supports something like dt[, names(.SD)[1:5] :=...]. After the commit, I checked whether base::names(.SD) works, and it does as well.

It's easy to revert or to go down the route you suggested instead. Let me know, and I will add a base::names(.SD) test, too, depending on how we go.

# i.e lhs is names(.SD) || setdiff(names(.SD), cols) || (cols)

lhs = eval(lhs, list(.SD = setNames(logical(length(sdvars)), sdvars)), parent.frame())
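
To see why this one-liner generalizes, here is a base R sketch that mimics the evaluation outside of data.table. The names `sdvars`, `keep`, `cols`, `fake_sd`, and `eval_lhs` are hypothetical stand-ins for what exists inside `[.data.table`; only the `eval()` pattern is taken from the line above:

```r
# Hypothetical stand-ins for values available inside `[.data.table`.
sdvars = c("a", "b", "c")
keep = "b"
cols = c("x", "y")

# A named logical vector is enough: names() only needs the names attribute.
fake_sd = list(.SD = setNames(logical(length(sdvars)), sdvars))

# Evaluate an unevaluated LHS with .SD looked up in the list first and
# everything else (keep, cols, functions) found in the enclosing frame.
eval_lhs = function(lhs, env = parent.frame()) eval(lhs, fake_sd, env)

res_names   = eval_lhs(quote(names(.SD)))
res_setdiff = eval_lhs(quote(setdiff(names(.SD), keep)))
res_subset  = eval_lhs(quote(names(.SD)[1:2]))
res_base    = eval_lhs(quote(base::names(.SD)))
res_cols    = eval_lhs(quote((cols)))
```

Since `.SD` is just a named placeholder during LHS evaluation, any expression that only inspects its names resolves to a character vector of column names, while expressions that never mention `.SD`, such as `(cols)`, behave as before.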

@TysonStanley
Member

Personally, I'm a fan of being able to use dt[, names(.SD)[1:5] :=...]. That flexibility would be a really nice feature to have.

@MichaelChirico
Member

New version looks great, awesome!

@MichaelChirico
Member

Almost 10 years later, it's a one-line change 😎

Great stuff!

Successfully merging this pull request may close these issues.

names(.SD) := ... should work
6 participants