Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional arguments for setcolorder: before, after #4358

Closed
matthiasgomolka opened this issue Apr 6, 2020 · 15 comments
Closed

Additional arguments for setcolorder: before, after #4358

matthiasgomolka opened this issue Apr 6, 2020 · 15 comments

Comments

@matthiasgomolka
Copy link

It would be nice to have two additional arguments (before = NULL and after = NULL) in setcolorder() for reordering one or more columns to a position other than the "front" of the data.table. This would make it possible to place a single column after the, say, 45th column, without having to list all these 46 columns.

After having a look at the code, I think I could try to file a PR if you think this feature is helpful.

@Atrebas
Copy link

Atrebas commented Jun 7, 2020

Was just thinking about it. It has been implemented in dplyr 1.0.0: relocate(). Would be good to have it in data.table as well.

@dracodoc
Copy link
Contributor

I had some implementation suggested on #4668. If there is a plan to add this feature, I'd love to submit a PR.

@Atrebas
Copy link

Atrebas commented Aug 12, 2020

Thanks for the PR. As a humble data.table user, I think I would prefer having before and after parameters than anchor. Just my two cents.

@ben-schwen
Copy link
Member

ben-schwen commented Aug 12, 2020

Arguments like before and after seem more "natural" to me than anchor. #4668

Changing setcolorder like below should do the trick. This also ensures backward compatibility.

setcolorder = function(x, neworder=key(x), before=NULL, after=NULL)
{
  if (is.character(neworder) && anyDuplicated(names(x)))
    stop("x has some duplicated column name(s): ", paste(names(x)[duplicated(names(x))], collapse=","), ". Please remove or rename the duplicate(s) and try again.")
  # if (!is.data.table(x)) stop("x is not a data.table")

  if (!is.null(before) && !is.null(after))
    stop("Provide either before= or after= or none but not both.")

  neworder = colnamesInt(x, neworder, check_dups=FALSE)  # dups are now checked inside Csetcolorder below

  if (!is.null(before))
    neworder = c(setdiff(seq_len(colnamesInt(x, before) - 1L), neworder), neworder)

  if (!is.null(after))
    neworder = c(setdiff(seq_len(colnamesInt(x, after)), neworder), neworder)

  if (length(neworder) != length(x)) {
    #if shorter than length(x), pad by the missing
    #  elements (checks below will catch other mistakes)
    neworder = c(neworder, setdiff(seq_along(x), neworder))
  }
  .Call(Csetcolorder, x, neworder)
  invisible(x)
}

edit: adding some testcases

# setcolorder with new arguments before and after, #4358
DT = data.table(a=1, b=2, c=3)
test(2153.1, setcolorder(DT, "a", after="c"), data.table(b=2, c=3, a=1))
test(2153.2, setcolorder(DT, "a", before="b"), data.table(a=1, b=2, c=3))
test(2153.3, setcolorder(DT, 1, after=3), data.table(b=2, c=3, a=1))
test(2153.4, setcolorder(DT, 3, before=1), data.table(a=1, b=2, c=3))
test(2153.5, setcolorder(DT, 1, before=1, after=1), error="Provide either before= or after= or none but not both.")

@dracodoc
Copy link
Contributor

dracodoc commented Aug 12, 2020

Thanks for all the comments!

@ben-schwen After reading your code I found all my code can be replaced with yours. Mine was originated from a separate function, the original design is simple, but to make it work with new api (instead of before, anchor, change to before=anchor) I spent too much lines on parameter checking and also need to deal with edge cases.

Your code is much more shorter because

  • colnamesInt can convert name to index or keep index asis, I didn't know about it
  • reusing the padding part of original function removed the need of right side part
  • seq_len replaced my creation of slice to deal with 1:0 case, also don't need the x:length +1 case because of reusing padding part
  • also don't need to check before/after value out of column name/index error since it will be covered with existing checks.

This is a much better implementation, fully utilized existing structures to make the new changes very short.

I don't think I need to continue my PR anymore, can you create a PR with your code?

(The only suggestion I want to make is the error message could be "before= and after= cannot be specified at the same time", this is subjective but I feel it be easier to read)

@dracodoc
Copy link
Contributor

dracodoc commented Sep 2, 2020

Seeing @ben-schwen may not have time to create the PR, I created a PR #4691 with his implementation and full credit go to @ben-schwen .

@Feakster
Copy link

Feakster commented Apr 6, 2021

What became of this in the end?

@matthiasgomolka
Copy link
Author

@dracodoc, your PR does not pass the build pipeline. Are you willing to give it another try? If not, I will try to find the time to take over.

@dracodoc
Copy link
Contributor

I'm not sure what was the problem. Was it just test case orders? I just modified the test cases and pushed an update, let's see if it pass this time. The tests run without problem for me.

@matthiasgomolka
Copy link
Author

I don't know. I can't look into the detailed test results.

@dracodoc
Copy link
Contributor

I think I knew what caused original error. I'm updating the commit and wait for next build check.

@dracodoc
Copy link
Contributor

Looks all checks passed this time. @matthiasgomolka can you have a look?

@matthiasgomolka
Copy link
Author

Yes, now it passes the tests. But I don't know the usual procedure if and when a new PR is merged.

@jangorecki, you already had a look at this issue. Can you explain the next steps?

@jangorecki
Copy link
Member

Once PR is passing tests the next step is to wait for reviews and merging.

@mattdowle mattdowle added this to the 1.14.1 milestone Aug 26, 2021
@mattdowle
Copy link
Member

Closed by #4691 which implements this. (It didn't have "closes" in the description so this issue didn't get closed automatically by the merge.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants