Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why doesn't DT[v1==max(v1), , by=v2] work? #1246

Closed
ywhcuhk opened this issue Jul 26, 2015 · 14 comments
Closed

Why doesn't DT[v1==max(v1), , by=v2] work? #1246

ywhcuhk opened this issue Jul 26, 2015 · 14 comments

Comments

@ywhcuhk
Copy link

ywhcuhk commented Jul 26, 2015

When I try DT[v1==max(v1), , by=v2]. It throws me an error.
Currently, I am doing something like:

DT[, v1_max:=max(v1), by=v2][v1==v1_max][, v1_max:=NULL]

Note max() is used for illustration. It could be any function.

@DavidArenburg
Copy link
Member

Why not just DT[, .SD[which.max(v1)] , by = v2]? Or unique(DT[order(-v1)], by = "v2") ? Or DT[DT[, .I[which.max(v1)] , by=v2]$V1]? Also, please provide a reproducible example next time, for instance set.seed(123) ; DT <- data.table(v1 = sample(10), v2 = rep(1:2, each = 5))

@jangorecki
Copy link
Member

I assume it throws an error because you are not quering any data = lack of j argument. I recommend to use stackoverflow for such kind of question. When posting here it is good to follow this doc. You can close the issue if you get an answer.

@ywhcuhk
Copy link
Author

ywhcuhk commented Jul 28, 2015

@jangorecki I see. I asked this question not for a solution, because I obviously have one already. Can I suggest this way of querying (DT[v1==max(v1), , by=v2]) as a feature request? Must I include j argument? We don't have to include i argument all the time, right? I think this way of querying is in the spirit with data.table syntax and quite useful, no?

@DavidArenburg Thanks for suggesting. But IMHO, the proposed way is much more intuitive and general. I used max function just for illustration as I noted. The proposed solutions from you is ad hoc for max. Say, if I change max to mean or some other arbitrary user defined function, your way won't work anymore.

Lastly, I think the code is intuitive enough such that a replicable example is not necessary. A replicable example may not be general enough and thus lead to ad hoc solutions to the particular example, rather than a general situation.

@jangorecki
Copy link
Member

Would be good then to isolate FR. What I understand you simply forms FR to j to be default to .SD. Your first post complicates that FR a lot. Similar FR #1149.

@eantonya
Copy link
Contributor

@ywhuofu how would you read DT[i, j, by = b] to get that result?

For reference, this is the current reading: "take DT, apply i, then do j by b"

@ywhcuhk
Copy link
Author

ywhcuhk commented Jul 28, 2015

@eantonya simply "take DT, apply i by b". You don't have to do something with j. It's like SQL's where or having statement.

@eantonya
Copy link
Contributor

@ywhuofu that's not enough. Any reading paradigm you come up with has to include j in it in such a way that omitting it (~= taking the default value) becomes whatever you just wrote for just having an i. FYI current default value of j is .SD (even though it's not fully implemented yet in all cases).

@franknarf1
Copy link
Contributor

I agree that "apply i by by" is too far from the normal reading of the syntax. As far as I can tell, this is another case that could be resolved by adding having (somehow), #788 Shoehorning having into i can only lead to confusion.

@ywhcuhk
Copy link
Author

ywhcuhk commented Jul 28, 2015

@franknarf1 @eantonya , by the same logic, why would DT[, j] makes sense?

@franknarf1
Copy link
Contributor

@ywhuofu It does make sense as "do j"... Your message is too short/cryptic for me to understand the point you're making.

The general syntax is "subset to i, then do j by by". Depending on the args present, we can cut up this instruction and it still makes sense

  • Without i we have "do j by by"
  • Without j we have "subset to i, then do nothing by by"
  • and so on

Your proposed reading of "do i by by" is not consistent with this pattern.

@ywhcuhk
Copy link
Author

ywhcuhk commented Jul 28, 2015

@franknarf1 Okay, your points are well taken. What I am proposing is simply "subset by group". Moreover, how do I read DT[i,] then? I understand that i is allowed to be a result of (any) function that returns a vector of boolean with the same length of DT, right?

In general, my questions concerns following general situation.

I intend accomplish something like (I know following code doesn't work)

DT[v1 == func(v3), , by=v2]

without having to go through

DT[, v_help:=func(v3), by=v2][v1==v_help][, v_help:=NULL]

?
Here v3 can be any column(s) in DT, including v1 or v2. And func is any function including user defined function.

In any case, thank you for your responses. I am an end user and a fan of data.table, but has no clue on whatsoever under the hood.

@franknarf1
Copy link
Contributor

@ywhuofu I'm also an end user. The problem I have with your proposed syntax is what I said initially: i already does enough (covering subsetting and joins) and having by affect i would be a pretty radical change to the syntax. That's why I would favor some version of a having arg instead, DT[i,j,by,having] which would fit into the syntax like

"apply i to DT, then do j for each by having having

though I haven't thought through all the problems that might raise. Anyway, the discussion at that FR may interest you: #788

@eantonya
Copy link
Contributor

@ywhuofu (morally) the default value of i is TRUE, of j is .SD. Therefore DT[i, ] is read as: "take DT, subset by i, then do .SD" or equivalently "take DT, subset by i". Similarly you can work out how DT[, j] is read.

The standard (most readable) syntax to accomplish what you wrote is:

DT[, .SD[v1 == func(v3)], by = v2]

with the faster version of it being: DT[DT[, .I[v1 == func(v3)], by = v2]$V1]. There is an FR and small hope that the two will eventually become equivalently fast.

@ywhcuhk
Copy link
Author

ywhcuhk commented Jul 28, 2015

@franknarf1 @eantonya , thank you very much for going through this with me. I've learned a lot. I will close this issue.

@ywhcuhk ywhcuhk closed this as completed Jul 28, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants