-
Notifications
You must be signed in to change notification settings - Fork 991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
forderv handles complex input #3701
Changes from 6 commits
73a63ca
bdce4a5
e6533dc
c210ede
55bd501
2e8c996
e3a6aa6
37e8431
52bcac0
bf59da2
09c8b19
e5d1d1d
9a4cef4
494468f
e0b17b8
761fe90
7261011
25cdf0d
681ab5d
81a941a
5f443cb
b922ee9
b656541
3c3228b
009935c
f59dc57
0338226
523ab00
598097b
2df22a3
dd7b24a
5e81694
a9d9165
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -440,11 +440,61 @@ SEXP forder(SEXP DT, SEXP by, SEXP retGrpArg, SEXP sortGroupsArg, SEXP ascArg, S | |
if (!isInteger(by) || !LENGTH(by)) error("DT has %d columns but 'by' is either not integer or is length 0", length(DT)); // seq_along(x) at R level | ||
if (!isInteger(ascArg) || LENGTH(ascArg)!=LENGTH(by)) error("Either 'ascArg' is not integer or its length (%d) is different to 'by's length (%d)", LENGTH(ascArg), LENGTH(by)); | ||
nrow = length(VECTOR_ELT(DT,0)); | ||
int n_cplx = 0; | ||
for (int i=0; i<LENGTH(by); i++) { | ||
if (INTEGER(by)[i] < 1 || INTEGER(by)[i] > length(DT)) | ||
error("'by' value %d out of range [1,%d]", INTEGER(by)[i], length(DT)); | ||
if ( nrow != length(VECTOR_ELT(DT, INTEGER(by)[i]-1)) ) | ||
error("Column %d is length %d which differs from length of column 1 (%d)\n", INTEGER(by)[i], length(VECTOR_ELT(DT, INTEGER(by)[i]-1)), nrow); | ||
if (TYPEOF(VECTOR_ELT(DT, i)) == CPLXSXP) n_cplx++; | ||
} | ||
if (n_cplx) { | ||
// we don't expect users to need complex sorting extensively | ||
// or on massive data sets, so we take the approach of | ||
// splitting a complex vector into its real & imaginary parts | ||
// and using the regular forderv machinery to sort; a baremetal | ||
// implementation would at root do the same, but the approach | ||
// here is a bit more slapdash with respect to memory efficiency | ||
// (seen clearly here at C from the 3+2*n_cplx PROTECT() calls) | ||
int n_out = length(by) + n_cplx; | ||
SEXP new_dt = PROTECT(allocVector(VECSXP, n_out)); n_protect++; | ||
SEXP new_asc = PROTECT(allocVector(INTSXP, n_out)); n_protect++; | ||
// will be simply 1:n_out | ||
SEXP new_by = PROTECT(allocVector(INTSXP, n_out)); n_protect++; | ||
int j = 0; | ||
for (int i=0; i<length(by); i++) { | ||
int by_idx = INTEGER(by)[i]-1; | ||
if (TYPEOF(VECTOR_ELT(DT, by_idx)) == CPLXSXP) { | ||
// I don't see any shorthand way of splitting of the real&imaginary components, | ||
// i.e., a shorthand way of doing Re(z), Im(z). That includes searching all of | ||
// the r-source code & all of the r-devel archives. So just reproduce Re(), Im() | ||
// as done in do_cmathfuns in complex.c | ||
SEXP realPart = PROTECT(allocVector(REALSXP, nrow)); n_protect++; | ||
SEXP cplxPart = PROTECT(allocVector(REALSXP, nrow)); n_protect++; | ||
double *pre = REAL(realPart); | ||
double *pim = REAL(cplxPart); | ||
Rcomplex *pz = COMPLEX(VECTOR_ELT(DT, by_idx)); | ||
for (int i = 0; i < nrow; i++) { | ||
pre[i] = pz[i].r; | ||
pim[i] = pz[i].i; | ||
} | ||
SET_VECTOR_ELT(new_dt, j, realPart); | ||
SET_VECTOR_ELT(new_dt, j+1, cplxPart); | ||
INTEGER(new_asc)[j] = INTEGER(ascArg)[i]; | ||
INTEGER(new_asc)[j+1] = INTEGER(ascArg)[i]; | ||
INTEGER(new_by)[j] = j+1; | ||
INTEGER(new_by)[j+1] = j+2; | ||
j += 2; | ||
} else { | ||
SET_VECTOR_ELT(new_dt, j, VECTOR_ELT(DT, by_idx)); | ||
INTEGER(new_asc)[j] = INTEGER(ascArg)[i]; | ||
INTEGER(new_by)[j] = j+1; | ||
j += 1; | ||
} | ||
} | ||
DT = new_dt; | ||
ascArg = new_asc; | ||
by = new_by; | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a bit nervous about this branch. I looks constrained to only when n_cplx>0 (which is good that it's isolated), and I know what it is doing (splitting complex into real and imaginary columns) but I don't see how it's doing it. I'm going to need more time to review it and make sure it doesn't impact anything unexpected. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. when i first took a look i decided against that (I think my initial reaction was The Idea is: receive Instead of sorting using the input list If there are no complex columns, If there are complex columns, say the first is at the third index of Similar thinking applies to the re-mapped There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wasn't thinking of doing a single range for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The other option is to postpone this PR to a later date. Elio's response above confirms that setkey-on-complex and grouping-on-complex isn't high priority. It was There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the biggest impediment to that is I don't see a way to get a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
True... I'm just loath to close the PR given that it's passing tests. Also understand the tradeoff that is the maintenance burden. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes makes sense. That's why I wrote above: "There might need to be new range_complex_r and range_complex_i perhaps." But I didn't look at it closely.
It's not a burden in the sense that more code is just always generally bad. It's that this particular change could cause hard to trace bugs because of the type of change it is to such a core function. It's risk. And it's time in reviewing it.
But this shows the tests aren't sufficient: the cast isn't correct but it's still passing tests. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Very true, I meant more that it's passing existing tests (so "no" existing functionality is being affected). Would adding a lot more tests to the suite for the new code be a good direction? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A good direction would be to focus on the most requested issues: #3189 |
||
|
||
if (!isLogical(retGrpArg) || LENGTH(retGrpArg)!=1 || INTEGER(retGrpArg)[0]==NA_LOGICAL) error("retGrp must be TRUE or FALSE"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this
if
is redundant & contradictory -- just a few lines earlier (L47) hasif (!length(cols))
as awarning
. I guess this branch was some leftover or copy/paste fromsetkey
... revealed by Codecov