forderv handles complex input #3701

Merged (33 commits) on Jul 19, 2019. The diff below shows changes from 6 commits.

Commits:
73a63ca  Closes #1444 -- setkey works on tables with complex columns (Jul 10, 2019)
bdce4a5  extra test for group operation mentioned in issue (Jul 10, 2019)
e6533dc  new tests for coverage (Jul 11, 2019)
c210ede  missing arg (Jul 11, 2019)
55bd501  Closes #1703 -- forderv handles complex input (Jul 12, 2019)
2e8c996  slight re-tooling, now passing tests (Jul 13, 2019)
e3a6aa6  more progress; but stonewalled by bmerge (Jul 13, 2019)
37e8431  Merge branch 'master' of https://github.com/Rdatatable/data.table int… (Jul 13, 2019)
52bcac0  moved new logic to C so e.g. bmerge can call it from there (Jul 13, 2019)
bf59da2  some coverage tests, extension to rleid() (Jul 13, 2019)
09c8b19  progress making ctwiddle (dtwiddle for cplx) (Jul 13, 2019)
e5d1d1d  start preferring Rcomplex type (Jul 13, 2019)
9a4cef4  switch to Rcomplex API (Jul 13, 2019)
494468f  Merge branch 'master' of https://github.com/Rdatatable/data.table int… (Jul 13, 2019)
e0b17b8  Merge branch 'cplx_setkey' into cplx_forder (Jul 14, 2019)
761fe90  setkey now works on complex columns (Jul 14, 2019)
7261011  ostensibly done uniqlist; progress on bmerge (Jul 14, 2019)
25cdf0d  Merge branch 'master' into cplx_forder (Jul 17, 2019)
681ab5d  Merge branch 'master' into cplx_forder (mattdowle, Jul 17, 2019)
81a941a  Merge branch 'master' into cplx_forder (mattdowle, Jul 18, 2019)
5f443cb  scale back attempts at bmerge, all of uniqlist (Jul 18, 2019)
b922ee9  tidy up tests (Jul 18, 2019)
b656541  unique also works (Jul 18, 2019)
3c3228b  updated NEWS item & added coverage tests (Jul 18, 2019)
009935c  one more nocov (Jul 18, 2019)
f59dc57  actually hit LGLSXP branch! (Jul 18, 2019)
0338226  more coverage (Jul 18, 2019)
523ab00  Merge branch 'master' into cplx_forder (Jul 18, 2019)
598097b  use direct double comparison instead of type punning (Jul 19, 2019)
2df22a3  replaced big block with smaller modification to the main loop; also h… (mattdowle, Jul 19, 2019)
dd7b24a  memcmp for complex instead of == on double (mattdowle, Jul 19, 2019)
5e81694  merge master (mattdowle, Jul 19, 2019)
a9d9165  news item tidy (mattdowle, Jul 19, 2019)
2 changes: 2 additions & 0 deletions NEWS.md
@@ -128,6 +128,8 @@
identical(y1,y2) && identical(y1,y3)
# TRUE
```

19. Sorting now extended to complex vectors, [#1703](https://github.com/Rdatatable/data.table/issues/1703). Consistent with `base::order`, sorting is done lexicographically (`z1<z2` means `Re(z1) < Re(z2) | (Re(z1) == Re(z2) & Im(z1) < Im(z2))`).

#### BUG FIXES

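The ordering rule in this NEWS item can be sketched as a C `qsort` comparator. This is a minimal illustration under stated assumptions, not data.table's forder code: `cplx` is a hypothetical stand-in for R's `Rcomplex` (a pair of doubles `r` and `i`), `cmp_cplx` and `sort_cplx` are made-up names, and NA/NaN values are assumed absent because their placement needs a separate rule.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for R's Rcomplex: real part r, imaginary part i. */
typedef struct { double r; double i; } cplx;

/* Lexicographic order: compare real parts first and fall back to the
   imaginary parts only when the real parts are equal.
   Assumes no NA/NaN values are present. */
static int cmp_cplx(const void *a, const void *b) {
  const cplx *x = a;
  const cplx *y = b;
  if (x->r != y->r) return (x->r > y->r) - (x->r < y->r);
  return (x->i > y->i) - (x->i < y->i);
}

/* Sort n complex values in place, ascending, using the rule above. */
static void sort_cplx(cplx *z, size_t n) {
  assert(z != NULL || n == 0);
  if (n < 2) return;  /* nothing to sort; also avoids qsort(NULL, 0, ...) */
  qsort(z, n, sizeof *z, cmp_cplx);
}
```

For example, sorting the values of column `b` from the new forderv tests puts 1+2i first, then 3+3i before 3+9i: the real parts tie, so the imaginary part decides.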
2 changes: 1 addition & 1 deletion R/bmerge.R
@@ -14,7 +14,7 @@ bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbos
# careful to only plonk syntax (full column) on i/x from now on otherwise user's i and x would change;
# this is why shallow() is very importantly internal only, currently.

supported = c("logical", "integer", "double", "character", "factor", "integer64")
supported = c(ORDERING_TYPES, "factor", "integer64")

getClass = function(x) {
ans = typeof(x)
2 changes: 1 addition & 1 deletion R/data.table.R
@@ -829,7 +829,7 @@ replace_order = function(isub, verbose, env) {
if (!is.list(byval)) stop("'by' or 'keyby' must evaluate to a vector or a list of vectors (where 'list' includes data.table and data.frame which are lists, too)")
if (length(byval)==1L && is.null(byval[[1L]])) bynull=TRUE #3530 when by=(function()NULL)()
if (!bynull) for (jj in seq_len(length(byval))) {
if (!typeof(byval[[jj]]) %chin% c("integer","logical","character","double")) stop("column or expression ",jj," of 'by' or 'keyby' is type ",typeof(byval[[jj]]),". Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]")
if (!typeof(byval[[jj]]) %chin% ORDERING_TYPES) stop("column or expression ",jj," of 'by' or 'keyby' is type ",typeof(byval[[jj]]),". Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]")
}
tt = vapply_1i(byval,length)
if (any(tt!=xnrow)) stop("The items in the 'by' or 'keyby' list are length (",paste(tt,collapse=","),"). Each must be length ", xnrow, "; the same length as there are rows in x (after subsetting if i is provided).")
18 changes: 7 additions & 11 deletions R/setkey.R
@@ -51,14 +51,9 @@ setkeyv = function(x, cols, verbose=getOption("datatable.verbose"), physical=TRU
}
if (identical(cols,"")) stop("cols is the empty string. Use NULL to remove the key.")
if (!all(nzchar(cols))) stop("cols contains some blanks.")
if (!length(cols)) {
Review comment from MichaelChirico (Member Author):

This `if` is redundant and contradictory: just a few lines earlier (L47), `if (!length(cols))` already raises a warning. I guess this branch was a leftover or a copy/paste from setkey; Codecov revealed it.

cols = colnames(x) # All columns in the data.table, usually a few when used in this form
} else {
# remove backticks from cols
cols = gsub("`", "", cols, fixed = TRUE)
miss = !(cols %chin% colnames(x))
if (any(miss)) stop("some columns are not in the data.table: ", paste(cols[miss], collapse=","))
}
cols = gsub("`", "", cols, fixed = TRUE)
miss = !(cols %chin% colnames(x))
if (any(miss)) stop("some columns are not in the data.table: ", paste(cols[miss], collapse=","))

## determine, whether key is already present:
if (identical(key(x),cols)) {
@@ -83,7 +78,7 @@ setkeyv = function(x, cols, verbose=getOption("datatable.verbose"), physical=TRU
if (".xi" %chin% names(x)) stop("x contains a column called '.xi'. Conflicts with internal use by data.table.")
for (i in cols) {
.xi = x[[i]] # [[ is copy on write, otherwise checking type would be copying each column
if (!typeof(.xi) %chin% c("integer","logical","character","double")) stop("Column '",i,"' is type '",typeof(.xi),"' which is not supported as a key column type, currently.")
if (!typeof(.xi) %chin% ORDERING_TYPES) stop("Column '",i,"' is type '",typeof(.xi),"' which is not supported as a key column type, currently.")
}
if (!is.character(cols) || length(cols)<1L) stop("Internal error. 'cols' should be character at this point in setkey; please report.") # nocov

@@ -181,6 +176,7 @@ is.sorted = function(x, by=seq_along(x)) {
# Important to call forder.c::fsorted here, for consistent character ordering and numeric/integer64 twiddling.
}

ORDERING_TYPES = c('logical', 'integer', 'double', 'complex', 'character')
forderv = function(x, by=seq_along(x), retGrp=FALSE, sort=TRUE, order=1L, na.last=FALSE)
{
if (!(sort || retGrp)) stop("At least one of retGrp or sort must be TRUE")
@@ -208,7 +204,7 @@ forderv = function(x, by=seq_along(x), retGrp=FALSE, sort=TRUE, order=1L, na.las
stop("'by' is type 'double' and one or more items in it are not whole integers")
}
by = as.integer(by)
if ( (length(order) != 1L && length(order) != length(by)) || any(!order %in% c(1L, -1L)) )
if ( (length(order) != 1L && length(order) != length(by)) || !all(order %in% c(1L, -1L)) )
stop("x is a list, length(order) must be either =1 or =length(by) and each value should be 1 or -1 for each column in 'by', corresponding to ascending or descending order, respectively. If length(order) == 1, it will be recycled to length(by).")
if (length(order) == 1L) order = rep(order, length(by))
}
@@ -330,7 +326,7 @@ setorderv = function(x, cols = colnames(x), order=1L, na.last=FALSE)
if (".xi" %chin% colnames(x)) stop("x contains a column called '.xi'. Conflicts with internal use by data.table.")
for (i in cols) {
.xi = x[[i]] # [[ is copy on write, otherwise checking type would be copying each column
if (!typeof(.xi) %chin% c("integer","logical","character","double")) stop("Column '",i,"' is type '",typeof(.xi),"' which is not supported for ordering currently.")
if (!typeof(.xi) %chin% ORDERING_TYPES) stop("Column '",i,"' is type '",typeof(.xi),"' which is not supported for ordering currently.")
}
if (!is.character(cols) || length(cols)<1L) stop("Internal error. 'cols' should be character at this point in setkey; please report.") # nocov

77 changes: 63 additions & 14 deletions inst/tests/tests.Rraw
@@ -11701,11 +11701,11 @@ test(1844.2, forder(DT,V1,V2,na.last=NA), INT(2,1,3,0,4)) # prior to v1.12.0 th
# now with two NAs in that 2-group covers forder.c:forder line 1269 starting: else if (nalast == 0 && tmp==-2) {
DT = data.table(c("a","a","a","b","b"),c(2,1,3,NA,NA))
test(1844.3, forder(DT,V1,V2,na.last=NA), INT(2,1,3,0,0))
DT = data.table((0+0i)^(-3:3), 7:1)
test(1844.4, forder(DT,V1,V2), error="Column 1 of by= (1) is type 'complex', not yet supported")
test(1844.5, forder(DT,V2,V1), error="Column 2 of by= (2) is type 'complex', not yet supported")
DT = data.table((0+0i)^(-3:3), c(5L,5L,1L,2L,2L,2L,2L))
test(1844.6, forder(DT,V2,V1), error="Column 2 of by= (2) is type 'complex', not yet supported")
DT = data.table(as.raw(0:6), 7:1)
test(1844.4, forder(DT,V1,V2), error="Column 1 of by= (1) is type 'raw', not yet supported")
test(1844.5, forder(DT,V2,V1), error="Column 2 of by= (2) is type 'raw', not yet supported")
DT = data.table(as.raw(0:6), c(5L,5L,1L,2L,2L,2L,2L))
test(1844.6, forder(DT,V2,V1), error="Column 2 of by= (2) is type 'raw', not yet supported")

# fix for non-equi joins issue #1991. Thanks to Henrik for the nice minimal example.
d1 <- data.table(x = c(rep(c("b", "a", "c"), each = 3), c("a", "b")), y = c(rep(c(1, 3, 6), 3), 6, 6), id = 1:11)
@@ -13158,9 +13158,9 @@ setnames(DT, '.xi')
setkey(DT, NULL)
test(1962.037, setkey(DT, .xi),
error = "x contains a column called '.xi'")
DT = data.table(a = 1+3i)
DT = data.table(a = as.raw(0))
test(1962.038, setkey(DT, a),
error = "Column 'a' is type 'complex'")
error = "Column 'a' is type 'raw'")

test(1962.039, is.sorted(3:1, by = 'x'),
error = 'x is vector but')
@@ -13216,8 +13216,8 @@ test(1962.064, setorderv(copy(DT)),
test(1962.065, setorderv(DT, 'c'), error = 'some columns are not in the data.table')
setnames(DT, 1L, '.xi')
test(1962.066, setorderv(DT, 'b'), error = "x contains a column called '.xi'")
test(1962.067, setorderv(data.table(a = 1+3i), 'a'),
error = "Column 'a' is type 'complex'")
test(1962.067, setorderv(data.table(a = as.raw(0)), 'a'),
error = "Column 'a' is type 'raw'")

DT = data.table(
color = c("yellow", "red", "green", "red", "green", "red",
@@ -13743,7 +13743,7 @@ test(1984.05, DT[ , sum(b), keyby = c, verbose = TRUE],
### hitting byval = eval(bysub, setattr(as.list(seq_along(xss)), ...)
test(1984.06, DT[1:3, sum(a), by=b:c], data.table(b=10:8, c=1:3, V1=1:3))
test(1984.07, DT[, sum(a), by=call('sin',pi)], error='must evaluate to a vector or a list of vectors')
test(1984.08, DT[, sum(a), by=1+3i], error='column or expression.*type complex')
test(1984.08, DT[, sum(a), by=as.raw(0)], error='column or expression.*type raw')
test(1984.09, DT[, sum(a), by=.(1,1:2)], error='The items.*list are length [(]1,2[)].*Each must be length 10; .*rows in x.*after subsetting')
options('datatable.optimize' = Inf)
test(1984.10, DT[ , 1, by = .(a %% 2), verbose = TRUE],
@@ -14755,14 +14755,14 @@ dt1 <- data.table(int = 1L:10L,
bool = c(rep(FALSE, 9), TRUE),
char = letters[1L:10L],
fact = factor(letters[1L:10L]),
complex = as.complex(1:5))
raw = as.raw(1:5))
dt2 <- data.table(int = 1L:5L,
doubleInt = as.numeric(1:5),
realDouble = seq(0.5, 2.5, by = 0.5),
bool = TRUE,
char = letters[1L:5L],
fact = factor(letters[1L:5L]),
complex = as.complex(1:5))
raw = as.raw(1:5))
if (test_bit64) {
dt1[, int64 := as.integer64(c(1:9, 3e10))]
dt2[, int64 := as.integer64(c(1:4, 3e9))]
@@ -14779,8 +14779,8 @@ test(2044.08, nrow(dt1[dt2, on="fact==fact", verbose=TRUE]), nrow(dt
if (test_bit64) {
test(2044.09, nrow(dt1[dt2, on = "int64==int64", verbose=TRUE]), nrow(dt2), output="No coercion needed")
}
test(2044.10, dt1[dt2, on = "int==complex"], error = "i.complex is type complex which is not supported by data.table join")
test(2044.11, dt1[dt2, on = "complex==int"], error = "x.complex is type complex which is not supported by data.table join")
test(2044.10, dt1[dt2, on = "int==raw"], error = "i.raw is type raw which is not supported by data.table join")
test(2044.11, dt1[dt2, on = "raw==int"], error = "x.raw is type raw which is not supported by data.table join")
# incompatible types
test(2044.20, dt1[dt2, on="bool==int"], error="Incompatible join types: x.bool (logical) and i.int (integer)")
test(2044.21, dt1[dt2, on="bool==doubleInt"], error="Incompatible join types: x.bool (logical) and i.doubleInt (double)")
@@ -15242,6 +15242,55 @@ ll = list(1:2, NULL, 3:4)
test(2063.4, transpose(ll, ignore=TRUE), list(c(1L, 3L), c(2L, 4L)))
options(old)

# forderv (and downstream functions) handles complex vector input, part of #3690
DT = data.table(
a = c(1L, 1L, 8L, 2L, 1L, 9L, 3L, 2L, 6L, 6L),
b = c(3+9i, 10+5i, 8+2i, 10+4i, 3+3i, 1+2i, 5+1i, 8+1i, 8+2i, 10+6i),
c = 6
)
test(2064.01, DT[order(a, b)], DT[base::order(a, b)])
test(2064.02, DT[order(a, -b)], DT[base::order(a, -b)])
test(2064.03, forderv(DT$b, order = 1L), base::order(DT$b))
test(2064.04, forderv(DT$b, order = -1L), base::order(-DT$b))
test(2064.05, forderv(DT, by = 2:1), forderv(DT[ , 2:1]))
test(2064.06, forderv(DT, by = 2:1, order = c(1L, -1L)), DT[order(b, -a), which = TRUE])

# downstreams of forder
DT = data.table(
z = c(0, 0, 1, 1, 2, 3) + c(1, 1, 2, 2, 3, 4)*1i,
grp = rep(1:2, 3L),
v = c(3, 1, 4, 1, 5, 9)
)
unq_z = 0:3 + (1:4)*1i
test(2064.07, DT[ , .N, by=z], data.table(z=unq_z, N=c(2L, 2L, 1L, 1L)))
# uniqlist.c needs work
# test(2064.08, DT[ , .N, keyby = z],
# DT = setkey(copy(DT[.N:1]), z)
# test(2964.09, key(DT), 'z')
# test(2964.10, DT
test(2964.11, dcast(DT, z ~ grp, value.var='v', fill=0),
data.table(z=unq_z, `1`=c(3, 4, 5, 0), `2`=c(1, 1, 0, 9), key='z'))
test(2964.12, frank(DT$z), c(1.5, 1.5, 3.5, 3.5, 5, 6))
test(2964.13, frank(DT$z, ties.method='max'), c(2L, 2L, 4L, 4L, 5L, 6L))
test(2964.14, frank(-DT$z, ties.method='min'), c(5L, 5L, 3L, 3L, 2L, 1L))
test(2964.15, DT[ , rowid(z, grp)], rep(1L, 6L))
test(2964.16, DT[ , rowid(z)], c(1:2, 1:2, 1L, 1L))
test(2964.17, rleid(c(1i, 1i, 1i, 0, 0, 1-1i, 2+3i, 2+3i)), rep(1:4, c(3:1, 2L)))

## assorted coverage tests from along the way
if (test_bit64) {
test(2964.50, is.sorted(as.integer64(10:1)), FALSE)
test(2964.51, is.sorted(as.integer64(1:10)))
}
# sort by vector outside of table
ord = 3:1
test(2964.52, forder(data.table(a = 3:1), ord), 3:1)

# DT1 = data.table(z = c(0+1i, 2-3i, 4+1i))
# DT2 = data.table(z = c(2-3i, 0+1i, 0+0i))
# DT1[DT2, on = 'z']



###################################
# Add new tests above this line #
50 changes: 50 additions & 0 deletions src/forder.c
@@ -440,11 +440,61 @@ SEXP forder(SEXP DT, SEXP by, SEXP retGrpArg, SEXP sortGroupsArg, SEXP ascArg, S
if (!isInteger(by) || !LENGTH(by)) error("DT has %d columns but 'by' is either not integer or is length 0", length(DT)); // seq_along(x) at R level
if (!isInteger(ascArg) || LENGTH(ascArg)!=LENGTH(by)) error("Either 'ascArg' is not integer or its length (%d) is different to 'by's length (%d)", LENGTH(ascArg), LENGTH(by));
nrow = length(VECTOR_ELT(DT,0));
int n_cplx = 0;
for (int i=0; i<LENGTH(by); i++) {
if (INTEGER(by)[i] < 1 || INTEGER(by)[i] > length(DT))
error("'by' value %d out of range [1,%d]", INTEGER(by)[i], length(DT));
if ( nrow != length(VECTOR_ELT(DT, INTEGER(by)[i]-1)) )
error("Column %d is length %d which differs from length of column 1 (%d)\n", INTEGER(by)[i], length(VECTOR_ELT(DT, INTEGER(by)[i]-1)), nrow);
if (TYPEOF(VECTOR_ELT(DT, i)) == CPLXSXP) n_cplx++;
}
if (n_cplx) {
// we don't expect users to need complex sorting extensively
// or on massive data sets, so we take the approach of
// splitting a complex vector into its real & imaginary parts
// and using the regular forderv machinery to sort; a baremetal
// implementation would at root do the same, but the approach
// here is a bit more slapdash with respect to memory efficiency
// (seen clearly here at C from the 3+2*n_cplx PROTECT() calls)
int n_out = length(by) + n_cplx;
SEXP new_dt = PROTECT(allocVector(VECSXP, n_out)); n_protect++;
SEXP new_asc = PROTECT(allocVector(INTSXP, n_out)); n_protect++;
// will be simply 1:n_out
SEXP new_by = PROTECT(allocVector(INTSXP, n_out)); n_protect++;
int j = 0;
for (int i=0; i<length(by); i++) {
int by_idx = INTEGER(by)[i]-1;
if (TYPEOF(VECTOR_ELT(DT, by_idx)) == CPLXSXP) {
// I don't see any shorthand way of splitting of the real&imaginary components,
// i.e., a shorthand way of doing Re(z), Im(z). That includes searching all of
// the r-source code & all of the r-devel archives. So just reproduce Re(), Im()
// as done in do_cmathfuns in complex.c
SEXP realPart = PROTECT(allocVector(REALSXP, nrow)); n_protect++;
SEXP cplxPart = PROTECT(allocVector(REALSXP, nrow)); n_protect++;
double *pre = REAL(realPart);
double *pim = REAL(cplxPart);
Rcomplex *pz = COMPLEX(VECTOR_ELT(DT, by_idx));
for (int i = 0; i < nrow; i++) {
pre[i] = pz[i].r;
pim[i] = pz[i].i;
}
SET_VECTOR_ELT(new_dt, j, realPart);
SET_VECTOR_ELT(new_dt, j+1, cplxPart);
INTEGER(new_asc)[j] = INTEGER(ascArg)[i];
INTEGER(new_asc)[j+1] = INTEGER(ascArg)[i];
INTEGER(new_by)[j] = j+1;
INTEGER(new_by)[j+1] = j+2;
j += 2;
} else {
SET_VECTOR_ELT(new_dt, j, VECTOR_ELT(DT, by_idx));
INTEGER(new_asc)[j] = INTEGER(ascArg)[i];
INTEGER(new_by)[j] = j+1;
j += 1;
}
}
DT = new_dt;
ascArg = new_asc;
by = new_by;
}
Review comment from a Member:

I'm a bit nervous about this branch. It looks constrained to only when n_cplx>0 (which is good, since it's isolated), and I know what it is doing (splitting complex into real and imaginary columns), but I don't see how it's doing it. I'm going to need more time to review it and make sure it doesn't impact anything unexpected.

Review comment from mattdowle (Member), Jul 18, 2019:

How about dealing with complex columns further down in this switch? (GitHub wouldn't let me add a comment on that line, perhaps because it's outside the diff, so I pasted an image.)

Reply from MichaelChirico (Member Author), Jul 18, 2019:

When I first took a look I decided against that (I think my initial reaction was that range doesn't make sense for complex, but of course it does, because complex numbers are totally ordered under the lexicographic rule). Will take another look.

The n_cplx branch is a bit messy, and took me several iterations to get there as I kept failing a few more tests each time. I think it can be cleaned up a little by snazzier use of the j index.

The idea is: receive by, which is, say, length n, m of which are complex columns.

Instead of sorting using the input list DT, do so using a new list (constructed as new_dt).

If there are no complex columns, new_dt[[1]] is DT[[ by[1] ]], new_dt[[2]] is DT[[ by[2] ]], and so on.

If there are complex columns, say the first is at the third index of by, then new_dt[[3]] is Re( DT[[ by[3] ]] ) and new_dt[[4]] is Im( DT[[ by[3] ]] ), and things continue as usual from there. That "bump" of the index by a complex column is what made things complicated.

Similar thinking applies to the re-mapped ascArg; by simply gets re-mapped to seq_along(new_dt)
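The re-mapping described above can be sketched in plain C, without the R API. This is a hypothetical illustration of the index bookkeeping only: `count_keys`, `remap_keys` and `enum key_part` are made-up names, `is_cplx[c]` is assumed to flag whether 0-based column `c` of DT is complex, and the outputs describe new_dt as (source column, part) pairs instead of allocating R vectors. One detail worth noting: the complex check looks at the column that `by[k]` points to. The counting loop in the forder.c diff above tests `TYPEOF(VECTOR_ELT(DT, i))`, i.e. column i of DT, which matches this in general only when by is 1:length(by).

```c
#include <assert.h>
#include <stdbool.h>

/* Which part of a source column a sort key reads (hypothetical encoding). */
enum key_part { KEY_WHOLE = 0, KEY_RE = 1, KEY_IM = 2 };

/* Number of sort keys: one per by column, plus one extra per complex column. */
static int count_keys(const int *by, int n_by, const bool *is_cplx) {
  int n_out = n_by;
  for (int k = 0; k < n_by; k++)
    if (is_cplx[by[k] - 1]) n_out++;   /* by[] is 1-based, like R */
  return n_out;
}

/* Fill the key arrays, each of length count_keys(...):
   src_col[j] = 0-based DT column for key j, part_of[j] = which part it reads,
   new_asc[j] = direction inherited from asc[k], new_by[j] = j + 1 (i.e. 1:n_out). */
static int remap_keys(const int *by, const int *asc, int n_by, const bool *is_cplx,
                      int *src_col, int *part_of, int *new_asc, int *new_by) {
  int j = 0;
  for (int k = 0; k < n_by; k++) {
    int col = by[k] - 1;  /* the column by[k] refers to, not column k */
    if (is_cplx[col]) {
      /* complex column: two consecutive keys, real part first, then imaginary */
      src_col[j] = col; part_of[j] = KEY_RE; new_asc[j] = asc[k]; new_by[j] = j + 1; j++;
      src_col[j] = col; part_of[j] = KEY_IM; new_asc[j] = asc[k]; new_by[j] = j + 1; j++;
    } else {
      src_col[j] = col; part_of[j] = KEY_WHOLE; new_asc[j] = asc[k]; new_by[j] = j + 1; j++;
    }
  }
  assert(j == count_keys(by, n_by, is_cplx));
  return j;
}
```

With by = c(2, 3) and only column 2 complex, this yields three keys: Re(col 2), Im(col 2), col 3, the first two sharing column 2's sort direction.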

Reply from a Member:

I wasn't thinking of doing a single range for complex. The thought was to trick the loop into doing a complex column twice. The first time doing range for the real part only. The 2nd time calling range for the imaginary part only. There might need to be new range_complex_r and range_complex_i perhaps.
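A minimal sketch of what range_complex_r and range_complex_i might look like, assuming they only need the min and max of one part of a complex column. The helper name `range_cplx_part` and the `cplx` struct are hypothetical stand-ins; the point is that reading the `.r` or `.i` member directly, as the forder.c diff already does with `pz[i].r` and `pz[i].i`, sidesteps the need for a `double *` view of the parts. NA/NaN handling is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for R's Rcomplex. */
typedef struct { double r; double i; } cplx;

/* Min and max of the real parts (imag == 0) or imaginary parts (imag != 0)
   of z[0..n-1]. Assumes n > 0 and no NA/NaN values. */
static void range_cplx_part(const cplx *z, size_t n, int imag,
                            double *min, double *max) {
  assert(z != NULL && n > 0);
  double lo = imag ? z[0].i : z[0].r;
  double hi = lo;
  for (size_t k = 1; k < n; k++) {
    double v = imag ? z[k].i : z[k].r;
    if (v < lo) lo = v;
    if (v > hi) hi = v;
  }
  *min = lo;
  *max = hi;
}
```

Calling it once with imag = 0 and once with imag = 1 mirrors the "visit the column twice" idea: the first pass sees only real parts, the second only imaginary parts.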

Review comment from mattdowle (Member), Jul 18, 2019:

The other option is to postpone this PR to a later date. Elio's response above confirms that setkey-on-complex and grouping-on-complex aren't high priority. Complex value columns were the main thing, which you did and which have been merged.

Reply from MichaelChirico (Member Author):

I think the biggest impediment to that is I don't see a way to get a double * object pointing to the real/imaginary vectors (like REAL(COMPLEX(x)) or COMPLEX(x).r, hope that makes sense), which is what all of the functions for REAL columns take as input.

Reply from MichaelChirico (Member Author):

> It would be nice for completeness to get this merged and close #1703, yes. But I'm also keeping an eye on the fact that we've both been working on complex number support for 3 days now (elapsed) and continuing. And Elio said going this far (sorting complex) isn't needed. There isn't any demand for this from users as far as I can see.

True... I'm just loath to close the PR given that it's passing tests. I also understand the tradeoff, which is the maintenance burden.

Reply from a Member:

> I think the biggest impediment to that is I don't see a way to get a double * object pointing to the real/imaginary vectors (like REAL(COMPLEX(x)) or COMPLEX(x).r, hope that makes sense), which is what all of the functions for REAL columns take as input.

Yes, makes sense. That's why I wrote above: "There might need to be new range_complex_r and range_complex_i perhaps." But I didn't look at it closely.

> maintenance burden

It's not a burden in the sense that more code is always bad. It's that this particular change could cause hard-to-trace bugs because of the type of change it is to such a core function. It's risk. And it's time spent reviewing it.

> it's passing tests

But this shows the tests aren't sufficient: the cast isn't correct but it's still passing tests.
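To see why the long long cast in the uniqlist.c diff is incorrect: casting a double to long long truncates toward zero, so distinct values such as 1.2 and 1.7 compare as the same. The cast is also undefined behaviour for NaN (including NA_real_) and for values outside the long long range. The sketch below contrasts it with an exact byte comparison, in the spirit of the later commit dd7b24a ("memcmp for complex instead of == on double"). It is an illustration only: `cplx` is a hypothetical stand-in for Rcomplex, and `same_truncated`, `same_exact` and `rleid_cplx` are made-up names, not the merged uniqlist.c code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for R's Rcomplex. */
typedef struct { double r; double i; } cplx;

/* Mirrors the comparison in the uniqlist.c diff: each part is cast to
   long long first, which drops the fractional part (and is undefined
   for NaN/NA or out-of-range values). */
static int same_truncated(cplx a, cplx b) {
  return (long long)a.r == (long long)b.r && (long long)a.i == (long long)b.i;
}

/* Exact comparison: both parts must have identical bit patterns.
   Identical NA bit patterns compare equal; 0.0 and -0.0 do not. */
static int same_exact(cplx a, cplx b) {
  return memcmp(&a.r, &b.r, sizeof a.r) == 0 &&
         memcmp(&a.i, &b.i, sizeof a.i) == 0;
}

/* rleid-style run ids: start a new id whenever a value differs from
   the previous one under the exact comparison. */
static void rleid_cplx(const cplx *z, int n, int *ids) {
  int grp = 0;
  for (int k = 0; k < n; k++) {
    if (k == 0 || !same_exact(z[k], z[k - 1])) grp++;
    ids[k] = grp;
  }
}
```

With the values from the rleid test in tests.Rraw above (1i, 1i, 1i, 0, 0, 1-1i, 2+3i, 2+3i), rleid_cplx yields the ids 1, 1, 1, 2, 2, 3, 4, 4, matching that test's expectation, whereas the truncating comparison would also merge, for example, 1.2+0i and 1.7+0i into a single run. A test like that one, with non-integer parts, is the kind that would have caught the cast.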

Reply from MichaelChirico (Member Author):

> But this shows the tests aren't sufficient: the cast isn't correct but it's still passing tests.

Very true; I meant more that it's passing the existing tests (so "no" existing functionality is being affected).

Would adding a lot more tests to the suite for the new code be a good direction?

Reply from a Member:

A good direction would be to focus on the most requested issues: #3189


if (!isLogical(retGrpArg) || LENGTH(retGrpArg)!=1 || INTEGER(retGrpArg)[0]==NA_LOGICAL) error("retGrp must be TRUE or FALSE");
13 changes: 13 additions & 0 deletions src/uniqlist.c
@@ -1,4 +1,5 @@
#include "data.table.h"
#include <complex.h>

// DONE: return 'uniqlist' as a vector (same as duplist) and write a separate function to get group sizes
// Also improvements for numeric type with a hack of checking unsigned int (to overcome NA/NaN/Inf/-Inf comparisons) (> 2x speed-up)
@@ -200,6 +201,11 @@ SEXP rleid(SEXP l, SEXP cols) {
// 8 bytes of bits are identical. For real (no rounding currently) and integer64
// long long == 8 bytes checked in init.c
break;
case CPLXSXP: {
// tried to make long long complex * but got a warning that it's a GNU extension
double complex *pz = (double complex *)COMPLEX(jcol);
same = (long long)creal(pz[i]) == (long long)creal(pz[i-1]) && (long long)cimag(pz[i]) == (long long)cimag(pz[i-1]);
} break;
default :
error("Type '%s' not supported", type2char(TYPEOF(jcol))); // # nocov
}
@@ -232,6 +238,13 @@ SEXP rleid(SEXP l, SEXP cols) {
}
}
break;
case CPLXSXP: {
double complex *pzjcol = (double complex *)COMPLEX(jcol);
for (R_xlen_t i=1; i<nrow; i++) {
bool same = (long long)creal(pzjcol[i]) == (long long)creal(pzjcol[i-1]) && (long long)cimag(pzjcol[i]) == (long long)cimag(pzjcol[i-1]);
ians[i] = (grp += !same);
}
} break;
default :
error("Type '%s' not supported", type2char(TYPEOF(jcol)));
}