TODO items from tests.Rraw #2572

MichaelChirico · 2018-01-18T01:57:11Z

Items cut out from the end of tests.Rraw when this issue was first created:

TODO items sprinkled throughout the rest of the file:

TO DO items sprinkled throughout the rest of the file:

The text was updated successfully, but these errors were encountered:

MichaelChirico · 2018-02-02T17:19:26Z

My idea for testing:

use looped := vs set test in example(":=") or example(setnames) to test overhead in [.data.table is tested to stay low in future.

is to run:

t1 = system.time({
    DT = data.table(a = 1:10)
    for (jj in paste0(seq_len(100))) DT[ , (jj) := 10:1]
})

t2 = system.time({
    DT = data.table(a = 1:10)
    for (jj in paste0(seq_len(100))) set(DT, , jj, 10:1)
})

ratio = t1['elapsed']/t2['elapsed']

Then error if ratio is outside (1, R]. On my machine right now, the ratio is about 8; I guess setting R to 10 would be too tight/potentially machine-dependent. How about choosing R = 50? 25?

Could also make the loop more expensive (absolute time of t1 is currently about .025 on my machine -- very fast, but not sure what the right run time is for the test suite)

MichaelChirico · 2018-02-03T01:16:23Z

@mattdowle the following doesn't appear to be true anymore? unless I missed something about what's intended...

test that ordered subsets when i is unkeyed now retain x's key (using is.sorted(f__))

DT1 = data.table(ID = 1:4, v = 4:1, key = 'ID')
DT2 = data.table(ID = 3:6, w = c(3, 4, 1, 2))
key(DT1[DT2])
# NULL
key(DT1[DT2, on = 'ID'])
# NULL
DT1[DT2]
#    ID  v w
# 1:  3  2 3
# 2:  4  1 4
# 3:  5 NA 1
# 4:  6 NA 2

FWIW, I see that the line is.sorted(f__) has been commented out.

Also, I wasn't able to track down any question on SO on any March 14 mentioning eval or with you as a user (user:403310):

add FAQ that eval() is evaled in calling frame so don't need a, then update SO question of 14 March. See the test using variable name same as column name. Actually, is that true? Need "..J".

MichaelChirico · 2018-02-07T06:29:34Z

@st-pasha you're more familiar with the fread test suite, is this covered by now?

Test mid read bump of logical T/F to character, collapse back to T and F.

st-pasha · 2018-02-09T00:12:00Z

I don't remember if there is such a test specifically, but it's easy to add:

DT = data.table(A=rep("True", 2200), B="FALSE")
DT[111, A:="here be"]
DT[111, B:="dragons"]
fwrite(DT, f<-tempfile())
data.table:::test(9999, fread(f, verbose=TRUE), DT, output="Column 2 (\"B\") bumped from 'bool8' to 'string' due to <<dragons>> on row 110")

MichaelChirico · 2018-02-09T01:59:14Z

@st-pasha thanks, I wasn't sure if there's an exact row number to pick to ensure re-read.

Tried beefing it up, but fread doesn't take "T" "F" to be bool?

DT = data.table(A=rep("True", 2200), B="FALSE", C="T", D='0')
DT[111, LETTERS[1:4] := .("here", "be", "dra", "gons")]
fwrite(DT, f<-tempfile())
fread(f, verbose=TRUE)

MichaelChirico · 2018-02-09T08:23:02Z

@mattdowle this one appears to be obsolete as well?

check the "j is named list could be inefficient" message from verbose than Chris N showed recently to 15 May

grep "named list" R/* comes up blank.

MichaelChirico · 2018-02-09T15:58:17Z

re:

tests on double in add hoc by

I am making a test along the lines of:

DT = data.table(a = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2), v = 1:12,
                jig = 10^(289:300))
DT[ , sum(v), by = .(a_jiggle = a + jig * .Machine$double.xmin)]
#     a_jiggle V1
#  1:        1  6
#  2:        1  4
#  3:        1  5
#  4:        1  6
#  5:        2  7
#  6:        2  8
#  7:        2  9
#  8:        2 10
#  9:        2 11
# 10:        2 12

I seem to recall it being documented somewhere what exactly the internal limits of accuracy are/how to query this for data.table; on my machine, it appears that roughly disturbances less than 1e-17 don't make a difference... not sure the robust/machine-independent way to design this test, however.

…#2572. Thanks to Michael C.

MichaelChirico · 2019-02-11T14:49:54Z

use looped := vs set test in example(":=") or example(setnames) to test overhead in [.data.table is tested to stay low in future.

@jangorecki this can't make it to the CRAN test suite, but do we have anywhere we could readily drop in such a test (more like a dashboard, really) in dev?

MichaelChirico · 2019-02-11T15:19:22Z

test and highlight in docs that negatives are fine and fast in forderv (ref R wish #15644)

?forder already mentions the use of - in both setorder and setorderv. If anything the documentation can be cleaned up -- hardly relevant to be pointing out differences of 1.9.2 vs 1.9.4 (i.e. 4.5-5 years ago)... that set ?setorder could go for a tidying

MichaelChirico · 2019-02-12T01:06:05Z

check the "j is named list could be inefficient" message from verbose than Chris N showed recently to 15 May

grep "named list" R/* returns nothing; I think this is obsolete

MichaelChirico · 2019-02-12T01:08:42Z

tests on double in add hoc by

IIUC we're not really supposed to be testing things like numerical accuracy. Limitations on double sorting are spelled out in the documentation IIRC... and anyway we haven't seen any issues raised on this in a long time. So I think this is covered.

jangorecki · 2019-02-12T04:57:20Z

use looped := vs set test in example(":=") or example(setnames) to test overhead in [.data.table is tested to stay low in future.

@jangorecki this can't make it to the CRAN test suite, but do we have anywhere we could readily drop in such a test (more like a dashboard, really) in dev?

There is benchmark.Rraw, or if there is no, there is a plan to have one. This will be proper place for all tests that needs to call system.time(). It should be now easier to add it as summary footer of Rraw script has been moved to test.data.table().

MichaelChirico · 2019-02-13T02:54:17Z

FWIW, did the following benchmark of set vs. :=:

library(microbenchmark)
library(data.table)

set.seed(5419681)

BB = 100L
times = CJ(rows = 10^(3:6), cols = c(10, 50, 100, 1000), 
           type = c(':=', 'set'), repetition = 1:BB)

for (ii in seq_len(nrow(times))) {
  gc()
  times[ii, duration := switch(type, ':=' = {
    DT = data.table(ID = seq_len(rows))
    t0 = get_nanotime()
    DT[ , paste0('V', 1:cols) := lapply(integer(cols), function(...) 1L)]
    get_nanotime() - t0
  }, 'set' = {
    DT = data.table(ID = seq_len(rows))
    t0 = get_nanotime()
    for (jj in 1:cols) set(DT, , paste0('V', jj), 1L)
    get_nanotime() - t0
  })]
}

times[ , .(ratio = duration[type == ':=']/duration[type == 'set'],
           avg = mean(duration)), keyby = .(rows, cols, repetition)
       ][ , as.list(c(absolute_seconds = mean(avg)/1e9, summary(ratio))), 
          keyby = .(rows, cols)]
#      rows cols absolute_seconds      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
#  1: 1e+03   10     0.0003131304 3.0605254 3.8471289 4.0469033 4.1023044 4.3067497  6.915743
#  2: 1e+03   50     0.0008448426 1.0162382 1.3855990 1.4398864 1.5730537 1.5222325 11.719836
#  3: 1e+03  100     0.0014121146 0.3493482 1.1065934 1.1422921 1.1502817 1.2021676  1.431746
#  4: 1e+03 1000     0.0182234995 0.2997860 0.4154354 0.4314394 0.4375887 0.4445711  0.769559
#  5: 1e+04   10     0.0006716579 1.5684996 1.8859146 1.9747949 1.9763925 2.0480862  2.625070
#  6: 1e+04   50     0.0024221906 0.7199252 1.0999812 1.1368900 1.1405756 1.1763198  2.045857
#  7: 1e+04  100     0.0049328084 0.4010431 1.0242058 1.0517822 1.0688827 1.0850502  2.420760
#  8: 1e+04 1000     0.0540806265 0.5498318 0.7804711 0.8012923 0.8062797 0.8221343  1.134574
#  9: 1e+05   10     0.0038287573 0.8844346 1.1154718 1.1282488 1.1298522 1.1366939  2.021127
# 10: 1e+05   50     0.0280899392 0.7681287 0.9787483 0.9973578 0.9989867 1.0211156  1.274340
# 11: 1e+05  100     0.0600740518 0.5545713 0.9471064 0.9744283 0.9680503 0.9994554  1.086229
# 12: 1e+05 1000     0.4934547127 0.8932713 0.9409418 0.9509392 0.9609168 0.9675374  1.573699
# 13: 1e+06   10     0.0358802545 0.7430620 0.9720593 0.9927718 0.9938076 1.0076720  1.534889
# 14: 1e+06   50     0.2703836096 0.7398572 0.9533737 0.9785185 0.9942344 1.0259876  1.501244
# 15: 1e+06  100     0.4599221823 0.9430767 0.9883373 1.0063509 1.0110553 1.0279786  1.265928
# 16: 1e+06 1000     3.9212144387 0.8491473 0.9790677 0.9865773 0.9899165 0.9928261  1.220000
cat('\nTotal Time:', times[ , sum(duration)/1e9/60], 'minutes\n')
# Total Time: 17.8525 minutes

MichaelChirico · 2019-02-13T15:32:20Z

Add to warning about a previous copy that class<-, levels<- can also copy whole vector. Any fun<- form basically.

Not sure where it is the intention to put this. Axing barring that.

jangorecki · 2019-02-14T05:36:44Z

why assign rnorm? it costs
good to call gc before each timing start.

MichaelChirico · 2019-02-14T07:29:17Z

why assign rnorm? it costs

best to just assign TRUE?

will do re:gc

MichaelChirico · 2019-02-15T01:58:18Z

updated set/:= benchmark, 20x as fast when not using rnorm

MichaelChirico · 2019-02-16T17:30:42Z

!make sure explicitly that unnamed lists are being executed by dogroups!

honestly no idea what this is about. IINM dogroups only entryway in R is when computing groups and !GForce here.

Pretty trivial to see that there's plenty of coverage of this in tests already (e.g. some places in #3406 where optimization is turned off, among probably hundreds of others).

Alternative interpretation would be that .ok should be FALSE if j is a named list? Though this is obviously not the case (maybe it was at some point?)

Gonna mark this one done

MichaelChirico · 2019-02-16T17:31:34Z

tests of freading classes like Date and the verbose messages there.

Marking this complete within this issue since it's covered in a separate PR/issue already

* ignore .Rprofile if present * Closes #3405 -- more careful about preset datatable.optimize * clean up setting/unsetting of options in testing * more thorough testing of datatable.optimize levels, part of #2572 * prettyprint.char needs reset

MichaelChirico · 2019-02-17T04:13:00Z

confirm with Matt the behavior of rbindlist w.r.t. naming of input in edge cases -- rbindlist should look for the first non-empty data.table - New changes (from Arun). Explanation below: Even if data.table is empty, as long as there are column names, they should be considered. Ex: What if all data.tables are empty? What'll be the column name then? If there are no names, then the first non-empty set of names will be allocated. I think this is the way to do it...

@mattdowle any feedback here?

MichaelChirico · 2019-02-17T04:25:52Z

"Ragged" files that should have fill = TRUE generate warnings, not errors; as such, we need another way to trigger a STOP() inside fread.c. options(warn=2) isn't enough.

@mattdowle ditto here... could you split off into a new issue if this is still a concern?

MichaelChirico · 2019-02-17T04:35:02Z

# TO DO: TODO: think of and add more tests for rbindlist

This is quite vague and 5 years old... closing as duplicate of the 40+ rbindlist issues outstnding

MichaelChirico · 2019-02-17T04:45:27Z

allow.cartesian is ignored if jsub[1L] has :=; could still warn if i has duplicates

AFAICT this is subsumed by #3077; closing here

MichaelChirico · 2019-02-17T04:47:57Z

Add comment=="#". Ensure only after last field is observed.

Presumably this is just #856; closing here

MichaelChirico · 2019-02-17T05:00:08Z

Add integer64 support for transpose

This is a full issue; filed as #3408 and closing here

MichaelChirico · 2019-02-17T05:05:17Z

Add a test when fill=FALSE, but blank.lines.skip=TRUE, when the same effect should happen

Unclear what "same effect" is referring to; I see tests 1584.x that seem to cover this case? @arunsrinivasan this is under you per git blame (admittedly 3+ years ago), any idea?

MichaelChirico · 2019-02-17T05:09:28Z

Fix and add test for cases like this: a,b,c\n1,2,3\n4,5,6\n7,8,9,6\n1,2,3\n... -- extra column in line 4, but we've only detected 3

This seems to be handled already:

s = "a,b,c
1,2,3
4,5,6
7,8,9,6
1,2,3"

fread(s)
#        a     b     c
#    <int> <int> <int>
# 1:     1     2     3
# 2:     4     5     6

Warning message:
In fread(s) :
Stopped early on line 4. Expected 3 fields but found 4. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<7,8,9,6>>

fread(s, fill=TRUE)
#        a     b     c    V4
#    <int> <int> <int> <int>
# 1:     1     2     3    NA
# 2:     4     5     6    NA
# 3:     7     8     9     6
# 4:     1     2     3    NA

MichaelChirico · 2019-02-17T05:17:43Z

Add tests for nomatch=NA in non-equi-join suite (tests 1640-ish)... tested, but takes quite some time... so commented for now

@jangorecki this set of tests could be added to the GitLab suite?

Is the TODO covered by the commented tests?

# TODO: add tests for nomatch=NA..
# tested, but takes quite some time.. so commenting for now
# nqjoin_test(x, y, 3L,1643.0)
# nqjoin_test(dt1,dt2,3L,1652.0)

# nqjoin_test(  x,dt2,1L,1644.0) # without NA only in x
# nqjoin_test(  x,dt2,2L,1645.0)
# nqjoin_test(  x,dt2,3L,1646.0)
# nqjoin_test(dt1,  y,1L,1647.0) # without NA only in i
# nqjoin_test(dt1,  y,2L,1648.0)
# nqjoin_test(dt1,  y,3L,1649.0)

MichaelChirico · 2019-02-17T05:20:32Z

0.0 written as 0, but TODO #2398, probably related to the 2 lines after l==0 missing coverage in writeFloat64

As stated, this is just #2398

MichaelChirico · 2019-02-17T05:23:38Z

Add strict.numeric (default FALSE) to all.equal.data.table()?

@jangorecki can revisit this and file as a separate issue if still viable? But I guess no because of the difficulty in testing identical for numeric across platforms? Or could include it but discourage it "use at your own risk"?

MichaelChirico · 2019-02-17T05:28:08Z

Begin to deprecate logicalAsInt argument

This was 1.11.0 (May 2018); can begin the deprecation process with next release?

MichaelChirico · 2019-02-17T05:29:41Z

#1416 TODO (trailing tabs on most but not at the beginning and a "-" intended to mean missing but taken as text column name)

As stated, this is just #1416

MichaelChirico · 2019-02-17T05:32:34Z

Begin deprecation of ..x as valid column names

This is also 1.11.0 (May 2018). But see #3321, making things more complicated...

…s (ported to issue tracker)

MichaelChirico · 2024-04-06T23:36:59Z

Same as #678 -- better to handle this with a linter

MichaelChirico added the internals label Jan 18, 2018

MichaelChirico mentioned this issue Jan 18, 2018

Replace integers with explicit integers (1 -> 1L, etc.) #2573

Merged

MichaelChirico self-assigned this Feb 2, 2018

MichaelChirico mentioned this issue Feb 7, 2018

Progress towards #2572 (miscellaneous tests) #2605

Merged

mattdowle changed the title ~~Clean-up of TODO comments in tests~~ TODO items from tests.Rraw Feb 22, 2018

mattdowle changed the title ~~TODO items from tests.Rraw~~ TODO items from end of tests.Rraw Feb 22, 2018

mattdowle added a commit that referenced this issue Feb 22, 2018

Removed TODO at the end of tests.Rraw as they are now tracked in issue …

d6daed0

…#2572. Thanks to Michael C.

mattdowle pushed a commit that referenced this issue Feb 22, 2018

Progress towards #2572 (miscellaneous tests) (#2605)

e2386ab

MichaelChirico mentioned this issue Feb 11, 2019

updating setorder.Rd #3389

Merged

MichaelChirico added a commit that referenced this issue Feb 13, 2019

add some tests on expected frankv behavior with NA/NaN, part of #2572

8c7cd71

MichaelChirico added a commit that referenced this issue Feb 13, 2019

Modernize ?setkey; delete long-deprecated functions; part of #2572

59137b0

MichaelChirico mentioned this issue Feb 14, 2019

Generalize DtNonAsciiTests to handle more non-CRAN tests Rdatatable/DtNonAsciiTests#1

Open

MichaelChirico mentioned this issue Feb 16, 2019

test.data.table() fails if datatable.optimize is already set when run #3405

Closed

MichaelChirico added a commit that referenced this issue Feb 16, 2019

more thorough testing of datatable.optimize levels, part of #2572

5dddf66

MichaelChirico mentioned this issue Feb 16, 2019

Gforce tests #3406

Merged

MichaelChirico changed the title ~~TODO items from end of tests.Rraw~~ TODO items from tests.Rraw Feb 16, 2019

MichaelChirico added a commit that referenced this issue Feb 17, 2019

Add tests of forder NA/NaN behavior (#2572); also purge tests of TODO…

4fafb87

…s (ported to issue tracker)

MichaelChirico mentioned this issue Feb 17, 2019

Add tests of forder NA/NaN behavior #3409

Merged

This was referenced Feb 17, 2019

Add strict.numeric (default FALSE) to all.equal.data.table()? #3410

Closed

Add tests for nomatch=NA in non-equi-join suite #3411

Open

MichaelChirico mentioned this issue Mar 2, 2019

Modernize ?setkey; and tackle long-deprecated functions #3399

Merged

jangorecki added the tests label Mar 12, 2019

MichaelChirico closed this as completed Apr 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TODO items from tests.Rraw #2572

TODO items from tests.Rraw #2572

MichaelChirico commented Jan 18, 2018 •

edited by mattdowle

Loading

MichaelChirico commented Feb 2, 2018 •

edited

Loading

MichaelChirico commented Feb 3, 2018 •

edited

Loading

MichaelChirico commented Feb 7, 2018

st-pasha commented Feb 9, 2018 •

edited

Loading

MichaelChirico commented Feb 9, 2018

MichaelChirico commented Feb 9, 2018 •

edited

Loading

MichaelChirico commented Feb 9, 2018

MichaelChirico commented Feb 11, 2019

MichaelChirico commented Feb 11, 2019

MichaelChirico commented Feb 12, 2019

MichaelChirico commented Feb 12, 2019

jangorecki commented Feb 12, 2019 •

edited

Loading

MichaelChirico commented Feb 13, 2019 •

edited

Loading

MichaelChirico commented Feb 13, 2019

jangorecki commented Feb 14, 2019

MichaelChirico commented Feb 14, 2019

MichaelChirico commented Feb 15, 2019

MichaelChirico commented Feb 16, 2019

MichaelChirico commented Feb 16, 2019 •

edited

Loading

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019 •

edited

Loading

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019 •

edited

Loading

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019 •

edited

Loading

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Apr 6, 2024

TODO items from tests.Rraw #2572

TODO items from tests.Rraw #2572

Comments

MichaelChirico commented Jan 18, 2018 • edited by mattdowle Loading

MichaelChirico commented Feb 2, 2018 • edited Loading

MichaelChirico commented Feb 3, 2018 • edited Loading

MichaelChirico commented Feb 7, 2018

st-pasha commented Feb 9, 2018 • edited Loading

MichaelChirico commented Feb 9, 2018

MichaelChirico commented Feb 9, 2018 • edited Loading

MichaelChirico commented Feb 9, 2018

MichaelChirico commented Feb 11, 2019

MichaelChirico commented Feb 11, 2019

MichaelChirico commented Feb 12, 2019

MichaelChirico commented Feb 12, 2019

jangorecki commented Feb 12, 2019 • edited Loading

MichaelChirico commented Feb 13, 2019 • edited Loading

MichaelChirico commented Feb 13, 2019

jangorecki commented Feb 14, 2019

MichaelChirico commented Feb 14, 2019

MichaelChirico commented Feb 15, 2019

MichaelChirico commented Feb 16, 2019

MichaelChirico commented Feb 16, 2019 • edited Loading

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019 • edited Loading

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019 • edited Loading

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019 • edited Loading

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Feb 17, 2019

MichaelChirico commented Apr 6, 2024

MichaelChirico commented Jan 18, 2018 •

edited by mattdowle

Loading

MichaelChirico commented Feb 2, 2018 •

edited

Loading

MichaelChirico commented Feb 3, 2018 •

edited

Loading

st-pasha commented Feb 9, 2018 •

edited

Loading

MichaelChirico commented Feb 9, 2018 •

edited

Loading

jangorecki commented Feb 12, 2019 •

edited

Loading

MichaelChirico commented Feb 13, 2019 •

edited

Loading

MichaelChirico commented Feb 16, 2019 •

edited

Loading

MichaelChirico commented Feb 17, 2019 •

edited

Loading

MichaelChirico commented Feb 17, 2019 •

edited

Loading

MichaelChirico commented Feb 17, 2019 •

edited

Loading