Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move suggests from tests.Rraw to other.Rraw #5518

Merged
merged 7 commits into from
Nov 10, 2022
Merged

Move suggests from tests.Rraw to other.Rraw #5518

merged 7 commits into from
Nov 10, 2022

Conversation

mattdowle
Copy link
Member

@mattdowle mattdowle commented Nov 9, 2022

Closes #5516

  • yaml
  • xts
  • nanotime
  • R.utils lots of fread tests use compressed input data files so it's more fundamental to the test environment
  • bit64 already decided in 5516 to leave bit64 in as it's fundamental really

@mattdowle mattdowle added the tests label Nov 9, 2022
@mattdowle mattdowle added this to the 1.14.5 milestone Nov 9, 2022
@MichaelChirico
Copy link
Member

MichaelChirico commented Nov 9, 2022

might I suggest putting each suggested package in its own test script? that way users with only a select few of the suggested deps can easily pick/choose which suites to run.

generally I'm in favor of splitting up our suite into many more scripts which facilitates parallelization (e.g. scripts across machines)

but maybe it's better to do so by functionality, not required packages.

@jangorecki
Copy link
Member

jangorecki commented Nov 10, 2022

In other test scripts we are using suggested pkgs as well.
Maybe instead of moving all those tests we could only disable loading those pkgs in test scripts, which basically will disable those related tests, without reorganizing our test scripts so deeply. So we can try out if that helps to resolve CRAN windows issues.
I can imagine there will be many git merge conflicts to existing PRs after merging such a big change.

@mattdowle
Copy link
Member Author

We don't really have a speed problem: other.Rraw takes under 4 seconds currently. And tests.Rraw takes under 1 minute on a normally loaded machine. I suspect the over use of parallelization might be what's causing CRAN Windows machine to be so slow, for example. It's slow to even install lots of packages, not just data.table.
The common thing about tests in other.Rraw is that other.Rraw doesn't run on CRAN. There are special cases for "other.Rraw" in the code and GLCI scripts. If we split it up then we'll have to group a set of files somehow (by marking them or naming them in some standard) to have the properties of other.Rraw. Anytime more structure is added like that, it opens the door to a new set of bugs due to the structure.
In general I look at other projects which do as you're suggesting, iiuc, lots and lots of test files. It gets quite unwieldy. I know you think tests.Rraw is too big and unwieldy but it's size would be the same if split up. I like having one test file open in an editor and I use search a lot. If there were lots of test files, search within the editor would need to be across many files.
The very biggest desirable property of the test suite is having a low barrier to adding tests. It has been deliberate policy not to worry about lots of similar tests in different places. The goal is not really to organize the tests. Instead one file can be searched and computed on. Similar to Google's strategy: search don't organize. The goal is to make it easy to add coverage without putting structure and organization overheads in the way which make it harder to add new tests; e.g. discussions over which file new tests should be added, or whether there should be a new test file added. I'm not a fan of wasted time in that regard.
I don't think users would really pick and choose packages from other.Rraw to run themselves. What the vast majority of them want is for everything to work and it all be tested. other.Rraw is for us. And if any users do want to run it themselves, they can just install those packages. It is actually setup to skip not-installed packages but we (maybe I) put in a stop near the top with GLCI in mind to make sure it didn't skip any. That could be refined so only GLCI would halt on any missing, but users could skip. I think that's easier for users (auto skip packages not installed) than having them pick which files to run.

@MichaelChirico
Copy link
Member

under 1 minute on a normally loaded machine.

FWIW I bring this up because just today I had to hack some unholy stuff into the Google fork's existing test() + test.data.table() to support test sharding without editing tests.Rraw -- at scale, we were starting to see the main suite occasionally taking e.g. 10-20 min to run.

yes, partially our problem (hence fixing in the fork instead of filing a PR here), but offering evidence that issues w the script in crowded machines are not a CRAN-only thing.

Anytime more structure is added like that, it opens the door to a new set of bugs due to the structure.

I disagree here. I think that's outweighed by making the tests more hermetic. There are a lot of tests in our current suite that take a lot of effort to debug because the DT is defined several hundred lines earlier, and has by-reference updates in-between so you have to run x00 lines of code just to be sure DT at line N is exactly what's needed as the input to a certain test. Very cumbersome practice we're saddled with.

it's size would be the same if split up

In fact I would be fine if it's larger due to beneficial repetitiveness.

If there were lots of test files, search within the editor would need to be across many files.

Modern editors make this easy 😉 You can give it a spin in VS-code-on-GitHub in a few seconds: go to the repo page, press ., then press Ctrl+Shift+F -- search the whole repo. here's all fread calls in inst/tests:

image

e.g. discussions over which file new tests should be added, or whether there should be a new test file added.

the easiest solution here is to have one test per R/ file, matching the test file name to the source file. tests should be localized enough that it's rare to need a test covering >1 source file.


I know we're very far away from being able to land something like my from-scratch preference, and it's honestly low-return effort. But I do think the main suite has gotten too big. Even something as silly as tests00001-00500.Rraw, tests00501-01000.Rraw, ... would be an improvement and still satisfy basically all of the benefits you mentioned of the current system.

@mattdowle
Copy link
Member Author

Yes disabling those tests would avoid the change, with some new logic/flag to control that; maybe the OTHER environment variable could be read at the top of tests.Rraw or something. But really the right place for these tests is other.Rraw so might as well get on with it. The change is just code blocks moving. I can't imagine there will be many PRs touching these tests, but any that do shouldn't be hard to resolve as the same code is just in a different place.

@mattdowle
Copy link
Member Author

Google fork's existing test() + test.data.table() to support test sharding without editing tests.Rraw -- at scale, we were starting to see the main suite occasionally taking e.g. 10-20 min to run.

Could you explain why exactly it takes 10-20 min to run? Is it because of the fork, or because of "test sharding"? It takes 46 seconds for me on my laptop.

but offering evidence that issues w the script in crowded machines are not a CRAN-only thing.

Then perhaps you can provide the logs I've asked Uwe for but have no reply. Since you're reporting similar behavior.

There are a lot of tests in our current suite that take a lot of effort to debug because the DT is defined several hundred lines earlier, and has by-reference updates in-between so you have to run x00 lines of code just to be sure DT at line N is exactly what's needed as the input to a certain test. Very cumbersome practice we're saddled with.

You're talking about writing the tests in a different way here; i.e. the test would still be like that if it was in its own file. I don't really see your motivation or problem here. Perhaps you could pick a test like that and show how you would write it.

@mattdowle
Copy link
Member Author

mattdowle commented Nov 10, 2022

Also the results of Rprof() on a test.data.table() run that takes 20 mins would be very handy to see please. Sometimes, just sometimes, there is a surprise there like it sticking for a long time on disk read/write, or an uncaught error that causes dump.frames() to run, or something else unexpected. Often these 20x runtimes are caused by the server starting to swap. Could the server be swapping? If it is then it would affect other software too depending on priority settings etc. On servers I like to set tempdir() to be a ramdisk. If that has been done it could use more RAM and cause the server to swap, perhaps.
The long standing mantra is Rprof, Rprof, Rprof before making changes. The only reason I've been proceeding to reduce ram usage of tests.Rraw is that I've been forced to guess since Uwe doesn't reply to requests for the detailed logs. Now we have you as a 2nd observation so you might be our chance of getting logs and running Rprof.

@mattdowle
Copy link
Member Author

mattdowle commented Nov 10, 2022

memtest shows RSS starting at 99.4MB now (vs 123.8MB before). That'll do for now then. Doesn't feel right to move bit64 or the fread tests needing R.utils as they are more fundamental and need to be tested on CRAN.

@codecov
Copy link

codecov bot commented Nov 10, 2022

Codecov Report

Merging #5518 (de9612c) into master (e956716) will decrease coverage by 1.11%.
The diff coverage is n/a.

❗ Current head de9612c differs from pull request most recent head 0f2d2f2. Consider uploading reports for the commit 0f2d2f2 to get more accurate results

@@            Coverage Diff             @@
##           master    #5518      +/-   ##
==========================================
- Coverage   99.44%   98.32%   -1.12%     
==========================================
  Files          80       80              
  Lines       14787    14787              
==========================================
- Hits        14705    14540     -165     
- Misses         82      247     +165     
Impacted Files Coverage Δ
R/xts.R 0.00% <0.00%> (-100.00%) ⬇️
R/last.R 47.91% <0.00%> (-52.09%) ⬇️
R/fread.R 69.32% <0.00%> (-30.68%) ⬇️
R/fwrite.R 79.48% <0.00%> (-20.52%) ⬇️
src/fwrite.c 92.37% <0.00%> (-4.75%) ⬇️
src/between.c 96.24% <0.00%> (-3.76%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@mattdowle mattdowle merged commit e11a09b into master Nov 10, 2022
@mattdowle mattdowle deleted the move_suggests branch November 10, 2022 04:29
@mattdowle
Copy link
Member Author

mattdowle commented Nov 10, 2022

The other thing to check on your Google server is the settings of getDTthreads(verbose=TRUE) and OMP_THREAD_LIMIT (which CRAN sets to 2). If you are parallelizing (the "test sharding" you mentioned) and each R process using data.table is each using 50% of the logical CPUs by default, then that could be adding up and they could be fighting against each other. Maybe something like that is going on. It's possible something like that is happening on Uwe's Windows machine and that's one of the reasons I'd like to see the more detailed logs (in data.table.Rcheck that I asked for) because those logs include the output of getDTthreads(verbose=TRUE). However because CRAN Windows is so much slower than CRAN Linux for so many other packages too, it's more likely to be simply overloaded / over-parallelized (significantly more packages being checked in parallel than there are CPUs) than anything to do with data.table settings.

@MichaelChirico
Copy link
Member

MichaelChirico commented Nov 10, 2022

Could you explain why exactly it takes 10-20 min to run? Is it because of the fork, or because of "test sharding"? It takes 46 seconds for me on my laptop.

Extra context I can share: the tests are run continuously on TAP, single-threaded (we don't support openMP) on shared machines. That time will also include the "from-scratch" build time (think of a docker build on GLCI; more specifically it's on bazel). Typical times from what I've seen are around 2-3 minutes. I am not too surprised the upper tail of test timings can get so high e.g. on crowded machines.

Then perhaps you can provide the logs I've asked Uwe for but have no reply. Since you're reporting similar behavior.

Happy to help if I can here -- IIUC the CRAN issue is for running R CMD check right? And the examples are where it's hanging? If so that's a separate issue -- my timing is for tests.Rraw alone. If not, you're just looking for the test.data.table() output at the end of the suite? Maybe so given your comment about getDTthreads(), but as mentioned, we won't be very helpful as OpenMP doesn't work for us. Here's an example (it appears the build/compile/other execution time are responsible for the balance of the reported test time of >15 min; I also noticed some failing runs without timings, hence #5519):

getDTthreads(verbose=TRUE):
This installation of data.table has not been compiled with OpenMP support.
  omp_get_num_procs()            1
  R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
  R_DATATABLE_NUM_THREADS        unset
  R_DATATABLE_THROTTLE           unset (default 1024)
  omp_get_thread_limit()         1
  omp_get_max_threads()          1
  OMP_THREAD_LIMIT               unset
  OMP_NUM_THREADS                unset
  RestoreAfterFork               true
  data.table is using 1 threads with throttle==1024. See ?setDTthreads.
test.data.table() running: /build/work/6ccb1e897e673bd837b6647e5fdc9dae959b/google3/tmp/Rar-forge-005g5og3nt/lib/data.table/tests/tests.Rraw 
Test 1760 not run because this session either has no OpenMP or has been limited to one thread (e.g. under UBSAN and ASAN)

Wed Nov  9 22:02:51 2022  endian==little, sizeof(long double)==16, longdouble.digits==64, sizeof(pointer)==8, TZ=='US/Pacific', Sys.timezone()=='US/Pacific', Sys.getlocale()=='LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C', l10n_info()=='MBCS=TRUE; UTF-8=TRUE; Latin-1=FALSE; codeset=UTF-8', getDTthreads()=='This installation of data.table has not been compiled with OpenMP support.; omp_get_num_procs()==1; R_DATATABLE_NUM_PROCS_PERCENT==unset (default 50); R_DATATABLE_NUM_THREADS==unset; R_DATATABLE_THROTTLE==unset (default 1024); omp_get_thread_limit()==1; omp_get_max_threads()==1; OMP_THREAD_LIMIT==unset; OMP_NUM_THREADS==unset; RestoreAfterFork==true; data.table is using 1 threads with throttle==1024. See ?setDTthreads.', zlibVersion()==1.2.12.1 ZLIB_VERSION==1.2.12.1
10 longest running tests took 141s (67% of 208s)
      ID    time nTest
 1: 2155 106.291     5
 2: 1438   5.567   738
 3: 1835   4.379     1
 4: 1888   4.294     9
 5: 1648   3.859    91
 6: 1650   3.779    91
 7: 1652   3.762    91
 8: 1848   3.205     2
 9: 1437   3.061    36
10: 1912   2.985     2
All 10035 tests (last 2163) in tests/tests.Rraw completed ok in 00:03:28 elapsed (00:03:13 cpu)

You're talking about writing the tests in a different way here; i.e. the test would still be like that if it was in its own file. I don't really see your motivation or problem here. Perhaps you could pick a test like that and show how you would write it.

Better handled as its own issue, I'll file as such if I come across the issue again.

Also the results of Rprof() on a test.data.table() run that takes 20 mins would be very handy to see please.

This one I'm not sure we'll be too much help on... we have some edits to R's memory handling, I'd have to understand what's been done a lot better to know whether it could give results worth comparing to vanilla R / our CRAN issues. Rprof was not even enabled until somewhat recently.

If you are parallelizing (the "test sharding" you mentioned) and each R process using data.table is each using 50% of the logical CPUs by default

Sharding in this case is usually the test binary being executed on different machines, rather than different processes on the same machine. But other threads on each machine will be occupied by other workloads; I assume the test is at a low priority.

@mattdowle
Copy link
Member Author

mattdowle commented Nov 10, 2022

Thanks. Very interesting.

IIUC the CRAN issue is for running R CMD check right? And the examples are where it's hanging?

I'm working on the basis that the examples have finished due to :

Check process probably crashed or hung up for 20 minutes ... killed
Most likely this happened in the example checks (?),
if not, ignore the following last lines of example output:
... snip last example output ...
======== End of example output (where/before crash/hang up occured ?) ========

i.e. I'm currently thinking that the if not, ignore and before crash/hang occured are the relevant parts in this case. Btw, that's a spelling error in the R message isn't it: occurred has 2 r's in both Europe and US?

I don't know what "hung up for 20 minutes" means. Does it mean the operating system has detected that the process has been inactive for 20 mins? Or does it mean there is a timeout of 20 mins (setup by Uwe) on the R CMD check task? There is a successful OK run shown in #5507 with Ttotal 1805s (30 mins) and Tcheck 1497s (25 mins) so if a timeout is imposed (which would be very reasonable on machines which are not unreasonably overloaded) then that doesn't really fit. Unless Uwe is pushing it through manually sometimes. Or maybe the 20 min timeout is on a subset of Tcheck. The message does include the word 'probably' and question mark '?' so it's just an unclear situation by its nature.

Perhaps these messages are in R's sources inside R CMD check itself. I was assuming Uwe has some infrastructure around R CMD check which we don't have access to.

and the entire R CMD check task is being killed somewhere in the tests (e.g. main.R that calls

yes. I am guessing that it's main.R because I do find growing memory usage there. Growing memory usage would lead to the process being swapped out on an overloaded server and perhaps being killed by OOM. It's just a theory given Uwe doesn't reply. However, with that said, the time to even install at over 10x Linux, so I really hate working on things like this and then people say 'oh right it was data.table's growing ram usage and they fixed it' when in fact the root cause is (could be, I don't know, I'm forced to guess) a severely overloaded and mismanaged server where even installing takes 10x longer, and installing other packages on the same machine is also much slower too.

If so that's a separate issue -- my timing is for tests.Rraw alone. If not, you're just looking for the test.data.table() output at the end of the suite? Maybe so given your comment about getDTthreads(), but as mentioned,

yes, timings and Rprof for tests.Rraw alone.

we won't be very helpful as OpenMP doesn't work for us.

OpenMP does work and works very well though. It's just that, iiuc, Google have decided not to use it and as far as I know not share why not. In general I find it very frustrating when big influential companies and people, in general in this industry, when software solutions are labelled as not working without explaining why. For example with Apple not turning on OpenMP by default, and people seem to accept that's just the way it is, and, therefore data.table is a problem for using OpenMP. That riles me.

it appears the build/compile/other execution time are responsible for the balance of the reported test time of >15 min

Then surely the problem is that your sever is overloaded and mismanaged. No?

Here's what I see on my laptop.

$ time R CMD build .
* checking for file ‘./DESCRIPTION’ ... OK
* preparing ‘data.table’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* running ‘cleanup’
* installing the package to build vignettes
* creating vignettes ... OK
* cleaning src
* running ‘cleanup’
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building ‘data.table_1.14.5.tar.gz’

real	0m35.515s
user	0m38.988s
sys	0m2.503s
$ time R CMD check data.table_1.14.5.tar.gz 
* using log directory ‘/home/mdowle/GitHub/data.table/data.table.Rcheck’
* using R version 4.2.2 (2022-10-31)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* checking for file ‘data.table/DESCRIPTION’ ... OK
* this is package ‘data.table’ version ‘1.14.5’
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘data.table’ can be installed ... OK
* checking installed package size ... NOTE
  installed size is  5.0Mb
  sub-directories of 1Mb or more:
    tests   1.7Mb
* checking package directory ... OK
* checking ‘build’ directory ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking line endings in shell scripts ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking line endings in Makefiles ... OK
* checking compilation flags in Makevars ... OK
* checking for GNU extensions in Makefiles ... OK
* checking for portable use of $(BLAS_LIBS) and $(LAPACK_LIBS) ... OK
* checking use of PKG_*FLAGS in Makefiles ... OK
* checking compiled code ... OK
* checking installed files from ‘inst/doc’ ... OK
* checking files in ‘vignettes’ ... OK
* checking examples ... OK
* checking for unstated dependencies in ‘tests’ ... OK
* checking tests ...
  Running ‘autoprint.R’
  Comparing ‘autoprint.Rout’ to ‘autoprint.Rout.save’ ... OK
  Running ‘froll.R’
  Running ‘knitr.R’
  Comparing ‘knitr.Rout’ to ‘knitr.Rout.save’ ... OK
  Running ‘main.R’
  Running ‘nafill.R’
  Running ‘other.R’
  Running ‘programming.R’
  Running ‘types.R’
 OK
* checking for unstated dependencies in vignettes ... OK
* checking package vignettes in ‘inst/doc’ ... OK
* checking running R code from vignettes ...
  ‘datatable-benchmarking.Rmd’ using ‘UTF-8’... OK
  ‘datatable-faq.Rmd’ using ‘UTF-8’... OK
  ‘datatable-importing.Rmd’ using ‘UTF-8’... OK
  ‘datatable-intro.Rmd’ using ‘UTF-8’... OK
  ‘datatable-keys-fast-subset.Rmd’ using ‘UTF-8’... OK
  ‘datatable-programming.Rmd’ using ‘UTF-8’... OK
  ‘datatable-reference-semantics.Rmd’ using ‘UTF-8’... OK
  ‘datatable-reshape.Rmd’ using ‘UTF-8’... OK
  ‘datatable-sd-usage.Rmd’ using ‘UTF-8’... OK
  ‘datatable-secondary-indices-and-auto-indexing.Rmd’ using ‘UTF-8’... OK
 NONE
* checking re-building of vignette outputs ... OK
* checking PDF version of manual ... OK
* DONE

Status: 1 NOTE
See
  ‘/home/mdowle/GitHub/data.table/data.table.Rcheck/00check.log’
for details.

real	1m47.281s
user	1m58.073s
sys	0m6.517s
$ time R CMD INSTALL data.table_1.14.5.tar.gz 
* installing to library ‘/usr/local/lib/R/site-library’
* installing *source* package ‘data.table’ ...
** using staged installation
gcc 11.3.0
zlib 1.2.11 is available ok
R CMD SHLIB supports OpenMP without any extra hint
** libs
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c assign.c -o assign.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c between.c -o between.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c bmerge.c -o bmerge.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c chmatch.c -o chmatch.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c cj.c -o cj.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c coalesce.c -o coalesce.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c dogroups.c -o dogroups.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c fastmean.c -o fastmean.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c fcast.c -o fcast.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c fifelse.c -o fifelse.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c fmelt.c -o fmelt.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c forder.c -o forder.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c frank.c -o frank.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c fread.c -o fread.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c freadR.c -o freadR.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c froll.c -o froll.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c frollR.c -o frollR.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c frolladaptive.c -o frolladaptive.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c fsort.c -o fsort.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c fwrite.c -o fwrite.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c fwriteR.c -o fwriteR.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c gsumm.c -o gsumm.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c idatetime.c -o idatetime.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c ijoin.c -o ijoin.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c init.c -o init.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c inrange.c -o inrange.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c nafill.c -o nafill.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c negate.c -o negate.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c nqrecreateindices.c -o nqrecreateindices.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c openmp-utils.c -o openmp-utils.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c programming.c -o programming.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c quickselect.c -o quickselect.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c rbindlist.c -o rbindlist.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c reorder.c -o reorder.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c shift.c -o shift.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c snprintf.c -o snprintf.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c subset.c -o subset.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c transpose.c -o transpose.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c types.c -o types.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c uniqlist.c -o uniqlist.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c utils.c -o utils.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c vecseq.c -o vecseq.o
gcc -I"/usr/share/R/include" -DNDEBUG     -fopenmp  -fpic  -O3 -Wall -pedantic -c wrappers.c -o wrappers.o
gcc -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -o data.table.so assign.o between.o bmerge.o chmatch.o cj.o coalesce.o dogroups.o fastmean.o fcast.o fifelse.o fmelt.o forder.o frank.o fread.o freadR.o froll.o frollR.o frolladaptive.o fsort.o fwrite.o fwriteR.o gsumm.o idatetime.o ijoin.o init.o inrange.o nafill.o negate.o nqrecreateindices.o openmp-utils.o programming.o quickselect.o rbindlist.o reorder.o shift.o snprintf.o subset.o transpose.o types.o uniqlist.o utils.o vecseq.o wrappers.o -fopenmp -lz -L/usr/lib/R/lib -lR
PKG_CFLAGS = -fopenmp
PKG_LIBS = -fopenmp -lz
if [ "data.table.so" != "data_table.so" ]; then mv data.table.so data_table.so; fi
if [ "" != "Windows_NT" ] && [ `uname -s` = 'Darwin' ]; then install_name_tool -id data_table.so data_table.so; fi
installing to /usr/local/lib/R/site-library/00LOCK-data.table/00new/data.table/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (data.table)

real	0m16.779s
user	0m15.946s
sys	0m0.823s

If you split out real, user and sys like that on your server then maybe it will help.

It might be helpful to concentrate on build and install times, to start with. If it takes me on my laptop 40 seconds to build and 16 seconds to install, do you on your laptop concur? But your server takes much much longer to do that (which apparently happens on the CRAN Windows sever too but not CRAN Linux). Then it could be very simple: you and Uwe's servers are overloaded. When it comes to the memory usage of main.R, yes I am spending my time to reduce ram usage to try to even work on overloaded servers because it is possible for us to do that. I can't do anything about the time to install because, iiuc, that is all down to the mismanaged server. In terms of the time to build, we haven't looked at the vignettes yet. If the vignettes are being rebuilt then there is some memory usage there that we could work on. Whether vignettes are rebuilt or not depends on options I think.

In other words, let's say Uwe and your server installed data.table in a normal time. But then the tests inside R CMD check were being killed OOM. Then I wouldn't be so critical of your server management and it would be fair to point the finger at data.table's tests in main.R. But it's the fact that the time to just install the package takes so much longer on your servers that is critical here.

Tinstall is 16 seconds for me on my laptop but over 5 minutes on CRAN Windows.

@MichaelChirico
Copy link
Member

Btw, that's a spelling error in the R message isn't it: occurred has 2 r's in both Europe and US?

yes, but not part of the R sources. The same typo occurs there but only in comments, probably not worth filing a patch:

./doc/NEWS.1:   have occured yet.  (PR#4389)
./src/extra/graphapp/context.c:    /* Has a colour or width change occured? */
./src/nmath/nmath.h:/*  and underflow occured (important for IEEE)*/
./src/library/tcltk/exec/widget.tcl:## occured, so the widget options are all the default.  If this proc
./src/library/tcltk/exec/hierarchy.tcl:## changes have occured in the hierarchy since the last call to Recompute.
./src/library/stats/src/fft.c: *        If maxf is zero on return, an error occured during factorization.

Perhaps these messages are in R's sources inside R CMD check itself

I think not, I don't see any of the phrases "Check process" "example output" "where/before" or "Most likely this happened" in the R sources (I would expect a hit in src/library/tools/R/QC.R if it's something R CMD check is doing).

OpenMP does work and works very well though...

Sorry, poor wording on my part -- you're absolutely right. I should say OpenMP is not supported internally, which means potential maintenance headaches if we go against the grain and try to enable it on our own. I am not sure how sensitive the reasoning is so I'm being conservative and not sharing much, but basically it was a choice to invest elsewhere.

Then surely the problem is that your sever is overloaded and mismanaged. No?

Overloaded, probably, mismanaged, I'd say no :) The test is being run many times and not production-critical -- mismanagement would be if we were prioritizing "more important" stuff over a CI test whose timing is of second- or third-order importance.


Sorry if this is more noise than signal -- there's a lot of our CI system / R infra that won't be directly comparable (as I repeat several times below). Nevertheless...

But your server takes much much longer to do that

I'm afraid it's not exactly comparable. There's no timing split out for R CMD build (which is not run directly), just for the 'build' step (bazel build which includes compiling a full DAG docker-style, in parallel. E.g. to build data.table (generate the .tar.gz) there were >1000 "packages" (in the bazel sense of the word), almost 50K "targets"; here's that timing:

real    1m59.981s
user    0m1.166s
sys     0m3.116s

Running the test binary (basically includes starting python, then starting an R subprocess and executing tests/main.R):

real    1m51.400s
user    1m35.665s
sys     0m5.487s

We also don't run R CMD check, instead use tools:::.check_packages() with --no-examples and then run examples individually (so that one failing example doesn't stop R CMD check), so the timings are again not all that comparable (in particular examples are a lot slower to run):

real    2m17.401s
user    1m30.757s
sys     0m20.915s

And we also don't use R CMD INSTALL basically at all, can't think of any timings that would be worthwhile to share there.

@MichaelChirico
Copy link
Member

MichaelChirico commented Nov 10, 2022

Tried Rprof on the tests/main.R; here's the top 40 of each:

lapply(summaryRprof(tmp), head, 40)
$by.self
                    self.time self.pct total.time total.pct
"[.data.table"          12.42    13.49      50.84     55.22
"gc"                    12.08    13.12      12.08     13.12
".Call"                  3.66     3.98       3.66      3.98
"base::cat"              2.76     3.00       2.80      3.04
"replace_dot_alias"      2.50     2.72       2.56      2.78
"forderv"                2.16     2.35       2.26      2.45
"FUN"                    1.44     1.56      19.94     21.66
"paste0"                 1.24     1.35       1.26      1.37
"unique"                 1.22     1.33       2.04      2.22
"length"                 1.18     1.28       1.18      1.28
"lapply"                 1.10     1.19      16.40     17.81
"eval"                   1.08     1.17      92.06    100.00
"setalloccol"            1.08     1.17       1.68      1.82
"sys.call"               1.06     1.15       1.08      1.17
"regexpr"                1.00     1.09       1.02      1.11
"bmerge"                 0.92     1.00       1.86      2.02
"[[.data.frame"          0.90     0.98       2.34      2.54
"test"                   0.88     0.96      47.72     51.84
"strsplit"               0.88     0.96       0.88      0.96
"setattr"                0.84     0.91       1.78      1.93
"substring"              0.80     0.87       1.10      1.19
"%chin%"                 0.76     0.83       0.98      1.06
"print.default"          0.76     0.83       0.78      0.85
"names"                  0.76     0.83       0.76      0.83
"[["                     0.74     0.80       3.08      3.35
"rank"                   0.68     0.74       0.80      0.87
"["                      0.64     0.70      51.54     55.99
"getOption"              0.64     0.70       0.64      0.70
"unlist"                 0.62     0.67      13.68     14.86
"readBin"                0.62     0.67       0.62      0.67
"=="                     0.60     0.65       0.70      0.76
"assign"                 0.58     0.63       2.36      2.56
"vapply"                 0.58     0.63       1.60      1.74
"<Anonymous>"            0.56     0.61       2.00      2.17
"%in%"                   0.56     0.61       1.50      1.63
"gregexpr"               0.54     0.59       0.54      0.59
"as.list"                0.52     0.56       1.06      1.15
"cat"                    0.50     0.54       3.36      3.65
"attr"                   0.50     0.54       0.50      0.54
"doTryCatch"             0.48     0.52      92.04     99.98

$by.total
                          total.time total.pct self.time self.pct
"eval"                         92.06    100.00      1.08     1.17
"source"                       92.06    100.00      0.00     0.00
"withVisible"                  92.06    100.00      0.00     0.00
"doTryCatch"                   92.04     99.98      0.48     0.52
"tryCatch"                     92.04     99.98      0.36     0.39
"tryCatchList"                 92.04     99.98      0.12     0.13
"tryCatchOne"                  92.04     99.98      0.08     0.09
"test.data.table"              91.94     99.87      0.00     0.00
"try"                          91.92     99.85      0.08     0.09
"sys.source"                   91.92     99.85      0.00     0.00
"["                            51.54     55.99      0.64     0.70
"[.data.table"                 50.84     55.22     12.42    13.49
"test"                         47.72     51.84      0.88     0.96
"withCallingHandlers"          34.30     37.26      0.20     0.22
"suppressMessages"             34.26     37.21      0.02     0.02
"FUN"                          19.94     21.66      1.44     1.56
"nqjoin_test"                  18.20     19.77      0.02     0.02
"lapply"                       16.40     17.81      1.10     1.19
"unlist"                       13.68     14.86      0.62     0.67
"gc"                           12.08     13.12     12.08    13.12
"check"                        12.08     13.12      0.00     0.00
"construct"                    10.34     11.23      0.04     0.04
"...elt"                        9.70     10.54      0.20     0.22
"capture.output"                9.60     10.43      0.00     0.00
"print"                         9.16      9.95      0.02     0.02
".prepareFastSubset"            6.74      7.32      0.32     0.35
"system.time"                   6.56      7.13      0.00     0.00
"setkeyv"                       6.54      7.10      0.08     0.09
"apply"                         4.90      5.32      0.42     0.46
"fread"                         4.46      4.84      0.12     0.13
"data.table"                    4.30      4.67      0.16     0.17
"fwrite"                        4.20      4.56      0.06     0.07
".Call"                         3.66      3.98      3.66     3.98
"nq"                            3.54      3.85      0.00     0.00
"combn"                         3.42      3.71      0.00     0.00
"cat"                           3.36      3.65      0.50     0.54
"[["                            3.08      3.35      0.74     0.80
"with"                          3.08      3.35      0.24     0.26
"decompressFile.default"        3.04      3.30      0.00     0.00
"R.utils::decompressFile"       3.04      3.30      0.00     0.00


$sample.interval
[1] 0.02

$sampling.time
[1] 92.06

First thing that sticks out to me is replace_dot_alias, suggests #5501 should be high-priority.

@mattdowle
Copy link
Member Author

mattdowle commented Nov 11, 2022

Initially I wondered why you would focus on replace_dot_alias when it's only 2.7%. But I see what you mean now and agree. What was jumping out at me was the gc() which I figured/hoped pointed to memory usage, and I might not be wrong there. If so, once the ram using tests are removed, that would indeed leave replace_dot_alias sticking out. Ok I marked #5501 high and added feedback about its news item there.
What about the base::cat? There is a separate line for cat without the base:: prefix too, interestingly. Either way, unless this is running in verbose mode, why is there a lot in cat?
(Btw, so nice to be looking at Rprof results, I'm staring to enjoy this again, thanks!)

@mattdowle
Copy link
Member Author

mattdowle commented Nov 11, 2022

Overloaded, probably, mismanaged, I'd say no :) The test is being run many times and not production-critical -- mismanagement would be if we were prioritizing "more important" stuff over a CI test whose timing is of second- or third-order importance.

Are you saying that production critical tasks are taking place on the same server and those production tasks are taking priority? If so, it's a good point, so long as the non-critical tasks don't somehow impact the production tasks; e.g. if the best laid plans and priority mechanism don't work for some reason. All I've experienced is production running on, well, servers dedicated to production, and that seems to be a good way to set things up. Otherwise, mismanagement to me means running so many tasks concurrently that the time spent context switching all those tasks becomes significant; in the extreme when a box starts to swap. Good to me would be running fewer tasks concurrently so that the total time to run everything is lower. Wasting compute resources on context switching is bad management to me, no? Now, if the tasks aren't really taking longer at all on your servers but are simply being paused while production tasks run, then that's different and the time the tasks are paused should be reported separately. I suspect what's happening on the CRAN WIndows server, based on Tinstall being so high, is that too many tasks are running concurrently and the box is wasting a lot of time context switching all those tasks.

@MichaelChirico
Copy link
Member

Wasting compute resources on context switching is bad management to me, no?

I think we'll get too far into the weeds to answer externally, but as you note, there's some trade-off, and rest assured there is enough $ flowing around here that getting it right is worth investing a lot in. Plenty of glory to be had for improving things 😄

the time the tasks are paused should be reported separately

That should be the gap between elapsed and cpu time here right? 00:03:28 elapsed (00:03:13 cpu)

I see a similar gap in the test log of an execution that took >10min.

What about the base::cat?

My first guess was because I ran this interactively, the Running test id ... counter is likely responsible. You're right that it's weird cat() and base::cat() are tallied separately, I don't see base::cat() specifically called anywhere except here:

https://github.com/r-devel/r-svn/blob/83a8351a7068bf2047822fed5d544e2a2bd79b67/share/R/examples-footer.R#L5

That shouldn't be run within our test suite, or within the Google code orchestrating the test suite, so beats me.

It also persists when I ran Rprof() non-interactively to try and isolate the issue, so I'm more lost:

$by.self
                     self.time self.pct total.time total.pct
"[.data.table"           12.80    13.23      51.62     53.36
"gc"                     11.74    12.14      11.74     12.14
"print.default"           4.40     4.55       4.40      4.55
".Call"                   3.82     3.95       3.84      3.97
"base::cat"               3.42     3.54       3.52      3.64
"replace_dot_alias"       2.68     2.77       2.72      2.81
"forderv"                 2.44     2.52       2.62      2.71
"paste0"                  1.50     1.55       1.54      1.59
"FUN"                     1.38     1.43      20.18     20.86
"setalloccol"             1.24     1.28       1.72      1.78
"eval"                    1.20     1.24      96.30     99.55
"sys.call"                1.20     1.24       1.20      1.24
"unique"                  1.18     1.22       1.98      2.05
"lapply"                  1.12     1.16      16.66     17.22
"[["                      1.06     1.10       3.56      3.68
"bmerge"                  1.02     1.05       2.42      2.50
"length"                  1.02     1.05       1.02      1.05
"regexpr"                 1.00     1.03       1.00      1.03
"%chin%"                  0.92     0.95       1.10      1.14
"[[.data.frame"           0.86     0.89       2.50      2.58
"setattr"                 0.84     0.87       1.78      1.84
"dim"                     0.82     0.85       1.00      1.03
"apply"                   0.80     0.83       5.26      5.44
"substring"               0.80     0.83       1.00      1.03
"vapply"                  0.74     0.76       1.58      1.63
"strsplit"                0.74     0.76       0.74      0.76
"test"                    0.72     0.74      49.82     51.50
"%in%"                    0.68     0.70       1.50      1.55
"unlist"                  0.66     0.68      13.86     14.33
"as.POSIXlt.POSIXct"      0.66     0.68       0.66      0.68
"readBin"                 0.66     0.68       0.66      0.68
"<Anonymous>"             0.64     0.66       1.60      1.65
"getOption"               0.64     0.66       0.64      0.66
"["                       0.60     0.62      52.28     54.04
"=="                      0.60     0.62       0.72      0.74
"names"                   0.60     0.62       0.60      0.62
"rank"                    0.58     0.60       0.64      0.66
"doTryCatch"              0.50     0.52      96.72     99.98
"do_j_names"              0.48     0.50       0.76      0.79
"assign"                  0.46     0.48       2.26      2.34

$by.total
                          total.time total.pct self.time self.pct
"doTryCatch"                   96.72     99.98      0.50     0.52
"tryCatch"                     96.72     99.98      0.34     0.35
"tryCatchOne"                  96.72     99.98      0.22     0.23
"tryCatchList"                 96.72     99.98      0.08     0.08
"test.data.table"              96.58     99.83      0.00     0.00
"sys.source"                   96.56     99.81      0.06     0.06
"try"                          96.56     99.81      0.06     0.06
"eval"                         96.30     99.55      1.20     1.24
"["                            52.28     54.04      0.60     0.62
"[.data.table"                 51.62     53.36     12.80    13.23
"test"                         49.82     51.50      0.72     0.74
"withCallingHandlers"          33.76     34.90      0.10     0.10
"suppressMessages"             33.70     34.84      0.02     0.02
"FUN"                          20.18     20.86      1.38     1.43
"nqjoin_test"                  18.48     19.10      0.00     0.00
"lapply"                       16.66     17.22      1.12     1.16
"unlist"                       13.86     14.33      0.66     0.68
"...elt"                       13.12     13.56      0.12     0.12
"capture.output"               12.98     13.42      0.06     0.06
"withVisible"                  12.84     13.27      0.00     0.00
"print"                        12.70     13.13      0.00     0.00
"check"                        12.58     13.00      0.00     0.00
"gc"                           11.74     12.14     11.74    12.14
"construct"                    10.64     11.00      0.12     0.12
".prepareFastSubset"            6.60      6.82      0.34     0.35
"setkeyv"                       6.60      6.82      0.16     0.17
"system.time"                   6.38      6.59      0.00     0.00
"apply"                         5.26      5.44      0.80     0.83
"fread"                         4.74      4.90      0.10     0.10
"data.table"                    4.70      4.86      0.14     0.14
"fwrite"                        4.64      4.80      0.08     0.08
"print.default"                 4.40      4.55      4.40     4.55
".Call"                         3.84      3.97      3.82     3.95
"cat"                           3.68      3.80      0.04     0.04
"nq"                            3.58      3.70      0.00     0.00
"[["                            3.56      3.68      1.06     1.10
"base::cat"                     3.52      3.64      3.42     3.54
"cat.default"                   3.52      3.64      0.00     0.00
"combn"                         3.28      3.39      0.00     0.00
"decompressFile.default"        3.28      3.39      0.00     0.00

$sample.interval
[1] 0.02

$sampling.time
[1] 96.74

PS just remembered I am running on 1.14.4, I haven't added any of the other commits between there and GH master. It should be fine since we're trying to debug a CRAN issue, but LMK if it's worth the effort to do this from our current HEAD (alas it's not as simple as git pull)

@mattdowle
Copy link
Member Author

Great. Could you run test.data.table() again interactively but with silent=TRUE? That will set showProgress=FALSE too and that will stop the Running test id .... Just to see if it is that.

Ok, yep when I'm done with #5520 would be a good time if you could try from master. Will LYK.

That should be the gap between elapsed and cpu time here right? 00:03:28 elapsed (00:03:13 cpu)
I see a similar gap in the test log of an execution that took >10min.

I guess so. But then in that case it means that the process isn't being paused in a good way, but instead it is spending/wasting its time context switching. Can you reduce the number of tasks on that machine, or at least see how many tasks there are vs how many logical cpus there are.

@MichaelChirico
Copy link
Member

Can you reduce the number of tasks on that machine

no -- since it's batched in with the regular CI flow, it gets scheduled beyond my control. and whenever I launch the job myself, I haven't seen any latency issue.

see how many tasks there are vs how many logical cpus there are.

That will be pretty far down the rabbit hole as there's a few layers of abstraction between what I see easily (a web UI) and the bare metal running the jobs... let's revisit if we've really got no other leads. Tamping down memory usage in the suite seems pretty promising so far.

sure... as expected broadly similar to running non-interactively:

$by.self
                    self.time self.pct total.time total.pct
"[.data.table"          12.02    13.50      49.62     55.72
"gc"                    11.86    13.32      11.86     13.32
".Call"                  3.60     4.04       3.60      4.04
"base::cat"              2.84     3.19       2.86      3.21
"replace_dot_alias"      2.40     2.69       2.40      2.69
"forderv"                2.22     2.49       2.44      2.74
"unique"                 1.36     1.53       2.16      2.43
"eval"                   1.30     1.46      89.06    100.00
"FUN"                    1.22     1.37      19.20     21.56
"paste0"                 1.04     1.17       1.14      1.28
"test"                   1.00     1.12      44.86     50.37
"lapply"                 1.00     1.12      15.74     17.67
"sys.call"               1.00     1.12       1.00      1.12
"[[.data.frame"          0.96     1.08       2.24      2.52
"setalloccol"            0.94     1.06       1.70      1.91
"strsplit"               0.92     1.03       0.92      1.03
"%chin%"                 0.90     1.01       1.06      1.19
"vapply"                 0.84     0.94       1.46      1.64
"dim"                    0.84     0.94       1.06      1.19
"print.default"          0.78     0.88       0.78      0.88
"setattr"                0.76     0.85       1.70      1.91
"regexpr"                0.76     0.85       0.76      0.85
"unlist"                 0.74     0.83      13.32     14.96
"length"                 0.74     0.83       0.74      0.83
"bmerge"                 0.72     0.81       1.90      2.13
"rank"                   0.72     0.81       0.82      0.92
"%in%"                   0.68     0.76       1.32      1.48
"substring"              0.66     0.74       0.86      0.97
"[["                     0.64     0.72       2.88      3.23
"["                      0.60     0.67      50.26     56.43
"apply"                  0.60     0.67       4.82      5.41
"all.vars"               0.60     0.67       0.68      0.76
"isTRUE"                 0.52     0.58       0.92      1.03
"readBin"                0.52     0.58       0.52      0.58
"do_j_names"             0.50     0.56       0.78      0.88
"chmatch"                0.50     0.56       0.54      0.61
"names"                  0.50     0.56       0.50      0.56
"<Anonymous>"            0.48     0.54       1.60      1.80
"getOption"              0.48     0.54       0.48      0.54
"assign"                 0.46     0.52       2.26      2.54

$by.total
                          total.time total.pct self.time self.pct
"eval"                         89.06    100.00      1.30     1.46
"source"                       89.06    100.00      0.00     0.00
"withVisible"                  89.06    100.00      0.00     0.00
"doTryCatch"                   89.04     99.98      0.40     0.45
"tryCatch"                     89.04     99.98      0.30     0.34
"tryCatchList"                 89.04     99.98      0.14     0.16
"tryCatchOne"                  89.04     99.98      0.08     0.09
"try"                          88.96     99.89      0.04     0.04
"sys.source"                   88.96     99.89      0.02     0.02
"test.data.table"              88.96     99.89      0.00     0.00
"["                            50.26     56.43      0.60     0.67
"[.data.table"                 49.62     55.72     12.02    13.50
"test"                         44.86     50.37      1.00     1.12
"withCallingHandlers"          33.46     37.57      0.12     0.13
"suppressMessages"             33.42     37.53      0.00     0.00
"FUN"                          19.20     21.56      1.22     1.37
"nqjoin_test"                  17.76     19.94      0.02     0.02
"lapply"                       15.74     17.67      1.00     1.12
"unlist"                       13.32     14.96      0.74     0.83
"gc"                           11.86     13.32     11.86    13.32
"check"                        11.86     13.32      0.00     0.00
"construct"                    10.00     11.23      0.00     0.00
"...elt"                        9.04     10.15      0.12     0.13
"capture.output"                8.86      9.95      0.02     0.02
"print"                         8.60      9.66      0.00     0.00
"setkeyv"                       6.66      7.48      0.10     0.11
".prepareFastSubset"            6.42      7.21      0.36     0.40
"system.time"                   6.30      7.07      0.00     0.00
"apply"                         4.82      5.41      0.60     0.67
"fread"                         4.56      5.12      0.24     0.27
"fwrite"                        4.04      4.54      0.06     0.07
"data.table"                    3.88      4.36      0.08     0.09
".Call"                         3.60      4.04      3.60     4.04
"nq"                            3.44      3.86      0.00     0.00
"combn"                         3.26      3.66      0.00     0.00
"decompressFile.default"        3.14      3.53      0.02     0.02
"R.utils::decompressFile"       3.14      3.53      0.00     0.00
"cat"                           2.94      3.30      0.02     0.02
"[["                            2.88      3.23      0.64     0.72
"with"                          2.88      3.23      0.10     0.11

$sample.interval
[1] 0.02

$sampling.time
[1] 89.06

@OfekShilon
Copy link
Contributor

Please note that #5501 speeds up name_dots, not replace_dot_alias

@mattdowle
Copy link
Member Author

@MichaelChirico ok now that #5520 is merged would you mind testing test.data.table() from master on your loaded server to see if it helps vs 1.14.4 as mentioned above? I saw you said it's not as trivial as git pull so no worries.

@MichaelChirico
Copy link
Member

would you mind testing test.data.table() from master on your loaded server to see if it helps vs 1.14.4 as mentioned above?

I'll give it a try. Need to look into how to do so without necessarily using latest dev as our "prod" version (since the issue wasn't reproducing in my local test env). Check back in a few days.

@mattdowle
Copy link
Member Author

No worries I'll prepare the minimal release to CRAN and maybe send it by tomorrow. The worst that can happen is that it just doesn't fix CRAN Windows. Unless I mess something up.

@MichaelChirico
Copy link
Member

OK, in that case, itll be easier with a CRAN-tagged release

@mattdowle mattdowle modified the milestones: 1.14.7, 1.14.6 Nov 16, 2022
@MichaelChirico
Copy link
Member

MichaelChirico commented Nov 16, 2022

Figured out a way to do this (quite easily in fact) where I can run the test 100 times & report some basic stats. Note that the caveats mentioned above about how to interpret these timings still apply.

# DATA.TABLE 1.14.4
  Stats over 100 runs: max = 621.6s, min = 87.3s, avg = 207.1s, dev = 111.1s
# DATA.TABLE MASTER @ 1b324637045c97b40960fa64a4dc4dc27b00596a
  Stats over 100 runs: max = 241.9s, min = 73.5s, avg = 131.4s, dev = 44.1s
# DATA.TABLE 1.14.6
  Stats over 100 runs: max = 211.8s, min = 56.8s, avg = 120.3s, dev = 50.1s

Looks like a pretty substantial improvement already, great work! 1 minute off the average and almost 66% off the max (99 %ile-ish) time!

@mattdowle
Copy link
Member Author

Great! Thanks. It's very nice to have it over multiple runs like that when we're dealing with memory effects.

Those timings are for test.data.table() or might it include time to start up a fresh R session and load data.table in that timing? Because test.data.table() from master on my laptop takes 27 seconds for me using the completed ok in 26.7s elapsed (31.6s cpu) line it prints at the end. But you're saying the fastest you see is 73.5 seconds? That's 2.75 times as long.

@MichaelChirico
Copy link
Member

Yes it includes start-up time, and turn-down time for that matter. Also recall that we are forced to be single-threaded; glancing at some logs, I'm seeing ~60-90 seconds for the exact comparison, e.g. "All 9916 tests (last 2163) in tests/tests.Rraw completed ok in 00:01:30 elapsed (00:01:21 cpu)"

Just added the 1.14.6 timings since it's now shown up on CRAN.

@MichaelChirico
Copy link
Member

MichaelChirico commented Nov 16, 2022

Oh, nice, I just learned I can grep the test logs and read the completed ok times over the 100 runs, here's the summary of the more direct test timing on 1.14.6:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   53.6    81.0    89.5   115.5   181.5   207.0 

(looks like there's about 3 seconds of overhead for start-up/turn-down, most of the timing difference vs what you're seeing should be due to the single threading)

@mattdowle
Copy link
Member Author

The tests in the test suite aren't big enough for threads to be utilized; there's a throttle. I ran test.data.table() after setDTthreads(1) and 3 runs took 25.8s, 25.8s, 26.3s master, and 23.8s, 24.0s, 23.8s 1.14.6. So multi-threading doesn't speed up the test suit of small tests which is correct. Seems like a 2x difference between my 26 and your 53 fastest time then, and an 8x difference between my consistent 26 and your max 207. My guess is that your server has too many tasks running on it concurrently and a lot of time is being wasted context switching those tasks; e.g. L2 cache churn as the cpu tries to do a little bit of every task. Or that the task is being given low priority like you said but the cpu time was close to the elapsed last time we discussed that where we expected the cpu time to be lower than elapsed if the process was being paused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move Suggest-ed package tests out of tests.Rraw
4 participants