test.data.table(memtest=TRUE) #5515
Conversation
…e the env variable
….GlobalEnv, closes #5514; remove DTfun fix no longer needed.
The result above was in dev mode using cc(). Results are a little different after installing, but still find similar test numbers.
Looking at how RSS builds from starting a fresh session, using:
as.integer(as.numeric(system(paste0("ps -o rss --no-headers ",Sys.getpid()),intern=TRUE))/1024) #MB
R --vanilla # 50MB
R # 53MB
require(data.table) # 67MB
test.data.table(memtest=TRUE) # 123MB test 1, rising to 265MB last test
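The ps call above can be wrapped as a small helper. A minimal sketch (an assumption for illustration, not data.table's internal ps_mem()) that reads /proc directly on Linux instead of shelling out to ps:

```r
# Sketch: report this R process's resident set size in MB on Linux.
# Reads the VmRSS line from /proc/<pid>/status rather than calling ps.
rss_mb = function() {
  status = readLines(file.path("/proc", Sys.getpid(), "status"))
  kb = as.numeric(sub("[^0-9]*([0-9]+).*", "\\1", grep("^VmRSS:", status, value=TRUE)))
  as.integer(kb / 1024)  # kB -> MB
}
```

Reading /proc avoids forking a child process for every measurement, though it is Linux-only, so the ps-based version above is more portable across Unix-likes.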
> attr(.Last.value, "timings")
ID time nTest RSS diff
<int> <num> <int> <num> <num>
1: 1 0.509 2 123.8 NA
2: 2 0.003 2 123.8 0
3: 3 0.001 1 123.8 0
4: 4 0.001 1 123.8 0
5: 5 0.001 1 123.8 0
---
2046: 2234 0.009 9 265.4 0
2047: 2235 0.001 2 265.4 0
2048: 2236 0.001 9 265.4 0
2049: 2237 0.001 2 265.4 0
2050: 2238 0.000 9 265.4 0
So somewhere at the beginning of test.data.table, on or before the first test, 56MB is allocated (123MB - 67MB). Might be worth investigating that after reducing the biggest tests. source() loads and parses tests.Rraw, iiuc, and maybe the parsed result takes that much. Or maybe something initializes or caches in test 1: it's a simple test of
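One way to check the source()/parse theory would be to measure how much memory the parsed expressions retain. A hedged sketch (the comparison via gc()'s reported Mb totals is an approximation, and the file path is whatever tests.Rraw location applies locally):

```r
# Sketch: estimate memory retained by parsing a test file, by comparing
# gc()-reported usage (the Mb column, Ncells + Vcells) before and after
# parse(). Approximate, but enough to see whether parsing tests.Rraw
# accounts for a large share of the ~56MB.
parse_cost_mb = function(path) {
  gc(); before = sum(gc()[, 2L])   # total Mb in use before parsing
  exprs = parse(path)              # keep the parsed result alive
  gc(); after = sum(gc()[, 2L])
  list(n_expr = length(exprs), mb = after - before)
}
```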
Loading the suggested packages takes up ~30 MB of RAM. Lines 113-118 in R/test.data.table.R:
for (s in sugg) {
  assign(paste0("test_",s), loaded<-suppressWarnings(suppressMessages(
    library(s, character.only=TRUE, logical.return=TRUE, quietly=TRUE, warn.conflicts=FALSE, pos="package:base") # attach at the end for #5101
  )))
  if (!loaded) cat("\n**** Suggested package",s,"is not installed or has dependencies missing. Tests using it will be skipped.\n\n")
}
Maybe we could cut this down by moving those tests to an additional script?
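Another option (an assumption on my part, not what the PR does) would be to probe availability with requireNamespace(), which loads only the namespace rather than attaching the package, deferring most of the memory cost until a test actually uses it:

```r
# Sketch (assumption, not the actual PR change): record availability of
# suggested packages without attaching them. requireNamespace() loads the
# namespace only, so the attach cost is paid lazily, if at all.
sugg = c("methods", "stats")  # placeholder list for illustration
for (s in sugg) {
  ok = suppressWarnings(requireNamespace(s, quietly=TRUE))
  assign(paste0("test_", s), ok)
  if (!ok) cat("\n**** Suggested package", s, "is not installed. Tests using it will be skipped.\n\n")
}
```

The trade-off is that tests relying on attached-search-path behavior (like the pos="package:base" masking handled for #5101) would need library() calls at the point of use.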
R/test.data.table.R
Outdated
nTest = RSS = NULL # to avoid 'no visible binding' note
timings = env$timings[nTest>0]
if (!memtest) {
  ans = head(timings[-1L][order(-time)], 10L)[,RSS:=NULL] # exclude id 1 as in dev that includes JIT
in non-dev that should still be kept and not excluded, right?
good spot. now not-excluded in non-dev. I tested non-dev and test 1 doesn't appear in the top 10 as expected but good to have that covered now in case anything sneaks into test 1 in future.
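The dev-only exclusion agreed here might look like the following sketch (illustrative only: a plain data.frame stands in for the data.table, and the inDev flag is a hypothetical name):

```r
# Sketch of the fix described above: drop test 1 from the top-10 ranking
# only in dev mode, where JIT warm-up inflates its timing. `inDev` and the
# column names are assumptions, not the package's exact code.
top_timings = function(timings, inDev) {
  keep = if (inDev) timings[-1L, ] else timings  # exclude test 1 in dev only
  head(keep[order(-keep$time), ], 10L)
}
```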
R/test.data.table.R
Outdated
gc() # force gc so we can find tests that use relatively larger amounts of RAM
timings[as.integer(num), RSS:=max(ps_mem(),RSS), verbose=FALSE]
why don't we call gc() after ps_mem()? if a test used a lot of memory but released it all, and gc cleans that up, then ps_mem won't report much.
I have in my mind tests that create test input data, call test(), then clear up after the test() call, perhaps by calling rm(). So the test is more than just the test() call. ps_mem() is reporting and finding these top 10 tests. I was more thinking about getting temporary R usage out of the way (by calling gc() first) to reveal the larger test input datasets.
But yes, good idea to move gc() to be after and see what happens...
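Moving gc() to after the measurement would look something like this sketch (the ps_mem() here is an assumed stand-in for the package's internal helper, and the timings structure is simplified to a data.frame):

```r
# Sketch of recording per-test peak RSS with gc() moved to AFTER the
# measurement, as suggested above. ps_mem() is an assumption standing in
# for data.table's internal helper.
ps_mem = function()
  as.numeric(system(paste0("ps -o rss= -p ", Sys.getpid()), intern=TRUE)) / 1024  # MB

record_rss = function(timings, num) {
  # measure first, so memory the test released is still visible in RSS...
  timings$RSS[num] = max(ps_mem(), timings$RSS[num], na.rm=TRUE)
  # ...then gc, so this test's garbage doesn't inflate the next reading
  gc(verbose=FALSE)
  timings
}
```

Note that RSS is a high-water mark held by the OS, so even gc()-freed memory may not lower the ps reading; that is consistent with the before/after results coming out so similar.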
[Two result tables, side by side: gc() before ps_mem() (left) vs gc() after ps_mem() (right).]
The left is almost identical to the earlier result in the comment above, as expected because it's a rerun. The small differences are interesting to note for their scale; those are assumed to be due to the random nature of memory allocation. Btw, for future reference, it takes just under 13 mins to run on my laptop. So it doesn't make much difference either way: comparing the sorted lists of the top 10 IDs, 9 are common.
Looking at tests 1157 (out) and 1542 (in), they are both nice to find and it would be a shame to miss either. Other versions of R on different OSes, compiled and configured differently, and other test files, etc., could be different of course. Let's change the argument to control the
Codecov Report
@@ Coverage Diff @@
## master #5515 +/- ##
==========================================
- Coverage 99.51% 99.44% -0.07%
==========================================
Files 80 80
Lines 14773 14787 +14
==========================================
+ Hits 14701 14705 +4
- Misses 72 82 +10
I ran
Current output below. It shows steps up where we have relatively bigger test data sizes, which could be reduced. The plot is just to show shape and extents; use the table to identify the tests. This growth to 280MB might be enough to tip the overloaded CRAN Windows server into deciding to kill the process, #5507. If we can reduce that down to 170MB it might help avoid that tipping point. I still think the CRAN Windows server is severely overloaded, given the install time there is 5 times longer (214s) than on the non-Windows CRAN servers (42s). If we can reduce RAM usage in tests then we should, just to show willing if nothing else. In the absence of any reply from Uwe, in particular not having the detailed logs asked for, this is the best I can guess and do. The FAIL is on CRAN Windows again now.