Move suggests from tests.Rraw to other.Rraw #5518
Conversation
Might I suggest putting each suggested package in its own test script? That way users with only a select few of the suggested deps can easily pick and choose which suites to run. Generally I'm in favor of splitting up our suite into many more scripts, which facilitates parallelization (e.g. scripts across machines), but maybe it's better to do so by functionality, not required packages.
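A minimal sketch of what that per-package guard could look like, assuming each such script simply skips itself when its suggested dependency is absent (the file name and message below are illustrative, not existing data.table files):

```r
# Hypothetical top of a per-package script, e.g. tests/suggests-R.utils.Rraw:
# skip the whole file cleanly if the suggested dependency isn't installed.
if (!requireNamespace("R.utils", quietly = TRUE)) {
  cat("R.utils not installed; skipping its suggests tests\n")
} else {
  # ... tests that need R.utils (e.g. reading compressed .csv.gz fixtures) ...
}
```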
In other test scripts we are using suggested pkgs as well.

We don't really have a speed problem: other.Rraw takes under 4 seconds currently, and tests.Rraw takes under 1 minute on a normally loaded machine. I suspect the overuse of parallelization might be what's causing the CRAN Windows machine to be so slow, for example. It's slow to even install lots of packages, not just data.table.

FWIW I bring this up because just today I had to hack some unholy stuff into the Google fork's existing script. Yes, partially our problem (hence fixing in the fork instead of filing a PR here), but offering evidence that issues with the script on crowded machines are not a CRAN-only thing.
I disagree here. I think that's outweighed by making the tests more hermetic. There are a lot of tests in our current suite that take a lot of effort to debug because the
In fact I would be fine if it were larger due to beneficial repetitiveness.
Modern editors make this easy 😉 You can give it a spin in VS-code-on-GitHub in a few seconds: go to the repo page, press ., then press Ctrl+Shift+F to search the whole repo. Here are all the fread calls in
The easiest solution here is to have one test file per R/ file, matching the test file name to the source file. Tests should be localized enough that it's rare to need a test covering more than one source file. I know we're very far away from being able to land something like my from-scratch preference, and it's honestly low-return effort. But I do think the main suite has gotten too big. Even something as silly as
Yes, disabling those tests would avoid the change, with some new logic/flag to control that; maybe the OTHER environment variable could be read at the top of tests.Rraw or something. But really the right place for these tests is other.Rraw, so might as well get on with it. The change is just code blocks moving. I can't imagine there will be many PRs touching these tests, but any that do shouldn't be hard to resolve as the same code is just in a different place.
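A rough sketch of that alternative (reading a flag at the top of tests.Rraw); the variable name here is made up to illustrate the idea, not data.table's actual mechanism:

```r
# Hypothetical gating at the top of tests.Rraw: suggests-dependent blocks
# would only run when the flag is set and the package is available.
run_suggests <- isTRUE(as.logical(Sys.getenv("TEST_DATA_TABLE_SUGGESTS", "TRUE")))
if (run_suggests && requireNamespace("R.utils", quietly = TRUE)) {
  # ... suggests-dependent test blocks would remain here instead of moving ...
}
```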
Could you explain why exactly it takes 10-20 min to run? Is it because of the fork, or because of "test sharding"? It takes 46 seconds for me on my laptop.
Then perhaps you can provide the logs I've asked Uwe for but have had no reply to, since you're reporting similar behavior.
You're talking about writing the tests in a different way here; i.e. the test would still be like that if it were in its own file. I don't really see your motivation or problem here. Perhaps you could pick a test like that and show how you would write it.
Also the results of
Codecov Report
@@ Coverage Diff @@
## master #5518 +/- ##
==========================================
- Coverage 99.44% 98.32% -1.12%
==========================================
Files 80 80
Lines 14787 14787
==========================================
- Hits 14705 14540 -165
- Misses 82 247 +165
The other thing to check on your Google server is the settings of
Extra context I can share: the tests are run continuously on TAP, single-threaded (we don't support OpenMP), on shared machines. That time will also include the "from-scratch" build time (think of a docker build on GLCI; more specifically it's on bazel). Typical times from what I've seen are around 2-3 minutes. I am not too surprised the upper tail of test timings can get so high, e.g. on crowded machines.
Happy to help if I can here -- IIUC the CRAN issue is for running R CMD check, right? And the examples are where it's hanging? If so that's a separate issue -- my timing is for
Better handled as its own issue, I'll file as such if I come across the issue again.
This one I'm not sure we'll be too much help on... we have some edits to R's memory handling, I'd have to understand what's been done a lot better to know whether it could give results worth comparing to vanilla R / our CRAN issues. Rprof was not even enabled until somewhat recently.
Sharding in this case is usually the test binary being executed on different machines, rather than different processes on the same machine. But other threads on each machine will be occupied by other workloads; I assume the test is at a low priority.
Thanks. Very interesting.
I'm working on the basis that the examples have finished, due to:

i.e. I'm currently thinking that the

I don't know what "hung up for 20 minutes" means. Does it mean the operating system has detected that the process has been inactive for 20 mins? Or does it mean there is a timeout of 20 mins (set up by Uwe) on the R CMD check task? There is a successful OK run shown in #5507 with Ttotal 1805s (30 mins) and Tcheck 1497s (25 mins), so if a timeout is imposed (which would be very reasonable on machines which are not unreasonably overloaded) then that doesn't really fit. Unless Uwe is pushing it through manually sometimes. Or maybe the 20 min timeout is on a subset of Tcheck. The message does include the word 'probably' and a question mark '?', so it's just an unclear situation by its nature. Perhaps these messages are in R's sources inside

Yes. I am guessing that it's main.R because I do find growing memory usage there. Growing memory usage would lead to the process being swapped out on an overloaded server and perhaps being killed by OOM. It's just a theory given Uwe doesn't reply. However, with that said, the time to even install is over 10x Linux's, so I really hate working on things like this and then people say 'oh right, it was data.table's growing ram usage and they fixed it' when in fact the root cause is (could be, I don't know, I'm forced to guess) a severely overloaded and mismanaged server where even installing takes 10x longer, and installing other packages on the same machine is much slower too.
yes, timings and Rprof for tests.Rraw alone.
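For reference, a minimal sketch of how such timings plus an Rprof memory profile for tests.Rraw alone could be collected, assuming the suite is run via test.data.table() (file name below is arbitrary):

```r
library(data.table)
# Profile the main suite with memory profiling enabled.
Rprof("tests.Rraw.prof", memory.profiling = TRUE)
timing <- system.time(test.data.table())   # runs tests/tests.Rraw by default
Rprof(NULL)
print(timing)
# Top entries by total time, with memory columns included.
head(summaryRprof("tests.Rraw.prof", memory = "both")$by.total, 20)
```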
OpenMP does work, and works very well, though. It's just that, IIUC, Google have decided not to use it and, as far as I know, not to share why. In general I find it very frustrating in this industry when big influential companies and people label software solutions as not working without explaining why. For example with Apple not turning on OpenMP by default: people seem to accept that's just the way it is, and therefore data.table is seen as a problem for using OpenMP. That riles me.
Then surely the problem is that your server is overloaded and mismanaged, no? Here's what I see on my laptop.
If you split out

It might be helpful to concentrate on build and install times to start with. If it takes me 40 seconds to build and 16 seconds to install on my laptop, do you concur on yours? But your server takes much, much longer to do that (which apparently happens on the CRAN Windows server too, but not CRAN Linux). Then it could be very simple: your and Uwe's servers are overloaded.

When it comes to the memory usage of

In other words, let's say Uwe's and your servers installed data.table in a normal time, but then the tests inside

Tinstall is 16 seconds for me on my laptop but over 5 minutes on CRAN Windows.
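For anyone wanting a comparable laptop number, the timing quoted earlier in the thread (~46 seconds) would be taken along these lines; this is only a sketch, and the actual output shown in the original comment isn't reproduced here:

```r
# Wall-clock the default suite (tests.Rraw) on an otherwise idle machine.
system.time(data.table::test.data.table())
```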
yes, but not part of the R sources. The same typo occurs there but only in comments, probably not worth filing a patch:
I think not; I don't see any of the phrases "Check process", "example output", "where/before" or "Most likely this happened" in the R sources (I would expect a hit in src/library/tools/R/QC.R if it's something
Sorry, poor wording on my part -- you're absolutely right. I should say OpenMP is not supported internally, which means potential maintenance headaches if we go against the grain and try to enable it on our own. I am not sure how sensitive the reasoning is so I'm being conservative and not sharing much, but basically it was a choice to invest elsewhere.
Overloaded, probably; mismanaged, I'd say no :) The test is being run many times and is not production-critical -- mismanagement would be if we were prioritizing "more important" stuff over a CI test whose timing is of second- or third-order importance. Sorry if this is more noise than signal -- there's a lot of our CI system / R infra that won't be directly comparable (as I repeat several times below). Nevertheless...
I'm afraid it's not exactly comparable. There's no timing split out for
Running the test binary (basically includes starting python, then starting an R subprocess and executing tests/main.R):
We also don't run
And we also don't use
Tried
First thing that sticks out to me is
Initially I wondered why you would focus on
Are you saying that production-critical tasks are taking place on the same server and those production tasks are taking priority? If so, it's a good point, so long as the non-critical tasks don't somehow impact the production tasks; e.g. if the best-laid plans and priority mechanism don't work for some reason. All I've experienced is production running on, well, servers dedicated to production, and that seems to be a good way to set things up.

Otherwise, mismanagement to me means running so many tasks concurrently that the time spent context switching all those tasks becomes significant; in the extreme, when a box starts to swap. Good to me would be running fewer tasks concurrently so that the total time to run everything is lower. Wasting compute resources on context switching is bad management to me, no? Now, if the tasks aren't really taking longer at all on your servers but are simply being paused while production tasks run, then that's different, and the time the tasks are paused should be reported separately. I suspect what's happening on the CRAN Windows server, based on Tinstall being so high, is that too many tasks are running concurrently and the box is wasting a lot of time context switching all of them.
I think we'll get too far into the weeds to answer externally, but as you note, there's some trade-off, and rest assured there is enough $ flowing around here that getting it right is worth investing a lot in. Plenty of glory to be had for improving things 😄
That should be the gap between

I see a similar gap in the test log of an execution that took >10 min.
My first guess was that, because I ran this interactively, the

That shouldn't be run within our test suite, or within the Google code orchestrating the test suite, so beats me. It also persists when I ran
PS just remembered I am running on 1.14.4; I haven't added any of the other commits between there and GH
Great. Could you run

Ok, yep, when I'm done with #5520 would be a good time, if you could try from master. Will LYK.
I guess so. But then in that case it means that the process isn't being paused in a good way, but instead is spending/wasting its time context switching. Can you reduce the number of tasks on that machine, or at least see how many tasks there are vs how many logical CPUs there are?
No -- since it's batched in with the regular CI flow, it gets scheduled beyond my control. And whenever I launch the job myself, I haven't seen any latency issue.
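A quick way to eyeball that from within R on a Linux box (a sketch; /proc is not available on Windows):

```r
# Logical CPUs visible to R.
parallel::detectCores(logical = TRUE)
# 1/5/15-minute load averages; sustained load well above the core count
# suggests heavy context switching.
readLines("/proc/loadavg")
```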
That will be pretty far down the rabbit hole as there's a few layers of abstraction between what I see easily (a web UI) and the bare metal running the jobs... let's revisit if we've really got no other leads. Tamping down memory usage in the suite seems pretty promising so far.
Please note that #5501 speeds up
@MichaelChirico ok, now that #5520 is merged, would you mind testing test.data.table() from master on your loaded server to see if it helps vs 1.14.4, as mentioned above? I saw you said it's not as trivial as
I'll give it a try. Need to look into how to do so without necessarily using latest dev as our "prod" version (since the issue wasn't reproducing in my local test env). Check back in a few days.
No worries, I'll prepare the minimal release to CRAN and maybe send it by tomorrow. The worst that can happen is that it just doesn't fix CRAN Windows. Unless I mess something up.
OK, in that case it'll be easier with a CRAN-tagged release.
Figured out a way to do this (quite easily in fact) where I can run the test 100 times & report some basic stats. Note that the caveats mentioned above about how to interpret these timings still apply.
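Something along these lines would give repeated-run stats locally; the Google setup described above is sharded and single-threaded, so treat this only as an illustration of the idea, not the actual harness used:

```r
# Run the suite repeatedly and summarise elapsed times (n kept small here;
# the 100 runs mentioned above would take a while on a laptop).
n <- 10
elapsed <- replicate(n, system.time(data.table::test.data.table())["elapsed"])
summary(elapsed)
sd(elapsed)
```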
Looks like a pretty substantial improvement already, great work! 1 minute off the average and almost 66% off the max (99th %ile-ish) time!
Great! Thanks. It's very nice to have it over multiple runs like that when we're dealing with memory effects. Those timings are for
Yes it includes start-up time, and turn-down time for that matter. Also recall that we are forced to be single-threaded; glancing at some logs, I'm seeing ~60-90 seconds for the exact comparison, e.g. "All 9916 tests (last 2163) in tests/tests.Rraw completed ok in 00:01:30 elapsed (00:01:21 cpu)" Just added the 1.14.6 timings since it's now shown up on CRAN. |
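For the single-threaded comparison, a sketch of how that could be forced in a local run, assuming setDTthreads() is the control point:

```r
library(data.table)
setDTthreads(1)                  # force the suite to run single-threaded
system.time(test.data.table())   # roughly comparable to the sharded CI timing
```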
Oh, nice, I just learned I can
(looks like there's about 3 seconds of overhead for start-up/turn-down; most of the timing difference vs what you're seeing should be due to the single threading)
The tests in the test suite aren't big enough for threads to be utilized; there's a throttle. I ran
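For context on the throttle mentioned here, a sketch of how it can be inspected and adjusted, assuming a recent data.table where getDTthreads(verbose=TRUE) reports it and setDTthreads() accepts a throttle argument (the value below is purely illustrative):

```r
library(data.table)
getDTthreads(verbose = TRUE)    # reports threads in use and the throttle
# Raise the per-thread minimum row count so even fewer small tasks go parallel.
setDTthreads(throttle = 65536)
```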
Closes #5516
- R.utils: lots of fread tests use compressed input data files, so it's more fundamental to the test environment
- bit64: already decided in #5516 to leave bit64 in as it's fundamental really