Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JOSS review: Consider long compute times #10

Closed
mmrabe opened this issue Nov 3, 2020 · 3 comments
Closed

JOSS review: Consider long compute times #10

mmrabe opened this issue Nov 3, 2020 · 3 comments

Comments

@mmrabe
Copy link

mmrabe commented Nov 3, 2020

Refers to JOSS submission openjournals/joss-reviews#2810

Under some circumstances such as sampling effect_size from a distribution and/or large number of iterations B, the computation may take very long. As the computation takes place in your C++ code, the session may appear frozen without any option of aborting or checking the status. Please consider one or more of the following:

  • Parallelizing your code., e.g. with OpenMP. Ideally, this is just one additional line of code before the loop to parallelize and one or two lines in Makevars
  • Displaying some sort of progress bar or a message every x% so that the user has some idea of how long the computation may take
  • Checking for SIGINT or using R_CheckUserInterrupt() in your loop to allow the user to abort
@ClaudioZandonella
Copy link
Owner

Dear @mmrabe thanks for underlining this issue. You are absolutely right, in the case of a large number of iterations the computation may tale very long. Without any messages, users don't know if the function is still running or stuck.

Considering your suggestions, all three are important improvements. However, I have currently implemented only a progress bar.

Parallelizing the code. This would be very important. I have seen that using OpenMP is quite straightforward but it also requires more dependencies and configurations that could be problematic on different OS. I am not an expert in C++ (actually this was my first time coding in C++) so, before implementing this, I would prefer to study more about parallelization to be sure I don't break the code. I will add the parallelization in the list of future plans (see issue #7) and I am sure once implemented it will be a great plus.

Checking for SIGINT. Probably I miss something here. I agree it is important to allow users to abort running computations. But this can be already obtained with the stop button in R- studio or "Ctrl-C" in the console. This already kills the Rsession process and looking at the task manager there are no other processes that keep running. So probably is not necessary to add a further check in the C++ code. But is likely that I miss something, so let me know if I am wrong.

Progress bar. I added a progress bar using the package {pbapply}. For the progress bar, I considered the lapply() loop on the sampled effect_size , as the other for loop is internal to the C++ code. As a consequence, the progress is displayed only when a function is used to define the effect_size . This is reasonable as long computational times occur in this case. For a single value of effect_size , instead, no progress bar is displayed (there is only one step in the progress bar so I hidden it) but computational times are usually less than 1-2 seconds for 10'000 iterations or less than 10 seconds for 100'000 iterations. Computational time increases for a greater number of iterations but using more than 10'000 iterations is unlikely as results are already stable. Finally, the display of the progress bar is conditional to the argument display_message = TRUE for both functions (prospective and retrospective).

Thanks again for your suggestions, as this improved a lot the usability of the package. Let me know if there are further issues on this topic o I can close it!

@mmrabe
Copy link
Author

mmrabe commented Nov 23, 2020

As I said above, one of the three solutions would be sufficient. Your progress bar solution seems fine, so you can consider this issue solved with respect to the JOSS review.

Concerning the SIGINT suggestion: You are right that it is always possible to kill the R session but this may not be the most preferable option for the user. Killing the session also usually means dumping the entire workspace, which may contain other important computation results. When clicking the STOP button in R Studio or hitting Ctrl+C, the user typically expects that only the currently running function call is aborted, not the entire R session. As far as I can tell, PRDA function calls usually don't take very long and as long as control is handed back to R every now and then, the user will be able to actually abort the PRDA call. But if for some reason the computation takes very long because of many C++ loop iterations, checking for a SIGINT inside the C++ loop may be useful to let the user get out of a long computation without losing the whole R session.

@ClaudioZandonella
Copy link
Owner

@mmrabe thank you for the clarification. I have merged the changes in the master branch and closed the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants