Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: fixes for launching GPU detection #2713

Merged
merged 9 commits into from
Sep 26, 2018
Merged

client: fixes for launching GPU detection #2713

merged 9 commits into from
Sep 26, 2018

Conversation

JuhaSointusalo
Copy link
Contributor

Launching GPU detection has a few issues.

  • doesn't work reliably with --detach_console
  • doesn't work if client is run from $PATH
  • if it fails the user is presented with hard to understand error code
  • the code throws compiler warnings

This PR fixes those issues. To support the fixes get_real_executable_path() is implemented for several more operating systems and run_program() is changed to use $PATH. See individual commit messages for details.

Fixes #2657.

The function will be used in launching GPU detection.
GPU detection launches a copy of the client based on the contents of
argv[0]. If argv[0] doesn't include path to the client and the current
directory is not BOINC program directory then the GPU detection fails.
Hard coding the client name in --detach_console code only works because
of the extra work CreateProcess() does to find the executable. Launching
GPU detection uses CreateProcess() in a way that it doesn't try to find
the executable.

Fix this by using the full path to the client executable in
--detach_console code. This in turn makes the full path available to GPU
detection code. This also takes care of the case of the client
executable not being named boinc.exe.
GPU detection tries to launch a copy of the client using argv[0] but
argv[0] doesn't always include path. If path is not included and current
directory is not the same as BOINC program directory then the GPU
detection fails.

Use full path to the client executable instead if possible. Quote the
path on Windows to handle paths with spaces.
GPU detection tries to launch a copy of the client using full path to
the client executable but getting the full path may fail. In this case
the code falls back to using argv[0] but argv[0] doesn't always contain
path to the client. This may happen if the client was started from shell
and the shell used $PATH to find the client executable.

Change run_program() to search $PATH for the program file if the file
name doesn't include path.
On Unixes, if GPU detection fails the client logs an error message
containing a raw value from waitpid() call. This raw value generally
requires writing an external program to decode it before it is useful
for troubleshooting.

Decode the raw value to something more useful to humans. Make a similar
change to Windows code too.
GPU detection code goes back and forth between program and data
directories but this is unnecessary because run_program() already takes
care of running the program in correct directory.

Fixes two "ignoring return value" warnings from GCC/GLIBC.
When building BOINC on MSYS+MinGW the build time environment does have
/proc but the run time environment never has /proc.

BSDs consider /proc to be an optional feature and, as such, /proc may be
present at build time but missing at run time or vice versa.

This makes checking for /proc/self/exe in configure unreliable and the
check is better done at run time. get_real_executable_path() is the only
user of the test result and has already been changed to do the check at
run time.
@JuhaSointusalo
Copy link
Contributor Author

JuhaSointusalo commented Sep 22, 2018

@CharlieFenton Please test.

get_real_executable_path() is implemented on OSX using _NSGetExecutablePath(). Since GPU detection falls back to argv[0] if get_real_executable_path() fails you'll need to add some test code. I used something like this at the top of client's main():

char path[MAXPATHLEN] = "";
int ret = get_real_executable_path(path, sizeof(path));
fprintf(stderr, "ret %d path %s\n", ret, path);
exit(0);

If _NSGetExecutablePath() doesn't work or you think it's not appropriate for the job an alternative I found is sysctl {CTL_KERN, KERN_PROCNAME} (kern.procname), though that one looks like not exactly well documented.

@RichardHaselgrove
Copy link
Contributor

Built and tested under Windows 7 / VS2013.

Checked both problems reported in #2657 and #2029: successful GPU detection in both cases.

@CharlieFenton
Copy link
Contributor

@JuhaSointusalo: _NSGetExecutablePath() works correctly and is appropriate. However, I disagree in part with your commit 1296c86. I agree that the command chdir(client_dir) is redundant, since that is also done inside run_program(). However, I believe that without chdir(data_dir) after run_program() to restore the working directory, the original instance of the client will no longer properly access the BOINC Data directory. Am I missing something here?

@JuhaSointusalo
Copy link
Contributor Author

However, I believe that without chdir(data_dir) after run_program() to restore the working directory, the original instance of the client will no longer properly access the BOINC Data directory.

On Windows, the directory that is passed to run_program() is only used in a call to CreateProcess(). CreateProcess() shouldn't modify the parent process in any way.

On others, the directory is used in a call to chdir() but that happens after fork(), that is, it's changing the current working directory of the child process.

boinc/lib/util.cpp

Lines 471 to 476 in 460a6db

if (pid == 0) {
if (dir) {
retval = chdir(dir);
if (retval) return retval;
}
execv(file, argv);

Maybe it's the return after failed chdir() that's confusing here. That's actually a bug in run_program() and the code should call _exit() instead. If the code returns we'll have two processes running, the parent and a crippled copy of it.

I intend to review all child process launching code in BOINC to see if these bugs are made elsewhere but that's another day another PR.

@davidpanderson
Copy link
Contributor

davidpanderson commented Sep 24, 2018 via email

@JuhaSointusalo
Copy link
Contributor Author

@CharlieFenton

I logged current working directory before and after run_program() and in both log entries it was the data directory.

@RichardHaselgrove
Copy link
Contributor

Although Windows isn't the primary concern of recent comments, the private build I made yesterday has been running since

23-Sep-2018 14:07:24 [---] Starting BOINC client version 7.13.0 for windows_x86_64

The report I made earlier on tests for the original reported problems was with a secondary instance: the primary client has been running constantly and updating files in the data directory as normal. The list of files updated since I started typing this report is

client_state.xml
daily_xfer_history.xml
stdoutdae.txt
client_state_prev.xml
job_log_setiathome.berkeley.edu.txt
statistics_setiathome.berkeley.edu.xml
sched_reply_setiathome.berkeley.edu.xml
sched_request_setiathome.berkeley.edu.xml
job_log_einstein.phys.uwm.edu.txt

@CharlieFenton
Copy link
Contributor

@JuhaSointusalo: You are correct. I misread the run_program code. I now see that chdir(dir) is in the forked process, not the parent. Since the arguments passed to the child include -dir quoted_data_dir, the client_dir argument to run_program is not needed and NULL could be passed instead. If you do that, all references to client_dir should be removed, including the entire COPROCS::set_path_to_client() method and the call to it in main().

This log_flags message should also be removed, as it is no longer correct since you have modified the code to use an absolute path:

       msg_printf(0, MSG_INFO,
            "[coproc] relative to directory %s",
            client_dir
        );

@CharlieFenton
Copy link
Contributor

Has anyone tested this on Mac?
I was hoping to make the 7.14 branch today.

@davidpanderson: I tested it and further changes are needed, as I wrote in my last comment. At the very least, the incorrect log_flags message must be removed. All references to the now unnecessary static variable client_dir should also be removed, including the entire COPROCS::set_path_to_client() method and the call to it in main().

@davidpanderson
Copy link
Contributor

Juha, can you please address Charlie's suggestions?
Thanks -- D

@JuhaSointusalo
Copy link
Contributor Author

since you have modified the code to use an absolute path

Well, the code tries to use absolute path but that doesn't work everywhere. I know it doesn't work on OpenBSD at all because OpenBSD provides no way to find out the filename and path of the current program. It also doesn't work on NetBSD 7 if /proc is not mounted (the sysctl is NetBSD 8 thing). I don't know for sure it won't ever fail on other systems.

One could argue that OpenBSD doesn't support GPU crunching and never will and we don't need to care about it but I still would like to keep argv[0] as a backup. argv[0] could be something like ./boinc and then we need to know what directory the client was started in. Calling the variable client_dir is a bit misleading but I'm not sure what to call it then. dir_that_was_cwd_at_startup?

Sidenote: I suppose implementing get_real_executable_path() on BSDs and Solaris was kinda unnecessary. I was searching for something and came across info on how to find out executable path on those. So I figured why not use the info, if we ever decide to have good support for BSDs then there won't be need to find that information again.

@CharlieFenton
Copy link
Contributor

I still would like to keep argv[0] as a backup

Fair enough. Perhaps this is getting too fussy, but it would be nice if this log_flags message would appear only when get_real_executable_path() failed:

        msg_printf(0, MSG_INFO,
            "[coproc] relative to directory %s",
            client_dir
        );

@JuhaSointusalo
Copy link
Contributor Author

@CharlieFenton Ready for re-review.

@CharlieFenton
Copy link
Contributor

@JuhaSointusalo I tested this and it works as intended. Thank you. I approve this PR for merging into master.

@davidpanderson davidpanderson merged commit 388dcea into BOINC:master Sep 26, 2018
@JuhaSointusalo JuhaSointusalo deleted the client-gpu-detection branch September 28, 2018 19:26
TheAspens pushed a commit that referenced this pull request Oct 9, 2018
…tion"

This reverts commit 388dcea, reversing
changes made to 188e7f2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(Windows) GPU detection fails when started from command line (take 2)
6 participants