-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[icu] Enable parallel builds #6695
Conversation
toolsrc/src/vcpkg/base/system.cpp
Outdated
|
||
int System::get_num_logical_cores() | ||
{ | ||
int num_cores = std::thread::hardware_concurrency(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: Thats capped at 64 on windows or even lower (1 process group). To get the real number of cores you have to iterate over all groups and ask for the number of cores. The problem with that is it might not be the number asigned to the job task in an HPC env. For that you have to read in the typical ENV variables which contains the assigned cores.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, you learn something new every day. I'm more of a Linux guy so I had never heard of processor groups. I also had not considered the HPC use case where you queue jobs that are assigned a limited amount of resources. I'd guess that checking the cpu affinities before the number of cores on the machine should be sufficient. Though on Linux I would hope people are using cgroups. I wonder if the get_hardware_concurrency function returns the number of logical cores in your cgroup.
Edit:
How does batching software on windows limit the number of cpus a process and its children have access to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem with std::thread::hardware_concurrency()
is that it is only a hint. So it probably does not always return what you want.
How does batching software on windows limit the number of cpus a process and its children have access to?
One process will always only spawn threads within its own group unless the process itself moves newly-created threads to a different group by explicitly assigning affinities. This makes the process a multi group process (Fun Fact: Looking at the affinities of such a multi group process in the task manager crashes the task manger). There is no external way of doing it and there is no way of telling in which group windows decides to spawn a new process (which is problematic if your HPC assigned resources span two groups). Furthermore groups might even be asymmetrical in core count if requested.
That is why in my simulation program I read in the env variable CCP_COREIDS
(HPC_PACK) and assign each thread to one COREID by hand because threads on windows are just broken ;)
I wonder if the get_hardware_concurrency function returns the number of logical cores in your cgroup.
Cannot answer that, never tried something like that because our nodes run windows. Maybe @cenit knows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at documentation for process groups it appears windows makes guarantees about the topology of logical cores in a process groups (that they're as physically/logically close as possible and/or respecting some Numa group settings) and that windows will minimize the number of process groups. What happens if the number of cores you are supposed to be allocated is less than the number in a process group?
Either way I was hoping that I wouldn't have to implement a per platform solution but it's looking like I might have to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Neumann-A @ras0219-msft What do you guys think of adding two environment variables/command line options that override the automated cpu detection. VCPKG_MAX_LOGICAL_PROCESSOR_COUNT
which sets an upper bound on the number of logical processors used and VCPKG_LOGICAL_PROCESSOR_COUNT
which overrides the number of processors automatically detected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbezault: Why two? Is one not enough? If VCPKG_LOGICAL_PROCESSOR_COUNT
is not set you could use the automated detection else use whatever the triplet says. (probably requires usage of get_triplet_environment.cmake)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the single variable is probably fine, I was just thinking of the corner case where you still want some automated detection for whatever reason but want to cap it. At that point though if you know what a reasonable cap is you probably also know what you want the parallelism to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a VCPKG_MAX_CONCURRENCY
setting makes sense. If not set, we will use best-effort heuristics (for now, probably just hardware_concurrency()
, but in the future we could definitely do platform-specifics). If set, we just use that setting directly.
This enables both "fixing" our heuristics upwards when they're wrong (due to complex machine architectures) as well as downwards when there are other constraints we aren't considering (RAM limits, system interactivity, other concurrent processes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's a good heuristic to determine if I want this to be a command line setting or an environment variable?
trigger ci
…in) from ports since they are broken and usually break non-win32 ports
endif() | ||
endif() | ||
elseif(out_contents MATCHES "mt : general error c101008d: " OR out_contents MATCHES "mt.exe : general error c101008d: ") | ||
# Antivirus workaround - occasionally files are locked and cause mt.exe to fail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably print a message here to inform the user that deactivating antivirus for the vcpkg buildtree folder could speed up the build if he encounters this error
* Add VCPKG_NUM_LOGICAL_CORES * break out logic that retries running a command several times into its own function * Parallelize icu
Add a VCPKG_CONCURRENCY variable for use in portfiles.
Use VCPKG_CONCURRENCY in icu as a parameter to make.