Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make system module map generation faster and fully hermetic #280

Merged
merged 1 commit into from
Mar 14, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 0 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -251,11 +251,6 @@ deps (also known as "depend on what you use") for `cc_*` rules. This feature
can be enabled by enabling the `layering_check` feature on a per-target,
per-package or global basis.

If one of toolchain or sysroot are specified via an absolute path rather than
managed by Bazel, the `layering_check` feature may require running
`bazel clean --expunge` after making changes to the set of header files
installed on the host.

## Prior Art

Other examples of toolchain configuration:
Expand Down
2 changes: 1 addition & 1 deletion toolchain/internal/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -12,4 +12,4 @@
# See the License for the specific language governing permissions and
# limitations under the License.

exports_files(["generate_system_module_map.sh"])
exports_files(["template.modulemap"])
15 changes: 14 additions & 1 deletion toolchain/internal/configure.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -545,7 +545,14 @@ cc_toolchain(
unfiltered_compile_flags = _list_to_string(_dict_value(toolchain_info.unfiltered_compile_flags_dict, target_pair)),
extra_files_str = extra_files_str,
host_tools_info = host_tools_info,
cxx_builtin_include_directories = _list_to_string(cxx_builtin_include_directories),
cxx_builtin_include_directories = _list_to_string([
# Filter out non-existing directories with absolute paths as they
# result in a -Wincomplete-umbrella warning when mentioned in the
# system module map.
dir
for dir in cxx_builtin_include_directories
if _is_hermetic_or_exists(rctx, dir, sysroot_prefix)
]),
extra_compiler_files = ("\"%s\"," % str(toolchain_info.extra_compiler_files)) if toolchain_info.extra_compiler_files else "",
)

Expand Down Expand Up @@ -586,3 +593,9 @@ native_binary(
llvm_dist_label_prefix = llvm_dist_label_prefix,
host_dl_ext = host_dl_ext,
)

def _is_hermetic_or_exists(rctx, path, sysroot_prefix):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please help me understand how to make RBE work in this case? For example when running bazel from macOS and executing in a remote Linux executor, /usr/include does not exist on macOS but exists on Linux, this filter will remove /usr/include from cxx_builtin_include_directories and that could result in bazel validation failure ...includes the following non-builtin files with absolute paths (if these are builtin files, make sure these paths are in your toolchain).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have this scenario in mind when I wrote this, my assumption being that you would only use this toolchain for RBE together with a hermetic sysroot.

This function only makes sense if you are actually targetting the host, so we should probably disable it when cross compiling and just use all directories unfiltered. If you want to give that a try, I can help you implement and test this approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'm not sure of a reasonable implementation. What would you suggest?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _is_hermetic_or_exists function should be fed the information whether exec_os equals target_os and only perform the existence check if so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was looking at those, but not sure that is reliable. For RBE, exec_os could be the same as target_os, right? For example I can define a toolchain like the following and while produce target for linux as well.

llvm.toolchain(
    name = "llvm_toolchain_linux_exec",
    exec_arch = "amd64",
    exec_os = "linux",
    llvm_version = LLVM_VERSION,
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are completely right, I guess we need to match the host OS (obtained e.g. from repository_ctx.os) against the target OS (this determines the shape of the sysroot).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me give that a try.

Copy link
Contributor

@honnix honnix Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rctx.os.name for mac is like mac os x, which is not the same as the convention used by target_os :( But otherwise this should be a reasonable approach. We just need to do some formalisation.

path = path.replace("%sysroot%", sysroot_prefix)
if not path.startswith("/"):
return True
return rctx.path(path).exists
35 changes: 0 additions & 35 deletions toolchain/internal/generate_system_module_map.sh

This file was deleted.

93 changes: 56 additions & 37 deletions toolchain/internal/system_module_map.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -14,62 +14,81 @@

load("@bazel_skylib//lib:paths.bzl", "paths")

def _textual_header(file, *, include_prefixes, execroot_prefix):
path = file.path
for include_prefix in include_prefixes:
if path.startswith(include_prefix):
return " textual header \"{}{}\"".format(execroot_prefix, path)

# The file is not under any of the include prefixes,
return None

def _umbrella_submodule(path):
return """
module "{path}" {{
umbrella "{path}"
}}""".format(path = path)

def _system_module_map(ctx):
module_map = ctx.actions.declare_file(ctx.attr.name + ".modulemap")

dirs = []
non_hermetic = False
for dir in ctx.attr.cxx_builtin_include_directories:
if ctx.attr.sysroot_path and dir.startswith("%sysroot%"):
dir = ctx.attr.sysroot_path + dir[len("%sysroot%"):]
if dir.startswith("/"):
non_hermetic = True
dirs.append(paths.normalize(dir))

# If the action references a file outside of the execroot, it isn't safe to
# cache or run remotely.
execution_requirements = {}
if non_hermetic:
execution_requirements = {
"no-cache": "",
"no-remote": "",
}
absolute_path_dirs = []
relative_include_prefixes = []
for include_dir in ctx.attr.cxx_builtin_include_directories:
if ctx.attr.sysroot_path and include_dir.startswith("%sysroot%"):
include_dir = ctx.attr.sysroot_path + include_dir[len("%sysroot%"):]
include_dir = paths.normalize(include_dir)
if include_dir.startswith("/"):
absolute_path_dirs.append(include_dir)
else:
relative_include_prefixes.append(include_dir + "/")

# The builtin include directories are relative to the execroot, but the
# paths in the module map must be relative to the directory that contains
# the module map.
execroot_prefix = (module_map.dirname.count("/") + 1) * "../"
textual_header_closure = lambda file: _textual_header(
file,
include_prefixes = relative_include_prefixes,
execroot_prefix = execroot_prefix,
)

ctx.actions.run_shell(
outputs = [module_map],
inputs = ctx.attr.cxx_builtin_include_files[DefaultInfo].files,
command = """
{tool} "$@" > {module_map}
""".format(
tool = ctx.executable._generate_system_module_map.path,
module_map = module_map.path,
),
arguments = dirs,
tools = [ctx.executable._generate_system_module_map],
env = {"EXECROOT_PREFIX": execroot_prefix},
execution_requirements = execution_requirements,
mnemonic = "LLVMSystemModuleMap",
progress_message = "Generating system module map",
template_dict = ctx.actions.template_dict()
template_dict.add_joined(
"%textual_headers%",
ctx.attr.cxx_builtin_include_files[DefaultInfo].files,
join_with = "\n",
map_each = textual_header_closure,
allow_closure = True,
)
template_dict.add_joined(
"%umbrella_submodules%",
depset(absolute_path_dirs),
join_with = "\n",
map_each = _umbrella_submodule,
)

ctx.actions.expand_template(
template = ctx.file._module_map_template,
output = module_map,
computed_substitutions = template_dict,
)
return DefaultInfo(files = depset([module_map]))

system_module_map = rule(
doc = """Generates a Clang module map for the toolchain and sysroot headers.""",
doc = """Generates a Clang module map for the toolchain and sysroot headers.

Files under the configured built-in include directories that are managed by
Bazel are included as textual headers. All directories referenced by
absolute paths are included as umbrella submodules.""",
implementation = _system_module_map,
attrs = {
"cxx_builtin_include_files": attr.label(mandatory = True),
"cxx_builtin_include_directories": attr.string_list(mandatory = True),
"sysroot_path": attr.string(),
"_generate_system_module_map": attr.label(
default = ":generate_system_module_map.sh",
"_module_map_template": attr.label(
default = "template.modulemap",
allow_single_file = True,
cfg = "exec",
executable = True,
),
},
)
4 changes: 4 additions & 0 deletions toolchain/internal/template.modulemap
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
module "crosstool" [system] {
%textual_headers%
%umbrella_submodules%
}
Loading