Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generated headers should work the same as checked-in headers (C++) #13803

Open
GMNGeoffrey opened this issue Aug 4, 2021 · 6 comments
Open
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Rules-CPP Issues for C++ rules type: support / not a bug (process)

Comments

@GMNGeoffrey
Copy link
Contributor

GMNGeoffrey commented Aug 4, 2021

Description of the problem / feature request:

When you generate a C++ header it should behave exactly as if you had checked in the header as a source file. Currently, inclusion ordering gets totally messed up in strange and hard-to-debug ways.

Feature requests: what underlying problem are you trying to solve with this feature?

When processing an #include directive, all major compilers (clang, gcc, MSVC) search the directory the current source file before looking in directories provided on the command line. I don't think this is specified in the C++ standard, but it appears to be the defacto standard. In Bazel, when you generate a file, it is generated in the genfiles/binfiles directory bazel-out/<build option>/bin. So if you try to include the generated file it is not found in the same directory. It gets even worse if there's some other header with the same name on the include path, which is what the attached repro demonstrates.

I think I've seen some Bazel documentation that says Bazel symlinks together binfiles, genfiles, and source roots to form one virtual directory (maybe it was the line in output directory layout about execroot?), but that doesn't seem to be the case.

You can work around this by adding includes = ["<directory>"] to the target to add the directory path to the include search, and that works for a bit. But if your dependency is exporting a public header with the name and switches from includes to the preferred strip_include_prefix then you're in trouble again. The -I for the virtual_includes is searched before the -isystem for the includes argument on your rule. You have to create a cc_library rule that wraps your generated file and uses strip_include_prefix to move it into the top level directory. And then the poison spreads because you've created a header that can be included by any rule with no directory prefix (in fact just such a workaround is I think why I ran into an Options.inc that was being exported with no header in the first place). And it's even worse than that. The order that the virtual_includes are added is dependent on the order of deps. So if your cc_library that wraps the generated header sorts after the dependency or transitive dependency through which you get the conflicting header, you lose it again.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Download and extract any of the attached archives and run bazel build Foo. Note that bazel build Bar Baz succeeds because their options.inc are source files instead of generated.

bazel-include-repro-1.tar.gz demonstrates the first case where you end up with a private header from a dependency. You can work around this by adding includes.
bazel-include-repro-2.tar.gz demonstrates the second case that can't be worked around using includes but can be worked around with strip_include_prefix dependent on dependency ordering.
bazel-include-repro-3.tar.gz demonstrates how the strip_include_prefix workaround falls over if the dependency that (even transitively) provides the wrong include is sorted before the cc_library wrapper around the genfile.

What operating system are you running Bazel on?

Linux Debian

What's the output of bazel info release?

4.1.0

Have you found anything relevant by searching the web?

This looks like an extension of #4463 which was closed as fixed. That one only covered using full include paths, I think.

Potential solutions

Use symlinks to make generated files appear exactly like source files in the execroot. I thought this was something Bazel at least claimed to do.

Add the corresponding generated directory of the current file as the first -I argument to any C++ compile command. This is what CMake does, based on looking at compile_commands.json

The first option sounds like the right long term fix if it's in Bazel's plans, but the second option would be simpler and relieve a lot of pain right now. I'm not sure what this would do to compile times, but given the number of virtual_includes directories I see on the compile commands right now, it doesn't seem too expensive. One option would be to use another virtual_includes directory to do this and only move over files in the rule's srcs.

@chandlerc
Copy link

FWIW, I've thought a lot more about this, and I'm not sure its really a good idea...

Finding generated files via file-relative #include search seems fundamentally broken with any build system that builds outside of the source tree. I know that LLVM does this (extensively), but from what I can tell it does so by having really extreme pollution of its header search paths and prefixing a vast number of files to ensure a lack of collisions. Without those prefixes for example, every LLVM target's headers would collide with each other -- they're all in a shared header search path, both source files and generated files. In part, this hides the fact that the source and generated files are not in the same directory, but at the cost of collision risk.

@GMNGeoffrey -- I don't think CMake add a -I for the current directories by default based on this:
https://cmake.org/cmake/help/latest/variable/CMAKE_INCLUDE_CURRENT_DIR.html#:~:text=default%20CMAKE_INCLUDE_CURRENT_DIR%20is-,OFF,-.

LLVM enables this option explicitly, I would guess because of all its generated headers:
https://github.com/llvm/llvm-project/blob/c6ebc651b6fac9cf1d9f8c00ea49d29093003f85/llvm/CMakeLists.txt#L936

And for LLVM's targets, which have lots of these kinds of generated outputs with more complex layouts and complex include structures, LLVM has dedicated code to add the generated output directory of that target to the header search:
https://github.com/llvm/llvm-project/blob/8da3b7d857298a306973ea8f78c35adb5ba89837/llvm/cmake/modules/AddLLVM.cmake#L1324

And for other parts of the targets, they add even more dedicated searching of generated output trees:
https://github.com/llvm/llvm-project/blob/4c2e01232cfc397a438eab57a8f9e3a849a6e9f1/llvm/lib/Target/AArch64/TargetInfo/CMakeLists.txt#L1

So my feeling is that for projects currently using generated headers like this, workarounds matching the CMake ones are probably more appropriate than using symlinks to fuse the directory structures.

I'd suggest what would minimize the pain here without undue complexity is:

  1. To make includes work for both the source and generated trees -- it doesn't seem to currently?
  2. Add a non-transitive form of includes instead of requiring folks to conjure the correct copts spelling for different C++ toolchains. I think this has been requested before / elsewhere.

Combined, I think that would let us work around this explicitly in build rules like LLVM's that need it in a similar manner to how CMake and other build systems without symlink forests work around this.

@GMNGeoffrey
Copy link
Contributor Author

(catching up because I was out of office)

Thanks for digging into this Chandler! In particular, thanks for pointing out that LLVM has to configure this explicitly in CMake. To respond to your suggestions directly:

  1. includes does include both source and generated trees, but unfortunately it adds them as -isystem and near the end of the compilation command 😞 I believe the -isystem is for historical reasons internal to google (things where this were being used tended to be things where we wanted to ignore various diagnostics because they were outside our control). So unfortunately the virtual includes of dependencies get searched first.
  2. This seems like a good thing regardless. The recommendation to just add a --copt=-I is I think pretty misguided, since it doesn't work well for generated directories and when the repository is used as an external repository, I think. local_includes seems like a straightforward addition and its naming mirrors local_defines.

The issue of colliding with _virtual_includes from dependencies remains though. Should local_includes get added before _virtual_includes directories? Should regular includes? Does Bazel provide any guarantee about include directory ordering? Should it, or should that be deliberately undefined (since it's super fragile)?

A further issue with local_includes is if an exported header file includes a generated header and is then included by a header in another rule. Then you're back to needing includes or strip_include_prefix to get down to an unprefixed path and polluting the include path of all dependent targets. Here's an example
bazel-include-repro-4.tar.gz. Try bazel build DepOnLocalInclude.

The particular place where this came up in LLVM, is

https://github.com/llvm/llvm-project/blob/9ed4a94d6451046a51ef393cd62f00710820a7e8/utils/bazel/llvm-project-overlay/llvm/BUILD.bazel#L2330-L2354:

gentbl(
    name = "LibOptionsTableGen",
    strip_include_prefix = "lib/ToolDrivers/llvm-lib",
    tbl_outs = [(
        "-gen-opt-parser-defs",
        "lib/ToolDrivers/llvm-lib/Options.inc",
    )],
    tblgen = ":llvm-tblgen",
    td_file = "lib/ToolDrivers/llvm-lib/Options.td",
    td_srcs = ["include/llvm/Option/OptParser.td"],
)

cc_library(
    name = "LibDriver",
    srcs = glob(["lib/ToolDrivers/llvm-lib/*.cpp"]),
    hdrs = glob(["include/llvm/ToolDrivers/llvm-lib/*.h"]),
    copts = llvm_copts,
    deps = [
        ":LibOptionsTableGen",
        # ...
    ],
)

and

https://github.com/llvm/llvm-project/blob/9ed4a94d6451046a51ef393cd62f00710820a7e8/utils/bazel/llvm-project-overlay/lld/BUILD.bazel#L100-L145

gentbl(
    name = "coff_options_inc_gen",
    # See https://github.com/bazelbuild/bazel/issues/13803
    strip_include_prefix = "COFF",
    tbl_outs = [(
        "-gen-opt-parser-defs",
        "COFF/Options.inc",
    )],
    tblgen = "//llvm:llvm-tblgen",
    td_file = "COFF/Options.td",
    td_srcs = ["//llvm:include/llvm/Option/OptParser.td"],
)

cc_library(
    name = "COFF",
    srcs = glob([
        "COFF/*.cpp",
        "COFF/*.h",
    ]),
    includes = ["COFF"],
    deps = [
        ":coff_options_inc_gen",
        "//llvm:LibDriver",
        # ...
    ],
)

where there's a #include "Options.inc" somewhere in a COFF file. In this case, none of the exported headers are including the generated file, so I think we could handle instead by a local_includes = "lib/ToolDrivers/llvm-dlltool/" on DlltoolDriver instead of the strip_include_prefix. The issue gets a little trickier when tablegen is being used to generate a large number of generated files and so it's convenient to use the generated cc_library and treat these as exported (textual except for #12424) hdrs. Then they need to be exported somehow. I think we can just take the loss on that one because you really shouldn't be exporting unprefixed headers anyway.

I still fundamentally think that whether a file is generated or not shouldn't matter past the initial configuration to create the generation and declare a dependency on it.

@oquenchil oquenchil added P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Rules-CPP Issues for C++ rules type: support / not a bug (process) labels Aug 20, 2021
@github-actions
Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label May 25, 2023
@GMNGeoffrey
Copy link
Contributor Author

Still a thing... There are detailed discussions and reproductions in these comments and specific good suggestions for Bazel. I would appreciate if someone took a look.

[Also, as I've mentioned before it is not actually possible to "reach out to the triage team" unless you're in the bazebuild org]

Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs. If you think this issue is still relevant and should stay open, please post any comment here and the issue will no longer be marked as stale.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label Jul 30, 2024
@Wyverald Wyverald removed the stale Issues or PRs that are stale (no activity for 30 days) label Jul 30, 2024
@chandlerc
Copy link

(I saw the stale label get removed, but adding a comment as requested to confirm that yes, this is still needed, in case that is the specific kind of activity needed to keep the issue from being closed.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Rules-CPP Issues for C++ rules type: support / not a bug (process)
Projects
None yet
Development

No branches or pull requests

4 participants