High-level design of C++/Rust interop

This document describes the high-level design choices of Crubit, a C++/Rust Bidirectional Interop Tool.

C++/Rust interop goal

The primary goal of Crubit is to enable Rust to be used side-by-side with C++ in large existing codebases.

In the short term, we would like to focus on codebases that roughly follow the Google C++ style guide, so that we can achieve higher interop fidelity. Other, more diverse codebases are prospective users in the long term; their needs will be addressed through customization and extension points.

C++/Rust interop requirements

In support of the interop goal, we identify the following requirements:

  1. Enable using existing C++ libraries from Rust with high fidelity
    • High fidelity means that interop will make C++ APIs available in Rust, even when those API projections would not be idiomatic, ergonomic, or safe in Rust, to facilitate a cheap, small-step, incremental migration workflow. Based on the experience of other cross-language interoperability systems and language migrations (for example, Objective-C/Swift, Java/Kotlin, JavaScript/TypeScript), we believe that working in a mixed C++/Rust codebase would be significantly harder if some C++ APIs were not available in Rust.
    • Interop will bridge C++ constructs to Rust constructs only when the semantics match closely. Bridging large semantic gaps creates a risk of making C++ APIs unusable in Rust, as well as a risk of creating performance problems. For example, interop will not bridge destructive Rust moves and non-destructive C++ moves; instead it will make C++ move constructors and move assignment operators available to use in Rust code. As another example, interop will not bridge C++ templates and Rust generics by default.
    • Interop should be performant, with as close to zero runtime cost as possible. The performance costs of the interop should be documented and, where possible, intuitive to the user.
    • Interop should be ergonomic and safe, as long as ergonomic and safety accommodations do not hurt performance or fidelity. Where a tradeoff is possible, the interop will choose performance and fidelity over ergonomics; the user will be allowed to override this choice.
    • Enable owners of the C++ API to control their Rust API projection, for example, with attributes in C++ headers and by extending generated bindings with a manually implemented overlay. Such an overlay will wrap or extend generated bindings to improve ergonomics and safety.
  2. Enable using Rust libraries from C++
    • However, using C++ libraries from Rust has a higher priority than using Rust libraries from C++.
  3. Put little to no barriers to entry
    • Ideally, no boilerplate code needs to be written in order to start using a C++ library from Rust. Adding some extra information can make the generated bindings more ergonomic to use.
    • The amount of duplicated API information is minimized.
    • Future evolution of C++ APIs should be minimally hindered by the presence of Rust users.
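To make the move-semantics point concrete, here is a minimal Rust sketch of what exposing C++ move operations explicitly could look like. `CppBuffer`, `move_construct`, and `move_assign` are hypothetical names, and a `Vec<u8>` stands in for the C++ object's state; real bindings would call the C++ move constructor and move assignment operator. Unlike a Rust move, the source object stays alive and usable afterwards.

```rust
/// Hypothetical Rust projection of a C++ class that has a user-provided move
/// constructor and move assignment operator. A `Vec<u8>` stands in for the
/// C++ object's state; real bindings would call the C++ special members.
#[derive(Debug, Default)]
pub struct CppBuffer {
    data: Vec<u8>,
}

impl CppBuffer {
    pub fn from_bytes(bytes: &[u8]) -> CppBuffer {
        CppBuffer { data: bytes.to_vec() }
    }

    /// Mirrors `CppBuffer(CppBuffer&& other)`: takes `other`'s contents but
    /// leaves `other` alive in a valid (here: empty) state, as a C++ move does.
    pub fn move_construct(other: &mut CppBuffer) -> CppBuffer {
        CppBuffer { data: std::mem::take(&mut other.data) }
    }

    /// Mirrors `CppBuffer& operator=(CppBuffer&& other)`.
    pub fn move_assign(&mut self, other: &mut CppBuffer) {
        self.data = std::mem::take(&mut other.data);
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }
}
```

For simplicity the stand-in is trivially relocatable, so returning it by value is fine; a projection of a non-trivially-relocatable type would need to be constructed in place instead.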

Proposal and high-level design

We propose to develop our own C++/Rust interop tooling. No existing tool satisfies all of our requirements. Modifying an existing tool to fulfill them would take more effort than building a new tool from scratch, or might require forking its codebase, because some existing tools have goals that conflict with ours.

See the "alternatives considered" section for a discussion of existing tools.

Source of information about C++ API

Interop tooling will read C++ headers, as they contain the information needed to generate Rust API projections and the necessary glue code. Interop tooling that is used during builds will not read C++ source files, to maintain the principle that C++ API information is only located in headers, and that a C++ library can't break the build of its dependencies by changing source files.

Some interop-adjacent tools (e.g., large-scale refactoring tools that seed the initial set of lifetime annotations) will also read C++ sources. These tools will not be used during builds.

Pros

  • Minimal barrier to entry: only a minimal amount of manual work is required to start using a C++ library from Rust.
    • Encourages leaf projects to start incrementally adopting Rust in new code, or incrementally rewriting C++ targets in Rust.
  • C++ API information is located only in headers, regardless of the language that the API consumer is written in (C++ or Rust).
  • Interop tooling that generates Rust API projections from a C++ header can get exactly the same information that the C++ compiler has when processing a translation unit that uses one of the APIs declared within that header.
    • Interop tooling can generate the most performant calls to C++ APIs, without C++-side thunks that translate the C++ ABI into a C ABI.
    • Interop tooling can autodetect implementation details that are critical for interop but are not a part of the API surface (for example, the size and alignment of C++ classes that have private data members).
    • In alternative solutions, users need to repeat these implementation details in sidecar files. Interop can verify that the specified information is correct through static assertions in generated C++ code, but the overall user experience is inferior.

Cons

  • Having to read C++ headers makes interop tooling more complex.
  • The Rust projection of the C++ API is only visible in machine-generated files.
    • These are not trivially accessible.
    • There is a limit on how readable these files can be made.
    • We can mitigate these issues by building tooling that shows the Rust view of a C++ header (for example in Code Search, or in editors as an alternative go-to-definition target).
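As an illustration of layout information being read from a header, the sketch below shows what a Rust projection of a class with only private data members might look like. The C++ class `Widget` and its members are hypothetical, and the size (16) and alignment (8) assume a typical 64-bit target; real tooling would take these numbers from Clang's record layout. The generated C++ side could carry matching `static_assert`s.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

/// Hypothetical generated projection of the C++ class
/// `class Widget { ... private: int64_t id_; int32_t count_; };`.
/// The private members are hidden behind opaque bytes; only the size and
/// alignment (taken from the C++ compiler's view of the header) are exposed.
#[repr(C, align(8))]
pub struct Widget {
    __opaque: [MaybeUninit<u8>; 16],
}

// Generated compile-time checks: the build fails if the layout assumed by the
// bindings disagrees with what the Rust compiler computes.
const _: () = assert!(size_of::<Widget>() == 16);
const _: () = assert!(align_of::<Widget>() == 8);
```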

Customizability

Interop tooling will be customizable enough to accommodate existing codebases and the unique needs of different C++ libraries within them. C++ API owners can:

  • Guide how interop tooling generates Rust API projections from C++ headers. For example, headers can provide:
    • Custom Rust names for C++ function overloads (instead of applying the general interop strategy for function overloads),
    • Custom Rust names for overloaded C++ operators,
    • Custom Rust lifetimes for pointers and references mentioned in the C++ API,
    • Nullability information for pointers in the C++ API,
    • Assertions (verified at compile time) and promises (not verified by tooling) that certain C++ types are trivially relocatable.
  • Provide custom logic to bridge types, for example, mapping C++ absl::StatusOr to Rust Result.
  • Provide API overlays that improve the automatically generated Rust API.
    • For example, the overlays could inject additional methods into automatically generated Rust types or hide some of the generated methods.
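A minimal sketch of an overlay that injects a method into a generated type, assuming the generated bindings and the hand-written overlay are compiled into the same crate (Rust only allows inherent `impl` blocks in the crate that defines the type). The modules `generated` and `overlay` and the C++ class `Rect` are hypothetical.

```rust
/// Stand-in for the automatically generated bindings (hypothetical).
mod generated {
    /// Projection of a hypothetical C++ `class Rect` with getters.
    pub struct Rect {
        width: i32,
        height: i32,
    }

    impl Rect {
        pub fn new(width: i32, height: i32) -> Rect {
            Rect { width, height }
        }
        pub fn width(&self) -> i32 {
            self.width
        }
        pub fn height(&self) -> i32 {
            self.height
        }
    }
}

/// Hand-written overlay: re-exports the generated type and adds a method.
pub mod overlay {
    pub use super::generated::Rect;

    impl Rect {
        /// Ergonomic addition that the C++ API itself does not provide.
        pub fn area(&self) -> i64 {
            i64::from(self.width()) * i64::from(self.height())
        }
    }
}
```

Rust users would depend only on `overlay`, so the API owner decides which additions sit on top of the generated projection.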

More intrusive customization techniques will be useful for template- and macro-heavy libraries where the baseline import rules just won't work. We believe customizability will be an essential enabler of high-fidelity interop.
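Custom type-bridging logic such as mapping `absl::StatusOr` to `Result` could look roughly like this. `Status` and `RawStatusOr` are hypothetical Rust mirrors of the C++ types, not Crubit's actual API; the sketch relies only on absl's convention that status code 0 means OK, and uses 13 (absl's INTERNAL code) for a malformed value.

```rust
/// Hypothetical Rust mirror of `absl::Status`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Status {
    pub code: i32, // 0 is absl's OK code
    pub message: String,
}

/// Hypothetical shape in which glue code receives an `absl::StatusOr<T>`.
pub struct RawStatusOr<T> {
    pub status: Status,
    pub value: Option<T>,
}

impl<T> RawStatusOr<T> {
    /// Bridging logic an API owner could provide: StatusOr<T> -> Result<T, Status>.
    pub fn into_result(self) -> Result<T, Status> {
        match (self.status.code, self.value) {
            (0, Some(v)) => Ok(v),
            // An OK StatusOr must hold a value; report a missing one as INTERNAL.
            (0, None) => Err(Status {
                code: 13,
                message: String::from("OK StatusOr without a value"),
            }),
            _ => Err(self.status),
        }
    }
}
```

A generated Rust wrapper around a C++ entry point returning `StatusOr<T>` would call `into_result` before handing the value to Rust callers.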

Source of additional information that customizes C++ API projection into Rust

Where C++ headers don't already provide all information necessary for interop tooling to generate a Rust API projection, we will add such information to C++ headers whenever possible. If it is not desirable to edit a certain C++ header, extra information can be stored in a sidecar file.

Examples of additional information that interop tooling will need:

  • Nullability annotations. C++ APIs often expose pointers that are documented or assumed by convention to be never null, but can't be refactored to references due to language limitations (for example, std::vector<MyProtobuf *>). If C++ headers don't provide nullability information for pointers in a machine-readable form, interop tooling has to conservatively mark all C++ pointers as nullable in the Rust API projection. The Rust compiler will then force users to write unnecessary (and untestable) null checks.
  • Lifetimes of references and pointers in C++ headers are not described in a machine-readable way (and sometimes are not even documented in prose). Lifetime information is essential to generate safe and idiomatic Rust APIs from C++ headers.
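The sketch below shows why nullability and lifetime annotations matter for the Rust projection. It assumes a hypothetical C++ class `Node` and a hypothetical function `const Node* FindChild(const std::vector<Node>& children, size_t index)` annotated as returning a nullable pointer that borrows from `children`. With that information, the projection can return `Option<&'a Node>` instead of a raw pointer. Rust guarantees that `Option<&T>` has the same size as a pointer, with `None` represented as null, so the safer type costs nothing. The function bodies are pure-Rust stand-ins for calls into C++.

```rust
use std::mem::size_of;

/// Stand-in for a C++ class exposed to Rust (hypothetical).
pub struct Node {
    pub value: i32,
}

// `None` is represented as a null pointer, so this projection is zero-cost.
const _: () = assert!(size_of::<Option<&Node>>() == size_of::<*const Node>());

/// Glue-code helper: converts a nullable C++ pointer into `Option<&'a Node>`.
///
/// # Safety
/// `p` must be null or point to a live `Node` that outlives `'a`.
pub unsafe fn from_nullable<'a>(p: *const Node) -> Option<&'a Node> {
    unsafe { p.as_ref() }
}

/// Projection of the hypothetical `FindChild`: the result may be absent and
/// borrows from `children`, so the borrow checker enforces the lifetime.
pub fn find_child<'a>(children: &'a [Node], index: usize) -> Option<&'a Node> {
    children.get(index)
}
```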

Additional information is stored in C++ headers

Pros

  • Additional information needed for C++/Rust interop will be expressed as annotations on existing syntactic elements in C++.
    • The annotations are located in the most logical place.
    • The annotations are more likely to be noticed and updated by C++ API owners.
    • API owners retain full control over how the API looks in Rust.
  • C++ users may find lifetime and nullability annotations useful. For example, information about lifetimes is highly important to C++ and Rust users alike.
  • C++ API definitions are only written once, minimizing duplication and maintenance burden.

Cons

  • Annotations that benefit Rust users can bother C++ API owners who don't care about Rust. Especially at the beginning of integrating Rust into an existing codebase, C++ API owners can push back on adding annotations.
    • To encourage adoption of annotations, we can develop tooling for C++ that uses lifetime and nullability annotations to find bugs in C++ code.
    • The pushback is likely to be short-term: if Rust takes off in a C++ codebase, C++ library owners in that codebase will need to care about Rust users and how their API looks in Rust.
  • There may be headers that we cannot (or would not want to) change, for example, headers in third-party code, headers that are open-sourced, or when first-party owners are not cooperating.

Additional information is stored in sidecar files

Additional information needed for C++/Rust interop can be stored in sidecar files, similarly to Swift APINotes, CLIF, etc. If sidecar files get sufficiently broad adoption (for example, if annotating third-party code turns out to be sufficiently important that optimizing C++/Rust interop ergonomics there would be worth it), it would make sense to write sidecar files in a Rust-like language, as that provides the most natural way to define Rust APIs.

Pros

  • Sidecar files enable broader adoption of annotations by providing additional interop information without modifying C++ headers. Sidecar files will allow us to annotate headers in third-party code, headers that can't adopt annotations for technical reasons, or headers owned by first-party owners who are not cooperating.

Cons

  • As in the "Use Rust code to customize API projection into Rust" alternative, some part of the C++ API information is duplicated, which is a burden for C++ API owners.
  • The projection of C++ APIs to Rust is defined in a new language.
    • C++ API owners and Rust users will have to learn this language.
    • If we expect wide adoption of sidecar files, we will need to create tooling to parse, edit, and run LSCs against this language.
  • Annotations in sidecar files are more prone to become out of sync with the C++ code. When making changes to C++ code, engineers are less likely to notice and update the annotations in sidecar files.
    • Presubmits can catch some cases of desynchronization between C++ headers and sidecar files. However, presubmit errors that remind engineers to edit more files create an inferior user experience.
  • Sidecar files create extra friction to modify the code. Where previously one had to edit only a C++ header and a C++ source file, now one also likely needs to update a sidecar file.
    • When engineers realize that they need to update a sidecar file, opening another file and finding the right place to update creates extra friction to modify code.
    • Once engineers understand the extra maintenance burden associated with sidecar files that tend to go out of sync with headers, they will be less likely to adopt annotations in the first place.

Glue code generation

C++/Rust interop tooling will generate executable glue code and type definitions in Rust and in C++ (not merely extern "C" function declarations) in order to achieve the following goals:

  • Enable instantiating C++ templates from Rust, and monomorphizing Rust generics from C++. Enable Rust types to participate in C++ inheritance hierarchies.
    • For example, imagine Rust code using an object of type std::vector<MyProtobuf>, while C++ code in the same program is never instantiating this type. The Bazel rust_library target that mentions this type must therefore be responsible for instantiating this template and linking the resulting executable code into the final program. We propose that this instantiation happens in an automatically generated "glue" C++ translation unit that is a part of that rust_library.
  • Enable automatically wrapping C++ code to be more ergonomic in Rust. For example:
    • extern "C" functions in Rust are necessarily unsafe (it is a language rule). We would like the vast majority of C++ API projections into Rust to be safe. In the current Rust language, we can achieve that only by wrapping the unsafe extern "C" function in a safe function marked with #[inline(always)].
    • C++ API owners can provide rules for automatic type bridging, for example, mapping C++ absl::StatusOr to Rust Result. This conversion necessitates generation of a Rust wrapper function around a C++ entry point that takes advantage of such type bridging.
  • Provide stable locations (C++ modules, Rust crates) that "own" the types from the language point of view.
    • For example, when we project a C++ type into Rust, its Rust definition must be located in a Rust crate. Furthermore, all Rust users of this type must observe it as being defined in the same crate for all of them to consider it the same type: in Rust, types defined in different crates are unrelated types, even if their definitions are identical.
    • When we project a Rust type into C++, we could repeat its C++ definition any number of times (for example, in every C++ user of a Rust type). This is technically fine because C++ allows the same type to be defined multiple times within a program, as long as the definitions are in different translation units and are identical. Nevertheless, such duplication is error-prone.
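The safe-wrapper pattern described above can be sketched as follows. To keep the example runnable without any C++ code, C's `toupper` stands in for a C++ entry point, and `to_ascii_upper` is a hypothetical wrapper name. The wrapper's parameter type encodes the foreign function's precondition (every `u8` is a valid `unsigned char`), which is what allows the wrapper itself to be safe.

```rust
use std::os::raw::c_int;

// Foreign declaration: calling it requires `unsafe`, because the compiler
// cannot check the foreign function's contract.
unsafe extern "C" {
    // C's `toupper` stands in for a C++ entry point in this sketch.
    fn toupper(c: c_int) -> c_int;
}

/// Safe wrapper of the kind interop tooling could generate. It is inlined
/// so that the wrapper adds no call overhead.
#[inline(always)]
pub fn to_ascii_upper(c: u8) -> u8 {
    // SAFETY: every `u8` value is a valid `unsigned char`, which is the
    // domain `toupper` is defined for; the result therefore fits in a `u8`.
    unsafe { toupper(c_int::from(c)) as u8 }
}
```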

Glue code is generated as C++ and Rust source code

Interop tooling will generate glue code as C++ and Rust source files, which are then compiled with an unmodified compiler for that language. The alternative is to generate LLVM IR or object files with machine code directly from interop tooling.

Pros

  • It is easy to inject customizations provided by API owners into generated source code.
    • The customizations will be written in the target language, making it (hopefully) intuitive to write them.
  • Generated source code can be easily inspected by compiler engineers while debugging interop problems and compiler bugs.
  • Generated source code can be inspected and understood by interop users, who are not compiler experts.
    • LLVM IR wouldn't be meaningful to them.
  • Generated source code is processed by the regular toolchain like any other code in the project.
    • It automatically benefits from all performance optimizations and sanitizers that are newly implemented in Clang and Rust compilers.
  • We avoid adding a new tool that generates unique LLVM IR patterns.
    • We avoid making the job of the C++ toolchain maintainers harder.

Cons

  • Interop tooling will be limited to generating LLVM IR and machine code that Clang and Rust compilers can generate.

Glue code and API projections will assume implementation details of the target execution environment

To provide the most ergonomic and performant interop, C++/Rust interop tooling will allow the target codebase to opt into assuming various implementation details of the target execution environment. For example:

  • When calling C++ from Rust, interop tooling can either wrap C++ functions in thunks with a C calling convention, or call C++ entry points directly. Thunks cause code bloat and can collectively add up to become a performance problem, so it is desirable to call C++ entry points from Rust directly. Interop tooling can do that only if it may assume a specific target platform and C++ ABI.

Implementation details of the target execution environment that are considered stable enough will be reflected in API projections, for example:

  • The C++ standard does not fix the exact sizes of integer types (short, int, long, etc.). To map them to Rust, interop tooling will need to assume the sizes they have on the platforms that the codebase targets in practice. The alternative would be to create target-agnostic integer types (for example, Int in Swift is a strong typedef for Int32 on 32-bit targets, and Int64 on 64-bit targets), but this makes it harder to provide idiomatic, transparent, high-performance interop.
  • The C++ standard does not specify whether standard library types like std::vector are trivially relocatable; it is an implementation detail. Universal interop tooling would have to conservatively assume non-trivially-relocatable types. Interop tooling specific to certain environments can rely on libc++ providing a trivially-relocatable std::vector and project it into Rust in a much more ergonomic way.
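For illustration, bindings that assume an LP64 target (such as 64-bit Linux or macOS) might fix the mapping of C++ integer types and check it at compile time, as in this sketch. The `CppShort`, `CppInt`, and `CppLong` aliases are hypothetical; `std::os::raw` provides the platform's actual C integer types to compare against.

```rust
use std::mem::size_of;
use std::os::raw::{c_int, c_long, c_short};

// Hypothetical fixed mapping that interop tooling could emit for an LP64
// target, where `long` is 64 bits wide.
pub type CppShort = i16;
pub type CppInt = i32;
pub type CppLong = i64;

// Generated checks: the build fails on a target where an assumption is wrong.
const _: () = assert!(size_of::<c_short>() == size_of::<CppShort>());
const _: () = assert!(size_of::<c_int>() == size_of::<CppInt>());
#[cfg(all(unix, target_pointer_width = "64"))]
const _: () = assert!(size_of::<c_long>() == size_of::<CppLong>());
```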

Pros

  • Interop tooling will generate the most performant code sequences to call foreign language functions.
    • If interop tooling generates portable code, that code incurs some overhead. C++ and Rust optimizers can eliminate the overhead at least in some cases, but at the cost of increased build times. For example, eliminating thunks would require turning on LTO, which is slow and usually reserved for release builds. It is much preferable not to generate thunks in the first place if the target platform does not need them.
  • Ergonomics of API projections will be improved.
    • For example, whether a C++ type is trivially relocatable or not is an implementation detail in C++, transparent to C++ users of that type, but it makes a huge ergonomic difference in the Rust API projection.

Cons

  • C++ code will have additional evolution constraints.
    • For example, changing a type from trivially relocatable to non-trivially relocatable is a non-API-breaking change for C++ users, but it would break Rust users.
  • It would be more difficult to switch internal environments to a different C++ standard library.
  • Code that is deployed in environments that have incompatible implementation details won't be able to use this C++/Rust interop system.
    • Alternatively, these executables would have to bring a suitable execution environment with them (e.g., a copy of libc++).
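The ergonomic difference made by trivial relocatability can be sketched in Rust. A trivially relocatable type can be an ordinary Rust value, because every Rust move is a plain bitwise copy. A type whose C++ move constructor does real work (here, hypothetically, fixing up a pointer into the object itself) must never be copied that way, so a projection has to keep it behind `Pin`. `Point` and `SelfRef` are hypothetical examples, and `SelfRef::new` only stands in for running a C++ constructor in place.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical projection of a trivially relocatable C++ type: an ordinary
/// Rust value that may be moved freely.
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(C)]
pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// Hypothetical projection of a non-trivially-relocatable C++ type whose
/// move constructor would have to fix up `ptr_to_value`. `PhantomPinned`
/// opts out of `Unpin`, so safe code can only use it behind `Pin`.
#[repr(C)]
pub struct SelfRef {
    value: i32,
    ptr_to_value: *const i32,
    _pinned: PhantomPinned,
}

impl SelfRef {
    /// Stand-in for running the C++ constructor in place on the heap.
    pub fn new(value: i32) -> Pin<Box<SelfRef>> {
        let mut boxed = Box::pin(SelfRef {
            value,
            ptr_to_value: std::ptr::null(),
            _pinned: PhantomPinned,
        });
        // SAFETY: we only write a field; the pinned object is never moved.
        let this = unsafe { boxed.as_mut().get_unchecked_mut() };
        this.ptr_to_value = &this.value as *const i32;
        boxed
    }

    /// Reads `value` through the internal pointer, which stays valid only
    /// because the object has not moved since construction.
    pub fn value_via_self_pointer(self: Pin<&Self>) -> i32 {
        // SAFETY: `ptr_to_value` points into this pinned, still-live object.
        unsafe { *self.ptr_to_value }
    }
}
```

Users of `Point` write ordinary Rust; users of `SelfRef` must go through `Pin`, which is the ergonomic cost a non-trivially-relocatable C++ type imposes.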

Interop tooling should be maintainable and evolvable for a long time

We should design and implement C++/Rust interop tooling in such a way that we can maintain and evolve it for more than a decade. If Rust becomes tightly integrated into an existing C++ project, specific requirements for interop and API projection rules will keep changing. The more Rust is adopted, the more library- and team-specific interop customizations we will have to support, and the more it will make sense for the performance team to tweak generated code to implement sweeping optimizations. These kinds of changes should be readily possible, and they should not create conflicts of interest between different users of the interop tooling.

Interop tooling should facilitate C++ to Rust migration

C++/Rust interop tooling should try to create a favorable environment for migrating C++ code to Rust. Specifically, projections of C++ APIs into Rust should be implementable in Rust. This way, a C++ library can be converted from C++ into Rust transparently for its users, as its public API won't change.

Alternatives Considered: Design decisions

Repeat C++ API completely in a separate IDL

Instead of reading C++ headers in the interop tooling, we would require the user to repeat the C++ API in some other form, for example, in a Rust-based IDL like in the cxx crate, or in sidecar files in a completely new format.

Pros

  • Interop tooling can be simpler if it does not have to read C++ headers. But even under this alternative approach, tooling might want to read C++ headers, nullifying this advantage. For example, tooling might want to automatically generate an initial Rust snippet or to suggest in presubmits to adjust the Rust code that mirrors a C++ API when that C++ API changes.
  • The most natural way to define Rust APIs is by using Rust code or Rust-like syntax in sidecar files.
  • Available Rust APIs are defined in easily accessible checked-in files.
  • API definitions written by a human might have higher quality, on average.

Cons

  • A big part of the C++ API needs to be duplicated to reliably match the Rust code with the C++ declarations. The initial code can be generated by tooling, but it has to be kept in sync. This is a burden for the C++ API owners, potentially a bigger one than allowing annotations in C++ headers.
    • There is a risk that C++ API owners might refuse to own IDL files.
  • The need to create a sidecar file creates a barrier to start using C++ libraries from Rust.
    • While the duplication overhead is justifiable for widely-used libraries, it is relatively high for libraries with few users and binaries, making it less likely that leaf teams will start adopting Rust.
  • When the C++ API is changed, the Rust definitions become out-of-sync with it. Tooling needs to detect this, and the Rust definitions need to be changed (either manually or tool-assisted).
  • There is no effective way to verify Rust binding code at the presubmit time of a C++ library other than building downstream projects.
  • Mapping Rust API definitions to the original C++ API definitions is more complicated and error-prone. For example, how would we target a specific overload of a function or constructor?
  • There is a risk that individual teams will build team-specific tooling that generates IDL files from C++ headers or generates both IDL files and C++ headers from a single source. These solutions are unlikely to scale to existing large codebases and will likely only work for that specific team.

Use Rust code to customize API projection into Rust

An alternative to storing additional information in C++ headers is to put it into Rust code. For example, the cxx crate requires users to re-state the C++ API in Rust syntax, adding information about lifetimes and nullability. The pros and cons of this choice are the same as when defining a special IDL that repeats the C++ API completely (see above).

Generate glue code in binary formats

Instead of generating glue code as textual sources, interop tooling could use Clang and LLVM APIs to emit object files with C++ glue code and use Rust compiler APIs to generate rmeta and rlib files with Rust glue code.

Pros

  • More flexibility in the code that can be generated. Controlling LLVM IR generation allows interop tooling to generate code that an unmodified compiler can't generate from textual source code. For example, the Rust language does not have any constructs that map to linkonce_odr functions in LLVM IR; if the interop tooling embedded the Rust compiler as a library and had more control over how it generates the IR, we could make that happen.

Cons

  • Injecting customizations provided by API owners is harder.
  • LLVM, Clang, and Rust compiler APIs are not stable. The format of Rust metadata files is not stable either. The larger the API subset we consume from Clang and Rust, the more difficult it becomes to maintain the tooling.
  • To generate object files, the interop tooling has to ensure that its Clang/LLVM version and configuration are identical to those of the Clang compiler used to build other C++ code.
    • We can solve this problem, but it makes the system more fragile, compared to using existing C++ and Rust compilers to compile generated sources.
  • From time to time LLVM introduces bugs that cause miscompilations. If interop tooling embeds LLVM, we would be adding another tool that toolchain engineers will need to look into when debugging a miscompilation. We would be making the job of C++ toolchain maintainers harder.

Alternatives Considered: Existing tools

bindgen

bindgen automatically generates Rust bindings from C and C++ headers, which it consumes using libclang. The generated bindings are pure Rust code that interfaces with C and C++ using Rust’s built-in FFI for C (#[repr(C)] to indicate that a struct should use C memory layout and extern "C" to indicate that a function should use a C calling convention). C++ functions are handled by generating a Rust extern "C" function that has the same ABI as the C++ function and attaching a link_name attribute with the mangled name.
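The `link_name` technique can be sketched as follows. To stay runnable without any C++ code, the example binds C's `abs` under a different Rust name; for a C++ function such as `int foo(int)`, bindgen would instead put its Itanium-mangled symbol (`_Z3fooi`) in the attribute. `cpp_abs` is a hypothetical name.

```rust
use std::os::raw::c_int;

unsafe extern "C" {
    // For a C++ function, this would be the mangled name, e.g. "_Z3fooi".
    // Linking against C's `abs` keeps the sketch free of C++ code.
    #[link_name = "abs"]
    fn cpp_abs(x: c_int) -> c_int;
}
```

The call itself remains `unsafe`, and nothing checks that the declared signature matches the C++ one, which is part of why bindings of this kind need extra wrappers to become safe and ergonomic.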

See here for an in-depth description of the use of bindgen in Stylo, a Rust component in Firefox.

Pros

  • The oldest and the most mature of the existing C++ interop tools (developed since Feb 2012).

Cons

  • Deficiencies in safety and ergonomics, for example:
    • References are imported as pointers. No lifetimes, no null-safety.
    • Constructors and destructors are not called automatically.
    • Overloads are distinguished by a numbered suffix in Rust. These numbers clutter the source code and are hard to remember, as they have no meaning. Adding overloads can change the numbering and hence break Rust callers.
  • It is impossible to use C++ inline functions and templates from Rust because of bindgen’s architecture (footnote 1). The architecture is unlikely to change, and therefore, this is a dealbreaker.

Evaluation

bindgen could be used in a project that has very limited C++ interop needs. However, creating safe and ergonomic wrappers for the generated bindings would require additional effort. Our vision and goals for C++ interop are very different from what bindgen provides.

cbindgen

cbindgen automatically generates C or C++ headers for Rust libraries which expose a public C API.

Pros

Cons

  • Shallow understanding of Rust's modules and types.

    • cbindgen's docs point out that "A major limitation of cbindgen is that it does not understand Rust's module system or namespacing. This means that if cbindgen sees that it needs the definition for MyType and there exists two things in your project with the type name MyType, it won't know what to do. Currently, cbindgen's behaviour is unspecified if this happens."
    • This limitation seems mostly caused by building cbindgen on top of the syn crate. syn is able to parse Rust source code into an AST, but there is no facility at the syn level for type deduction or module traversal. Building such functionality would require replicating parts of the rustc compiler in cbindgen, or alternatively rewriting cbindgen on top of the rustc_driver crate.
  • Support of only extern "C" functions.

    • Supporting Rust functions that use the default calling convention would require generating not only C/C++ headers, but also generating Rust source with extern "C" thunks that trampoline into the original function (requiring that cbindgen starts generating Rust sources).
  • Support of only #[repr(C)] structs.

    • Default memory layout of Rust structs is unspecified and therefore cannot be determined by code examination at the syn level.
    • Even if the memory layout could be determined, the layout can change in a future compiler version, or change depending on compilation command line flags. To prevent using stale layout information, the auto-generated FFI code should therefore include compile-time assertions that the layout hasn't changed since FFI generation time. The assertions should be present both in the generated C/C++ headers and on the Rust side (requiring that cbindgen starts generating Rust sources). The assertions would effectively verify that the FFI generation is driven by the build system (i.e. by Bazel, Cargo, or GN/ninja, rather than manually) and that the integration between the FFI tools and the build system doesn't have any bugs (e.g. that it faithfully replicates all relevant compilation flags).
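A minimal sketch of the Rust-side glue discussed above: an `extern "C"` thunk that trampolines into a function using the default Rust ABI, plus compile-time assertions that pin down the layout observed when the bindings were generated. `Pair`, `sum`, and the exported symbol `hypothetical_thunk_sum` are made up for the example, and the recorded values (size 8, alignment 4) assume what current rustc produces for two `u32` fields.

```rust
use std::mem::{align_of, size_of};

/// A Rust struct with the default (unspecified) layout.
pub struct Pair {
    pub first: u32,
    pub second: u32,
}

/// A Rust function with the default Rust calling convention, which C++
/// cannot call directly.
pub fn sum(p: &Pair) -> u64 {
    u64::from(p.first) + u64::from(p.second)
}

/// Hypothetical generated thunk that trampolines from the C calling
/// convention into `sum`. The exported symbol name is made up.
#[unsafe(no_mangle)]
pub extern "C" fn hypothetical_thunk_sum(p: *const Pair) -> u64 {
    // SAFETY: the generated C++ caller is assumed to pass a valid, non-null pointer.
    sum(unsafe { &*p })
}

// Layout recorded at generation time (hypothetical values). If a later
// compiler or different flags choose another layout, the build fails
// instead of silently disagreeing with the generated C++ header.
const _: () = assert!(size_of::<Pair>() == 8);
const _: () = assert!(align_of::<Pair>() == 4);
```

The generated C/C++ header would declare the thunk and carry matching `static_assert`s for the same size and alignment.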

Evaluation

cbindgen could be used in a project that can create a narrow extern "C" / #[repr(C)] API and that is ready to manage the risk of incorrect name/module resolution. Wrapping additional Rust APIs would require extra effort.

Take-aways for Crubit design

Notes and observations about cbindgen can guide some design aspects of Crubit's cc_bindings_from_rs tool (which, similarly to cbindgen, generates C++ bindings for Rust crates). Using internal compiler knowledge (e.g. memory layout of structs, name and type resolution) requires that cc_bindings_from_rs depend on rustc_driver and other internal crates of rustc. The API of these crates is unstable, which might increase the risk and maintenance cost of Crubit. Nevertheless, our experience with maintaining tools based on (also unstable) Clang APIs suggests that this extra risk and cost are likely to be acceptable.

Build determinism requires that the Rust compiler produces the same output for the same set of inputs (the same compiler version, the same command-line flags, the same sources, etc.). This means that (despite conservative reservations about layout determinism) it should be okay to assume that cc_bindings_from_rs and rustc invocations will observe the same memory layout of structs, but this requires that cc_bindings_from_rs is built against exactly the same version of rustc_driver libraries as rustc. (This should also be reinforced by compile-time assertions in the generated FFI layer.)

cxx

cxx generates Rust bindings for C++ APIs and vice versa from an interface definition language (IDL) included inline in Rust source code. cxx generates Rust and C++ source code from IDL definitions. To check that the IDL definitions match the actual C++ API, cxx inserts static assertions (footnote 2) into the generated C++ code; it does not, however, read the C++ headers itself. cxx contains built-in bindings for various Rust and C++ standard library types that are not customizable.

As far as we understand, cxx has the following design constraints and goals:

  • Ship a stable product for its intended audience.
    • As a consequence, improvements such as integrating move semantics are not going to be accepted soon. We understand that cxx is not a vehicle for experimentation. cxx maintainers would prefer us to first show that our ideas work in a fork of cxx or in a different system, such as autocxx, and that our improvements pull their weight given the added complexity.
  • Remain simple and transparent. There is a limit on the amount of complexity that will be tolerated.
    • There is a chance that improvements such as modeling C++ move semantics or various attempts at eliminating thunks will never be accepted in upstream cxx.
  • Non-goal: Automatically provide high fidelity interop.
    • cxx is designed for the use case of an executable where C++ and Rust parts communicate through a narrow interface.
  • Non-goal: Automatically provide the most performant interop in as many cases as possible. For example:
    • cxx does not attempt to eliminate C++-side thunks. Instead, using LTO is recommended.
    • cxx considers it acceptable to allocate all objects of "opaque" types on the heap. Users who find these heap allocations unacceptable for performance reasons are expected to implement a different C++ entry point that does not hit this limitation and bind it to Rust instead of the original C++ API. Heap allocation is acceptable for many C++ classes in most environments, but the exceptions are important enough for us that this is a major restriction.

Pros

  • Mature and ergonomic enough today for mixing C++ and Rust in existing codebases with limited C++ interop needs.
  • We avoid being on a tech island.

Cons

  • cxx’s stability goal makes it hard to experiment with how the Rust API looks.
  • Our goals are unlikely to align well with the needs of cxx’s intended users. We would be pulling cxx in directions that make it a worse product for its current users.
  • Almost no customizability. Users who are not satisfied with what cxx does are expected to wrap the target C++ API in a different C++ API that is more friendly to cxx.
  • cxx tries to be compatible with most standard C++ implementations found in the real world, so it cannot take advantage of unique guarantees provided by the target execution environment.

Evaluation

cxx could be used in projects with limited C++/Rust interop requirements. However, we would not be able to implement many interop features that we consider essential (for example, move semantics and templates).

autocxx

autocxx automatically generates Rust bindings from C++ headers. As the name implies, it automatically generates IDL definitions for cxx, which then produces the actual bindings. In addition, autocxx generates its own Rust and C++ code to extend the Rust API beyond what cxx itself would provide, for example to support passing POD types by value. autocxx consumes C++ headers indirectly by first running bindgen on them and then parsing the Rust code output by bindgen.
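
What passing POD types by value requires can be sketched with C++ type traits. The criterion below (trivially copyable plus standard layout) is an assumption chosen for illustration, not necessarily the exact rule autocxx applies; `Point`, `Named`, `kPassableByValue`, and `point_translate` are hypothetical names.

```cpp
#include <string>
#include <type_traits>

// Hypothetical C++ types.
struct Point {  // plain data: can be copied bit-for-bit
  int x;
  int y;
};
struct Named {  // owns a std::string: copying runs C++ code
  std::string name;
  int id;
};

// Illustrative criterion (assumption): a type whose bytes can be copied
// as-is and whose layout is predictable can be mirrored by a #[repr(C)]
// Rust struct and passed by value.
template <typename T>
constexpr bool kPassableByValue =
    std::is_trivially_copyable_v<T> && std::is_standard_layout_v<T>;

static_assert(kPassableByValue<Point>);
static_assert(!kPassableByValue<Named>);

// A by-value entry point for the POD type; no heap allocation is needed.
extern "C" Point point_translate(Point p, int dx, int dy) {
  return Point{p.x + dx, p.y + dy};
}
```

By contrast, a type like `Named` would reach Rust only through the opaque-type path, behind a pointer.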

autocxx’s design goals are similar to the goals described in this document.

We did a case study on using an existing project's C++ API from Rust using autocxx.

Pros

  • Low barrier to entry: bindings are generated from C++ headers, so there is no need to write duplicate API definitions.
  • Ergonomic mappings for many C++ constructs.
  • Open to contributions that change the generated Rust APIs or make architectural changes.

Cons

  • Relatively new and immature.
  • Cannot (yet) consume complex headers without errors. We’ve managed to import some actual Spanner headers, but there are still enough outstanding issues that we can’t yet do anything useful with Spanner.
  • Architecture can make modifications difficult. autocxx is built on top of two other tools, bindgen and cxx, and the interfaces between these components can make it harder to make a modification than it would be in a monolithic tool. Specifically:
    • autocxx uses bindgen to generate a description of the C++ API that it can parse easily (as opposed to trying to parse C++ headers either directly or using Clang APIs). Since bindgen was not intended for this purpose, its output lacks some information that autocxx needs, so autocxx has forked bindgen to adapt it to its needs. The forked version emits additional information about the C++ API in the form of attributes attached to various API elements.
    • bindgen in turn is built on the libclang API, which doesn’t surface all of the functionality available through Clang’s C++ API. Adding features to libclang requires additional effort, and new features have a 6-month lead time before they appear in a stable release and become eligible for use from bindgen.
    • When errors occur, it can be hard to figure out which of the components is responsible.
    • Adding features can require touching multiple components, which requires commits to multiple repositories.

Evaluation

We initially intended to use autocxx to prototype various interop ideas and potentially as a basis for a field trial. We still believe this would be feasible, but after trying to modify autocxx and its bindgen fork during an internal C++/Rust interop study, we feel that autocxx’s complex architecture is enough of an impediment that we could achieve our goals with less total effort by creating an interop tool from scratch that consists of a single codebase and uses the Clang C++ API to directly interface with Clang.

Footnotes

  1. Doing so would require either generating C++ source code or interfacing deeply enough with Clang to generate object code for inline functions and template instantiation.

  2. Along with tricks such as suitable type conversions that force the C++ compiler to perform the appropriate checks at compile time.