[WIP] Support for rust #7260

Closed
wants to merge 38 commits into from

Robert-Steiner commented Feb 19, 2019

Problem

Pants doesn't support rust.

Solution

This PR adds a new contrib plugin that supports the rust language. Because cargo stores all compiled crates (including all dependencies) in a single target directory, it isn't enough to write a thin wrapper around cargo and tell pants to cache the whole target directory: once you change a source file in one crate, all crates are compiled and cached again, not only the crate where the change was made. This plugin tries to solve this problem by creating a cache for each crate/dependency. As its starting point it uses the cargo --build-plan option, which outputs all invocations that cargo would run to compile a crate.
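
To make the build-plan idea concrete, here is a minimal Python sketch that reads a build plan and lists each invocation together with the packages it depends on. The JSON is a hand-written sample, not real cargo output. The field names (`invocations`, `deps`, `program`, `args`, `env`, `cwd`) are assumed to match the unstable `--build-plan` output and may change, and `list_invocations` is a hypothetical helper rather than code from this plugin.

```python
import json

# Hand-written sample shaped like `cargo build --build-plan` output.
# Package names, versions and paths are made up for illustration.
SAMPLE_BUILD_PLAN = """
{
  "invocations": [
    {
      "package_name": "libc",
      "package_version": "0.2.48",
      "target_kind": ["lib"],
      "deps": [],
      "program": "rustc",
      "args": ["--crate-name", "libc"],
      "env": {},
      "cwd": "/cargo_home/registry/src/libc-0.2.48"
    },
    {
      "package_name": "engine",
      "package_version": "0.0.1",
      "target_kind": ["lib"],
      "deps": [0],
      "program": "rustc",
      "args": ["--crate-name", "engine"],
      "env": {},
      "cwd": "/workspace/engine"
    }
  ],
  "inputs": ["/workspace/Cargo.toml"]
}
"""


def list_invocations(build_plan_json):
  """Return (index, package, program, dependency packages) per invocation."""
  invocations = json.loads(build_plan_json)['invocations']
  summary = []
  for index, invocation in enumerate(invocations):
    # Assumption: `deps` holds indexes into the same `invocations` list.
    dep_names = [invocations[dep]['package_name'] for dep in invocation['deps']]
    summary.append((index, invocation['package_name'], invocation['program'], dep_names))
  return summary


for entry in list_invocations(SAMPLE_BUILD_PLAN):
  print(entry)
```

Because every invocation names its own program, arguments, environment and dependencies, each one can be treated as a separately cacheable unit of work.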

Execution Flow

  • Task bootstrap
    • downloads the rustup shell script
    • installs rustup, rustc and cargo via the downloaded shell script
    • installs the toolchain that is specified in pants.ini
    • provides the environment variable PATH of cargo/bins as the product of this task
  • Task resolve
    • sets cargo home in pants.d/resolve/cargo/<version_dir>/cargo_home and provides the path of cargo home as the product of this task
    • fetches all workspace dependencies specified in the Cargo.toml manifest and stores them in the cargo home directory
  • Task compile
    • gets the build plan for a cargo workspace via the nightly cargo option --build-plan
    • creates synthetic targets for all cargo invocations within the build plan and adds them to the build-graph
    • converts all cargo invocations into pants invocations, so that each invocation (crate / build-script) has its own pants cache directory
    • executes all pants invocations
    • saves the compiled tests, binaries and libraries as product of this task
  • Task binary
    • copies all binaries and libraries that are created by the compile task to the pants dist folder
  • Task test
    • copies all tests that are created by the compile task to the pants.d/test/cargo/<version_dir>/ directory and executes them
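
The per-invocation caching done by the compile task can be pictured with the following Python sketch. It is not the plugin's actual fingerprint code: `invocation_keys`, the dictionary layout and the `source_digests` mapping are assumptions made for illustration. Each invocation gets a key derived from its arguments, environment, source digest and the keys of the invocations it depends on, so editing one crate changes only its own key and the keys of its dependents.

```python
import hashlib
import json

# Hypothetical, simplified invocations: `deps` are indexes into this list.
INVOCATIONS = [
  {'package_name': 'util', 'args': ['--crate-name', 'util'], 'env': {}, 'deps': []},
  {'package_name': 'engine', 'args': ['--crate-name', 'engine'], 'env': {}, 'deps': [0]},
  {'package_name': 'cli', 'args': ['--crate-name', 'cli'], 'env': {}, 'deps': []},
]


def invocation_keys(invocations, source_digests):
  """Return one cache key per invocation, in invocation order."""
  keys = {}

  def key_for(index):
    if index not in keys:
      invocation = invocations[index]
      payload = {
        'args': invocation['args'],
        'env': invocation['env'],
        'sources': source_digests[invocation['package_name']],
        # Including dependency keys propagates changes to dependents.
        'deps': [key_for(dep) for dep in invocation['deps']],
      }
      encoded = json.dumps(payload, sort_keys=True).encode('utf-8')
      keys[index] = hashlib.sha256(encoded).hexdigest()
    return keys[index]

  return [key_for(index) for index in range(len(invocations))]


def changed_packages(invocations, old_digests, new_digests):
  """Names of the packages whose cache key differs between two source states."""
  old_keys = invocation_keys(invocations, old_digests)
  new_keys = invocation_keys(invocations, new_digests)
  return [inv['package_name']
          for inv, old, new in zip(invocations, old_keys, new_keys) if old != new]


# Only the sources of `engine` change, so only `engine` needs to be rebuilt.
print(changed_packages(
  INVOCATIONS,
  {'util': 'a1', 'engine': 'b1', 'cli': 'c1'},
  {'util': 'a1', 'engine': 'b2', 'cli': 'c1'},
))
```

The sketch assumes the dependency graph is acyclic, which holds for a build plan that cargo can actually execute.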

Limitations

  • only supports cargo workspaces
  • requires cargo nightly (because of the usage of the option --build-plan)
  • doesn't support all cargo custom build statements (see TODO)

Building the rust pants engine

Since this plugin supports cargo workspaces you can compile the pants rust engine with it.

Steps:

  • if the environment variables PROTOC and PROTOC_INCLUDE aren't set yet, set them first (help)
  • change the content in ./rust-toolchain to nightly-2018-12-31
  • add the following content to the ./src/rust/engine/BUILD file
cargo_workspace(
  name='engine_rust',
  include=['src/*.rs', 'Cargo.toml'],
)
  • add the following content to the pants.ini file
[bootstrap.cargo]
toolchain: nightly-2018-12-31

Additional cargo options can be specified in the pants.ini file:

Example

[compile.cargo]
cargo_opt: ['--release']
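
As a rough illustration of how such an option could end up on the cargo command line, here is a small Python 3 sketch. It reads the ini text with the standard library `configparser` as a stand-in for the pants options system, and `build_plan_command` is a hypothetical helper. Appending the extra options at the end of the command is an assumption, not necessarily what the plugin does.

```python
import ast
import configparser

PANTS_INI = """
[compile.cargo]
cargo_opt: ['--release']
"""


def build_plan_command(ini_text, manifest_path):
  """Build a cargo build-plan command that includes the configured options."""
  parser = configparser.ConfigParser()
  parser.read_string(ini_text)
  # The option value is a Python-style list literal; default to no options.
  raw_value = parser.get('compile.cargo', 'cargo_opt', fallback='[]')
  cargo_opts = ast.literal_eval(raw_value)
  return ['cargo', 'build',
          '--manifest-path', manifest_path,
          '--build-plan', '-Z', 'unstable-options'] + list(cargo_opts)


print(build_plan_command(PANTS_INI, '/workspace/Cargo.toml'))
```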

TODO

  • add tests
  • add documentation
  • add target support for single cargo binary and library crates
  • add support for unsupported cargo target kinds and cargo lib kinds
  • add examples that show how to use this plugin
  • add support for the goals fmt, run and doc
  • improve how the toolchain path is specified
  • add support for the cargo custom build statements:
    • rerun-if-changed,
    • rerun-if-env-changed,
    • rustc-env
  • replace the member name in the cargo_workspace target with the path of the crate (the path of the Cargo.toml file)
  • add support for defining which files can be excluded

Robert-Steiner marked this pull request as ready for review February 19, 2019 16:24

stuhood (Member) commented Feb 19, 2019

This is very awesome to see: thank you @Robert-Steiner!

Reviewing it will likely take considerable time, but when you believe that you have an MVP (doesn't need to be perfect, but would be unlikely to require a fundamental overhaul of BUILD files or Options values to fix), please let us know and we'll review as soon as possible!

Of the TODOs listed in the description, I think that only:

> add examples that show how to use this plugin

...is likely to be a blocker for landing.

Eric-Arellano (Contributor) left a comment

Wow, thank you!

I took a quick glance over this for small style and Python 3 compatibility changes. Didn't check correctness at this point, as it seems you're still iterating on this.

stuhood (Member) commented Feb 23, 2019

You hit one flaky test in CI: have restarted it.

stuhood (Member) left a comment

Thanks a huge amount for this patch: it already looks really great, and I expect that:

  1. adding the examples
  2. applying a bit of review feedback
  3. addressing the future of --build-plan

will be sufficient to get it landable.


Regarding one of the limitations you mentioned in the description:

  • requires cargo nightly (because of the usage of the option --build-plan)

Is this feature on track to be stabilized? Is it risky to depend on it? Is it likely to be maintained?


cmd = ['cargo', 'build',
       '--manifest-path', abs_manifest_path,
       '--build-plan', '-Z', 'unstable-options']

Would you mind adding a reference to the upstream issues/prs that introduced/will-stabilize this feature?

@staticmethod
def is_cargo_synthetic_binary(target):
  return isinstance(target, CargoSyntheticBinary)


My feeling is that the is_X methods might actually obscure things a little bit here. I would recommend inlining all of them, and waiting to reintroduce them until they are either used to represent a union of multiple parent classes... or, at that point, choosing to change the class hierarchy by introducing a mixin or other parent class instead.
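
A small Python sketch of the two options mentioned above: inlining the `isinstance` check, and introducing a mixin once a check has to cover several classes. The class bodies are placeholders, and `Target`, `CargoDistributable` and `CargoSyntheticCustomBuild` are hypothetical names used only for illustration.

```python
class Target(object):
  """Placeholder for the real pants target base class."""

  def __init__(self, name):
    self.name = name


class CargoDistributable(object):
  """Hypothetical mixin for targets whose outputs are copied to dist."""


class CargoSyntheticBinary(CargoDistributable, Target):
  pass


class CargoSyntheticLibrary(CargoDistributable, Target):
  pass


class CargoSyntheticCustomBuild(Target):
  pass


def binary_names(targets):
  # Inlined check: no is_cargo_synthetic_binary() helper is needed.
  return [t.name for t in targets if isinstance(t, CargoSyntheticBinary)]


def distributable_names(targets):
  # One isinstance check against the mixin covers binaries and libraries.
  return [t.name for t in targets if isinstance(t, CargoDistributable)]


targets = [
  CargoSyntheticBinary('cli'),
  CargoSyntheticLibrary('engine'),
  CargoSyntheticCustomBuild('build_script'),
]
print(binary_names(targets))
print(distributable_names(targets))
```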

register(
  '--toolchain',
  type=str,
  default=None,

This should probably have a fixed default value to encourage build determinism. For now, it sounds like it would need to be a particular nightly sometime past nightly-2018-12-31. Once the --build-plan feature has landed, could swap it to a particular stable release.

It's just a default, and would want the docs to recommend overriding it. But important to have a stable default to minimize surprises.

Robert-Steiner (Author) commented Feb 25, 2019

Reference to cargo's build plan generation: rust-lang/cargo#5579

Robert Steiner added 5 commits February 25, 2019 15:11
- member names and paths are automatically resolved by interpreting the workspace manifest
- the list of member names in the BUILD file is no longer necessary

Robert-Steiner (Author) commented

Thanks @stuhood for your feedback.
I will apply your feedback as soon as possible.

Before this pr is ready I would like to complete two tasks. First, I would like to add support for single cargo binary/library crates, so a user doesn't have to create a workspace to build a single crate.
Second, I want to finish the integration of the custom build statements rerun-if-changed and rerun-if-env-changed. With rerun-if-changed, cargo tracks files for modifications and uses that to decide whether a crate should be recompiled. I see this as crucial, because without that functionality pants can produce invalid builds.
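
For reference, a build script emits these statements on stdout as `cargo:rerun-if-changed=PATH` and `cargo:rerun-if-env-changed=VAR`. The Python sketch below shows one way to collect them and to check whether a tracked environment variable changed. `parse_rerun_statements` and `env_changed` are hypothetical helpers, and the sample output is made up.

```python
RERUN_IF_CHANGED = 'cargo:rerun-if-changed='
RERUN_IF_ENV_CHANGED = 'cargo:rerun-if-env-changed='

# Made-up build script output for illustration.
SAMPLE_STDOUT = """
cargo:rustc-env=VERSION=1.0
cargo:rerun-if-changed=build.rs
cargo:rerun-if-changed=src/protos/api.proto
cargo:rerun-if-env-changed=PROTOC
"""


def parse_rerun_statements(build_script_stdout):
  """Return (tracked files, tracked env variable names) from the output."""
  files, env_vars = [], []
  for line in build_script_stdout.splitlines():
    line = line.strip()
    if line.startswith(RERUN_IF_CHANGED):
      files.append(line[len(RERUN_IF_CHANGED):])
    elif line.startswith(RERUN_IF_ENV_CHANGED):
      env_vars.append(line[len(RERUN_IF_ENV_CHANGED):])
  return files, env_vars


def env_changed(env_vars, old_env, new_env):
  """True if any tracked variable differs between the two environments."""
  return any(old_env.get(name) != new_env.get(name) for name in env_vars)


files, env_vars = parse_rerun_statements(SAMPLE_STDOUT)
print(files, env_vars)
print(env_changed(env_vars, {'PROTOC': '/usr/bin/protoc'}, {'PROTOC': '/opt/protoc'}))
```

A complete solution would also compare digests or modification times of the tracked files between runs; that part is left out here.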

- binary and library targets aren’t enabled yet
- refactored target class hierarchy

Robert-Steiner (Author) commented

@stuhood is there a way to define a 3rd party library that is only used by this plugin? The CI is failing because I use the toml library in my cargo_workspace target. I defined the library in 3rdparty/python/requirements.txt. The library will be installed during the bootstrap process but it isn't included in the pants.pex binary. If I add the library as a dependency in src/python/pants/BUILD, the library will be included.

- the toolchain can be defined in `pants.ini`

Example

```
[bootstrap.cargo]
toolchain: nightly-2018-12-31
```

- the value can be a name or a path to a rust-toolchain file
- the default value is `nightly-2018-12-31`
- removed the toolchain path in the cargo_workspace target

stuhood (Member) commented Feb 27, 2019

Adding it to 3rdparty/python/requirements.txt is definitely necessary, and then declaring the dep somewhere in the build graph is necessary as well. But most likely you want the dep declared in contrib/rust/src/python/pants/contrib/rust/targets/BUILD rather than src/python/pants/BUILD?

Robert Steiner added 2 commits February 28, 2019 13:30
- added examples for binary, library and workspace targets

Robert-Steiner (Author) commented

Right, I've already added the dependency in contrib/rust/src/python/pants/contrib/rust/targets/BUILD, but it won't be included in the binary. I think it has something to do with how the binary is built. pants_local_binary has only checks as a dependency and therefore the toml dependency isn't included in the build graph. Is src/python/pants/bin/BUILD a good place to declare the dependency?

stuhood (Member) commented Feb 28, 2019

> pants_local_binary has only checks as a dependency and therefore the toml dependency isn't included in the build graph.

Gotcha. Yea, there are at least 3 execution modes for pants (from-source in a venv, a directly-built local pex, wheels-into-pex, etc)... I believe that under the "local pex" mode, it does make sense to actually declare the dep on the rust contrib module.

I'll go ahead and make this edit to save you some CI time.

stuhood and others added 16 commits February 28, 2019 10:31
- removed duplicate log message
- switched default toolchain to explicit toolchain to have the highest precedence
- CI tests failed because the `rust-toolchain` file overwrote the default toolchain
- [More information about the toolchain override precedence](https://github.com/rust-lang/rustup.rs#override-precedence)
- made the fingerprint of the rustup install script configurable via the option `--script_fingerprint`
- the rustup install script is only downloaded once and stored in `<versioned_workdir>/rustup_install_script/rustup.sh`
- the task tries to find the executable of rustup in the default location `~/.cargo/bin` if the task can’t find the executable via the `PATH` variable
- fixed AttributeError for self.LAST_KNOWN_FINGERPRINT
To always build valid binaries, the compile task implements a simple solution to support `cargo:rerun-if-env-changed` and `cargo:rerun-if-changed` statements.
If a build script target contains a `cargo:rerun-if-env-changed` or `cargo:rerun-if-changed` statement, the compile task marks that target and its dependent targets as invalid. The task itself doesn’t actually check if an environment variable or a file has changed.
- to have no duplicate paths in make_dirs and make_sym_links
- Cancel the execution of pants if the execution of a subprocess call in the tasks `fetch`, `build` or `bootstrap` fails.
- Show `std_out` and `std_err` if the execution of a build script fails.

- fixed a bug in program_rule (multiple targets may depend on a build script)
- libraries of the kind `rlib`, `dylib`, and `staticlib` are now supported
- removed the `CargoSyntheticProcMacro` target
- cargo libraries of the kind `proc-macro` are now treated as `CargoSyntheticLibrary`
- replaced the workunit outcome with the return code
- removed the logging of the workunit outcome
- added the env variables of a test invocation to the `rust_test` product (if the `DYLD_LIBRARY_PATH` is missing, the proc-macro test will fail)
- added build, fetch integration tests
- added build, workspace tests
- refactored basic, custom build rules to reduce code duplication
- added build_flags fingerprint strategy (additional cargo options such as `release` or `test` are now taken into account in the fingerprint)
- added additional examples
- removed cargo lock files in rust examples
- changed dist path of libraries and binaries from `dist/lib`, `dist/bin` to `dist/rust/lib`, `dist/rust/bin`
- the way in which the fingerprint of a crate is calculated has been changed to prevent crates from being compiled multiple times

stuhood (Member) commented Apr 1, 2019

Very excited for this work! Please let us know when you're ready for another round of review.

Robert-Steiner (Author) commented

@stuhood I think the plugin is now in a good state to be reviewed. I will add the README for this plugin next week.

stuhood (Member) commented Apr 5, 2019

Thank you! I'll try to take a look this weekend.

stuhood (Member) commented Apr 8, 2019

Hey @Robert-Steiner : really sorry: I didn't get time to look this weekend. Will try again tomorrow night. Thank you again for the patch!

stuhood (Member) left a comment

Thank you Robert.

There is a huge amount to absorb here... I think that it is on the right track, but if possible, it would be great to defer any more-advanced cargo usage to a followup PR and focus on getting some basic documentation in place.

I'm very heartened by the thorough testing, so I think that once the docs are in place we should be able to look at landing this.

@classmethod
def register_options(cls, register):
  super(Build, cls).register_options(register)
  register(

If these options affect the identity of the output (they probably do), they should be marked fingerprint=True.

from pants.contrib.rust.tasks.cargo_task import CargoTask


class Workspace(CargoTask):

If this class is intended to be abstract, it would be good to have it subclass AbstractClass as well:

from pants.util.meta import AbstractClass
...
class NativeCompile(NativeTask, AbstractClass):
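
For readers unfamiliar with the pattern, the effect of marking a base class as abstract can be shown with the Python 3 standard library `abc` module, used here as a stand-in for the pants `AbstractClass` helper. The `members` method and the `ExampleWorkspace` subclass are hypothetical and only illustrate the idea.

```python
from abc import ABC, abstractmethod


class Workspace(ABC):
  """Abstract base: cannot be instantiated directly."""

  @abstractmethod
  def members(self):
    """Return the names of the workspace members."""


class ExampleWorkspace(Workspace):
  def members(self):
    return ['engine']


try:
  Workspace()
except TypeError as error:
  # Instantiating the abstract class fails early and loudly.
  print('cannot instantiate:', error)

print(ExampleWorkspace().members())
```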

def get_member_sources_files(self, member_definition):
  _, path, include_sources = member_definition
  rglobs = RGlobs.to_filespec(include_sources, root=path)
  path_globs = [PathGlobsAndRoot(

To capture a Snapshot at the buildroot, you should be able to skip the AndRoot part here and request something like:

snapshot = self.context._scheduler.product_request(Snapshot, [PathGlobs(tuple(rglobs['globs']))])

...which has the advantage of being fully cachable.

':program_rule',
]
)


There is a huge amount to absorb in this PR... would it be possible to break out some of the more advanced features of the cargo support into a second PR? I'm commenting here just because I'm not sure I recognize "custom build invocations" (...are they build.rs scripts?)

get_test_target_information)


def get_default_conversion_rules():

I don't think I understand the pattern behind the rules that are exposed by basic_invocation and custom_invocation.

Are they intended to have the same shape (similar to two implementations of a trait/interface)? If so, should something enforce/signal that (ie: should they be in an interface)? If not, some comments to explain their composition would be helpful.

And did they start out as @rules as in https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/README.md ? If so, would be interested to hear if you ran into trouble using that model, as we're still iterating on it, and feedback would be welcome.

Eric-Arellano (Contributor) commented

Closing due to being stale. This is a great change, though, and we appreciate all the work you put into this @Robert-Steiner!

For what it's worth, @cosmicexplorer has been drafting adding support for Rust through Pants' new V2 engine. @gshuflin is also looking into it. This PR will be quite helpful to both of them for sketching out what the API should look like.

Please let us know also if you'd have any interest in contributing to that project :) we'd love the help.
