[major] executor/clean: Adding per-package cleaning, linked develspaces, and a new execution pipeline #196
I would have preferred doing this separately. It makes it really difficult to review things when you've mixed several large, somewhat unrelated, changes together. In the mean time you could have just imported what you needed from the
I must have missed that road map 😛.
The changes aren't hard to separate, and I can do that when this is ready for an in-depth review. The only reason why they're all together is that it was the quickest way to figure out what needed to move in order to make the clean functionality happen.
Meaning this is the last big feature that would enable me to spend as little time compiling code in my day-to-day work as I did with |
That makes sense; I get that it is easier to develop like this. Like you said, we can split it up when it's time for review.
_NOTE: This information is out of date, please see the PR description for updated details._
@wjwwood Check out the current design details in the updated PR description. I think it's actually converging to something that will work well in practice.
+1 i've always wanted this feature. i haven't looked at the actual code, though
Thanks for working on this @jbohren, it's on my list to review, and I'll get to it asap.
One caveat about using It's not a deal-breaker, and some people might not even notice, but it's worth keeping in mind. |
This PR does in fact clean empty folders:
Nice!
@wjwwood I've integrated my experimental asyncio hacking into this branch. This still includes the per-package clean support and symlinked devel support. I still need to flesh out the rest of the logging, remove the old execution stuff, update the CLI, and update both the PR design details and the catkin_tools documentation, but the tests are passing now.
@jbohren I am testing this pull request because I want to be able to remove individual packages without rebuilding my whole work space. After switching to your branch and running
Thoughts? Thanks! |
@davetcoleman you must install Trollius (it is a Python 2.7 compatible implementation of Python 3's asyncio). This should get you going:
Thanks. Now I get:
It looks like osrf_pycommon is only available from source? I tried using a command similar to the one you just provided.
Yeah, you'll have to install that from source for now: https://osrf-pycommon.readthedocs.org/en/latest/#installing-from-source
Yes, it's only from source AFAIK. Use pip from github:
Or that will work too.
context=context,
env_file_path=env_file_path))

# Only use it for building if the develspace is isolated
@jbohren Can you explain the motivation here? This breaks building from scratch for me, e.g.
# Clean workspace (just src folder)
. /opt/ros/indigo/setup.bash
catkin build
This fails for packages including message headers built in other packages, since `devel/include` will not be included in the header search path. Indeed, how should a package know about the shared devel space if not through the env file?
After the first build fails, I can source the then-available `devel/setup.bash` and a second build will succeed.
If I use `env_prefix = [env_file_path]` unconditionally here, the `devel/include` is added to `CPATH` and everything works on the first build.
@xqms Yeah, this is a bug in the current PR. The original intention was to avoid re-sourcing the same setup files, but I left this in by mistake.
Hm, you report a great speedup when avoiding re-sourcing the `env.sh` files above. Can we somehow keep that speedup? Maybe source one `env.sh` at the beginning of the build?
So we can definitely avoid sourcing it with every command by only sourcing it once per package and saving the environment. We can also go a step further and bypass the environment files by sourcing the workspace setup file directly. The ultimate would be to only source it once per build and then re-source it only when the catkin env hooks change.
For everyone's reference: I have some minor tweaks to this PR here: jbohren-forks#17
@jbohren, in the |
@NikolausDemmel Yeah, the interface should definitely be more enum-like. Currently there is no "precedence"; they are all just mutually exclusive.
_NOTE: This has been superseded by #247, #249, #276, and #293._
Overview
This PR involves a dramatic refactor of the core execution pipeline for building and cleaning packages in order to implement two high-priority features:
- `catkin build` execution pipeline outside of the `build` verb
- `stderr`-based warning and error detection
- `stdout` from `stderr` during execution of each job
Additionally, it addresses the following outstanding issues:
- `-p` option is set to whatever the `-j` option is
- `catkin` tool should always exit cleanly when ctrl-c'd due to the new use of asyncio
- `cmake` packages
It also provides the following:
- `catkin_tools` executor is doing via named job "stages"
- `--pre-clean` option has been added to the `build` verb to `make clean` in the build directory before running `make`
- `--unbuilt` option has been added to the `build` verb to build any packages in the workspace which have not been built
It also resulted in some unexpected benefits:
It also revealed the following problems:
Per-Package Cleaning and the Linked Devel Space
This PR provides support for cleaning individual packages with the following syntax:
Where `-b` means buildspace, `-d` means develspace, `-i` means installspace, and `-a` means all three. Other examples include:
`catkin clean` Examples
- `catkin clean` / `catkin clean -a`
- `catkin clean -af` / `catkin clean --all --force`
- `catkin clean -b`
- `catkin clean pkg_a pkg_b`: `pkg_a` and `pkg_b` from the build, devel, and install spaces
- `catkin clean --deps pkg_a`: `pkg_a` as well as any packages which depend on `pkg_a`.
- `catkin clean --orphans`
- `catkin clean --deinit`: `.catkin_tools`) directory
Approach
Doing this right is hard, but it's an important feature. I think this sort of per-package `clean` functionality is one of the final big features preventing `catkin_tools` from being complete. In complex workspaces with hundreds of packages, debugging can be a huge challenge, and this is only made more challenging by the inability to reliably remove all of the products of a specific package from a workspace.
CMake Packages
Vanilla CMake packages are handled trivially since their `install` target is used to write files to `devel` and `install` spaces. This target creates an `install_manifest.txt` which lists all the generated files, so to clean CMake packages, we can simply remove these files.
The normal location for `install_manifest.txt`, however, is in the package's build space. This file could get overwritten if the user builds both a devel and install variant, so we copy this file into the appropriate result space for safe keeping. Also, the `install_manifest.txt` file does not specify directories which are created by `cmake`, so the CMake clean job will remove any directories which are made empty through the removal of files listed in the `install_manifest.txt` file.
Catkin Packages
Catkin packages prove more challenging for a few reasons. For Catkin packages, there are two different steps which generate files in the `devel` space: the CMake configure step, and the Make build step. It's possible to clean the files generated in the build step with the standard `clean` target, but this still leaves numerous files generated by the configure step. Catkin packages also each generate some of the same files, such as the setup shell scripts, and each write to the `.catkin` file which lists all the source packages in the workspace. This all makes for a big mess to clean up.
The approach used in this PR is to build each package into an isolated devel space, and then symbolically link the contents into the merged devel space. This prototype behavior is enabled by passing the `--link-devel` option to `catkin build`. Since `catkin_tools` does the linking, it can generate a `devel_manifest.txt` similar to CMake's `install_manifest.txt` which logs all of the files that a given package contributes to the devel space.
If a file that was listed in the previous link step is no longer found in the isolated devel space, it is removed from the merged devel space.
The above handles tracking the files generated by each package, but doesn't handle collisions very well. In order to handle collisions, the symlink generation step counts the number of times a collision occurs for each file. This information is stored in a file called `devel_collisions.txt`.
In summary:
DEVEL_PREFIX/.catkin_tools/PKG_NAME/linked_devel
DEVEL_PREFIX/.catkin_tools/PKG_NAME/devel_manifest.txt
DEVEL_PREFIX/.catkin_tools/devel_collisions.txt
DEVEL_PREFIX/
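The linking and collision counting summarized above, together with a clean step that reverses it, can be sketched roughly as follows. This is only an illustration and not the code in this PR: the function names `link_devel` and `clean_devel` are hypothetical, the collision counts are kept in an in-memory `Counter` rather than being persisted to `devel_collisions.txt`, and re-linking a file that still belongs to another package after a clean is not handled.

```python
import os
from collections import Counter


def link_devel(isolated, merged, manifest_path, collisions):
    """Symlink each file of an isolated devel space into the merged one.

    Hypothetical helper: writes the list of contributed files to
    `manifest_path` and counts collisions in the `collisions` Counter.
    """
    isolated = os.path.abspath(isolated)
    merged = os.path.abspath(merged)
    linked = []
    for root, _dirs, files in os.walk(isolated):
        rel_dir = os.path.relpath(root, isolated)
        os.makedirs(os.path.join(merged, rel_dir), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.normpath(os.path.join(rel_dir, name))
            dest = os.path.join(merged, rel)
            if os.path.islink(dest) and os.readlink(dest) == src:
                pass  # already linked by this package on an earlier build
            elif os.path.lexists(dest):
                collisions[rel] += 1  # another package provides this file
            else:
                os.symlink(src, dest)
            linked.append(rel)
    with open(manifest_path, "w") as f:
        f.write("\n".join(linked))
    return linked


def clean_devel(merged, manifest_path, collisions):
    """Undo link_devel for one package using only its manifest."""
    merged = os.path.abspath(merged)
    with open(manifest_path) as f:
        linked = f.read().splitlines()
    for rel in linked:
        if collisions[rel] > 0:
            collisions[rel] -= 1  # file is still claimed by another package
            continue
        dest = os.path.join(merged, rel)
        if os.path.lexists(dest):
            os.remove(dest)
            # Prune directories which became empty, but never the root.
            parent = os.path.dirname(dest)
            while parent != merged and not os.listdir(parent):
                os.rmdir(parent)
                parent = os.path.dirname(parent)
    os.remove(manifest_path)
```

Because `clean_devel` reads only the manifest, it does not need the package's source or build directory to exist, which is the property that makes orphan removal possible.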
The clean step for a given package reads its `devel_manifest.txt` and for each file that was generated it either removes the file or decrements the count in the `devel_collisions.txt` file.
Setup File Generation
One challenge with this approach, as mentioned above, is that catkin packages normally generate several files in the root of the devel space used for "sourcing" that devel space. These include:
.catkin
.rosinstall
_setup_util.py
env.sh
setup.sh
setup.bash
setup.zsh
With the exclusion of `.catkin`, these files are all generated from catkin_generate_environment.cmake during the CMake configure step of a catkin package, and several of them include information computed by existing Catkin CMake code. Since some of these reference the absolute path of the develspace, they can't simply be copied from the isolated devel spaces of the constituent packages.
The simplest solution to this problem is to have a "ghost" bootstrap package called `catkin_tools_bootstrap` which gets generated the first time a given resultspace is built. This guarantees that the setup files are generated correctly by Catkin itself.
A convenient side-effect of the `catkin_tools_bootstrap` package is that it also enables `catkin build` to generate setup files for a develspace consisting only of vanilla CMake packages.
For the `.catkin` file, we make `catkin_tools` responsible for managing its contents. This is for two reasons:
Since `catkin_tools` will be modifying this file in parallel from a single process, we can manage exclusive access to it and prevent collisions. Additionally, we can remove packages from it when they are removed from the workspace.
Additional Notes
The current implementation works in Unix-based systems, but could be extended to Windows with one of these methods.
Architectural Changes
Executor
This PR completely re-implements the execution pipeline using `osrf_pycommon` and the `trollius` library (a Python2 and Python3 compatible version of Python3.4's `asyncio`). This yields an execution pipeline which can be used by `catkin clean` (and other potential verbs) and enables us to extract and display `stdout` and `stderr` separately. As such, it solves #1.
Execution Jobs and Stages
The new execution pipeline operates on a task model where a given "task" (build, clean, etc) can be decomposed into an acyclic "job" dependency graph, where each job is a sequence of "stages". Each job can be executed in parallel once its dependencies have been completed, and each stage within a job must be executed serially.
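As a rough illustration of this task model (not the executor in this PR), the sketch below uses Python 3's standard `asyncio` instead of `trollius`. The `Job` class, `run_job`, and `execute_jobs` are simplified stand-ins written for this example, each stage is assumed to be a blocking callable that returns `True` on success, an `asyncio.Semaphore` stands in for job server tokens, and failure handling is reduced to skipping any job whose dependencies failed.

```python
import asyncio


class Job:
    """Hypothetical container: a unique jid, dependency jids, serial stages."""

    def __init__(self, jid, deps, stages):
        self.jid = jid        # unique job ID, e.g. a package name
        self.deps = deps      # jids of the jobs this job depends on
        self.stages = stages  # list of (label, blocking callable) pairs


async def run_job(job, done, results, tokens):
    # A job may only start once all of its dependencies have finished.
    for dep in job.deps:
        await done[dep].wait()
    if not all(results[dep] for dep in job.deps):
        results[job.jid] = False  # abandoned: a dependency failed
    else:
        loop = asyncio.get_running_loop()
        async with tokens:  # stand-in for acquiring a job server token
            ok = True
            for _label, func in job.stages:
                # Stages run serially; blocking work goes to a thread.
                ok = await loop.run_in_executor(None, func)
                if not ok:
                    break  # one failed stage fails the whole job
        results[job.jid] = ok
    done[job.jid].set()


async def execute_jobs(jobs, max_jobs=2):
    """Run independent jobs in parallel; return a {jid: succeeded} dict."""
    done = {job.jid: asyncio.Event() for job in jobs}
    results = {}
    tokens = asyncio.Semaphore(max_jobs)
    await asyncio.gather(*(run_job(j, done, results, tokens) for j in jobs))
    return results
```

A call such as `asyncio.run(execute_jobs(jobs))` takes a list of `Job` objects whose dependency graph is acyclic and whose `deps` all refer to jobs in the same list.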
A `Job` is a simple container with a few non-mutating utility functions. It is designated with a unique Job ID (`jid`) and has a list of the `jid`s of the jobs on which it depends. In `catkin_build`, these `jid`s are normally package names.
A `Stage` is some atomic operation, which contains a non-unique label (such as "cmake", "make", or "install"). There are currently two types of stages: `CmdStage` and `FunStage`. The `CmdStage` describes a system command. This type is similar to the current stage implementation, except it also facilitates using asynchronous protocols for capturing `stdout` and `stderr`. The `FunStage`, on the other hand, describes a blocking Python function. This second stage type enables us to affect the filesystem in arbitrary ways without spawning subprocesses or relying on CMake's shell utilities.
The construction of `Job` and `Stage` instances should have no side-effects, and they are designed to be lazily evaluated only once their dependencies have been completed. Once a collection of `Job` objects has been created, it is passed directly to the executor.
Executing Jobs Asynchronously
The executor is an `asyncio` coroutine which executes each job asynchronously. Jobs follow a well-defined lifecycle. At any time, a given job is on one of the following lists:
By default, if one of the jobs fails, the executor will abandon all other active, queued, and pending jobs. This behavior can be controlled with the `continue_on_failure` and `continue_without_deps` options:
`continue_on_failure` - Don't abandon all jobs if one job fails.
`continue_without_deps` - Don't abandon the jobs which depend on failed jobs.
Jobs are activated subject to the availability of tokens from the job server. This happens whether or not the job server is being used to manage `make` jobs since the job server treats the executor just like any other job client. Once a job is activated, each of its stages is executed in order, and if one stage fails, the job fails.
Getting Feedback from Execution
Like the current executor, the new execution pipeline uses a `Queue` to asynchronously communicate with the console thread. The executor writes `ExecutionEvent` objects into the event queue to be processed by any kind of output controller.
The default console controller has several options for output filtering and formatting:
`show_stage_events`: False
`show_buffered_stdout`: False
`show_buffered_stderr`: True
`show_live_stdout`: False
`show_live_stderr`: False
`show_active_status`: True
`show_full_summary`: False
`active_status_rate`: 20.0
Creating `catkin build` and `catkin clean` Jobs
Jobs for building and cleaning are created by factory functions defined for catkin and cmake packages. This makes the job objects task-agnostic, even if they require different types of information, as seen in the case of `catkin build` and `catkin clean`.
Error Display
This PR dramatically simplifies the error output display in order to make errors clearer and less verbose. Since we have more controls to build single packages, it doesn't seem like it's necessary to have the "command to reproduce" and the numerous paths to different logfiles visible by default. On small screens, the actual error is usually even lost up above in the scrollback.
Current
This PR
Cleaning Individual Packages
This PR adds jobs for cleaning all files generated by individual catkin and CMake packages in the build and devel spaces. The "clean" jobs are different from the "build" jobs since they do not technically need to know the source path to packages. In fact, it's desirable to make them only depend on the information in the `devel_manifest.txt` files to be cleaned. This way we can support full orphan removal if a package's source is removed from the workspace.
TODO
- `devel_manifest.txt` files into the develspace so they can be cleaned without the build directories
- `install_manifest.txt` files into the develspace so they can be cleaned without the build directories
- `catkin clean` pipeline to not require source packages to exist in the source space
- `--link-devel` the default behavior and add an option to build with an "unlinked" merged devel space like the current behavior
- `--unbuilt` option to `catkin build` to build any packages which have not yet been built
- `cmake` stage runs and completes, then job is idle until the jobserver allows the `make` stage to complete)
- `--unbuilt` cause any packages which depend on the unbuilt packages to be cleaned completely and rebuilt