Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manage external source trees as Git submodules #2487

Merged
merged 11 commits into from
Jun 15, 2016
Merged

Manage external source trees as Git submodules #2487

merged 11 commits into from
Jun 15, 2016

Conversation

bradking
Copy link
Contributor

@bradking bradking commented May 31, 2016

Drake's top-level CMakeLists.txt file has a table of Git repository URLs and commit SHA-1s for the external source trees. These two pieces of information are exactly what Git submodules record (in a .gitmodules file and a tree-object entry, respectively). Drake also tells CMake's ExternalProject module to create build rules that clone the Git repositories into subdirectories within its own source tree. This is exactly what Git submodules are meant to do.

Drake's external source trees are actually part of its overall source tree from the perspective of local development and build operations. This is why we place them in subdirectories of the main source tree. They just happen to be externally maintained and version-controlled separately. Using the build system to manage parts of the source tree is sub-optimal:

  • We need a kludge like the AUTO_UPDATE_EXTERNALS option to help developers update their external source trees after performing git pull on the main source tree while still providing a way to do local development in the external source trees without erasing the changes on the next build.
  • We need to .gitignore the externals/* directories (and a few others) since they are placed in the source tree but not managed by Git as part of the main source tree. This means that top-level operations like git status only see the main source tree rather than the whole source tree.

A solution to these problems is to use Git to manage the whole source tree by using Git submodules to manage the external source trees underneath our main source tree. The only involvement we need from the build system is to determine which external source trees need to be downloaded for the chosen build options. After the needed external source trees (submodules) have been downloaded (cloned) all further source tree status and update operations can be managed cleanly by the version control tool (Git).


This change is Reviewable

@bradking
Copy link
Contributor Author

Cc: @david-german-tri

bradking added 9 commits May 31, 2016 14:55
If an example directory exists but does not have a CMakeLists.txt file
then assume it is not complete and ignore it because add_subdirectory
will fail anyway.
This makes `git status` in the nlopt source tree report a clean tree by
ignoring an executable that appears in its source tree when building
with `octave` installed.
The DOWNLOAD_DIR option is not used when GIT_REPOSITORY is used.
Source trees currently placed in our source tree by ExternalProject
build rules will soon be converted to Git submodules, so we should
no longer ignore them.
The CMake ExternalProject module produces a `mkdir` step that makes
relevant directories before other steps, including the build directory.
For `director` we set the BINARY_DIR to be inside the source tree
because that is where we want `make` to run.  When we later consolidate
our `download-${proj}` targets into their main `${proj}` targets then
this will cause a subdirectory to be made in the director source tree
before its `download` step.  This will cause `git clone` to fail
complaining that the clone destination directory is a non-empty
directory.  Avoid this problem by adding a `chdir` step to our in-source
`director` BUILD_COMMAND so that we can use a BINARY_DIR that does not
sit in a subdirectory of the source tree.
…dule

Later we may optionally convert the submodule configuration to follow a
branch to restore current behavior.
Hack the `CMakeLists.txt` file to convert all of our

    <proj>_GIT_REPOSITORY
    <proj>_GIT_TAG

pairs into Git submodule configuration.
Use Git submodules to track external source trees and manage checkouts.
This will allow Git, our version control tool, to manage downloads and
updates of both our source tree and its external project source trees.
For example, a single `git status` from the top will now report the
state of all source trees used to build the project.

Convert our table of `<proj>_GIT_REPOSITORY`/`<proj>_GIT_TAG` pairs from
the CMake code into Git submodule configuration in the source tree.
Perform the conversion automatically by running CMake once.  Our
temporarily hacked `CMakeLists.txt` file configures the `.gitmodules`
file and adds the submodule entries to the Git index.
…odules

Remove from `CMakeLists.txt` the hack previously added to perform the
conversion.
@bradking
Copy link
Contributor Author

Some CI builds fail with:

error: could not lock config file .git/config: File exists
Failed to register url for submodule path 'externals/...'

I suspect this is caused by parallel downloads colliding on submodule initialization. Will fix.

@mwoehlke-kitware
Copy link
Contributor

:lgtm: (caveat: I didn't attempt to verify that the repository URL's and SHA's are unchanged; I'll just trust you there 😄)

Previously, bradking (Brad King) wrote…

Some CI builds fail with:


error: could not lock config file .git/config: File exists

Failed to register url for submodule path 'externals/...'

I suspect this is caused by parallel downloads colliding on submodule initialization. Will fix.


Reviewed 34 of 34 files at r1, 1 of 1 files at r2.
Review status: all files reviewed at latest revision, all discussions resolved, some commit checks failed.


Comments from Reviewable

@mwoehlke-kitware
Copy link
Contributor

mwoehlke-kitware commented May 31, 2016

Some CI builds fail with:

error: could not lock config file .git/config: File exists
Failed to register url for submodule path 'externals/...'

I suspect this is caused by parallel downloads colliding on submodule initialization. Will fix.

That could be; I always run the top level build as ninja -j1 (since the sub-ninjas will run in parallel anyway, there isn't much point building multiple externals in parallel, and besides, that would result in n^2 total jobs 😁).

In fact, since we require CMake 3.5 which I believe is new enough, can we just make the externals use ninja's console pool for all steps? (I guess we'll still need a fix for other generators, but this would be nice anyway...)

@bradking
Copy link
Contributor Author

bradking commented Jun 1, 2016

parallel downloads colliding on submodule initialization. Will fix.

I've revised the topic to fix this, and the CI builds are now clean. The fix was simply to perform git submodule init serially during the CMake configuration step in order to make all needed changes to .git/config ahead of time.

can we just make the externals use ninja's console pool for all steps

Perhaps, but that is beyond the scope of this change/PR.

@bradking
Copy link
Contributor Author

bradking commented Jun 1, 2016

:lgtm: (caveat: I didn't attempt to verify that the repository URL's and SHA's are unchanged; I'll just trust you there 😄)

The topic commit history shows how this was done automatically to preserve the values. Basically I added temporary changes to CMakeLists.txt to do the conversion, did the conversion, and then reverted the temporary changes.

@mwoehlke-kitware
Copy link
Contributor

can we just make the externals use ninja's console pool for all steps

Perhaps, but that is beyond the scope of this change/PR.

I was expecting you would need to explicitly serialize the download steps, likely making changes very similar to having the externals use the console pool also. Since you ended up doing something quite different, such a change would indeed be much more orthogonal than I was anticipating.

Anyway, LGTM... :lgtm:


Reviewed 1 of 1 files at r3.
Review status: all files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@david-german-tri
Copy link
Contributor

Review status: all files reviewed at latest revision, 1 unresolved discussion.


drake/doc/from_source.rst, line 137 [r3] (raw file):

  git checkout master
  git pull
  git submodule update --recursive

Am I correctly understanding the following?

  1. Someone changes an external SHA.
  2. I git pull.
  3. I build.
  4. I'm still using the old version of the external, because I didn't git submodule update. Things explode.

If so: is it possible to make the generated build files re-run ${proj}-download when the submodule file changes?


Comments from Reviewable

@bradking
Copy link
Contributor Author

bradking commented Jun 1, 2016

Review status: all files reviewed at latest revision, 1 unresolved discussion.


drake/doc/from_source.rst, line 137 [r3] (raw file):
That's correct. Some people have trained themselves to do both steps. Others create an alias for them:

$ git config alias.pullall '!bash -c "git pull && git submodule update --recursive"'
...
$ git pullall

We could have CMake create this alias for the local Git work tree, or provide a "developer setup" script.

is it possible to make the generated build files re-run ${proj}-download when the submodule file changes?

We could make the build do the update by setting the external project UPDATE_COMMAND to git submodule update --recursive too. This would mean restoring an option like AUTO_UPDATE_EXTERNALS to control it, but the behavior when the option is OFF will be much more tractable than it is without submodules because updating manually will be easy.

Either way we will still have gained whole-tree management by Git (e.g. git status).


Comments from Reviewable

@david-german-tri
Copy link
Contributor

Review status: all files reviewed at latest revision, 1 unresolved discussion.


drake/doc/from_source.rst, line 137 [r3] (raw file):

Previously, bradking (Brad King) wrote…

That's correct. Some people have trained themselves to do both steps. Others create an alias for them:

$ git config alias.pullall '!bash -c "git pull && git submodule update --recursive"'
...
$ git pullall

We could have CMake create this alias for the local Git work tree, or provide a "developer setup" script.

is it possible to make the generated build files re-run ${proj}-download when the submodule file changes?

We could make the build do the update by setting the external project UPDATE_COMMAND to git submodule update --recursive too. This would mean restoring an option like AUTO_UPDATE_EXTERNALS to control it, but the behavior when the option is OFF will be much more tractable than it is without submodules because updating manually will be easy.

Either way we will still have gained whole-tree management by Git (e.g. git status).

The `UPDATE_COMMAND` approach sounds like a good idea to me. Using an sha other than the one specified in `master` for a subproject should be a conscious choice.

Comments from Reviewable

@bradking
Copy link
Contributor Author

bradking commented Jun 2, 2016

Review status: 34 of 35 files reviewed at latest revision, 1 unresolved discussion.


drake/doc/from_source.rst, line 137 [r3] (raw file):

Previously, david-german-tri (David German) wrote…

The UPDATE_COMMAND approach sounds like a good idea to me. Using an sha other than the one specified in master for a subproject should be a conscious choice.

Done.

Comments from Reviewable

@david-german-tri
Copy link
Contributor

Looks good to me. @RussTedrake, did you want to review this change as well?

Previously, mwoehlke-kitware (Matthew Woehlke) wrote…

can we just make the externals use ninja's console pool for all steps

Perhaps, but that is beyond the scope of this change/PR.

I was expecting you would need to explicitly serialize the download steps, likely making changes very similar to having the externals use the console pool also. Since you ended up doing something quite different, such a change would indeed be much more orthogonal than I was anticipating.

Anyway, LGTM... :lgtm:


Review status: 34 of 35 files reviewed at latest revision, all discussions resolved.


drake/doc/from_source.rst, line 137 [r3] (raw file):

Previously, bradking (Brad King) wrote…

Done.

LGTM, thanks!

Comments from Reviewable

@david-german-tri
Copy link
Contributor

Reviewed 33 of 34 files at r1, 1 of 1 files at r2, 1 of 1 files at r4.
Review status: all files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@rdeits
Copy link
Contributor

rdeits commented Jun 6, 2016

We actually already had this discussion almost two years ago: #1169 and came to exactly the opposite conclusion. We had been using git submodules within drake for a long time, and ultimately decided to abandon them for several reasons:

  1. Submodules were an extremely common point of failure for new users. We spent an unreasonable amount of time helping people git submodule update --init --recursive. I really can't stress enough how confusing this was for a lot of students working with Drake.
  2. There was no way for users to check out only a specific set of submodules, which was a particular problem since some of our externals are private.
  3. It forced us to handle git externals and non-git externals through entirely different mechanisms
  4. It's annoying and difficult to change the URL that a submodule points to. This turns out to be a useful feature of the current system because I can, for example, fork one of our externals, push a branch to my fork, then edit Drake's CMakeLists to point to my fork in order to test my changed external with Drake.

Previously, david-german-tri (David German) wrote…

Looks good to me. @RussTedrake, did you want to review this change as well?


Review status: all files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@bradking
Copy link
Contributor Author

bradking commented Jun 6, 2016

@rdeits thanks for the link and the summary of previous discussion.

I've found it difficult to work with Drake externals without submodules. It is very hard to tell whether any external sources have been modified. In fact while working on this I found that our builds were making some source trees dirty and no one noticed (I added .gitignore updates to the externals to fix this).

  1. Submodules were an extremely common point of failure for new users. We spent an unreasonable amount of time helping people git submodule update --init --recursive. I really can't stress enough how confusing this was for a lot of students working with Drake.
  2. There was no way for users to check out only a specific set of submodules, which was a particular problem since some of our externals are private.

With the approach proposed in this PR no users actually have to deal with submodules. The workflow is the same as before for users. The needed subset of externals is automatically selected by the CMake configuration and used to configure a subset of submodules. Builds automatically update them unless AUTO_UPDATE_EXTERNALS is turned off (this changed during review so the opening description in this PR is no longer up to date).

  1. It forced us to handle git externals and non-git externals through entirely different mechanisms

Currently we have no non-Git externals. We can handle them by importing them into Git ourselves in a dedicated support repository. Even before this change the externals logic hard-coded use of GIT_REPOSITORY ... in ExternalProject_Add calls.

  1. It's annoying and difficult to change the URL that a submodule points to. This turns out to be a useful feature of the current system because I can, for example, fork one of our externals, push a branch to my fork, then edit Drake's CMakeLists to point to my fork in order to test my changed external with Drake.

Turn off AUTO_UPDATE_EXTERNALS and deal with the submodule repository work trees as you would any other tree. Add a remote, fetch, and checkout the desired branch. The local edit will appear in git diff of the top-level source tree just as a change to its CMakeLists.txt file would.

@david-german-tri
Copy link
Contributor

+@RussTedrake +@rdeits

Previously, bradking (Brad King) wrote…

So all users will have all of the submodules listed in their .gitmodules, but only some of them will actually be initialized?

Correct. Actually .gitmodules is part of the repository tracked content and is just a table of available submodules. Initializing a submodule means copying the configuration into .git/config, and only the needed subset will have that done. Furthermore, checking out a submodule is only done by git submodule update, and that can be restricted even to a subset of initialized modules. Indeed the download step implemented in this PR for each external project just checks out its own submodule.

I believe this will make it more difficult to check out older versions of drake without rm -rf-ing the entire externals directory.

I did that regularly while developing this change without removing the externals directory and I never saw that diagnostic. It should be fairly clean because in both cases the subdirectories are managed by Git. I think difficulty like you encountered occurs only when .gitignore is insufficient in the older external repo versions. One can simply remove the untracked files. Either way, this is a temporary transitional concern. Over time the need to checkout versions prior to submodules will diminish.


Review status: all files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@RussTedrake
Copy link
Contributor

@rdeits -- thanks for clearly articulating the concerns that I had immediately, too. I'm still a bit worried because the current system has been much better / cleaner than any submodule approach we had before.

a few clarification questions to make sure I understand:

  • we are avoiding the major pain-point with novice users because cmake will call git submodule update (recursively) on every build?
  • every user will have a directory for every external, but the directories which are not used (e.g. they do not have access) will be empty?
  • is the mechanism for NOT initializing the disabled externals simply that cmake never calls init/update on them?
  • if the above is true, what happens if I run git submodule update --init --recursive in the top directory (as your from_source.rst instructs them to do)? won't it try to download those other repos?

We should also add some documentation in the developers section telling them how to update the submodule hash since I know we'll get questions about it.

Previously, david-german-tri (David German) wrote…

+@RussTedrake +@rdeits


Reviewed 10 of 34 files at r1, 1 of 1 files at r2, 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 3 unresolved discussions.


CMakeLists.txt, line 316 [r4] (raw file):

    # Download externals as Git submodules.
    set(${proj}_DOWNLOAD_COMMAND ${GIT_EXECUTABLE} submodule update --init --recursive -- ${${proj}_GIT_SUBMODULE_PATH})

i would have said that leaning on CMake to do this properly (instead of calling git manually) would be better, but I suppose if the curator of CMake is recommending this then I will heed the advice.


CMakeLists.txt, line 489 [r4] (raw file):

    # Initialize the submodule configuration now so parallel downloads do not conflict later.
    execute_process(COMMAND ${GIT_EXECUTABLE} submodule init -- drake/examples/${example}

it's impossible for me to think through all of the use cases in reading this PR. i have to trust that you've thought through them all.


drake/doc/from_source.rst, line 137 [r4] (raw file):

  git checkout master
  git pull
  git submodule update --recursive

i thought we said cmake would do this for you??


Comments from Reviewable

@bradking
Copy link
Contributor Author

we are avoiding the major pain-point with novice users because cmake will call git submodule update (recursively) on every build?

Yes.

every user will have a directory for every external, but the directories which are not used (e.g. they do not have access) will be empty?

Yes.

is the mechanism for NOT initializing the disabled externals simply that cmake never calls init/update on them?

Yes.

if the above is true, what happens if I run git submodule update --init --recursive in the top directory (as your from_source.rst instructs them to do)? won't it try to download those other repos?

The from_source.rst suggests:

git submodule update --recursive

Without the --init flag no extra repos will be downloaded.


Review status: all files reviewed at latest revision, 3 unresolved discussions.


drake/doc/from_source.rst, line 137 [r4] (raw file):

Previously, RussTedrake (Russ Tedrake) wrote…

i thought we said cmake would do this for you??

CMake does take care of this, but it won't hurt for the user to do it either (without `--init`). We can drop this line from the documentation if you prefer.

Comments from Reviewable

@mwoehlke-kitware
Copy link
Contributor

what happens if I run git submodule update --init --recursive in the top directory (as your from_source.rst instructs them to do)? won't it try to download those other repos?

It will, as will I believe git submodule init (with no path given). You can use git submodule deinit <path(s)> to "undo" this (as I learned the hard way 😄).

@RussTedrake
Copy link
Contributor

thx. i trust that you've thought it through. i'm willing to give it a shot. :lgtm:

Previously, mwoehlke-kitware (Matthew Woehlke) wrote…

what happens if I run git submodule update --init --recursive in the top directory (as your from_source.rst instructs them to do)? won't it try to download those other repos?

It will, as will I believe git submodule init (with no path given). You can use git submodule deinit <path(s)> to "undo" this (as I learned the hard way 😄).


Review status: all files reviewed at latest revision, 3 unresolved discussions.


Comments from Reviewable

@jamiesnape
Copy link
Contributor

@drake-jenkins-bot retest this please

@bradking
Copy link
Contributor Author

This is ready to merge other than some Reviewable discussions not marked as resolved.

@david-german-tri
Copy link
Contributor

Review status: all files reviewed at latest revision, 1 unresolved discussion.


CMakeLists.txt, line 316 [r4] (raw file):

Previously, RussTedrake (Russ Tedrake) wrote…

i would have said that leaning on CMake to do this properly (instead of calling git manually) would be better, but I suppose if the curator of CMake is recommending this then I will heed the advice.

Closing this thread since Russ has LGTMed.

CMakeLists.txt, line 489 [r4] (raw file):

Previously, RussTedrake (Russ Tedrake) wrote…

it's impossible for me to think through all of the use cases in reading this PR. i have to trust that you've thought through them all.

Closing this thread since Russ has LGTMed.

drake/doc/from_source.rst, line 137 [r4] (raw file):

Previously, bradking (Brad King) wrote…

CMake does take care of this, but it won't hurt for the user to do it either (without --init). We can drop this line from the documentation if you prefer.

I'd support dropping it.

Comments from Reviewable

@bradking
Copy link
Contributor Author

Review status: all files reviewed at latest revision, 1 unresolved discussion.


drake/doc/from_source.rst, line 137 [r4] (raw file):

Previously, david-german-tri (David German) wrote…

I'd support dropping it.

Done.

Comments from Reviewable

bradking added 2 commits June 15, 2016 12:31
Our external source trees are now managed by Git as submodules.  Teach
our external projects to download and update their source trees via

    git submodule update --init --recursive $path

Users that turn off the AUTO_UPDATE_EXTERNALS option may now run

    git submodule update --recursive

to update submodule checkouts after changing versions of the host
project (e.g. `git pull`).
Combine the `${proj}` and `download-${proj}` external project targets
into a single `${proj}` external project target that contains all steps.
Add a separate `${proj}-update` target that depends on the update step
of `${proj}` (and transitively on the download step).  Then make
`download-all` depend on all of those.
@david-german-tri
Copy link
Contributor

Review status: all files reviewed at latest revision, all discussions resolved.


drake/doc/from_source.rst, line 137 [r4] (raw file):

Previously, bradking (Brad King) wrote…

Done.

LGTM

Comments from Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants