diff --git a/docs/design/git-submodule-storage.md b/docs/design/git-submodule-storage.md
index e80f34e999..c3dcf686eb 100644
--- a/docs/design/git-submodule-storage.md
+++ b/docs/design/git-submodule-storage.md
@@ -57,24 +57,90 @@ These extensions are:
 
 - Non-git subrepos
 - Colocated Git repos
-- Non-git backends
+- The superproject using a non-git backend
 
-## Possible approaches
+## Proposed design
 
-### Approach 1: Store Git submodules as full jj repos
+Git submodules will be stored as full jj repos. In the code, jj commands will
+only interact with the submodule's repo as an entire unit, e.g. it cannot query
+the submodule's commit backend directly. A well-abstracted submodule will extend
+well to non-git backends and non-git subrepos.
 
-This would be somewhere in `.jj` but outside of `.jj/store`. We would then
-expose a "submodules" interface that gets hooked up to the relevant machinery
-(e.g. updating the working copy).
+The main challenge with this approach is that the submodule repo can be in a
+state that is internally valid (when considering only the submodule's repo), but
+invalid when considering the superproject-submodule system. This will be managed
+by requiring all submodule interactions go through the superproject so that
+superproject-submodule coordination can occur. For example, jj will not allow
+the user to work on the submodule's repo without going through the superproject
+(unlike Git).
 
-TODO(chooglen): Discuss operation log
-TODO(chooglen): Discuss nested submodules
+The notable workflows could be addressed like so:
 
-### Approach 3: Store Git submodules as alternate jj repo backends
+### Fetching submodule commits
+
+The submodule would fetch using the equivalent of `jj git fetch`. It remains to
+be decided how a "recursive" fetch should work, especially if a newly fetched
+superproject commit references an unfetched submodule commit.
+A reasonable approximation would be to fetch all branches in the submodule, and
+then, if the submodule commit is still missing, gracefully handle it.
+
+### "jj op restore" and operation log format
+
+As full repos, each submodule will have its own operation log. We will continue
+to use the existing operation log format, where each operation log tracks its
+own repo's commits. As commands are run in the superproject, corresponding
+commands will be run in the submodule as necessary, e.g. checking out a
+superproject commit will cause a submodule commit to also be checked out.
+
+Since there is no association between a superproject operation and a submodule
+operation, `jj op restore` in the superproject will not restore the submodule to
+a previous operation. Instead, the appropriate submodule operation(s) will be
+created. This is sufficient to preserve the superproject-submodule relationship;
+it precludes "recursive" restore (e.g. restoring branches in the superproject
+and submodules) but it seems unlikely that we will need such a thing.
+
+### Nested submodules
+
+Since submodules are full repos, they can contain submodules themselves. Nesting
+is unlikely to complicate any of the core features, since the top-level
+superproject/submodule relationship is almost identical to the submodule/nested
+submodule relationship.
+
+### Extending to colocated Git repos
+
+Git expects submodules to be in `.git/modules`, so it will not understand this
+storage format. To support colocated Git repos, we will have to change Git to
+allow a submodule's gitdir to be in an alternate location (e.g. we could add a
+new `submodule.<name>.gitdir` config option). This is a simple change, so it
+should be feasible.
+
+## Alternatives considered
+
+### Git repos in the main Git backend
+
+Since the Git backend contains a Git repository, an 'obvious' default would be
+to store them in the Git superproject the same way Git does, i.e. in `.git/modules`.
+Since Git submodules are full repositories that can have submodules, this
+storage scheme naturally extends to nested submodules.
+
+Most of the work in storing submodules and querying them would be well-isolated
+to the Git backend, which gives us a lot of flexibility to make changes without
+affecting the rest of jj. However, the operation log will need a significant
+rework since it isn't designed to reference submodules, and handling edge cases
+(e.g. a submodule being added/removed, nested submodules) will be tricky.
+
+This is rejected because handling that operation log complexity isn't worth it
+when very little of the work extends to non-Git backends.
+
+### Store Git submodules as alternate Git backends
 
-This is Approach 3, but instead of storing the submodule in a Git backend,
-create a new backend that is backed by a full jj repo (like Approach 2), and
-store the Git submodule in its own jj repo backend.
+Teach jj to use multiple commit backends and store Git submodules as Git
+backends. Since submodules are separate from the 'main' backend, a repository
+can use whatever backend it wants as its 'main' one, while still having Git
+submodules in the 'alternate' Git backends.
 
-TODO(chooglen): Discuss operation log
-TODO(chooglen): Discuss nested submodules
+This approach extends fairly well to non-Git submodules (which would be stored
+in non-Git commit backends). However, this requires significantly reworking the
+operation log to account for multiple commit backends. It is also not clear how
+nested submodules will be supported since there isn't an obvious way to
+represent a nested submodule's relationship to its superproject.
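
Review note (not part of the patch): the fetch fallback described under "Fetching submodule commits" could be sketched roughly as below. This is only an illustration under stated assumptions. `SubmoduleRepo`, `FetchOutcome`, and `ensure_commit` are hypothetical names that do not exist in jj, and the "remote" is simulated with an in-memory set instead of a real fetch. The sketch also mirrors the proposed design in that callers see the submodule as a single unit and cannot reach its commit storage directly.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a submodule stored as a full repo.
/// The fields are private, so callers cannot query commit storage directly.
pub struct SubmoduleRepo {
    /// Commit ids already present locally.
    local: HashSet<String>,
    /// Commit ids reachable from the remote's branches (simulates the network).
    remote_branches: HashSet<String>,
}

/// Result of trying to make a pinned submodule commit available.
#[derive(Debug, PartialEq, Eq)]
pub enum FetchOutcome {
    AlreadyPresent,
    Fetched,
    /// Not reachable from any fetched branch; the caller degrades gracefully.
    StillMissing,
}

/// Helper that turns a slice of ids into an owned set.
fn to_set(ids: &[&str]) -> HashSet<String> {
    ids.iter().map(|id| id.to_string()).collect()
}

impl SubmoduleRepo {
    pub fn new(local: &[&str], remote_branches: &[&str]) -> Self {
        SubmoduleRepo {
            local: to_set(local),
            remote_branches: to_set(remote_branches),
        }
    }

    /// Assumption: stands in for the equivalent of `jj git fetch`
    /// fetching every branch in the submodule.
    fn fetch_all_branches(&mut self) {
        self.local.extend(self.remote_branches.iter().cloned());
    }

    /// Fetch all branches only if the commit is missing, then report
    /// whether the commit the superproject references is now available.
    pub fn ensure_commit(&mut self, id: &str) -> FetchOutcome {
        if self.local.contains(id) {
            return FetchOutcome::AlreadyPresent;
        }
        self.fetch_all_branches();
        if self.local.contains(id) {
            FetchOutcome::Fetched
        } else {
            FetchOutcome::StillMissing
        }
    }
}
```

Returning an explicit `StillMissing` variant instead of an error is one possible reading of "gracefully handle it": the superproject operation can continue and surface the missing commit to the user later. The design doc leaves the actual behavior undecided.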