Customers have often asked for the ability to use multiple repositories.
Today, we offer a weak workaround: a script step with the additional git clone
.
You lose access to things like our smart re-use of clones (when running on a private agent).
Also, we tell you how to re-use the System.AccessToken
for other Azure Repos, but that doesn't generalize to other source providers.
This feature will add multi-repo checkout as a first-class citizen in YAML pipelines. (Note: we already allow repository resources, but they're used only for YAML templates today.)
Goals:
- Allow fetching multiple repositories, regardless of source host
- Allow triggering pipelines from changes to any of the repositories
- Provide useful/sane defaults for common scenarios, while allowing flexibility for customers with very specific requirements
- Preserve back-compat with existing pipelines, tasks, ad-hoc scripts, and template inclusion for customers who don't opt into multi-checkout
Non-goals:
- Adding a second checkout step is perfectly seamless (customers who opt into multiple checkouts may have to alter other aspects of their pipelines)
- Alter our existing behaviors when there are 0 or 1
checkout
steps - Design for multiple repository triggers (look for a sibling document shortly)
- Change our security posture for additional Azure Repos (other features may address this in the future)
Customers often have source, tools, scripts, or other collateral stored in a secondary repo. Even a big mono-repo like the Azure DevOps codebase has adjunct repositories that could be used in a CI/CD pipeline. Especially for teams working with microservices, it's common to require additional sidecar repos containing shared infrastructure or SDKs.
On the other hand, we have lots of customers using just a single repository in their pipeline, and they plan to stay that way. Any features we build must respect the investment those customers have made in their pipelines. Therefore we have this stake in the ground:
We are in "multi-checkout" when the customer has explicitly asked for multiple repositories to be checked out. Zero checkout steps yields the same behavior as today (checkout
self
in a normal job, no checkout in adeployment
job). One checkout (whether it'sself
or not) preserves existing single-repo behavior.
(Note: I'm just calling it a "mode" for purposes of discussion here. We won't document it as a "mode".)
The agent will look at the job message and discover all checkout
steps.
- If there are none at all, we'll preserve the existing default behavior, acting as if a
checkout: self
were the first step. - If there is exactly one
checkout: none
, then we'll preserve the existing behavior: no repositories are synced or checked out. - If there is exactly one
checkout
step that isn'tnone
, then we'll use existing single-checkout behavior but check out that repository rather thanself
.
- If there are multiple checkout steps and one or more are
checkout: none
, this is an error. It's unclear what the user meant for us to do, so we fail with a clear message. - Otherwise, we are in multi-checkout mode.
Notably, a user may request the same repo to be checked out multiple times at the same or different refs.
There are niche scenarios around code gen, trying out different versions of a tool, etc. which are interesting to support.
For purposes of source mapping (an agent feature to avoid re-fetching), we'll consider only the first checkout
step mentioning a particular repo.
If it's used again, it's OK to re-fetch the same data.
(But this is not part of the contract, and we could change behavior in the future based on customer demand/feedback.)
By default, all repositories will be checked out as if they've been git clone
d into the Build.SourcesDirectory
directory.
Git uses the last part of the path, minus any trailing slashes, spaces, and the string .git
.
(We have YAML syntax for selecting a different path, and if the user specifies that, we'll use their name.)
This changes the default directory for the self
repo: it would be $(Pipeline.Workspace)/s
if it were the only checkout
step, but now it will be $(Pipeline.Workspace)/s/<reponame>
.
The pipeline workspace directory has four well-known subdirectories:
s/
for sourcesa/
for artifactsb/
for generated binariestestResults/
for test results files
We'll recommend not to use any of these as checkout paths, but we won't attempt to block them.
No change to the contents of system variablies like Build.SourcesDirectory
or System.DefaultWorkingDirectory
.
Those will remain $(Pipeline.Workspace)/s
.
NOTE: if one or more checkout
s sets an explicit path outside of $(Pipeline.Workspace)/s
, it's ambiguous where the "root of source" should be.
Since we don't know the user's or the task's intent, we won't try to be clever and do something else.
We'll document this behavior so that it's learnable.
The resources
block already allows a repositories
list.
Today, those repositories hold only YAML templates, so we can't blindly check them out automatically.
That's why an explicit checkout
step is also required.
However, in multi-checkout scenarios, this leads to unnecessary verbosity.
First you have to mention the repository up top, give it a name, and then you check it out by name.
To combat this, we'll add an inline repository syntax.
Inline syntax will cover the type
, name
, and ref
of a repository resource.
(If other changes are needed - such as endpoint
or trigger
- users must fall back to explicit resource syntax.)
Supported repo types will match what YAML supports:
- Azure Repos
- GitHub
- External Git (but not for the triggering repo)
If and when other providers are supported for YAML, they will be supported by this feature as well.
The most basic, and probably most common form of multi-checkout:
steps:
- checkout: self
- checkout: git://MyProject/MyToolsRepo
This grabs a tools repository from Azure Repos and checks out its default branch, typically master
.
Assuming this is in a repository called "MySourceRepo", the resulting directory structure on a hosted agent would be:
_w/
1/
MySourceRepo/
(files from MySourceRepo)
MyToolsRepo/
(files from MyToolsRepo)
The same syntax works for GitHub:
steps:
- checkout: self
- checkout: github://MyGitHubOrg/MyToolsRepo
And this yields the same directory structure.
You can tell the checkout
step where to put your repos.
steps:
- checkout: self
path: src
- checkout: github://MyGitHubOrg/MyToolsRepo
path: toolzdir
_w/
1/
src/
(files from MySourceRepo)
toolzdir/
(files from MyToolsRepo)
steps:
- checkout: self
- checkout: git://MyProject/MyToolsRepo@features/mytools
By appending @<ref>
, the agent can be instructed to check out a different ref.
In this case, it's assumed to be a branch called features/mytools
.
Branches can also be prefixed with refs/heads/
to make it unambiguous that it's a branch.
Other valid refs (and ref-like things):
refs/tags/<tagname>
- commit ID
If you need an endpoint or other extended resource field, you must use explicit resource syntax.
resources:
repositories:
- repository: tools
type: github
endpoint: MyServiceConnection
steps:
- checkout: self
- checkout: tools
This one is not a multi-checkout scenario, but is a new capability:
steps:
- checkout: git://MyProject/MySourceInAnotherRepo
The contents of MySourceInAnotherRepo
will be checked out into the typical s/
directory.
All other behaviors will remain identical.
This one is an error:
steps:
- checkout: none
- checkout: self # error! `checkout: none` must be the only checkout step
This one will trigger on the main repo with the YAML file as well as the app
repo.
trigger:
- master
resources:
repositories:
- repository: app
type: github
name: org1/repoA
ref: master
endpoint: 'GitHub endpoint'
trigger:
- master
- release/*
The default refs for self
and app
repos will be the latest commit from master
in the respective repos. But, you can change that when you manually start a run. Similarly, if this pipeline is triggered by a change to app
repo, then your checkout tasks will pull that change for the app
repo and the latest from master
for self
repo.