mailmap: add support for .mailmap files
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

We introduce the notion of raw signatures, representing the name and
email baked into an object’s metadata, and canonical signatures,
representing the up‐to‐date identity information obtained by
looking up a raw signature in a `.mailmap` file. When there are no
matching entries, the raw and canonical signatures are the same.

Canonical signatures should be used for the majority of purposes,
such as display and querying in user interfaces and automated batch
processing (e.g., to collate statistics by commit author, or email
committers in batch). Generally speaking, whenever you care about
*who made the commit* rather than *what data happens to be encoded
in the commit itself*, they are the appropriate thing to work with.

Raw signatures should usually not be surfaced to users by default
unless explicitly asked for. Valid reasons to work with them include
low‐level processing of commits where a `.mailmap` is not accessible
or would be inappropriate to use (e.g., rewriting or duplication
of commits without intent to alter metadata), automated testing,
forensics (examining the raw object data stored in the backend that
is used to compute a commit’s cryptographic hash), and analysis
and debugging of `.mailmap` files themselves.

In this model, you can think of raw signatures as being like database
keys for a table mapping them to canonical signatures. In an ideal
world, these keys would be opaque synthetic keys with no human meaning
that are only surfaced when poking into internals; the idea is to treat
them as such to the greatest extent possible given the realities of
the current object model.

The only signatures Jujutsu currently processes are commit authors
and committers, which can be obtained in raw and canonical form with
`Commit::{author,committer}_raw` and `Mailmap::{author,committer}`
respectively. If Jujutsu starts to store or process immutable identity
data in other contexts (e.g. support for additional metadata on commits
like Git’s `Co-authored-by`/`Signed-off-by`/`Reviewed-by` trailers,
or detached metadata that nonetheless must remain immutable), then
the notion of raw and canonical signatures will carry over to those
and the same guidelines about preferring to work with and display
canonical signatures whenever reasonable will apply.
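
As a rough illustration of the raw/canonical distinction, here is a minimal sketch using only the standard library. `Signature` and `ToyMailmap` are hypothetical stand-ins invented for this example, not the actual `jj_lib` types, and the lookup is an exact match on name and email, which is simpler than real `.mailmap` matching rules:

```rust
use std::collections::HashMap;

/// A name/email pair, as stored in commit metadata or produced by a lookup.
/// (Hypothetical type for illustration only.)
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Signature {
    pub name: String,
    pub email: String,
}

impl Signature {
    pub fn new(name: &str, email: &str) -> Self {
        Self {
            name: name.to_owned(),
            email: email.to_owned(),
        }
    }
}

/// Hypothetical stand-in for a parsed `.mailmap`: raw signatures act as the
/// keys of a table and canonical signatures are the values.
#[derive(Default)]
pub struct ToyMailmap {
    entries: HashMap<Signature, Signature>,
}

impl ToyMailmap {
    /// Registers a mapping from a raw signature to its canonical form.
    pub fn insert(&mut self, raw: Signature, canonical: Signature) {
        self.entries.insert(raw, canonical);
    }

    /// Returns the canonical signature for `raw`. When there is no matching
    /// entry, the raw and canonical signatures are the same.
    pub fn canonicalize(&self, raw: &Signature) -> Signature {
        self.entries
            .get(raw)
            .cloned()
            .unwrap_or_else(|| raw.clone())
    }
}
```

In this toy model, anything user-facing would call `canonicalize` and only low-level or debugging code would touch the raw key directly.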

This is not meant to be a comprehensive solution to identity management
or to obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. Sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is somewhere between confusing
and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format are perfectly generic, already shared between Git and
Mercurial, and apply to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`,
  so if people feel this is important for consistency with Git, I
  can look into implementing it. But I’ve genuinely never personally
  seen anybody use this.

* The `author`/`committer` DSL functions respect the `.mailmap`, with
  `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. There is a corresponding breaking change of the external
  Rust API, but hopefully the new method name and documentation will
  nudge people towards doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow a
  query for any member to match all of them, but this would contradict
  what is actually displayed for the commits, violate the principles
  about surfacing raw signatures detailed above, and the `*_raw`
  functions may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1) per
  context, not parsing it for every processed commit), and `gix-mailmap`
  does a binary search to resolve signatures. I couldn’t measure any
  kind of substantial performance hit here, maybe 1‐3% on some
  `jj log` microbenchmarks, but it could just be noise; a couple of
  times it was actually faster.
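
The one‐way mapping and the split between `author(pattern)` and `author_raw(pattern)` described in the notes above can be sketched roughly as follows. This is a toy model under stated assumptions, not the code in this commit: `Commit`, `ToyMailmap`, `author_query`, and `author_raw_query` are hypothetical names, the map is keyed by lowercased email only, and pattern matching is a plain substring check.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, Eq)]
struct Signature {
    name: String,
    email: String,
}

/// Hypothetical commit carrying only its immutable (raw) author.
struct Commit {
    author_raw: Signature,
}

/// Hypothetical toy mailmap keyed by the lowercased raw email address.
struct ToyMailmap {
    by_email: HashMap<String, Signature>,
}

impl ToyMailmap {
    /// Canonical author: map the raw author through the table, or fall
    /// back to the raw author when nothing matches.
    fn author(&self, commit: &Commit) -> Signature {
        let raw = &commit.author_raw;
        self.by_email
            .get(&raw.email.to_ascii_lowercase())
            .cloned()
            .unwrap_or_else(|| raw.clone())
    }
}

/// Substring match against either the name or the email.
fn matches(sig: &Signature, needle: &str) -> bool {
    sig.name.contains(needle) || sig.email.contains(needle)
}

/// Analogue of `author(pattern)`: only the canonical form is searched.
fn author_query<'a>(commits: &'a [Commit], map: &ToyMailmap, needle: &str) -> Vec<&'a Commit> {
    commits
        .iter()
        .filter(|c| matches(&map.author(c), needle))
        .collect()
}

/// Analogue of `author_raw(pattern)`: only the unmapped commit data is searched.
fn author_raw_query<'a>(commits: &'a [Commit], needle: &str) -> Vec<&'a Commit> {
    commits
        .iter()
        .filter(|c| matches(&c.author_raw, needle))
        .collect()
}
```

With a commit authored as "Pest User" and an entry mapping that email to "Best User", a canonical query for `Best` finds the commit, a canonical query for `Pest` finds nothing, and only the raw query still sees `Pest`. This mirrors the behaviour the tests in this commit check for.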
emilazy authored and maddiemort committed Sep 20, 2024
1 parent 12592bf commit 90ae2d0
Showing 17 changed files with 631 additions and 27 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -46,6 +46,9 @@ to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
address unconditionally. Only ASCII case folding is currently implemented,
but this will likely change in the future.

* Support for [`.mailmap`](https://git-scm.com/docs/gitmailmap) files has
been added.

### Fixed bugs

## [0.19.0] - 2024-07-03
18 changes: 16 additions & 2 deletions Cargo.lock


4 changes: 4 additions & 0 deletions Cargo.toml
@@ -66,6 +66,10 @@ gix = { version = "0.63.0", default-features = false, features = [
"blob-diff",
] }
gix-filter = "0.11.2"
# We list `gix-{actor,mailmap}` separately, as they are used by
# `jj_lib::mailmap` even when the Git backend is disabled.
gix-actor = { version = "0.31.3" }
gix-mailmap = { version = "0.23.4" }
glob = "0.3.1"
hex = "0.4.3"
ignore = "0.4.20"
9 changes: 9 additions & 0 deletions cli/src/cli_util.rs
@@ -41,6 +41,7 @@ use jj_lib::git_backend::GitBackend;
use jj_lib::gitignore::{GitIgnoreError, GitIgnoreFile};
use jj_lib::hex_util::to_reverse_hex;
use jj_lib::id_prefix::IdPrefixContext;
use jj_lib::mailmap::{read_current_mailmap, Mailmap};
use jj_lib::matchers::Matcher;
use jj_lib::merge::MergedTreeValue;
use jj_lib::merged_tree::MergedTree;
@@ -73,6 +74,7 @@ use jj_lib::workspace::{
};
use jj_lib::{dag_walk, fileset, git, op_heads_store, op_walk, revset};
use once_cell::unsync::OnceCell;
use pollster::FutureExt;
use tracing::instrument;
use tracing_chrome::ChromeLayerBuilder;
use tracing_subscriber::prelude::*;
@@ -712,6 +714,11 @@ impl WorkspaceCommandHelper {
self.repo().view().get_wc_commit_id(self.workspace_id())
}

pub fn current_mailmap(&self) -> Result<Mailmap, CommandError> {
// TODO: Consider figuring out a caching strategy for this.
Ok(read_current_mailmap(self.repo().as_ref(), self.workspace.workspace_id()).block_on()?)
}

pub fn working_copy_shared_with_git(&self) -> bool {
self.working_copy_shared_with_git
}
@@ -996,6 +1003,8 @@ impl WorkspaceCommandHelper {
self.settings.user_email(),
&self.revset_extensions,
Some(workspace_context),
// TODO: Consider handling errors here.
Rc::new(self.current_mailmap().unwrap_or_default()),
)
}

21 changes: 19 additions & 2 deletions cli/src/commit_templater.rs
@@ -468,8 +468,14 @@ fn builtin_commit_methods<'repo>() -> CommitTemplateBuildMethodFnMap<'repo, Comm
Ok(L::wrap_commit_list(out_property))
},
);
map.insert("author", |language, _build_ctx, self_property, function| {
function.expect_no_arguments()?;
let mailmap = language.revset_parse_context.mailmap().clone();
let out_property = self_property.map(move |commit| mailmap.author(&commit));
Ok(L::wrap_signature(out_property))
});
map.insert(
- "author",
"author_raw",
|_language, _build_ctx, self_property, function| {
function.expect_no_arguments()?;
let out_property = self_property.map(|commit| commit.author_raw().clone());
@@ -478,6 +484,15 @@ fn builtin_commit_methods<'repo>() -> CommitTemplateBuildMethodFnMap<'repo, Comm
);
map.insert(
"committer",
|language, _build_ctx, self_property, function| {
function.expect_no_arguments()?;
let mailmap = language.revset_parse_context.mailmap().clone();
let out_property = self_property.map(move |commit| mailmap.committer(&commit));
Ok(L::wrap_signature(out_property))
},
);
map.insert(
"committer_raw",
|_language, _build_ctx, self_property, function| {
function.expect_no_arguments()?;
let out_property = self_property.map(|commit| commit.committer_raw().clone());
@@ -486,8 +501,10 @@ fn builtin_commit_methods<'repo>() -> CommitTemplateBuildMethodFnMap<'repo, Comm
);
map.insert("mine", |language, _build_ctx, self_property, function| {
function.expect_no_arguments()?;
let mailmap = language.revset_parse_context.mailmap().clone();
let user_email = language.revset_parse_context.user_email().to_owned();
- let out_property = self_property.map(move |commit| commit.author_raw().email == user_email);
let out_property =
self_property.map(move |commit| mailmap.author(&commit).email == user_email);
Ok(L::wrap_boolean(out_property))
});
map.insert(
1 change: 1 addition & 0 deletions cli/tests/runner.rs
@@ -45,6 +45,7 @@ mod test_immutable_commits;
mod test_init_command;
mod test_interdiff_command;
mod test_log_command;
mod test_mailmap;
mod test_move_command;
mod test_new_command;
mod test_next_prev_commands;
168 changes: 168 additions & 0 deletions cli/tests/test_mailmap.rs
@@ -0,0 +1,168 @@
// Copyright 2024 The Jujutsu Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// https://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

use crate::common::{get_stdout_string, TestEnvironment};

#[test]
fn test_mailmap() {
let test_env = TestEnvironment::default();
test_env.jj_cmd_ok(test_env.env_root(), &["git", "init", "repo"]);
let repo_path = test_env.env_root().join("repo");

let mut mailmap = String::new();
let mailmap_path = repo_path.join(".mailmap");
let mut append_mailmap = move |extra| {
mailmap.push_str(extra);
std::fs::write(&mailmap_path, &mailmap).unwrap()
};

let run_as = |name: &str, email: &str, args: &[&str]| {
test_env
.jj_cmd(&repo_path, args)
.env("JJ_USER", name)
.env("JJ_EMAIL", email)
.assert()
.success()
};

append_mailmap("# test comment\n");

let stdout = test_env.jj_cmd_success(&repo_path, &["log", "-T", "author"]);
insta::assert_snapshot!(stdout, @r###"
@ Test User <[email protected]>
"###);

// Map an email address without any name change.
run_as("Test Üser", "[email protected]", &["new"]);
append_mailmap("<[email protected]> <[email protected]>\n");

let stdout = test_env.jj_cmd_success(&repo_path, &["log", "-T", "author"]);
insta::assert_snapshot!(stdout, @r###"
@ Test Üser <[email protected]>
◉ Test User <[email protected]>
"###);

// Map an email address to a new name.
run_as("West User", "[email protected]", &["new"]);
append_mailmap("Fest User <[email protected]>\n");

let stdout = test_env.jj_cmd_success(&repo_path, &["log", "-T", "author"]);
insta::assert_snapshot!(stdout, @r###"
@ Fest User <[email protected]>
◉ Test Üser <[email protected]>
◉ Test User <[email protected]>
"###);

// Map an email address to a new name and email address.
run_as("Pest User", "[email protected]", &["new"]);
append_mailmap("Best User <[email protected]> <[email protected]>\n");

let stdout = test_env.jj_cmd_success(&repo_path, &["log", "-T", "author"]);
insta::assert_snapshot!(stdout, @r###"
@ Best User <[email protected]>
◉ Fest User <[email protected]>
◉ Test Üser <[email protected]>
◉ Test User <[email protected]>
"###);

// Map an ambiguous email address using names for disambiguation.
run_as("Rest User", "user@test", &["new"]);
run_as("Vest User", "user@test", &["new"]);
append_mailmap(
&[
"Jest User <[email protected]> ReSt UsEr <UsEr@TeSt>\n",
"Zest User <[email protected]> vEsT uSeR <uSeR@tEsT>\n",
]
.concat(),
);

let stdout = test_env.jj_cmd_success(&repo_path, &["log", "-T", "author"]);
insta::assert_snapshot!(stdout, @r###"
@ Zest User <[email protected]>
◉ Jest User <[email protected]>
◉ Best User <[email protected]>
◉ Fest User <[email protected]>
◉ Test Üser <[email protected]>
◉ Test User <[email protected]>
"###);

// The `.mailmap` file in the current workspace’s @ commit should be used.
let stdout = test_env.jj_cmd_success(&repo_path, &["log", "-T", "author", "--at-operation=@-"]);
insta::assert_snapshot!(stdout, @r###"
@ Vest User <user@test>
◉ Rest User <user@test>
◉ Best User <[email protected]>
◉ Fest User <[email protected]>
◉ Test Üser <[email protected]>
◉ Test User <[email protected]>
"###);

// The `author(pattern)` revset function should find mapped authors.
let stdout = test_env.jj_cmd_success(
&repo_path,
&["log", "-T", "author", "-r", "author(substring-i:bEsT)"],
);
insta::assert_snapshot!(stdout, @r###"
◉ Best User <[email protected]>
~
"###);

// The `author(pattern)` revset function should only search the mapped form.
// This matches Git’s behaviour and the principle of not surfacing raw
// signatures by default.
let stdout =
test_env.jj_cmd_success(&repo_path, &["log", "-T", "author", "-r", "author(pest)"]);
insta::assert_snapshot!(stdout, @r###"
"###);

// The `author_raw(pattern)` revset function should search the unmapped
// commit data.
let stdout = test_env.jj_cmd_success(
&repo_path,
&["log", "-T", "author", "-r", "author_raw(\"user@test\")"],
);
insta::assert_snapshot!(stdout, @r###"
@ Zest User <[email protected]>
◉ Jest User <[email protected]>
~
"###);

// `mine()` should find commits that map to the current `user.email`.
let assert = run_as(
"Tëst Üser",
"[email protected]",
&["log", "-T", "author", "-r", "mine()"],
);
insta::assert_snapshot!(get_stdout_string(&assert), @r###"
◉ Test Üser <[email protected]>
◉ Test User <[email protected]>
~
"###);

// `mine()` should only search the mapped author; this may be confusing in this
// case, but matches the semantics of it expanding to `author(‹user.email›)`.
let stdout: String =
test_env.jj_cmd_success(&repo_path, &["log", "-T", "author", "-r", "mine()"]);
insta::assert_snapshot!(stdout, @r###"
"###);
}
2 changes: 1 addition & 1 deletion cli/tests/test_revset_output.rs
@@ -290,7 +290,7 @@ fn test_function_name_hint() {
| ^-----^
|
= Function "author_" doesn't exist
- Hint: Did you mean "author", "my_author"?
Hint: Did you mean "author", "author_raw", "my_author"?
"###);

insta::assert_snapshot!(evaluate_err("my_branches"), @r###"
6 changes: 6 additions & 0 deletions docs/revsets.md
@@ -251,12 +251,18 @@ revsets (expressions) as arguments.
* `author(pattern)`: Commits with the author's name or email matching the given
[string pattern](#string-patterns).

* `author_raw(pattern)`: Like `author(pattern)`, but ignoring any mappings in
the [`.mailmap` file](https://git-scm.com/docs/gitmailmap).

* `mine()`: Commits where the author's email matches the email of the current
user.

* `committer(pattern)`: Commits with the committer's name or email matching the
given [string pattern](#string-patterns).

* `committer_raw(pattern)`: Like `committer(pattern)`, but ignoring any
mappings in the [`.mailmap` file](https://git-scm.com/docs/gitmailmap).

* `empty()`: Commits modifying no files. This also includes `merges()` without
user modifications and `root()`.

4 changes: 4 additions & 0 deletions docs/templates.md
@@ -73,7 +73,11 @@ This type cannot be printed. The following methods are defined.
* `commit_id() -> CommitId`
* `parents() -> List<Commit>`
* `author() -> Signature`
* `author_raw() -> Signature`: Like `author()`, but ignoring any mappings in
the [`.mailmap` file](https://git-scm.com/docs/gitmailmap).
* `committer() -> Signature`
* `committer_raw() -> Signature`: Like `committer()`, but ignoring any mappings
in the [`.mailmap` file](https://git-scm.com/docs/gitmailmap).
* `mine() -> Boolean`: Commits where the author's email matches the email of the current
user.
* `working_copies() -> String`: For multi-workspace repository, indicate
2 changes: 2 additions & 0 deletions lib/Cargo.toml
@@ -45,6 +45,8 @@ futures = { workspace = true }
git2 = { workspace = true, optional = true }
gix = { workspace = true, optional = true }
gix-filter = { workspace = true, optional = true }
gix-actor = { workspace = true }
gix-mailmap = { workspace = true }
glob = { workspace = true }
hex = { workspace = true }
ignore = { workspace = true }
14 changes: 14 additions & 0 deletions lib/src/commit.rs
@@ -146,11 +146,25 @@ impl Commit {
}

/// Returns the raw author signature from the commit data.
///
/// **Note:** You usually **should not** directly process or display this
/// information before canonicalizing it. Prefer
/// [`Mailmap::author`][`crate::mailmap::Mailmap::author`] unless you
/// care specifically about the potentially‐outdated immutable commit data,
/// or are performing low‐level operations in a context that can’t obtain a
/// [`Mailmap`][`crate::mailmap::Mailmap`].
pub fn author_raw(&self) -> &Signature {
&self.data.author
}

/// Returns the raw committer signature from the commit data.
///
/// **Note:** You usually **should not** directly process or display this
/// information before canonicalizing it. Prefer
/// [`Mailmap::committer`][`crate::mailmap::Mailmap::committer`] unless you
/// care specifically about the potentially‐outdated immutable commit
/// data, or are performing low‐level operations in a context that can’t
/// obtain a [`Mailmap`][`crate::mailmap::Mailmap`].
pub fn committer_raw(&self) -> &Signature {
&self.data.committer
}