Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: Better Identity Management #2957

Open
steveklabnik opened this issue Feb 5, 2024 · 6 comments
Open

FR: Better Identity Management #2957

steveklabnik opened this issue Feb 5, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@steveklabnik
Copy link
Contributor

steveklabnik commented Feb 5, 2024

Before we get into the details, I want to emphasize that I see this FR as more of a "consider and investigate" rather than any expectation that things will change. Identity is a difficult and complex topic, and the current implementation is serviceable. But, I also think, in the same way that jj improves upon previous DVCSes in many ways, maybe it could for identity as well.

Is your feature request related to a problem? Please describe.

In Git, as with many VCS systems, commits have some sort of identity attached to them. In Git specifically, commit objects contain author information, a name and an email. Because this information is part of the commit itself, changing this information will change the SHA of the commit, which means you need a lot of rebasing to update the tree.

In the real world, identity is not so simple. See articles like Falsehoods Programmers Believe about Names for some examples. But even beyond many of these (which to be clear, git does not fall afoul of all of them), sometimes things happen where people would like to revoke their authorship. Note that while this got merged, it doesn't fully do what the author would like it to do:

I would also like to request that the Rust team strip all authorship information from my commits to the projects in the rust-lang organization.

Due to the way that git's model works, this is effectively an impossible request. (Which I find personally unfortunate but I am trying to stay neutral in the description of this issue.)

I link to this issue because I think it's a good proxy for thinking about this problem in general, revocation is one of the more difficult aspects of identity management.

Describe the solution you'd like

At this time, I am not asking for a specific solution, but would like to have this FR as a place to discuss possibilities and collect information about how to possibly support these sorts of operations in the future.

Describe alternatives you've considered

I think the first step towards evolution here is to decide what properties a given identity solution would require, and then go from there. The two that I can think of are rotation (changing an identity) and revocation (abandoning an identity), but I am sure there are more. Weighing these various factors will point towards a specific technical solution.

There is also the additional complexity of how this affects various backends, as you won't necessarily be able to represent the full solution when sending things to an upstream git repository, for example.

A specific lesser known implementation of these ideas in a similar space is ATProto. AT is the underlying protocol that powers the BlueSky social media platform, however, I bring it up because you can kind of think of AT as "git for twitter posts." Without fully getting into the details, AT uses a W3C standard called DID, for "distributed identifiers" for identity management. There are some weaknesses in this model, and it may not be appropriate for jj wholesale, but it is at least one reasonable citation for prior art that's worth at least considering or taking ideas from.

Additional context

To bring another real-world example from the Rust Project, https://thanks.rust-lang.org/ is a website that pulls git authorship information, in order to create a contributors' list. It uses a partial solution to this issue, a git mailmap, so that at least information is displayed with the latest name requested. Here is the mailmap itself. While I don't believe mailmaps solve the problem adequately, it's another example of a real-world failing when it comes to git and identity management.

@necauqua
Copy link
Contributor

necauqua commented Feb 5, 2024

The idea is cool, but unfortunately, one of the pillars jj stands on is near-ideal git compatibility, which I don't see (at a first glance anyway) a way to preserve with any, well, "non-git" identity management system

Although it could be opt-in, in case there are teams and projects fully embracing jj - but that's closer to the time jj has its own server backend

@joyously
Copy link

joyously commented Feb 6, 2024

Pijul uses Ed25519 for the author, and has a config folder with an identities folder which stores the public key and the email address.
It is supposedly easier to be anonymous and/or change your name or email. I doubt it would solve revocation.

@jyn514
Copy link
Contributor

jyn514 commented Feb 6, 2024

The idea is cool, but unfortunately, one of the pillars jj stands on is near-ideal git compatibility, which I don't see (at a first glance anyway) a way to preserve with any, well, "non-git" identity management system

well, jj supports having a different user/email pair configured than git (i've actually been meaning to open an issue for inferring it from git config -l by default). maybe it could use the did:web schema as the email address, so it shows up in git as being authored by e.g. jyn.dev, and then jj log has nicer layers on top of that to resolve that to a name.

@martinvonz
Copy link
Member

I'm not too worried about Git compatibility. Sure, any solution I can think of will make it not render as a real name to git, but that seems fine and pretty expected since we're abstracting the identity. It seems to me that the hardest part of this FR might not be the technical part but getting buy-in from GitHub or some other forge. Or how useful would people find the feature if it requires local configuration for all users without any support for synchronizing that configuration with a forge and without having the forge's own UI render the user's real(-ish) name?

@PhilipMetzger PhilipMetzger added the enhancement New feature or request label Feb 6, 2024
@steveklabnik
Copy link
Contributor Author

So yeah, to be clear, I don't think that you're going to be able to solve this problem in git. I personally would expect that you'd have some sort of "use this when making git commits with the git backend" identity, and then the native backend would support something more robust. That's why I think of it as a long-term thing rather than something that would move the needle in the short term.

@emilazy
Copy link
Contributor

emilazy commented Jun 14, 2024

While I totally understand and respect the ambition for forward‐thinking identity management, I’m wondering if there's any appetite for supporting Git‐format .mailmap files in the interim? It seems important for Git interoperability if nothing else, and I’d be interested in working on a PR for support if there’s interest in it.

emilazy added a commit to emilazy/jj that referenced this issue Jun 24, 2024
This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The various `author`/`committer` DSL and Rust functions respect the
  `.mailmap`, with `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. This does constitute a breaking change of the external
  Rust API, but hopefully the new parameter will nudge people towards
  doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow
  a query for any member to match all of them, but this would
  contradict what is actually displayed for the commits, require
  a more complicated implementation (to support patterns, we would
  basically have to map signatures back to every possible form and
  run the pattern against each of them), and the `*_raw` functions
  may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
emilazy added a commit to emilazy/jj that referenced this issue Jun 24, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The various `author`/`committer` DSL and Rust functions respect the
  `.mailmap`, with `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. This does constitute a breaking change of the external
  Rust API, but hopefully the new parameter will nudge people towards
  doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow
  a query for any member to match all of them, but this would
  contradict what is actually displayed for the commits, require
  a more complicated implementation (to support patterns, we would
  basically have to map signatures back to every possible form and
  run the pattern against each of them), and the `*_raw` functions
  may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
emilazy added a commit to emilazy/jj that referenced this issue Jun 24, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The various `author`/`committer` DSL and Rust functions respect the
  `.mailmap`, with `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. This does constitute a breaking change of the external
  Rust API, but hopefully the new parameter will nudge people towards
  doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow
  a query for any member to match all of them, but this would
  contradict what is actually displayed for the commits, require
  a more complicated implementation (to support patterns, we would
  basically have to map signatures back to every possible form and
  run the pattern against each of them), and the `*_raw` functions
  may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
emilazy added a commit to emilazy/jj that referenced this issue Jun 24, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The various `author`/`committer` DSL and Rust functions respect the
  `.mailmap`, with `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. This does constitute a breaking change of the external
  Rust API, but hopefully the new parameter will nudge people towards
  doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow
  a query for any member to match all of them, but this would
  contradict what is actually displayed for the commits, require
  a more complicated implementation (to support patterns, we would
  basically have to map signatures back to every possible form and
  run the pattern against each of them), and the `*_raw` functions
  may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
emilazy added a commit to emilazy/jj that referenced this issue Jun 24, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The various `author`/`committer` DSL and Rust functions respect the
  `.mailmap`, with `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. This does constitute a breaking change of the external
  Rust API, but hopefully the new parameter will nudge people towards
  doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow
  a query for any member to match all of them, but this would
  contradict what is actually displayed for the commits, require
  a more complicated implementation (to support patterns, we would
  basically have to map signatures back to every possible form and
  run the pattern against each of them), and the `*_raw` functions
  may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
emilazy added a commit to emilazy/jj that referenced this issue Jun 24, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The various `author`/`committer` DSL and Rust functions respect the
  `.mailmap`, with `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. This does constitute a breaking change of the external
  Rust API, but hopefully the new parameter will nudge people towards
  doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow
  a query for any member to match all of them, but this would
  contradict what is actually displayed for the commits, require
  a more complicated implementation (to support patterns, we would
  basically have to map signatures back to every possible form and
  run the pattern against each of them), and the `*_raw` functions
  may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
emilazy added a commit to emilazy/jj that referenced this issue Jun 24, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The various `author`/`committer` DSL and Rust functions respect the
  `.mailmap`, with `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. This does constitute a breaking change of the external
  Rust API, but hopefully the new parameter will nudge people towards
  doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow
  a query for any member to match all of them, but this would
  contradict what is actually displayed for the commits, require
  a more complicated implementation (to support patterns, we would
  basically have to map signatures back to every possible form and
  run the pattern against each of them), and the `*_raw` functions
  may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
emilazy added a commit to emilazy/jj that referenced this issue Jun 25, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The `author`/`committer` DSL functions respect the `.mailmap`, with
  `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. There is a corresponding breaking change of the external
  Rust API, but hopefully the new method name and documentation will
  nudge people towards doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow
  a query for any member to match all of them, but this would
  contradict what is actually displayed for the commits, require
  a more complicated implementation (to support patterns, we would
  basically have to map signatures back to every possible form and
  run the pattern against each of them), and the `*_raw` functions
  may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
emilazy added a commit to emilazy/jj that referenced this issue Jun 25, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

We introduce the notion of raw signatures, representing the name and
email baked in to an object’s metadata, and canonical signatures,
representing the up‐to‐date identity information obtained by
looking up a raw signature in a `.mailmap` file. When there are no
matching entries, the raw and canonical signatures are the same.

Canonical signatures should be used for the majority of purposes,
such as display and querying in user interfaces and automated batch
processing (e.g., to collate statistics by commit author, or email
committers in batch). Generally speaking, whenever you care about
*who made the commit* rather than *what data happens to be encoded
in the commit itself*, they are the appropriate thing to work with.

Raw signatures should usually not be surfaced to users by default
unless explicitly asked for. Valid reasons to work with them include
low‐level processing of commits where a `.mailmap` is not accessible
or would be inappropriate to use (e.g., rewriting or duplication
of commits without intent to alter metadata), automated testing,
forensics (examining the raw object data stored in the backend that
is used to compute a commit’s cryptographic hash), and analysis
and debugging of `.mailmap` files themselves.

The only signatures Jujutsu currently processes are commit authors
and committers, which can be obtained in raw and canonical form with
`Commit::{author,committer}_raw` and `Mailmap::{author,committer}`
respectively. If Jujutsu starts to store or process immutable identity
data in other contexts (e.g. support for additional metadata on commits
like Git’s `Co-authored-by`/`Signed-off-by`/`Reviewed-by` trailers,
or detached metadata that nonetheless must remain immutable), then
the notion of raw and canonical signatures will carry over to those
and the same guidelines about preferring to work with and display
canonical signatures whenever reasonable will apply.

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The `author`/`committer` DSL functions respect the `.mailmap`, with
  `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. There is a corresponding breaking change of the external
  Rust API, but hopefully the new method name and documentation will
  nudge people towards doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow
  a query for any member to match all of them, but this would
  contradict what is actually displayed for the commits, require
  a more complicated implementation (to support patterns, we would
  basically have to map signatures back to every possible form and
  run the pattern against each of them), and the `*_raw` functions
  may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
emilazy added a commit to emilazy/jj that referenced this issue Jun 26, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

We introduce the notion of raw signatures, representing the name and
email baked in to an object’s metadata, and canonical signatures,
representing the up‐to‐date identity information obtained by
looking up a raw signature in a `.mailmap` file. When there are no
matching entries, the raw and canonical signatures are the same.

Canonical signatures should be used for the majority of purposes,
such as display and querying in user interfaces and automated batch
processing (e.g., to collate statistics by commit author, or email
committers in batch). Generally speaking, whenever you care about
*who made the commit* rather than *what data happens to be encoded
in the commit itself*, they are the appropriate thing to work with.

Raw signatures should usually not be surfaced to users by default
unless explicitly asked for. Valid reasons to work with them include
low‐level processing of commits where a `.mailmap` is not accessible
or would be inappropriate to use (e.g., rewriting or duplication
of commits without intent to alter metadata), automated testing,
forensics (examining the raw object data stored in the backend that
is used to compute a commit’s cryptographic hash), and analysis
and debugging of `.mailmap` files themselves.

The only signatures Jujutsu currently processes are commit authors
and committers, which can be obtained in raw and canonical form with
`Commit::{author,committer}_raw` and `Mailmap::{author,committer}`
respectively. If Jujutsu starts to store or process immutable identity
data in other contexts (e.g. support for additional metadata on commits
like Git’s `Co-authored-by`/`Signed-off-by`/`Reviewed-by` trailers,
or detached metadata that nonetheless must remain immutable), then
the notion of raw and canonical signatures will carry over to those
and the same guidelines about preferring to work with and display
canonical signatures whenever reasonable will apply.

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The `author`/`committer` DSL functions respect the `.mailmap`, with
  `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. There is a corresponding breaking change of the external
  Rust API, but hopefully the new method name and documentation will
  nudge people towards doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow
  a query for any member to match all of them, but this would
  contradict what is actually displayed for the commits, require
  a more complicated implementation (to support patterns, we would
  basically have to map signatures back to every possible form and
  run the pattern against each of them), and the `*_raw` functions
  may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
emilazy added a commit to emilazy/jj that referenced this issue Jun 26, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

We introduce the notion of raw signatures, representing the name and
email baked in to an object’s metadata, and canonical signatures,
representing the up‐to‐date identity information obtained by
looking up a raw signature in a `.mailmap` file. When there are no
matching entries, the raw and canonical signatures are the same.

Canonical signatures should be used for the majority of purposes,
such as display and querying in user interfaces and automated batch
processing (e.g., to collate statistics by commit author, or email
committers in batch). Generally speaking, whenever you care about
*who made the commit* rather than *what data happens to be encoded
in the commit itself*, they are the appropriate thing to work with.

Raw signatures should usually not be surfaced to users by default
unless explicitly asked for. Valid reasons to work with them include
low‐level processing of commits where a `.mailmap` is not accessible
or would be inappropriate to use (e.g., rewriting or duplication
of commits without intent to alter metadata), automated testing,
forensics (examining the raw object data stored in the backend that
is used to compute a commit’s cryptographic hash), and analysis
and debugging of `.mailmap` files themselves.

The only signatures Jujutsu currently processes are commit authors
and committers, which can be obtained in raw and canonical form with
`Commit::{author,committer}_raw` and `Mailmap::{author,committer}`
respectively. If Jujutsu starts to store or process immutable identity
data in other contexts (e.g. support for additional metadata on commits
like Git’s `Co-authored-by`/`Signed-off-by`/`Reviewed-by` trailers,
or detached metadata that nonetheless must remain immutable), then
the notion of raw and canonical signatures will carry over to those
and the same guidelines about preferring to work with and display
canonical signatures whenever reasonable will apply.

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The `author`/`committer` DSL functions respect the `.mailmap`, with
  `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. There is a corresponding breaking change of the external
  Rust API, but hopefully the new method name and documentation will
  nudge people towards doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow
  a query for any member to match all of them, but this would
  contradict what is actually displayed for the commits, require
  a more complicated implementation (to support patterns, we would
  basically have to map signatures back to every possible form and
  run the pattern against each of them), and the `*_raw` functions
  may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
emilazy added a commit to emilazy/jj that referenced this issue Jul 10, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

We introduce the notion of raw signatures, representing the name and
email baked in to an object’s metadata, and canonical signatures,
representing the up‐to‐date identity information obtained by
looking up a raw signature in a `.mailmap` file. When there are no
matching entries, the raw and canonical signatures are the same.

Canonical signatures should be used for the majority of purposes,
such as display and querying in user interfaces and automated batch
processing (e.g., to collate statistics by commit author, or email
committers in batch). Generally speaking, whenever you care about
*who made the commit* rather than *what data happens to be encoded
in the commit itself*, they are the appropriate thing to work with.

Raw signatures should usually not be surfaced to users by default
unless explicitly asked for. Valid reasons to work with them include
low‐level processing of commits where a `.mailmap` is not accessible
or would be inappropriate to use (e.g., rewriting or duplication
of commits without intent to alter metadata), automated testing,
forensics (examining the raw object data stored in the backend that
is used to compute a commit’s cryptographic hash), and analysis
and debugging of `.mailmap` files themselves.

In this model, you can think of raw signatures as being like database
keys for a table mapping them to canonical signatures. In an ideal
world, these keys would be opaque synthetic keys with no human meaning
that are only surfaced when poking into internals; the idea is to treat
them as such to the greatest extent possible given the realities of
the current object model.

The only signatures Jujutsu currently processes are commit authors
and committers, which can be obtained in raw and canonical form with
`Commit::{author,committer}_raw` and `Mailmap::{author,committer}`
respectively. If Jujutsu starts to store or process immutable identity
data in other contexts (e.g. support for additional metadata on commits
like Git’s `Co-authored-by`/`Signed-off-by`/`Reviewed-by` trailers,
or detached metadata that nonetheless must remain immutable), then
the notion of raw and canonical signatures will carry over to those
and the same guidelines about preferring to work with and display
canonical signatures whenever reasonable will apply.

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The `author`/`committer` DSL functions respect the `.mailmap`, with
  `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. There is a corresponding breaking change of the external
  Rust API, but hopefully the new method name and documentation will
  nudge people towards doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow a
  query for any member to match all of them, but this would contradict
  what is actually displayed for the commits, violate the principles
  about surfacing raw signatures detailed above, and the `*_raw`
  functions may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
maddiemort pushed a commit to maddiemort/jj that referenced this issue Sep 20, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

We introduce the notion of raw signatures, representing the name and
email baked in to an object’s metadata, and canonical signatures,
representing the up‐to‐date identity information obtained by
looking up a raw signature in a `.mailmap` file. When there are no
matching entries, the raw and canonical signatures are the same.

Canonical signatures should be used for the majority of purposes,
such as display and querying in user interfaces and automated batch
processing (e.g., to collate statistics by commit author, or email
committers in batch). Generally speaking, whenever you care about
*who made the commit* rather than *what data happens to be encoded
in the commit itself*, they are the appropriate thing to work with.

Raw signatures should usually not be surfaced to users by default
unless explicitly asked for. Valid reasons to work with them include
low‐level processing of commits where a `.mailmap` is not accessible
or would be inappropriate to use (e.g., rewriting or duplication
of commits without intent to alter metadata), automated testing,
forensics (examining the raw object data stored in the backend that
is used to compute a commit’s cryptographic hash), and analysis
and debugging of `.mailmap` files themselves.

In this model, you can think of raw signatures as being like database
keys for a table mapping them to canonical signatures. In an ideal
world, these keys would be opaque synthetic keys with no human meaning
that are only surfaced when poking into internals; the idea is to treat
them as such to the greatest extent possible given the realities of
the current object model.

The only signatures Jujutsu currently processes are commit authors
and committers, which can be obtained in raw and canonical form with
`Commit::{author,committer}_raw` and `Mailmap::{author,committer}`
respectively. If Jujutsu starts to store or process immutable identity
data in other contexts (e.g. support for additional metadata on commits
like Git’s `Co-authored-by`/`Signed-off-by`/`Reviewed-by` trailers,
or detached metadata that nonetheless must remain immutable), then
the notion of raw and canonical signatures will carry over to those
and the same guidelines about preferring to work with and display
canonical signatures whenever reasonable will apply.

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The `author`/`committer` DSL functions respect the `.mailmap`, with
  `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. There is a corresponding breaking change of the external
  Rust API, but hopefully the new method name and documentation will
  nudge people towards doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow a
  query for any member to match all of them, but this would contradict
  what is actually displayed for the commits, violate the principles
  about surfacing raw signatures detailed above, and the `*_raw`
  functions may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
maddiemort pushed a commit to maddiemort/jj that referenced this issue Sep 20, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

We introduce the notion of raw signatures, representing the name and
email baked in to an object’s metadata, and canonical signatures,
representing the up‐to‐date identity information obtained by
looking up a raw signature in a `.mailmap` file. When there are no
matching entries, the raw and canonical signatures are the same.

Canonical signatures should be used for the majority of purposes,
such as display and querying in user interfaces and automated batch
processing (e.g., to collate statistics by commit author, or email
committers in batch). Generally speaking, whenever you care about
*who made the commit* rather than *what data happens to be encoded
in the commit itself*, they are the appropriate thing to work with.

Raw signatures should usually not be surfaced to users by default
unless explicitly asked for. Valid reasons to work with them include
low‐level processing of commits where a `.mailmap` is not accessible
or would be inappropriate to use (e.g., rewriting or duplication
of commits without intent to alter metadata), automated testing,
forensics (examining the raw object data stored in the backend that
is used to compute a commit’s cryptographic hash), and analysis
and debugging of `.mailmap` files themselves.

In this model, you can think of raw signatures as being like database
keys for a table mapping them to canonical signatures. In an ideal
world, these keys would be opaque synthetic keys with no human meaning
that are only surfaced when poking into internals; the idea is to treat
them as such to the greatest extent possible given the realities of
the current object model.

The only signatures Jujutsu currently processes are commit authors
and committers, which can be obtained in raw and canonical form with
`Commit::{author,committer}_raw` and `Mailmap::{author,committer}`
respectively. If Jujutsu starts to store or process immutable identity
data in other contexts (e.g. support for additional metadata on commits
like Git’s `Co-authored-by`/`Signed-off-by`/`Reviewed-by` trailers,
or detached metadata that nonetheless must remain immutable), then
the notion of raw and canonical signatures will carry over to those
and the same guidelines about preferring to work with and display
canonical signatures whenever reasonable will apply.

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The `author`/`committer` DSL functions respect the `.mailmap`, with
  `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. There is a corresponding breaking change of the external
  Rust API, but hopefully the new method name and documentation will
  nudge people towards doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow a
  query for any member to match all of them, but this would contradict
  what is actually displayed for the commits, violate the principles
  about surfacing raw signatures detailed above, and the `*_raw`
  functions may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
maddiemort pushed a commit to maddiemort/jj that referenced this issue Sep 20, 2024
These live in the repository and map email addresses and names to
canonical ones, as described in [`gitmailmap(5)`].

[`gitmailmap(5)`]: https://git-scm.com/docs/gitmailmap

We introduce the notion of raw signatures, representing the name and
email baked in to an object’s metadata, and canonical signatures,
representing the up‐to‐date identity information obtained by
looking up a raw signature in a `.mailmap` file. When there are no
matching entries, the raw and canonical signatures are the same.

Canonical signatures should be used for the majority of purposes,
such as display and querying in user interfaces and automated batch
processing (e.g., to collate statistics by commit author, or email
committers in batch). Generally speaking, whenever you care about
*who made the commit* rather than *what data happens to be encoded
in the commit itself*, they are the appropriate thing to work with.

Raw signatures should usually not be surfaced to users by default
unless explicitly asked for. Valid reasons to work with them include
low‐level processing of commits where a `.mailmap` is not accessible
or would be inappropriate to use (e.g., rewriting or duplication
of commits without intent to alter metadata), automated testing,
forensics (examining the raw object data stored in the backend that
is used to compute a commit’s cryptographic hash), and analysis
and debugging of `.mailmap` files themselves.

In this model, you can think of raw signatures as being like database
keys for a table mapping them to canonical signatures. In an ideal
world, these keys would be opaque synthetic keys with no human meaning
that are only surfaced when poking into internals; the idea is to treat
them as such to the greatest extent possible given the realities of
the current object model.

The only signatures Jujutsu currently processes are commit authors
and committers, which can be obtained in raw and canonical form with
`Commit::{author,committer}_raw` and `Mailmap::{author,committer}`
respectively. If Jujutsu starts to store or process immutable identity
data in other contexts (e.g. support for additional metadata on commits
like Git’s `Co-authored-by`/`Signed-off-by`/`Reviewed-by` trailers,
or detached metadata that nonetheless must remain immutable), then
the notion of raw and canonical signatures will carry over to those
and the same guidelines about preferring to work with and display
canonical signatures whenever reasonable will apply.

This is not meant to be a comprehensive solution to identity management
or obsolete the discussion in jj-vcs#2957. There are many possible designs of
forward‐thinking author and committer identity systems that would
be a lot better than `.mailmap` files, but I don’t really want
to get lost in the weeds trying to solve an open research problem
here. Instead, this is just an acknowledgement that any system that
treats user names and emails as immutable (as Jujutsu currently does)
is going to need a mapping layer to keep them updated, and both Git
and Mercurial adopted `.mailmap` files, meaning they are already in
wide use to address this problem. All sufficiently large open source
repositories tend to grow a substantial `.mailmap` file, e.g. [Linux],
[Rust], [curl], [Mesa], [Node.js], and [Git] itself. Currently,
people working on these repositories with Jujutsu see and search
outdated and inconsistent authorship information that contradicts
what Git queries and outputs, which is at the very least somewhere
between confusing and unhelpful. Even if we had a perfect orthogonal
solution in the native backend, as long as we support working on Git
repositories it’s a compatibility‐relevant feature.

[Linux]: https://github.com/torvalds/linux/blob/f2661062f16b2de5d7b6a5c42a9a5c96326b8454/.mailmap
[Rust]: https://github.com/rust-lang/rust/blob/2c243d957008f5909f7a4af19e486ea8a3814be7/.mailmap
[curl]: https://github.com/curl/curl/blob/a7ec6a76abf5e29fb3f951a09d429ce5fbff250f/.mailmap
[Mesa]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/cdf3228f88361410175c338704908ea74dc7b8ae/.mailmap
[Node.js]: https://github.com/nodejs/node/blob/4c730aed7f825af1691740663d599e9de5958f89/.mailmap
[Git]: https://github.com/git/git/blob/9005149a4a77e2d3409c6127bf4fd1a0893c3495/.mailmap

That said, this is not exclusive to the Git backend. The `.mailmap`
name and format is perfectly generic, already shared between Git and
Mercurial, and applies to all systems that bake names and emails into
commits, including the current local backend. The code uses Gitoxide,
but only as a convenient implementation of the file format; in a
hypothetical world where the Git backend was removed without Jujutsu
changing its notion of commit signatures, `gix-mailmap` could be used
standalone, or replaced with a bespoke implementation.

I discussed this on the Discord server and we seemed to arrive
at a consensus that this would be a good feature to have for Git
compatibility and as a pragmatic stop‐gap measure for the larger
identity management problem, and that I should have a crack at
implementing it to see how complex it would be. Happily, it turned
out to be pretty simple! No major plumbing of state is required as
the users of the template and revset engines already have the working
copy commit close to hand to support displaying and matching `@`; I
think this should be more lightweight (but admittedly less powerful)
than the commit rewriting approach @arxanas floated on Discord.

## Notes on various design decisions

* The `.mailmap` file is read from the working copy commit of the
  current workspace.

  This is roughly equivalent to Git reading from
  `$GIT_WORK_TREE/.mailmap`, or `HEAD:.mailmap` in bare repositories,
  and seems like the best fit for Jujutsu’s model. I briefly looked
  into reading it from the actual on‐disk working copy, but it seemed
  a lot more complicated and I’m not sure if there’s any point.

  I didn’t add support for Git’s `mailmap.file` and `mailmap.blob`
  configuration options; unlike ignores, I don’t think I’ve
  ever seen this feature used other than directly in a repository,
  and `mailmap.blob` seems to mostly be there to keep it working in
  bare repositories. I can imagine something like a managed corporate
  multi‐repo environment with a globally‐shared `mailmap.file`
  so if people feel like this is important to keep consistency with I
  can look into implementing it. But genuinely I’ve never personally
  seen anybody use this.

* The `author`/`committer` DSL functions respect the `.mailmap`, with
  `*_raw` variants to ignore it.

  If there’s a `.mailmap` available, signatures should be mapped
  through it unless there’s a specific reason not to; this matches
  Git’s behaviour and is the main thing that makes this feature
  worthwhile. There is a corresponding breaking change of the external
  Rust API, but hopefully the new method name and documentation will
  nudge people towards doing the right thing.

  I was initially considering a keyword argument to the template
  and revset functions to specify whether to map or not (and
  even implemented keyword arguments for template functions), but
  I decided it was probably overkill and settled on the current
  separate functions. A suggestion from Discord was to add a method
  on signatures to the template language, e.g. `.canonical()` or
  `.mailmap()`. While this seems elegant to me, I still wanted
  the short, simple construction to be right by default, and I
  couldn’t think of any immediate uses outside of `.author()`
  and `.committer()`. If this is added later, we will still get the
  elegant property that `commit.{author,committer}()` is short for
  `commit.{author,committer}_raw().canonical()`.

* The mapping to canonical signatures is one‐way, and queries
  only match on the canonical form.

  This is the same behaviour as Git. The alternative would be to
  consider the mapped signatures as an equivalence set and allow a
  query for any member to match all of them, but this would contradict
  what is actually displayed for the commits, violate the principles
  about surfacing raw signatures detailed above, and the `*_raw`
  functions may be more useful in such a case anyway.

* There’s currently no real caching or optimization here.

  The `.mailmap` file is materialized and parsed whenever a template
  or revset context is initialized (although it’s still O(1), not
  parsing it for every processed commit), and `gix-mailmap` does a
  binary search to resolve signatures. I couldn’t measure any kind
  of substantial performance hit here, maybe 1‐3% percent on some
  `jj log` microbenchmarks, but it could just be noise; a couple
  times it was actually faster.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

7 participants