Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scripts/release-notes.py: derive author aliases from AUTHORS #42339

Merged
merged 2 commits into from
Nov 25, 2019

Conversation

knz
Copy link
Contributor

@knz knz commented Nov 9, 2019

Previously, the release-notes script contained an explicit list of
contributors to the repo with their name aliases, to disambiguate
entries in the git log while constructing the release notes report.

The explicit list was problematic in numerous ways, foremost because
it was often out of date and incomplete, and also because it couldn't
distinguish different folk with the same name but different addresses.

Luckily, Git has a standard solution for this problem, called a
"mailmap" file. The format of this file is standardized and documented
in the man page git-check-mailmap(1). To simplify, this file format
has one line per person, and each line can contain multiple
name/mail combinations for that person.

Meanwhile, the CockroachDB repository also maintains a file AUTHORS
at the top of the source tree. This also contains a list of
contributors with names and e-mails, although for far this file was
not maintained to conform exactly to Git's mailmap format.

This commit bridges the two things as follows:

  • it updates the AUTHORS file to conform to the mailmap format.
    This removes duplicate entries, and spells out Git commit/author
    aliases alongside the primary name of a person.
  • it simplifies the release-notes script to use AUTHORS
    as its input database.

Release note: None

@knz knz requested a review from jordanlewis November 9, 2019 20:11
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Nov 9, 2019

cc @jseldess

@knz knz force-pushed the 20191109-rn branch 3 times, most recently from d925a5a to 875c0d0 Compare November 9, 2019 22:21
Previously, the release-notes script contained an explicit list of
contributors to the repo with their name aliases, to disambiguate
entries in the git log while constructing the release notes report.

The explicit list was problematic in numerous ways, foremost because
it was often out of date and incomplete, and also because it couldn't
distinguish different folk with the same name but different addresses.

Luckily, Git has a standard solution for this problem, called a
"mailmap" file. The format of this file is standardized and documented
in the man page git-check-mailmap(1). To simplify, this file format
has one line per _person_, and each line can contain multiple
name/mail combinations for that person.

Meanwhile, the CockroachDB repository also maintains a file `AUTHORS`
at the top of the source tree. This also contains a list of
contributors with names and e-mails, although for far this file was
not maintained to conform exactly to Git's mailmap format.

This commit bridges the two things as follows:

- it updates the `AUTHORS` file to conform to the mailmap format.
  This removes duplicate entries, and spells out Git commit/author
  aliases alongside the primary name of a person.
- it simplifies the release-notes script to use AUTHORS
  as its input database.

Release note: None
@knz
Copy link
Contributor Author

knz commented Nov 11, 2019

@chrisseto if you could review the code change that would be swell.

Copy link
Contributor

@chrisseto chrisseto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @knz)


scripts/release-notes.py, line 51 at r2 (raw file):

mmare = re.compile('(?P<alias>[^<]*)<(?P<addr>[^>]*)>')
crdb_folk = set()
class P:

A bit more descriptive name would be nice here, maybe Person?


scripts/release-notes.py, line 68 at r2 (raw file):

def define_person(name, addr):
        p = P(name, addr)
        canon = (name, addr)

You could use the person object as the key if you override hash and eq


scripts/release-notes.py, line 70 at r2 (raw file):

        canon = (name, addr)
        if canon in mmap_bycanon:
            print('warning: duplicate person %r, ignoring', canon)

Should the be going to stderr? file=sys.stderr

Python's print function doesn't do any built in formatting. Instead call str.format

print('warning: duplicate person %r, ignoring'.format(canon))

scripts/release-notes.py, line 71 at r2 (raw file):

        if canon in mmap_bycanon:
            print('warning: duplicate person %r, ignoring', canon)
            return None

This case doesn't appear likely to happen but it looks like the None could trickle through lookup_person and cause some weird errors?

Would it be better to throw an exception here and request that AUTHORS be checked for validity?


scripts/release-notes.py, line 73 at r2 (raw file):

        byaddr = mmap_byaddr.get(addr, [])
        byaddr.append(p)
        mmap_byaddr[addr] = byaddr

mmap_byaddr.setdefault(addr, []).append(p) is functionally equivalent and a bit more readable IMO


scripts/release-notes.py, line 81 at r2 (raw file):

        return p

if not os.path.exists('AUTHORS'):

AUTHORS could be targeted directly by making use of python's file variable


scripts/release-notes.py, line 83 at r2 (raw file):

if not os.path.exists('AUTHORS'):
    print('warning: AUTHORS missing in current directory.', file=sys.stderr)
    print('Maybe use "cd" to navigate to the working tree root.', file=sys.stderr)

Should the script exit in this case?


scripts/release-notes.py, line 95 at r2 (raw file):

            if p is None:
                continue
            p.crdb = '@cockroachlabs.com' in line

Remove this line, it is already handled in P's constructor


scripts/release-notes.py, line 97 at r2 (raw file):

            p.crdb = '@cockroachlabs.com' in line
            if p.crdb:
                crdb_folk.add(p)

Looks like crdb_folk.add(p) gets called twice.
Once here and once in the P constructor.

Technically it doesn't matter as crdb_folk is a set but this could be confusing for the next author.
I think the crdb_folk.add(self) in P.__init__ should be removed as it is less "side effect-y"


scripts/release-notes.py, line 99 at r2 (raw file):

                crdb_folk.add(p)
            aliases = m.group('aliases')
            aliases = mmare.findall(aliases)

aliases = m.group(m.group('aliases')) might be a bit more explicit.


scripts/release-notes.py, line 102 at r2 (raw file):

            for alias, addr in aliases:
                name = alias.strip()
                byaddr = mmap_byaddr.get(addr, [])

Can this block instead delegate to define_person?

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @chrisseto and @jordanlewis)


scripts/release-notes.py, line 51 at r2 (raw file):

Previously, chrisseto (Chris Seto) wrote…

A bit more descriptive name would be nice here, maybe Person?

Done.


scripts/release-notes.py, line 68 at r2 (raw file):

Previously, chrisseto (Chris Seto) wrote…

You could use the person object as the key if you override hash and eq

Well except that I want to be using this dict when I don't have a person object yet.


scripts/release-notes.py, line 70 at r2 (raw file):

Previously, chrisseto (Chris Seto) wrote…

Should the be going to stderr? file=sys.stderr

Python's print function doesn't do any built in formatting. Instead call str.format

print('warning: duplicate person %r, ignoring'.format(canon))

Done.


scripts/release-notes.py, line 71 at r2 (raw file):

Previously, chrisseto (Chris Seto) wrote…

This case doesn't appear likely to happen but it looks like the None could trickle through lookup_person and cause some weird errors?

Would it be better to throw an exception here and request that AUTHORS be checked for validity?

Good catch. It's ok to return the existing entry. Done.


scripts/release-notes.py, line 73 at r2 (raw file):

Previously, chrisseto (Chris Seto) wrote…
        byaddr = mmap_byaddr.get(addr, [])
        byaddr.append(p)
        mmap_byaddr[addr] = byaddr

mmap_byaddr.setdefault(addr, []).append(p) is functionally equivalent and a bit more readable IMO

Nice TIL. Done.


scripts/release-notes.py, line 81 at r2 (raw file):

Previously, chrisseto (Chris Seto) wrote…

AUTHORS could be targeted directly by making use of python's file variable

This was intentional. Added a comment to clarify.


scripts/release-notes.py, line 83 at r2 (raw file):

Previously, chrisseto (Chris Seto) wrote…

Should the script exit in this case?

Nah it's valid to proceed without aliases.


scripts/release-notes.py, line 95 at r2 (raw file):

Previously, chrisseto (Chris Seto) wrote…

Remove this line, it is already handled in P's constructor

Not quite: the constructor did not catch the crdb address in an alias.

But I have re-organized the code to simplify.


scripts/release-notes.py, line 97 at r2 (raw file):

Previously, chrisseto (Chris Seto) wrote…

Looks like crdb_folk.add(p) gets called twice.
Once here and once in the P constructor.

Technically it doesn't matter as crdb_folk is a set but this could be confusing for the next author.
I think the crdb_folk.add(self) in P.__init__ should be removed as it is less "side effect-y"

Good idea. Done.


scripts/release-notes.py, line 99 at r2 (raw file):

Previously, chrisseto (Chris Seto) wrote…

aliases = m.group(m.group('aliases')) might be a bit more explicit.

I don't understand?


scripts/release-notes.py, line 102 at r2 (raw file):

Previously, chrisseto (Chris Seto) wrote…

Can this block instead delegate to define_person?

Done.

@knz
Copy link
Contributor Author

knz commented Nov 25, 2019

TFYRs!

bors r+

craig bot pushed a commit that referenced this pull request Nov 25, 2019
42339: scripts/release-notes.py: derive author aliases from AUTHORS r=knz a=knz

Previously, the release-notes script contained an explicit list of
contributors to the repo with their name aliases, to disambiguate
entries in the git log while constructing the release notes report.

The explicit list was problematic in numerous ways, foremost because
it was often out of date and incomplete, and also because it couldn't
distinguish different folk with the same name but different addresses.

Luckily, Git has a standard solution for this problem, called a
"mailmap" file. The format of this file is standardized and documented
in the man page git-check-mailmap(1). To simplify, this file format
has one line per _person_, and each line can contain multiple
name/mail combinations for that person.

Meanwhile, the CockroachDB repository also maintains a file `AUTHORS`
at the top of the source tree. This also contains a list of
contributors with names and e-mails, although for far this file was
not maintained to conform exactly to Git's mailmap format.

This commit bridges the two things as follows:

- it updates the `AUTHORS` file to conform to the mailmap format.
  This removes duplicate entries, and spells out Git commit/author
  aliases alongside the primary name of a person.
- it simplifies the release-notes script to use AUTHORS
  as its input database.

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
@craig
Copy link
Contributor

craig bot commented Nov 25, 2019

Build succeeded

@craig craig bot merged commit a242b72 into cockroachdb:master Nov 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants