-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scripts/release-notes.py: derive author aliases from AUTHORS #42339
Conversation
cc @jseldess |
Release note: None
d925a5a
to
875c0d0
Compare
Previously, the release-notes script contained an explicit list of contributors to the repo with their name aliases, to disambiguate entries in the git log while constructing the release notes report. The explicit list was problematic in numerous ways, foremost because it was often out of date and incomplete, and also because it couldn't distinguish different folk with the same name but different addresses. Luckily, Git has a standard solution for this problem, called a "mailmap" file. The format of this file is standardized and documented in the man page git-check-mailmap(1). To simplify, this file format has one line per _person_, and each line can contain multiple name/mail combinations for that person. Meanwhile, the CockroachDB repository also maintains a file `AUTHORS` at the top of the source tree. This also contains a list of contributors with names and e-mails, although for far this file was not maintained to conform exactly to Git's mailmap format. This commit bridges the two things as follows: - it updates the `AUTHORS` file to conform to the mailmap format. This removes duplicate entries, and spells out Git commit/author aliases alongside the primary name of a person. - it simplifies the release-notes script to use AUTHORS as its input database. Release note: None
@chrisseto if you could review the code change that would be swell. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @knz)
scripts/release-notes.py, line 51 at r2 (raw file):
mmare = re.compile('(?P<alias>[^<]*)<(?P<addr>[^>]*)>') crdb_folk = set() class P:
A bit more descriptive name would be nice here, maybe Person
?
scripts/release-notes.py, line 68 at r2 (raw file):
def define_person(name, addr): p = P(name, addr) canon = (name, addr)
You could use the person object as the key if you override hash and eq
scripts/release-notes.py, line 70 at r2 (raw file):
canon = (name, addr) if canon in mmap_bycanon: print('warning: duplicate person %r, ignoring', canon)
Should the be going to stderr? file=sys.stderr
Python's print function doesn't do any built in formatting. Instead call str.format
print('warning: duplicate person %r, ignoring'.format(canon))
scripts/release-notes.py, line 71 at r2 (raw file):
if canon in mmap_bycanon: print('warning: duplicate person %r, ignoring', canon) return None
This case doesn't appear likely to happen but it looks like the None
could trickle through lookup_person
and cause some weird errors?
Would it be better to throw an exception here and request that AUTHORS be checked for validity?
scripts/release-notes.py, line 73 at r2 (raw file):
byaddr = mmap_byaddr.get(addr, []) byaddr.append(p) mmap_byaddr[addr] = byaddr
mmap_byaddr.setdefault(addr, []).append(p)
is functionally equivalent and a bit more readable IMO
scripts/release-notes.py, line 81 at r2 (raw file):
return p if not os.path.exists('AUTHORS'):
AUTHORS could be targeted directly by making use of python's file variable
scripts/release-notes.py, line 83 at r2 (raw file):
if not os.path.exists('AUTHORS'): print('warning: AUTHORS missing in current directory.', file=sys.stderr) print('Maybe use "cd" to navigate to the working tree root.', file=sys.stderr)
Should the script exit in this case?
scripts/release-notes.py, line 95 at r2 (raw file):
if p is None: continue p.crdb = '@cockroachlabs.com' in line
Remove this line, it is already handled in P
's constructor
scripts/release-notes.py, line 97 at r2 (raw file):
p.crdb = '@cockroachlabs.com' in line if p.crdb: crdb_folk.add(p)
Looks like crdb_folk.add(p)
gets called twice.
Once here and once in the P constructor.
Technically it doesn't matter as crdb_folk is a set but this could be confusing for the next author.
I think the crdb_folk.add(self)
in P.__init__
should be removed as it is less "side effect-y"
scripts/release-notes.py, line 99 at r2 (raw file):
crdb_folk.add(p) aliases = m.group('aliases') aliases = mmare.findall(aliases)
aliases = m.group(m.group('aliases'))
might be a bit more explicit.
scripts/release-notes.py, line 102 at r2 (raw file):
for alias, addr in aliases: name = alias.strip() byaddr = mmap_byaddr.get(addr, [])
Can this block instead delegate to define_person
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @chrisseto and @jordanlewis)
scripts/release-notes.py, line 51 at r2 (raw file):
Previously, chrisseto (Chris Seto) wrote…
A bit more descriptive name would be nice here, maybe
Person
?
Done.
scripts/release-notes.py, line 68 at r2 (raw file):
Previously, chrisseto (Chris Seto) wrote…
You could use the person object as the key if you override hash and eq
Well except that I want to be using this dict when I don't have a person object yet.
scripts/release-notes.py, line 70 at r2 (raw file):
Previously, chrisseto (Chris Seto) wrote…
Should the be going to stderr?
file=sys.stderr
Python's print function doesn't do any built in formatting. Instead call str.format
print('warning: duplicate person %r, ignoring'.format(canon))
Done.
scripts/release-notes.py, line 71 at r2 (raw file):
Previously, chrisseto (Chris Seto) wrote…
This case doesn't appear likely to happen but it looks like the
None
could trickle throughlookup_person
and cause some weird errors?Would it be better to throw an exception here and request that AUTHORS be checked for validity?
Good catch. It's ok to return the existing entry. Done.
scripts/release-notes.py, line 73 at r2 (raw file):
Previously, chrisseto (Chris Seto) wrote…
byaddr = mmap_byaddr.get(addr, []) byaddr.append(p) mmap_byaddr[addr] = byaddr
mmap_byaddr.setdefault(addr, []).append(p)
is functionally equivalent and a bit more readable IMO
Nice TIL. Done.
scripts/release-notes.py, line 81 at r2 (raw file):
Previously, chrisseto (Chris Seto) wrote…
AUTHORS could be targeted directly by making use of python's file variable
This was intentional. Added a comment to clarify.
scripts/release-notes.py, line 83 at r2 (raw file):
Previously, chrisseto (Chris Seto) wrote…
Should the script exit in this case?
Nah it's valid to proceed without aliases.
scripts/release-notes.py, line 95 at r2 (raw file):
Previously, chrisseto (Chris Seto) wrote…
Remove this line, it is already handled in
P
's constructor
Not quite: the constructor did not catch the crdb address in an alias.
But I have re-organized the code to simplify.
scripts/release-notes.py, line 97 at r2 (raw file):
Previously, chrisseto (Chris Seto) wrote…
Looks like
crdb_folk.add(p)
gets called twice.
Once here and once in the P constructor.Technically it doesn't matter as crdb_folk is a set but this could be confusing for the next author.
I think thecrdb_folk.add(self)
inP.__init__
should be removed as it is less "side effect-y"
Good idea. Done.
scripts/release-notes.py, line 99 at r2 (raw file):
Previously, chrisseto (Chris Seto) wrote…
aliases = m.group(m.group('aliases'))
might be a bit more explicit.
I don't understand?
scripts/release-notes.py, line 102 at r2 (raw file):
Previously, chrisseto (Chris Seto) wrote…
Can this block instead delegate to
define_person
?
Done.
TFYRs! bors r+ |
42339: scripts/release-notes.py: derive author aliases from AUTHORS r=knz a=knz Previously, the release-notes script contained an explicit list of contributors to the repo with their name aliases, to disambiguate entries in the git log while constructing the release notes report. The explicit list was problematic in numerous ways, foremost because it was often out of date and incomplete, and also because it couldn't distinguish different folk with the same name but different addresses. Luckily, Git has a standard solution for this problem, called a "mailmap" file. The format of this file is standardized and documented in the man page git-check-mailmap(1). To simplify, this file format has one line per _person_, and each line can contain multiple name/mail combinations for that person. Meanwhile, the CockroachDB repository also maintains a file `AUTHORS` at the top of the source tree. This also contains a list of contributors with names and e-mails, although for far this file was not maintained to conform exactly to Git's mailmap format. This commit bridges the two things as follows: - it updates the `AUTHORS` file to conform to the mailmap format. This removes duplicate entries, and spells out Git commit/author aliases alongside the primary name of a person. - it simplifies the release-notes script to use AUTHORS as its input database. Release note: None Co-authored-by: Raphael 'kena' Poss <[email protected]>
Build succeeded |
Previously, the release-notes script contained an explicit list of
contributors to the repo with their name aliases, to disambiguate
entries in the git log while constructing the release notes report.
The explicit list was problematic in numerous ways, foremost because
it was often out of date and incomplete, and also because it couldn't
distinguish different folk with the same name but different addresses.
Luckily, Git has a standard solution for this problem, called a
"mailmap" file. The format of this file is standardized and documented
in the man page git-check-mailmap(1). To simplify, this file format
has one line per person, and each line can contain multiple
name/mail combinations for that person.
Meanwhile, the CockroachDB repository also maintains a file
AUTHORS
at the top of the source tree. This also contains a list of
contributors with names and e-mails, although for far this file was
not maintained to conform exactly to Git's mailmap format.
This commit bridges the two things as follows:
AUTHORS
file to conform to the mailmap format.This removes duplicate entries, and spells out Git commit/author
aliases alongside the primary name of a person.
as its input database.
Release note: None