Lint: Update headers and checks per current guidance & provide helpful feedback #2484

CAM-Gerlach · 2022-03-30T06:37:27Z

Over the past >year, we've made significant progress in programmatically parsing the PEP headers, using them to intelligently display more useful, informative and readable output in the rendered PEPs, conforming them to our modern guidance, automatically checking their format and (particularly now with #2475) making it easy for other tools to consume and enrich them.

As a final step toward these overarching goals, this PR:

Adds the previously-missing automatic checks for all the remaining headers, so errors in them that could affect rendering and human- and machine-readability no longer pass silently
Updates the existing checks to reflect the current format specified by PEP 1/12 and expected by current and future tooling
Conforms the modest number of remaining existing PEPs to ensure the headers render correctly and are machine-readable, and
Provides much more helpful, specific and user-friendly descriptive check text.
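
To give a rough idea of the kind of check described above, here is a minimal sketch in Python. It is not the actual linter configuration from this PR: the pattern, the function name `check_created` and the message text are hypothetical, and it assumes the `DD-MMM-YYYY` date format that PEP 1 specifies for the Created header.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration only; the real checks in the
# repository's linter configuration may differ.
CREATED_RE = re.compile(
    r"(0[1-9]|[12][0-9]|3[01])-"
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)-"
    r"(199[0-9]|20[0-9][0-9])"
)


def check_created(header_line: str) -> Optional[str]:
    """Return a user-oriented message if a Created header is malformed."""
    name, sep, value = header_line.partition(":")
    if name != "Created" or not sep:
        return None  # Not a Created header; nothing to check here.
    if CREATED_RE.fullmatch(value.strip()):
        return None  # Well-formed date.
    return "Created must be a single date in 'DD-MMM-YYYY' format, e.g. '30-Mar-2022'"
```

A specific message like this tells the author what to change, rather than only reporting that a pattern failed to match.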

Overall, this PR's enhancement of our existing checks:

Helps PEP authors, by giving them near-instant, automatic and targeted feedback about any syntax issues, locally or on CI, without having to wait for a full rebuild or a human review
Helps PEP editors by freeing them from the need to manually inspect and fix PEP headers
Avoids any edge cases in our rendering pipeline
Helps both readers and tools by ensuring the output is easy for both humans and machines to read, and
Opens the door to more "smart" processing in the future (e.g. automatic Discussions-To generation) to reduce PEP author/editor workload while making the displayed output more useful and readable.

To note, this PR does not make any new headers required, invalidate any existing established header formats nor require any of the newer ones; the new checks only trigger on the formats.

In addition to the above, this resolves the linting side of #2402 and supports the improvements in #2475 and #2467.

hugovk

I appreciate all your work here, thank you!

CAM-Gerlach · 2022-04-17T23:02:01Z

Actually, I noticed a significant problem with allowing manual obfuscation as things stand. While glancing at the _mask_email() source code as part of the trailing comma oversight you noted on #2531 (which I then fixed in #2467), I saw that the automatic obfuscation is a fair bit more effective than just replacing @ with a literal at; it uses the raw HTML entity codes for some of the characters, significantly raising the bar for scrapers.

However, since emails with manual obfuscation applied don't get converted into reference nodes, the automatic obfuscation logic doesn't apply to them, so these emails are substantially less obfuscated than they would otherwise be (which PEP authors are almost certainly unaware of, and which I wasn't aware of either until I dug deep into the code and actually tested it). Picking on myself here, from PEP 0, compare these two:

<tr class="row-odd"><td>Gerlach, C.A.M.</td>
<td>cam.gerlach at gerlach.cam</td>
</tr>

versus, e.g.

<tr class="row-even"><td>van Rossum, Guido (GvR)</td>
<td>guido&#32;&#97;t&#32;python.org</td>
</tr>
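
For reference, the entity-based masking shown in the second snippet can be approximated as follows. This is only a sketch that reproduces the output above; `mask_email_sketch` is a hypothetical name and this is not the actual `_mask_email()` implementation.

```python
import html


def mask_email_sketch(address: str) -> str:
    """Replace '@' with ' at ', encoding the spaces and the 'a' as HTML entities."""
    local, sep, domain = address.partition("@")
    if not sep:
        return address  # No '@' found, so leave the text untouched.
    # &#32; is a space and &#97; is the letter 'a'.
    return f"{local}&#32;&#97;t&#32;{domain}"


masked = mask_email_sketch("guido@python.org")
# A browser decodes the entities for human readers:
rendered = html.unescape(masked)
```

The raw HTML contains no literal " at " string for a scraper to match, while the rendered text still reads "guido at python.org".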

I attempted to add support for also properly masking manually obfuscated emails, but after some testing realized it would require some pretty significant code changes due to how things are currently structured, as well as some hacky and potentially unreliable heuristics. Furthermore, I realized this also complicates the options discussed in #2514 of not showing the email addresses at all, or showing them only as abbreviations, since whether or not the email is actually processed as an email changes the doctree structure and node types.

Therefore, I conclude it would be best to re-revert the part of the previous change to conform the small minority of emails that were manually obfuscated to use standard email syntax and restore the linter check for such, so they are correctly processed and masked by the header transform code (and anything else that needs to mask/obfuscate/elide them), in order to ensure that the various automatic measures to protect authors' emails work consistently.
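
A linter check for manually obfuscated emails could look roughly like the sketch below. The pattern and the name `find_manual_obfuscation` are hypothetical and are not the check from this PR; it is a heuristic that can misfire on ordinary prose such as "look at example.com".

```python
import re
from typing import Optional

# Hypothetical heuristic: a hand-written "local-part at domain.tld".
MANUAL_OBFUSCATION_RE = re.compile(r"[\w.+-]+\s+at\s+[\w-]+(?:\.[\w-]+)+")


def find_manual_obfuscation(author_value: str) -> Optional[str]:
    """Return the manually obfuscated address in an Author value, if any."""
    match = MANUAL_OBFUSCATION_RE.search(author_value)
    return match.group(0) if match else None
```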

CAM-Gerlach · 2022-04-19T07:44:10Z

Actually, upon giving this more thought, making the author emails abbrs instead of literal text as discussed in #2514 still requires doing fairly involved string-munging anyway due to having to parse and transform the older Email (author) format, so while it does make things a little more complicated, it's not that much worse than parsing a whole different syntax.
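
To illustrate the string-munging involved, here is a minimal sketch that handles both the current `Name <email>` form and the older `email (Name)` form. The patterns and the name `parse_author` are hypothetical and are not the code used by the header transform.

```python
import re
from typing import Optional, Tuple

# Current format: "Name <email>"; older format: "email (Name)".
MODERN_RE = re.compile(r"(?P<name>[^<>()]+?)\s*<(?P<email>[^<>\s]+)>")
LEGACY_RE = re.compile(r"(?P<email>[^\s()<>]+)\s*\((?P<name>[^()]+)\)")


def parse_author(entry: str) -> Tuple[str, Optional[str]]:
    """Split one Author entry into (name, email); email is None if absent."""
    entry = entry.strip()
    for pattern in (MODERN_RE, LEGACY_RE):
        match = pattern.fullmatch(entry)
        if match:
            return match["name"].strip(), match["email"]
    return entry, None  # Name only, no email given.
```

A manually obfuscated address adds a third shape on top of these two, which is the extra complication discussed above.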

As such, I suggest we just go ahead and merge this PR as-is with the manual obfuscation still untouched, and then I can address properly masking manually "obfuscated" emails along with formatting them consistently as abbrs in a followup PR. It would also be pretty easy to improve the obfuscation further by choosing different Unicode lookalike characters for the spaces and letters, which, coupled with being embedded in the abbr and using raw character codes, should make spam harvesting virtually impossible, far more so than the common and well-known replacement of @ with at.

CAM-Gerlach · 2022-04-20T09:52:51Z

Since it seems I've satisfied the immediate concerns that were raised and the two PEP editors that previously reviewed have ✔️ed, this PR has been open for a while and it seems there aren't any further objections, and it is blocking some further discussed and agreed changes (abbr for emails, making Content-Type optional, etc.), I'll go ahead and finally merge this now.

the-knights-who-say-ni added the CLA signed label Mar 30, 2022

CAM-Gerlach self-assigned this Mar 30, 2022

CAM-Gerlach added the lint Linter-related work and linting fixes on PEPs label Mar 30, 2022

CAM-Gerlach marked this pull request as ready for review March 30, 2022 06:41

hugovk approved these changes Apr 17, 2022

CAM-Gerlach mentioned this pull request Apr 17, 2022

PEP 639: Update header, footer, link, reference and code block syntax #2531

Merged

CAM-Gerlach force-pushed the lint-update-header-checks branch from 5161ecb to 5f42e96 Compare April 19, 2022 07:30

CAM-Gerlach force-pushed the lint-update-header-checks branch from 5f42e96 to 6b371c4 Compare April 19, 2022 07:46

CAM-Gerlach added 17 commits April 19, 2022 23:53

Lint: Disable a couple potentially problematic/unneeded hooks

c826909

Lint: Refine regex for existing Content-Type and PEP references checks

e03b73c

Lint: Refine regex for Python-Version check and fix a couple deviations

8df4c4d

Lint: Refine regex for Created check and fix a couple deviations

2c98989

Lint: Refine regex for Resolution check and fix a few deviations

e75643e

Lint: Add new check for Post-History header

7abaaf3

Lint: Add check for Discussions-To header link

ad6ba56

Lint: Add basic check for PEP title presense & excessive length

dd30fff

Lint: Add check for PEP/BDFL-Delegate header

309ec3c

Lint: Add check for Sponsor header

b3fef66

Lint: Add check for legacy Author field format

eff7423

Lint: Add check for RST Author field

a829dff

Lint: Use more helpful, user-oriented descriptions for header checks

90add11

PEP 245: Add historical note and archive links to wiki page/email lsit

3712bba

PEP 628: Add direct link to resolution in acceptance message

afd0a40

Lint: Explictly allow all valid RFC 2822 line continuations in headers

b51cde8

Lint: Tweak Post-History and Resolution checks to accept fragments

a894d22

CAM-Gerlach force-pushed the lint-update-header-checks branch from 6b371c4 to a894d22 Compare April 20, 2022 04:53

CAM-Gerlach merged commit a6336e9 into python:main Apr 20, 2022

CAM-Gerlach mentioned this pull request Apr 21, 2022

PEP 11: Add Discussions section #2544

Merged

CAM-Gerlach mentioned this pull request May 8, 2022

Decouple and unify PEP header processing for rendering, PEP 0, JSON, RSS and linting #2587

Open

erlend-aasland mentioned this pull request May 13, 2022

pep8/greppable exception messages erlend-aasland/peps#1

Closed

erlend-aasland mentioned this pull request Jun 27, 2022

pep 687/mark as accepted erlend-aasland/peps#2

Closed
