Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lint: Update headers and checks per current guidance & provide helpful feedback #2484

Merged
merged 17 commits into from
Apr 20, 2022

Conversation

CAM-Gerlach
Copy link
Member

@CAM-Gerlach CAM-Gerlach commented Mar 30, 2022

Over the past >year, we've made significant progress in programmatically parsing the PEP headers, using them to intelligently display more useful, informative and readable output in the rendered PEPs, conforming them to our modern guidance, automatically checking their format and (particularly now with #2475) making it easy for other tools to consume and enrich them.

As a final step toward these overarching goals, this PR:

  • Adds the previously-missing automatic checks for all the remaining headers, so errors in them that could affect rendering and human- and machine-readability no longer pass silently
  • Updates the existing checks to reflect the current format specified by PEP 1/12 and expected by current and future tooling
  • Conforms the modest number of remaining existing PEPs to ensure the headers render correctly and are machine-readable, and
  • Provides much more helpful, specific and user-friendly descriptive check text.

Overall, this PRs enhancements to our existing checks:

  • Helps PEP authors, by giving them near-instant, automatic and targeted feedback about any syntax issues, locally or on CI, without having to for a full rebuild or a human review
  • Helps PEP editors by freeing them from the need to manually inspect and fix PEP headers
  • Avoids any edge cases in our rendering pipeline
  • Helps both readers and tools by ensuring the output is easy for both humans and machines to read, and
  • Opens the door to more "smart" processing in the future (e.g. automatic Discussions-To generation) to reduce PEP author/editor workload while making the displayed output more useful and readable.

To note, this PR does not make any new headers required, invalidate any existing established header formats nor require any of the newer ones; the new checks only trigger on the formats.

In addition to the above, this resolves the linting side of #2402 and supports the improvements in #2475 and #2467 .

Copy link
Member

@hugovk hugovk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate all your work here, thank you!

@CAM-Gerlach
Copy link
Member Author

CAM-Gerlach commented Apr 17, 2022

Actually, I noticed there's actually a significant problem as it stands allowing manual obfuscation—glancing at the _mask_email() source code as part of the trailing comma oversight you noted on #2531 and I then fixed in #2467 , I notice that the automatic obfuscation is actually a fair bit more effective than just replacing @ with a literal at; rather, it actually uses the raw HTML entity codes for some of the characters, significantly raising the bar for scrapers.

However, since emails with manual obfuscation applied don't get converted into reference nodes, the automatic obfuscation logic doesn't work, so these emails actually are substantially less obfuscated than they would otherwise be (which PEP authors are almost certainly unaware of, and wasn't even to me until I dug deep into the code and actually tested it). Picking on myself here, from PEP 0, compare these two:

<tr class="[row-odd]()"><td>Gerlach, C.A.M.</td>
<td>cam.gerlach at gerlach.cam</td>
</tr>

versus, e.g.

<tr class="[row-even]()"><td>van Rossum, Guido (GvR)</td>
<td>guido&#32;&#97;t&#32;python.org</td>
</tr>

I attempted to add support for also properly masking manually obfuscated emails, but after some testing realized it would require some pretty significant code changes due to how things are currently structured, as well as some hacky and potentially unreliable heuristics. Furthermore, I realized this also makes the not showing the email addresses all, or only as abbreviations, as discussed in #2514 more complicated, since whether or not the email is actually processed as an email changes the doctree structure and node types.

Therefore, I conclude it would be best to re-revert the part of the previous change to conform the small minority of emails that were manually obfuscated to use standard email syntax and restore the linter check for such, so they are correctly processed and masked by the header transform code (and anything else that needs to mask/obfuscate/elide them), in order to ensure that the various automatic measures to protect authors' emails work consistently.

@CAM-Gerlach
Copy link
Member Author

Actually, upon giving this more thought, making the author-emails abbrs instead of literal text as discussed in #2514 still requires doing fairly involved string-munging anyway due to having to parse and transform the older Email (author) format, so while it does make things a little more complicated, its not that much worse than parsing a while different syntax.

As such, I suggest we just go ahead and merge this PR as-is with the manual obfuscation still untouched, and then I can address properly masking manually "obfuscated" emails along with formatting them consistently as abbrs in a followup PR. It would also be pretty easy to improve the obfuscation further by choosing different Unicode lookalike characters for the spaces and letters, which coupled with being embedded in the abbr and using raw character codes should make it virtually impossible for spam harvesting, far more so than the common and well known replacement of @ with at.

@CAM-Gerlach
Copy link
Member Author

Since it seems I've satisfied the immediate concerns that were raised and the two PEP editors that previously reviewed have ✔️ ed, this PEP has been open for a while and it seems there aren't any further objections, and it is blocking some further discussed and agreed changes (abbr for emails, making Content-Type optional, etc.,), I'll go ahead and finally merge this now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA signed lint Linter-related work and linting fixes on PEPs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants