Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Docstring validation is slow #57578

Open
lithomas1 opened this issue Feb 23, 2024 · 8 comments
Open

CI: Docstring validation is slow #57578

lithomas1 opened this issue Feb 23, 2024 · 8 comments
Labels
CI Continuous Integration

Comments

@lithomas1
Copy link
Member

It takes around 22 minutes, for some reason.

@lithomas1 lithomas1 added the CI Continuous Integration label Feb 23, 2024
@MarcoGorelli
Copy link
Member

yeah it needs to generate all these docstrings, write them to temporary files, run validation against them...

tbh I think we should rip all the docstring sharing logic, just have plain old docstrings without substitutions, and let ruff do all the linting and validation in a matter of seconds. this shouldn't be done manually though, maybe I should see if it's automatable

doctests can be validated with pytest --doctest-modules

@mroeschke
Copy link
Member

tbh I think we should rip all the docstring sharing logic, just have plain old docstrings without substitutions,

Big +1

@MarcoGorelli
Copy link
Member

cool - no idea how difficult this actually is but I'll give it a go next week and see

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Feb 29, 2024

I started this here: #57683

I have a little script to do this, which seems to working well enough. Before I go too far, just checking people are OK with it

Pros of keeping docstring substitutions:

  • shorter docstrings within the code
  • helps dedupe some parameter definitions

Pros of removing docstring substitions:

  • docstrings become simple and easy-to-review
  • drastically faster docstring checking thanks to ruff
  • blazingly fast docstring formatting becomes possible
  • unblocks using darglint rules (which may make their way into ruff!) Implement rules in darglint astral-sh/ruff#458, and any other fast docstring rules in ruff (they already have pydocstyle ones)

I don't think ruff replaces all of the validate_docstrings script, but that script can be used with or without docstring substitutions. But removing docstring substitutions at least means we can start removing things from validate_docstrings which are covered by ruff and cut down on CI like this

@WillAyd
Copy link
Member

WillAyd commented Mar 1, 2024

yeah it needs to generate all these docstrings, write them to temporary files, run validation against them...

I get we have a lot of docstrings but I am surprised that this would take 22 minutes.

Not a hill I'd want to die on but I do think the shared docstrings serve a good purpose in keeping things consistent

@dontgoto
Copy link
Contributor

dontgoto commented Mar 17, 2024

Wasn't aware of this issue and wrote #57826 after I introduced a docstring error in my first PR and got annoyed at the speed of the docstring checks. It reduces the runtime of the check_code.sh docstrings down to 2-3 minutes again.

I also have another PR ready to go once #57826 is merged. That second PR brings the validation down to about 10 seconds. Main issue was the way the temp files were created and how pep8 was called via the shell.

So maybe with a 10 second runtime it's not necessary to do any functionality changes to the docstring validation and sharing in the end?

@mroeschke
Copy link
Member

So maybe with a 10 second runtime it's not necessary to do any functionality changes to the docstring validation and sharing in the end?

I think the docstring sharing is still worth removing to:

  1. Make it easier for new contributors to work on docstrings due to less complexity
  2. Allow ruff to take over docstring formatting and validation

@datapythonista
Copy link
Member

I'm happy to move towards removing or reducing the docstring sharing. But based on my experience getting fully rid of it may take years. If we want to proactively work on duplicating now reused docstrings, I'd start by the ones with higher complexity and see how that goes, before opening issues to remove all them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI Continuous Integration
Projects
None yet
Development

No branches or pull requests

6 participants