Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make gpuCI and pre-commit style configurations consistent #8215

Merged
merged 27 commits into from
Jul 15, 2021

Conversation

charlesbluca
Copy link
Member

@charlesbluca charlesbluca commented May 11, 2021

Closes #8193

Makes changes to the pre-commit config, gpuCI style script, and cpp/scripts/run-clang-format.py so that the styling of the pre-commit hooks should roughly match the style checks done by gpuCI. In particular:

  • pre-commit's black and flake8 hooks now only target python/ to match with gpuCI
  • gpuCI's cpp/scripts/run-clang-format.py now runs clang-format on cpp/libcudf_kafka/ and java/src/main/native/ to match with pre-commit (also removed cpp/include/cudf and cpp/include/nvtext as they were redundant here)
  • isort has been split up by project (cudf, dask-cudf, etc.) and now runs on each project using its respective configuration; this was done because currently the isort check on gpuCI was only checking Python files one level deep in each project

It would probably be good to run the updated hooks on the codebase either in this PR or a linked one so that this changes doesn't break tests.

@charlesbluca charlesbluca requested review from a team as code owners May 11, 2021 18:23
@github-actions github-actions bot added gpuCI libcudf Affects libcudf (C++/CUDA) code. labels May 11, 2021
Copy link
Member Author

@charlesbluca charlesbluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor quirks around the behavior of isort:

.pre-commit-config.yaml Show resolved Hide resolved
ISORT=`isort --check-only python/**/*.py`
ISORT_RETVAL=$?
# Run isort-cudf and get results/return code
ISORT_CUDF=`isort python/cudf --check-only --settings-path=python/cudf/setup.cfg --skip-glob *.pyx --skip-glob *.pyi 2>&1`
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, the pre-commit hooks for isort automatically skip Cython files, while the CLI does not, so I had to specify to skip them here. Do we want package sorting for Cython files?

Copy link
Member Author

@charlesbluca charlesbluca May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we can get Cython/PYI resorting in pre-commit's isort hook with:

types: [text]
types_or: [python, cython, pyi]

So if we would like package resorting on Cython and/or module files it's now totally doable.

@jakirkham jakirkham added bug Something isn't working non-breaking Non-breaking change labels May 12, 2021
@mythrocks
Copy link
Contributor

gpuCI's cpp/scripts/run-clang-format.py now runs clang-format on cpp/libcudf_kafka/ and java/src/main/native/ to match with pre-commit.

The fact that this runs on java/src/main/native/ makes me a tad nervous. Not to imply that I disagree; I would prefer the formatting be applied too. It's just that this implies a fair bit of formatting changes to the existing JNI/native code.
cc @revans2, @jlowe.

@mythrocks
Copy link
Contributor

I think this PR would need to include the results of running run-clang-format on cpp/libcudf_kafka and java/src/main/native. For the moment, we should include this as a separate commit, so as not to muddy the review.

Also, @charlesbluca, could you please confirm that as a result of this commit, the intention is for clang formatting to run on cpp/libcudf_kafka and java/src/main/native, even for PRs where [skip-ci] is specified?

I ask because java/src/main tests (even the native ones) are not currently run as part of CI for historical reasons. These PRs (e.g. #8209) are submitted with [skip-ci] so as not to use up CI resources in rerunning unrelated libcudf tests.

It would be good to have [skip-ci] not skip clang-format checks in libcudf_kafka and java. Otherwise, the errant code would get merged, and fail in other PRs that don't [skip-ci].

@charlesbluca
Copy link
Member Author

For the moment, we should include this as a separate commit, so as not to muddy the review.

I agree - I was planning to run through the pre-commit hooks once all the config changes were ironed out.

the intention is for clang formatting to run on cpp/libcudf_kafka and java/src/main/native, even for PRs where [skip-ci] is specified?

No, the intent is so that when gpuCI is ran, it will run clang-format on cpp/libcudf_kafka and java/src/main/native, in addition to all the other (in your case, unnecessary) tests. What you're describing does sound like a good idea though, but I'm not really sure how gpuCI handles conditionals for testing (i.e., only run this set of tests if these files are impacted by a PR).

A relatively simple workaround for this which would address the situation you described and also make this PR a little simpler would be to make a GitHub Actions workflow running a pre-commit action (or using precommit.ci, which seems to be its successor). This would ensure that the style checks being ran when making commits and opening PRs are always the same, since they rely on identical configs; it would also create a workflow to check for formatting independent of if a PR is submitted with [skip ci].

That being said, I'm not sure of the potential downsides to moving aspects of our testing outside of gpuCI.

ci/checks/style.sh Outdated Show resolved Hide resolved
Copy link
Member

@ajschmidt8 ajschmidt8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving ops-codeowner file changes

@charlesbluca
Copy link
Member Author

To update the separate style CI issue, rapidsai/ops#1580 has been opened - @mythrocks, would you prefer that get resolved before merging this?

@harrism
Copy link
Member

harrism commented May 26, 2021

@charlesbluca what needs to happen to get this in? We are entering code freeze tomorrow.
@mythrocks can you answer the question above?

@charlesbluca
Copy link
Member Author

charlesbluca commented May 27, 2021

I think this would be better retargetted to 21.08; the C++ formatting is generally sorted out, but I would like more of a conversation around if we want to do package sorting for Cython code. Other than that, we would need to run the updated hooks against the codebase and make a lot of formatting changes, which we probably don't want to do this close to release.

@charlesbluca charlesbluca changed the base branch from branch-21.06 to branch-21.08 May 27, 2021 00:46
@vyasr
Copy link
Contributor

vyasr commented Jun 23, 2021

fwiw import dask.distributed as dd is also faster than from dask import distributed as dd. Generally a negligible microoptimization, but for function-local imports that can matter.

@charlesbluca
Copy link
Member Author

I couldn't find an option to force that specific import style so I just skipped isort for that specific import; once rapidsai/integration#286 is merged, I can remove that skip

@codecov
Copy link

codecov bot commented Jun 24, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@3ed87f3). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 0436347 differs from pull request most recent head 367e1cd. Consider uploading reports for the commit 367e1cd to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.08    #8215   +/-   ##
===============================================
  Coverage                ?   10.67%           
===============================================
  Files                   ?      109           
  Lines                   ?    18669           
  Branches                ?        0           
===============================================
  Hits                    ?     1993           
  Misses                  ?    16676           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3ed87f3...367e1cd. Read the comment docs.

Copy link
Contributor

@galipremsagar galipremsagar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the copyright year for all the files?

cpp/scripts/run-clang-format.py Show resolved Hide resolved
java/src/main/native/src/NvcompJni.cpp Show resolved Hide resolved
java/src/main/native/src/NvtxRangeJni.cpp Show resolved Hide resolved
python/cudf/cudf/_lib/cpp/strings/extract.pxd Outdated Show resolved Hide resolved
Copy link
Member

@rjzamora rjzamora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dask-cudf changes seem fine - Just left a minor question

from dask.utils import tmpfile

import dask_cudf

import dask.dataframe as dd # isort:skip
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the override here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that gpuCI's isort==5.0.7 check is targetting this file, it want to change this line to

from dask import dataframe as dd

I couldn't find an option for isort to allow this type of import, but can confirm that isort==5.6.4 (which we are planning to bump gpuCI to after this PR is merged) allows either style of import, so this override can be removed once rapidsai/integration#286 is merged.

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java approval

@charlesbluca
Copy link
Member Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit d07f994 into rapidsai:branch-21.08 Jul 15, 2021
@charlesbluca
Copy link
Member Author

After playing around with Cython resorting for a while, I noticed a quirk of isort means we probably aren't seeing nearly as much resorting as we would if our pyx/pxd files used exclusively cimport statements. I've opened up an issue describing this in PyCQA/isort#1787, so that hopefully we can make some modifications to our setup.cfg files to account for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Make style CI and pre-commit hooks consistent