Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pre-commit hook for common misspelling check in files #18964

Merged
merged 2 commits into from
Oct 14, 2021

Conversation

ephraimbuddy
Copy link
Contributor

This PR adds codespell to the pre-commit hooks. This will specifically help
us a bit in resolving sphinx errors.

From the project page:
It does not check for word membership in a complete dictionary, but instead looks for a set of common misspellings.
Therefore it should catch errors like "adn", but it will not catch "adnasdfasdf".
This also means it shouldn't generate false-positives when you use a niche term it doesn't know about.

This means the sphinx errors are not solved completely.

Happy to hear your thoughts.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added area:dev-tools area:helm-chart Airflow Helm Chart provider:cncf-kubernetes Kubernetes provider related issues area:providers area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues kind:documentation provider:amazon AWS/Amazon - related issues provider:microsoft-azure Azure-related issues provider:google Google (including GCP) related issues labels Oct 13, 2021
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely love that!

@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Oct 14, 2021
This PR adds codespell to the pre-commit hooks. This will specifically help
us a bit in resolving sphinx errors.

From the project page:
It does not check for word membership in a complete dictionary, but instead looks for a set of common misspellings.
Therefore it should catch errors like "adn", but it will not catch "adnasdfasdf".
This also means it shouldn't generate false-positives when you use a niche term it doesn't know about.

This means the sphinx errors are not solved completely.
@ephraimbuddy ephraimbuddy merged commit 1571f80 into apache:main Oct 14, 2021
@ephraimbuddy ephraimbuddy deleted the codespell branch October 14, 2021 18:37
@@ -1214,6 +1220,7 @@ sharded
shellcheck
shellcmd
shm
sie
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is weird

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's from testing release doc about using gpg.

Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done 👏

self.log.info('Poking for blob: %s\nin wasb://%s', self.blob_name, self.container_name)
self.log.info('Poking for blob: %s\n in wasb://%s', self.blob_name, self.container_name)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is not right.
It will start a new line with space.
There are numbers of this cases in the PR.
Coding lines like this should be excluded. \nin is valid.

@ephraimbuddy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a fix for it?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to fix this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can split the strings into two strings in two log entries @HaloKo4 - that will actually even better reflect thte idea of this. I personally believe using /n on logs is a bad idea because it does not play well with the way how prefix logs works. If you have separate log entries without the /n they will be much nicer laid out (both lines will 'start' in the same place:

Date time log level: Line 1
Date time log level: LIne 2 

Instead of (as we have now)

Date time log level: Line 1
LIne 2 

The current approach is also potentially (low level) opening for so called 'log injection' vulnerability and it is considered as bad practice to have \n in the log lines because of that.

I think this is a great opportunity for you to improve those logs in general.

Can we.count on your help here ?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this doesn't fix the problem.
The next person who will do /nsomething in a log string will be hit by this pre-commit and might change it to space just like was done on this PR. If two lines is what you want then there is need to be another pre commit that prevent using /n in logging. isn't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to add it

Copy link
Member

@potiuk potiuk Oct 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw yeah - this is exactly how other people do it. Usually when someone fixes thing like that, also adds pre-commit to prevent it in the future - so yeah we are thinking in the same direction. It would be great to fix it in all places and prevent it in the future (probably 20 or 30 pre-commits of ours came to being in this way)

cognifloyd pushed a commit to cognifloyd/stackstorm-k8s that referenced this pull request Feb 16, 2022
…flow#18964)

This PR adds codespell to the pre-commit hooks. This will specifically help
us a bit in resolving sphinx errors.

From the project page:
It does not check for word membership in a complete dictionary, but instead looks for a set of common misspellings.
Therefore it should catch errors like "adn", but it will not catch "adnasdfasdf".
This also means it shouldn't generate false-positives when you use a niche term it doesn't know about.

This means the sphinx errors are not solved completely.

Partial Commit Extracted From: https://github.com/apache/airflow
@ephraimbuddy ephraimbuddy added the type:misc/internal Changelog: Misc changes that should appear in change log label Apr 11, 2022
@ephraimbuddy ephraimbuddy added changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) and removed type:misc/internal Changelog: Misc changes that should appear in change log labels Apr 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools area:helm-chart Airflow Helm Chart area:providers area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge kind:documentation provider:amazon AWS/Amazon - related issues provider:cncf-kubernetes Kubernetes provider related issues provider:google Google (including GCP) related issues provider:microsoft-azure Azure-related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants