Enable ruff's unspecified-encoding (PLW1514) rule and fix violations #3319

Merged
weiji14 merged 3 commits into main from ruff/unspecified-encoding
Jul 8, 2024

Conversation

weiji14
Member

@weiji14 weiji14 commented Jul 8, 2024

Description of proposed changes

Enable ruff's unspecified-encoding (PLW1514) rule to check for uses of open and related calls without an explicit encoding argument. Note that this is a preview mode feature.

This lint rule tells us to consider using the encoding parameter to enforce a specific encoding. PEP 597 recommends using encoding="locale" on Python 3.10 and later, though we could also use encoding="utf-8" perhaps.

For this PR, we went for setting encoding="utf-8"
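As a minimal sketch of what the rule asks for (not taken from this PR's diff; the `roundtrip` helper and the file name are hypothetical), a call without `encoding` falls back to the platform's locale encoding, while an explicit `encoding="utf-8"` should behave the same on every platform:

```python
import tempfile
from pathlib import Path


def roundtrip(text: str) -> str:
    """Write text to a temporary file and read it back, always as UTF-8."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / "example.txt"  # hypothetical file name
        # A bare `path.open("w")` is what PLW1514 flags: it would use the
        # locale encoding. Passing encoding explicitly removes that dependency.
        with path.open("w", encoding="utf-8") as file:
            file.write(text)
        with path.open(encoding="utf-8") as file:
            return file.read()


result = roundtrip("10°E 45°N")
```

The non-ASCII degree sign survives the round trip because the same explicit encoding is used for writing and reading.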

References:

Addresses #2741 (comment)

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
  • If adding new functionality, add an example to docstrings or tutorials.
  • Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. The supported slash command is:

  • /format: automatically format and lint the code

@weiji14 weiji14 added the maintenance Boring but important stuff for the core devs label Jul 8, 2024
@weiji14 weiji14 added this to the 0.13.0 milestone Jul 8, 2024
@weiji14 weiji14 self-assigned this Jul 8, 2024
@weiji14 weiji14 marked this pull request as ready for review July 8, 2024 08:01
@weiji14
Member Author

weiji14 commented Jul 8, 2024

Setting as ready for review to test encoding="locale" on Windows first. Will try using encoding="utf-8" after all cross-platform tests complete.

PEP0597 hints at UTF-8 becoming the default encoding in the future, so pre-emptively applying it here. Xref https://peps.python.org/pep-0597/#prepare-to-change-the-default-encoding-to-utf-8
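To illustrate why the choice between the two matters, here is a small sketch (my own illustration, not from the PR) that decodes the same UTF-8 bytes with UTF-8 and with cp1252, a common Windows locale encoding. `locale.getpreferredencoding(False)` is used here as a rough stand-in for what `encoding="locale"` would resolve to on the current machine:

```python
import locale

# Roughly the encoding that encoding="locale" would pick on this machine
# (often cp1252 on Windows, UTF-8 on Linux and macOS).
locale_encoding = locale.getpreferredencoding(False)

raw = "é".encode("utf-8")  # two bytes: 0xC3 0xA9

as_utf8 = raw.decode("utf-8")  # decoded with the encoding it was written in
as_cp1252 = raw.decode("cp1252")  # decoded as if the locale were cp1252
```

With cp1252 the two bytes are read as two separate characters, so the text comes back garbled rather than raising an error.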
Member Author

@weiji14 weiji14 left a comment


Main changes to check are in the plot.py and plot3d.py files since they are user-facing. The changes to the tests/test_*.py files should be ok.

@@ -247,7 +247,7 @@ def plot(  # noqa: PLR0912
                 kwargs["S"] = "s0.2c"
         elif kind == "file" and str(data).endswith(".gmt"):  # OGR_GMT file
             try:
-                with Path(which(data)).open() as file:
+                with Path(which(data)).open(encoding="utf-8") as file:
Member Author

@weiji14 weiji14 Jul 8, 2024


This assumes that *.gmt files are always encoded in UTF-8, is this the case, or should we choose encoding="locale" instead?

To be clear, we're only trying to parse the vector geometry type (Multipoint/Point) from the *.gmt file, not read the whole file, so maybe ok to assume that the header lines are utf-8 compatible?

Member

@seisman seisman Jul 8, 2024


*.gmt can be encoded in any encoding, so there is no way to make it work in all cases. I guess encoding="utf-8" is the best choice.

> To be clear, we're only trying to parse the vector geometry type (Multipoint/Point) from the *.gmt file, not read the whole file, so maybe ok to assume that the header lines are utf-8 compatible?

It usually works, but will fail if the file uses a different encoding and contains comments on the first line.

Actually, we're assuming that the @G record is always on the first line but that's not always true.
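The failure mode described above can be reproduced with a small sketch. This is not the PyGMT implementation: `is_point_geometry` is a hypothetical helper, the sample files are made up, and returning `False` on a decoding error is an assumption about how one might want to handle it.

```python
import tempfile
from pathlib import Path


def is_point_geometry(path: Path) -> bool:
    """Return True if the first line declares a point or multipoint geometry."""
    try:
        with path.open(encoding="utf-8") as file:
            line = file.readline()
    except UnicodeDecodeError:
        # Assumption: an undecodable header is treated as "geometry unknown".
        return False
    return "@GMULTIPOINT" in line or "@GPOINT" in line


with tempfile.TemporaryDirectory() as tmpdir:
    # UTF-8 file: the header decodes fine, non-ASCII text only comes later.
    utf8_file = Path(tmpdir) / "utf8.gmt"
    utf8_file.write_bytes("# @VGMT1.0 @GPOINT\n# café\n".encode("utf-8"))

    # Latin-1 file with a non-ASCII comment on the first line.
    latin1_file = Path(tmpdir) / "latin1.gmt"
    latin1_file.write_bytes("# @VGMT1.0 @GPOINT café\n".encode("latin-1"))

    utf8_ok = is_point_geometry(utf8_file)
    latin1_ok = is_point_geometry(latin1_file)
```

In this sketch the Latin-1 file is not detected as point geometry because its first line cannot be decoded as UTF-8, which matches the "will fail" case mentioned above.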

Member Author

@weiji14 weiji14 Jul 8, 2024


Ok, will go with utf-8 then. Users can still manually override the style if needed, in case utf-8 encoding doesn't work and the style isn't automatically applied.

> Actually, we're assuming that the @G record is always on the first line but that's not always true.

From https://docs.generic-mapping-tools.org/6.5/reference/ogrgmt-format.html#the-ogr-gmt-format, it says:

> The first comment line must specify the version of the OGR/GMT data format, to allow for future changes or enhancements to be supported by future GMT programs. This document describes v1.0.

and the examples on the page below seem to suggest that # @VGMT1.0 is followed immediately by @GP... on the same line. Are there cases where @GP... goes onto the second line?

Member


ogr2ogr also produces OGR_GMT files with # @VGMT1.0 followed by @GP, so it should be fine, although users can manually put @GP on the second line and even add comments at the beginning of the file.

Anyway, those are just rare cases and it's safe to ignore them.
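If those rare cases ever needed handling, one possible approach is to scan all leading comment lines instead of only the first. The following is a hypothetical sketch, not part of this PR: `find_geometry`, the regular expression, and the sample file contents are all assumptions made for illustration.

```python
import re
import tempfile
from pathlib import Path

# Assumption: the geometry record looks like "@G" followed by uppercase letters.
GEOMETRY_PATTERN = re.compile(r"@G([A-Z]+)")


def find_geometry(path: Path):
    """Return the geometry name from the header comments, or None if absent."""
    with path.open(encoding="utf-8") as file:
        for line in file:
            if not line.startswith("#"):
                break  # header comments are over, data records follow
            match = GEOMETRY_PATTERN.search(line)
            if match:
                return match.group(1)
    return None


with tempfile.TemporaryDirectory() as tmpdir:
    sample = Path(tmpdir) / "sample.gmt"
    sample.write_text(
        "# some comment\n# @VGMT1.0\n# @GMULTIPOINT\n1 2\n", encoding="utf-8"
    )
    geometry = find_geometry(sample)

    no_header = Path(tmpdir) / "plain.gmt"
    no_header.write_text("1 2\n3 4\n", encoding="utf-8")
    missing = find_geometry(no_header)
```

Stopping at the first non-comment line keeps the scan short, since only the header block is read rather than the whole file.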

@weiji14 weiji14 merged commit 9c13eb0 into main Jul 8, 2024
18 of 19 checks passed
@weiji14 weiji14 deleted the ruff/unspecified-encoding branch July 8, 2024 10:21