Enable ruff's unspecified-encoding (PLW1514) rule and fix violations #3319
The default unsafe fix is to use `encoding="locale"`.
Setting as ready for review to test.
PEP 597 hints at UTF-8 becoming the default encoding in the future, so I'm pre-emptively applying it here. Xref https://peps.python.org/pep-0597/#prepare-to-change-the-default-encoding-to-utf-8
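To make the trade-off concrete, here is a small standard-library sketch (the helper name `bytes_on_disk` is made up for illustration). It shows that `encoding="locale"` resolves to a machine-dependent codec, while `encoding="utf-8"` writes the same bytes on every platform.

```python
import locale
import tempfile
from pathlib import Path


def bytes_on_disk(text: str, encoding: str) -> bytes:
    """Write `text` with an explicit encoding and return the raw bytes stored."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / "example.txt"
        with path.open("w", encoding=encoding) as f:
            f.write(text)
        return path.read_bytes()


# encoding="locale" (Python 3.10+) means "whatever this returns", which varies
# between machines, typically "UTF-8" on Linux/macOS and often "cp1252" on
# Western-locale Windows installs.
print(locale.getpreferredencoding(False))

# encoding="utf-8" produces identical bytes everywhere.
print(bytes_on_disk("Ångström", "utf-8"))  # b'\xc3\x85ngstr\xc3\xb6m'
```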
The main changes to check are in the `plot.py` and `plot3d.py` files, since they are user-facing. The changes to the `tests/test_*.py` files should be OK.
@@ -247,7 +247,7 @@ def plot(  # noqa: PLR0912
kwargs["S"] = "s0.2c"
elif kind == "file" and str(data).endswith(".gmt"):  # OGR_GMT file
try:
- with Path(which(data)).open() as file:
+ with Path(which(data)).open(encoding="utf-8") as file:
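A minimal, hypothetical sketch of what this check does: read only the first line of the `*.gmt` file as UTF-8 and look for a point geometry tag. The helper name `is_point_geometry` and the exact tags tested (`@GPOINT`, `@GMULTIPOINT`) are assumptions for illustration, not PyGMT's actual code.

```python
import tempfile
from pathlib import Path


def is_point_geometry(path: Path) -> bool:
    """Return True if the first line of an OGR_GMT file declares point geometry.

    Only the first line is read and decoded as UTF-8, mirroring the idea in the
    diff; files that put the @G tag on a later line are not detected.
    """
    with path.open(encoding="utf-8") as file:
        first_line = file.readline()
    return "@GPOINT" in first_line or "@GMULTIPOINT" in first_line


with tempfile.TemporaryDirectory() as tmpdir:
    sample = Path(tmpdir) / "points.gmt"
    sample.write_text("# @VGMT1.0 @GPOINT\n1 2\n", encoding="utf-8")
    print(is_point_geometry(sample))  # True
```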
This assumes that `*.gmt` files are always encoded in UTF-8. Is that the case, or should we choose `encoding="locale"` instead?
To be clear, we're only trying to parse the vector geometry type (Multipoint/Point) from the *.gmt file, not read the whole file, so maybe ok to assume that the header lines are utf-8 compatible?
`*.gmt` files can use any encoding, so there is no way to make this work in all cases. I guess `encoding="utf-8"` is the best choice.
> To be clear, we're only trying to parse the vector geometry type (Multipoint/Point) from the *.gmt file, not read the whole file, so maybe ok to assume that the header lines are utf-8 compatible?
It usually works, but will fail if the file uses a different encoding and contains comments on the first line.
Actually, we're assuming that the `@G` record is always on the first line, but that's not always true.
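A sketch of the failure mode described above, assuming a Latin-1 encoded file with a non-ASCII comment on its first line. The helper `read_first_line` is hypothetical: strict UTF-8 decoding raises `UnicodeDecodeError`, which this helper turns into `None`.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_first_line(path: Path, encoding: str = "utf-8") -> Optional[str]:
    """Return the first line, or None if it cannot be decoded with `encoding`."""
    try:
        with path.open(encoding=encoding) as file:
            return file.readline()
    except UnicodeDecodeError:
        return None


with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "latin1.gmt"
    # Latin-1 bytes: 0xE9 is "é", which is not a valid UTF-8 sequence here.
    path.write_bytes(b"# caf\xe9 @VGMT1.0 @GPOINT\n")
    print(read_first_line(path))  # None
    # Decoding with the file's real encoding recovers the header line.
    print(read_first_line(path, encoding="latin-1") is not None)  # True
```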
OK, will go with UTF-8 then. Users can still set the style manually if UTF-8 decoding fails and the style isn't applied automatically.
> Actually, we're assuming that the @G record is always on the first line but that's not always true.
From https://docs.generic-mapping-tools.org/6.5/reference/ogrgmt-format.html#the-ogr-gmt-format, it says:
> The first comment line must specify the version of the OGR/GMT data format, to allow for future changes or enhancements to be supported by future GMT programs. This document describes v1.0.
The examples on that page also seem to suggest that `# @VGMT1.0` is followed immediately by `@GP...` on the same line. Are there cases where `@GP...` goes onto the second line?
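Assuming the first line looks like the documented examples (`# @VGMT1.0 @GPOINT`), both the format version and the geometry type can be pulled from that single line. A small regex sketch; the function name and patterns are illustrative and not taken from GMT or PyGMT.

```python
import re
from typing import Optional, Tuple

# Illustrative patterns for the @V and @G tags shown in the OGR/GMT examples.
VERSION_TAG = re.compile(r"@VGMT(\d+\.\d+)")
GEOMETRY_TAG = re.compile(r"@G([A-Z]+)")


def parse_ogr_gmt_header(line: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (version, geometry) found on a '#' header line, else None for each."""
    if not line.startswith("#"):
        return None, None
    version = VERSION_TAG.search(line)
    geometry = GEOMETRY_TAG.search(line)
    return (
        version.group(1) if version else None,
        geometry.group(1) if geometry else None,
    )


print(parse_ogr_gmt_header("# @VGMT1.0 @GPOINT"))  # ('1.0', 'POINT')
```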
`ogr2ogr` also produces OGR_GMT files with `# @VGMT1.0` followed by `@GP`, so it should be fine, although users can manually put `@GP` on the second line or even add comments at the beginning of the file. Anyway, those are rare cases and it's safe to ignore them.
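If one did want to handle those rare cases, a sketch (not what this PR does; `find_geometry` and its `max_lines` limit are hypothetical) could scan every leading `#` comment line for the `@G` tag instead of only the first line, stopping at the first data line.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

GEOMETRY_TAG = re.compile(r"@G([A-Z]+)")


def find_geometry(path: Path, max_lines: int = 20) -> Optional[str]:
    """Return the @G geometry from the leading comment block, or None.

    Stops at the first non-comment line or after `max_lines` lines.
    """
    with path.open(encoding="utf-8") as file:
        for lineno, line in enumerate(file):
            if lineno >= max_lines or not line.startswith("#"):
                break
            match = GEOMETRY_TAG.search(line)
            if match:
                return match.group(1)
    return None


with tempfile.TemporaryDirectory() as tmpdir:
    sample = Path(tmpdir) / "odd.gmt"
    # A free-form comment first, then @G on its own line: the rare case above.
    sample.write_text("# free-form comment\n# @VGMT1.0\n# @GMULTIPOINT\n1 2\n", encoding="utf-8")
    print(find_geometry(sample))  # MULTIPOINT
```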
Description of proposed changes
Enable ruff's unspecified-encoding (PLW1514) rule to check for uses of `open` and related calls without an explicit `encoding` argument. Note that this is a preview-mode feature.
This lint rule tells us to consider using the `encoding` parameter to enforce a specific encoding. PEP 597 recommends using `encoding="locale"` on Python 3.10 and later, though we could perhaps also use `encoding="utf-8"`.
For this PR, we went with `encoding="utf-8"`.
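A before/after sketch of what PLW1514 asks for (the function names are hypothetical). On Python 3.10+, PEP 597 also provides an opt-in runtime `EncodingWarning` for the same pattern, enabled with `-X warn_default_encoding`.

```python
import tempfile
from pathlib import Path


def load_notes_flagged(path: Path) -> str:
    # PLW1514 flags this call: without `encoding`, Python falls back to the
    # locale encoding, so the result can differ between machines.
    with path.open() as file:
        return file.read()


def load_notes(path: Path) -> str:
    # Compliant: the encoding is pinned explicitly, as this PR does.
    with path.open(encoding="utf-8") as file:
        return file.read()


with tempfile.TemporaryDirectory() as tmpdir:
    notes = Path(tmpdir) / "notes.txt"
    notes.write_text("Ångström\n", encoding="utf-8")
    # The pinned reader round-trips non-ASCII text on every platform.
    print(load_notes(notes) == "Ångström\n")  # True
```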
References:
Addresses #2741 (comment)