NFC Explicitly close file descriptors in emcc.py #14074
Great! Thanks for working on this. We've been moving away from using raw open's for a while now and these days we have quite a lot of usage of the Actually I have 3 different questions/thoughts:
I suggest we do (1) and (2) as part of this PR (I could be persuaded that we don't need (2) .. but I kind of like that idea) with (3) as a followup? (We normally mark our non-functional changes with the |
Yeah, I can apply it to other files, once there is an agreed upon solution. I started small since I wasn't sure there would be interest in this change.
IMO the ideal case for new code that does path manipulations is to use For an existing large code base such as emscripten, though, I'm not certain that doing such a migration is worth the effort. There is always some risk of bugs introduced by such a migration, and one has to rely on test coverage being high and/or some type annotations. More generally I think I would still prefer to write For instance, something like

    if not os.path.exists(file_path):
        # handle it
    return open(os.path.join(file_path, 'bin')).read()

could be written as,

    if not file_path.exists():
        # handle it
    return (file_path / 'bin').read_text()

assuming the
TBH, I saw people using Path.read_text only recently, so maybe I'm missing some potential issues, but otherwise it does sound like a good idea (for a follow up PR). |
I hear your points about transitioning to Path everywhere and maybe that should be the goal. Even if that is a goal I still think it is nice as an interim to have helpers that take strings, i.e. in shared.py:
This would be for use by any code that still uses
It also adds a nice interception point, but that applies even after a potential conversion to Path everywhere. I'm not sure I'm quite ready for |
OK, that works, I'll add helper functions for it. |
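The helper code posted in the comment above did not survive in this copy of the thread. As a minimal sketch only, string-taking helpers of the kind being discussed might look like the following. The names `read_file`, `read_binary` and `write_file` appear later in the review; the exact bodies, and the `demo` function, are assumptions made for illustration.

```python
import os
import tempfile


def read_file(file_path):
    """Read a file opened in text mode and return its contents."""
    with open(file_path) as fh:
        return fh.read()


def read_binary(file_path):
    """Read a file opened in binary mode and return its contents as bytes."""
    with open(file_path, 'rb') as fh:
        return fh.read()


def write_file(file_path, text):
    """Write text to a file opened in text mode."""
    with open(file_path, 'w') as fh:
        fh.write(text)


def demo():
    # Hypothetical round trip through a temporary file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'config.txt')
        write_file(path, 'hello')
        return read_file(path), read_binary(path)
```

Because the helpers accept plain strings, callers that still build paths with `os.path.join` can use them unchanged, and each helper closes its file descriptor before returning.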
There are plans underway to start using pathlib more in emscripten. See: #14074 This change is designed to test the waters by using `pathlib` in test code. This allows us to use unix-like paths throughout the tests that will get converted to windows-paths on windows machines automatically by the pathlib library.
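To illustrate the path conversion described in that commit message, here is a small sketch using the pure path classes from `pathlib`, which behave the same on any host. The relative path is a hypothetical example and is not taken from the emscripten test suite.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical unix-style relative path, as it might be written in test code.
rel = 'tests/core/test_hello.c'

# On POSIX the string form is unchanged.
posix = str(PurePosixPath(rel))

# On Windows, pathlib accepts forward slashes and renders backslashes.
windows = str(PureWindowsPath(rel))

# The Windows path can still be rendered with forward slashes when needed.
round_trip = PureWindowsPath(rel).as_posix()
```

In real code one would use `Path`, which picks the flavour matching the host operating system; the pure classes are used here only so that both behaviours can be shown on one machine.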
I took a stab at using pathlib in our testing code: #14175 Are you planning on updating this PR? |
Great! Yes, I'll update it in the next few days. |
Are you still interested in updating this PR? |
Thanks for your patience. I am interested, just with limited availability. I made the discussed changes and applied them to all the files.
Working on fixing CI. |
Looking good.
My main feedback (duplicated in comments):
- Can we keep the `read_text` / `read_binary` names?
- Avoid adding new imports of shared.py
- Always prefer Path(..).read() over context manager (to avoid extra lines and indentation)?
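For readers comparing the two styles in the last bullet, here is a rough side-by-side sketch. Note that `pathlib` has no method literally named `Path.read()`; the standard methods are `read_text()` and `read_bytes()`, which open, read and close the file internally. The function names and `demo` below are hypothetical.

```python
import tempfile
from pathlib import Path


def read_with_context_manager(path):
    # Two lines and one level of indentation; the file is closed on block exit.
    with open(path) as fh:
        return fh.read()


def read_with_pathlib(path):
    # One expression; read_text() closes the file before returning.
    return Path(path).read_text()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp, 'example.txt')
        path.write_text('some text')
        return (read_with_context_manager(path),
                read_with_pathlib(path),
                path.read_bytes())
```

Both variants close the file descriptor deterministically, so the choice between them is mostly about brevity and about whether the caller needs the open file object for anything else.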
@@ -106,7 +106,8 @@ def parse_config_file():
 Also check EM_<KEY> environment variables to override specific config keys.
 """
 config = {}
-config_text = open(EM_CONFIG, 'r').read()
+with open(EM_CONFIG) as fh:
+  config_text = fh.read()
If you put the helper functions in utils.py you can use them from here.
tools/debug/autodediffer.py
Outdated
sys.path.insert(1, str(Path(__file__).parents[2].resolve()))

from tools import shared
I agree; I don't think it's worth adding a new dependency on shared to get these utils. Just use Path.read here maybe?
tools/debug/bisect_pair.py
Outdated
sys.path.insert(1, __rootpath__)

from tools import shared
Ditto
tools/debug/bisect_pair_lines.py
Outdated
exec(open(path_from_root('tools', 'shared.py'), 'r').read())

with open(path_from_root('tools', 'shared.py'), 'r') as fh:
  exec(fh.read())
Path.read?
Appreciate you taking the time to work on this |
and read_bytes -> read_binary
tests/test_sanity.py
Outdated
@@ -95,6 +94,8 @@ def make_fake_llc(filename, targets):

SANITY_MESSAGE = 'Emscripten: Running sanity checks'

EMBUILDER = path_from_root('embuilder.py')
Did you mean to move this definition out of runner.py?
tests/test_sanity.py
Outdated
@@ -24,7 +23,7 @@
 from tools import response_file

 SANITY_FILE = shared.Cache.get_path('sanity.txt')
-commands = [[EMCC], [path_from_root('tests/runner'), 'blahblah']]
+commands = [[EMCC], [path_from_root('tests', 'runner'), 'blahblah']]
Did you mean to revert that change to use "/" in pathnames? Perhaps a bad merge?
tools/clean_webconsole.py
Outdated
__rootpath__ = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

sys.path.insert(1, str(Path(__file__).parents[1].resolve()))
from tools.shared import read_file
Might not be worth adding the new dependency on shared.py here?
tools/shared.py
Outdated
def read_binary(file_path):
  """Read from a file opened in binary mode"""
  with open(file_path, 'rb') as fh:
    text = fh.read()
Just return fh.read() here? (and above)
Yes sorry I should have switched to a Draft status -- the changes weren't complete and there were indeed some merge issues. Should be OK now. I think I addressed all comments, except for keeping the helpers in |
It looks like there are still some CI failures. I'll investigate tomorrow. |
Just a few more comments but lgtm
tools/webidl_binder.py
Outdated
@@ -55,7 +56,7 @@ def getExtendedAttribute(self, name): # noqa: U100
 p.parse(r'''
 interface VoidPtr {
 };
-''' + open(input_file).read())
+''' + read_file(input_file))
Maybe just shared.read_file to avoid that extra import above?
tools/wasm-sourcemap.py
Outdated
@@ -176,7 +178,7 @@ def remove_dead_entries(entries):

 def read_dwarf_entries(wasm, options):
   if options.dwarfdump_output:
-    output = open(options.dwarfdump_output, 'rb').read()
+    output = read_binary(options.dwarfdump_output)
Maybe use Path directly here to keep this tool free of deps on the rest of emscripten?
@@ -28,7 +29,7 @@ def create(final):
 shutil.rmtree(dest_path, ignore_errors=True)
 shutil.copytree(source_path, dest_path)

-open(os.path.join(dest_path, 'include', 'ogg', 'config_types.h'), 'w').write(config_types_h)
+Path(dest_path, 'include', 'ogg', 'config_types.h').write_text(config_types_h)
You have access to shared here and in the other ports files.
Sorry, I have reached my limit for manual and tedious changes between Path.read_text, open().read() and shared.read_file for this PR :) I understand not using Path would save 1 import here, but in the end it is a stdlib module, it's readable and has no other undesirable side effects. Please feel free to push the change if that's a blocker.
Also with read_file I would have to either revert back to using os.path.join or change the `read_binary` signature to accept path segments, and the latter gives me the feeling that this PR is never going to end :)
def write_file(file_path, text):
  """Write to a file opened in text mode"""
  with open(file_path, 'w') as fh:
    fh.write(text)
Any particular reason you chose to use context managers here rather than Path? I guess it's slightly faster because it doesn't have to do the normalization stuff?
Well, no particular reason not to use open either. It's still the canonical way to open files. Path is shorter indeed, but open would be easier to adapt to specify the size parameter, for instance in read, which is used in a few places in the code base.
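The point about the size parameter can be sketched as follows. `Path.read_bytes()` and `Path.read_text()` always read the whole file, whereas a file object returned by `open` accepts an optional size argument in `read`. The helper name `read_prefix` and the file contents are hypothetical.

```python
import os
import tempfile


def read_prefix(file_path, size):
    """Return at most `size` bytes from the start of a file."""
    with open(file_path, 'rb') as fh:
        # read(size) stops after `size` bytes instead of reading the whole file.
        return fh.read(size)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'blob.bin')
        with open(path, 'wb') as fh:
            fh.write(b'HEADERpayload')
        return read_prefix(path, 6)
```

A helper built on `open` can grow such a parameter without changing its callers, which is one way to read the argument made in the comment above.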
OK I think we can merge this and go from there. Thanks for your work! |
Thanks for the review @sbc100 ! |
This explicitly closes file descriptors in emcc.py.
So instead of doing,
which relies on the CPython garbage collection to eventually close the file descriptor, one can do,
and similar pathlib methods, to close the file descriptor internally which is generally recommended (cf https://stackoverflow.com/q/7395542/1791279)
Not closing file descriptors can in particular lead to subtle issues of not knowing when exactly the written data is going to be flushed on disk, and it's not going to work well with other Python implementations (e.g. PyPy) which don't use reference counting.
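The code snippets from the description are not preserved in this copy. As a small sketch of the behaviour it describes, assuming a temporary file created only for the demonstration, a `with` block closes the file (and therefore flushes written data) as soon as the block exits, independent of garbage collection:

```python
import os
import tempfile


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'out.txt')

        with open(path, 'w') as fh:
            fh.write('data')
        # The file object is already closed here, on any Python implementation.
        closed_after_with = fh.closed

        # Because the file was closed, the data is flushed and readable.
        with open(path) as fh2:
            content = fh2.read()

        return closed_after_with, content
```

With a bare `open(path, 'w').write('data')`, the moment of closing depends on when the file object is collected, which is prompt under CPython's reference counting but not guaranteed elsewhere.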