NFC Explicitly close file descriptors in emcc.py #14074

rth · 2021-05-02T10:19:49Z

This explicitly closes file descriptors in emcc.py.

So instead of doing,

out = open(file_path).read()

which relies on the CPython garbage collection to eventually close the file descriptor, one can do,

from pathlib import Path

out = Path(file_path).read_text()

and similar pathlib methods, to close the file descriptor internally which is generally recommended (cf https://stackoverflow.com/q/7395542/1791279)

Not closing file descriptors can in particular lead to subtle issues of not knowing when exactly the written data is going to be flushed on disk, and it's not going to work well with other Python implementations (e.g. PyPy) which don't use reference counting.

sbc100 · 2021-05-02T16:15:04Z

Great! Thanks for working on this.

We've been moving away from using raw open's for a while now and these days we have have quite a lot of usage of the with open as f: method throughout the toolchain. As a followup change perhaps we should go back and have them use Path.. too?

Actually I have 3 different question/thoughts:

We should probably apply this change to the entire codebase not just emcc.py?
Should we have read_text/write_text/read_binary/write_binary helpers to shared.py to avoid having to do from pathlib import Path everywhere and give is a single point of injection for all file reading and writing in the toolchain? (this would allow us to add logging for example).
Should we also change all the places where we do with open(..) as f: followed by single call to read or write? I vote yes, but maybe as followup?

I suggest we do (1) and (2) as part of this PR (I could be persuaded that we don't need (2) .. but I kind of like that idea) with (3) as a followup?

(We normally mark our non-functional changes with the NFC suffix rather than that MAINT prefix.. just because this matches what llvm does, not because its better necessarily).

rth · 2021-05-03T20:53:34Z

We should probably apply this change to the entire codebase not just emcc.py?

Yeah, I can apply it to other files, once there is an agreed upon solution. I started small since I wasn't sure there would be interest in this change.

Should we have read_text/write_text/read_binary/write_binary helpers to shared.py to avoid having to do from pathlib import Path everywhere and give is a single point of injection for all file reading and writing in the toolchain?

IMO the ideal case for new code that does path manipulations is to use Path objects directly, that way you could do file_path.read_text() directly without the need of helpers. A blog on that subject.

For an existing large code base as in emscripten, though I'm not certain doing such migration is worth the effort. There is always some risk of bugs introduced by such migration and one has to rely on test coverage being high and/or some type annotations.

More generally I think I would still prefer to write Path(...).read_text() (or open(...).read()) rather than read_text(...) because Path is becoming a fairly common stdlib object in Python 3 code bases and it's easier to understand what it does (or be aware of other methods it has) than a custom function that adds another layer of abstraction. If some use of Path objects is accepted in the future, it would also make it easier to interact with code around it.

For instance, something like

if not os.path.exists(file_path):
   # handle it

return open(os.path.join(file_path, 'bin')).read()

could be written as,

if not file_path.exits():
   # handle it

return (file_path / 'bin').read_text()

assuming the file_path was converted to Path in the beginning.

Should we also change all the places where we do with open(..) as f: followed by single call to read or write? I vote yes, but maybe as followup?

TBH, I saw people using Path.read_text only recently, so maybe I'm missing some potential issues, but otherwise it does sound like a good idea (for a follow up PR).

sbc100 · 2021-05-03T21:02:07Z

I hear you points about transitioning to Path everwhere and maybe that should be goal. Even if that is a goal I still think is nice as an interim to have helper that take strings.

i.e. in sharedpy:

def read_text(filename):
   Path(filename).read_text()

This would be for use by any code that still uses str rather than Path and would

make it obvious which code has yet to be converted to Path
avoid importing Path into code that hasn't already been coverted.

It also adds a nice interception point, but that applies even after a potential conversion to Path everywhere.

I'm not sure I'm quite ready for file_path / 'bin' but maybe I'm just too old fashioned :) I'll come around to it I'm sure.

rth · 2021-05-03T21:10:58Z

OK, that works, I'll add helper functions for it.

There are plans underway to start using pathlib more in emscripten. See: #14074 This change a designed to test the water by using `pathlib` in test code. This allows us to use unix-like paths throughout the tests that will get converted to windows-paths on windows machines automatically by the pathlib library.

sbc100 · 2021-05-13T15:30:57Z

I keep a stab and using pathlib in out testing code: #14175

Are you planning on updating this PR?

There are plans underway to start using pathlib more in emscripten. See: #14074 This change a designed to test the water by using `pathlib` in test code. This allows us to use unix-like paths throughout the tests that will get converted to windows-paths on windows machines automatically by the pathlib library.

rth · 2021-05-14T08:43:07Z

Great! Yes, I'll update it in the next few days.

There are plans underway to start using pathlib more in emscripten. See: #14074 This change a designed to test the water by using `pathlib` in test code. This allows us to use unix-like paths throughout the tests that will get converted to windows-paths on windows machines automatically by the pathlib library.

sbc100 · 2021-06-12T00:01:42Z

Are you still interested in updating this PR?

rth · 2021-06-12T15:10:54Z

Thanks for your patience. I am interested, just with limited availability.

I made the discussed changes and applied them to all the files.

half the way through I realized that you already defined such helper functions in

emscripten/tests/runner.py

Line 121 in 315d362

def read_file(*path_components):

. I'm not sure where it would make sense to keep them in the end.
because tools is not an installable Python package and one has to use sys.path hacks to be able to use shared.py. Often I find it simpler and more reliable to use the open context manager directly rather than apply fixes to be able to import shared.py. It also avoids to add coupling between modules that are independent now. Similarly in other places I used the open context manager to avoid circular import issues with shared.py

Working on fixing CI.

emrun.py

sbc100

Looking good.

My main feedback (duplicated in comments):

Can we keep the read_text read_binary names?
Avoid adding new imports of shared.py
Always prefer Path(..).read() over context manager (to avoid extra lines and indentation)?

emrun.py

site/source/conf.py

site/source/get_wiki.py

tests/test_core.py

sbc100 · 2021-06-12T15:54:02Z

tools/config.py

@@ -106,7 +106,8 @@ def parse_config_file():
  Also check EM_<KEY> environment variables to override specific config keys.
  """
  config = {}
-  config_text = open(EM_CONFIG, 'r').read()
+  with open(EM_CONFIG) as fh:
+    config_text = fh.read()


If you put the helper functions inutils.py you can use them from here.

sbc100 · 2021-06-12T15:54:39Z

tools/debug/autodediffer.py

+
+sys.path.insert(1, str(Path(__file__).parents[2].resolve()))
+
+from tools import shared


I agree I don't think its worth added a new dependency on shared to get these utils. Just use Path.read here maybe?

sbc100 · 2021-06-12T15:54:48Z

tools/debug/bisect_pair.py

+
+sys.path.insert(1, __rootpath__)
+
+from tools import shared


sbc100 · 2021-06-12T15:55:05Z

tools/debug/bisect_pair_lines.py

-exec(open(path_from_root('tools', 'shared.py'), 'r').read())
+
+with open(path_from_root('tools', 'shared.py'), 'r') as fh:
+    exec(fh.read())


sbc100 · 2021-06-12T15:57:46Z

Appreciate you taking the time work on this

and read_bytes -> read_binary

sbc100 · 2021-06-14T17:18:28Z

tests/test_sanity.py

@@ -95,6 +94,8 @@ def make_fake_llc(filename, targets):

 SANITY_MESSAGE = 'Emscripten: Running sanity checks'

+EMBUILDER = path_from_root('embuilder.py')


Did you mean move this definition out of runner.py?

sbc100 · 2021-06-14T17:19:07Z

tests/test_sanity.py

@@ -24,7 +23,7 @@
 from tools import response_file

 SANITY_FILE = shared.Cache.get_path('sanity.txt')
-commands = [[EMCC], [path_from_root('tests/runner'), 'blahblah']]
+commands = [[EMCC], [path_from_root('tests', 'runner'), 'blahblah']]


Did you mean to do revert that change to use "/" in pathsnames? Perhaps a bad merge?

sbc100 · 2021-06-14T17:20:04Z

tools/clean_webconsole.py


 __rootpath__ = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))

+sys.path.insert(1, str(Path(__file__).parents[1].resolve()))
+from tools.shared import read_file


Might not be worth added the new dependency on shared.py here?

sbc100 · 2021-06-14T17:21:10Z

tools/shared.py

+def read_binary(file_path):
+    """Read from a file opened in binary mode"""
+    with open(file_path, 'rb') as fh:
+        text = fh.read()


Just return fh.read() here? (and above)

rth · 2021-06-14T22:27:13Z

Yes sorry I should have switched to a Draft status -- the changes weren't complete and there were indeed some merge issues. Should be OK now.

I think I addressed all comments, except for keeping the helpers in shared.py instead of utils.py as I didn't want to change all imports again to save two call to open in tools/config.py. I also reverted all changes to tests/: those have their own helpers, and could be done in a separate PR. Similarly these two helpers could be merged into 1 under some location in a follow up PR if useful, though their accepted parameters are not exactly the same.

rth · 2021-06-14T22:40:50Z

It looks like there are still some CI failures. I'll investigate tomorrow.

sbc100

Just a few more comments but lgtm

sbc100 · 2021-06-15T20:43:12Z

tools/webidl_binder.py

@@ -55,7 +56,7 @@ def getExtendedAttribute(self, name): # noqa: U100
 p.parse(r'''
 interface VoidPtr {
 };
-''' + open(input_file).read())
+''' + read_file(input_file))


Maybe just shared.read_file to avoid that extra import above?

sbc100 · 2021-06-15T20:44:52Z

tools/wasm-sourcemap.py

@@ -176,7 +178,7 @@ def remove_dead_entries(entries):

 def read_dwarf_entries(wasm, options):
  if options.dwarfdump_output:
-    output = open(options.dwarfdump_output, 'rb').read()
+    output = read_binary(options.dwarfdump_output)


Maybe use Path directly here to keep this tool free to deps on the rest of emscripten?

sbc100 · 2021-06-15T20:46:13Z

tools/ports/ogg.py

@@ -28,7 +29,7 @@ def create(final):
    shutil.rmtree(dest_path, ignore_errors=True)
    shutil.copytree(source_path, dest_path)

-    open(os.path.join(dest_path, 'include', 'ogg', 'config_types.h'), 'w').write(config_types_h)
+    Path(dest_path, 'include', 'ogg', 'config_types.h').write_text(config_types_h)


You have access to shared here and in the other ports files.

Sorry, I have reached my limit for manual and tedious changes between Path.read_text open().read() and shared.read_file for this PR :) I understand not using Path would save 1 import here, but in the end it is a stdlib module, it's readable and has no other undesirable side effects. Please feel free to push the change if that's a blocker.

Also with read_file I would have to either revert back to using os.path.join or change the `read_binary signature to accept path segments, and the latter gives me the feeling that this PR is never going to end :)

sbc100 · 2021-06-15T20:51:59Z

tools/shared.py

+def write_file(file_path, text):
+  """Write to a file opened in text mode"""
+  with open(file_path, 'w') as fh:
+    fh.write(text)


Any particular reason you chose to use context managers here rather than Path? I guess its slightly faster because it doesn't have to do the normalization stuff?

Well no particular reason not to use open either. It's still the canonical way to open files. Path is shorter indeed, but open would be easier to adapt to specify the size parameter, for instance in read, which is used in a few paces of the code base.

sbc100 · 2021-06-15T21:23:06Z

OK I think we can merge this and go from there. Thanks for your work!

rth · 2021-06-15T21:25:14Z

Thanks for the review @sbc100 !

MAINT Explicitly close file descriptors in emcc.py

10bb93d

rth changed the title ~~MAINT Explicitly close file descriptors in emcc.py~~ NFC Explicitly close file descriptors in emcc.py May 2, 2021

sbc100 mentioned this pull request May 13, 2021

Use python3's pathlib in test code. NFC #14175

Merged

rth added 6 commits June 12, 2021 11:04

Merge remote-tracking branch 'upstream/main' into clause-fd-emcc.py

cd2621d

Add shared functions read_text / write_text / read_bytes

15da804

More replacements in other files

3cb249d

More adaption

f1f22e1

Fix circular import

db5dcb7

Working on fixing CI

34fc423

Fixing circular imports

b03be66

sbc100 reviewed Jun 12, 2021

View reviewed changes

emrun.py Show resolved Hide resolved

sbc100 reviewed Jun 12, 2021

View reviewed changes

rth added 2 commits June 13, 2021 08:25

Rename read_text -> read_file

d53a38a

and read_bytes -> read_binary

More fixes

2667e6c

sbc100 reviewed Jun 14, 2021

View reviewed changes

rth added 3 commits June 14, 2021 23:23

Revert changes to tests/

6dae4ab

Merge remote-tracking branch 'upstream/main' into test-pr2

c94ac45

Address more review comments

69045d6

rth marked this pull request as draft June 14, 2021 21:42

rth added 3 commits June 14, 2021 23:48

More replacements for open().write()

37d3f68

More fixes

caa25e1

Lint

17b852f

rth marked this pull request as ready for review June 14, 2021 22:27

Fix read_binary in file_packager.py

8b1165c

sbc100 reviewed Jun 15, 2021

View reviewed changes

Address review comments

f797e3d

sbc100 approved these changes Jun 16, 2021

View reviewed changes

sbc100 merged commit 8977c4f into emscripten-core:main Jun 16, 2021

jonassmedegaard mentioned this pull request Jul 20, 2021

tools/debug/bisect_pair_lines.py fails to compile: SyntaxError: invalid syntax #14696

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NFC Explicitly close file descriptors in emcc.py #14074

NFC Explicitly close file descriptors in emcc.py #14074

rth commented May 2, 2021 •

edited

Loading

sbc100 commented May 2, 2021

rth commented May 3, 2021

sbc100 commented May 3, 2021

rth commented May 3, 2021

sbc100 commented May 13, 2021

rth commented May 14, 2021

sbc100 commented Jun 12, 2021

rth commented Jun 12, 2021 •

edited

Loading

sbc100 left a comment

sbc100 Jun 12, 2021

sbc100 Jun 12, 2021

sbc100 Jun 12, 2021

sbc100 Jun 12, 2021

sbc100 commented Jun 12, 2021

sbc100 Jun 14, 2021

sbc100 Jun 14, 2021

sbc100 Jun 14, 2021

sbc100 Jun 14, 2021

rth commented Jun 14, 2021

rth commented Jun 14, 2021

sbc100 left a comment

sbc100 Jun 15, 2021

sbc100 Jun 15, 2021

sbc100 Jun 15, 2021

rth Jun 15, 2021

rth Jun 15, 2021

sbc100 Jun 15, 2021

rth Jun 15, 2021

sbc100 commented Jun 15, 2021

rth commented Jun 15, 2021


		sys.path.insert(1, str(Path(__file__).parents[2].resolve()))

		from tools import shared

		@@ -95,6 +94,8 @@ def make_fake_llc(filename, targets):

		SANITY_MESSAGE = 'Emscripten: Running sanity checks'

		EMBUILDER = path_from_root('embuilder.py')

NFC Explicitly close file descriptors in emcc.py #14074

NFC Explicitly close file descriptors in emcc.py #14074

Conversation

rth commented May 2, 2021 • edited Loading

sbc100 commented May 2, 2021

rth commented May 3, 2021

sbc100 commented May 3, 2021

rth commented May 3, 2021

sbc100 commented May 13, 2021

rth commented May 14, 2021

sbc100 commented Jun 12, 2021

rth commented Jun 12, 2021 • edited Loading

sbc100 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sbc100 commented Jun 12, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rth commented Jun 14, 2021

rth commented Jun 14, 2021

sbc100 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sbc100 commented Jun 15, 2021

rth commented Jun 15, 2021

rth commented May 2, 2021 •

edited

Loading

rth commented Jun 12, 2021 •

edited

Loading