Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New patch importer #1672

Merged
merged 11 commits into from
Sep 3, 2018
Merged

Conversation

brandonwillard
Copy link
Member

@brandonwillard brandonwillard commented Aug 20, 2018

These changes implement a new importer using sys.path_hooks and importlib semantics. This is more-or-less a completely different approach from #1671 and, its origin, #1518.

Description

Python 3.x is patched in a way that integrates .hy source files into Python's default importlib machinery. In Python 2.7, a PEP-302 "importer" and "loader" is implemented according to the standard import logic (via pkgutil and later pure-Python imp package code).

In both cases, the entry-point for the loaders is through sys.path_hooks only. As well, the import semantics have been updated all throughout to utilize importlib and follow aspects of PEP-420. This, along with some light patches, should allow for basic use of runpy, py_compile and reload.

Design Choices

In all cases, when .hy files are shadowed by same-prefix .py files, Hy will silently use the .hy files.

@Kodiologist
Copy link
Member

Can you specify which bugs these changes fix, and add tests for them? (I see you added a test for reload, which is good.) I don't know much about the guts of Python's import system, so it's hard for me to judge how much of an improvement these changes are without tests showing fixed bugs.

@brandonwillard
Copy link
Member Author

brandonwillard commented Aug 20, 2018

Some issues fixed, targeted, and/or possible/easier to fix via this PR:

Not fixes, but some things that might be more do-able due to these changes:

I'm in the process of going through these to devise tests, check that they're fixed, etc.

Otherwise, PEP compliance should be a high priority regardless, no?

@Kodiologist
Copy link
Member

Don't forget keywords (but it's probably best to wait till you have a test for a bug before declaring it fixed).

Otherwise, PEP compliance should be a high priority regardless, no?

Not exactly. I'm not against violating a PEP if by so doing we can fix a bug, add a feature we want, shorten the code, or the like. PEP compliance per se is academic.

@brandonwillard
Copy link
Member Author

brandonwillard commented Aug 20, 2018

Ah, yeah, I'll amend that last commit with a keyword to the corresponding issue.

When I consider PEP compliance in this scenario, the concern is based in the reproduction of expected behavior and use of standard APIs for the purpose of automatic/seemless support of existing and new functionality, portability across Python implemenations, etc. Unfortunately, not even CPython makes consistent use of these (e.g. runpy and py_compile do not completely use the existing loaders and are a lot less applicable as a result), so not all those benefits are actually present; however, they might be in later versions.

@Kodiologist
Copy link
Member

Unfortunately, not even CPython makes consistent use of these (e.g. runpy and py_compile do not completely use the existing loaders and are a lot less applicable as a result), so not all those benefits are actually present

That's the sort of thing I was afraid of. If you can implement these things in passing, that's fine, but don't write a lot of code that CPython can't actually use. Unimplemented standards often end up being ignored or changed.

@brandonwillard
Copy link
Member Author

True; fortunately compliance was mostly just a matter of reorganizing existing import and compilation code, and even that was exclusively for Python 2.7.

Otherwise, this approach plugs into Python 3.x's existing importer pipeline in only a dozen lines of code! The rest is interface changes (e.g. __import__ to importlib.import_module).

@brandonwillard
Copy link
Member Author

@Kodiologist, any other issues related to the import system that I might've missed?

Regardless, what do we need, from me, in order to move forward?

@Kodiologist
Copy link
Member

There are issues with require that will probably need additional work, but this PR is probably already ambitious enough, so I wouldn't take them on here if I were you. At any rate, this is a very promising PR. I'll review it soon.

@brandonwillard
Copy link
Member Author

The require issues are why I chose to do this first; there could be some overlap.

Copy link
Member

@Kodiologist Kodiologist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1. Could you clean up the commit structure? Rebase out the merge commit, avoid committing code that you only end up correcting in a later commit, separate functional changes from whitespace edits, etc. I'll resume reviewing once you've done this.

2. Don't forget to update NEWS.

3. pytest --ignore tests/test_bin.py seems not to ignore the named tests any more. I don't know why you set pytest==3.2.1 in requirements-travis.txt, but downgrading pytest thus doesn't seem to help.

4. It looks like coexisting __init__.py and __init__.hy work differently from how you intended:

> mkdir testlib
> echo 'print("hi")' > testlib/__init__.py
> echo '(print "hy")' > testlib/__init__.hy
> hy
hy 0.15.0+29.gad5b1da1 using CPython(default) 3.7.0 on Linux
=> (import testlib)
hi
=> 

I think the right thing to do in this situation is throw an error. One really never ought to have two files distinguished only by .py vs. .hy (unless perhaps the Python file is a translation of the Hy file).

conftest.py Outdated
return _fspath_pyimport(self, **kwargs)
except py.path.local.ImportMismatchError:
pkgpath = self.pypkgpath()
if pkgpath is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems more direct to have if pkgpath is None as the condition, with this part in the else.

conftest.py Outdated

_fspath_pyimport = py.path.local.pyimport

def pyimport_patch_mismatch(self, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function doesn't seem to need any local variables of pytest_collect_file (except for _fspath_pyimport, which is constant), so it could be defined outside of pytest_collect_file instead of being defined on every call.

hy/cmdline.py Outdated
argv = argv[:mloc+2]
if len(argv) > mloc + 2:
module_args = argv[mloc + 2:]
argv = argv[:mloc + 2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These sorts of purely aesthetic changes are best done in separate commits from functional changes.


options = parser.parse_args(argv[1:])

if options.show_tracebacks:
global SIMPLE_TRACEBACKS
SIMPLE_TRACEBACKS = False

# reset sys.argv like Python
sys.argv = options.args + module_args or [""]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your first commit has an extra commented-out line hanging around here (which isn't visible in GitHub's final diff for the whole PR).

hy/cmdline.py Outdated
x.filename, x.errno, x.strerror), file=sys.stderr)
sys.exit(x.errno)
rv = 0
if len(options.files) == 0 or \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to use parentheses than backslashes.

hy/importer.py Outdated
with io.open(fname, 'r', encoding='utf-8') as f:
source = f.read()

if fname.rpartition('.')[-1] == 'hy':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not fname.endswith('.hy')?

hy/importer.py Outdated
if PY3:
return code, fname
else:
return code
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be written more concisely as return (code, fname) if PY3 else code.

hy/cmdline.py Outdated

import astor.code_gen

import hy

from io import open
Copy link
Member

@Kodiologist Kodiologist Aug 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's clearer to let open, unqualified, refer to the builtin, not to io.open. Or, use builtins.open or io.open explicitly every time.

hy/importer.py Outdated
if self.file and self.file.closed:
mod_type = self.etc[2]
if mod_type == HY_SOURCE:
# XXX: Hy source files are always text.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the "XXX" in this comment?

@@ -1,4 +1,4 @@
pytest >= 3.2.1
pytest==3.2.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a fix for errors observed in Travis caused by seemingly incompatible versions of py and pytest.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it one of those bugs that appears on Travis but you can't replicate locally?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was this one. I think I was able to replicate it locally, but it must've been caused by something in the old PR, because I can't now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know, I think I saw that once, but I couldn't reproduce it. I say let Travis use the new version for now and let's try to fix that issue when we get the opportunity.

@brandonwillard
Copy link
Member Author

brandonwillard commented Aug 23, 2018

@Kodiologist, thanks for the feedback; I'll try to clean up the Git history today.

I should've clarified the whole .py vs. .hy import choices; they currently differ between Python 3.x and Python 2.7 and Python 2.7 should be the situation detailed in the description. This is simply due to an oversight.

Since the Python 3.x implementation is piggybacking off of the existing import machinery, it will choose .py before .hy (in all same-name, different-suffix cases). Changing the order of suffixes in importlib.machinery.SOURCE_SUFFIXES will alter the import order, though. To make it throw an ImportError in Python 3.x, which I also think is a good response to the situation, another patch is probably needed.

The easiest solution (minimal changes) is to have the importer silently prefer .hy files in all Python versions and cases, ignoring any same-name .py files. In other words, Hy prefers Hy.

@Kodiologist
Copy link
Member

Hmm, I see. Since Hy didn't already have a deliberate policy with respect to coexisting .py and .hy, you don't have to worry about it for this PR if you don't want to, sure.

@brandonwillard
Copy link
Member Author

@Kodiologist, all right, cleaned-up those commits.

Also, I set the import preference to .hy files in all cases and versions; that seems like the most consistent approach that doesn't introduce more patches or violate existing tests/expectations.

@brandonwillard brandonwillard force-pushed the new-patch-importer branch 2 times, most recently from ce6a21f to f065e8d Compare August 25, 2018 00:01
Copy link
Member

@Kodiologist Kodiologist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic work. See my minor inline comments. There are two more important things I'd like you to look at; then, I think we're ready to go.

  1. The speedup of 734cdcd seems to be gone, and hy.lex.lexer and hy.lex.parser are once again getting run when the user executes hy foo.hy and foo is byte-compiled. Can you rejigger the imports to get this working again?
  2. Can you get the tests passing on Travis with the latest pytest?

conftest.py Outdated
def pytest_ignore_collect(path, config):
return True if (("py3_only" in path.basename and not PY3) or
("py35_only" in path.basename and not PY35) or
("py36_only" in path.basename and not PY36)) else None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just append or None to the end instead of using if.

hy/importer.py Outdated

for entry in path:
if os.path.isfile(entry) and subname == '__main__' \
and entry.endswith('.hy'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use parens instead of a backslash.

hy/importer.py Outdated
else:
path = [os.path.realpath(self.path)]

file, file_path, etc = None, None, None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's best not to use the name of a builtin, like file, for a local variable.

def test_circular():
"""Test circular imports by creating a temporary file/module that calls a
function that imports itself."""
with tempfile.NamedTemporaryFile(suffix='.hy',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having the test write a temporary file, why not commit this file into tests/resources?

NEWS.rst Outdated
Internals
------------------------------
* PEP-302 rewrite of the importer system.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NEWS is for users, so it doesn't need items for internal changes.

NEWS.rst Outdated
@@ -12,13 +16,16 @@ New Features
* Keyword objects (not just literal keywords) can be called, as
shorthand for `(get obj :key)`, and they accept a default value
as a second argument.
* Module reloading works now.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is already listed below as a bug fix.

NEWS.rst Outdated

Bug Fixes
------------------------------
* Fixed bugs in the handling of unpacking forms in method calls and
attribute access.
* Fixed crashes on Windows when calling `hy-repr` on date and time
objects.
* Fixed module reloading, circular imports, and `__main__` file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that each of these fixes is notable enough to get its own item.

@brandonwillard
Copy link
Member Author

brandonwillard commented Aug 25, 2018

Regarding the speed-up (or lack thereof), that's probably due to the use of runpy[.run_path] (in cmdline.py), which neither caches nor uses bytecode.

Also, python *.py doesn't appear to cache or use bytecode in either Python 2.7 or 3.6 (per the verbose output messages); however, if you run python -m * in Python 3.6, bytecode is cached and read, while in 2.7 it isn't.

Not sure how we want to handle these cases in Hy, but perhaps some of the latency issues you observed were due to that.

@Kodiologist
Copy link
Member

In Hy, it's important that hy foo.hy read and write bytecode, because compilation can be arbitrarily slow (because it involves macro-expansion). I'm sorry; I missed the fact that you removed hy {fpath} from test_bin_hy_byte_compile.

@brandonwillard
Copy link
Member Author

No problem, it can be easily made to use the cache.

Python 3.x is patched in a way that integrates `.hy` source files into
Pythons default `importlib` machinery.  In Python 2.7, a PEP-302 "importer"
and "loader" is implemented according to the standard `import` logic (via
`pkgutil` and later pure-Python `imp` package code).

In both cases, the entry-point for the loaders is through `sys.path_hooks` only.
As well, the import semantics have been updated all throughout to utilize
`importlib` and follow aspects of PEP-420.  This, along with some light
patches, should allow for basic use of `runpy`, `py_compile` and `reload`.

In all cases, if a `.hy` file is shadowed by a `.py`, Hy will silently use
`.hy`.
@brandonwillard brandonwillard force-pushed the new-patch-importer branch 2 times, most recently from 05bb5ec to 343d48f Compare August 26, 2018 04:44
@brandonwillard
Copy link
Member Author

All right, those changes are in. Also, I added the hy {fpath} tests back to test_bin_hy_byte_compile.

@brandonwillard
Copy link
Member Author

It looks like we can get rid of some boilerplate code in HyLoader (e.g. get_source, get_filename, _reopen at ~43 lines) if we adopt the existing imp.PY_SOURCE module type instead of using our own HY_SOURCE.

This change only affects Python 2.7, and it somewhat more closely reflects the patched loader for Python 3.x, since — in the latter case — .hy is added as an importlib.machinery.SOURCE_SUFFIXES.

The naming alone (i.e. PY_SOURCE) is a little less welcoming of this change (Hy source is definitely not Py source), but, with the rest of the custom loader overrides, it really is just naming and nothing more.

@Kodiologist
Copy link
Member

Sounds good. Try it out.

hy/importer.py Outdated
if self.file and self.file.closed:
mod_type = self.etc[2]
if mod_type == HY_SOURCE:
self.file = io.open(self.filename, 'rU', encoding='utf-8')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not so sure about removing this override; the default ImpLoader._reopen simply does open(self.filename, 'rU'), which won't necessarily use the unicode codec (I think).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, that could lead to trouble. You could try writing a test with a source file that has non-ASCII characters in it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is this already implicitly tested by all the other native tests that have non-ASCII characters?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, but most likely not in a situation where ImpLoader._reopen is called.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just re-added a custom _reopen that guarantees unicode decoding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants