Skip to content
This repository has been archived by the owner on Jul 9, 2021. It is now read-only.

[order_utils.py] Feature/schema resolver cache #1317

Merged
merged 7 commits into from
Nov 30, 2018

Conversation

PirosB3
Copy link
Contributor

@PirosB3 PirosB3 commented Nov 25, 2018

Description

This PR is a proposed solution for #1316

Testing instructions

  • Perform regression on schema validation logic

Types of changes

  • Extracts LocalRefResolver() to the module level
  • Creates an instance variable LOCAL_RESOLVER that is called by assert_valid()
  • Changes the method that the resolver uses to fetch the JSON schema: from resolve_from_url I changed it to resolve, because the latter uses a LRU cache.
  • Added signedOrderSchema to the LocalRefResolver() IDs.
  • Added some new unit tests for my work
  • Changed the unit test that runs all docstrings to use importlib causes my new test to fail sporadically. This is because the old load_module() will also reload modules that are already loaded. Aside from this, load_module() is also deprecated in Python 3.4

Checklist:

  • Prefix PR title with [WIP] if necessary.
  • Prefix PR title with bracketed package name(s) corresponding to the changed package(s). For example: [sol-cov] Fixed bug.
  • Add tests to cover changes as needed.
  • Update documentation as needed.
  • Add new entries to the relevant CHANGELOG.jsons.

@PirosB3 PirosB3 requested a review from feuGeneA as a code owner November 25, 2018 09:37
Copy link
Contributor

@feuGeneA feuGeneA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must say that my apprehension about exposing LocalRefResolver, which I described in #1316 , still stands. It looks to me like we can achieve all the performance gains under consideration without exposing LocalRefResolver. Again, though, I'm open to having my mind changed. Perhaps you could further describe your use case and why this is important to you?

Having said that, there are changes in this PR that I would love to see you land. Taking advantage of that cache in jsonschema by simply calling to resolve() instead of resolve_from_url() is a great idea, and I appreciate the suggestion on importlib.

Also, do be sure to run setup.py lint in your local copy, in order to find and fix errors like those shown in the static-tests-python test run on this PR. (You could let circleci find them for you, but fixing some will likely reveal others, in subsequent runs of other linters, so I suggest you run it locally.)

Finally, could you provide a web link to some discussion of the importlib issue you mentioned? Perhaps a GitHub Issue or Stack Overflow answer or something like that?

@feuGeneA feuGeneA changed the title [WIP][order_utils] Feature/schema resolver cache [WIP][order_utils.py] Feature/schema resolver cache Nov 26, 2018
@PirosB3
Copy link
Contributor Author

PirosB3 commented Nov 26, 2018

Hi, @feuGeneA thanks for the response.

From a performance perspective, I can see how we can still hide LocalRefResolver and create an instance of the class that can be used by one or more runs of assert_valid(). The implementation I'm thinking, unfortunately, requires a global and could potentially cause race conditions if it's used in multithread envirornment. Would you mind telling me the idea that you had?

Aside from this, I'm building an order validator (both schema and on-chain) and, as part of this change, I wanted to take advantage of new schema IDs inside the 0x repository (in this case signedOrder). I noticed that instead of re-building a similar component to yours I could re-use LocalRefResolver and simply make it compatible with more schemas, and also add improved cache so it does not read from disk.

For some reason I am not able to access CI? if I open Circle I see Error response from daemon: unauthorized: authentication required. Is this expected?

Thanks!
Dan

@feuGeneA
Copy link
Contributor

feuGeneA commented Nov 26, 2018

Okay, I myself am still getting a feel for what's "pythonic" and what isn't, and I was confusing myself. The idea I had does not make sense.

What do you think about a solution like https://stackoverflow.com/a/279597/406289 ? That is, do something like

def assert_valid(...):
    class LocalRefResolver...
    if not hasattr(assert_valid, "resolver"):
        assert_valid.resolver = LocalRefResolver()
    ...
    jsonschema.validate(data, schema, resolver=assert_valid.resolver)

I do love the encapsulation this provides. However, if this is too anti-idiomatic, then I guess I'm okay with the module-level declaration and instantiation, but if we do that then we should definitely prefix the class and instance identifiers with an underscore, to scare away anyone wanting to import them.

FYI, when I wrote zero_ex.json_schemas I only added support for the schemas that I needed to use at that time. I expected to add more as we need them, so we can most definitely add whichever ones you need, regardless of where we go with the cache/optimization.

Regarding your CI error, no, that's not expected. I can't tell what's wrong just from that message, but if you want to ping me ( @Gene ) on our community chat, I'd be happy to work with you to figure it out.

@PirosB3
Copy link
Contributor Author

PirosB3 commented Nov 27, 2018

Hi @feuGeneA thanks again for the lengthy response!

While hiding attributes within functions aren't necessarily Pythonic, it's definitely done in some frameworks like Django. I personally like the module level instance definition pattern because it's less code and has less logic.

Are we sure we don't want LocalRefResolver to be importable externally? since it's a rather small, useful, and fairly generic component, I would imagine it being useful to other developers too? Of course the trade-off we make here is that it's something that we will need to maintain.

I made most of the changes you suggested. For some reason, I'm not getting mypy to run because it's not able to discover modules inside of jsonschema. Has this ever happened to you?

Copy link
Contributor

@feuGeneA feuGeneA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making something that we need to maintain is exactly what I'm trying to avoid. 😄 I'm definitely sure that I personally don't want LocalRefResolver to be importable externally, but I'm willing to defer to convention/idiom. Please either encapsulate within assert_valid or rename LocalRefResolver to _LocalRefResolver and LOCAL_RESOLVER to _LOCAL_RESOLVER.

Regarding mypy, you need to update stubs/jsonschema/__init__.pyi to tell it the types of the base class' RefResolver.resolve() method, which we weren't using before. I'm not sure why it's complaining about ValidationError, as you didn't seem to change anything around that, but armed with my suggestion about resolve() hopefully you can figure it out.

"""Gather zero_ex.* modules and doctest them."""
"""Gather zero_ex.* modules and doctest them.

NOTE: we now use `importlib` instead of `imp` because `imp` is deprecated
Copy link
Contributor

@feuGeneA feuGeneA Nov 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate this content, but I think its too noisy for a code comment, and instead should be in a comment on the PR or in the commit message. And, I think you could just say "imp is deprecated" and leave it at that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, there never was any mention of "imp" in the code; care to clarify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course. So, walk_packages returns a generator that crawls the module path. Eventually, it will yield a ModuleInfo tuple. The first argument (what is defined as loader in your for loop) is of type importlib.abc.FileLoader. This loader inherits from Loader whose documentation specifies that load_module is outdated (the reason this happens is that it uses imp module internally).

Documentation here does suggest to use exec_module instead but I failed in getting that to work (I tried exec_module and create_module() with no success). What did work is importlib.import_module so I ended up using that. Happy to try alternatives if you prefer!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever works! I just want to make sure that there's some logical continuity between the code we had in place and what we're saying is deprecated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds great! I will remove the comment from here, and I will ensure it's documented in the PR. In that comment, instead of mentioning imp which is internal, I'll simply specify that load_module is deprecated so that it's clearer.

@@ -30,7 +31,8 @@ def resolve_from_url(self, url: str) -> str:

:param url: a string representing the URL of the JSON schema to fetch.
:returns: a string representing the deserialized JSON schema
:raises jsonschema.ValidationError: when the resource associated with `url` does not exist.
:raises jsonschema.ValidationError: when the resource associated with
`url` does not exist.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduce indent to be 4 spaces more than :raises on the line above. Anal, yes, but this would make it consistent with the rest of the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolutely! it's important to keep the style consistent. Will make the change

"/ECSignature": "ec_signature_schema.json",
"/signedOrderSchema": "signed_order_schema.json",
"/ecSignatureParameterSchema": (
"ec_signature_parameter_schema.json" + ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're in here would you mind removing this redundant string concatenation and the parenthesis and putting this all on one line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a feeling it was added there by black because it passes the max char length? I will take a look and make the change (or let you know)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm this is black

@PirosB3
Copy link
Contributor Author

PirosB3 commented Nov 27, 2018

Thanks for the advice! I will proceed as you suggested. I will go with _LOCAL_RESOLVER approach.

mypy is complaining now because I added type hints to the resolve_from_url function that was previously untyped. By default, mypy does not do static analysis on untyped functions. I will try your suggestion of updating the pyi file

@PirosB3
Copy link
Contributor Author

PirosB3 commented Nov 28, 2018

Hi @feuGeneA

I'm not sure what happened here, but I did a rebase and automagically that ended up adding a lot of people to review and it added a lot of tags. Sorry about that! I tried to revert but it looks like I'm not able to add/remove reviewers.
On the bright side, Python tests are successful now and I've addressed the changes we discussed. More specifically:

  • Refactored LocalRefResolver to _LocalRefResolver
  • Removed comment about deprecation in test_all_doctests and instead added it to the PR
  • Did other style changes.

Let me know :)

_LOCAL_RESOLVER._remote_cache.cache_info() # pylint: disable=W0212
)
assert cache_info.currsize == 4
assert cache_info.hits == 10
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these magic numbers 4 and 10? Are they coincident with our usage, or are they reflective of some implementation detail of the 3rd party jsonschemas package? If the latter, I'm wary, as a change there would break this test.

Copy link
Contributor Author

@PirosB3 PirosB3 Nov 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are totally coincident to our usage. We fetched 4 unique files over 14 resolve requests

@PirosB3
Copy link
Contributor Author

PirosB3 commented Nov 28, 2018

It looks like everything is passing except the submit-coverage job?

home/circleci/repo/node_modules/coveralls/bin/coveralls.js:18
        throw err;
        ^
Bad response: 422 {"message":"Couldn't find a repository matching this job.","error":true}

Do you know why this is happening?

@feuGeneA
Copy link
Contributor

It looks like everything is passing except the submit-coverage job?

home/circleci/repo/node_modules/coveralls/bin/coveralls.js:18
        throw err;
        ^
Bad response: 422 {"message":"Couldn't find a repository matching this job.","error":true}

Do you know why this is happening?

Probably this: https://github.com/0xProject/0x-monorepo/blob/development/CONTRIBUTING.md#fix-submit-coverage-ci-failure

Daniel Pyrathon added 2 commits November 28, 2018 16:36
…`load_module()` reloads modules even if they are already imported, causing tests to fail when run in non-deterministic ordering, so we replace it with `import_module()`
@PirosB3 PirosB3 force-pushed the feature/schema_resolver_cache branch from 6ca73e4 to 7504675 Compare November 29, 2018 00:36
@PirosB3
Copy link
Contributor Author

PirosB3 commented Nov 29, 2018

All conflicts resolved! let me know if you need anything from me @feuGeneA

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants