refactor(perf): Lazy rewrite of schemapi.py, improve docs #3547
Every call returned the same result, since it was based on the existing constant `jsonschema_version_str`.
I'm going to do this a lot. Docstrings can be collapsed in all editors and can benefit from markdown. Everything here is already private, so using long comments has no benefit
`typeshed` disagrees with `jsonschema`; this just enforces what `jsonschema` says is true.
Produces the same result, but skips the upfront `deepcopy`. The copy is no longer modified in place; new objects are created inside the iterator instead.
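To illustrate the idea of skipping the upfront `deepcopy`, here is a minimal sketch. The names `tag_eager` and `tag_lazy` are hypothetical and not taken from `schemapi.py`; it only contrasts "copy everything, then mutate" with "build a new object per item inside an iterator".

```python
from copy import deepcopy
from typing import Any, Dict, Iterable, Iterator, List

Item = Dict[str, Any]


def tag_eager(items: List[Item], tag: str) -> List[Item]:
    """Copy the whole input up front, then modify the copy in place."""
    copied = deepcopy(items)
    for item in copied:
        item["tag"] = tag
    return copied


def tag_lazy(items: Iterable[Item], tag: str) -> Iterator[Item]:
    """Yield a new dict per item; the input is never deep-copied or mutated."""
    for item in items:
        # Shallow copy only: nested values are shared with the input.
        yield {**item, "tag": tag}


original = [{"name": "a"}, {"name": "b"}]
eager_result = tag_eager(original, "x")
lazy_result = list(tag_lazy(original, "x"))
```

Both versions give equal results here and leave `original` untouched. The lazy version does only a shallow copy per item, so it is equivalent only when nested values are not mutated later.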
Previously, using a version below `4.0.1` would still always check first if there was a property. This would not change between checks. Defining in this style removes the need for as much documentation, since the version guards are very clear when each branch is used.
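A rough sketch of the version-guard style described above, under the assumption that the choice depends only on the installed version and so can be made once. `parse_version`, `make_describe` and the `label` attribute are hypothetical illustrations, not part of `jsonschema` or `schemapi.py`.

```python
from typing import Any, Callable, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "4.0.1" into (4, 0, 1). Assumes purely numeric components."""
    return tuple(int(part) for part in text.split("."))


def make_describe(installed: str) -> Callable[[Any], str]:
    """Pick one implementation up front instead of re-checking on every call."""
    if parse_version(installed) >= (4, 0, 1):

        def describe(obj: Any) -> str:
            # Newer branch: the attribute is assumed to always exist.
            return obj.label

    else:

        def describe(obj: Any) -> str:
            # Older branch: fall back when the attribute is missing.
            return getattr(obj, "label", "<unknown>")

    return describe


class WithLabel:
    label = "ok"


class WithoutLabel:
    pass


new_describe = make_describe("4.17.3")
old_describe = make_describe("3.2.0")
```

The guard runs once when the function is created, so each branch documents for itself which versions it applies to.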
First non-failing version. Have left most of the original code in. Planning to migrate & adapt the comments before removing. #
**Remove before review**. Using for a quicker feedback loop, where running mypy and all tests is not beneficial
Pulled out alternative error message idea from #3547 (comment) for later discussion. The issue this was attached to has been resolved, but I still think this is an area the

Note

Originally added to use like below.

    from functools import reduce

    def _lazy_enum(iterable: Iterable[ValidationError], /) -> Iterator[ValidationError]:
        yield reduce(_enum_inner, iterable)

Possible alternative

Rather than preserving each enum, we could just extract the union of them. It would produce a different error message than the original, so for the below:

altair/tests/utils/test_schemapi.py Lines 779 to 788 in 591bf40

We could adapt the formatting from

Which would just read as:

This would also make the formatting easier to control than relying on the groups provided by the enums alone
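As a rough illustration of "extract the union" rather than preserving each enum, a sketch like the following could work. `union_of_enums` is a hypothetical helper operating on plain lists of strings; it does not touch `ValidationError` objects and is not the `_enum_inner` function referenced above.

```python
from functools import reduce
from typing import Dict, Iterable, List, Sequence


def union_of_enums(enums: Iterable[Sequence[str]]) -> List[str]:
    """Merge several enum lists into one, keeping first-seen order."""

    def merge(acc: Dict[str, None], values: Sequence[str]) -> Dict[str, None]:
        # A dict is used as an ordered set: existing keys keep their position.
        acc.update(dict.fromkeys(values))
        return acc

    return list(reduce(merge, enums, {}))


merged = union_of_enums([["a", "b"], ["b", "c"], ["a", "d"]])
```

A single merged list like this could then be formatted once, instead of once per group.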
Note to self: Fill out comments re individual changes.
Temporary, will remove before review. Tried to isolate to a single function so that I can reproduce on main
I renamed this from `_SLOW_BENCHMARKS` but forgot to invert the bool lol
Wasn't able to demonstrate a performance improvement
altair/vegalite/v5/schema/core.py
Outdated
class VegaLiteSchema(SchemaBase):
    _schema = load_schema()
    _rootschema = load_schema()
Linking this back to:
Has the same result for `Root`, since it inherits `_schema`.
The main benefit is a stronger guarantee that every derived class is non-None for both attributes
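A minimal sketch of that guarantee, assuming a simplified hierarchy. `Base`, `Versioned`, `Leaf`, and this `load_schema` are hypothetical stand-ins, not the real Altair classes or loader.

```python
from typing import Any, ClassVar, Dict, Optional


def load_schema() -> Dict[str, Any]:
    """Hypothetical stand-in for the real schema loader."""
    return {"definitions": {}}


class Base:
    # Both attributes may be None at the very top of the hierarchy.
    _schema: ClassVar[Optional[Dict[str, Any]]] = None
    _rootschema: ClassVar[Optional[Dict[str, Any]]] = None


class Versioned(Base):
    # Setting both here means every subclass inherits non-None values.
    _schema = load_schema()
    _rootschema = load_schema()


class Root(Versioned):
    # Overrides nothing, so it simply inherits both attributes.
    pass


class Leaf(Versioned):
    # Overrides only `_schema`; `_rootschema` is still inherited.
    _schema = {"type": "string"}
```

With this layout, any class deriving from `Versioned` can rely on both attributes being set, whether or not it overrides `_schema`.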
Edit

Bug report unrelated to PR

@MarcoGorelli do you have any ideas on what is causing this error for https://github.com/vega/altair/actions/runs/10681007191/job/29608118357?pr=3547

I did a merge update from the GitHub UI, but haven't been able to track down the cause yet.

Updated

Seems to be this commit which causes the issue (only for

    ..\..\..\AppData\Local\hatch\env\virtual\altair\CXM7NV9I\hatch-test.py3.8\lib\site-packages\narwhals\dependencies.py:128: in is_ibis_table
        return bool((ibis := get_ibis()) is not None and isinstance(df, ibis.Table))
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    name = 'Table'

        def __getattr__(name: str) -> BaseBackend:
            """Load backends in a lazy way with `ibis.<backend-name>`.

            This also registers the backend options.

            Examples
            --------
            >>> import ibis
            >>> con = ibis.sqlite.connect(...)

            When accessing the `sqlite` attribute of the `ibis` module, this function
            is called, and a backend with the `sqlite` name is tried to load from
            the `ibis.backends` entrypoints. If successful, the `ibis.sqlite`
            attribute is "cached", so this function is only called the first time.
            """
            entry_points = {ep for ep in util.backend_entry_points() if ep.name == name}

            if not entry_points:
                msg = f"module 'ibis' has no attribute '{name}'. "
                if name in _KNOWN_BACKENDS:
                    msg += f"""If you are trying to access the '{name}' backend,
        try installing it first with `pip install ibis-{name}`"""
    >           raise AttributeError(msg)
    E           AttributeError: module 'ibis' has no attribute 'Table'.

    ..\..\..\AppData\Local\hatch\env\virtual\altair\CXM7NV9I\hatch-test.py3.8\lib\site-packages\ibis\__init__.py:58: AttributeError
Used in `Chart.from_dict`
__all__ = [
    "Optional",  # altair.utils
    "SchemaBase",  # altair.vegalite.v5.schema.core
    "Undefined",  # altair.utils
    "UndefinedType",  # altair.vegalite.v5.schema.core -> (side-effect relied on to propagate to alt.__init__)
    "_is_valid",  # altair.vegalite.v5.api
    "_resolve_references",  # tools.schemapi.utils -> tools.generate_schema_wrapper
    "_subclasses",  # altair.vegalite.v5.schema.core
    "is_undefined",  # altair.typing
    "validate_jsonschema",  # altair.utils.display
    "with_property_setters",  # altair.vegalite.v5.schema.channels
]
Keeping track of the refs here, since you can't follow the references from the tools copy
All were notes added earlier in PR, but not needed now
Will put this on another branch. It won't improve performance, so it isn't relevant here vega#3547 (comment)
Feeling I've squeezed out all the performance I can for now. Will add in a collapsed comment on the PR for reference
Quite the PR!! Will take me a while to dig through it but thanks for going down all these rabbit holes, hope you came out unharmed! ;) I like jsonschema validation but it also can be a wild place, especially with the mapping of the schema to Altair objects... Do I understand this correctly that the main motivation for this PR is to improve the performance and that the main impact on performance is that
Thanks and yeah no rush @binste - I know there is a lot to go through - even after all the stuff I removed over the past few days.
So it is tricky. I found it quite difficult to benchmark this, since there are so many interconnected parts. We also have much better performance on

I would like to - in a future PR - develop some more isolated benchmarking tools.
I very much appreciate the time and effort you put into this and I believe some/many/maybe all changes here could be great improvements to Altair. However, as it's such a large and complex PR, I'm struggling to review this PR to an extent where I'd feel comfortable approving it. Although improvements to the performance of

Would you be willing to split this PR into multiple ones? They could be put up for review and merged one after the other with this PR (in draft mode) being the resource to cherry-pick commits from. I think this would help with reviewing as well as with judging the tradeoff between added complexity of some of the changes vs. improved performance.
I hope this is ok for you and that you understand my position, i.e. that it does not feel random. We can also do more pre-alignment on rough drafts before starting to work on detailed PRs. Also something we can pick up in our upcoming call.
c9f7613 to 0c829fb
@binste Yeah that sounds like a good idea. I'll loop back to this at some point and see what smaller PRs I can make out of it. Appreciate your feedback on this, regardless of the outcome
Related #3545 (comment)
Description
I went down a few too many rabbit holes in this PR.
This comment has an overview of most of the changes.
Misc
Removed this in 8002dab (#3547) but if anyone was curious:
Related
Benchmarking code
altair.tests.utils.test_schemapi.py