refactor: Remove `toolz` dependency #3426

dangotbanned · 2024-05-26T17:32:55Z

This PR is a proof-of-concept, entirely* removing the hard dependency on toolz.
*Excluding usage in tests and twice for a deprecation warning.

I've tried to provide reasoning on each commit, but generally, it seemed to me that the behaviour altair needs is easily replicated within stdlib.

Additional Notes

pipe and curry were the only imports
- pipe is trivial to replace.
- curry seems overkill when the only non-default argument is always data
  - Also has the drawback of less type safety
While this is documented under Advanced Usage, I haven't come across any more complex examples.
- I'd imagine users could see a benefit to performance, given the complexity of curry.
- If anyone has some sample code that heavily uses data transformers, it could be helpful to benchmark any changes.

If there was any desire to pursue this further, in b9dc070 I mentioned that this could be implemented as a decorator.
I stopped short of that for now, but would be happy to go down that route to simplify the @overload usage.

Edit

@mattijn is this worth opening an issue for discussion, or can that take place here?

AFAIK the last remaining quirks would be:

Handling the existing deprecation functions in bf99d72.
- I would assume 4 years is enough for a scheduled removal.
- Otherwise these could use proper optional imports (rather than the nesting solution I've tentatively used).
Updating pyproject.toml.
- Not entirely sure on the interaction with existing dependency groups.
- Just want to highlight it as an extra item to-do.

- Made `SupportsGeoInterface` usable for runtime checking - Simplifies type narrowing logic and aligns with `DataFrameLike` - Adds `is_data_type` - Used to reduce code duplication, specifically where `pipe` was previously referenced. - Avoids the `None` case introduced by `DataTransformerType | None`.

Currying a single argument function is equivalent to referencing the function itself. This only would've added performance overhead.

NOTE: Unsure why this differs from `vegalite.v5.api._prepare_data`. Have only removed the dependency.

…sGeoInterface` Now possible due to `@runtime_checkable`

Added missing `sys` import from earlier commit

This pattern is what I've used in most cases. It works and type checks correctly, but could be improved as a decorator. Would have the benefit of preserving the return type, if one were to inspect the intermediate function. Also replacing the `TypeVar` with a `Union` seems to have satisfied `mypy`.

Same pattern as `limit_rows`. Also aliased the return type, aligning with the other transformers.

- Not super happy with the added length, but was difficult to avoid given the number of arguments. - Moved all the generic parts to new function `_to_text`. - Replaced `os.path` with `pathlib.Path`. Can be identified (but not autofixed) with [PTH](https://docs.astral.sh/ruff/rules/#flake8-use-pathlib-pth). - Although this might be better handled by [urllib](https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urljoin)

The nested imports are purely for a deprecation warning introduced in [4.10.0](vega@af4148d). 4 years on and over a major version later, would these be due for removal?

- Both `@curried.curry` and `curried.pipe` no longer needed - Originally rewrote a generic curried pipe, but given that the two functions are explicitly named here - it seemed excessive.

- Required a new alias `NonLikeDataType` to resolve the overloads. - Merged the two default branches - `max_rows` should probably be removed since it is never used

The original describes a function before currying, but not the defered variant.

Switching from `curry` -> `functools.partial` is trivial. Everything else I'm not too sure on. Had to battle `mypy` to arrive at this and not really happy with the result. The change to `plugin_type`'s default should mean the assert in `register` is somewhat more useful.

This is valid for the `assert`. An alternative would be using `callable` from builtins, or reusing `PluginType`'s bounds/constraints.

mattijn · 2024-06-06T15:14:51Z

Hi @dangotbanned, this looks good! Thanks for working on this. Reducing the hard dependencies where possible is much welcome.
If types and tests are happy than we are very far in the progress already.
Could you explain why this PR requires the introduction of a new ._to_text() function, see here: https://github.com/vega/altair/pull/3426/files#diff-478a9046db15a358587cf77454cd2cf9cc4723435276cad0864c268d17e07650R295. It seems to be introduced during the development process?

dangotbanned · 2024-06-06T16:22:59Z

Hi @dangotbanned, this looks good! Thanks for working on this. Reducing the hard dependencies where possible is much welcome.

Thanks @mattijn!

Could you explain why this PR requires the introduction of a new ._to_text() function, see here: #3426 (files). It seems to be introduced during the development process?

Sure thing.

So, previously to_csv, to_json had almost identical code and required 2x TypedDict's per function:

to_csv
- _CsvFormatDict
- _ToCsvReturnUrlDict
to_json
- _JsonFormatDict
- _ToJsonReturnUrlDict

I split out the generic behaviour to _to_text and simplified the types (4 -> 2) to reduce code duplication.

This helped with managing keyword-only args for line 251/line 287 which would have previously been curried.

The addition of _to_text isn't required, but as a comparison, below is the additional code without it (not including the overloads):

def to_json(
    data: Optional[DataType] = None,
    prefix: str = "altair-data",
    extension: str = "json",
    filename: str = "{prefix}-{hash}.{extension}",
    urlpath: str = "",
) -> Union[partial, _ToFormatReturnUrlDict]:
    """
    Write the data model to a .json file and return a url based data model.
    """
    if data is None:
        return partial(
            to_json, # diff 1
            prefix=prefix,
            extension=extension,
            filename=filename,
            urlpath=urlpath,
        )
    else:
        data_str = _data_to_json_string(data) # diff 2
        data_hash = _compute_data_hash(data_str)
        filename = filename.format(prefix=prefix, hash=data_hash, extension=extension)
        Path(filename).write_text(data_str)
        url = str(Path(urlpath, filename))
        return _ToFormatReturnUrlDict({"url": url, "format": _FormatDict(type="json")}) # diff 3

def to_csv(
    data: Optional[NonGeoDataType] = None,
    prefix: str = "altair-data",
    extension: str = "csv",
    filename: str = "{prefix}-{hash}.{extension}",
    urlpath: str = "",
) -> Union[partial, _ToFormatReturnUrlDict]:
    """Write the data model to a .csv file and return a url based data model."""
    if data is None:
        return partial(
            to_csv,
            prefix=prefix,
            extension=extension,
            filename=filename,
            urlpath=urlpath,
        )
    else:
        data_str = _data_to_csv_string(data)
        data_hash = _compute_data_hash(data_str)
        filename = filename.format(prefix=prefix, hash=data_hash, extension=extension)
        Path(filename).write_text(data_str)
        url = str(Path(urlpath, filename))
        return _ToFormatReturnUrlDict({"url": url, "format": _FormatDict(type="csv")})

I am happy to revert back to this, but thought it might be more maintainable in the future - while still preserving the public interface.

This preserves the existing `AltairDeprecationWarning` but removes the final traces of `toolz` dependency in `altair`.

Forgot I replaced a literal string with regex

mattijn · 2024-06-06T20:38:38Z

Thanks for the explanation! That makes sense, leave it in please.

I'm in favor of removing the functions that have been officially deprecated for such a long time.

dangotbanned · 2024-06-06T20:41:03Z

@mattijn No rush on the review, just wanted to update that all traces of toolz are now gone 😅

mattijn · 2024-06-12T12:30:30Z

Hi @dangotbanned. Since you have multiple PRs open, do you have a certain order in mind how we should review and merge these?

dangotbanned · 2024-06-12T14:23:55Z

Hi @dangotbanned. Since you have multiple PRs open, do you have a certain order in mind how we should review and merge these?

#3426

Hi @mattijn this one is ready to go.

I'm in favor of removing the functions that have been officially deprecated for such a long time.

I'm not sure if you wanted me to remove pipe, curry, the new optional import and tests - or if the removal needs any more input from other maintainers.

Personally, I agree and think they should be removed.

Other PRs

#3431

I'm pretty happy with where it's at now, but the changes I've made in the tools package relating to Optional, Parameter, SchemaBase likely need further review.
They were somewhat "hacky" so as to not alter the surrounding code, and someone more familiar with the code generation may be able to improve upon it.
This one (I think) has addressed all the points from @binste original review

#3427

I've left this on draft, not ready yet, but would be good to hear your thoughts on a couple of things

alt.when signature
- To preserve compatibility with alt.condition, the first predicate arg behaves identically - but *more_predicates is more restrictive in the types allowed - as they all must implement &.
- I've left this to bake before writing docs, as I'm not sure whether the trade-off in added complexity (for the user) is worth preserving identical semantics as alt.condition for a single argument.
The new parts added for alt.expr comment which are "an" implementation, but not sure it would be the right way to go.

dangotbanned · 2024-06-14T09:11:28Z

@mattijn

I would be I favor of removing pipe and curry as well and instead of a deprecation warning use a deprecation error.

@joelostblom

I am also in favor of removing pipe and curry if we can replicate the functionality with stlib.

Thank you both for reviewing. All of your comments should be resolved now.

I have added comments suggesting the parts to remove. This excludes the imports, which would be highlighted by any IDE anyway afterwards.

Should be ready to merge after 650ee0e finishes

vega#3426 (comment)

mattijn · 2024-06-18T10:14:39Z

I think we need something else if we want a NonGeoDataType type, since the following supports the SupportsGeoInterface and also is postive as a NonGeoDataType being:

from altair.utils.data import NonGeoDataType

from shapely import geometry
import geopandas as gpd

data_geoms = [
    {"color": "#F3C14F", "geometry": geometry.Polygon([[1.45, 3.75], [1.45, 0], [0, 0], [1.45, 3.75]])},
    {"color": "#4098D7", "geometry": geometry.Polygon([[1.45, 0], [1.45, 3.75], [2.57, 3.75], [2.57, 0], [2.33, 0], [1.45, 0]])},
    {"color": "#66B4E2", "geometry": geometry.Polygon([[2.33, 0], [2.33, 2.5], [3.47, 2.5], [3.47, 0], [3.2, 0], [2.57, 0], [2.33, 0]])},
    {"color": "#A9CDE0", "geometry": geometry.Polygon([[3.2, 0], [3.2, 1.25], [4.32, 1.25], [4.32, 0], [3.47, 0], [3.2, 0]])},
]

gdf_geoms = gpd.GeoDataFrame(data_geoms)
isinstance(gdf_geoms, SupportsGeoInterface), isinstance(gdf_geoms, NonGeoDataType)

(True, True)

Example data taken from docs here: https://altair-viz.github.io/user_guide/data.html#spatial-data-gdf

dangotbanned · 2024-06-18T10:45:31Z

I think we need something else if we want a NonGeoDataType type, since the following supports the SupportsGeoInterface and also is postive as a NonGeoDataType being:

@mattijn I think this would be unavoidable given that GeoDataFrame is a subclass of pd.DataFrame and python's typing doesn't support discriminated unions.

So it would pass NonGeoDataType = Union[dict, pd.DataFrame, DataFrameLike]

DataFrameLike as it has .__dataframe__
pd.DataFrame as it is a subclass

I'm only finding this out now btw, in 8a9d593 I just attached a name to an existing type annotation.

For the actual usage in to_csv, SupportsGeoInterface is rejected in https://github.com/dangotbanned/altair/blob/ed13af6d4598c96d1f5e5a816b95a85ff82f81f3/altair/utils/data.py#L376.

So the impact of this would be static type checking doesn't pick up the incompatibility - but at runtime the existing code does.

…/altair into remove-toolz-depend

mattijn · 2024-06-18T16:02:09Z

Thanks for responding! Let me think about it too. Just wondering, do we really need this NonGeoDataType? Maybe this constrain within to_csv is too conservative?

Spotted while testing a solution for [comment](vega#3426 (comment)). This isn't a solution for that, but ensures the overloads solve correctly.

dangotbanned · 2024-06-18T18:02:27Z

Just wondering, do we really need this NonGeoDataType?

I've no strong feelings either way @mattijn.

The only thing I'd add to #3426 (comment) is that having an alias here at least hints the user towards something that we can't express as a type - if that makes sense?

I wouldn't expect passing a GeoDataFrame would work as a parameter labelled NonGeo....

Looking back to the original signature, there is no indication that to_csv doesn't accept SupportsGeoInterface:

altair/altair/utils/data.py

Lines 198 to 205 in 596b282

    
           def to_csv( 
        
               data: Union[dict, pd.DataFrame, DataFrameLike], 
        
               prefix: str = "altair-data", 
        
               extension: str = "csv", 
        
               filename: str = "{prefix}-{hash}.{extension}", 
        
               urlpath: str = "", 
        
           ) -> _ToCsvReturnUrlDict: 
        
               """Write the data model to a .csv file and return a url based data model."""

mattijn · 2024-06-19T04:08:25Z

I've opened #3441. Maybe we can rephrase the NotImplementedError with a link tot that github issue. Then we don't need the newly introduced NonGeoDataType type in this PR. Thanks for still working on this @dangotbanned, almost there!

…ta_to_csv_string` See vega#3426 (comment) See vega#3441

altair/utils/_vegafusion_data.py

…_data_transformer` Fixes: vega#3426 (comment)

mattijn · 2024-06-19T08:54:14Z

Tests are happy too after synching with main. Merging! Thanks @dangotbanned!

@binste

…ormance (#3431) * ci: Add additional `ruff` rules to `pyproject.toml` Only highly autofixable groups from [#3430](#3430). Aiming to refine further, depending on how much manual fixing this generates. * fix: add item to `pyproject.toml` to silence import errors for `pylance|pyright` * ci: add subset of `SIM` to `extend-safe-fixes` * ci: add unfixable `RUF` rules to `ignore` [RUF002](https://docs.astral.sh/ruff/rules/ambiguous-unicode-character-docstring/) may be avoidable during schema generation, so I'm not doing any manual fixes right now. * refactor: apply new `ruff` rules, fix and reformat * test: Skip tests on Win that require a tz database See apache/arrow#36996 * ci: enable `tool.ruff.lint.preview` Testing the effect this has with existing rules, prior to adding `pylint`, `refurb` * fix: replace [F841](https://docs.astral.sh/ruff/rules/unused-variable/) violations with dummy variable * ci: add additional `preview` fixes to `extend-safe-fixes` * refactor: apply `preview` fixes for existing `ruff` rules, reformat * ci: add `preview` category `FURB` rules Almost all have autofixes, splitting out from [pylint](https://docs.astral.sh/ruff/rules/#pylint-pl) which is much larger in scope + fewer fixes * refactor: apply `FURB` rule fixes, manually fix `FURB101/3` * fix: Revert newer fstring syntax, not available to `sphinx` * ci: add fixable `pylint` rules Across the 4 `PL.` categories, there are over 50 rules, with a mix of fixable, preview. Tried to be picky with what is added, because adding even 1 of the groups would generate a lot of manual fixes. * ci: add `pylint` fixes to `extend-safe-fixes` * refactor: apply `pylint` rule fixes, add an inline optimization for `_selection` `param_kwds` in `_selection` triggered `PLR6201`. Instead rewrote as a dictcomp. * fix: Recover comments lost during linting #3431 (comment) * fix: Replace sources of `RUF002` violations * fix: manual fix `RUF002` in `api` * ci: add `PTH` rules `flake8-use-pathlib` rules all require manual fixes. > Found 70 errors. <details> <summary>Details</summary> ```md 20 PTH118 `os.path.join()` should be replaced by `Path` with `/` operator 12 PTH120 `os.path.dirname()` should be replaced by `Path.parent` 11 PTH100 `os.path.abspath()` should be replaced by `Path.resolve()` 11 PTH123 `open()` should be replaced by `Path.open()` 7 PTH110 `os.path.exists()` should be replaced by `Path.exists()` 6 PTH107 `os.remove()` should be replaced by `Path.unlink()` 3 PTH103 `os.makedirs()` should be replaced by `Path.mkdir(parents=True)` ``` * refactor: Manually fix `PTH` rule violations * fix: Use safer `inspect.getattr_static` in `update_init_file` Avoids [triggering](https://docs.python.org/3/library/inspect.html#inspect.getattr_static) on objects like `datum`, `expr` * fix: Use correct f-string `!s` modifier * fix: Resolve `doc:build-html` error ```log Extension error (sphinxext.altairgallery): Handler <function main at 0x7f0873471ab0> for event 'builder-inited' threw an exception (exception: unsupported operand type(s) for +: 'PosixPath' and 'str') ``` * fix: Resolve utf-8 error for [emoji example](https://altair-viz.github.io/gallery/isotype_emoji.html) * Update sphinxext/altairgallery.py Co-authored-by: Stefan Binder <[email protected]> * build: bump `docbuild.yml` python `3.10` -> `3.12` See @binste [comment](#3431 (comment)) * style: Simplify `PluginRegistry` repr #3431 (comment) * revert: ignore `suppressible-exception` rule in `pyproject.toml` #3431 (comment) * refactor: remove need for exception handling in `utils._vegafusion_data.get_inline_tables` I'm happy to revert back to the original `try-catch`, but think it was less both less efficient and understandable. If I've understood correctly: - Every time this was called, all tables would be iterated over - On the first call, tables processed by the `vegafusion_data_transformer` would popped and returned - On all calls, other tables would be ignored but always trigger exceptions - On any call other than the first, every table triggers an exception This behaviour is easily reduced using [set methods](https://docs.python.org/3/library/stdtypes.html#set) directly, as an empty `set` would produce an empty `dict` during the `dictcomp`. #3431 (comment) * refactor: remove `python<3.3` compat code in `utils.core.use_signature` Was replaced with `contextlib.suppress` in a3fd77a but targets long-unsupported python versions. For future reference, there are also related `stdlib` functions that may have not been available when this was written: - [inspect.getdoc](https://docs.python.org/3/library/inspect.html#inspect.getdoc) - [functools.wraps](https://docs.python.org/3/library/functools.html#functools.wraps) Wouldn't cover everything there, but may replace the need for some parts. #3431 (comment) * revert: replace `PLW1514` fixes with "utf-8" The rule remains in `pyproject.toml`, but is no longer autofixed with undesirable `locale.getprefferedencoding(False)`. #3431 (comment) * refactor: minor simplify and sort `__all__` from `generate_schema_wrapper` * docs: `FA100` on most of `tools/`*` The start of a cautious introduction of `from __future__ import annotations`. Excludes `schemaapi.schemaapi.py` for now, but I think that would be safe as well. * docs: `FA100` on `sphinxext/*` * docs: `FA100` on `alt.jupyter` * docs: `FA100` on `alt.expr` * docs: `FA100` on `alt.vegalite`, excluding `v5.api`, `v5.schema/*` - Covers non-autogenerated code. - `api` is being special-cased for now. - Will be a great candidate for these fixes, but there are many simpler modules in `alt.utils` that I'm priotizing in case they highlight issues * docs: `FA100` on private `alt.utils` modules * docs(typing): Add annotations for `sphinxext` - Making use of `FA100` enabled syntax - Eventually would like to improve the performance of this, since `sphinxext` is by far the longest build step - Towards that end, the annotations give a contextual starting point * docs: `FA100` on public `alt.utils` modules, excluding `schemapi` - Also did made some improvements to annotations more broadly, now that the newer syntax is allowed * ci: update `pyproject.toml`, add new granular `tool.ruff.lint.per-file-ignores` setting - So far been making the changes one file at a time, followed by tests - Not encountered any runtime issues - `mypy` has complained occasionally, but has been resolvable * revert: Change `write_file_or_filename` default `encoding` to `None` * docs: `FA100` on `tools.schemapi.schemapi` Also adds some annotations that are now possible without circular import concerns * feat(typing): add `from __future__ import annotations` for each file in `generate_schema_wrapper` Also: - ensure `annotations` excluded from `__all__` - minor standardizing import order - adjust `ruff` rules to correct schemagen * ci(typing): adds `generate-schema-wrapper` script to `hatch` Rewrites and reformats all annotations in `altair/vegalite/v?/schema` - Meaning that the original code stays the same: ``` >>> ruff check altair/vegalite/v?/schema --fix --exit-zero Found 13006 errors (13006 fixed, 0 remaining). >>> ruff format altair/vegalite/v?/schema 4 files reformatted >>> ruff check altair/vegalite/v?/schema --statistics ``` * refactor(typing): Improve annotations and narrowing in `schemapi` - Changes are mainly on `_FromDict.from_dict` - Avoids type errors due to multiple reassignments - Adds a range of overloads that preserve the `SchemaBase` subclass - Not possible in all cases - replacing `cls` with `tp` - when used in an instance method - The former is quite strange to see outside of a `@classmethod` - Especially in a class that has a `@classmethod` - updated `_default_wrapper_classes` to match `_subclasses` return type in `generate_schema_wrapper` * build: run `generate-schema-wrapper` The cumulative effect of `FA100`, `UP`, `TCH` rules applied to `v5.schema`. All tests pass! * revert(typing): Roll back runtime evaluated types not possible on older python `from __future__ import annotations` applies in annotation scope, but not in aliases. Until `altair` requires `python>=3.9-10`, some syntax will need to differ - like many of the changes seen in this commit. # Note The current `ruff` config will only upgrade syntax in annotation scope, until `tool.ruff.target-version` "unlocks" new features. These were changes I made manually, not having tested against all versions - which are being reverted. * fix(typing): Adds overloads and reduce ignore comments relating to `spec_to_mimebundle` The repetition in `utils.save` is unfortunate, but is required to satisfy mypy. The issues are from overloading `spec_to_mimebundle` with `tuple/dict` as each has different indexing constraints. A longer term solution would be to adding an intermediate function to return the nested value. Due to `spec_to_mimebundle`'s use elsewhere operating directly on the full output. * revert(typing): Roll back additional runtime evaluated types for `python=3.8` * fix(typing): Use `...` for `DataTransformerType` Original `pass` keyword was equivalent to a `None` return * docs(typing): `FA100` on `alt.vegalite.v5.api` and manual annotation rewrites ### `api` A unique, but significant, difference to the other commits is the large number of module prefixed annotations that can be shortened. `TopLevelMixin.project` shows the greatest reduction in combined characters for annotations, but there are small improvements throughout. I've made all of these `TYPE_CHECKING`-only imports explicit, to highlight a new path forward this opens up. There are many repeat/similar `Union` types, which can now be declared in another module - without concern for circular import issues. ### `v5.__init__` These changes have also removed the need for both the import of `expr` in `api`, and the `# type: ignore[no-redef]` in `v5.__init__`. * test: add pytest rules `PT` and fix `PT001` violations https://docs.astral.sh/ruff/rules/pytest-fixture-incorrect-parentheses-style/ * test: add ignores for `PT011` on existing tests Probably useful to enforce on new code, https://docs.astral.sh/ruff/rules/pytest-raises-too-broad/ * test: fix `PT018` violation https://docs.astral.sh/ruff/rules/pytest-composite-assertion/ * test: fix `PT006` violations https://docs.astral.sh/ruff/rules/pytest-parametrize-names-wrong-type/ * test: add ignore for `PT012` violation Not sure how this could be rewritten. https://docs.astral.sh/ruff/rules/pytest-raises-with-multiple-statements/ * test: add `pytest.mark.xfail` for flaky `scatter_with_layered_histogram` test Struggling to find the source of the failure, the `mark` is my best guess but currently can't reproduce the following error: ``` ___________________________________________ test_compound_chart_examples[False-scatter_with_layered_histogram.py-all_rows18-all_cols18] ___________________________________________ [gw2] win32 -- Python 3.8.19 C:\Users\*\AppData\Local\hatch\env\virtual\altair\CXM7NV9I\hatch-test.py3.8\Scripts\python.exe filename = 'scatter_with_layered_histogram.py', all_rows = [2, 17], all_cols = [['gender'], ['__count']], to_reconstruct = False @pytest.mark.skipif(vf is None, reason="vegafusion not installed") # fmt: off @pytest.mark.parametrize("filename,all_rows,all_cols", [ ("errorbars_with_std.py", [10, 10], [["upper_yield"], ["extent_yield"]]), ("candlestick_chart.py", [44, 44], [["low"], ["close"]]), ("co2_concentration.py", [713, 7, 7], [["first_date"], ["scaled_date"], ["end"]]), ("falkensee.py", [2, 38, 38], [["event"], ["population"], ["population"]]), ("heat_lane.py", [10, 10], [["bin_count_start"], ["y2"]]), ("histogram_responsive.py", [20, 20], [["__count"], ["__count"]]), ("histogram_with_a_global_mean_overlay.py", [9, 1], [["__count"], ["mean_IMDB_Rating"]]), ("horizon_graph.py", [20, 20], [["x"], ["ny"]]), ("interactive_cross_highlight.py", [64, 64, 13], [["__count"], ["__count"], ["Major_Genre"]]), ("interval_selection.py", [123, 123], [["price_start"], ["date"]]), ("layered_chart_with_dual_axis.py", [12, 12], [["month_date"], ["average_precipitation"]]), ("layered_heatmap_text.py", [9, 9], [["Cylinders"], ["mean_horsepower"]]), ("multiline_highlight.py", [560, 560], [["price"], ["date"]]), ("multiline_tooltip.py", [300, 300, 300, 0, 300], [["x"], ["y"], ["y"], ["x"], ["x"]]), ("pie_chart_with_labels.py", [6, 6], [["category"], ["value"]]), ("radial_chart.py", [6, 6], [["values"], ["values_start"]]), ("scatter_linked_table.py", [392, 14, 14, 14], [["Year"], ["Year"], ["Year"], ["Year"]]), ("scatter_marginal_hist.py", [34, 150, 27], [["__count"], ["species"], ["__count"]]), ("scatter_with_layered_histogram.py", [2, 17], [["gender"], ["__count"]]), ("scatter_with_minimap.py", [1461, 1461], [["date"], ["date"]]), ("scatter_with_rolling_mean.py", [1461, 1461], [["date"], ["rolling_mean"]]), ("seattle_weather_interactive.py", [1461, 5], [["date"], ["__count"]]), ("select_detail.py", [20, 1000], [["id"], ["x"]]), ("simple_scatter_with_errorbars.py", [5, 5], [["x"], ["upper_ymin"]]), ("stacked_bar_chart_with_text.py", [60, 60], [["site"], ["site"]]), ("us_employment.py", [120, 1, 2], [["month"], ["president"], ["president"]]), ("us_population_pyramid_over_time.py", [19, 38, 19], [["gender"], ["year"], ["gender"]]), ]) # fmt: on @pytest.mark.parametrize("to_reconstruct", [True, False]) def test_compound_chart_examples(filename, all_rows, all_cols, to_reconstruct): source = pkgutil.get_data(examples_methods_syntax.__name__, filename) chart = eval_block(source) if to_reconstruct: # When reconstructing a Chart, Altair uses different classes # then what might have been originally used. See # vega/vegafusion#354 for more info. chart = alt.Chart.from_dict(chart.to_dict()) dfs = chart.transformed_data() if not to_reconstruct: # Only run assert statements if the chart is not reconstructed. Reason # is that for some charts, the original chart contained duplicated datasets # which disappear when reconstructing the chart. assert len(dfs) == len(all_rows) for df, rows, cols in zip(dfs, all_rows, all_cols): > assert len(df) == rows E assert 19 == 17 E + where 19 = len( bin_step_5_age bin_step_5_age_end gender __count\n0 45.0 50.0 M 247\n1 ... 11\n17 30.0 35.0 F 5\n18 70.0 75.0 F 1) tests\test_transformed_data.py:132: AssertionError ``` * ci: tidy up `ruff` section of `pyproject.toml` - removed `SIM` rules from `extend-safe-fixes`, only 1 required it and manual fixes should be fine for new code - Provided links for new config groups/settings #3431 (comment) * perf: adds config `pyproject.toml` for faster build, test runs - `pip` -> `uv` - `pytest` single-core -> logical max https://hatch.pypa.io/latest/how-to/environment/select-installer/#enabling-uv https://pytest-xdist.readthedocs.io/en/latest/distribution.html#running-tests-across-multiple-cpus Should speed up CI, although GitHub Actions only runs on 4 cores iirc * ci: adds `update-init-file` hatch script * ci: adds newer-style `hatch` test config I've attempted to recreate the existing `test` scripts, but haven't quite worked out the `coverage` parts. This provides: matrix python version testing, parallel tests (within a python version), and slightly shorter commands: ```bash >>> hatch run test >>> hatch test >>> hatch run test-coverage >>> hatch test --cover ``` * feat(typing): Use `TYPE_CHECKING` block for `schema` modules - Removes the need for `_Parameter` protocol - Reduce lines of code by more than 5000, by removing prefix for `SchemaBase`, `Parameter` * test: use a more reliable `xfail` condition for flaky test The difficulty in reproducing came from the python debugger disabling `xdist`. * build: Embed extra `ruff` calls in `generate-schema-wrapper` into the python script Hopefully solves: https://github.com/vega/altair/actions/runs/9456437889/job/26048327764?pr=3431 * build: Manually rewrite certain exceptions with flaky autofix These were previously fixed by `EM` rules, but seem to not always occur - leading to CI fail https://github.com/vega/altair/actions/runs/9466550650/job/26078482982#step:8:60 * refactor: remove now-unneeded `PARAMETER_PROTOCOL` * refactor: replace existing references to `typing.Optional` * refactor(typing): rename `T` TypeVar to `TSchemaBase` * feat(typing): Adds dedicated `Optional` alias for `Union[..., UndefinedType]` `typing.Optional` is deprecated and is no longer needed in `altair`. The new alias improves the readability of generated signatures, by reducing noise (repeating `UndefinedType`) and clearly grouping together the options for an argument. * refactor(typing): Remove `UndefinedType` dependency in `api` * refactor(typing): Remove `UndefinedType` dependency in `api` * refactor(typing): Remove annotation scope `UndefinedType` dependency in `api` Aligns `api` with `schema` changes in fbc02e2 * ci: add `W291` to ensure trailing whitespace is autofixed * refactor: define non-relevant attributes closer to imports in `update_init_file` Will make it easier to keep the two in sync, and now they can be copy/pasted - rather than adding an `or attr is _` expression * feat(typing): Adds `_TypeAliasTracer` and reorders `generate_schema_wrapper` to collect unique `Literal` types The next commit will show the changes from build * build: run `generate-schema-wrapper` using `_TypeAliasTracer` This commit reduces the length of each module in `schema` by ~50-70% and provides the each alias in the new `_typing` module. ## Existing art [`pandas`](https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py#L199) and [`polars`](https://github.com/pola-rs/polars/blob/faf0f061eead985a8b29e4584b619fb5d8647e31/py-polars/polars/dataframe/frame.py#L105) similarly define a large number of `TypeAlias`, `Union`, `Literal` types in a single file. These definitions can be imported anywhere in the project - without concern for circular dependencies. It helps keep signatures consistent, and in many cases, simplifies those signatures such that the user can quickly understand what they represent. ## Future config The generated names are currently: ```py f"{SchemaInfo.title}_T" ``` so as to not conflict with the names defined in `core`/`channels`. I have left this configurable via: ```py _TypeAliasTracer(fmt=...) ``` Personally don't mind what the aliases are called, so if anyone has any ideas please feel free to change. * fix: Add missing `LiteralString` import * test: add `pytest.mark.filterwarnings` for tests that cannot avoid them - `pandas` warnings cannot trivially be avoided, as they need to be compatible with `pandas~=0.25` and are used in examples. - I have been unable to track down what specific calls are triggering the `traitlets` warnings. There seems to be multiple levels of indirection, but the stacklevel reported is external to `altair`. * refactor(typing): Replace `Literal[None]` -> `None` Came across [Literal[None] conversion](https://typing.readthedocs.io/en/latest/spec/literal.html#legal-parameters-for-literal-at-type-check-time) during #3426 (comment) * test: Change `skipif` -> `xfail` for `test_sanitize_pyarrow_table_columns` on `win32` Seems to be a better fit, considering this would fail for `win32` devs not using `pyarrow` but have it installed. This could be removed in the future if installing/confirming install `tzdata` to the correct location was handled during environment activation. Possibly resolves: #3431 (comment) * ci: bump `actions/setup-python`, `python-version`, use `uv` in `lint.yml` I've made quite a few improvements to performance locally like fd20f00, but only `docbuild.yml` uses `hatch` so they haven't reduced CI run time. This is more of a test run on the simplest workflow to identify any issues. * ci: create venv before uv pip install * ci: ensure venv is activated after install * ci: use environment provided by `hatch` * ci: testing `pytest-xdist` in `build` workflow Would be faster if using `uv`/`hatch` instead of `pip` - but I'm not sure how to untangle `pip` from this yet. * test(perf): adds significant parallelism for slow `test_examples` #3431 (comment) * refactor, perf: reduce `sys.path` usage, use dictcomp in `update_init_file` # `sys.path` Mainly modules in `tools`, but also `tests`, `sphinxext` repeat the same pattern of manipulating `sys.path` to find imports. In some cases, I think this wasn't need at all - but others were resolves by adding `tools.__init__.py`. - Aligns with many of the other sub/packages - Removes need for ignoring `E402` - Makes many of the imports in these modules explicit, and therefore easier to follow. # `update_init_file` While rewriting the imports here, I saw an opportunity to simplify gathering relevant attributes. - `attr = getattr(alt, attr_name)` using the keys of `__dict__`, was a slow equivalent of a dictionary comprehension. - `str(Path(alt.__file__).parent)` is a verbose `str(Path.cwd())` - An earlier commit of mine used this instead of `os.path` - This version is much shorter and obvious in what it does * build: adding debug message to help with build failure The previous run worked locally due to `hatch` managing `sys.path` - but not in CI, which doesn't use hatch yet https://github.com/vega/altair/actions/runs/9527747043/job/26264700374 * build: add cwd to debug message * fix: possibly fix cwd not on path * ci: testing invoking script with `-m` for cwd * fix: remove debug code * fix: resolve lint, format, type conflicts following rebase Fixes https://github.com/vega/altair/actions/runs/9581196541/job/26417448238?pr=3431 * DO NOT MERGE - TESTING DOC PERF * revert: undo last commit Intended to discard the changes, but accidentally pushed * refactor: Remove commented out dead code Fixes #3431 (comment) * ci: Remove extra whitespace in `build.yml` Fixes #3431 (comment) --------- Co-authored-by: Stefan Binder <[email protected]>

I should have removed this in vega#3426 but better late than never

I should have removed this in #3426 but better late than never

dangotbanned added 16 commits May 26, 2024 12:34

refactor: Remove unneeded @curried.curry from to_values

3caf7ff

Currying a single argument function is equivalent to referencing the function itself. This only would've added performance overhead.

refactor: Remove toolz.curried.pipe dependency in _magics

d36a2c4

NOTE: Unsure why this differs from `vegalite.v5.api._prepare_data`. Have only removed the dependency.

refactor: Replaced attribute checks with instance checks for `Support…

d6dce19

…sGeoInterface` Now possible due to `@runtime_checkable`

refactor: Simplified check_data_type

aae983a

Added missing `sys` import from earlier commit

refactor: Remove @curried.curry dependency from sample

c33433f

Same pattern as `limit_rows`. Also aliased the return type, aligning with the other transformers.

refactor: Remove top-level toolz dependency from utils.data

bf99d72

The nested imports are purely for a deprecation warning introduced in [4.10.0](vega@af4148d). 4 years on and over a major version later, would these be due for removal?

refactor: Remove toolz dependency in vegalite.data

26e77c4

- Both `@curried.curry` and `curried.pipe` no longer needed - Originally rewrote a generic curried pipe, but given that the two functions are explicitly named here - it seemed excessive.

refactor: Remove @curried.curry dependency in utils._vegafusion_data

3800fc5

- Required a new alias `NonLikeDataType` to resolve the overloads. - Merged the two default branches - `max_rows` should probably be removed since it is never used

fix: Update DataTransformerType to describe currying

b706a94

The original describes a function before currying, but not the defered variant.

fix: Ignore special form not a type

78d1a7c

This is valid for the `assert`. An alternative would be using `callable` from builtins, or reusing `PluginType`'s bounds/constraints.

fix: drop type subscript, not available in earlier versions

8cd40e2

fix: Use 3.8 compatible typing.Callable over collections.abc

8bfaf41

dangotbanned marked this pull request as ready for review May 28, 2024 10:00

dangotbanned changed the title ~~refactor(DRAFT): Remove toolz dependency~~ refactor: Remove toolz dependency Jun 1, 2024

dangotbanned mentioned this pull request Jun 5, 2024

Add additional ruff rules to pyproject.toml #3430

Closed

dangotbanned added 5 commits June 6, 2024 18:52

test: Remove toolz.pipe dependency in test_data

f34eb6f

build: drop toolz dependency

48eb2aa

refactor: Implement optional toolz imports

1e8167b

This preserves the existing `AltairDeprecationWarning` but removes the final traces of `toolz` dependency in `altair`.

test: adds test_toolz covering both pipe, curry

2f5aa56

test: fix assert match on test_toolz

60b8231

Forgot I replaced a literal string with regex

joelostblom added the maintenance label Jun 14, 2024

dangotbanned added 3 commits June 14, 2024 15:52

feat: Remove alt.curry, alt.pipe

6484161

vega#3426 (comment)

test: add F821 ignores in test_toolz

3e4edcb

maint: Remove test_toolz

650ee0e

dangotbanned requested review from joelostblom and mattijn June 14, 2024 15:14

Merge branch 'main' into remove-toolz-depend

ed13af6

dangotbanned added 3 commits June 18, 2024 12:13

chore: merge main

7b6e0d7

Merge branch 'remove-toolz-depend' of https://github.com/dangotbanned…

7b0e5d8

…/altair into remove-toolz-depend

revert: remove now-uneeded import_toolz_function

88fbd11

fix(typing): add missing ... for some overloads

bd949a9

Spotted while testing a solution for [comment](vega#3426 (comment)). This isn't a solution for that, but ensures the overloads solve correctly.

docs(typing): Remove NonGeoDataType and amend error message in `_da…

5780f21

…ta_to_csv_string` See vega#3426 (comment) See vega#3441

mattijn reviewed Jun 19, 2024

View reviewed changes

altair/utils/_vegafusion_data.py Show resolved Hide resolved

dangotbanned added 2 commits June 19, 2024 09:35

fix: ensure SupportsGeoInterface branch is triggered in `vegafusion…

9f0d58a

…_data_transformer` Fixes: vega#3426 (comment)

Merge branch 'vega:main' into remove-toolz-depend

7a84764

mattijn merged commit 76a9ce1 into vega:main Jun 19, 2024
11 checks passed

dangotbanned deleted the remove-toolz-depend branch June 19, 2024 08:59

dangotbanned mentioned this pull request Jun 26, 2024

Remove PyArrow dependency for Polars support #3445

Closed

dangotbanned mentioned this pull request Jul 1, 2024

Deprecation/breaking changes policy & process #3454

Closed

dangotbanned added a commit to dangotbanned/altair that referenced this pull request Jul 11, 2024

ci: remove toolz from [[tool.mypy.overrides]]

7b99ff5

I should have removed this in vega#3426 but better late than never

dangotbanned mentioned this pull request Jul 11, 2024

ci: remove toolz from [[tool.mypy.overrides]] #3474

Merged

mattijn pushed a commit that referenced this pull request Jul 11, 2024

ci: remove toolz from [[tool.mypy.overrides]] (#3474)

2bea09b

I should have removed this in #3426 but better late than never

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: Remove `toolz` dependency #3426

refactor: Remove `toolz` dependency #3426

dangotbanned commented May 26, 2024 •

edited

Loading

mattijn commented Jun 6, 2024

dangotbanned commented Jun 6, 2024 •

edited

Loading

mattijn commented Jun 6, 2024

dangotbanned commented Jun 6, 2024 •

edited

Loading

mattijn commented Jun 12, 2024

dangotbanned commented Jun 12, 2024

dangotbanned commented Jun 14, 2024 •

edited

Loading

mattijn commented Jun 18, 2024

dangotbanned commented Jun 18, 2024 •

edited

Loading

mattijn commented Jun 18, 2024

dangotbanned commented Jun 18, 2024

mattijn commented Jun 19, 2024

mattijn commented Jun 19, 2024

refactor: Remove toolz dependency #3426

refactor: Remove toolz dependency #3426

Conversation

dangotbanned commented May 26, 2024 • edited Loading

Additional Notes

Edit

mattijn commented Jun 6, 2024

dangotbanned commented Jun 6, 2024 • edited Loading

mattijn commented Jun 6, 2024

dangotbanned commented Jun 6, 2024 • edited Loading

mattijn commented Jun 12, 2024

dangotbanned commented Jun 12, 2024

#3426

Other PRs

#3431

#3427

dangotbanned commented Jun 14, 2024 • edited Loading

mattijn commented Jun 18, 2024

dangotbanned commented Jun 18, 2024 • edited Loading

mattijn commented Jun 18, 2024

dangotbanned commented Jun 18, 2024

mattijn commented Jun 19, 2024

mattijn commented Jun 19, 2024

refactor: Remove `toolz` dependency #3426

refactor: Remove `toolz` dependency #3426

dangotbanned commented May 26, 2024 •

edited

Loading

dangotbanned commented Jun 6, 2024 •

edited

Loading

dangotbanned commented Jun 6, 2024 •

edited

Loading

dangotbanned commented Jun 14, 2024 •

edited

Loading

dangotbanned commented Jun 18, 2024 •

edited

Loading