06 May 14:28

612d25c

Release 0.19.0: Polars validation support

✨ Highlights ✨

📣 Pandera now supports validation of polars.DataFrame and polars.LazyFrame 🐻‍❄️!

You can now do this:

import pandera.polars as pa
import polars as pl


class Schema(pa.DataFrameModel):
    state: str
    city: str
    price: int = pa.Field(in_range={"min_value": 5, "max_value": 20})


lf = pl.LazyFrame(
    {
        'state': ['FL','FL','FL','CA','CA','CA'],
        'city': [
            'Orlando',
            'Miami',
            'Tampa',
            'San Francisco',
            'Los Angeles',
            'San Diego',
        ],
        'price': [8, 12, 10, 16, 20, 18],
    }
)
Schema.validate(lf).collect()

And of course you can do functional validation with decorators like so:

from pandera.typing.polars import LazyFrame

@pa.check_types
def function(lf: LazyFrame[Schema]) -> LazyFrame[Schema]:
    return lf.filter(pl.col("state").eq("CA"))

function(lf).collect()

You can read more about the integration here. Not all pandera features are supported yet; we'll add more over time, depending on community demand and contributions. To learn more about what's currently supported, check out this table.

Special shoutout to @AndriiG13 and @FilipAisot for their contributions on the built-in checks and polars datatypes, respectively, and to @evanrasmussen9, @baldwinj30, @obiii, @Filimoa, @philiporlando, @r-bar, @alkment, @jjfantini, and @robertdj for their early feedback and bug reports during the 0.19.0 beta.

What's Changed

Support polars DataFrames, LazyFrames by @cosmicBboy, @AndriiG13, and @FilipAisot in #1373
bugfix: optional columns in polars schema should no longer raise errors when not present by @cosmicBboy in #1532
check_nullable does not uselessly compute isna() anymore in pandas backend by @smarie in #1538
Polars LazyFrames are validated at the schema-level by default by @cosmicBboy in #1534
Enable from_format_kwargs for dict format by @ektar in #1539
Convert docs to myst by @cosmicBboy in #1542
fix README(tab to space) by @np-yoe in #1544
pandas DataFrameModel accepts python generic types by @cosmicBboy in #1547
Backend registration happens at schema initialization by @cosmicBboy in #1548
do not format if test is not necessary by @mattB1989 in #1530
Register default backends when restoring state by @alkment in #1550
Bump actions/setup-python from 4 to 5 by @dependabot in #1452
fix: prevent environment pollution when importing pyspark by @sam-goodwin in #1552
use rst to speed up api docs generation by @cosmicBboy in #1557
Add _GenericAlias.call patch by @cosmicBboy in #1561
support typeguard < 3 for better compatibility by @cosmicBboy in #1563
Add parse function to DataFrameModel in #1181
localize GenericAlias patch to DataFrameBase subclasses by @cosmicBboy in #1571
Bump idna from 3.4 to 3.7 by @dependabot in #1569
docs: fix typo in env var name by @alekseik1 in #1562
polars: fix element-wise checks, register backends by @cosmicBboy in #1572
remove pytest ignore on modin, dask. pyspark tests with pandas >= 2 by @cosmicBboy in #1573
make sure check name is propagated to error report by @cosmicBboy in #1574
update ci to run pyspark, modin, dask with pandas >= v2 by @cosmicBboy in #1575
use sphinx-design instead of sphinx-panels by @cosmicBboy in #1581
Update bug_report.md by @philiporlando in #1585
bugfix: polars column core checks now return check output by @cosmicBboy in #1586
make pandera.typing.Series[TYPE] error in polars DataFrameModel more readable by @cosmicBboy in #1588
implement timezone agnostic polars_engine.DateTime type by @cosmicBboy in #1589
fix pyspark import error by @cosmicBboy in #1591
fix pyspark tests when run on full test suite by @cosmicBboy in #1593
Bugfix/1580 by @cosmicBboy in #1596
Set pandas_io.from_frictionless_schema to use a raw string for docs by @mark-thm in #1597
Add a generic Series type for polars by @baldwinj30 in #1595
Add StructType and DDL extraction from Pandera schemas by @filipeo2-mck in #1570
Clean up typing for pandas GenericDtype by @cosmicBboy in #1601
Adding warning for unique in pyspark field and a test showing the issue as well as config when it works. by @zippeurfou in #1592
bugfix/1607: coercion error should correctly report relevant failure cases by @cosmicBboy in #1608
Create a common DataFrameSchema class, update mypy used in pre-commit by @cosmicBboy in #1609
Dataframe column schema by @cosmicBboy in #1611
bugfix: column-level coercion is properly implemented by @cosmicBboy in #1612
update docs for polars by @cosmicBboy in #1613
fix: properly coerce dtypes for columns with regex=True by @tesslinden in #1602
rewrite Check class docstrings to remove pandas assumption by @cosmicBboy in #1614
add tests for polars decorators by @cosmicBboy in #1615

New Contributors

@smarie made their first contribution in #1538
@ektar made their first contribution in #1539
@np-yoe made their first contribution in #1544
@alkment made their first contribution in #1550
@sam-goodwin made their first contribution in #1552
@alekseik1 made their first contribution in #1562
@philiporlando made their first contribution in #1585
@mark-thm made their first contribution in #1597
@baldwinj30 made their first contribution in #1595
@zippeurfou made their first contribution in #1592
@tesslinden made their first contribution in #1602

Full Changelog: v0.18.3...v0.19.0

Contributors

cosmicBboy, smarie, and 21 other contributors


03 May 04:11

cosmicBboy

v0.19.0b4

d058f71

Beta release: v0.19.0b4

Pre-release

What's Changed

fix pyspark tests when run on full test suite by @cosmicBboy in #1593
Bugfix/1580 by @cosmicBboy in #1596
Set pandas_io.from_frictionless_schema to use a raw string for docs by @mark-thm in #1597
Add a generic Series type for polars by @baldwinj30 in #1595
Add StructType and DDL extraction from Pandera schemas by @filipeo2-mck in #1570
Clean up typing for pandas GenericDtype by @cosmicBboy in #1601
Adding warning for unique in pyspark field and a test showing the issue as well as config when it works. by @zippeurfou in #1592
bugfix/1607: coercion error should correctly report relevant failure cases by @cosmicBboy in #1608
Create a common DataFrameSchema class, update mypy used in pre-commit by @cosmicBboy in #1609

New Contributors

@mark-thm made their first contribution in #1597
@baldwinj30 made their first contribution in #1595
@zippeurfou made their first contribution in #1592

Full Changelog: v0.19.0b3...v0.19.0b4

Contributors

cosmicBboy, zippeurfou, and 3 other contributors


20 Apr 00:58

cosmicBboy

v0.19.0b3

e4eb3a5

Beta release: v0.19.0b3

Pre-release

What's Changed

fix pyspark import error by @cosmicBboy in #1591

Full Changelog: v0.19.0b2...v0.19.0b3

Contributors

cosmicBboy


19 Apr 15:34

cosmicBboy

v0.19.0b2

c1e7c06

Beta release 0.19.0b2

Pre-release

What's Changed

do not format if test is not necessary by @mattB1989 in #1530
Register default backends when restoring state by @alkment in #1550
Bump actions/setup-python from 4 to 5 by @dependabot in #1452
fix: prevent environment pollution when importing pyspark by @sam-goodwin in #1552
use rst to speed up api docs generation by @cosmicBboy in #1557
Add _GenericAlias.call patch by @cosmicBboy in #1561
support typeguard < 3 for better compatibility by @cosmicBboy in #1563
Add parse function to DataFrameModel in #1181
localize GenericAlias patch to DataFrameBase subclasses by @cosmicBboy in #1571
Bump idna from 3.4 to 3.7 by @dependabot in #1569
docs: fix typo in env var name by @alekseik1 in #1562
polars: fix element-wise checks, register backends by @cosmicBboy in #1572
remove pytest ignore on modin, dask. pyspark tests with pandas >= 2 by @cosmicBboy in #1573
make sure check name is propagated to error report by @cosmicBboy in #1574
update ci to run pyspark, modin, dask with pandas >= v2 by @cosmicBboy in #1575
use sphinx-design instead of sphinx-panels by @cosmicBboy in #1581
Update bug_report.md by @philiporlando in #1585
bugfix: polars column core checks now return check output by @cosmicBboy in #1586
make pandera.typing.Series[TYPE] error in polars DataFrameModel more readable by @cosmicBboy in #1588
implement timezone agnostic polars_engine.DateTime type by @cosmicBboy in #1589

New Contributors

@alkment made their first contribution in #1550
@sam-goodwin made their first contribution in #1552
@alekseik1 made their first contribution in #1562
@philiporlando made their first contribution in #1585

Full Changelog: v0.19.0b1...v0.19.0b2

Contributors

cosmicBboy, alkment, and 5 other contributors


05 Apr 02:13

cosmicBboy

v0.19.0b1

58c5e45

Beta release 0.19.0b1

Pre-release

What's Changed

Support polars DataFrames, LazyFrames by @cosmicBboy in #1373
bugfix: optional columns in polars schema should no longer raise errors when not present by @cosmicBboy in #1532
check_nullable does not uselessly compute isna() anymore in pandas backend by @smarie in #1538
Polars LazyFrames are validated at the schema-level by default by @cosmicBboy in #1534
Enable from_format_kwargs for dict format by @ektar in #1539
Convert docs to myst by @cosmicBboy in #1542
fix README(tab to space) by @np-yoe in #1544
pandas DataFrameModel accepts python generic types by @cosmicBboy in #1547
Backend registration happens at schema initialization by @cosmicBboy in #1548

New Contributors

@smarie made their first contribution in #1538
@ektar made their first contribution in #1539
@np-yoe made their first contribution in #1544

Full Changelog: v0.18.3...v0.19.0b1

Contributors

cosmicBboy, smarie, and 2 other contributors


15 Mar 05:15

cosmicBboy

v0.19.0b0

7d1b1ba

Beta release 0.19.0b0: Polars integration

Pre-release

What's Changed

Support polars DataFrames, LazyFrames by @cosmicBboy, @AndriiG13, and @FilipAisot in #1373

Full Changelog: v0.18.3...v0.19.0b0

Contributors

cosmicBboy, AndriiG13, and FilipAisot


11 Mar 18:57

cosmicBboy

v0.18.3

17c558f

Release v0.18.3: Bugfix issue with SeriesSchema Index validation

What's Changed

bugfix: add index validation to SeriesSchema by @cosmicBboy in #1524

Full Changelog: v0.18.2...v0.18.3

Contributors

cosmicBboy


11 Mar 06:34

cosmicBboy

v0.18.2

6c11fbb

Release v0.18.2: Docs fix - try pandera page.

docs fix release 0.18.2


11 Mar 02:13

cosmicBboy

v0.18.1

0c2533a

Release v0.18.1: Granular control of validation on pandas dfs.

✨ Highlights ✨

Granular control of pandas validation #1490

There is now support for granular control of schema-level or data-level validations. This can be done via the PANDERA_VALIDATION_DEPTH environment variable. Schema-level (or metadata) validation includes things like column name checks and column data types, while data-level validation involves checks that operate on actual data values.

export PANDERA_VALIDATION_DEPTH=SCHEMA_AND_DATA  # run both schema- and data-level checks (default)
export PANDERA_VALIDATION_DEPTH=SCHEMA_ONLY  # only do schema-level checks
export PANDERA_VALIDATION_DEPTH=DATA_ONLY  # only do data-level checks
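The same setting can be applied from Python instead of the shell. The sketch below uses a hypothetical helper, set_validation_depth, that only accepts the three values listed above. It assumes pandera reads the variable when it is imported, so the helper should be called before `import pandera`.

```python
import os

# Allowed values, taken from the export lines above.
VALID_DEPTHS = {"SCHEMA_AND_DATA", "SCHEMA_ONLY", "DATA_ONLY"}


def set_validation_depth(depth: str) -> None:
    """Set PANDERA_VALIDATION_DEPTH for this process (hypothetical helper)."""
    if depth not in VALID_DEPTHS:
        raise ValueError(f"unknown validation depth: {depth!r}")
    os.environ["PANDERA_VALIDATION_DEPTH"] = depth


# Assumption: this runs before `import pandera`, because the setting
# may be read once at import time.
set_validation_depth("SCHEMA_ONLY")
print(os.environ["PANDERA_VALIDATION_DEPTH"])
```

Rejecting unknown values up front avoids silently exporting a misspelled depth.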

Efficient Hypothesis strategies #1503

Pandas data synthesis strategies now use comparison operator functions for more efficient data synthesis. This release also raises the minimum hypothesis version to 6.92.7.

What's Changed

Fix copy-pasted docstring in PySpark accessor test by @deepyaman in #1448
Mypy precommit by @cosmicBboy in #1468
@check_types now properly passes in *args **kwargs and checks their types by @ecthompson99 in #1336
Bump starlette from 0.27.0 to 0.36.2 in /dev by @dependabot in #1484
Bump fastapi from 0.103.0 to 0.109.1 by @dependabot in #1482
Bump actions/cache from 3 to 4 by @dependabot in #1478
Bump codecov/codecov-action from 3 to 4 by @dependabot in #1477
Bump jinja2 from 3.1.2 to 3.1.3 by @dependabot in #1459
fix: pin multimethod dep version (#1485) by @schatimo in #1486
Fix issue where str dtype in a multiindex dataframe schema results in invalid example by @gsugar87 in #1050
Bump python-multipart from 0.0.6 to 0.0.7 by @dependabot in #1496
Bump python-multipart from 0.0.6 to 0.0.7 in /dev by @dependabot in #1495
Bump python-multipart from 0.0.6 to 0.0.7 in /ci by @dependabot in #1494
Bump jinja2 from 3.1.2 to 3.1.3 in /ci by @dependabot in #1457
Bump starlette from 0.27.0 to 0.36.2 in /dev by @dependabot in #1489
Bugfix/1463 Pandas 2.2.0 FutureWarning resolution by using assignment instead of … by @derinwalters in #1464
Bump jinja2 from 3.1.2 to 3.1.3 in /dev by @dependabot in #1458
add pandas 2.2.0 to tests, use uv for pip compile by @cosmicBboy in #1502
Efficient Hypothesis strategies by @Zac-HD in #1503
remove headers in requirements files by @cosmicBboy in #1512
Granular validations on pandas dfs by @kykyi in #1490

New Contributors

@deepyaman made their first contribution in #1448
@ecthompson99 made their first contribution in #1336
@schatimo made their first contribution in #1486
@gsugar87 made their first contribution in #1050
@Zac-HD made their first contribution in #1503

Full Changelog: v0.18.0...v0.18.1

Contributors

cosmicBboy, gsugar87, and 7 other contributors


08 Dec 21:11

cosmicBboy

v0.18.0

e99737f

Release v0.18.0: Pandas schemas support global configuration

✨ Highlight ✨

Pandera now supports the configuration environment variable PANDERA_VALIDATION_ENABLED.
export PANDERA_VALIDATION_ENABLED=False now globally deactivates validation.

What's Changed

Bump urllib3 from 2.0.4 to 2.0.7 by @dependabot in #1383
Bump urllib3 from 2.0.5 to 2.0.7 in /dev by @dependabot in #1382
Bump urllib3 from 2.0.4 to 2.0.7 in /ci by @dependabot in #1381
Bugfix/1278 add_missing_columns assorted bugfixes by @derinwalters in #1372
Fix lack of support for new TimestampNTZType in Spark 3.4 datatypes by @filipeo2-mck in #1385
Current pip-compile usage does not have --no-emit-index-url by @filipeo2-mck in #1390
Avoid throwing exception on Union types by @mjgp2 in #1378
Fix optional fields in PySpark SQL by @filipeo2-mck in #1387
Add support for unique validation in PySpark by @filipeo2-mck in #1396
Enhancement to support GeoDataFrame, Geometry coercion, and CRS (Feature/1108) by @derinwalters in #1392
fix issue for optional fields by @coobas in #1258
Fix validating pyspark dataframes with regex columns by @lexanth in #1397
Bump pyarrow from 13.0.0 to 14.0.1 by @dependabot in #1417
Bump pyarrow from 13.0.0 to 14.0.1 in /dev by @dependabot in #1416
Bump pyarrow from 13.0.0 to 14.0.1 in /ci by @dependabot in #1415
[BUGFIX] [PYSPARK] Avoid running nullable checks if nullable=True by @filipeo2-mck in #1403
Add Date type to pandera.all by @diederikperdok in #1419
Fix disabling validation for PySpark DataFrame Schemas by @maxispeicher in #1407
Bump actions/checkout from 3 to 4 by @dependabot in #1361
[PySpark] Improve validation performance by enabling cache()/unpersist() toggles by @filipeo2-mck in #1414
Bump urllib3 from 2.0.5 to 2.0.7 by @dependabot in #1420
Generate localized timestamps in multiindex examples by @rob-sil in #1426
feature: support string column validation for pandas 2.1.3 by @karlma821 in #1425
Add support for PANDERA_VALIDATION_ENABLED for pandas and Configuration docs by @noklam in #1354
update total download badge and fix contributing instructions by @cosmicBboy in #1436
update cache dataframe config args, fix tests by @cosmicBboy in #1437
Bump jupyter-server from 2.7.3 to 2.11.2 in /dev by @dependabot in #1440
Bump cryptography from 41.0.4 to 41.0.6 by @dependabot in #1435
Bump jupyter-server from 2.7.2 to 2.11.2 by @dependabot in #1441

New Contributors

@filipeo2-mck made their first contribution in #1385
@mjgp2 made their first contribution in #1378
@coobas made their first contribution in #1258
@lexanth made their first contribution in #1397
@diederikperdok made their first contribution in #1419
@maxispeicher made their first contribution in #1407
@rob-sil made their first contribution in #1426
@karlma821 made their first contribution in #1425
@noklam made their first contribution in #1354

Full Changelog: v0.17.2...v0.18.0

Contributors

mjgp2, lexanth, and 10 other contributors

Releases: unionai-oss/pandera
