Releases: unionai-oss/pandera

Release 0.19.0: Polars validation support

06 May 14:28
612d25c

✨ Highlights ✨

📣 Pandera now supports validation of polars.DataFrame and polars.LazyFrame 🐻‍❄️!

You can now do this:

import pandera.polars as pa
import polars as pl


class Schema(pa.DataFrameModel):
    state: str
    city: str
    price: int = pa.Field(in_range={"min_value": 5, "max_value": 20})


lf = pl.LazyFrame(
    {
        'state': ['FL','FL','FL','CA','CA','CA'],
        'city': [
            'Orlando',
            'Miami',
            'Tampa',
            'San Francisco',
            'Los Angeles',
            'San Diego',
        ],
        'price': [8, 12, 10, 16, 20, 18],
    }
)
# Validate the LazyFrame against the schema, then materialize the result.
Schema.validate(lf).collect()

And of course you can do functional validation with decorators like so:

from pandera.typing.polars import LazyFrame

@pa.check_types
def function(lf: LazyFrame[Schema]) -> LazyFrame[Schema]:
    return lf.filter(pl.col("state").eq("CA"))

function(lf).collect()

You can read more about the integration here. Not all pandera features are supported yet; we'll add more over time, depending on community demand and contributions. To learn more about what's currently supported, check out this table.

Special shoutout to @AndriiG13 and @FilipAisot for their contributions on the built-in checks and polars datatypes, respectively, and to @evanrasmussen9, @baldwinj30, @obiii, @Filimoa, @philiporlando, @r-bar, @alkment, @jjfantini, and @robertdj for their early feedback and bug reports during the 0.19.0 beta.

Full Changelog: v0.18.3...v0.19.0

Beta release: v0.19.0b4

03 May 04:11
d058f71
Pre-release

What's Changed

  • fix pyspark tests when run on full test suite by @cosmicBboy in #1593
  • Bugfix/1580 by @cosmicBboy in #1596
  • Set pandas_io.from_frictionless_schema to use a raw string for docs by @mark-thm in #1597
  • Add a generic Series type for polars by @baldwinj30 in #1595
  • Add StructType and DDL extraction from Pandera schemas by @filipeo2-mck in #1570
  • Clean up typing for pandas GenericDtype by @cosmicBboy in #1601
  • Adding warning for unique in pyspark field and a test showing the issue as well as config when it works. by @zippeurfou in #1592
  • bugfix/1607: coercion error should correctly report relevant failure cases by @cosmicBboy in #1608
  • Create a common DataFrameSchema class, update mypy used in pre-commit by @cosmicBboy in #1609

Full Changelog: v0.19.0b3...v0.19.0b4

Beta release: v0.19.0b3

20 Apr 00:58
e4eb3a5
Pre-release

Full Changelog: v0.19.0b2...v0.19.0b3

Beta release 0.19.0b2

19 Apr 15:34
c1e7c06
Pre-release

Full Changelog: v0.19.0b1...v0.19.0b2

Beta release 0.19.0b1

05 Apr 02:13
58c5e45
Pre-release

What's Changed

  • Support polars DataFrames, LazyFrames by @cosmicBboy in #1373
  • bugfix: optional columns in polars schema should no longer raise errors when not present by @cosmicBboy in #1532
  • check_nullable does not uselessly compute isna() anymore in pandas backend by @smarie in #1538
  • Polars LazyFrames are validated at the schema-level by default by @cosmicBboy in #1534
  • Enable from_format_kwargs for dict format by @ektar in #1539
  • Convert docs to myst by @cosmicBboy in #1542
  • fix README(tab to space) by @np-yoe in #1544
  • pandas DataFrameModel accepts python generic types by @cosmicBboy in #1547
  • Backend registration happens at schema initialization by @cosmicBboy in #1548

Full Changelog: v0.18.3...v0.19.0b1

Beta release 0.19.0b0: Polars integration

15 Mar 05:15

Full Changelog: v0.18.3...v0.19.0b0

Release v0.18.3: Bugfix issue with SeriesSchema Index validation

11 Mar 18:57
17c558f

Full Changelog: v0.18.2...v0.18.3

Release v0.18.2: Docs fix - try pandera page.

11 Mar 06:34
6c11fbb

Release v0.18.1: Granular control of validation on pandas dfs.

11 Mar 02:13
0c2533a

✨ Highlights ✨

Granular control of pandas validation #1490

There is now support for granular control of schema-level or data-level validations. This can be done via the PANDERA_VALIDATION_DEPTH environment variable. Schema-level (or metadata) validation includes things like column name checks and column data types, while data-level validation involves checks that operate on actual data values.

export PANDERA_VALIDATION_DEPTH=SCHEMA_AND_DATA  # run both schema- and data-level checks (default)
export PANDERA_VALIDATION_DEPTH=SCHEMA_ONLY  # only do schema-level checks
export PANDERA_VALIDATION_DEPTH=DATA_ONLY  # only do data-level checks
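
To make the schema-level versus data-level distinction concrete, here is a minimal, library-independent sketch in plain Python. It does not use pandera's API or internals: the `Depth` enum, the `depth_from_env` and `validate` helpers, and the dict-of-lists table are hypothetical stand-ins. Only the `PANDERA_VALIDATION_DEPTH` variable name and its three values come from the notes above.

```python
import os
from enum import Enum


class Depth(Enum):
    """Hypothetical mirror of the three documented PANDERA_VALIDATION_DEPTH values."""

    SCHEMA_ONLY = "SCHEMA_ONLY"
    DATA_ONLY = "DATA_ONLY"
    SCHEMA_AND_DATA = "SCHEMA_AND_DATA"


def depth_from_env() -> Depth:
    # Fall back to the documented default when the variable is unset.
    return Depth(os.environ.get("PANDERA_VALIDATION_DEPTH", "SCHEMA_AND_DATA"))


def validate(table: dict, depth: Depth) -> list:
    """Return error messages for a toy table that should have an int 'price' column."""
    errors = []
    if depth in (Depth.SCHEMA_ONLY, Depth.SCHEMA_AND_DATA):
        # Schema-level (metadata) checks: column presence and element type.
        if "price" not in table:
            errors.append("schema: missing column 'price'")
        elif not all(isinstance(v, int) for v in table["price"]):
            errors.append("schema: column 'price' is not int")
    if depth in (Depth.DATA_ONLY, Depth.SCHEMA_AND_DATA):
        # Data-level checks: inspect the actual values.
        prices = table.get("price", [])
        if any(isinstance(v, int) and not 5 <= v <= 20 for v in prices):
            errors.append("data: 'price' outside [5, 20]")
    return errors


table = {"price": [8, 12, 99]}  # correct type, one out-of-range value
schema_only_errors = validate(table, Depth.SCHEMA_ONLY)
full_errors = validate(table, Depth.SCHEMA_AND_DATA)
print(schema_only_errors)
print(full_errors)
```

In this toy model the out-of-range value is reported only when data-level checks are included, which is the kind of trade-off the environment variable is meant to control.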

Efficient Hypothesis strategies #1503

Pandas data synthesis strategies now use comparison operator functions for more efficient data synthesis. This release also raises the minimum hypothesis version to 6.92.7.

Full Changelog: v0.18.0...v0.18.1

Release v0.18.0: Pandas schemas support global configuration

08 Dec 21:11
e99737f

✨ Highlight ✨

Pandera now supports the PANDERA_VALIDATION_ENABLED environment variable for global configuration.
Setting export PANDERA_VALIDATION_ENABLED=False deactivates validation globally.
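
If exporting the variable for the whole shell session is too broad, one option is to set it for a single child process. The sketch below uses only the Python standard library. The child command is a hypothetical stand-in for a script that runs pandera validation and simply echoes the variable it receives. It assumes pandera reads the variable from the process environment, as the export example above implies.

```python
import os
import subprocess
import sys

# Copy the current environment and override only the documented variable.
child_env = {**os.environ, "PANDERA_VALIDATION_ENABLED": "False"}

# Stand-in for a real pipeline script: the child just echoes the variable it sees.
child_code = "import os; print(os.environ['PANDERA_VALIDATION_ENABLED'])"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = result.stdout.strip()
print(seen_by_child)
```

Because the override lives in `child_env` rather than in `os.environ`, the parent process keeps whatever setting it already had.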

Full Changelog: v0.17.2...v0.18.0