validate function run on DataFrameSchema w/ lazy=true doesn't report all violations on non nullable column #532

Closed
3 tasks done
alex-tully opened this issue Jun 28, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@alex-tully

Describe the bug
When a column is set to non-nullable and the data frame contains a null value in that column, no further validations are executed for that column, so not all of its failure cases are reported. I am hoping you'll tell me that I am doing something wrong 😃

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the master branch of pandera.

Code Sample

import pandas as pd
import pandera as pa

# Non-nullable string column whose values must be "ID" followed by three digits.
column = pa.Column(pa.String, name="id", nullable=False, required=True, allow_duplicates=False)
column.checks.append(pa.Check.str_matches(r"^ID[\d]{3}$"))
# column.has_subcomponents = True  # should I switch this on?

schema = pa.DataFrameSchema({"id": column})

# Index 1 is null and index 2 does not match the pattern.
data = ["ID001", None, "XXX"]
df = pd.DataFrame(data, columns=["id"])

try:
    # lazy=True should collect every violation instead of stopping at the first.
    schema.validate(df, lazy=True)
except pa.errors.SchemaErrors as exc:
    print(exc.failure_cases)

Output

   schema_context  column         check  check_number  failure_case  index
0          Column      id  non_nullable          None          None      1

Expected behavior

I would expect all validations to be performed and all failure cases to be reported. In this case, I would expect the output to be:

   schema_context  column             check  check_number  failure_case  index
0          Column      id      non_nullable          None          None      1
1          Column      id  str_matches(...)           ...           XXX      2
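
To make the expected result concrete, here is a minimal pure-Python sketch that runs both checks over the sample data independently and collects one row per violation. It is not pandera's implementation: the helper name `collect_failure_cases` and the tuple layout `(check, failure_case, index)` are assumptions made for illustration. Null values are reported by the non-null check and skipped by the pattern check.

```python
import re

# Same pattern as in the code sample: "ID" followed by exactly three digits.
ID_PATTERN = re.compile(r"^ID[\d]{3}$")


def collect_failure_cases(values):
    """Return (check, failure_case, index) rows for every violation.

    Hypothetical helper: it mimics the *expected* lazy behaviour, where a
    null value does not stop the remaining checks from running.
    """
    failures = []
    # Check 1: the column is non-nullable, so every None is a failure.
    for index, value in enumerate(values):
        if value is None:
            failures.append(("non_nullable", None, index))
    # Check 2: every non-null value must match the ID pattern.
    for index, value in enumerate(values):
        if value is not None and not ID_PATTERN.match(value):
            failures.append(("str_matches", value, index))
    return failures


rows = collect_failure_cases(["ID001", None, "XXX"])
for row in rows:
    print(row)
```

Both the null at index 1 and the non-matching value at index 2 end up in `rows`, which is the shape of report I was hoping to get from `lazy=True`.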

Desktop (please complete the following information):

  • OS: Amazon Linux 2
    • CPE OS Name: cpe:2.3:o:amazon:amazon_linux:2
    • Kernel: Linux 4.14.232-177.418.amzn2.x86_64
  • pandera version: 0.6.4

Additional context

I did some digging and found this issue and #528, which offers a fix. I don't think it resolves the nullable issue, though.

In schemas.py, the nullable check is here and the error handler is initialized here. It appears that the lazy parameter passed in here is always None unless has_subcomponents is true, and has_subcomponents has been removed in the latest dev branch.
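
The symptom is what one would see if the error handler received a falsy `lazy` value and therefore raised on the first error instead of collecting it. The sketch below illustrates that difference with a hypothetical `ErrorCollector` class. It is a simplified model for illustration only, not pandera's actual error handler, and every name in it is an assumption.

```python
class ValidationFailed(Exception):
    """Carries every failure message collected before validation stopped."""

    def __init__(self, failures):
        super().__init__(f"{len(failures)} failure(s)")
        self.failures = failures


class ErrorCollector:
    """Hypothetical collector: raises at once unless lazy is truthy."""

    def __init__(self, lazy=None):
        self.lazy = lazy
        self.failures = []

    def collect(self, message):
        self.failures.append(message)
        if not self.lazy:
            # Eager mode (lazy is None or False): abort on the first error,
            # so later checks never get a chance to run.
            raise ValidationFailed(self.failures)


def run_checks(lazy):
    """Report two violations in order and return what was collected."""
    collector = ErrorCollector(lazy=lazy)
    try:
        collector.collect("non_nullable failed at index 1")
        collector.collect("str_matches failed at index 2")
    except ValidationFailed:
        pass
    return collector.failures


print(run_checks(lazy=None))  # only the first failure is recorded
print(run_checks(lazy=True))  # both failures are recorded
```

With `lazy=None` the second check is never reached, which matches the single `non_nullable` row in the output above.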

alex-tully added the bug label Jun 28, 2021
@cosmicBboy
Collaborator

fixed by #550

@alex-tully
Author

@cosmicBboy thanks for fixing this. Do you know when it will be released?

@cosmicBboy
Collaborator

cosmicBboy commented Jul 12, 2021 via email
