option to specify default value upon validation coercion #502

bfmcneill · 2021-05-27T15:53:02Z

Describe the solution you'd like

During schema validation it would be helpful to not only coerce the data type but to also have the option to fill in NaN

Describe alternatives you've considered

This could be achieved through pandas dataframe manipulation but it would be pretty slick to have an option to default column value as part of the schema validation
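For reference, the pandas-only workaround mentioned above might look roughly like this. It is a minimal sketch: the column names, the `DEFAULTS` and `DTYPES` mappings, and the `fill_then_coerce` helper are all hypothetical and not part of pandera.

```python
import pandas as pd

# Hypothetical per-column defaults and target dtypes (illustrative only).
DEFAULTS = {"quantity": 0, "status": "unknown"}
DTYPES = {"quantity": "int64", "status": "object"}


def fill_then_coerce(df: pd.DataFrame) -> pd.DataFrame:
    """Fill NaN with a per-column default, then cast to the target dtype."""
    out = df.copy()  # leave the caller's dataframe untouched
    for col, default in DEFAULTS.items():
        out[col] = out[col].fillna(default).astype(DTYPES[col])
    return out


raw = pd.DataFrame(
    {"quantity": [1.0, None, 3.0], "status": ["ok", None, "ok"]}
)
clean = fill_then_coerce(raw)
```

The drawback is that the defaults live outside the schema, so the fill step and the validation step have to be kept in sync by hand.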

Additional context

Perhaps there is a better way to achieve this which you might recommend?

cosmicBboy · 2021-05-29T16:07:35Z

Thanks @bfmcneill this is a good idea! This highlights the difference between parsing and validation: parsing modifies values to fulfill certain assumptions while validations just check that those assumptions are true, e.g. pydantic is primarily a parsing library while pandera is primarily a validation library.
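To make the parsing/validation distinction concrete, here is a small pandas-only sketch. The function names `validate_no_nulls` and `parse_fill_nulls` are made up for illustration and do not correspond to any pandera or pydantic API.

```python
import pandas as pd


def validate_no_nulls(s: pd.Series) -> bool:
    """Validation: report whether the assumption holds; never modify the data."""
    return not s.isna().any()


def parse_fill_nulls(s: pd.Series, default) -> pd.Series:
    """Parsing: return modified values so that the assumption holds."""
    return s.fillna(default)


prices = pd.Series([9.5, None, 12.0])

valid_before = validate_no_nulls(prices)  # the raw data contains a null
parsed = parse_fill_nulls(prices, 0.0)    # values change, length does not
valid_after = validate_no_nulls(parsed)
```

Note that the parsing step only changes values; the series keeps the same length and index.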

The only parsing that pandera does today is through the coerce option, which modifies values to a specified type. There's been prior discussion about parsing #252... thanks for bringing back the momentum into this effort!

Providing an interface for specifying custom parsers will need a little more thought, but I think a default option at the column-level would make a lot of sense.

I was a little hesitant to expand the parsing capabilities of pandera, the main concern being that it encroaches on the concern of data manipulation (which the pandas library is optimized for), but I think as long as we constrain it to modifying values of the dataframe while preserving the overall shape of the dataframe, we'll be fine.

Let me know if you have other thoughts/would have capacity to make a contribution!

rjurney · 2022-08-25T23:32:20Z

I need to specify a default value for a pa.Field and I don't know how :( This would be incredibly helpful. I am having to make my own custom classes that contain SchemaModels and have transform() and validate() methods... I am recreating much of what Pandera has and it isn't clear why I should have to do this?

rjurney · 2022-08-30T01:39:34Z

I think default= at the Column or Field level makes a lot of sense.

rbeucher · 2022-10-11T08:03:23Z

Hi, I would be happy to have a go at this.
@rjurney I have the same problem and things would be a lot simpler if there was a "default" value option.

I had a look at the other issues that @cosmicBboy tagged. The nullable column would also be a good thing.
The question is, what do we want the default value to do?

- Work like pandas fillna and replace NaN values?
- When a column is missing, create it and set its values to the default? (A combination of options could achieve this.)

In my case I would like it to work like fillna.
In pandas, fillna has the option to backfill or forward fill with the latest valid information. Is it something that could be useful?
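The candidate behaviours above can be compared side by side in plain pandas. This is only an illustration of the semantics under discussion; the column names and default values are made up, and none of this is pandera code.

```python
import pandas as pd

df = pd.DataFrame({"temp": [20.0, None, None, 23.0]})

# Behaviour 1: replace NaN with a constant default, like fillna(value).
constant = df["temp"].fillna(-1.0)

# Variant: propagate the nearest valid observation instead of a constant.
forward = df["temp"].ffill()   # carry the last valid value forward
backward = df["temp"].bfill()  # pull the next valid value backward

# Behaviour 2: the column is missing entirely, so create it with the default.
default_site = "unknown"  # hypothetical default for a hypothetical column
with_missing = df.copy()
if "site" not in with_missing.columns:
    with_missing["site"] = default_site
```

A constant default is the only variant that does not depend on row order, which may matter if the schema is meant to be applied to unsorted data.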

Now there is the question of the implementation itself. I have done something that mirrors the implementation of the coerce method. That does work but maybe you have something else in mind?

Again happy to help. Let me know what you think

cosmicBboy · 2022-10-18T18:37:35Z

Hey @rbeucher thanks! I'm currently doing a major overhaul of pandera's internals, which should make adding new features like this easier: #381

You can check out progress on this branch: https://github.com/unionai-oss/pandera/tree/core-schema

I'm gonna try to get this done by the end of November, I'll ping you when it's ready!

rbeucher · 2022-10-18T22:12:51Z

Great. Looking forward to it. I haven't looked at the other branch but did see that you mentioned some refactoring in other posts.
Ping me when ready. Happy to test it.

alejandro-yousef · 2022-10-26T14:41:14Z

Thanks to all Pandera contributors!

I would love a default argument for pa.Field equivalent to the one of Pydantic https://pydantic-docs.helpmanual.io/usage/schema/#field-customization:~:text=the%20following%20arguments%3A-,default,-%3A%20(a%20positional%20argument

jtlz2 · 2023-02-27T15:59:03Z

What is the status of this issue, still open? Would love to see a default arg!

cosmicBboy · 2023-03-09T20:07:54Z

Hi @jtlz2, this work is currently blocked until the pandera internals re-write is complete. I just need to clean up a few things before releasing 0.14.0, after which work on this issue can begin. Are you interested in making a PR?

kykyi · 2023-03-20T16:52:36Z

@cosmicBboy I'd be interested 👌

kykyi · 2023-03-21T10:14:08Z

@cosmicBboy https://github.com/unionai-oss/pandera/tree/core-schema gives me a 404, this may mean the PR has been merged and branch deleted?

If so would be keen to get started on this work. @rbeucher are you still interested?

cosmicBboy · 2023-03-21T15:49:51Z

Hi @kykyi yes 0.14.* now has all the re-write changes! Would appreciate a PR to get default values supported

At a high level, here's what needs to happen:

- Add a default argument to ArraySchema and its subclasses: Column, Index, and Field
- Update the corresponding backends for all of these components:
  - ArraySchemaBackend, ColumnBackend, IndexBackend
  - Basically the specific default-value-filling logic should be happening in ArraySchemaBackend.validate, somewhere between the data type coercion and the core checks
- Add unit tests
- Add documentation
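The ordering described in the list above (coerce, then fill the default, then run the checks) can be sketched with a toy class. This is not pandera's implementation: `ToyColumn`, its fields, and the check functions are hypothetical stand-ins used only to show where the fill step sits.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

import pandas as pd


@dataclass
class ToyColumn:
    """Hypothetical stand-in for a column schema; not pandera's real class."""

    dtype: str
    default: Optional[Any] = None
    checks: List[Callable[[pd.Series], bool]] = field(default_factory=list)

    def validate(self, s: pd.Series) -> pd.Series:
        out = s.astype(self.dtype)           # 1. data type coercion
        if self.default is not None:         # 2. default-value filling
            out = out.fillna(self.default)
        for check in self.checks:            # 3. core checks
            if not check(out):
                raise ValueError(f"check failed: {check.__name__}")
        return out


def no_nulls(s: pd.Series) -> bool:
    return bool(s.notna().all())


def non_negative(s: pd.Series) -> bool:
    return bool((s >= 0).all())


raw = pd.Series([1.0, None, 3.0], dtype="float32")
col = ToyColumn(dtype="float64", default=0.0, checks=[no_nulls, non_negative])
result = col.validate(raw)
```

The sketch tests `default is not None` rather than truthiness so that falsy defaults such as `0` or `""` are still applied. Without a default, the same input fails the `no_nulls` check, which is the behaviour the fill step is meant to relax.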

Please check out the contribution guide for the process of making a PR, and feel free to ask me any questions here!

kykyi · 2023-03-21T20:47:18Z

Thanks @cosmicBboy I'll get started and ask questions here as I go 🙏 🚀 !!

rbeucher · 2023-03-21T22:23:50Z

Yes. I am still very much interested. I have not looked at 0.14 yet.

kykyi · 2023-03-22T18:54:09Z

Hey @cosmicBboy do you mind please giving some early feedback on my fork? Bit of an open-source n00b so may need some hand holding 😄

kykyi · 2023-04-12T16:07:44Z

Hey @cosmicBboy, when running make requirements I am unable to install codecov, and the PyPI page is giving me a 404. I haven't used it before, wondering if there is some other way of installing?

cosmicBboy · 2023-04-12T18:20:04Z

@kykyi looks like codecov was yanked from PyPI: https://twitter.com/hynek/status/1646162688676974594. We will need to spend some time migrating to the new language-independent codecov uploader binary: https://docs.codecov.com/docs/codecov-uploader

cosmicBboy · 2023-04-17T17:59:32Z

fixed by #1136

bfmcneill added the enhancement New feature or request label May 27, 2021

cosmicBboy added the help wanted Extra attention is needed label Sep 13, 2021

jeffzi mentioned this issue Mar 28, 2022

Implement the ability to exclude fields in inherited models #805

Closed

cosmicBboy mentioned this issue Jul 20, 2022

Add nullable column when missing. #687

Open

derinwalters mentioned this issue Oct 22, 2022

Pandera dataframe in Pydantic model .dict() and .json() compatability #966

Closed

a-recknagel mentioned this issue Jan 26, 2023

Allow fallback coercion function as column/field argument #1082

Closed

kykyi mentioned this issue Mar 22, 2023

Add default column value param #1136

Merged

2 tasks

cosmicBboy closed this as completed Apr 17, 2023
