option to specify default value upon validation coercion #502

Closed
bfmcneill opened this issue May 27, 2021 · 18 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@bfmcneill

bfmcneill commented May 27, 2021

Describe the solution you'd like

During schema validation it would be helpful to not only coerce the data type but to also have the option to fill in NaN

Describe alternatives you've considered

This could be achieved through pandas dataframe manipulation but it would be pretty slick to have an option to default column value as part of the schema validation
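For reference, the pandas-only workaround mentioned above might look like the following minimal sketch. The column name `quantity` and the fill value `0` are made up for illustration. The fill has to happen before the integer cast, because a float column containing NaN cannot be cast to `int64` directly.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: "quantity" has one missing value.
df = pd.DataFrame({"quantity": [1.0, np.nan, 3.0]})

# Fill missing values first, then coerce to the target dtype.
# The fill value 0 is an arbitrary choice for illustration.
cleaned = df.assign(quantity=df["quantity"].fillna(0).astype("int64"))
```

The cleaned frame can then be passed to schema validation as usual; the original `df` is left unmodified because `assign` returns a new DataFrame.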

Additional context

Perhaps there is a better way to achieve this which you might recommend?

@bfmcneill bfmcneill added the enhancement New feature or request label May 27, 2021
@cosmicBboy
Collaborator

cosmicBboy commented May 29, 2021

Thanks @bfmcneill this is a good idea! This highlights the difference between parsing and validation: parsing modifies values to fulfill certain assumptions while validations just check that those assumptions are true, e.g. pydantic is primarily a parsing library while pandera is primarily a validation library.
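To make the parsing/validation distinction concrete, here is a small sketch in plain pandas. The two helper functions are hypothetical and are not part of pandera or pydantic: one only checks an assumption, the other returns modified data that satisfies it.

```python
import pandas as pd


def validate_int_column(series: pd.Series) -> bool:
    """Validation: report whether the assumption holds; the data is untouched."""
    return bool(pd.api.types.is_integer_dtype(series))


def parse_int_column(series: pd.Series) -> pd.Series:
    """Parsing: return a modified copy that fulfills the assumption."""
    return series.astype("int64")


raw = pd.Series(["1", "2", "3"])   # strings, not integers
is_valid_before = validate_int_column(raw)
parsed = parse_int_column(raw)
is_valid_after = validate_int_column(parsed)
```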

The only parsing that pandera does today is through the coerce option, which modifies values to a specified type. There's been prior discussion about parsing #252... thanks for bringing back the momentum into this effort!

Providing an interface for specifying custom parsers will need a little more thought, but I think a default option at the column-level would make a lot of sense.

I was a little hesitant to expand the parsing capabilities of pandera, the main concern being that it encroaches on data manipulation (which the pandas library is optimized for), but I think as long as we constrain it to modifying values of the dataframe while preserving the overall shape of the dataframe, we'll be fine.
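One way to picture the "modify values but preserve shape" constraint is a guard around any parser function. This is only an illustrative sketch; `apply_shape_preserving` is a hypothetical helper, not something pandera provides.

```python
from typing import Callable

import numpy as np
import pandas as pd


def apply_shape_preserving(
    df: pd.DataFrame, parser: Callable[[pd.DataFrame], pd.DataFrame]
) -> pd.DataFrame:
    """Run a parser and reject results that change shape, columns, or index."""
    out = parser(df)
    same_layout = (
        out.shape == df.shape
        and out.columns.equals(df.columns)
        and out.index.equals(df.index)
    )
    if not same_layout:
        raise ValueError("parser must not change the shape, columns, or index")
    return out


df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0]})

# Filling values keeps the layout, so this parser is accepted.
filled = apply_shape_preserving(df, lambda d: d.fillna(0.0))

# Dropping rows changes the shape, so this parser is rejected.
try:
    apply_shape_preserving(df, lambda d: d.dropna())
    rejected = False
except ValueError:
    rejected = True
```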

Let me know if you have other thoughts/would have capacity to make a contribution!

@rjurney

rjurney commented Aug 25, 2022

I need to specify a default value for a pa.Field and I don't know how :( This would be incredibly helpful. I am having to make my own custom classes that contain SchemaModels and have transform() and validate() methods... I am recreating much of what Pandera already has, and it isn't clear why I should have to do this.

@rjurney

rjurney commented Aug 30, 2022

I think default= at the Column or Field level makes a lot of sense.

@rbeucher

rbeucher commented Oct 11, 2022

Hi, I would be happy to have a go at this.
@rjurney I have the same problem, and things would be a lot simpler if there was a "default" value option.

I had a look at the other issues that @cosmicBboy tagged. The nullable column would also be a good thing.
The question is, what do we want the default value to do?

  • Work like pandas fillna and replace NaN values?
  • When a column is missing, create it and set its values to the default? (A combination of options could achieve this.)

In my case I would like it to work like fillna.
In pandas, fillna has the option to backfill or forward fill with the latest valid information. Is it something that could be useful?
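The two candidate behaviours, plus forward and backward filling, can be sketched in plain pandas as follows. `apply_default` and its `create_if_missing` flag are hypothetical names used only for illustration; they are not an existing or proposed pandera API.

```python
import numpy as np
import pandas as pd


def apply_default(
    df: pd.DataFrame, column: str, default, create_if_missing: bool = False
) -> pd.DataFrame:
    """Return a copy of df with `default` applied to `column`."""
    out = df.copy()
    if column not in out.columns:
        if not create_if_missing:
            raise KeyError(f"column {column!r} is missing")
        # Behaviour 2: create the missing column, filled with the default.
        out[column] = default
        return out
    # Behaviour 1: act like fillna and replace NaN values.
    out[column] = out[column].fillna(default)
    return out


df = pd.DataFrame({"score": [1.0, np.nan, 3.0]})

filled = apply_default(df, "score", 0.0)
created = apply_default(df, "label", "unknown", create_if_missing=True)

# Forward and backward filling use neighbouring valid values instead of a constant.
forward = df["score"].ffill()
backward = df["score"].bfill()
```

Keeping column creation behind an explicit flag means a typo in a column name still raises an error instead of silently adding a new column.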

Now there is the question of the implementation itself. I have done something that mirrors the implementation of the coerce method. That does work but maybe you have something else in mind?

Again happy to help. Let me know what you think

@cosmicBboy
Collaborator

Hey @rbeucher thanks! I'm currently doing a major overhaul of pandera's internals, which should make adding new features like this easier: #381

You can check out progress on this branch: https://github.com/unionai-oss/pandera/tree/core-schema

I'm gonna try to get this done by the end of November, I'll ping you when it's ready!

@rbeucher

Great. Looking forward to it. I haven't looked at the other branch, but I did see that you mentioned some refactoring in other posts.
Ping me when ready. Happy to test it.

@alejandro-yousef

Thanks to all Pandera contributors!

I would love a default argument for pa.Field equivalent to the one of Pydantic https://pydantic-docs.helpmanual.io/usage/schema/#field-customization:~:text=the%20following%20arguments%3A-,default,-%3A%20(a%20positional%20argument

@jtlz2

jtlz2 commented Feb 27, 2023

What is the status of this issue, still open? Would love to see a default arg!

@cosmicBboy
Collaborator

hi @jtlz2 this work is currently blocked on completing the pandera internals re-write. I just need to clean up a few things before releasing 0.14.0, after which work on this issue can begin. Are you interested in making a PR?

@kykyi
Contributor

kykyi commented Mar 20, 2023

@cosmicBboy I'd be interested 👌

@kykyi
Contributor

kykyi commented Mar 21, 2023

@cosmicBboy https://github.com/unionai-oss/pandera/tree/core-schema gives me a 404, this may mean the PR has been merged and branch deleted?

If so would be keen to get started on this work. @rbeucher are you still interested?

@cosmicBboy
Collaborator

Hi @kykyi yes 0.14.* now has all the re-write changes! Would appreciate a PR to get default values supported

At a high level, here's what needs to happen:

Please check out the contribution guide for the process of making a PR, and feel free to ask me any questions here!

@kykyi
Contributor

kykyi commented Mar 21, 2023

Thanks @cosmicBboy I'll get started and ask questions here as I go 🙏 🚀 !!

@rbeucher

Yes. I am still very much interested. I have not looked at 0.14 yet.

@kykyi
Contributor

kykyi commented Mar 22, 2023

Hey @cosmicBboy do you mind please giving some early feedback on my fork? Bit of an open-source n00b so may need some hand holding 😄

@kykyi
Contributor

kykyi commented Apr 12, 2023

Hey @cosmicBboy when running make requirements I am unable to install codecov, and the PyPI page is giving me a 404. I haven't used it before, so I'm wondering if there is some other way of installing it?

@cosmicBboy
Collaborator

@kykyi looks like codecov was yanked from pypi: https://twitter.com/hynek/status/1646162688676974594. We'll need to spend some time migrating to the new language-independent codecov uploader binary: https://docs.codecov.com/docs/codecov-uploader

@cosmicBboy
Collaborator

fixed by #1136

7 participants