pandas_dtype should support built-in collection types like list, dict, set #260

Closed
cosmicBboy opened this issue Aug 16, 2020 · 6 comments · Fixed by #1171
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@cosmicBboy
Collaborator

Is your feature request related to a problem? Please describe.

Pandas does not natively support a dtype representation for collections like lists, dicts, sets, and iterables. For these types, the corresponding pandas data type is object. This can obscure the actual type of a column for users who rely on a particular collection type being present in that column.

Describe the solution you'd like

Extend the PandasDtype representation and support for list, dict, and set types.

  • abstract away the type-check handling currently in SeriesSchemaBase.validate
  • handle logic for checking the dtype for collection types, e.g. with `series.map(lambda x: isinstance(x, list))`
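
As a rough illustration of the element-wise check proposed in the list above, here is a minimal sketch in plain pandas. The helper name `is_collection_column` is hypothetical and is not part of pandera; it only shows how an `isinstance` test can be mapped over an object-dtype column.

```python
import pandas as pd


def is_collection_column(series: pd.Series, collection_type: type) -> bool:
    """Return True if every element of `series` is an instance of `collection_type`.

    Hypothetical helper for illustration only; not a pandera function.
    """
    # The column itself has dtype `object`, so the check has to look at each value.
    return bool(series.map(lambda x: isinstance(x, collection_type)).all())


# Both columns report dtype `object`, which hides what they actually contain.
lists = pd.Series([[1, 2], [3], []])
mixed = pd.Series([[1, 2], "not a list", {"a": 1}])

all_lists = is_collection_column(lists, list)   # every element is a list
all_mixed = is_collection_column(mixed, list)   # some elements are not lists
```

Because the check runs a Python function per element, it is likely to be slower than a native dtype comparison on large columns.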

It may be even nicer to support typing like:

  • List[int]
  • Dict[str, int]

And for pandera to verify types like this.
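
One possible way to verify parametrized types such as `List[int]` and `Dict[str, int]` is to unpack the annotation with the standard library's `typing.get_origin` and `typing.get_args` and check each value recursively. The sketch below assumes this approach; the names `matches` and `column_matches` are hypothetical, and nothing here reflects how pandera actually implements it.

```python
from typing import Any, Dict, List, get_args, get_origin

import pandas as pd


def matches(value: Any, annotation: Any) -> bool:
    """Recursively check `value` against a typing annotation (illustrative only)."""
    if annotation is Any:
        return True
    origin = get_origin(annotation)
    if origin is None:
        # Plain type such as `int`, `str`, or bare `list`.
        return isinstance(value, annotation)
    if not isinstance(value, origin):
        return False
    args = get_args(annotation)
    if not args:
        # Unparametrized generic, e.g. bare `List`: only the container is checked.
        return True
    if origin in (list, set, frozenset):
        return all(matches(item, args[0]) for item in value)
    if origin is dict:
        key_type, value_type = args
        return all(
            matches(k, key_type) and matches(v, value_type)
            for k, v in value.items()
        )
    # Other generics are not handled in this sketch; accept on container type alone.
    return True


def column_matches(series: pd.Series, annotation: Any) -> bool:
    """Return True if every element of the column satisfies the annotation."""
    return bool(series.map(lambda v: matches(v, annotation)).all())


list_ok = column_matches(pd.Series([[1, 2], [3]]), List[int])
list_bad = column_matches(pd.Series([[1, "a"]]), List[int])
dict_ok = column_matches(pd.Series([{"a": 1}, {"b": 2}]), Dict[str, int])
dict_bad = column_matches(pd.Series([{"a": "x"}]), Dict[str, int])
```

Note that `isinstance(True, int)` is true in Python, so a stricter check would need to treat `bool` separately.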

@cosmicBboy added the enhancement label Aug 16, 2020
@jdvala

jdvala commented Feb 24, 2021

Has this been in consideration?

@cosmicBboy
Collaborator Author

Yes @jdvala, it still needs to be prioritized in the release roadmap. It depends on #369, a revamp of the pandera typing system, which should make this feature easier to implement.

@exitNA

exitNA commented Apr 21, 2022

#369 has been resolved. What about this feature?

@anantzoid

Looks like #369 is merged. What's the status of this?

@cosmicBboy
Collaborator Author

@anantzoid current status is help wanted. Open to contribution!

Basically would require:

  • creating new pandera datatypes (see here) that support:
    • lists: list, List[...]
    • dictionaries: dict, Dict[...]
    • etc.
  • adding unit tests for these

This would basically use object as the underlying pandas type, and use the logical data type system to check the actual values of the data_container to make sure the types are correct.
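
To make the two-layer idea above concrete, here is a minimal sketch: the physical pandas dtype is object, and a separate logical check inspects the values. The `ListDtype` class and its `check` method are hypothetical stand-ins written in plain pandas. They are not pandera's actual datatype API, which may look quite different.

```python
import pandas as pd


class ListDtype:
    """Hypothetical logical data type for list-valued columns (illustration only)."""

    # Physical dtype that pandas reports for a column of Python lists.
    physical_dtype = object

    def check(self, data_container: pd.Series) -> bool:
        # Step 1: the underlying pandas dtype must be `object`.
        if data_container.dtype != self.physical_dtype:
            return False
        # Step 2: logical check on the actual values in the container.
        return bool(data_container.map(lambda x: isinstance(x, list)).all())


dtype = ListDtype()
ok = dtype.check(pd.Series([[1], [2, 3]]))       # object dtype, all lists
wrong_physical = dtype.check(pd.Series([1, 2]))   # integer dtype, not object
wrong_logical = dtype.check(pd.Series([[1], "x"]))  # object dtype, mixed values
```

Separating the two steps lets the cheap dtype comparison reject obviously wrong columns before the slower per-value pass runs.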

@cosmicBboy
Collaborator Author

Note: based on this thread we also want the pandera datatype system to handle unhashable types (sets, lists)
