ENH: Set default storage for strings globally #44203

Closed
taoufik07 opened this issue Oct 27, 2021 · 4 comments
Labels
Enhancement Strings String extension data type and string data

Comments

@taoufik07

taoufik07 commented Oct 27, 2021

Is your feature request related to a problem?

As far as I know, the only way to use the new string dtypes is to set them manually, either via the dtype parameter where it's supported or with astype; otherwise the strings are stored as object.

Describe the solution you'd like

I'm aware that the string dtype is experimental, but it would be easier for us to try it out in large projects if there were a way to change the default behavior globally. The same mechanism could also be used in the future to set the storage for StringDtype.

@taoufik07 taoufik07 added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 27, 2021
@lithomas1 lithomas1 added Strings String extension data type and string data Usage Question Closing Candidate May be closeable, needs more eyeballs and removed Enhancement Needs Triage Issue that has not been reviewed by a pandas team member Closing Candidate May be closeable, needs more eyeballs labels Oct 27, 2021
@lithomas1
Member

Are you asking about using StringArray by default when creating DataFrames, or about setting the storage for StringArrays?
If you want StringArrays to use the pyarrow-based storage, I think you can use pd.set_option("string_storage", "pyarrow").
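A minimal sketch of the storage option, for illustration only. It uses the option's full name, mode.string_storage, and scopes the change with pd.option_context rather than setting it globally. The fallback to "python" storage when pyarrow is not installed is an assumption added here so the snippet runs either way.

```python
import importlib.util

import pandas as pd

# Assumption for this sketch: use pyarrow storage only if pyarrow is installed,
# otherwise fall back to the default "python" storage.
storage = "pyarrow" if importlib.util.find_spec("pyarrow") else "python"

# Inside the context, dtype="string" picks up the configured storage.
with pd.option_context("mode.string_storage", storage):
    arr = pd.array(["a", "b", None], dtype="string")

print(arr.dtype.storage)
```

Calling pd.set_option("mode.string_storage", "pyarrow") instead applies the same setting for the rest of the session. Note that this only changes how the string dtype is stored; it does not make pandas infer that dtype for plain Python strings.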

@taoufik07
Author

Yes, that, and the ability to infer strings as string rather than object:

>>> df = pd.DataFrame(["1", "2", "3"])
>>> df
   0
0  1
1  2
2  3
>>>
>>> df.dtypes
0    object
dtype: object # string instead

@lithomas1
Member

Hmmm. I don't think we have a way to tell the DataFrame constructor to infer the string dtype.
I guess you could call .convert_dtypes() on the frame afterwards, which would convert string columns to the string dtype and leave the other columns alone, if that's a suitable workaround.
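The convert_dtypes() workaround could look roughly like this. It is a sketch, not a maintainer-endorsed recipe, and the column names a and b are made up for the example. By default convert_dtypes() also moves integer, boolean and float columns to their nullable counterparts, so those conversions are switched off here to keep the non-string column unchanged.

```python
import pandas as pd

# Hypothetical frame: one column of Python strings, one of integers.
df = pd.DataFrame({"a": ["1", "2", "3"], "b": [1, 2, 3]})

# Convert only the string column; leave the other dtypes as they are.
converted = df.convert_dtypes(
    convert_integer=False,
    convert_boolean=False,
    convert_floating=False,
)

print(converted.dtypes)
```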

@mroeschke
Member

There is now the future.infer_string option, so I think we can close this out.
