Describe the enhancement requested
#39968, which is included in the 16.0.0 release, added a Python binding of the C++ AzureFileSystem to the PyArrow API.
However, (1) this native file system implementation for Azure has not yet been documented, and (2) its addition causes a backward compatibility issue in Pandas.
Documentation of the API and usage
We should document AzureFileSystem in the API reference and its usage in the user guide.
Note about the backward compatibility in Pandas
Pandas' read_parquet() and to_parquet() with abfs:// URLs have stopped working in specific cases since PyArrow 16.0.0 because of the addition of AzureFileSystem to PyArrow.
Pandas implements logic that first tries to get a PyArrow native file system implementation for a given URL and falls back to fsspec if PyArrow has no native implementation for that URL.
https://github.com/pandas-dev/pandas/blob/v2.2.2/pandas/io/parquet.py#L116-L124
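To make the native-first, fsspec-fallback behavior concrete, here is a minimal pure-Python model of it. This is a sketch and not the Pandas source: `pick_backend` and the two scheme sets are hypothetical names invented for illustration, the scheme sets are not PyArrow's real registry, and the rule that the native implementation is only tried when no `storage_options` are given is an assumption based on my reading of the linked code.

```python
from urllib.parse import urlsplit

# Hypothetical scheme sets, for illustration only; not PyArrow's real registry.
NATIVE_SCHEMES_BEFORE_16 = {"file", "s3", "gs", "hdfs"}
NATIVE_SCHEMES_16 = NATIVE_SCHEMES_BEFORE_16 | {"abfs"}


def pick_backend(url, native_schemes, storage_options=None):
    """Return the backend a native-first, fsspec-fallback strategy would pick."""
    scheme = urlsplit(url).scheme
    # Assumption: the native implementation is only tried when the caller
    # passed no storage_options; otherwise the fallback is used directly.
    if storage_options is None and scheme in native_schemes:
        return "pyarrow"
    return "fsspec"


url = "abfs://container/path/data.parquet"
before = pick_backend(url, NATIVE_SCHEMES_BEFORE_16)
after = pick_backend(url, NATIVE_SCHEMES_16)
explicit = pick_backend(url, NATIVE_SCHEMES_16, {"account_name": "example"})
```

Under these assumptions, the same call switches backends purely because the set of natively supported schemes grew, which is the mechanism behind the regression described in this issue.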
Because of this fallback logic, Pandas' read_parquet() and to_parquet() always use fsspec for abfs:// URLs with PyArrow versions before 16.0.0.
With PyArrow 16.0.0, Pandas automatically uses PyArrow's native AzureFileSystem. However, this AzureFileSystem does not use authentication settings set in fsspec's global configuration. Instead, we must explicitly pass authentication settings to read_parquet() and to_parquet() as storage_options, independently of fsspec.
We need to figure out where and how we should document this backward compatibility issue.
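A note in the documentation could show the workaround of passing credentials explicitly. The following is a possible sketch only: the helper names, the environment variable names, and the `account_name` / `account_key` option keys are assumptions chosen for illustration and must match whatever the backend that receives `storage_options` actually expects.

```python
import os


def azure_storage_options(env=os.environ):
    """Build an explicit storage_options dict from environment variables.

    The variable names and option keys here are illustrative assumptions.
    """
    options = {"account_name": env["AZURE_STORAGE_ACCOUNT_NAME"]}
    key = env.get("AZURE_STORAGE_ACCOUNT_KEY")
    if key:
        options["account_key"] = key
    return options


def read_parquet_from_azure(url, env=os.environ):
    """Read a Parquet file, passing credentials explicitly as storage_options."""
    # Imported lazily so the helper above can be used without pandas installed.
    import pandas as pd

    return pd.read_parquet(url, storage_options=azure_storage_options(env))
```

A call such as `read_parquet_from_azure("abfs://container/path/data.parquet")` then no longer depends on fsspec's global configuration; `to_parquet()` accepts the same `storage_options` argument.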
Component(s)
Python