Using Array API functions on DataFrame objects #50
Comments
We had (/have) a pretty strong consensus that there should not be a separate Series-like object, but only a DataFrame object with a single column. IIRC the key issue is that Series and DataFrame have so much API duplication, for little benefit. That's not a complete answer though. Your question can be rephrased as something like:
It makes sense to me in principle for array API functions to work on dataframes with a homogeneous dtype (which includes single-column dataframes). I'm not sure there's a good way to pick and choose which functions in the array API make sense. I can imagine a dataframe library providing the whole array API somehow, or reusing an existing array library that is a dependency.
When we discussed this in the call a few weeks back (and please feel free to correct me), we explored a few options. In the end we gravitated towards having some way to share/convert data between DataFrames and Arrays, with the idea that one could then use Array operations on the converted Array. If the underlying library doesn't have an actual Array, it could just return some object that implements the Array API. Here are some related discussions on this topic: #25, #39, #48. cc @jorisvandenbossche (in case I missed anything here 🙂)
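To make the "convert, then use array operations" idea above concrete, here is a minimal pure-Python sketch. Every name in it (`ToyArray`, `ToyColumnFrame`, `to_array`, and this standalone `where`) is hypothetical and invented for illustration; none of them come from the dataframe or array API standards. The sketch only mimics the `where(condition, x1, x2)` shape and is not a compliant implementation.

```python
class ToyArray:
    """Hypothetical array-like object backed by a plain Python list."""

    def __init__(self, values):
        self.values = list(values)

    def __gt__(self, scalar):
        # Element-wise comparison against a scalar, returning a boolean ToyArray.
        return ToyArray(v > scalar for v in self.values)


def where(condition, x1, x2):
    """Pick from x1 where condition is true, else from x2 (element-wise)."""
    return ToyArray(
        a if c else b
        for c, a, b in zip(condition.values, x1.values, x2.values)
    )


class ToyColumnFrame:
    """Hypothetical single-column dataframe with no native array type."""

    def __init__(self, name, values):
        self.name = name
        self._values = list(values)

    def to_array(self):
        # Hypothetical conversion hook: hand back an object that offers
        # array-style operations instead of a "real" array.
        return ToyArray(self._values)


frame = ToyColumnFrame("x", [-1, 2, -3, 4])
arr = frame.to_array()
zeros = ToyArray([0] * len(arr.values))
result = where(arr > 0, arr, zeros)
```

The point of the design is that the dataframe object itself never grows array functions; it only needs one conversion entry point, and everything else lives on the array side.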
is something analogous to
The array API standard has
Indeed.
We collectively changed our minds on this one. I think that came over time with (a) the realization that we're really building a library author-focused API that is quite different from the public APIs in current dataframe libraries, (b) that there are actually things that a one-column dataframe does not do well enough (e.g. we are now introducing a unique values function for So, I think we still want the thing here that @jakirkham originally asked for - but on
In some cases users like to use Array API functions (for example `where`) on DataFrame objects (in particular Series). Is this something that we would like to support in the API? If not, how would we recommend users approach these kinds of problems? For an example of this please see issue dask/distributed#5224.
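As a rough illustration of the pattern being asked about, the sketch below uses pandas and NumPy as familiar stand-ins (neither is named in this issue, and the data is made up). It shows the explicit route: convert the Series to an array, apply `where` on the array side, and wrap the result back up if a Series is needed again.

```python
import numpy as np
import pandas as pd

# A Series with made-up values, standing in for a single-column dataframe.
s = pd.Series([-2.0, 1.5, -0.5, 3.0], name="x")

# Convert explicitly to an array, then apply the array function to it.
values = s.to_numpy()
clipped = np.where(values > 0, values, 0.0)

# Wrap the result back into a Series, reusing the original index and name.
result = pd.Series(clipped, index=s.index, name=s.name)
```

Whether a dataframe standard should let `where` accept the Series directly, or instead recommend this explicit round trip, is exactly the open question here.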