It is a very common pattern to iterate over an X layer slice by rows, where the slice is specified by an `ExperimentAxisQuery` and the consumer wants each slice in a CSR/CSC format for computation (e.g., `scipy.sparse.csr_matrix`). The current API only provides COO iteration and does not directly support reindexing of joinids.
I propose an iterator API that:
- reads slices of a given size,
- reindexes the joinids, and
- returns, for each step, the joinids and the reindexed sparse matrix.
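The reindexing step can be illustrated with a minimal sketch, assuming each step's COO data arrives as three NumPy arrays (row joinids, column joinids, values) and that the step's obs and var joinids are sorted and unique. The helper names `reindex_coo_to_sparse` and `_positions` are hypothetical and not part of any existing API. The sketch maps joinids to positions with `np.searchsorted`; a `pandas.Index.get_indexer` lookup would be an alternative if the joinids were not sorted.

```python
import numpy as np
import scipy.sparse as sparse


def _positions(joinids: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Map each joinid in `coords` to its position in the sorted, unique `joinids`."""
    pos = np.searchsorted(joinids, coords)
    # Reject coordinates that are not in the selection (out of range or no exact match).
    if pos.size and (pos.max() >= joinids.size or np.any(joinids[pos] != coords)):
        raise ValueError("COO coordinates contain joinids outside the selection")
    return pos


def reindex_coo_to_sparse(obs_joinids, var_joinids, coo_rows, coo_cols, coo_data, fmt="csr"):
    """Build a CSR/CSC matrix whose row i is obs_joinids[i] and column j is var_joinids[j]."""
    if fmt not in ("csr", "csc"):
        raise ValueError(f"unsupported fmt: {fmt!r}")
    shape = (obs_joinids.size, var_joinids.size)
    m = sparse.coo_matrix(
        (coo_data, (_positions(obs_joinids, coo_rows), _positions(var_joinids, coo_cols))),
        shape=shape,
    )
    # COO -> CSR/CSC conversion sums any duplicate coordinates.
    return m.tocsr() if fmt == "csr" else m.tocsc()
```

The output shape is fixed by the step's joinid selections rather than by the maximum coordinate seen, so rows or columns with no stored values still occupy a position in the matrix.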
In Python type signatures, a row-based iterator method on `ExperimentAxisQuery` might look like:
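```python
from typing import Iterator, Literal, Tuple

import numpy as np
import numpy.typing as npt
import scipy.sparse as sparse
import tiledbsoma as soma

_RT = Tuple[Tuple[npt.NDArray[np.int64], npt.NDArray[np.int64]], sparse.spmatrix]

def X_sparse_iter(
    self: soma.ExperimentAxisQuery,
    X_name: str = "raw",  # the X layer to read
    row_stride: int = 2**16,  # row stride for each step
    fmt: Literal["csr", "csc"] = "csr",  # the resulting sparse format
) -> Iterator[_RT]: ...
```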
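To pin down the intended semantics of this signature, here is an in-memory reference sketch, assuming the whole X layer fits in a scipy matrix whose row and column positions equal the obs and var joinids. `reference_X_sparse_iter` is a hypothetical name used only for illustration; a real implementation would read each stride from storage instead of slicing an in-memory matrix.

```python
from typing import Iterator, Literal, Tuple

import numpy as np
import numpy.typing as npt
import scipy.sparse as sparse

_RT = Tuple[Tuple[npt.NDArray[np.int64], npt.NDArray[np.int64]], sparse.spmatrix]


def reference_X_sparse_iter(
    X_full: sparse.spmatrix,
    obs_joinids: npt.NDArray[np.int64],
    var_joinids: npt.NDArray[np.int64],
    row_stride: int = 2**16,
    fmt: Literal["csr", "csc"] = "csr",
) -> Iterator[_RT]:
    """Hypothetical in-memory reference: row r / column c of X_full is joinid r / c."""
    if fmt not in ("csr", "csc"):
        raise ValueError(f"unsupported fmt: {fmt!r}")
    X_csr = sparse.csr_matrix(X_full)
    for start in range(0, len(obs_joinids), row_stride):
        obs_chunk = obs_joinids[start : start + row_stride]
        # Selecting rows, then columns, by joinid reindexes them to positions 0..n-1.
        chunk = X_csr[obs_chunk, :][:, var_joinids]
        yield (obs_chunk, var_joinids), chunk.asformat(fmt)
```

Each step covers at most `row_stride` obs rows and always the full var selection, so only the obs joinids change from step to step.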
Example usage:
```python
with experiment.axis_query(...) as query:
    for (obs_joinids, var_joinids), X_chunk in query.X_sparse_iter(X_name="raw"):
        ...
```
In this case, `obs_joinids[i]` and `var_joinids[j]` correspond to `X_chunk[i, j]`.
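Going the other way, a consumer can recover the joinid coordinates of every stored element in a chunk. The sketch below uses only NumPy and scipy; `chunk_to_joinid_triples` is a hypothetical helper, not part of the proposed API.

```python
import numpy as np
import scipy.sparse as sparse


def chunk_to_joinid_triples(obs_joinids, var_joinids, X_chunk):
    """Map each stored element X_chunk[i, j] to (obs_joinids[i], var_joinids[j], value)."""
    coo = sparse.coo_matrix(X_chunk)  # works for CSR or CSC input
    return obs_joinids[coo.row], var_joinids[coo.col], coo.data
```

Only explicitly stored elements are returned, so implicit zeros in the chunk produce no triples.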
I have a fully functional prototype implementation available here. Important notes:
- The prototype does not use the `ExperimentAxisQuery` fast CSR conversion, which would be necessary for a "real" implementation.
- The prototype does lazy, multi-threaded pipelining of the steps, which is essential for performance on the queries we typically run on the Census.
- There is a notebook in the same directory that shows example usage.
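Lazy, multi-threaded pipelining can be approximated with the standard library: upcoming steps run on worker threads while the consumer processes the current one, and nothing is pulled from the input beyond a bounded prefetch window. This is a generic sketch using `concurrent.futures`, not the prototype's code; `pipelined_map` and `prefetch` are hypothetical names, and threads only help to the extent that the per-step work (I/O, decompression, sparse conversion) releases the GIL.

```python
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Deque, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def pipelined_map(fn: Callable[[T], R], items: Iterable[T], prefetch: int = 1) -> Iterator[R]:
    """Lazily yield fn(item) for each item, in input order, computing up to
    `prefetch` steps ahead of the consumer on worker threads."""
    if prefetch < 1:
        raise ValueError("prefetch must be >= 1")
    with ThreadPoolExecutor(max_workers=prefetch) as pool:
        pending: Deque["Future[R]"] = deque()
        for item in items:
            pending.append(pool.submit(fn, item))
            # Once the window is full, hand the oldest result to the consumer;
            # the remaining `prefetch` steps keep running in the background.
            if len(pending) > prefetch:
                yield pending.popleft().result()
        # Drain whatever is still in flight after the input is exhausted.
        while pending:
            yield pending.popleft().result()
```

For this feature, `fn` could read the COO data for one row stride and convert it to CSR. Closing the generator early still waits for any already-submitted steps, because leaving the `with` block shuts the pool down with `wait=True`.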
johnkerl changed the title to [Feature request] CSR/CSC iterator over ExperimentAxisQuery.X on Jun 26, 2023