[python] Blockwise SciPy sparse and Arrow Table iterator #1792

Merged: bkmartinjr merged 38 commits into main from bkmartinjr/cs-iter on Nov 3, 2023

Conversation

bkmartinjr
Member

@bkmartinjr bkmartinjr commented Oct 16, 2023

Issue and/or context:

Fixes #1503

Changes:

Added blockwise SciPy and Arrow Table iterators for the SparseNDArray class. They are also usable from the ExperimentAxisQuery.X() method. Example usage:

uri = "s3://cellxgene-data-public/cell-census/2023-10-09/soma/"
context = soma.options.SOMATileDBContext(
    tiledb_config={
        "soma.init_buffer_bytes": 1 * 1024**3,
        "vfs.s3.region": "us-west-2",
    }
)
with soma.open(uri, context=context) as census:
    exp = census["census_data"]["homo_sapiens"]
    with exp.axis_query(
        measurement_name="RNA",
        obs_query=soma.AxisQuery(value_filter="tissue in ['pancreas', 'liver']"),
    ) as query:
        for sp, (obs_joinids, var_joinids) in query.X("raw").blockwise(axis=0).scipy():
            print(repr(sp))

produces:

<65536x60664 sparse matrix of type '<class 'numpy.float32'>'
        with 77510135 stored elements in Compressed Sparse Row format>
<65536x60664 sparse matrix of type '<class 'numpy.float32'>'
        with 72742745 stored elements in Compressed Sparse Row format>
<65536x60664 sparse matrix of type '<class 'numpy.float32'>'
        with 110708131 stored elements in Compressed Sparse Row format>
...

The API is implemented on SparseNDArray (not ExperimentAxisQuery), and can be used directly on the sparse array, e.g.,

with soma.open(an_array_uri) as A:
    for csr_matrix, (obs_joinids, var_joinids) in A.read().blockwise(axis=0).scipy(compress=True):
        ...

In addition to the SciPy blockwise iterator, you can also blockwise iterate over Arrow Tables, with all the same controls over axis, reindexing, step size, etc.

In [6]: for tbl, indices in query.X("raw").blockwise(axis=0).tables(): print(tbl)
pyarrow.Table
soma_dim_0: int64
soma_dim_1: int64
soma_data: float
----
soma_dim_0: [[0,0,0,0,0,...,64270,64271,64976,65177,65263]]
soma_dim_1: [[27,33,43,46,58,...,50593,49918,50593,50593,50593]]
soma_data: [[1,2,17,1,2,...,1,1,1,2,1],[1,3,1,1,2,...,1,1,1,1,1]]
pyarrow.Table
soma_dim_0: int64
soma_dim_1: int64
soma_data: float
----
soma_dim_0: [[0,0,0,0,0,...,50062,50128,50151,50164,50255]]
soma_dim_1: [[4,7,20,21,27,...,30934,30934,30934,30934,30934]]
soma_data: [[1,1,1,1,1,...,1,1,1,1,1]]

The docstrings have more detail, but there are several parameters that control what is emitted by the iterator:

  • compress: bool - if False, create a COO, else create a compressed sparse matrix
  • axis: int - if zero (0), step over the rows and generate a CSR or COO (dependent on compress param). If one (1), generate CSC or COO.
  • size: int - number of coordinates on the blockwise axis, per iterator step

For example, to produce a CSC while stepping column (var)-wise:

A.read().blockwise(axis=1).scipy(compress=True)

Or a COO while stepping row (obs)-wise:

A.read().blockwise(axis=0).scipy(compress=False)
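
To make the compress/axis mapping above concrete, here is a minimal sketch that uses SciPy alone and does not call tiledbsoma. It assumes a block arrives as COO-style coordinate triples; `block_to_sparse` is a hypothetical helper written for illustration and is not part of the API in this PR.

```python
import numpy as np
from scipy import sparse


def block_to_sparse(rows, cols, data, shape, axis=0, compress=True):
    """Hypothetical helper: choose the SciPy format as the parameter list describes."""
    coo = sparse.coo_matrix((data, (rows, cols)), shape=shape)
    if not compress:
        return coo  # COO regardless of the stepping axis
    # axis=0 steps over rows -> CSR; axis=1 steps over columns -> CSC
    return coo.tocsr() if axis == 0 else coo.tocsc()


rows = np.array([0, 0, 1, 2])
cols = np.array([1, 3, 2, 0])
data = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)

csr = block_to_sparse(rows, cols, data, shape=(3, 4), axis=0, compress=True)
csc = block_to_sparse(rows, cols, data, shape=(3, 4), axis=1, compress=True)
coo = block_to_sparse(rows, cols, data, shape=(3, 4), axis=0, compress=False)
print(csr.format, csc.format, coo.format)
```

The three results hold the same values; only the storage layout differs, which is why a single method with two parameters can cover all three cases.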

Notes for Reviewer:

  1. Minor: I added a .flake8 config, matching our black config for those of us who run flake8 in our IDEs.
  2. The API works equally well with ExperimentAxisQuery.X, but as this is just a wrapper on the SparseNDArray.read, I implemented the bulk of the tests against SparseNDArray directly.

@codecov-commenter

codecov-commenter commented Oct 16, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

see 40 files with indirect coverage changes

@bkmartinjr bkmartinjr changed the title scipy sparse iterator [python] scipy sparse iterator Oct 17, 2023
@thetorpedodog
Contributor

Overall shape of the API looks good. I have some implementation-level suggestions but will save that for Actual Review Time.

@bkmartinjr
Member Author

implementation-level suggestions

@thetorpedodog - feel free to make them now if low effort for you. I'm happy to keep refining the core implementation while we discuss the API.

@mlin
Member

mlin commented Oct 17, 2023

High level API/docs comment: it may be too easily missed that this is the way to page through complete rows/columns of the matrix (unlike the Arrow table iterator right beside it, that can yield ragged chunks). I think a new user would need to read the guts of the docstring to deduce that (axis+step etc.), whereas it may be the key property they're looking for when the need arises.

At the least, the first graf of the docstring could explicate this. I even wonder about splitting the scipy() method into a few others with individually simpler signatures and more-explicit names like scipy_sparse_rows() and so on...

@bkmartinjr
Member Author

bkmartinjr commented Oct 17, 2023

it may be too easily missed that this is the way to page through complete rows/columns

IMHO, there is no reason we can't add the same (non-ragged) approach for Table iterators. I just felt that it was a separate feature request, so I deferred it for now. My implementation would facilitate it easily, as there is a stand-alone iterator already implemented which provides exactly this functionality for 2D arrays. It just needs to be extended to nD, which I believe is straightforward.

@aaronwolen @pablo-gar - do you guys want to see this enhancement in this PR? I can add, or we can push to future. It would likely look something like:

# generate ragged (current implementation)
A.read().tables()

# generate row-major non-ragged
A.read().tables(axis=0, step=10000)

# generate col-major non-ragged
A.read().tables(axis=1, step=10000)

The same constraints on the compatibility of result_order and axis would exist as with the scipy iterator.


As a late thought on the ragged result topic -- the SOMA read() methods already support ordering via the result_order argument. With this feature, plus a few calls to pyarrow.concat_tables, etc., it is fairly straightforward to partition results into non-ragged tables.

The primary reason this PR does not take that approach is for performance - the current TileDB ordered reader implementation exacts a (roughly) 50% read time penalty over unordered. But with a bit of work, the user already has the ability to get non-ragged results.
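
As an illustration of the "bit of work" described above, here is a minimal sketch of re-partitioning ordered but ragged chunks into blocks that each cover complete rows. It is pure Python: lists of tuples stand in for the Arrow tables an ordered read would return, and `regroup_by_rows` is a hypothetical name, not something in tiledbsoma or pyarrow. It assumes the chunks arrive sorted row-major.

```python
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[int, int, float]  # (soma_dim_0, soma_dim_1, soma_data)


def regroup_by_rows(chunks: Iterable[List[Point]], step: int) -> Iterator[List[Point]]:
    """Re-partition row-major-sorted, ragged chunks into blocks of `step` rows."""
    pending: List[Point] = []
    block_id = None
    for chunk in chunks:
        for point in chunk:
            pid = point[0] // step  # which block of rows this point belongs to
            if block_id is not None and pid != block_id:
                yield pending  # the previous block is complete
                pending = []
            block_id = pid
            pending.append(point)
    if pending:
        yield pending


# Ragged input: row 1 is split across the first two chunks.
ragged = [
    [(0, 1, 1.0), (0, 3, 2.0), (1, 2, 3.0)],
    [(1, 4, 1.0), (2, 0, 5.0), (3, 3, 1.0)],
    [(5, 1, 2.0)],
]
blocks = list(regroup_by_rows(ragged, step=2))
for block in blocks:
    print(sorted({row for row, _, _ in block}))
```

Row ranges with no stored values simply produce no block in this sketch. The approach depends on the input being sorted, which is where the ordered-read cost mentioned above comes in.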

@bkmartinjr bkmartinjr requested a review from aaronwolen October 17, 2023 22:09
@bkmartinjr
Member Author

Regarding:

splitting the scipy() method into a few others with individually simpler signatures and more-explicit names like scipy_sparse_rows() and so on...

I prefer the single signature as the defaults are almost always sufficient (in my experience and past usage). That said, I am very happy to change this if we have an alternative consensus. The other option is to simply add aliases (sugar) which call the base (most flexible) variant.

@pablo-gar
Member

pablo-gar commented Oct 18, 2023

Potential issues

I've had some time to digest it. Current potential issues I see:

  • As @mlin mentioned, there may be confusion as to which is "the" way to do row/column pagination. Ideally all iterators should have the potential to do row/column pagination, and this is the first example of it. @bkmartinjr mentions that doing it for tables() is trivial now.
  • I think having the axis argument of read().scipy() be sometimes incompatible with the result_order of .read() is not ideal. IMO there should be only one place that defines the functionality.

Suggestion

I have a suggestion that I think addresses the two comments above and is future-proof:

1. Adding step argument to read() and control stride using result_order

    def read(
        self,
        coords: options.SparseNDCoords = (),
        step: Optional[int] = None,
        *,
        result_order: options.ResultOrderStr = options.ResultOrder.AUTO,
        batch_size: options.BatchSize = _UNBATCHED,
        partitions: Optional[options.ReadPartitions] = None,
        platform_config: Optional[PlatformConfig] = None,
    ) -> "SparseNDArrayRead":
        """
        Reads a user-defined slice of the :class:`SparseNDArray`.
        Args:
           [...]
           step: 
              When specified and `result_order` is "ROW_MAJOR" or "COLUMN_MAJOR",
              the resulting read iterators will guarantee full row or column pagination, respectively.
              When `None`, the resulting iterators produce ragged results per chunk.  
        [...]
        """

2. Removing step and axis arguments from .read().scipy()

  • Always returns COO if compress = False
  • Returns CSR or CSC if compress = True, depending on the result_order of .read()
    def scipy(
        self,
        compress: bool = True,
        reindex_sparse_axis: bool = True,
    ) -> Iterator[
        Tuple[
            Tuple[npt.NDArray[np.int64], npt.NDArray[np.int64]],
            Union[
                scipy.sparse.csr_matrix,
                scipy.sparse.csc_matrix,
                scipy.sparse.coo_matrix,
            ],
        ]
    ]

3. Implement row/column pagination for other iterators: coos(), dense_tensors(), and tables()

Implement later and raise error for now if step is not None in read()

Predicted API functionality

# generate ragged
A.read().coos()
A.read().dense_tensors()
A.read().tables()
A.read().scipy()

A.read(result_order = "ROW_MAJOR").tables()

...

# generate row-major non-ragged,
A.read(result_order = "ROW_MAJOR", step = 1000).scipy() # scipy.csr
A.read(result_order = "ROW_MAJOR", step = 1000).scipy(compress = False) # scipy.coo
A.read(result_order = "ROW_MAJOR", step = 1000).coos() # raise error if not yet implemented
A.read(result_order = "ROW_MAJOR", step = 1000).tables() # raise error if not yet implemented

...

# generate column-major non-ragged,
A.read(result_order = "COLUMN_MAJOR", step = 1000).scipy() # scipy.csc
A.read(result_order = "COLUMN_MAJOR", step = 1000).scipy(compress = False) # scipy.coo
A.read(result_order = "COLUMN_MAJOR", step = 1000).coos() # raise error if not yet implemented
A.read(result_order = "COLUMN_MAJOR", step = 1000).tables() # raise error if not yet implemented

...

# Incompatible, raise error
A.read(result_order = "AUTO", step = 1000)

@atolopko-czi
Member

atolopko-czi commented Oct 18, 2023

I prefer the step arg in read() as well, since step is more about the mechanics of the read operations, rather than the data type that is being requested. However, we already have the batch_size arg. Can we just use that now (it's been unimplemented)?

While the compress and axis args lead to a "cleaner", minimal set of format-related read methods (i.e., just coos, tables, scipy), it's common in other libraries to have explicit methods for each of csr, csc, and coo. As an API reader, I would be looking for such method names, and it would take slightly longer to understand how compress and axis map to these types. So I would prefer just having explicit scipy_{csr,csc,coo} methods. While I don't feel too strongly about this, I thought it worth mentioning. [I now see @mlin had a similar thought]

@bkmartinjr
Member Author

There is a lot of confusion here that is (likely) the result of thinking about 2D matrices. Sort order and a step axis are entirely different concepts:

  • Result order is the sort order of points (for those result formats that are otherwise unsorted). Row-order means sort by [dim0, dim1, ..., dimN], and column-order means sort by [dimN, dimN-1, ..., dim0].
  • Axis, when specified, is the single-dimension slice (step) that the iterator will produce. It does NOT imply any sort order (other than this step partition), unless the particular iterator has other sorting semantics (as CSC/CSR do on their compressed dimension).

For 2D scipy, these concepts are quite close. For nD > 2D, and for formats that do not imply any particular point order (e.g., Table), result_order and step axis control entirely different things.
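
The distinction can be seen on a handful of 3D points. The sketch below is pure Python and only illustrates the two concepts as described above; `step_on_axis` is a hypothetical helper, not the iterator implemented in this PR.

```python
# Five points in a 3D array, as (dim0, dim1, dim2) coordinates, in arbitrary order.
points = [(1, 0, 2), (0, 1, 0), (1, 1, 0), (0, 0, 1), (2, 0, 0)]

# Result order is a sort over all dimensions.
row_major = sorted(points)                         # sort by (dim0, dim1, dim2)
col_major = sorted(points, key=lambda p: p[::-1])  # sort by (dim2, dim1, dim0)


def step_on_axis(pts, axis, size):
    """Partition points into blocks of `size` coordinates on one axis.

    The order of points inside each block is left exactly as it was.
    """
    blocks = {}
    for p in pts:
        blocks.setdefault(p[axis] // size, []).append(p)
    return [blocks[k] for k in sorted(blocks)]


stepped = step_on_axis(points, axis=0, size=2)
print(row_major)
print(col_major)
print(stepped)
```

Stepping on axis 0 groups the points by their dim0 range but leaves each block unsorted, whereas either result order fixes the position of every point.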

@bkmartinjr
Member Author

bkmartinjr commented Oct 18, 2023

it's common in other libraries to have explicit methods for each of csr, csc, and coo.

this is not true in numpy, which is arguably the default (standard) for nd stuff in Python.

Also note that it isn't true in Arrow. And Pandas is mostly "to_numpy", not "to_numpy_ndarray"/"to_numpy_matrix"

@bkmartinjr
Member Author

bkmartinjr commented Nov 2, 2023

While test driving the blockwise API, I came across a possible bug

@atolopko-czi - definitely a bug. A corner case I had missed. Fix and unit tests will be pushed shortly.


Update: now fixed by 93d3233

Member

@atolopko-czi atolopko-czi left a comment

LGTM! Code structure is solid and code is well tested. API is straightforward to use. A few final nits for your consideration. Approving optimistically, ahead of fix for #1792 (comment). Looking forward to using this!

apis/python/src/tiledbsoma/_sparse_nd_array.py
axis: Union[int, Sequence[int]],
*,
size: Optional[Union[int, Sequence[int]]] = None,
reindex_disable: Optional[Union[int, Sequence[int]]] = None,
Member

Minor suggestion: what if this also accepted a bool where True maps to range(len(axis)) and False maps to None?

Contributor

I'm not a hard no, but I worry that doing so would overload the meaning of this parameter a bit.

Member Author

nice idea, will add. more sugar, more unit tests :-)

Member Author

I'm ambivalent - but have it implemented. Who wants to decide?

Member

I suggested it because while test driving the API, I mistakenly assumed it was a bool flag. :) After RTFM, it made sense, but it seemed like a minor trap. If we want to keep typing as is, maybe consider renaming the param to something, e.g. reindex_disable_on_axis

Member

@atolopko-czi's reindex_disable_on_axis suggestion seems like good middle ground. The arg name is a bit verbose but its intent is clear.

Member

I see disabling re-indexing as a bit more of an advanced functionality, and it will also likely be applied to a single axis frequently. Let's not add more cognitive dissonance with a bool option. Renaming the parameter is a +1.

Thus

  • reindex_disable_on_axis should be the name
  • reindex_disable_on_axis should not take bool.

This is one of these cases where we can easily add the sugar in the future if needed, whereas removing it in the future can get complicated.
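
One practical wrinkle behind the "should not take bool" decision: in Python, bool is a subclass of int, so a parameter typed as an int or sequence of ints will silently accept True as axis 1 unless it is rejected explicitly. The sketch below shows one way such an argument could be validated. `normalize_axes` is a hypothetical helper and is not taken from the PR; the treatment of None as "no axis excluded" is likewise an assumption.

```python
from typing import List, Optional, Sequence, Union


def normalize_axes(value: Optional[Union[int, Sequence[int]]], ndim: int) -> List[int]:
    """Hypothetical helper: turn None, an int, or a sequence of ints into a list of axes."""
    if value is None:
        return []  # assumed meaning: no axis is excluded
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so it must be rejected explicitly
        raise TypeError("expected an axis number or a sequence of axis numbers, not bool")
    axes = [value] if isinstance(value, int) else list(value)
    for axis in axes:
        if isinstance(axis, bool) or not isinstance(axis, int):
            raise TypeError(f"axis must be an int, got {axis!r}")
        if not 0 <= axis < ndim:
            raise ValueError(f"axis {axis} is out of range for {ndim} dimensions")
    return axes


print(normalize_axes(None, 2))
print(normalize_axes(1, 2))
print(normalize_axes([0, 1], 2))
```

Passing True raises TypeError in this sketch rather than being read as axis 1, which is the trap described earlier in this thread.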

@bkmartinjr
Member Author

@atolopko-czi - fix for the bug you found is in the branch. See 93d3233

Contributor

@thetorpedodog thetorpedodog left a comment

Overall I like this a lot! A few things which are mostly smaller concerns but it’s looking good.

apis/python/src/tiledbsoma/_read_iters.py
_EagerRT = TypeVar("_EagerRT")

def _if_eager(
self, x: Iterator[_EagerRT], _pool: Optional[ThreadPoolExecutor] = None
Contributor

I think the pool here is always going to be self.pool, right?

Member Author

@bkmartinjr bkmartinjr Nov 2, 2023

No. I want the ability to override the self.pool with the context-managed pool, so it is definitely not the same. self.pool is whatever the user provided.

The goal of this is to allow:

  • user provided thread pool, OR
  • internally created one, which is cleaned up by the context exit

the latter is reliant on the generator/context integration
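
The pattern described here, a caller-supplied pool or an internal one that is cleaned up when the generator finishes, can be sketched with the standard library alone. This is not the `_if_eager` implementation from the PR; `pool_for` and `squares` are hypothetical names used only to show how a generator and a context manager combine.

```python
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def pool_for(user_pool: Optional[ThreadPoolExecutor]) -> Iterator[ThreadPoolExecutor]:
    """Yield the caller's pool untouched, or a temporary one that is shut down on exit."""
    if user_pool is not None:
        yield user_pool  # the caller owns it, so it is not shut down here
    else:
        with ThreadPoolExecutor(max_workers=1) as tmp:
            yield tmp  # shut down when the with-block exits


def squares(n: int, user_pool: Optional[ThreadPoolExecutor] = None) -> Iterator[int]:
    # The pool is held open for as long as the generator is being consumed.
    with pool_for(user_pool) as pool:
        for i in range(n):
            yield pool.submit(pow, i, 2).result()


result = list(squares(4))
print(result)
```

Because the `with` block sits inside the generator body, the temporary pool is released when iteration completes or the generator is closed, while a user-provided pool stays usable afterwards.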

Member

@aaronwolen aaronwolen left a comment

This is fantastic. Docs are comprehensive and super clear. Thanks!

Contributor

@thetorpedodog thetorpedodog left a comment

a few minor things but overall very good

apis/python/src/tiledbsoma/_read_iters.py
self.joinids: List[pa.Array] = [
pa.array(
np.concatenate(
list(
Contributor

is this list( needed? I would expect numpy to take any iterable you give it (though could be wrong)

Member Author

Your expectation is wrong: it takes a Sequence, not an iterable.

demo:

In [4]: np.concatenate( (np.zeros((i,)) for i in range(4)) )
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [4], line 1
----> 1 np.concatenate( (np.zeros((i,)) for i in range(4)) )

File <__array_function__ internals>:200, in concatenate(*args, **kwargs)

TypeError: The first input argument needs to be a sequence

In [5]: np.concatenate( tuple(np.zeros((i,)) for i in range(4)) )
Out[5]: array([0., 0., 0., 0., 0., 0.])

@bkmartinjr bkmartinjr merged commit a4fef9f into main Nov 3, 2023
12 checks passed
@bkmartinjr bkmartinjr deleted the bkmartinjr/cs-iter branch November 3, 2023 20:21
github-actions bot pushed a commit that referenced this pull request Nov 3, 2023
* add flake8 config for IDE happiness

* first cut at scipy iter for sparse array

* cleanup

* additional refinement of scipy iterator, plus unit tests

* clean up typing and docstring

* 3.7 support

* refine docstring

* cleanup

* performance work

* fix typo in docstring

* remove debugging print

* PR feedback on docstring

* add prototype stepped table iterator

* add result_order to stepped table iterator

* major revision to proposed blockwise api

* slim down unit test

* initial set of PR review changes

* clean up passing pool to generators

* more refactoring from PR review f/b

* even DRYer

* PR review fixes

* docstring revision based on PR feedback

* fix bug found in PR review

* additional PR review changes

* more PR inspired changes

* comment in response to PR f/b

* more PR f/b

* rename reindex_disable
@johnkerl
Member

johnkerl commented Nov 3, 2023

🚀

johnkerl pushed a commit that referenced this pull request Nov 3, 2023
Co-authored-by: Bruce Martin <[email protected]>
mojaveazure added a commit that referenced this pull request Jun 14, 2024
Connect the re-indexer to the blockwise iterator, allowing reads to be re-indexed on-the-fly. This PR parallels #1792 and completes #2152 and #2637; in addition, provides new shorthand for `reindex_disable_on_axis`:
 - `TRUE`: disable re-indexing on all axes
 - `FALSE`: re-index on all axes
 - `NA`: re-index only on major axis, disable re-indexing on all axes (default)

`BlockwiseTableReadIter$concat()` and `BlockwiseSparseReadIter$concat()` are disabled when re-indexing is requested (paralleling Python)

`BlockwiseSparseReadIter` now accepts `repr = "R"` or `repr = "C"` under certain circumstances:
 - axis 0 (`soma_dim_0`) must be re-indexed to allow `repr = "R"`
 - axis 1 (`soma_dim_1`) must be re-indexed to allow `repr = "C"`

`repr` of `"T"` is allowed in all circumstances and continues to be the default

Two new fields are available to blockwise iterators:
 - `$axes_to_reindex`: a vector of minor axes slated to be re-indexed
 - `$reindexable`: status indicator stating if _any_ axis (major or minor) is slated to be re-indexed

resolves #2671
johnkerl pushed a commit that referenced this pull request Jun 17, 2024
github-actions bot pushed a commit that referenced this pull request Jun 17, 2024
johnkerl pushed a commit that referenced this pull request Jun 17, 2024

Co-authored-by: Paul Hoffman <[email protected]>
Development

Successfully merging this pull request may close these issues.

[Feature request] CSR/CSC iterator over ExperimentAxisQuery.X
9 participants