Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial CSC/ CSR classes #442

Merged
merged 11 commits into from
Mar 17, 2021
Merged

Conversation

ivirshup
Copy link
Contributor

@ivirshup ivirshup commented Mar 15, 2021

As discussed in #433. The goal of this PR is primarily to get skeletons for these classes that work, while fast implementations of indices and broadcasting can be added in subsequent PRs.

Open questions

Class structure

Current design: CSC and CSR classes subclass GCXS. Is this a good idea, or should a non-user visible super class be added? It may be strange that:

gcxs = sparse.random((50, 30), .2, format="gcxs")
csc = CSC(gcxs)

assert isinstance(csc, GCXS) and isinstance(gcxs, GCXS)
assert not np.array_equal(csc.data, gcxs.data)

Return types

What is the result type of CSR + CSC? What about COO + CSC?

Do we follow scipy convention here?

@hameerabbasi
Copy link
Collaborator

Class structure

Current design: CSC and CSR classes subclass GCXS. Is this a good idea, or should a non-user visible super class be added? It may be strange that:

gcxs = sparse.random((50, 30), .2, format="gcxs")
csc = CSC(gcxs)

assert isinstance(csc, GCXS) and isinstance(gcxs, GCXS)
assert not np.array_equal(csc.data, gcxs.data)

I'm okay with this; if you pass the correct compressed dimensions; the assert will pass.

What is the result type of CSR + CSC? What about COO + CSC?

Do we follow scipy convention here?

I suggest we document that it can return any SparseArray subclass instance without any more concrete promise. This will allow us flexibility later. This is what we currently do for most operations.

@codecov
Copy link

codecov bot commented Mar 15, 2021

Codecov Report

Merging #442 (a486dae) into master (d0a4074) will increase coverage by 0.19%.
The diff coverage is 96.92%.

@@            Coverage Diff             @@
##           master     #442      +/-   ##
==========================================
+ Coverage   95.61%   95.80%   +0.19%     
==========================================
  Files          20       20              
  Lines        2939     3002      +63     
==========================================
+ Hits         2810     2876      +66     
+ Misses        129      126       -3     

Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given this a shallow review, deeper one to come later.

sparse/_compressed/compressed.py Outdated Show resolved Hide resolved
sparse/_compressed/compressed.py Outdated Show resolved Hide resolved
sparse/_compressed/compressed.py Outdated Show resolved Hide resolved
@ivirshup
Copy link
Contributor Author

ivirshup commented Mar 16, 2021

Modify SparseArray.__array_ufunc__ to produce CSR/CSC.

@hameerabbasi, what did you mean exactly by this? This is what I was thinking of with the Return types topic above.

Does this mean modifying all of:

  • __array_function__
  • elemwise
  • and _reduce_calc/ _reduce_return?

I'm wondering if these can be split up into separate PRs, since they touching different parts of the codebase. Right now I'm thinking that dimension preserving operations with CSR/ CSC should generally prefer to return these types.


Could CSR(arg: scipy.sparse.spmatrix) work for conversion? I feel like it's unintuitive that CSR(: Union[ndarray, SparseArray]) works, but construction from spmatrix has to be through a separate method.

(Added in 8f21503)

@hameerabbasi
Copy link
Collaborator

Does this mean modifying all of:

Yes.

I'm wondering if these can be split up into separate PRs, since they touching different parts of the codebase.

Yes, I'm fine with that.

Right now I'm thinking that dimension preserving operations with CSR/ CSC should generally prefer to return these types.

I'd say they should return these types, but I'd prefer not to be locked into that decision. Which is why we return the correct type, but only document we're returning a SparseArray.

Could CSR(arg: scipy.sparse.spmatrix) work for conversion? I feel like it's unintuitive that CSR(: Union[ndarray, SparseArray]) works, but construction from spmatrix has to be through a separate method.

That should be done, yes, at the GCXS level (it will then trickle down to CSR/CSC). COO already does do it. Pretty sure DOK doesn't.

@ivirshup ivirshup marked this pull request as ready for review March 16, 2021 09:27
@ivirshup ivirshup requested a review from hameerabbasi March 16, 2021 09:34
Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general question

Comment on lines 154 to 160

if not isinstance(self, Compressed2d):
self.compressed_axes = (
tuple(compressed_axes)
if isinstance(compressed_axes, Iterable)
else None
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change necessary?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, I don't like it because it makes the superclass depend on the subclass -- That's bad coding practice, usually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise self.compressed_axes would be mutable for Compressed2d classes, since we'd have assignment here.

I'm open to other suggestions. Alternatives I've considered are:

  • Separate __init__ methods – my initial solution
  • setattr(csr. "compressed_axes", x) is a no-op – but it really should be an error
  • Check that the new value is correct – this is adding moving parts to an immutable property. I don't like making changes to the subclasses based purely on the implementation of super classes.
  • Higher level super class. This might be adding too much indirection.

I don't like it because it makes the superclass depend on the subclass

I agree, but think that super classes have to be designed around allowing subclassing. I think this is a perennial python problem (e.g. how easy is it to subclass an ndarray or dataframe?).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to go about this is to make it immutable in this PR. Just rename it to _compressed_axes and also add a @property.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another is to just have it mutable for now, and make it immutable later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use composition instead of inheritance and only subclass, SparseArray.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer your question: I don't see anyone relying on subclass behaviour too much, just the fact that it has all the operations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just the fact that it has all the operations.

Yeah, this is an important part. I would like to think more about the type structure before making any big decisions on it, and would like to get this merged so operations can be defined and test/ benchmark suites can be built. I imagine experience while adding to the classes – as well as getting more eyes on it – will help clarify here.

To that end, what's the minimum fix here? To me it's:

  1. Leave as is
  2. Split common initialization into it's own function

I like 1, since it's the smallest amount of code that allows both behaviors. It also has very little indirection. I'd be happy to implement 2 if that's a better solution to you. Either of these would come with an issue to think about type structure.

I don't see anyone relying on subclass behaviour too much

I think it will come up, especially as typing continues to mature. I bet it's going to come up while writing the dispatch methods.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best fix is to leave it mutable (#442 (comment)) and move mutability to the next PR. I'll mark the issue as a release blocker if immutability is that important to you. GCXS wasn't released, and so we're fine changing that behavior.

As a side note we may be overthinking the immutability, COO is most often used, and it holds even data and coords as attributes. Hardly anyone I've come across modifies them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a commit where it's kept as immutable, but we don't need the instance check. Just makes sure it's not set to anything other that what it should be.

@ivirshup
Copy link
Contributor Author

CSR(arg: scipy.sparse.spmatrix)

Added in 8f21503

I'd say they should return these types, but I'd prefer not to be locked into that decision

👍. Where would this have been documented? Is it sufficient that elemwise returns a SparseArray?

@hameerabbasi
Copy link
Collaborator

Where would this have been documented? Is it sufficient that elemwise returns a SparseArray?

We don't have to do it in this PR, but the docstring of CSR/CSC classes as well as the getting started page are two possible places to put it.

@hameerabbasi hameerabbasi merged commit 85d5824 into pydata:master Mar 17, 2021
@hameerabbasi
Copy link
Collaborator

This is in, thanks @ivirshup!

@mrocklin
Copy link
Contributor

mrocklin commented Apr 1, 2021

Quick comment from the gallery: this is very fun to see :)

@hammer hammer mentioned this pull request Sep 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants