Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Efficient getindex implementations for subsetting AcceleratedArrays #4

Open
kcajf opened this issue Sep 13, 2019 · 5 comments
Open

Comments

@kcajf
Copy link
Contributor

kcajf commented Sep 13, 2019

Hi @andyferris - this project looks great, and is something I am considering extending / building on. I already have some hacky implementations of something similar, but not as nicely integrate and as general.

What are you thoughts on adding a family of getindex(A::AcceleratedVector, idx::AbstractVector{Int}) methods? I.e. indexing into AcceleratedVectors with integer StepRanges, integer Vectors, etc, and returning AcceleratedVectors of the same type.

For SortIndex, this would be quite easy. When indexing with StepRanges, we would have to check the step is positive. For indexing with arrays, we would have to check the array is sorted first. Since we would be constructing SortIndexes from known-sorted arrays, it would be good to have a SortIndex constructor that doesn't do any checks. This might also be useful generally.

For HashIndex, the hash table would have to be modified, but there are likely lots of optimisations / shortcuts to minimise the work. For example, when indexing with UnitRange{Int}s, we could add to a global integer offset that is subtracted from the values in the Dict when they are accessed. Similar to the above, we might want a more direct HashIndex constructor that accepts a pre-built dictionary & offset, etc.

I'd be happy to do some initial work on this after hearing your thoughts.

@andyferris
Copy link
Owner

Yes, that sounds great - the other one I’be thought of is the way indexing a sorted array with a sorted array is also sorted (and we also know that ranges are naturally sorted...). So indexing can basically “preserve” SortIndex (as well as uniqueness).

When I think of hashes - I think it might be simplest to just drop them (except for maybe uniqueness?). I’m trying to think of compelling end-use cases.

@andyferris
Copy link
Owner

(To say it differently - indexing is definitely in the “roadmap” as it is in my head but it’s been a matter of finding time for me to contribute to this project - and help is always appreciated!)

@kcajf
Copy link
Contributor Author

kcajf commented Sep 14, 2019

When I think of hashes - I think it might be simplest to just drop them (except for maybe uniqueness?). I’m trying to think of compelling end-use cases.

What do you mean by this?

@andyferris
Copy link
Owner

I just mean - it seems quite complicated when indexing to preserve the hash table for some (possibly repeated?) subset. I also think it could be slow - if you want to shrink the size of the hash table you need to recompute hashes etc (slow for large selections) and for small selections the overhead of the hash table is large while not providing any acceleration benefits. It seems it might be best to leave the decision in the hand of the user, who can always accelerate the result when it makes sense for their use case.

@ivirshup
Copy link

Would it be reasonable to encode propagation through getindex in it's type? Something like HashIndex{Propagating}?

My use-case is using an accelerated array for the dimensions of a DimensionalArray (from DimensionalData). I'd like to be able to subset a DimensionalArray, and have the result still have accelerated indices. I currently don't see a way to enforce this without some type-piracy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants