Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: High-level functions that can convert between vector types #373

Open
nh2 opened this issue Mar 27, 2021 · 7 comments
Open

Discussion: High-level functions that can convert between vector types #373

nh2 opened this issue Mar 27, 2021 · 7 comments

Comments

@nh2
Copy link
Member

nh2 commented Mar 27, 2021

Edit: Issue description of the underlying problem #373 (comment)

Below is the original question that led to it:


I often work with indices that I'd like to store in Unboxed.Vector Int vectors, and Generic.Vector vector a vectors into which those indices point.

At some point I have to resolve the indices back into the original generic vector type. I would like to do:

import qualified Data.Vector.Generic as VG
import qualified Data.Vector.Unboxed as VU

myIndexVector :: VU.Vector Int

atIndices :: (VG.Vector vector a)  => vector a -> VU.Vector Int -> vector a
atIndices vals indices = VG.map (vals VG.!) indices

(This is basically the same as myVector[indicesVector] does in Python's numpy.)

However in the above, VG.map (xs VG.!) indices does not typecheck, beause map requires that the input vector type be the same as the output vector type.

Thus it would be awesome to have a convertingMap function that combines map with convert, of type:

convertingMap :: (Vector v a, Vector w b) => (a -> b) -> v a -> w b

Edit: Solution for this specific function due to @lehins in #373 (comment).


Alternatives considered:

  • convertingMap vals f = VG.map f (VG.convert vals) or VG.convert (VG.map f vals) isn't great, because it requires that v and w can store the same elements, which sometimes isn't the case, e.g. I can have a data D1 that only works with Unboxed and data D2 that only works with Storable.
  • I currently use:
    convertingVectorMap :: (VG.Vector vectorA a, VG.Vector vectorB b) => (a -> b) -> vectorA a -> vectorB b
    convertingVectorMap f va =
      VG.generate (VG.length va) $ \i -> f (va VG.! i)
    but I don't know if that's optimal stream-fusion-wise.
@lehins
Copy link
Contributor

lehins commented Mar 27, 2021

convertingMap is out of the question, because then we will have to create convertingZipWith, convertingUpdate ... the list will go on forever.

The whole point of convert is that it should work well with fusion, so I think this is the proper way to approach this:

atIndices :: (VG.Vector vector a)  => vector a -> VU.Vector Int -> vector a
atIndices xs = VG.map (xs VG.!) . convert

I am pretty sure convert and map will get fused, so there is no need to worry about an extra allocation for vector with indices.
@nh2 Is this what you had in mind?

PS. I wasn't 100% sure because the question has vals vs xs, which seemed like a typo, since there are no visible xs in scope

@nh2
Copy link
Member Author

nh2 commented Mar 27, 2021

PS. I wasn't 100% sure because the question has vals vs xs, which seemed like a typo, since there are no visible xs in scope

Oops right, it was a paste error, I've fixed it now.

Is this what you had in mind?

When specialising to VU.Vector Int, it does work, but the problem of

  • because it requires that v and w can store the same elements, which sometimes isn't the case, e.g. I can have a data D1 that only works with Unboxed and data D2 that only works with Storable.

still exists. In my code base this happens quite a bit, because for some types Storable is easier / more performant, and for others Unboxed.

For example, I cannot use the map . convert approach to implement my own convertingMap that way:

convertingVectorMap :: (VG.Vector vectorA a, VG.Vector vectorB b) => (a -> b) -> vectorA a -> vectorB b
convertingVectorMap f va =
  VG.map f (VG.convert va)

This does not typecheck as expected:

    • Could not deduce (VG.Vector vectorB a) arising from a use of ‘VG.map’

because vectorB cannot store as, and vectorA cannot store bs.

@nh2
Copy link
Member Author

nh2 commented Mar 27, 2021

Just to provide a more concrete example, I may need to convert between e.g. a Storable.Vector OpenCVPixel and an Unboxed.Vector SomeLibraryPixelType.

@lehins
Copy link
Contributor

lehins commented Mar 27, 2021

I see what you mean and what the actual problem is. Definition of map that works with fusion that you need is this:

map :: (Vector v a, Vector v' b) => (a -> b) -> v a -> v' b
{-# INLINE map #-}
map f = unstream . inplace (S.map f) id . Bundle.reVector . stream

Issue is that the Bundle, which is used for fusion, can actually contain a vector in sVector, so without changing it's contents we can't convert a vector type.

We could potentially allow all function in Generic vector to change the type of vector from input to output by using reVector , but that would mean that sVector in Bundle becomes totally useless, which is't very useful already: (see #269) This would be a drastic change that needs some weighing of pros and cons.

One way or another we aren't adding a whole bunch of new functions like convertingMap, that is just way too much overhead. @nh2 but you can already use the map as specified above, since all those functions are available from vector package.

FYI, the suggested solution VG.generate (VG.length va) $ \i -> f (va VG.! i) doesn't work with fusion at all.

@nh2
Copy link
Member Author

nh2 commented Mar 28, 2021

you can already use the map as specified above

That looks great, thanks!

Easy full copy-paste for myself and others:

import qualified Data.Vector.Generic as VG
import qualified Data.Vector.Fusion.Bundle as Bundle
import qualified Data.Vector.Fusion.Stream.Monadic as Stream

-- | Like `VG.map`, but can convert to another vector type.
convertingVectorMap :: (VG.Vector vectorA a, VG.Vector vectorB b) => (a -> b) -> vectorA a -> vectorB b
convertingVectorMap f =
  -- Implemetation from https://github.com/haskell/vector/issues/373#issuecomment-808819064
  VG.unstream . Bundle.inplace (Stream.map f) id . Bundle.reVector . VG.stream
{-# INLINE convertingVectorMap #-}

FYI, the suggested solution VG.generate (VG.length va) $ \i -> f (va VG.! i) doesn't work with fusion at all.

Thanks, that's what I feared!

We could potentially allow all function in Generic vector to change the type of vector from input to output [..], but that would mean that sVector in Bundle becomes totally useless

It would also be backwards-incompatible because while the current functions are certainly correctly expressed by the vector-type-changing ones, the more general type signatures make for looser type inference, creating free type variables that the user now has to pin down.

I wonder if it would make sense to make an experimental Data.Vector.Generic.Very (the name is of course a joke, maybe there's a better one) that uses fully general input/output vector types.

I'll rename the issue to turn it from a feature request for that specific function, into a general discussion of the vector-converting-functions idea.

@nh2 nh2 changed the title Add map function that can convert between vector types Discussion: High-level functions that can convert between vector types Mar 28, 2021
@lehins
Copy link
Contributor

lehins commented Mar 28, 2021

I'll rename the issue to turn it from a feature request for that specific function, into a general discussion of the vector-converting-functions idea.

Cool, I was about to open a new issue for this. Here is my summary:

All functions in Data.Vector.Generic restrict to the same vector type, eg. map:

map :: (Vector v a, Vector v b) => (a -> b) -> v a -> v b

Note that the v is the same in input and output. This can be problematic and a real life example was reported by @nh2 in this ticket

One would think that convert could help use here, as I mistakenly assumed, it is not always the case (see #373 (comment))

It would be useful to provide this ability of changing the type of a vector from input to output, i.e.:

map :: (Vector v1 a, Vector v2 b) => (a -> b) -> v1 a -> v2 b

But it seems like that requires removal of sVector from Bundle. There is some discussion about this topic already in #269

So, this question is for @Bodigrim, @Shimuuar and anyone else that might have an opinion on this topic. Is this a worthwhile goal to pursue and are the benefits worth it?

Thus far I see pros:

  • This really only affects the generic vector API, so breakage would not be big
  • Solves the problem for types that might not have both instances, eg Storable, but not Unbox (essentially fixes teh original problem reported int this ticket)

Cons:

  • Type inference Generic vector API suffers significantly VG.map f . VG.map g no longer composes without specifying intermediate vector type, thus quite a bit of breakage for Generic vector API
  • We have to loose sVector, which might come in handy, but for now it is useless
  • Requires a lot of work.

Alternative solution would be to mirror Generic API in a separate module that implements this idea without dropping sVector and using reVector everywhere instead. (I'd call it Data.Vector.Generic.Poly) This is what @nh2 suggested in the previous comment:

I wonder if it would make sense to make an experimental Data.Vector.Generic.Very (the name is of course a joke, maybe there's a better one) that uses fully general input/output vector types.

@Shimuuar
Copy link
Contributor

I think that making functions in Generic type-changing is very-very breaking change. It makes it impossible to write functions that are polymorphic in vector type. Since type inference is of no help you'll have to annotate everything.

On related note. Currently we can leak implementation details in API. For example:

foo :: (Vector v A, Vector v B, Vector v C)   v A  v C
foo = VG.map (h :: B  C) . VG.filter p . VG.map (f :: A  B)

vector of B should fuse away though it may not so we do need dictionary for Vector v B even though existence of B could be internal implementation detail. Maybe we could add "vector" type that work with any element type and guarantees fusion? Some thin wrapper over Stream?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants