Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shape5D with floats? #11

Open
k-dominik opened this issue Dec 4, 2020 · 9 comments
Open

Shape5D with floats? #11

k-dominik opened this issue Dec 4, 2020 · 9 comments

Comments

@k-dominik
Copy link
Contributor

k-dominik commented Dec 4, 2020

Shape5D seems to be a shape that has floating point number extents - fine. to_tuple will cast those to integers. Also in some other places it looks weird allowing floating point extents. Take is_scalar for example. How do you even define this with floats? Is something with a c extent of 1.9 still scalar?! And what would this even mean in channel dimension...
Also think of a shape with integers size along x=9.0, is 9 included [probing at 9.0 should be possible] (like it would be in a continuous world) or excluded (in the discrete world [probing not possible at this position]).

What was your motivation? Were you concerned with world coordinates? Then it might be better to allow for a transformation to world coordinates. Where do you need that?

@Tomaz-Vieira
Copy link
Collaborator

I had no intentions of actually allowing non-integers in either Shape5D or Point5DThe motivation for using floats is the ability to use +inf and -inf. You'll see that Shape5D is a subclass of Point5D, and that Point5D's constructor will not allow values with decimal points; all that is allowed is integers,inf or -inf.

This is needed to streamline the implementation of Slice5D; Whenever you create a Slice5D, all unspecified axes will default to slice(None) (i.e., "everything"), which mimic's numpy's slicing behavior. Also, most of Slice5D methods operate on its boundaries, that is, my_slice.start and my_slice.stop, which are themselves Point5Ds.

A slice(None) in any axis of a Slice5D will be interpreted as having start = -inf and stop = +inf. And because of that, methods such as contains, clamped, enlarged, translated, etc, don't have special cases; inf and -inf work just like any other number, and produce correct results (for example, calling Slice5D.enlarge on a Slice5D that has some of its axes set to slice(None) will correctly "move" start from -inf to -inf and stop from +inf to +inf).

Clearly this was not an obvious design decision, though =/ I'd be happy to hear any ideas of more elegantly solving this =)

@k-dominik
Copy link
Contributor Author

k-dominik commented Dec 7, 2020

Hm. I don't think allowing +inf -inf is necessary to represent shapes. I guess it's your way of constructing a shape from start to stop that requires it. However, a shape itself should not have any axis with negative dimensions. What would this mean?

I have seen that Shape5D is a Point5D subclass. Maybe that is not appropriate then. Or Point should be actually of Type T and in case of shape, it's an integer. Open shapes should not be allowed.

This is needed to streamline the implementation of Slice5D; Whenever you create a Slice5D, all unspecified axes will default to slice(None) (i.e., "everything"), which mimic's numpy's slicing behavior. Also, most of Slice5D methods operate on its boundaries, that is, my_slice.start and my_slice.stop, which are themselves Point5Ds.

Why invent something here if the slice(None) behavior is already an established concept? Also in a context of a shape it would only go from 0 to +Size of that thing not -something.... You are using slices internally, which have .start .stop and .step members that can be checked against None. Methods like contains can be realized with this - no problem. slice.start would be the sl.start values of the slices per axis. It's a slice5D. And in case of "None" it's not defined. Since it is not defined. You added this trick of allowing floats here to pretend that it exists. But it shouldn't.

I think it's dangerous to sacrifice clarity of annotations. For me they are counter intuitive in this case.

I see that slice(None) is not defined until used to index something existing. What one can be sure of though, is that it actually is not inf.

@k-dominik
Copy link
Contributor Author

k-dominik commented Dec 7, 2020

And what about having negative start indices (valid slices), e.g. slice(-5, 10, 1). The start of this thing is not -5 in a "Point in space" kind of meaning. It is only finally defined when applied to a shape...

e.g.

sl = ndstructs.Slice5D(t=slice(-1, 10), z=slice(-5, 20), y=slice(-5, 20), x=slice(-5, 20), c=slice(0, 10))
sl.shape
# Shape5D(t:11.0,c:10.0,x:25.0,y:25.0,z:25.0)

This is all valid Python. But the result of sl.shape is wrong. I'd argue that a method like shape would need a parameter of type shape, too. Same goes for start, stop... those are just not defined with a slice.

@k-dominik
Copy link
Contributor Author

Also I'd argue that Slice5D.split and it's friend .tiles should exist on a shape, not on a Slice5D. (and, maybe different story, this should be a single inter_blockwise method, with an appropriate parameter to handle boundaries.

@Tomaz-Vieira
Copy link
Collaborator

As I read further down your comments, I realized that we were operating under different assumptions/semantics. So here's what Slice5D really represents:

  • A Slice5D defines a region/interval in 5D space. You can use that slice to cut an Array5D or DataSource, and you'll get back the elements that were contained in such space as defined by the Slice5D;
    • because of that, start <= stop must always hold. This is undocumented and probably not checkd in runtime either, which is bad -.-
  • Being a region in space, start and stop really mean "Points in space", even for negative values. E.g.: Slice5D(x=slice(-5, 10)) actually means "the 5D space between x=-5 and x=10" and not the usual python meaning of counting indices from the end of the array;
  • Whenever one of the axis of the Slice5D is a slice(None) I interpret it to mean "This Slice5D encompasses the entirety of space in this dimension", hence the usage of inf;
    • As a consequence of that, Slice5D(x=slice(None)).contains( Slice5D(x=slice(-99999, 99999)) )
  • By defining a region in space, a Slice5D naturally has a shape. Such shape can have components that are inf if the Slice5D is unbounded in that axis;

One usage of this concept that justifies this interpretion of Slice5D is requesting tiles from a DataSource and using a halo:

my_roi = Slice5D(x=slice(0, 10), y=slice(0, 10))
roi_with_halo = my_roi.enlarged(radius=Point5d(x=5, y=5)) # adds halo, 
assert roi_with_halo == Slice5D(x=slice(-5, 15), y=slice(-5, 15)) # roi_with_halo goes out of bounds for a tile on the edge

# my_datasource will decide how to fill in the out-of-bounds roi; reflect, zeros, whatever
data = my_datasource.retrieve(roi_with_halo) 
assert data.shape == roi_with_halo == Shape5D(x=20, y=20)

@k-dominik
Copy link
Contributor Author

* A `Slice5D` defines a region/interval in 5D space. You can use that slice to cut an `Array5D` or `DataSource`, and you'll get back the elements that were contained in such space as defined by the `Slice5D`;

Maybe a different name should be considered then, something like Block5D (or as you suggested Interval5D). Slice has a meaning in Python. It doesn't have a shape, negative values mean something different than negative coordinates in space and so on.
I think the code example is already showing it. Slice5D(x=slice(-5, 15), y=slice(-5, 15)) is something that has not the intuitive meaning.

I'd argue against allowing undefined instances of this thing and rather have e.g. Shape5D be able to produce those for it's given, valid, shape. Or only an Indexable5D or something.

Slice5D should behave like tagged slices, no automagic. this should all be handled by the objects that are sliced, not the slice itself.

@k-dominik
Copy link
Contributor Author

I'm not saying that the current implementation couldn't actually work for what I want to do :). However, I'm reluctant to use it.

I'll try to summarize this a bit:

  • This lib makes extensive use of type annotations throughout this code base, which is great. These things are supposed to help users of this library though, and while it is not the intention to use floats in there, the type annotations tell you a different story.
  • start and stop can be represented by points in an 5D space. Then you will have to define once what that means, exactly. Are start/stop inclusive, exclusive? Does this block magically cut off one unit in every direction earlier? What is the volume of this block?
  • Usage of floats allows to use +inf -inf to represent open intervals in terms of Points in a 5D space. So I guess Array5D will then check if the starting point is bigger than negative inf, and in this case will do some magic to do padding, if it's -inf it will not do magic and return only the bit of data starting from zero. So:
sl1 = Slice5D.zero(x=slice(None))
sl2  = Slice5D.zero(*x=slice(-9999, 9999))
# so
sl1.contains(sl2) == True
# but the data returned by indexing with sl1 will not contain
# the data that is retrieved with indexing with sl2
  • The slices can be used for negative indexing of, say arrays with different numbers of channels. Right now I cannot think of an operation that would pad channels. So this would be an error then I guess. Or what? However, the slice would still be valid and even have a shape if I say Slice5D.zero(c=slice(-100, 1)). As a remedy there is clampedanddefined_with`. But why does the slice tell me how to be interpreted for a certain valid datasource/array only taking it's shape, shouldn't this be the responsibility of the array? What if the array doesn't allow padding at all? What if it does?
  • shape returns a shape in terms of floating point numbers. With an open interval it throws an assertion error. So did I use this class correctly? Was this object actually valid, or did the constructor allow me to instantiate a "broken" object where a property does not work.

How to make this more usable for Python developers?

A solution, should:

  1. not interpret Python slices in any way different than Python would do.
  2. Communicate clearly that it is using extended array coordinates. Extended in a sense that negative indices mean something completely different than negative indices in any other listlike/ndarray.
  3. Really not call it slice if it does not behave like one.
  4. Accept the fact that a box that allows for open intervals can not define a shape, but only in conjunction with something that is indexed the final shape can be determined. Open intervals are extremely useful, so they should stay.
  5. More than just the shape of a potential datasource / array determine the actual shape of a "box" once it's used for indexing. Can the array pad certain axes? and so on.
  6. any iteration over subparts can only happen once it is clear what the total extent is, so should either not be responsibility of 5DBox, but array/datasource, or should need one of those in order to deliver such an iterator. This of course is inconvenient, if one only wants to use the slices in conjunction with, let's say, a numpy array. But then again, should this be allowed, since negative indices are treated completely different?
  7. communicate to the user clearly via type annotations what they are allowed and expected to put into such an object.

@Tomaz-Vieira
Copy link
Collaborator

Tomaz-Vieira commented Dec 7, 2020

start and stop can be represented by points in an 5D space.

Yes. This is supremely convenient for expressing the halos thing. There are no edge cases if we treat negative coordinates as points in space.

Then you will have to define once what that means, exactly. Are start/stop inclusive, exclusive?

Start inclusive, stop exclusive. That's why I was using slice, to leverage those conventions.

Does this block magically cut off one unit in every direction earlier? What is the volume of this block

What do you mean by "cut off one unit in every direction"? I also don't see an issue with defining a volume/hypervolume/whatever. I do have methods for that, but I could remove them and just let the users multiply whatever axes they are interested in.

With an open interval [.shape] throws an assertion error. So did I use this class correctly? Was this object actually valid, or did the constructor allow me to instantiate a "broken" object where a property does not work.

You got me there =) This is also something that deserves more thought. I also don't like the fact that the shape property breaks on open intervals. When I wrote it, I wasn't allowing a Shape5D to have inf axes, but I could do it now. More on that later.

sl1 = Slice5D.zero(x=slice(None))
sl2 = Slice5D.zero(*x=slice(-9999, 9999))
# so
sl1.contains(sl2) == True
# but the data returned by indexing with sl1 will not contain
# the data that is retrieved with indexing with sl2

Also a very good point 👍 It is indeed super weird that sl1.contains(sl2) but produces a smaller Array5D once it slices a DataSource. I'm not sure if there's an elegant way to solve this.

Open intervals are extremely useful, so they should stay.

Agreed. I have once considered differentiating slices into something like OpenSlice5D and ClosedSlice5D, so maybe that idea is worth revisiting. I have found that open intervals are extremely convenient to have, but they do come with all those shortcomings =/

  1. not interpret Python slices in any way different than Python would do.
  2. Communicate clearly that it is using extended array coordinates. Extended in a sense that negative indices mean something completely different than negative indices in any other listlike/ndarray.
  3. Really not call it slice if it does not behave like one.

Also agree with that.

Accept the fact that a box that allows for open intervals can not define a shape

Why not?. Why wouldn't the shape of Slice5D(x=slice(15, 25)) be Shape5D(x=10, y=inf, z=inf, t=inf, c=inf) or Shape5D(x=10, y=None, z=None, t=None, c=None)? I guess it's questionable if a Shape5D with infinite (or undefined) values in some axis is useful, but it is at least consistent. Or maybe we're just arguing semantics again and this would sound better if it were called Domain5D or Extent5D or something like that.

communicate to the user clearly via type annotations what they are allowed and expected to put into such an object.

I really like obsessing with type annotations, but there is only so much one can do with those, specially if your types are numbers or some other very fundamental concepts; For example, you're not supposed to index a list with numbers bigger than its length, but there is no reasonable way to express that as a type, nor to statically check it if you did; list.__get_item__ takes a slice, but it has no way to communicate what kinds of slice will work with it. In a sense, we could say that ints are broken because you can't call the "divide" method with the parameter 0, which is also a valid int.

Similarly, say that I replace all slices with a custom Interval(int, int) class; This class is still annotated with 2 ints, but the user must set the second one to something greater than the first one, and that also can't be specified by the type system. That being said, I think we could get away with Optional[int], as mentioned above.

@k-dominik
Copy link
Contributor Author

k-dominik commented Dec 8, 2020

Only clarifying where you asked more questions:

Start inclusive, stop exclusive. That's why I was using slice, to leverage those conventions.

this is basically the only property of slice you (can) use. No strides, negative indices have a different meaning, no inf....

Does this block magically cut off one unit in every direction earlier? What is the volume of this block

What do you mean by "cut off one unit in every direction"? I also don't see an issue with defining a volume/hypervolume/whatever. I do have methods for that, but I could remove them and just let the users multiply whatever axes they are interested in.

The problem is, if I have a region defined by two points, floating point numbers, then this first of all does not imply any natural increment. Furthermore, a rectangle that starts at (0., 0.), ends at (10., 10.) intuitively should have 100 as an area, with your definition, and the magic decrement of one (in floats) has an "area" of 81 (or 81 elements).

really like obsessing with type annotations, but there is only so much one can do with those, specially if your types are numbers or some other very fundamental concepts;

but types. It's really number types that are wrong and set the wrong expectations there.

What is so bad about a coordinate being None for that matter? Implementation then would explicitly handle this case and would send this message - None is special. It is not -inf nor +inf - it's not defined. Also for axes like channel -inf does not make the a lot of sense in practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants