Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Python buffer handling #7125

Merged
merged 8 commits into from
Oct 28, 2022
Merged

Fix Python buffer handling #7125

merged 8 commits into from
Oct 28, 2022

Conversation

steven-johnson
Copy link
Contributor

In the category of "how did this ever work"...

TL;DR: in general, Halide Buffers have the opposite axis ordering from Python/NumPy buffers; in Halide, the most-frequently-varying dimension comes first, while in Python, it comes last. This isn't surprising, though, since Halide's indexing scheme is effectively column-major while NumPy's is row-major.

Anyway: what we should have done was to reverse the order of dimensions when converting to/from Halide Buffers vs Python buffers; instead, we kept the same order, then jumped thru hoops to rearrange buffers to fit this setup. This PR does the appropriate axis reordering, fixing the apps and tests as needed.

It also adds some helper code for image reading and writing; by default, we use imageio for this, but imageio ~always wants RGB/RGBA images to be interleaved (vs the planar that Halide prefers). So, I added the halide.imageio package, that has wrapper functions to quietly convert to/from planar as needed.

Needless to say, this change is likely to break existing code that is using 3d buffers in Halide, but I think it's the right long-term thing to do.

Opinions greatly welcomed here.

In the category of "how did this ever work"...

TL;DR: in general, Halide Buffers have the opposite axis ordering from Python/NumPy buffers; in Halide, the most-frequently-varying dimension comes first, while in Python, it comes last. This isn't surprising, though, since Halide's indexing scheme is effectively column-major while NumPy's is row-major.

Anyway: what we *should* have done was to reverse the order of dimensions when converting to/from Halide Buffers vs Python buffers; instead, we kept the same order, then jumped thru hoops to rearrange buffers to fit this setup. This PR does the appropriate axis reordering, fixing the apps and tests as needed.

It also adds some helper code for image reading and writing; by default, we use `imageio` for this, but imageio ~always wants RGB/RGBA images to be interleaved (vs the planar that Halide prefers). So, I added the `halide.imageio` package, that has wrapper functions to quietly convert to/from planar as needed.

Needless to say, this change is likely to break existing code that is using 3d buffers in Halide, but I think it's the right long-term thing to do.

Opinions greatly welcomed here.
@steven-johnson steven-johnson added backport me This change should be backported to release versions release_notes For changes that may warrant a note in README for official releases. python Issues related to Halide/Python interop labels Oct 26, 2022
for (int i = 0; i < d; i++) {
// Halide's default indexing convention is col-major (the most rapidly varying index comes first);
// Numpy's default is row-major (most rapidly varying comes last).
// We want to reverse the order so that most-varying comes first, for better vectorization.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove "for better vectorization" here and below.

Yes, we want to vectorize across the most-rapidly-varying dimension, which is the first in Halide's convention, but we could vectorize across the most rapidly varying dimension no matter where it showed up in the arg list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@abadams
Copy link
Member

abadams commented Oct 26, 2022

This makes sense to me. Halide is by convention col-major, and numpy is by convention row-major, though both support other layouts. By default you would expect my_halide_func[x, y] == my_halide_buffer[x, y] == my_numpy_array[y, x]

This makes it in keeping with the C backend, for which buffer(x, y) == c_array[y][x]

Would be curious to hear from @jiawen as an outside sanity check if he has a minute to weigh in.

@jiawen
Copy link
Contributor

jiawen commented Oct 26, 2022

I have similar code at Adobe that also uses Pybind11 - but it's for @dsharlet's array instead of Halide buffers. I'll get around to open sourcing it at some point.

The API I ended up deciding on is split into two parts: generic ndarrays, and images, which have special treatment.

For ndarrays:
On the PyBind side, to construct a Halide::Runtime::Buffer, I pass in a py::buffer (or py::array_ for the strongly-typed version) along with a bool reverse_axes = true. I don't have a separate bool make_copy parameter because for nda has an explicitly non-owning array_ref class. There is a similar bool reverse_axes for constructing instances of np.ndarray.

Here's the abstraction: most Python code does not call this directly. I instead expose a separate Python interface with make_ndarray(src: NativeArrayRef, reverse_axes = True, copy = False). It does what you expect.

The idea is that reversing axes is the common case that preserves the expected strides. In the default allocation in both np and Halide (sans slicing, cropping, etc), the indexing convention reflects those strides (innermost last in Python, innermost first in Halide).

For images:

It's easier to paste the code than describe what it does:

def make_ndarray(im_ref, copy=False):
  """Makes an ndarray that is a view or a copy of an nda.ImageRef's contents.

  Args:
    im_ref: the source "nda.ImageRef". It can be Chunky, Planar, or generic (no
      prefix) and the datatype must be one of the supported datatypes.
    copy: whether the returned ndarray is a view or a copy.

  Returns:
    An ndarray that is a view or a copy of the same memory as `im_ref`.
  """
  im_np = np.array(im_ref, copy=copy)
  return np.transpose(im_np, (1, 2, 0))

im_ref is an opaque "image ref" - a handle to a Halide::Runtime::Buffer in your case. It's indexed in C++ as (x, y, c) as usual but there is no Python API for it. To access the bytes, call image.make_ndarray on it to get a view or a copy of the data. Since you're calling image.make_ndarray instead of array.make_ndarray, it does the image-specific 3-axis transpose rather than reversing axes.

Effectively, the approach I chose was "when converting, make it so that in the target language, the indexing is using the usual conventions).

if im.ndim == 3 and not is_interleaved(im):
im = numpy.moveaxis(im, 0, 2).copy()

return im
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I highly recommend returning a copy in either case. Otherwise you'll have odd behavior if the caller relies on the copy behavior but things start mutating when later loading in a different file that's already interleaved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got annoyed enough at our own code to start naming these things interleaved_copy().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise you'll have odd behavior if the caller relies on the copy behavior

Good catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL

@steven-johnson
Copy link
Contributor Author

I have similar code at Adobe that also uses Pybind11 - but it's for @dsharlet's array instead of Halide buffers. I'll get around to open sourcing it at some point.

(Threadjack: as I've mentioned in an issue elsewhere, we really need to add support for GPU-backed buffers in our python bindings (perhaps via DLPack?), but it didn't make the cut for Halide 15 because we had to stop somewhere; when work starts on that, def want to hear your thoughts on the matter)

@steven-johnson
Copy link
Contributor Author

@jiawen So, if I understand correctly: your code doesn't do implicit conversions from YourEquivalentOfHalideBuffer into ndarray (or py::_bufferinfo, presumably); instead, you require an explicit conversion using make_ndarray() or similar (which allows for flexibility of conversion)?

If that's the case... I think I like it (would need to experiment more to be sure)... but I suspect it's too late in the game to adopt this for Halide 15. (We intend to release once we are confident the Python-related stuff is stable, and that the Python tutorials / examples are up to date).

That said, I wonder if it would be appropriate to do something like:

  • for the hl.Buffer(pybuffer_info, name) ctor, add reverse_axes = true as an optional trailing arg
  • add an explicit conversion method to hl.Buffer() that offers the reverse_axes and copy arguments as an option. (It probably wouldn't be make_ndarray() as our Python bindings don't currently have a hard dependency on NumPy, but conversion to py::buffer or py::bufferinfo ought to be adequate).

@steven-johnson
Copy link
Contributor Author

Update: I added the reverse_axes option in s separate, dependent PR, so that it would be easier to review; see #7127 and LMK

@steven-johnson
Copy link
Contributor Author

Thanks -- want to review #7127 as well?

* add 'reverse_axes' options to Buffer conversions
@steven-johnson steven-johnson merged commit 2f1587e into main Oct 28, 2022
@steven-johnson steven-johnson deleted the srj/pybuf-axis-ordering branch October 28, 2022 00:18
steven-johnson added a commit that referenced this pull request Oct 28, 2022
* Fix Python buffer handling

In the category of "how did this ever work"...

TL;DR: in general, Halide Buffers have the opposite axis ordering from Python/NumPy buffers; in Halide, the most-frequently-varying dimension comes first, while in Python, it comes last. This isn't surprising, though, since Halide's indexing scheme is effectively column-major while NumPy's is row-major.

Anyway: what we *should* have done was to reverse the order of dimensions when converting to/from Halide Buffers vs Python buffers; instead, we kept the same order, then jumped thru hoops to rearrange buffers to fit this setup. This PR does the appropriate axis reordering, fixing the apps and tests as needed.

It also adds some helper code for image reading and writing; by default, we use `imageio` for this, but imageio ~always wants RGB/RGBA images to be interleaved (vs the planar that Halide prefers). So, I added the `halide.imageio` package, that has wrapper functions to quietly convert to/from planar as needed.

Needless to say, this change is likely to break existing code that is using 3d buffers in Halide, but I think it's the right long-term thing to do.

Opinions greatly welcomed here.

* Update PyBuffer.cpp

* -"for better vectorization"

* public halide.imageio utilities should copy() buffers

* PEP8

* Update imageio.py

* Update imageio.py

* add 'reverse_axes' options to Buffer conversions (#7127)

* add 'reverse_axes' options to Buffer conversions
steven-johnson added a commit that referenced this pull request Oct 28, 2022
* Require Python 3.8+ in CMake build (#7117)
* Add range-checking to Buffer objects in Python (#7128)
* Fix Python buffer handling (#7125)
* add 'reverse_axes' options to Buffer conversions (#7127)
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
* Fix Python buffer handling

In the category of "how did this ever work"...

TL;DR: in general, Halide Buffers have the opposite axis ordering from Python/NumPy buffers; in Halide, the most-frequently-varying dimension comes first, while in Python, it comes last. This isn't surprising, though, since Halide's indexing scheme is effectively column-major while NumPy's is row-major.

Anyway: what we *should* have done was to reverse the order of dimensions when converting to/from Halide Buffers vs Python buffers; instead, we kept the same order, then jumped thru hoops to rearrange buffers to fit this setup. This PR does the appropriate axis reordering, fixing the apps and tests as needed.

It also adds some helper code for image reading and writing; by default, we use `imageio` for this, but imageio ~always wants RGB/RGBA images to be interleaved (vs the planar that Halide prefers). So, I added the `halide.imageio` package, that has wrapper functions to quietly convert to/from planar as needed.

Needless to say, this change is likely to break existing code that is using 3d buffers in Halide, but I think it's the right long-term thing to do.

Opinions greatly welcomed here.

* Update PyBuffer.cpp

* -"for better vectorization"

* public halide.imageio utilities should copy() buffers

* PEP8

* Update imageio.py

* Update imageio.py

* add 'reverse_axes' options to Buffer conversions (halide#7127)

* add 'reverse_axes' options to Buffer conversions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport me This change should be backported to release versions python Issues related to Halide/Python interop release_notes For changes that may warrant a note in README for official releases.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants