Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tighten alignment promises for halide_malloc() #7222

Merged
merged 1 commit into from
Dec 11, 2022

Conversation

steven-johnson
Copy link
Contributor

This makes a couple of changes to the behavior/implementation of halide_malloc():

  • Currently, halide_malloc must return a pointer aligned to the maximum meaningful alignment for the platform for the purpose of vector loads and stores. This PR also adds the requirement that the memory returned must be legal to access in an integral multple of alignment >= the requested size (in other words: you should be able to do vector load/stores "off the end" without causing any faults).

  • Currently, the halide_malloc_alignment() function is used to determine the default alignment; this cannot be overridden by user code (well, it can be, but the override will have no useful effect). It is intended to be "internal only" but is used in at least one place outside the runtime (apps/hannk). This change removes the call entirely, in favor of a call that is harder to access from outside the runtime and much less likely for end users to attempt to call. (It also changes apps/hannk to stop using it.)

This makes a couple of changes to the behavior/implementation of `halide_malloc()`:

* Currently, halide_malloc must return a pointer aligned to the maximum meaningful alignment for the platform for the purpose of vector loads and stores. This PR also adds the requirement that the memory returned must be legal to access in an integral multple of alignment >= the requested size (in other words: you should be able to do vector load/stores "off the end" without causing any faults).

* Currently, the `halide_malloc_alignment()` function is used to determine the default alignment; this cannot be overridden by user code (well, it can be, but the override will have no useful effect). It is intended to be "internal only" but is used in at least one place outside the runtime (apps/hannk). This change removes the call entirely, in favor of a call that is harder to access from outside the runtime and much less likely for end users to attempt to call. (It also changes apps/hannk to stop using it.)
@steven-johnson
Copy link
Contributor Author

This PR looks clean & ready to land, PTAL

@zvookin
Copy link
Member

zvookin commented Dec 9, 2022

Is the aligned length actually a requirement we expect in the compiled code?

@steven-johnson
Copy link
Contributor Author

Is the aligned length actually a requirement we expect in the compiled code?

In Halide code, no, not yet. But this should enable us to do so down the road.

* end.
* vector loads and stores, *and* with an allocated size that is (at least)
* an integral multiple of that same alignment. The default implementation
* uses 32-byte alignment on arm and 64-byte alignment on x86. Additionally,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we decided this "additionally" sentence wasn't necessary now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have evidence to the contrary; I have at least one test case were we access 4 bytes before the allocation start. (Your pending work should fix this, but until it lands, the comment is still accurate.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My conclusion from examining CodeGen_LLVM's load handling was that the offset downwards could only every improve alignment relative to the allocation base, so underreads were impossible. It's possible that the underread you found is a different bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or my reading was wrong - it's tricky code. One way to check is to see if my strided load branch heals the underread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go back and doublecheck, but we are definitely underreading in this particular case (verified by ASAN); whether or not its an unrelated bug, we want to fix it. (Not sure how hard a repro case will be....)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's tricky code

All the more reason to get rid of it :-)

@steven-johnson steven-johnson merged commit 6ecdcbd into main Dec 11, 2022
@steven-johnson steven-johnson deleted the srj/aligned-malloc-3 branch December 11, 2022 18:05
ardier pushed a commit to ardier/Halide-mutation that referenced this pull request Mar 3, 2024
This makes a couple of changes to the behavior/implementation of `halide_malloc()`:

* Currently, halide_malloc must return a pointer aligned to the maximum meaningful alignment for the platform for the purpose of vector loads and stores. This PR also adds the requirement that the memory returned must be legal to access in an integral multple of alignment >= the requested size (in other words: you should be able to do vector load/stores "off the end" without causing any faults).

* Currently, the `halide_malloc_alignment()` function is used to determine the default alignment; this cannot be overridden by user code (well, it can be, but the override will have no useful effect). It is intended to be "internal only" but is used in at least one place outside the runtime (apps/hannk). This change removes the call entirely, in favor of a call that is harder to access from outside the runtime and much less likely for end users to attempt to call. (It also changes apps/hannk to stop using it.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants