ENH: Enable huge pages in all Linux builds #14216
numpy/core/src/multiarray/alloc.c (outdated)
```c
// Use code 14 (MADV_HUGEPAGE) if it isn't defined. This gives
// a chance of enabling huge pages even if built with a Linux
// kernel < 2.6.38
madvise((void*)((npy_uintp)p + offset), length, 14);
```
Why not just define `MADV_HUGEPAGE` and let it go through a more similar code path?
Yes, thanks for the good idea, let me fix that.
Done in the latest commits, let me know if that's ok with you.
ok with me means nothing, just a suggestion ;)
Well, if it's ok with you, it means that you probably don't disagree nor have an idea to make it even cleaner. :)
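The suggestion above, supplying the constant ourselves when the headers don't define it, can be sketched from Python with `ctypes`. The value 14 and the 4 MiB buffer size mirror the patch; the rest is illustrative, not NumPy's actual code:

```python
import ctypes
import mmap

# Fall back to 14, the value MADV_HUGEPAGE has on Linux, when this Python
# was built against headers that don't define it (mirrors the C fallback
# #define discussed above).
MADV_HUGEPAGE = getattr(mmap, "MADV_HUGEPAGE", 14)

libc = ctypes.CDLL(None, use_errno=True)  # resolve madvise from libc
buf = mmap.mmap(-1, 4 * 1024 * 1024)      # anonymous, page-aligned mapping
addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))

# Optimistically ask for huge pages; on kernels < 2.6.38 this returns -1
# with errno EINVAL, which is deliberately ignored.
rc = libc.madvise(ctypes.c_void_p(addr), len(buf), MADV_HUGEPAGE)
```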
```diff
@@ -76,14 +83,7 @@ _npy_alloc_cache(npy_uintp nelem, npy_uintp esz, npy_uint msz,
     if (NPY_UNLIKELY(nelem * esz >= ((1u<<22u)))) {
         npy_uintp offset = 4096u - (npy_uintp)p % (4096u);
         npy_uintp length = nelem * esz - offset;
 #ifdef MADV_HUGEPAGE
         madvise((void*)((npy_uintp)p + offset), length, MADV_HUGEPAGE);
```
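To make the hunk's pointer arithmetic concrete, here is a Python mirror of the offset/length computation (the function name is mine; note the code advances a full 4096 bytes even when `p` is already aligned, exactly as the C does):

```python
PAGE = 4096  # matches the hard-coded 4096u in the patch

def hugepage_span(p, nbytes):
    """Mirror _npy_alloc_cache: skip to the next 4 KiB boundary and
    advise the rest of the allocation (only done when nbytes >= 4 MiB)."""
    offset = PAGE - p % PAGE          # 4096 even for an aligned p, as in the C
    return p + offset, nbytes - offset

start, length = hugepage_span(0x1001, 1 << 22)  # 4 MiB block at address 0x1001
```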
I think you should add a comment here that we explicitly do not check the return code of madvise, because we want to optimistically ask for HUGEPAGE even in the case that an old kernel is being used.
Agreed. I added that comment in the latest commit.
Is there a test we can run to see if this is actually working? I'm trying to build numpy on my machine and it is unclear if madvise is having an effect.
There are only two ways I found I could test: adding a …
```diff
@@ -25,10 +25,14 @@

 #include <assert.h>

-#ifdef HAVE_SYS_MMAN_H
+#ifdef NPY_OS_LINUX
```
Is `NPY_OS_LINUX` a true superset of `HAVE_SYS_MMAN_H`? Just curious.
`<sys/mman.h>` dates back to Unix, and I've confirmed that it exists at least since kernel 2.6.0 (possibly before), so I think it's a safe bet.
Sorry, I mean that `NPY_OS_SOLARIS` or `NPY_OS_DARWIN` may currently satisfy `HAVE_SYS_MMAN_H` but are not covered under `NPY_OS_LINUX`.
That's right, but we're only including `<sys/mman.h>` for `MADV_HUGEPAGE`, and, AFAIK, that's a feature only available on Linux.
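That Linux-only availability can be seen from Python too: the `mmap` module only exposes `MADV_HUGEPAGE` where the platform's `<sys/mman.h>` defines it. A small illustrative probe (the function name is mine):

```python
import mmap
import sys

def hugepage_advice_available():
    """True only where <sys/mman.h> defines MADV_HUGEPAGE -- in practice
    Linux; Darwin and Solaris have <sys/mman.h> but not this flag."""
    return sys.platform.startswith("linux") and hasattr(mmap, "MADV_HUGEPAGE")
```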
Is there something we can do to test with …
I tried checking …
The increased usage of … Thanks for working through this.
I think fragmentation will happen anyway for sufficiently large memory objects. I'm not sure I follow why that is relevant for a test, though. The only thing I was pointing out was that an alternative (probably not a great one) was to check that the array actually gets flagged as hugepage, which we could do by reading …
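For reference, the kind of check being described might look like this; the specific file and field (`/proc/self/smaps`, `AnonHugePages`) are my guess at what one would read, not something stated in the thread:

```python
import os

def anon_hugepages_kib(smaps_text):
    """Sum all AnonHugePages fields of an smaps dump (values are in kB)."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("AnonHugePages:"):
            total += int(line.split()[1])
    return total

if os.path.exists("/proc/self/smaps"):  # Linux only
    with open("/proc/self/smaps") as f:
        print("AnonHugePages total:", anon_hugepages_kib(f.read()), "kB")
```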
@pentschev Do you feel happy with this? I'm a bit wary about making the backport, but if this is unlikely to break anything -- and I'm not sure how we find out without a release -- I'll do so.
It worked well in my tests @charris, but that's really one setup among many that could exist. I also think it's unlikely to break anything; in the event that it does, I think it will be caught very easily, since all memory allocation will then be broken. If this could be in the next release, I would be very happy with that! :)
I removed the backport label, as it makes me a bit more comfortable to have the rc testing period in 1.18. If no one complains, I'll put this in soon.
I would certainly prefer to have this backported at least to 1.17. We have some use cases where this gives us a major performance boost that we otherwise won't be seeing for the next 5-6 months until 1.18 gets released.
I'm always willing to gamble and relearn why it is a bad idea. But this looks pretty safe, so let's give it a try in 1.17.
```c
/*
 * Use code 14 (MADV_HUGEPAGE) if it isn't defined. This gives a chance of
 * enabling huge pages even if built with linux kernel < 2.6.38
 */
```
Just to clarify, this means that even if compiled on a machine running an old kernel, it will enable huge pages when run on a newer kernel?
That's correct. An attempt to enable it will be made, and if it errors out, the hugepage request is simply ignored.
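That "try and ignore the error" strategy is easy to demonstrate with Python's own `mmap.madvise` wrapper. This is a sketch of the strategy, not NumPy's actual code path:

```python
import mmap

buf = mmap.mmap(-1, 4 * 1024 * 1024)  # anonymous 4 MiB mapping
try:
    # Raises AttributeError when Python was built without the constant, and
    # OSError (EINVAL) at runtime on kernels < 2.6.38 -- both simply ignored.
    buf.madvise(mmap.MADV_HUGEPAGE)
except (AttributeError, OSError):
    pass  # fall back silently to regular 4 KiB pages
```

Either way the mapping stays fully usable; only the page size backing it may differ.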
@pentschev Have you tested this running on old kernels?
Checking... CentOS 5.11, used for building wheels, had kernel 2.6.18, so I guess we will know soon enough if this works. @pentschev There should be a nightly wheel built sometime in the next 24 hours that has this in it, so we can test it. The wheels can be found at https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com/
I haven't tested that, but according to kernel documentation, simply ignoring the error should still give us the old behavior.
That's great, please let me know how it goes. Should anything go wrong I can work on a fix for that.
Just checked, still not available, but I will watch for that and try it out once it's up. Thanks for the link @charris!
I went ahead and triggered a wheels build for master. How long it takes varies; you can check it at https://travis-ci.org/MacPython/numpy-wheels.
@pentschev New wheels are up.
@charris I see NumPy 1.18.0.dev0 wheels, but were we supposed to have a 1.17.1 as well? Either way, it works well on newer kernels at least; I'm using 4.15.0-58 (Ubuntu 16.04) for the results below:

```python
In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.17.0'

In [3]: %%time
   ...: a = np.ones(int(1024**3), dtype='u1')
   ...:

CPU times: user 131 ms, sys: 384 ms, total: 514 ms
Wall time: 512 ms
```

```python
In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.18.0.dev0+950dd4e'

In [3]: %%time
   ...: a = np.ones(int(1024**3), dtype='u1')
   ...:

CPU times: user 24 ms, sys: 201 ms, total: 225 ms
Wall time: 223 ms
```

The second one is much faster, as expected. :)
I used the Python 3.7 package for the result in my previous comment.
I only triggered the master build, so …
I went ahead and backported to 1.16.x as well, can't hurt.
Was there ever any followup on the fragmentation? After a few days of digging, this patch absolutely tanks performance on our workloads. Toggling hugepages off clears the issue entirely, and the workloads run in parallel with almost linear scaling.
@Plantain My understanding was that there were early problems with fragmentation that were fixed in later Linux kernels. However, it is certainly possible that your setup exposes new problems. What kernel version do you have?
5.3.0.
That is pretty recent... Googling around, there seems to be a fair amount of ongoing work on the fragmentation problem, as well as various ways to disable hugepages that may be distro dependent.
@Plantain it seemed mainly an issue on old kernels, so we only rolled it back on those (but only in 1.19, IIRC). We also have an environment variable to disable madvise hugepages: https://numpy.org/doc/1.19/reference/global_state.html?highlight=global That said, I was personally never quite sure what the tradeoffs are here.
I think this will depend a lot on OS configuration, and we'll probably not find a one-size-fits-all, unfortunately. Having variables such as …
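One such knob is sketched below. The variable name `NUMPY_MADVISE_HUGEPAGE` is taken from the global-state documentation linked above, and it must be set before NumPy is first imported; treat the details as an assumption about your NumPy version:

```python
import os

# Must happen before the first "import numpy"; "0" disables the madvise
# call, "1" forces it on (name per the numpy global-state docs linked above).
os.environ["NUMPY_MADVISE_HUGEPAGE"] = "0"

import numpy as np

a = np.ones(1024 * 1024, dtype="u1")  # allocated without the hugepage advice
```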
This PR modifies memory allocation in such a way that huge pages for large arrays are enabled in all Linux builds (even if `MADV_HUGEPAGE` is not defined). As per discussion in #14177, this isn't the case currently, which means huge pages may never be available depending on the NumPy build, even if the runtime system supports it. Both conda-forge and pip builds for 1.17.0 seem to not include support for huge pages.

The `madvise()` man page indicates that it will return `EINVAL` if the `advice` is invalid. At runtime, that means huge pages will be enabled if available; otherwise, for the use case in NumPy, it's enough to simply ignore the error and fall back to the default behavior.

Fixes #14177.

cc @hmaarrfk @jakirkham @charris @seberg @mrocklin