-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use correct thread stack limits in jl_init_stack_limits #54639
Use correct thread stack limits in jl_init_stack_limits #54639
Conversation
Actually these do not really seem to matter much in practice. The one thing that did cause me some issues here, though, was that stack_hi was the low address and stack_lo the high address.
*stack_lo = (void*)stackaddr; | ||
*stack_hi = (void*)__builtin_frame_address(0); | ||
size_t stacksize = pthread_get_stacksize_np(thread); | ||
*stack_hi = stackaddr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you meant to switch these, since the malloc region is stackaddr to stackaddr+stacksize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I am misunderstanding you (sorry), but: stacks on all OS/CPU combos supported by Julia grow downwards, the stack starts at the highest address stackaddr
and grows downwards towards stackaddr-stacksize
(right after which the guard pages comes by default, at least for pthread created stacks with default settings).
I double checked this in darwin_libpthread, glibc and muslc and via real-world experiments on darwin/macos and linux+glibc (no muslc for me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vtjnash does my explanation make sense to you? Any other questions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, yeah, it is confusing that it is inverted from the values returned by pthread_attr_getstack
, but apparently that is the legacy behavior
Move the registers onto the stack, so that they only are present when the Task is actually switched out, saving memory when the Task is not running yet or already finished. On Linux x86_64 this reduces it from 376 bytes to 184 bytes. Has some additional advantages too, such as copy_stack tasks (e.g. with always_copy_stacks) can migrate to other threads before starting if they are not stick. Also fixes a variable that got mixed up by #54639 and caused always_copy_stacks to abort, since the stack limits were wrong.
Move the registers onto the stack, so that they only are present when the Task is actually switched out, saving memory when the Task is not running yet or already finished. On Linux x86_64 this reduces it from 376 bytes to 184 bytes. Has some additional advantages too, such as copy_stack tasks (e.g. with always_copy_stacks) can migrate to other threads before starting if they are not stick. Also fixes a variable that got mixed up by #54639 and caused always_copy_stacks to abort, since the stack limits were wrong.
Move the registers onto the stack, so that they only are present when the Task is actually switched out, saving memory when the Task is not running yet or already finished. On Linux x86_64 this reduces it from 376 bytes to 184 bytes. Has some additional advantages too, such as copy_stack tasks (e.g. with always_copy_stacks) can migrate to other threads before starting if they are not stick. Also fixes a variable that got mixed up by #54639 and caused always_copy_stacks to abort, since the stack limits were wrong.
Move the registers onto the stack, so that they only are present when the Task is actually switched out, saving memory when the Task is not running yet or already finished. On Linux x86_64 this reduces it from 376 bytes to 184 bytes. Has some additional advantages too, such as copy_stack tasks (e.g. with always_copy_stacks) can migrate to other threads before starting if they are not stick. Also fixes a variable that got mixed up by #54639 and caused always_copy_stacks to abort, since the stack limits were wrong.
Move the registers onto the stack, so that they only are present when the Task is actually switched out, saving memory when the Task is not running yet or already finished. It makes this mostly just a huge renaming job. On Linux x86_64 this reduces it from 376 bytes to 184 bytes. Has some additional advantages too, such as copy_stack tasks (e.g. with always_copy_stacks) can migrate to other threads before starting if they are not sticky. Also fixes a variable that got mixed up by #54639 and caused always_copy_stacks to abort, since the stack limits were wrong. Also now fixes #43124, though I am not quite confident enough in it to re-enable that test right now.
Move the registers onto the stack, so that they only are present when the Task is actually switched out, saving memory when the Task is not running yet or already finished. It makes this mostly just a huge renaming job. On Linux x86_64 this reduces it from 376 bytes to 184 bytes. Has some additional advantages too, such as copy_stack tasks (e.g. with always_copy_stacks) can migrate to other threads before starting if they are not sticky. Also fixes a variable that got mixed up by #54639 and caused always_copy_stacks to abort, since the stack limits were wrong. Also now fixes #43124, though I am not quite confident enough in it to re-enable that test right now.
Actually these do not really seem to matter much in practice (as explained to me by @vtjnash on Slack, thank you). The one thing that did cause me some grief here, though, was that stack_hi was the low address and stack_lo the high address.
A smaller change to achieve that "hi means high, lo means low" would be to just reverse
stack_hi
andstack_lo
.