Rewrite MemoryCache alloc_timeout logic #434
Conversation
# Conflicts:
#	src/petals/server/memory_cache.py
parser.add_argument('--alloc_timeout', type=float, default=1,
                    help='If the cache is full, the server will wait for this number of seconds hoping that some memory will be freed '
                         'before rejecting the request')
parser.add_argument('--max_alloc_timeout', type=float, default=600,
nb: large max timeout is no longer a problem because long-timeout users will no longer cause others to freeze
src/petals/server/memory_cache.py (Outdated)
@@ -60,11 +61,14 @@ def handle_counter(self, value: int):
        self._handle_counter.value = value

    @contextlib.asynccontextmanager
    async def allocate_cache(self, *descriptors: TensorDescriptor) -> AsyncContextManager[Sequence[Handle]]:
    async def allocate_cache(
        self, *descriptors: TensorDescriptor, timeout: Optional[float] = None
Suggested change:
        self, *descriptors: TensorDescriptor, timeout: Optional[float] = None
        self, *descriptors: TensorDescriptor, timeout: float
Let's assume timeout to be known here, otherwise we'll have defaults in two places.
This appears to be a misunderstanding. The value timeout=None is not a default to be replaced; it literally means no timeout. If both timeout and max_alloc_timeout are None, the user will await allocation until it succeeds.
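(For illustration only: a minimal sketch of those semantics, not the petals implementation. asyncio already treats timeout=None as "wait indefinitely", so a helper could look like this.)

    import asyncio
    from typing import Optional

    async def wait_for_free_memory(freed: asyncio.Event, timeout: Optional[float]) -> bool:
        """Hypothetical helper: wait until memory is freed; timeout=None means no timeout at all."""
        try:
            await asyncio.wait_for(freed.wait(), timeout=timeout)  # None -> wait forever
            return True
        except asyncio.TimeoutError:
            return False  # give up: the cache stayed full for the whole timeout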
src/petals/server/memory_cache.py (Outdated)
@@ -74,6 +78,8 @@ async def allocate_cache(self, *descriptors: TensorDescriptor) -> AsyncContextMa
        """
        assert os.getpid() != self.runtime_pid, "must be called by a ConnectionHandler, not runtime"
        assert all(descr.device is not None for descr in descriptors), "please specify allocated devices"
        if self.max_alloc_timeout is not None:
            timeout = min(timeout, self.max_alloc_timeout) if timeout is not None else self.max_alloc_timeout
Suggested change:
            timeout = min(timeout, self.max_alloc_timeout) if timeout is not None else self.max_alloc_timeout
            timeout = min(timeout, self.max_alloc_timeout)
Let's assume timeout to be known here, otherwise we'll have defaults in two places. Notably, the default here was different from the zero default in backend.py, which was confusing.
It looks like we have a misunderstanding: this is not a default. The suggested code would not support max_alloc_timeout=None.
nit: Can we still drop None support here and require passing float('inf') for this case? The metadata's timeout is cast to float in the wrapping code anyway.
I solemnly swear to test that it works with float('inf').
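(Side note, with hypothetical values: the min() cap behaves the same for finite timeouts and float('inf'), which is what makes the suggestion above safe.)

    max_alloc_timeout = 600.0  # server-side cap, matching the argparse default above

    for requested in (1.0, 900.0, float("inf")):
        print(requested, "->", min(requested, max_alloc_timeout))
    # 1.0 -> 1.0
    # 900.0 -> 600.0
    # inf -> 600.0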
src/petals/server/memory_cache.py (Outdated)
                yield handles
            finally:
                self._free(max_alloc_size, alloc_task)
                await shield_and_wait(self._schedule_free(max_alloc_size, alloc_task))
Important: You can't do await in finally; this leads to deadlocks that we just fixed in #396. The function must free the memory instantly, so you don't need await here. As far as I understand, you have it here because you didn't merge the code from #396 correctly. Please leave a comment here saying that we can't await here, so nobody is tempted to add it once again.
Good point, done that.
For posterity: the problem occurred when I merged the earlier fix into this branch about a week ago.
src/petals/server/memory_cache.py (Outdated)
    async def _schedule_free(self, alloc_size: int, alloc_task: asyncio.Task):
        """
        This method should be called inside asyncio.shield() because:
        - hivemind.utils.enter_asynchronously() does not always release the lock on cancellation
        - _schedule_free() must finish freeing memory even in case of cancellation
        """
Please revert this change: this function cannot await and must return instantly (see comments above).
Suggested change:
    async def _schedule_free(self, alloc_size: int, alloc_task: asyncio.Task):
        """
        This method should be called inside asyncio.shield() because:
        - hivemind.utils.enter_asynchronously() does not always release the lock on cancellation
        - _schedule_free() must finish freeing memory even in case of cancellation
        """
    def _free(self, alloc_size: int, alloc_task: asyncio.Task):
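(For posterity, a simplified sketch of the pattern being asked for; the class and method names are illustrative, not the exact petals code. The cleanup in finally calls a plain synchronous method, so nothing can suspend the coroutine while memory is being returned.)

    import asyncio
    import contextlib

    class CacheSketch:
        @contextlib.asynccontextmanager
        async def allocate(self, alloc_size: int, alloc_task: asyncio.Task):
            try:
                handles = await alloc_task  # may time out or be cancelled
                yield handles
            finally:
                # No `await` allowed here: awaiting in `finally` caused the deadlocks
                # fixed in #396, so freeing must happen instantly and synchronously.
                self._free(alloc_size, alloc_task)

        def _free(self, alloc_size: int, alloc_task: asyncio.Task) -> None:
            """Return the reserved bytes to the pool immediately (no awaiting)."""
            ...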
src/petals/utils/misc.py (Outdated)
@@ -5,3 +5,13 @@

def is_dummy(tensor: torch.Tensor):
    return tensor.numel() == 0


SPECIAL_DTYPE_SIZES = {torch.bool: 1, torch.int8: 1, torch.qint32: 4}
Suggested change:
SPECIAL_DTYPE_SIZES = {torch.bool: 1, torch.int8: 1, torch.qint32: 4}
SPECIAL_DTYPE_SIZES = {torch.bool: 1}
int8 and qint32 are supported by iinfo. Given that, please consider removing the dict and just checking for torch.bool in get_size_bytes() explicitly.
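(A sketch of what that suggestion could look like; the name get_size_bytes() is taken from this thread, so the real signature in petals.utils.misc may differ.)

    import torch

    def get_size_bytes(dtype: torch.dtype) -> int:
        """Bytes per element of `dtype`."""
        if dtype == torch.bool:
            return 1  # torch.iinfo() does not accept torch.bool, so keep this one special case
        if dtype.is_floating_point:
            return torch.finfo(dtype).bits // 8
        return torch.iinfo(dtype).bits // 8  # per the comment above, this covers int8 and qint32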
src/petals/server/memory_cache.py (Outdated)
        loop = asyncio.get_event_loop()
        async with hivemind.utils.enter_asynchronously(self._lock_acquire_memory):
            if timeout == 0:  # if waiting is not allowed, fail when you or anyone else begins waiting
                stop_when_completes = loop.run_in_executor(None, self._cache_overfull_event.wait)
Important: If waiting is not allowed, please either allocate or raise an exception right away. Please write code that does this explicitly; otherwise it's more difficult to check this case, and we may end up with deadlocks in the most popular use case.
It's gone, replaced with a double-checked lock now.
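(Roughly, the double-checked pattern described here looks like the sketch below; the helper names _enough_memory, _reserve, _wait_until_available and the AllocationFailed error are placeholders, not the exact petals API.)

    async def allocate(self, alloc_size: int, timeout: float):
        # First check, before taking the lock: lets timeout == 0 callers fail fast
        # instead of queueing up behind other waiters.
        if timeout == 0 and not self._enough_memory(alloc_size):
            raise AllocationFailed("out of cache and waiting is not allowed")

        async with hivemind.utils.enter_asynchronously(self._lock_acquire_memory):
            # Second check, under the lock: things may have changed while acquiring it.
            if self._enough_memory(alloc_size):
                return self._reserve(alloc_size)
            if timeout == 0:
                raise AllocationFailed("out of cache and waiting is not allowed")
            return await self._wait_until_available(alloc_size, timeout)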
Please accept my last comment and merge once you're ready.
Testing: