
Rewrite MemoryCache alloc_timeout logic #434

Merged: 48 commits merged into main on Aug 28, 2023

Conversation

justheuristic (Collaborator) commented on Aug 4, 2023:

  • rpc_inference: the server will now accept an allocation timeout from the user, defaulting to no timeout
  • bugfix: the inference timeout is now measured from the moment the request is received (see the sketch below)
    • previously, you would have to wait for your own timeout plus the time it takes to sort through the queue (other users' timeouts)
    • now, you get AllocationFailed if you had to wait for over (timeout) seconds, regardless of other users
  • a request for inference with no timeout will now fail instantly if there is not enough memory available
  • the number of bytes per dtype is now correctly determined for int, bool & other types
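To make the new semantics concrete, here is a minimal sketch of measuring the timeout from the moment the request is received, so that time spent waiting behind other users is charged against the caller's own budget rather than added on top of it. The names (allocate_with_deadline, try_alloc, AllocationFailed) are illustrative assumptions, not the actual petals implementation:

import asyncio
import time
from typing import Awaitable, Callable, Optional


class AllocationFailed(Exception):
    pass


async def allocate_with_deadline(
    lock: asyncio.Lock, try_alloc: Callable[[], Awaitable], timeout: Optional[float]
):
    # the clock starts when the request arrives; None means "wait forever"
    deadline = None if timeout is None else time.monotonic() + timeout

    def remaining() -> Optional[float]:
        return None if deadline is None else max(0.0, deadline - time.monotonic())

    try:
        # time spent waiting behind other users (the lock queue) counts against the same deadline
        await asyncio.wait_for(lock.acquire(), remaining())
    except asyncio.TimeoutError:
        raise AllocationFailed(f"could not allocate within {timeout} seconds")
    try:
        return await asyncio.wait_for(try_alloc(), remaining())
    except asyncio.TimeoutError:
        raise AllocationFailed(f"could not allocate within {timeout} seconds")
    finally:
        lock.release()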

Testing:

  • test that nonzero timeout includes time spent waiting for lock
  • test that zero timeout is always instantaneous
  • test that zero timeout always works if cache has free memory
  • test integration with client's InferenceSession

@justheuristic marked this pull request as ready for review on August 4, 2023, 22:44
parser.add_argument('--alloc_timeout', type=float, default=1,
                    help='If the cache is full, the server will wait for this number of seconds hoping that some memory will be freed '
                         'before rejecting the request')
parser.add_argument('--max_alloc_timeout', type=float, default=600,
justheuristic (author) commented:

NB: a large max timeout is no longer a problem because long-timeout users will no longer cause others to freeze

tests/test_cache.py (outdated review thread, resolved)
@@ -60,11 +61,14 @@ def handle_counter(self, value: int):
         self._handle_counter.value = value

     @contextlib.asynccontextmanager
-    async def allocate_cache(self, *descriptors: TensorDescriptor) -> AsyncContextManager[Sequence[Handle]]:
+    async def allocate_cache(
+        self, *descriptors: TensorDescriptor, timeout: Optional[float] = None
borzunov (Collaborator) commented:

Suggested change:
-        self, *descriptors: TensorDescriptor, timeout: Optional[float] = None
+        self, *descriptors: TensorDescriptor, timeout: float

Let's assume timeout is known here; otherwise we'll have defaults in two places.

justheuristic (author) replied on Aug 15, 2023:

This appears to be a misunderstanding. The value timeout=None is not a default to be replaced; it literally means no timeout. If both timeout and max_alloc_timeout are None, the user will await allocation until it succeeds.

@@ -74,6 +78,8 @@ async def allocate_cache(self, *descriptors: TensorDescriptor) -> AsyncContextMa
         """
         assert os.getpid() != self.runtime_pid, "must be called by a ConnectionHandler, not runtime"
         assert all(descr.device is not None for descr in descriptors), "please specify allocated devices"
+        if self.max_alloc_timeout is not None:
+            timeout = min(timeout, self.max_alloc_timeout) if timeout is not None else self.max_alloc_timeout
borzunov (Collaborator) commented:

Suggested change:
-            timeout = min(timeout, self.max_alloc_timeout) if timeout is not None else self.max_alloc_timeout
+            timeout = min(timeout, self.max_alloc_timeout)

Let's assume timeout is known here; otherwise we'll have defaults in two places. Notably, the default here was different from the zero default in backend.py, which was confusing.

justheuristic (author) replied:

It looks like we have a misunderstanding: this is not a default. The suggested code would not support max_alloc_timeout=None.

borzunov (Collaborator) commented on Aug 24, 2023:

nit: Can we still drop None support here and require passing float('inf') for this case? The metadata's timeout is cast to float in the wrapping code anyway.

justheuristic (author) replied:

i solemnly swear to test that it works with float('inf')
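A quick illustration (not project code) of why float('inf') can stand in for "no timeout" under the clamping logic above, so the None branch becomes unnecessary:

max_alloc_timeout = 600.0
for requested in (0.0, 5.0, float("inf")):
    print(requested, "->", min(requested, max_alloc_timeout))
# prints: 0.0 -> 0.0, 5.0 -> 5.0, inf -> 600.0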

             yield handles
         finally:
-            self._free(max_alloc_size, alloc_task)
+            await shield_and_wait(self._schedule_free(max_alloc_size, alloc_task))
borzunov (Collaborator) commented on Aug 12, 2023:

Important: You can't await in finally; this leads to deadlocks like the ones we just fixed in #396.

The function must free memory instantly, so you don't need await here. As far as I understand, you have it here because you didn't merge the code from #396 correctly.

Please leave a comment here saying that we can't await here, so nobody is tempted to add it again.

justheuristic (author) replied:

Good point, done.

For posterity: the problem occurred when I merged the earlier fix into this branch about a week ago.

Comment on lines 148 to 153:

    async def _schedule_free(self, alloc_size: int, alloc_task: asyncio.Task):
        """
        This method should be called inside asyncio.shield() because:
        - hivemind.utils.enter_asynchronously() does not always release the lock on cancellation
        - _schedule_free() must finish freeing memory even in case of cancellation
        """
borzunov (Collaborator) commented:

Please revert this change - this function cannot await and must return instantly (see comments above).

Suggested change:
-    async def _schedule_free(self, alloc_size: int, alloc_task: asyncio.Task):
-        """
-        This method should be called inside asyncio.shield() because:
-        - hivemind.utils.enter_asynchronously() does not always release the lock on cancellation
-        - _schedule_free() must finish freeing memory even in case of cancellation
-        """
+    def _free(self, alloc_size: int, alloc_task: asyncio.Task):
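For context, a minimal sketch of what such a synchronous _free() could look like (a method-body sketch; attribute names like _current_size_bytes and _memory_freed_event are assumptions, not this PR's code). The point is that it only mutates bookkeeping and never awaits:

    def _free(self, alloc_size: int, alloc_task: asyncio.Task) -> None:
        # NOTE: never await here - this runs inside `finally`, and awaiting there can deadlock
        if not alloc_task.done():
            alloc_task.cancel()  # the allocation never completed, nothing to give back
            return
        self._current_size_bytes -= alloc_size   # return the bytes to the budget (assumed attribute)
        self._memory_freed_event.set()           # wake up allocators waiting for space (assumed attribute)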

@@ -5,3 +5,13 @@

 def is_dummy(tensor: torch.Tensor):
     return tensor.numel() == 0


+SPECIAL_DTYPE_SIZES = {torch.bool: 1, torch.int8: 1, torch.qint32: 4}
borzunov (Collaborator) commented on Aug 12, 2023:

Suggested change:
-SPECIAL_DTYPE_SIZES = {torch.bool: 1, torch.int8: 1, torch.qint32: 4}
+SPECIAL_DTYPE_SIZES = {torch.bool: 1}

int8 and qint32 are supported by iinfo. Given that, please consider removing the dict and just checking for torch.bool in get_size_bytes() explicitly.

justheuristic (author) replied:

Alas, it is not: [screenshot omitted]
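For illustration, a hedged sketch of how a get_size_bytes() helper built around this dict might look; this is an assumption about the shape of the code, not the actual petals implementation:

import torch

SPECIAL_DTYPE_SIZES = {torch.bool: 1, torch.int8: 1, torch.qint32: 4}


def get_size_bytes(dtype: torch.dtype) -> int:
    # dtypes that torch.finfo / torch.iinfo may not handle are listed explicitly
    if dtype in SPECIAL_DTYPE_SIZES:
        return SPECIAL_DTYPE_SIZES[dtype]
    get_info = torch.finfo if dtype.is_floating_point else torch.iinfo
    return get_info(dtype).bits // 8


print(get_size_bytes(torch.bool), get_size_bytes(torch.float16), get_size_bytes(torch.int64))  # 1 2 8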

        loop = asyncio.get_event_loop()
        async with hivemind.utils.enter_asynchronously(self._lock_acquire_memory):
            if timeout == 0:  # if waiting is not allowed, fail when you or anyone else begins waiting
                stop_when_completes = loop.run_in_executor(None, self._cache_overfull_event.wait)
borzunov (Collaborator) commented on Aug 12, 2023:

Important: If waiting is not allowed, please either allocate or return an exception right away. Please write code that does this explicitly - otherwise it's more difficult to check this case, and we may end up with deadlocks in the most popular use case.

justheuristic (author) replied:

It's gone, replaced with a double-checked lock now.
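For illustration, a hedged method-body sketch of the double-checked pattern being described (helper names like _can_allocate and _do_allocate are assumptions, not the PR's code):

    async def allocate(self, nbytes: int, timeout: float):
        # fast path: with timeout == 0, fail immediately instead of waiting for the lock
        if timeout == 0 and not self._can_allocate(nbytes):
            raise AllocationFailed("not enough memory and waiting is not allowed")
        async with hivemind.utils.enter_asynchronously(self._lock_acquire_memory):
            # re-check under the lock: free space may have changed while we waited
            if timeout == 0 and not self._can_allocate(nbytes):
                raise AllocationFailed("not enough memory and waiting is not allowed")
            return self._do_allocate(nbytes)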

borzunov (Collaborator) left a review comment:

Please accept my last comment and merge once you're ready.

@justheuristic merged commit c08d09c into main on Aug 28, 2023 (9 checks passed)
@justheuristic deleted the memcache_touchup branch on August 28, 2023, 13:01