Rewrite MemoryCache alloc_timeout logic #434
Conversation
# Conflicts:
#	src/petals/server/memory_cache.py
parser.add_argument('--alloc_timeout', type=float, default=1,
                    help='If the cache is full, the server will wait for this number of seconds hoping that some memory will be freed '
                         'before rejecting the request')
parser.add_argument('--max_alloc_timeout', type=float, default=600,
nb: large max timeout is no longer a problem because long-timeout users will no longer cause others to freeze
src/petals/server/memory_cache.py (Outdated)
@@ -60,11 +61,14 @@ def handle_counter(self, value: int):
        self._handle_counter.value = value

    @contextlib.asynccontextmanager
    async def allocate_cache(self, *descriptors: TensorDescriptor) -> AsyncContextManager[Sequence[Handle]]:
    async def allocate_cache(
        self, *descriptors: TensorDescriptor, timeout: Optional[float] = None
Suggested change:
        self, *descriptors: TensorDescriptor, timeout: Optional[float] = None
        self, *descriptors: TensorDescriptor, timeout: float
Let's assume timeout to be known here, otherwise we'll have defaults in two places.
This appears to be a misunderstanding. The value timeout=None is not a default to be replaced; it literally means no timeout. If both timeout and max_alloc_timeout are None, the user will await allocation until it succeeds.
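(For illustration only: a minimal sketch of those semantics, not the petals implementation. asyncio already treats timeout=None as "wait indefinitely", so a helper could look like this.)

    import asyncio
    from typing import Optional

    async def wait_for_free_memory(freed: asyncio.Event, timeout: Optional[float]) -> bool:
        """Hypothetical helper: wait until memory is freed; timeout=None means no timeout at all."""
        try:
            await asyncio.wait_for(freed.wait(), timeout=timeout)  # None -> wait forever
            return True
        except asyncio.TimeoutError:
            return False  # give up: the cache stayed full for the whole timeout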
src/petals/server/memory_cache.py (Outdated)
@@ -74,6 +78,8 @@ async def allocate_cache(self, *descriptors: TensorDescriptor) -> AsyncContextMa
        """
        assert os.getpid() != self.runtime_pid, "must be called by a ConnectionHandler, not runtime"
        assert all(descr.device is not None for descr in descriptors), "please specify allocated devices"
        if self.max_alloc_timeout is not None:
            timeout = min(timeout, self.max_alloc_timeout) if timeout is not None else self.max_alloc_timeout
Suggested change:
            timeout = min(timeout, self.max_alloc_timeout) if timeout is not None else self.max_alloc_timeout
            timeout = min(timeout, self.max_alloc_timeout)
Let's assume timeout to be known here, otherwise we'll have defaults in two places. Notably, the default here was different from the zero default in backend.py, which was confusing.
It looks like we have a misunderstanding: this is not a default. The suggested code would not support max_alloc_timeout=None.
nit: Can we still drop None support here and require passing float('inf') for this case? The metadata's timeout is cast to float in the wrapping code anyway.
I solemnly swear to test that it works with float('inf').
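(Side note, with hypothetical values: the min() cap behaves the same for finite timeouts and float('inf'), which is what makes the suggestion above safe.)

    max_alloc_timeout = 600.0  # server-side cap, matching the argparse default above

    for requested in (1.0, 900.0, float("inf")):
        print(requested, "->", min(requested, max_alloc_timeout))
    # 1.0 -> 1.0
    # 900.0 -> 600.0
    # inf -> 600.0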
src/petals/server/memory_cache.py (Outdated)
                yield handles
            finally:
                self._free(max_alloc_size, alloc_task)
                await shield_and_wait(self._schedule_free(max_alloc_size, alloc_task))
Important: You can't do await in finally; this leads to deadlocks that we just fixed in #396. The function must free the memory instantly, so you don't need await here. As far as I understand, you have it here because you didn't merge the code from #396 correctly. Please leave a comment here saying that we can't await here, so nobody is tempted to add it once again.
Good point, done that.
For posterity: the problem occurred when I merged the earlier fix into this branch about a week ago.
src/petals/server/memory_cache.py (Outdated)
    async def _schedule_free(self, alloc_size: int, alloc_task: asyncio.Task):
        """
        This method should be called inside asyncio.shield() because:
        - hivemind.utils.enter_asynchronously() does not always release the lock on cancellation
        - _schedule_free() must finish freeing memory even in case of cancellation
        """
Please revert this change: this function cannot await and must return instantly (see comments above).
Suggested change:
    async def _schedule_free(self, alloc_size: int, alloc_task: asyncio.Task):
        """
        This method should be called inside asyncio.shield() because:
        - hivemind.utils.enter_asynchronously() does not always release the lock on cancellation
        - _schedule_free() must finish freeing memory even in case of cancellation
        """
    def _free(self, alloc_size: int, alloc_task: asyncio.Task):
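(For posterity, a simplified sketch of the pattern being asked for; the class and method names are illustrative, not the exact petals code. The cleanup in finally calls a plain synchronous method, so nothing can suspend the coroutine while memory is being returned.)

    import asyncio
    import contextlib

    class CacheSketch:
        @contextlib.asynccontextmanager
        async def allocate(self, alloc_size: int, alloc_task: asyncio.Task):
            try:
                handles = await alloc_task  # may time out or be cancelled
                yield handles
            finally:
                # No `await` allowed here: awaiting in `finally` caused the deadlocks
                # fixed in #396, so freeing must happen instantly and synchronously.
                self._free(alloc_size, alloc_task)

        def _free(self, alloc_size: int, alloc_task: asyncio.Task) -> None:
            """Return the reserved bytes to the pool immediately (no awaiting)."""
            ...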
src/petals/utils/misc.py (Outdated)
@@ -5,3 +5,13 @@

def is_dummy(tensor: torch.Tensor):
    return tensor.numel() == 0


SPECIAL_DTYPE_SIZES = {torch.bool: 1, torch.int8: 1, torch.qint32: 4}
Suggested change:
SPECIAL_DTYPE_SIZES = {torch.bool: 1, torch.int8: 1, torch.qint32: 4}
SPECIAL_DTYPE_SIZES = {torch.bool: 1}
int8 and qint32 are supported by iinfo. Given that, please consider removing the dict and just checking for torch.bool in get_size_bytes() explicitly.
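(A sketch of what that suggestion could look like; the name get_size_bytes() is taken from this thread, so the real signature in petals.utils.misc may differ.)

    import torch

    def get_size_bytes(dtype: torch.dtype) -> int:
        """Bytes per element of `dtype`."""
        if dtype == torch.bool:
            return 1  # torch.iinfo() does not accept torch.bool, so keep this one special case
        if dtype.is_floating_point:
            return torch.finfo(dtype).bits // 8
        return torch.iinfo(dtype).bits // 8  # per the comment above, this covers int8 and qint32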
src/petals/server/memory_cache.py (Outdated)
        loop = asyncio.get_event_loop()
        async with hivemind.utils.enter_asynchronously(self._lock_acquire_memory):
            if timeout == 0:  # if waiting is not allowed, fail when you or anyone else begins waiting
                stop_when_completes = loop.run_in_executor(None, self._cache_overfull_event.wait)
Important: If waiting is not allowed, please either allocate or raise an exception right away. Please write code that does this explicitly; otherwise it's more difficult to check this case, and we may end up with deadlocks in the most popular use case.
It's gone, replaced with a double-checked lock now.
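(Roughly, the double-checked pattern described here looks like the sketch below; the helper names _enough_memory, _reserve, _wait_until_available and the AllocationFailed error are placeholders, not the exact petals API.)

    async def allocate(self, alloc_size: int, timeout: float):
        # First check, before taking the lock: lets timeout == 0 callers fail fast
        # instead of queueing up behind other waiters.
        if timeout == 0 and not self._enough_memory(alloc_size):
            raise AllocationFailed("out of cache and waiting is not allowed")

        async with hivemind.utils.enter_asynchronously(self._lock_acquire_memory):
            # Second check, under the lock: things may have changed while acquiring it.
            if self._enough_memory(alloc_size):
                return self._reserve(alloc_size)
            if timeout == 0:
                raise AllocationFailed("out of cache and waiting is not allowed")
            return await self._wait_until_available(alloc_size, timeout)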
Please accept my last comment and merge once you're ready.
Testing: