
AsyncBuffer #88

Merged 4 commits into dask:main on Apr 6, 2023
Conversation

@crusaderky (Collaborator) commented Mar 21, 2023

@crusaderky self-assigned this Mar 21, 2023
@crusaderky marked this pull request as draft March 21, 2023 01:09
@crusaderky force-pushed the async_buffer2 branch 27 times, most recently from bd957ed to 2bcd214 on March 27, 2023 23:02
@crusaderky changed the title from [WIP] AsyncBuffer to AsyncBuffer on Apr 4, 2023
@crusaderky marked this pull request as ready for review April 4, 2023 12:42
@fjetter (Member) left a comment:

Mostly nits. I think the one thing I really care about is the add_done_callback question. Most other things are rather API design questions, but they concern functionality that might cause surprising behavior.

Comment on lines 144 to 157
if self.evicting is not None:
    weight = min(weight, self.evicting)
if weight <= n:
    return

self.evicting = n

def done(_: asyncio.Future) -> None:
    self.evicting = None

# Note: this can get cancelled by LRU.close(), which in turn is
# triggered by Buffer.close()
future = self._offload(self.evict_until_below_target, n)
future.add_done_callback(done)
@fjetter (Member):

IIUC this can possibly be called multiple times (even if it is locked), but the done callbacks of concurrently running evict_until_below_target calls would overwrite each other / the first one to finish would already reset evicting. Is this intentional?

@crusaderky (Collaborator, Author):

You're right, fixed
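
For context, a minimal sketch of the kind of guard that addresses the concern above, assuming a class shaped like the snippet under review. The class name, maybe_start_eviction, and the body of _offload are illustrative; only evicting, _offload, evict_until_below_target, and the done callback come from the quoted code, and this is not necessarily the fix that landed in the PR:

from __future__ import annotations

import asyncio
from concurrent.futures import ThreadPoolExecutor


class EvictingSketch:
    """Illustrative stand-in for the PR's LRU; not the real class."""

    def __init__(self) -> None:
        self.evicting: float | None = None
        self._executor = ThreadPoolExecutor(1)

    def _offload(self, fn, *args) -> asyncio.Future:
        # Run a blocking callable in the offload thread; returns an asyncio.Future
        return asyncio.get_running_loop().run_in_executor(self._executor, fn, *args)

    def evict_until_below_target(self, n: float) -> None:
        ...  # blocking eviction loop, omitted in this sketch

    def maybe_start_eviction(self, weight: float, n: float) -> None:
        if self.evicting is not None:
            weight = min(weight, self.evicting)
        if weight <= n:
            return

        self.evicting = n

        def done(_: asyncio.Future) -> None:
            # Only clear the flag if no newer eviction has replaced the target;
            # a stale callback from an earlier eviction leaves it alone
            if self.evicting == n:
                self.evicting = None

        future = self._offload(self.evict_until_below_target, n)
        future.add_done_callback(done)

Comparing self.evicting against the target captured in the closure means a stale callback from an earlier eviction no longer clears the flag set by a later, lower-target eviction.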

Comment on lines 48 to 51
if self.executor is None:
    self.executor = ThreadPoolExecutor(
        1, thread_name_prefix="zict.Buffer offloader"
    )
@fjetter (Member):

nit: From an API POV I would suggest allowing the init to take a ThreadPoolExecutor and, if none is provided, initializing one during initialization of the class.

@crusaderky (Collaborator, Author):

Done
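
For reference, the suggested shape is roughly the following sketch; the class name and the executor parameter name are illustrative rather than taken from the PR:

from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


class BufferSketch:
    """Illustrative only: accept an optional executor instead of creating it lazily."""

    def __init__(self, executor: ThreadPoolExecutor | None = None) -> None:
        # If the caller does not supply a thread pool, create a single-threaded
        # one at construction time rather than on first use
        if executor is None:
            executor = ThreadPoolExecutor(
                1, thread_name_prefix="zict.Buffer offloader"
            )
        self.executor = executor

Accepting the executor in __init__ lets callers share a pool across buffers and removes the need for the lazy "if self.executor is None" branch quoted above.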

Comment on lines +64 to +66
def async_get(
    self, keys: Collection[KT], missing: Literal["raise", "omit"] = "raise"
) -> asyncio.Future[dict[KT, VT]]:
@fjetter (Member):

Mostly curious, and I don't think we should change this now, but why does the API accept multiple keys? Is this somehow a requirement or an optimization?

The reason I'm asking is that I think this drives some complexity, and I would expect missing not to be required otherwise. I'm not entirely sure about this.

@crusaderky (Collaborator, Author):

It's an optimization to avoid having an extremely long queue to the thread pool, with each individual key incurring thread synchronization overhead.
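
To illustrate, a hedged usage sketch based only on the signature quoted above; buff stands for a hypothetical AsyncBuffer instance and the helper names are made up:

async def fetch_batched(buff, keys) -> dict:
    # One offloaded job for the whole batch: a single round-trip to the worker thread
    return await buff.async_get(keys, missing="omit")


async def fetch_one_by_one(buff, keys) -> dict:
    # Same result, but one future and one thread handoff per key
    out = {}
    for k in keys:
        out.update(await buff.async_get([k], missing="omit"))
    return out

The batched call submits a single job to the offload thread, while the per-key variant pays one executor round-trip and creates one future per key.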

Comment on lines +101 to +103
# Do not pull keys towards the top of the LRU unless they are all available.
# This matters when there is a very long queue of async_get futures.
d = self.fast.get_all_or_nothing(keys)
@fjetter (Member):

Strictly speaking, fast doesn't need to be an LRU, does it? In that case, I suspect we'll get an AttributeError here, won't we?

@crusaderky (Collaborator, Author):

Buffer.fast is always an LRU:

zict/zict/buffer.py, lines 79 to 85 in b96afc4:

self.fast = LRU(
    n,
    fast,
    weight=weight,
    on_evict=[self.fast_to_slow],
    on_cancel_evict=[self._cancel_evict],
)

@crusaderky (Collaborator, Author):

@fjetter all review comments have been addressed

@fjetter merged commit 78b4b7a into dask:main on Apr 6, 2023
crusaderky added a commit that referenced this pull request Apr 6, 2023
@crusaderky (Collaborator, Author):

Ouch. That was a merge without squash. I force-pushed to fix it; could we force squashing in the repo like we already did in distributed?

@crusaderky deleted the async_buffer2 branch April 6, 2023 11:48