Async filesystem implementation #283

Draft
wants to merge 1 commit into
base: main

@AdrianoKF AdrianoKF commented Jul 5, 2024

The implementation simply wraps the existing synchronous file system in asyncio coroutines.

Might serve as the first implementation for #279.
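
For readers unfamiliar with the approach, here is a minimal sketch of what wrapping a blocking filesystem in coroutines can look like. It is not the code from this PR: `BlockingFS`, `AsyncWrapperFS`, and `read_all` are hypothetical stand-ins, and offloading to a worker thread via `asyncio.to_thread` is an assumption about how such a wrapper might be built.

```python
import asyncio


class BlockingFS:
    """Hypothetical stand-in for an existing synchronous filesystem."""

    def __init__(self):
        self._files = {"repo/main/a.txt": b"hello", "repo/main/b.txt": b"world"}

    def ls(self, prefix):
        return sorted(p for p in self._files if p.startswith(prefix))

    def cat_file(self, path):
        return self._files[path]


class AsyncWrapperFS:
    """Exposes coroutine versions of the blocking methods (illustrative only)."""

    def __init__(self, sync_fs):
        self._sync_fs = sync_fs

    async def _ls(self, prefix):
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self._sync_fs.ls, prefix)

    async def _cat_file(self, path):
        return await asyncio.to_thread(self._sync_fs.cat_file, path)


async def read_all(fs, prefix):
    paths = await fs._ls(prefix)
    # Fetch all listed files concurrently.
    contents = await asyncio.gather(*(fs._cat_file(p) for p in paths))
    return dict(zip(paths, contents))


result = asyncio.run(read_all(AsyncWrapperFS(BlockingFS()), "repo/main/"))
```

A wrapper like this gives callers an awaitable interface, but each call still occupies a thread while the underlying blocking request runs.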

@AdrianoKF AdrianoKF self-assigned this Jul 5, 2024

codecov bot commented Jul 5, 2024

Codecov Report

Attention: Patch coverage is 30.00000% with 28 lines in your changes missing coverage. Please review.

Project coverage is 87.91%. Comparing base (361dbfc) to head (6e6c4e6).

Files                              Patch %   Lines
src/lakefs_spec/asyn/spec.py       0.00%     26 Missing ⚠️
src/lakefs_spec/asyn/__init__.py   0.00%     2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #283      +/-   ##
==========================================
- Coverage   93.99%   87.91%   -6.09%     
==========================================
  Files           5        7       +2     
  Lines         383      422      +39     
  Branches       72       73       +1     
==========================================
+ Hits          360      371      +11     
- Misses         14       42      +28     
  Partials        9        9              

Collaborator

@nicholasjng nicholasjng left a comment

Is there an advantage to implementing async in this way? I seem to remember s3fs doing the opposite, i.e. implementing all methods as async and then applying a sync_wrapper on top to get the synchronous versions.

They inherit only from AsyncFileSystem: https://github.com/fsspec/s3fs/blob/main/s3fs/core.py#L176

Happy for a discussion.
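
The async-first pattern described above can be sketched roughly as follows. This is a simplified illustration, not s3fs or fsspec code: `AsyncFirstFS` is hypothetical, and the `sync_wrapper` defined here is a stand-in that uses `asyncio.run`. To my knowledge, fsspec's real machinery instead runs coroutines on a dedicated event loop thread.

```python
import asyncio
import functools


def sync_wrapper(coro_func):
    """Simplified, hypothetical sync wrapper: runs the coroutine to completion."""

    @functools.wraps(coro_func)
    def wrapper(*args, **kwargs):
        # asyncio.run creates a fresh event loop, so this must not be called
        # from inside an already running loop.
        return asyncio.run(coro_func(*args, **kwargs))

    return wrapper


class AsyncFirstFS:
    """Hypothetical filesystem whose real logic lives in coroutines."""

    def __init__(self):
        self._files = {"bucket/a.txt": b"hello"}

    async def _cat_file(self, path):
        await asyncio.sleep(0)  # placeholder for non-blocking network I/O
        return self._files[path]

    async def _exists(self, path):
        await asyncio.sleep(0)  # placeholder for non-blocking network I/O
        return path in self._files

    # Blocking counterparts are derived from the coroutines.
    cat_file = sync_wrapper(_cat_file)
    exists = sync_wrapper(_exists)


fs = AsyncFirstFS()
data = fs.cat_file("bucket/a.txt")
```

In this layout the coroutine is the single source of truth, and the blocking method is only a thin shim around it.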

@AdrianoKF
Contributor Author

Generally, implementing only the async version and then wrapping it to make it blocking would be the more correct way, IMO. The problem is that the lakeFS high-level SDK does not expose an async API (and we rewrote our implementation on top of that SDK). If we wanted to go async-first, we would have two options:

  1. Suggest a feature to make the high-level SDK non-blocking (which would be great, but is unlikely to happen, IMO, because it would subvert the strong typing guarantees of the SDK and in the end lead to two parallel APIs).
  2. Revert our code to the low-level SDK (i.e., the OpenAPI-generated API client), which supports async API requests with async_req=True.
  3. (Okay, three options.) Keep the sync->async wrapper for now and use it as a benchmarking baseline, so we can evaluate the other two options going forward.
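
As a rough illustration of option 2, the sketch below bridges an `async_req=True` style call into asyncio. It assumes the generated client follows the common OpenAPI Generator convention in which `async_req=True` returns a thread-pool result handle with a blocking `.get()`; that assumption has not been checked against the lakeFS client. `FakeObjectsApi`, `stat_object`, and `stat_many` are hypothetical names, not part of the lakeFS SDK.

```python
import asyncio
from multiprocessing.pool import ThreadPool


class FakeObjectsApi:
    """Hypothetical stand-in for a generated API client (not the lakeFS SDK)."""

    def __init__(self):
        self._pool = ThreadPool(4)

    def _stat_object(self, path):
        return {"path": path, "size_bytes": len(path)}

    def stat_object(self, path, async_req=False):
        if async_req:
            # Assumed convention: return a handle whose .get() blocks for the result.
            return self._pool.apply_async(self._stat_object, (path,))
        return self._stat_object(path)

    def close(self):
        self._pool.close()
        self._pool.join()


async def stat_many(api, paths):
    loop = asyncio.get_running_loop()
    # Start all requests first, then wait for them together.
    handles = [api.stat_object(p, async_req=True) for p in paths]
    # handle.get() blocks, so wait for it in the default executor, not on the loop.
    return await asyncio.gather(*(loop.run_in_executor(None, h.get) for h in handles))


api = FakeObjectsApi()
try:
    stats = asyncio.run(stat_many(api, ["a.txt", "dir/b.txt"]))
finally:
    api.close()
```

Under this assumption the requests still run on threads inside the client, so benchmarking against the plain wrapper from option 3 would show whether the switch buys anything.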

@leonpawelzik leonpawelzik mentioned this pull request Jul 23, 2024