Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Caching and Docs Update #121

Merged
merged 18 commits into from
Aug 4, 2023
Merged

Enable Caching and Docs Update #121

merged 18 commits into from
Aug 4, 2023

Conversation

axkoenig
Copy link
Contributor

@axkoenig axkoenig commented May 12, 2023

Description

This is a follow up for our PRs #113 and PRs #95.

Since fsspec comes with an advanced caching functionality, we do not need to have a CacheStore for our StoreDriver. Rather we should use the powerful API provided by fsspec, which ensures a unified caching mechanism across all of our drivers.

The PR

  • removes the old caching functionality and its tests
  • implements minor changes in filesystem initialisation, such that protocol can be set through storage_options
  • implements a minor change in how storage_options can be updated via CatalogSource.get_driver(). The default behavior was that any newly provided storage_options in get_driver would entirely overwrite the existing storage_options of the CatalogSource, which leads to a loss of potentially important configs of the person who gave you the catalog (e.g. some bucket configs like requester_pays should not be overwritten!).
  • implements a test on this ^
  • does a documentation update on how to use fsspec caching with squirrel, giving some context around the topic and some minor benchmarks

Attached is a rendered and updated version of the docs.
Caching — Squirrel documentation.pdf

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring including code style reformatting
  • Other (please describe):

Checklist:

  • I have read the contributing guideline doc (external contributors only)
  • Lint and unit tests pass locally with my changes
  • I have kept the PR small so that it can be easily reviewed
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • All dependency changes have been reflected in the pip requirement files.

@axkoenig axkoenig requested a review from pzdkn May 12, 2023 15:57
Copy link
Contributor

@MaxSchambach MaxSchambach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think this already looks good! I've left some minor remarks.
Also, since these are breaking changes regarding the StoreDriver and some low-level changes in how fsspec options are handled, I think @AlirezaSohofi should have a quick look as well and give his approval.

docs/advanced/caching.rst Outdated Show resolved Hide resolved
docs/advanced/caching.rst Outdated Show resolved Hide resolved
squirrel/fsspec/fs.py Outdated Show resolved Hide resolved
test/test_catalog/test_catalog.py Outdated Show resolved Hide resolved
test/test_catalog/test_catalog.py Show resolved Hide resolved
test/test_driver/test_cache.py Outdated Show resolved Hide resolved
@github-actions
Copy link

This is PR is marked as stale as it has been inactive for 30 days. It will be closed in 7 days.

Copy link
Contributor

@pzdkn pzdkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Alex, looks great! I left some comments here.

squirrel/driver/store.py Show resolved Hide resolved
docs/advanced/caching.rst Show resolved Hide resolved
squirrel/catalog/catalog.py Show resolved Hide resolved
docs/advanced/caching.rst Show resolved Hide resolved
@axkoenig
Copy link
Contributor Author

Hi folks, thanks for your reviews. I addressed all your comments and just pushed the code, please have another look and maybe reply to my comments if you feel there's anything to discuss. We need some final review from @AlirezaSohofi mainly to check out the minor lower-level change of how the storage options are passed down in squirrel/fsspec/fs.py
Cheers, Alex

MaxSchambach
MaxSchambach previously approved these changes Jul 13, 2023
Copy link
Contributor

@MaxSchambach MaxSchambach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update! WIth one or two minor modifications, LGTM! Thanks for pushing this over the line @axkoenig 🙌

test/test_catalog/test_catalog.py Outdated Show resolved Hide resolved
squirrel/catalog/catalog.py Show resolved Hide resolved
docs/advanced/caching.rst Outdated Show resolved Hide resolved
test/test_catalog/test_catalog.py Show resolved Hide resolved
@axkoenig
Copy link
Contributor Author

axkoenig commented Jul 21, 2023

@AlirezaSohofi the branch trigger above fails because of ModuleNotFoundError: No module named 'numba' , do you have a quick fix for this or any hints? Numba is contained inside our poetry.lock here, so it should actually be correctly installed in the testing instance, no?

@AlirezaSohofi
Copy link
Contributor

@AlirezaSohofi the branch trigger above fails because of ModuleNotFoundError: No module named 'numba' , do you have a quick fix for this or any hints? Numba is contained inside our poetry.lock here, so it should actually be correctly installed in the testing instance, no?

Can you re-base to main please?

@axkoenig
Copy link
Contributor Author

Hi there, this is ready for final review @AlirezaSohofi . I implemented a test for MessagepackDriver too and rebased to main, all checks are green

AlirezaSohofi
AlirezaSohofi previously approved these changes Jul 28, 2023
Copy link
Contributor

@AlirezaSohofi AlirezaSohofi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks so much 👍

Left a minor comment. Please don't forget to update the version.

test/test_driver/test_caching.py Outdated Show resolved Hide resolved
Copy link
Contributor

@AlirezaSohofi AlirezaSohofi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @axkoenig 👍

@axkoenig axkoenig merged commit 1fc20b7 into main Aug 4, 2023
4 checks passed
@axkoenig axkoenig deleted the ak-enable-caching-and-docs branch August 4, 2023 12:50
@github-actions github-actions bot locked and limited conversation to collaborators Aug 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants