Changing 1-to-M behaviour of on_disk_cache. #810
ghstack-source-id: 4bfa54e5646d08eb5601ebe99ad23b044cef52ae Pull Request resolved: #810
@VitalyFedyunin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
LGTM!!!! Thank you!!!
@@ -185,13 +194,19 @@ def __init__(

        if filepath_fn is not None:
            _check_unpickable_fn(filepath_fn)
            filepath_fn = _generator_to_list(filepath_fn) if inspect.isgeneratorfunction(filepath_fn) else filepath_fn
            assert not inspect.isgeneratorfunction(filepath_fn)  # BC breaking, now only str is accepted as return
Then we need to change the dataset implementations on the TorchText side, e.g.: https://github.com/pytorch/text/blob/ff1fdfce8ac030a11638b2d94d54364144586253/torchtext/datasets/imdb.py#L107
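A minimal sketch of what such a migration might look like, assuming a dataset that previously passed a generator as filepath_fn. The function names and paths below are hypothetical and are not taken from TorchText; only inspect.isgeneratorfunction and the "str only" rule come from the quoted diff.

```python
import inspect
import os


def generator_filepath_fn(url):
    # Old style (hypothetical): yields one path per expected output file.
    base = os.path.join("cache", os.path.basename(url))
    yield base + ".train"
    yield base + ".test"


def single_filepath_fn(url):
    # New style: returns exactly one str path per input.
    return os.path.join("cache", os.path.basename(url))


def check_filepath_fn(fn):
    # Same test as the assert in the quoted diff, raised as an exception here:
    # generator functions are rejected, plain functions pass through.
    if inspect.isgeneratorfunction(fn):
        raise TypeError("filepath_fn must return a str, not yield paths")
    return fn
```

With this shape, the per-file names would move to the function passed to end_caching, while the function passed to on_disk_cache returns a single path per input.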
    todo_dp: Any
    cached_dp: Any
    one_many_cached_dp: Any
nit: these could be annotated as IterDataPipe rather than Any.
TLDR: If the `filepath_fn` of on_disk_cache (call it filepath_fn'1) generates a different name from the `filepath_fn` of end_caching, the case is treated as a 1-to-many situation. An additional listing file is then created at the filepath_fn'1 location and is used to check cache consistency between runs.
BC-breaking: on_disk_cache no longer accepts generators as `filepath_fn`.
Differential Revision: [D40148560](https://our.internmc.facebook.com/intern/diff/D40148560)
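The 1-to-many rule described above can be illustrated with a small standalone sketch. This is not the torchdata implementation: the helper names, the one-path-per-line listing format, and the consistency rule (every listed file must exist) are assumptions made for illustration. "First path" here means the name produced by the path function given to on_disk_cache (filename_fn'1 in the description, the filepath_fn argument in the quoted diff).

```python
import os
import tempfile


def is_one_to_many(first_path, final_paths):
    # Treated as 1-to-many when the end_caching names differ from the first name.
    return list(final_paths) != [first_path]


def write_listing(first_path, final_paths):
    # Record every produced file, one per line, at the first path's location.
    parent = os.path.dirname(first_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(first_path, "w") as f:
        f.write("\n".join(final_paths))


def cache_is_consistent(first_path):
    # Valid only if the listing exists and every file it names still exists.
    if not os.path.isfile(first_path):
        return False
    with open(first_path) as f:
        listed = [line for line in f.read().splitlines() if line]
    return bool(listed) and all(os.path.isfile(p) for p in listed)


def demo():
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "archive.tar")
        finals = [os.path.join(root, "archive", name) for name in ("a.txt", "b.txt")]
        os.makedirs(os.path.join(root, "archive"))
        for path in finals:
            with open(path, "w") as f:
                f.write("data")
        one_to_many = is_one_to_many(first, finals)
        if one_to_many:
            write_listing(first, finals)
        ok_before = cache_is_consistent(first)
        os.remove(finals[0])  # simulate a partially deleted cache
        ok_after = cache_is_consistent(first)
        return one_to_many, ok_before, ok_after
```

In this sketch, deleting any one of the listed files makes the check fail on the next run, which is the kind of between-run consistency check the description refers to.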
ghstack-source-id: faa070d82c5482588a0dd5649335bbed550c6724 Pull Request resolved: #810
ghstack-source-id: 74bee209f06b86519ecaa7b996fccaad8ffbec95 Pull Request resolved: #810
Fixed issues with cache locks and cache file overwrites. Required for compatibility with pytorch/data#810.
* Fixed on_disk_cache issues
* Update on "Fixed on_disk_cache issues": fixed issues with cache locks and cache file overwrites. Required to be compatible with pytorch/data#810
Co-authored-by: Vitaly Fedyunin
Summary: Pull Request resolved: pytorch#810
TLDR: If the `filepath_fn` of on_disk_cache (call it filepath_fn'1) generates a different name from the `filepath_fn` of end_caching, the case is treated as a 1-to-many situation. An additional listing file is then created at the filepath_fn'1 location and is used to check cache consistency between runs.
BC-breaking: on_disk_cache no longer accepts generators as `filepath_fn`.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D40148560
Pulled By: VitalyFedyunin
fbshipit-source-id: 3bb9c5f32546a3d4859cff6bda0cc84234312ab7
* Fixed on_disk_cache issues
* Update on "Fixed on_disk_cache issues": fixed issues with cache locks and cache file overwrites. Required to be compatible with pytorch/data#810
Co-authored-by: Vitaly Fedyunin
Co-authored-by: Joe Cummings