Motivation
A common situation in dataset generation/processing involves writing many tensors to disk from many processes/nodes in parallel, over a long duration. Shared storage is assumed, but that storage is often slow and subject to delays from NFS caching and the like, and many small file operations are inefficient. In addition, letting the user flush to disk manually helps surface file I/O bottlenecks, since it is clear exactly which call is blocking the code.
My current workflow with tensordicts is to generate one per process, periodically save it to disk (by deleting the old copy and creating a new one), and finally merge all the individual tensordicts with a cat, as sketched below.
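For concreteness, here is a minimal sketch of that workflow; the paths, tensor shapes and the worker/merge helpers are illustrative placeholders rather than anything from the issue:

```python
import shutil

import torch
from tensordict import TensorDict

SHARED_DIR = "/shared/storage"  # placeholder for the shared filesystem


def worker(worker_id: int, num_steps: int, save_every: int, feat: int = 128) -> None:
    """One tensordict per process, periodically rewritten to disk."""
    path = f"{SHARED_DIR}/worker_{worker_id}"
    buffer = []
    for step in range(num_steps):
        buffer.append(torch.randn(feat))
        if (step + 1) % save_every == 0:
            td = TensorDict({"data": torch.stack(buffer)}, batch_size=[len(buffer)])
            # Each periodic "save" deletes the previous on-disk copy and
            # rewrites everything, since there is no incremental flush.
            shutil.rmtree(path, ignore_errors=True)
            td.memmap_(prefix=path)


def merge(num_workers: int) -> TensorDict:
    """Final step: load every per-worker memmap and concatenate them."""
    parts = [TensorDict.load_memmap(f"{SHARED_DIR}/worker_{w}") for w in range(num_workers)]
    return torch.cat(parts, dim=0)
```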
Solution
Support incremental writing/saving of a memmap tensordict. Writes should stay in memory until a manual flush occurs, and a flush should not rewrite the entire file, so that other processes can write to other portions of the tensordict in parallel (see the usage sketch below).
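A hypothetical usage sketch of the requested behaviour; the flush() call and the slice-assignment semantics shown here do not exist in tensordict today and are only meant to illustrate the desired API:

```python
import torch
from tensordict import TensorDict

# Pre-allocate the full dataset once on shared storage (path is a placeholder).
td = TensorDict(
    {"data": torch.zeros(1_000_000, 128)},
    batch_size=[1_000_000],
).memmap_(prefix="/shared/storage/dataset")

# Each process writes only its own slice; writes accumulate in memory...
start, stop = 0, 10_000  # this process's shard, chosen for illustration
td[start:stop] = TensorDict(
    {"data": torch.randn(stop - start, 128)},
    batch_size=[stop - start],
)

# ...until the user explicitly flushes. Only the touched region would be
# written, so other processes can keep writing to their own portions in parallel.
td.flush()  # hypothetical API, not in tensordict today
```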
Checklist
I have checked that there is no similar issue in the repo (required)
In principle I don't see why it wouldn't be possible, but the way we work with memmap is through torch.from_file, which does not return a traditional mmap object with flush functionality.
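For context, a small illustration of that difference using throwaway local files (filenames are placeholders): numpy.memmap exposes an explicit flush(), while torch.from_file hands back a plain Tensor backed by the mapped file, with no flush handle.

```python
import numpy as np
import torch

# numpy: an mmap-backed array with a user-controlled flush to disk.
arr = np.memmap("np_buffer.bin", dtype=np.float32, mode="w+", shape=(1024,))
arr[:] = 1.0
arr.flush()  # explicit sync of dirty pages to the file

# torch: from_file maps a file and returns an ordinary Tensor.
n = 1024
with open("torch_buffer.bin", "wb") as f:
    f.write(b"\x00" * (n * 4))  # pre-size the file for 1024 float32 values
t = torch.from_file("torch_buffer.bin", shared=True, size=n, dtype=torch.float32)
t.fill_(1.0)
# There is no t.flush(); persistence depends on the OS writing dirty pages
# back on its own schedule (or when the mapping goes away).
```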