Docs request: Fetching remote files #5493
Comments
@trev-f To answer your immediate questions:
@christopher-hakkaart I think we can add a section under Working with files > Remote files, what do you think? You can try a first draft if you want, but I might need to do it myself because I need to check a few details in the code. In any case, this would be a great thing to document, as it is a mystery to many users and unfortunately doesn't rise to the level of something that just always magically works.
Hi both, I'll write a draft and link the issue for feedback/corrections.
Sounds good, once you have a first draft I can add some details as needed.
Signed-off-by: Christopher Hakkaart <[email protected]>
Hi!
I think the shared file would still be downloaded multiple times, because with Fusion each task is responsible for downloading its inputs. For cloud tasks this is fine, because each task has its own VM and needs to download the input files anyway. For local tasks using Fusion it is suboptimal, because in theory the local tasks could cache and reuse the same input file. We just don't have a mechanism to do that with Fusion, as far as I know.
Slight correction to my original response: it looks like the remote staging is designed to handle concurrent requests for the same file. Nextflow will coordinate these requests to make sure that a given file is downloaded once and reused by all tasks that request it, even if they do so at the same time.
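To make the "downloaded once and reused" idea concrete, here is a minimal Python sketch of that kind of coordination. It is not Nextflow's actual implementation; `StagingCache`, `fake_download`, the example URL, and the `/tmp/stage/` path are all hypothetical names made up for illustration. The first caller for a URL performs the download, and every concurrent caller for the same URL waits on that result instead of starting its own.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class StagingCache:
    """Hypothetical helper: stage each remote URL at most once."""

    def __init__(self, download):
        self._download = download   # callable: url -> local path
        self._lock = threading.Lock()
        self._inflight = {}         # url -> Future holding the local path

    def stage(self, url):
        # Decide under the lock whether this caller owns the download.
        with self._lock:
            future = self._inflight.get(url)
            is_owner = future is None
            if is_owner:
                future = Future()
                self._inflight[url] = future
        if is_owner:
            # Only the owner downloads; the lock is released so that
            # different URLs can be staged in parallel.
            try:
                future.set_result(self._download(url))
            except Exception as exc:
                future.set_exception(exc)
        # Owners and waiters alike block here until the result is ready.
        return future.result()


download_count = 0
count_lock = threading.Lock()


def fake_download(url):
    """Stand-in for a real transfer (assumption: no network access here)."""
    global download_count
    with count_lock:
        download_count += 1
    time.sleep(0.05)  # simulate a slow transfer
    return "/tmp/stage/" + url.rsplit("/", 1)[-1]


cache = StagingCache(fake_download)
url = "https://example.com/data/reference.fa"

# Eight "tasks" request the same remote file at the same time.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(cache.stage, [url] * 8))
```

In this sketch a failed download stays recorded, so later requests for the same URL see the same error; a real staging layer would presumably retry or evict the entry.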
New feature (docs)
I would like to request documentation describing how remote files are downloaded/staged in Nextflow.
Usage scenario
Projects that require fetching large amounts of data from remote sources are common, and it's necessary to fetch those files in an efficient manner. While Nextflow makes it easy to download remote files, the lack of documentation on how remote files are handled makes it difficult to evaluate when to fetch files with this built-in Nextflow option versus building a more tailored solution.
Currently, the lack of documentation makes it difficult to build a mental model for how downloading remote files works in Nextflow. Since fetching remote data can be a massive bottleneck for some projects, it's imperative that users understand how Nextflow works so that we can build more efficient workflows.
Suggest implementation
In the remote files docs, answer some basic questions about how remote files are handled, such as:
- When the `file()` method is called on a string that resembles a path? When a Channel is created from a Path object? When a Path object inside a Channel is accessed?
- How does `-resume` affect this behavior?