Jetstream tasks should be able to remove tasks #119
Comments
I think we're at the point where we need to build some moderately complex test cases in order to understand the potential impact of changes like this. There is another idea floating around to allow the current backend setting to be used in the template rendering context. There are some potential downsides to that, and maybe they're acceptable, but I don't have a good way of illustrating them right now.

One of the problems that I see here is that a project is not tied to a single workflow. If I have multiple pipelines, I can run them all on a single project:

Pipeline 1 - Runs tasks A, B, and C

For some projects I might run Pipeline 1, then Pipeline 2. For others, I might run Pipeline 1, then Pipeline 3. And for others I might run all three. If the mash process removed tasks that were not in the new workflow, it would remove the records of the other pipelines' tasks.

It might seem like namespaces for the tasks would fix this, but there would still be problems. The workflow is intended to maintain an accurate record of the state of the project:

Pipeline 1 version 1 - Runs tasks A, B, and C
Pipeline 1 version 2 - Runs A, C, and D, but no longer includes task B for some reason.

If I run version 1 on my project and then run version 2, what is the state of the project? The outputs from task B will still be present. If task B modified the outputs of task A, those effects will still be present. I think task B might still need to be accounted for in the project.
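The multi-pipeline concern above can be sketched with plain dictionaries. This is a toy model and not Jetstream's API: `mash_union`, `mash_prune`, the status strings, and the Pipeline 2 task names `X` and `Y` are all hypothetical, chosen only to show what a pruning mash would discard.

```python
# Toy model only: a project is a dict of task name -> recorded status.
# None of these names come from Jetstream itself.

def mash_union(project, workflow):
    """Keep every existing record and add tasks that are new."""
    merged = dict(project)
    for name in workflow:
        merged.setdefault(name, "new")
    return merged


def mash_prune(project, workflow):
    """Keep only tasks present in the incoming workflow."""
    return {name: project.get(name, "new") for name in workflow}


pipeline_1 = ["A", "B", "C"]
pipeline_2 = ["X", "Y"]  # hypothetical tasks for a second pipeline

# State of the project after Pipeline 1 has finished.
after_p1 = {name: "complete" for name in pipeline_1}

union_result = mash_union(after_p1, pipeline_2)
prune_result = mash_prune(after_p1, pipeline_2)

print(sorted(union_result))  # records for A, B, C survive alongside X, Y
print(sorted(prune_result))  # only X, Y remain; Pipeline 1's history is gone
```

Under these assumptions, pruning on every mash makes the project forget Pipeline 1 as soon as Pipeline 2 is rendered, even though Pipeline 1's output files are still on disk.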
Does Jetstream support running multiple pipelines on the same project now? In the Phoenix workflow, the task configuration supports turning specific tasks on and off. I'm assuming that different pipelines would never run the same tasks, but maybe that is not your intention. In the example you provide, I would think things are okay. The issue arises when the new render does not include a task that previously existed, and that task ends up being reset. In your examples, assuming Pipelines 1, 2, and 3 are completely independent and Pipelines 2 and 3 do not depend on an output of Pipeline 1, you would not encounter an issue. I agree that we need improved test cases with better documentation; the experience with reset directives makes that very clear.
Yes, it's always been designed to allow multiple pipelines to run on a single project. They cannot run in parallel, because only one runner process can access a project at any time. But you can use several pipelines together in a modular approach, the way I described above.

Almost all of the complicated situations stem from a single feature: open access to the project files from any task. It is extremely challenging to get right. It allows tasks to modify or even delete files that other tasks have created. It's useful to be able to clean up intermediate files as you go, especially when disk space is limited. With a cloud-enabled backend, this would be nearly impossible to implement. You would need to somehow make the entire project folder available to every worker node executing a task. That is a luxury of an HPC cluster with a shared file system, but not a common setup on cloud platforms.

I hope we can preserve the open access idea, but get better at predicting which tasks need to be reset. I think there are a few ideas that will help:
This is a relatively uncommon scenario during production work, but a nice feature would be the ability to remove a task from a rendered workflow.
For instance, if a task is removed or renamed in the new workflow, the old task will still exist. In a similar vein, it would likely be better if the mash function used intersection instead of union. Alternatively, label the old task as deprecated, so we know how the previous data was generated but Jetstream does not run it.
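The "label the old task as deprecated" option could look roughly like the sketch below. It is a hypothetical illustration, not Jetstream's mash implementation: `TaskRecord`, `mash_with_deprecation`, and `runnable` are invented names, and the task sets reuse the "version 1 runs A, B, C; version 2 runs A, C, D" example from the discussion.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TaskRecord:
    """Hypothetical task record; not a Jetstream class."""
    name: str
    status: str = "new"
    deprecated: bool = False


def mash_with_deprecation(existing, incoming_names):
    """Merge a new workflow into a project without deleting old records.

    Tasks missing from the incoming workflow are flagged as deprecated
    instead of being removed, so the project keeps its history.
    """
    incoming = set(incoming_names)
    result = {
        name: replace(record, deprecated=name not in incoming)
        for name, record in existing.items()
    }
    for name in incoming_names:
        if name not in result:
            result[name] = TaskRecord(name)
    return result


def runnable(tasks):
    """Names of tasks a runner would still execute."""
    return sorted(
        name for name, record in tasks.items()
        if not record.deprecated and record.status != "complete"
    )


# Project state after running version 1 (tasks A, B, C).
project = {n: TaskRecord(n, status="complete") for n in ["A", "B", "C"]}

# Render version 2 (tasks A, C, D) on top of it.
project = mash_with_deprecation(project, ["A", "C", "D"])

print(project["B"])       # still recorded, now flagged deprecated
print(runnable(project))  # only the new task D is pending
```

One design caveat with this sketch: it flags every task absent from the incoming workflow, so tasks belonging to a different pipeline run on the same project would also be marked deprecated unless the mash knew which pipeline each record came from.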