You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Snakemake provides its own environment management/deployment as well as batch system abstraction (as a plugin). This is functionality that significantly overlaps with tbg and both of these are delicate issues in the fragile and diverse HPC ecosystems we run in.
In the current stage, our usage does not conform to any of these standards. If our strategic decisions imply to professionalise the use of Snakemake, I think there would be value in moving more into that direction. However, my personal experience is that these things are awfully fiddly to setup on a cluster and their adoption would create tensions with tbg. (There are, of course, some low-hanging fruits like the adoption of the directory structure and formatter, e.g., in pre-commit.)
This issue is supposed to spark a discussion and trigger a conscious decision about such things. It is by no means intended as a suggestion to implement the above ideas. Involved in this discussion should probably be at least @psychocoderHPC, @PrometheusPi, @JessicaTiebel.
The text was updated successfully, but these errors were encountered:
Thanks for looking into the Snakemake workflow and giving input.
For 1.:
To be honest I wasn't aware of these things. Having a code quality checker, a formatter and standardised directory structure sounds like a very good idea and on the first glance also doesn't seem to be super much work. Thanks for bringing this up.
For 2.:
I am more aware of the issues you mention here. I agree that using the batch system abstraction has a lot of advantages and makes transition between different clusters much easier. This would be a huge benefit, but also is a lot of work (also the reason I didn't do it initially). I am not sure if it is possible to abstract the tbg in a way that it is also flexible enough to work with different clusters. I am definitely open for discussions regarding this.
In closing this PR, I had a look at the Snakemake workflows here and there are a few things I stumbled across:
There is a best practices page in the Snakemake docs suggesting amongst other things to use
snakemake --lint
snakefmt
Snakemake provides its own environment management/deployment as well as batch system abstraction (as a plugin). This is functionality that significantly overlaps with
tbg
and both of these are delicate issues in the fragile and diverse HPC ecosystems we run in.In the current stage, our usage does not conform to any of these standards. If our strategic decisions imply to professionalise the use of Snakemake, I think there would be value in moving more into that direction. However, my personal experience is that these things are awfully fiddly to setup on a cluster and their adoption would create tensions with
tbg
. (There are, of course, some low-hanging fruits like the adoption of the directory structure and formatter, e.g., inpre-commit
.)This issue is supposed to spark a discussion and trigger a conscious decision about such things. It is by no means intended as a suggestion to implement the above ideas. Involved in this discussion should probably be at least @psychocoderHPC, @PrometheusPi, @JessicaTiebel.
The text was updated successfully, but these errors were encountered: