Conventions around `snakemake` #5103

chillenzer · 2024-09-11T12:57:36Z

In closing this PR, I had a look at the Snakemake workflows here and there are a few things I stumbled across:

There is a best practices page in the Snakemake docs suggesting amongst other things to use
- their linter snakemake --lint
- their formatter snakefmt
- and a standardised directory structure.
Snakemake provides its own environment management/deployment as well as batch system abstraction (as a plugin). This is functionality that significantly overlaps with tbg and both of these are delicate issues in the fragile and diverse HPC ecosystems we run in.

In the current stage, our usage does not conform to any of these standards. If our strategic decisions imply to professionalise the use of Snakemake, I think there would be value in moving more into that direction. However, my personal experience is that these things are awfully fiddly to setup on a cluster and their adoption would create tensions with tbg. (There are, of course, some low-hanging fruits like the adoption of the directory structure and formatter, e.g., in pre-commit.)

This issue is supposed to spark a discussion and trigger a conscious decision about such things. It is by no means intended as a suggestion to implement the above ideas. Involved in this discussion should probably be at least @psychocoderHPC, @PrometheusPi, @JessicaTiebel.

The text was updated successfully, but these errors were encountered:

JessicaTiebel · 2024-09-11T15:14:21Z

Thanks for looking into the Snakemake workflow and giving input.

For 1.:
To be honest I wasn't aware of these things. Having a code quality checker, a formatter and standardised directory structure sounds like a very good idea and on the first glance also doesn't seem to be super much work. Thanks for bringing this up.

For 2.:
I am more aware of the issues you mention here. I agree that using the batch system abstraction has a lot of advantages and makes transition between different clusters much easier. This would be a huge benefit, but also is a lot of work (also the reason I didn't do it initially). I am not sure if it is possible to abstract the tbg in a way that it is also flexible enough to work with different clusters. I am definitely open for discussions regarding this.

chillenzer added component: tools scripts, python libs and CMake component: examples PIConGPU or PMacc examples labels Sep 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Conventions around `snakemake` #5103

Conventions around `snakemake` #5103

chillenzer commented Sep 11, 2024

JessicaTiebel commented Sep 11, 2024

Conventions around snakemake #5103

Conventions around snakemake #5103

Comments

chillenzer commented Sep 11, 2024

JessicaTiebel commented Sep 11, 2024

Conventions around `snakemake` #5103

Conventions around `snakemake` #5103