Add skip tools parameter for tool selection #68
base: dev
So you propose to include extensive configs defining multiple pipeline specific profiles with this pipeline? Could you please include an example of this in the PR or here as comment, then? I still struggle to wrap my head around that.
I am not aware that there is a standard. In contrast, I was baffled by the individual differences.
It is? I am aware of the docs for process selectors and for the various config files, but I find these priorities very hard to comprehend. Ultimately, such top-down hierarchies do not help us anyway. We would need to patch additional labels to the modules, and then somehow handle
If by profile you mean tool profile (instead of Nextflow profile), then yes, more or less. For each use case, one would need to write either a params file or a Nextflow config file (in case some tools need specific parameters). This would be left to the user, though. I think for v1 it is acceptable to just run everything that doesn't need extra db arguments, for instance. These files would look like this:
params.json:
{
    "skip_tools": "seqfu_stats"
}
or, e.g., nanopore.config:
Well there is a standard in how configuration settings are defined in a nextflow config file and in how tool parameters are defined there too.
Well actually the configs would be provided with for instance But at this point I don't think there is much need for being able to combine different configuration files anyways. Maybe this will change in the future, but until then and as long as we only have a dozen tools or so I find it perfectly acceptable to stick to nextflow configs. Again, this PR is quite basic, but it does fulfill all our requirements given the number of tools we currently plan to have in v1.0. |
Well, that is exactly the complexity, I wanted to avoid by unifying all settings in a YAML, which is also a lot easier to customize. Particularly when using Nextflow profiles as tool profiles, we should then also stick with the standard API My main concern with the Nextflow config profiles is the Danger box here:
I fear this is easy to get wrong in combination with a poorly written institutional config. On an upside, it is indeed straightforward to test for the correct config, with
On the contrary, that is one of the main selling points of this pipeline - the in-depth knowledge of a sequencing facility. My experience from Take for example, the |
I'm not sure I understand why would anyone need to define tool profiles within Nextflow profiles? As you said, this would add unnecessary complexity, while it's perfectly fine and much safer to specify this parameter within a config file and pass it to the pipeline with And if you think we should provide some default configs for some selected applications, sure, this feature doesn't prevent that either. |
I do not know about anyone, but I know about us. We as pipeline developers need to ship the pipeline with a few default routes that can be used by changing a simple global parameter (my suggestion) or switching the config profiles (your suggestion as far as I understand) Since you suggested to harness the default Nextflow configuration capabilities, I presume you want to use the Foremost, I am not sure if both of these are future-proof, since I believe Seqera ponders about dropping support for them in future Nextflow versions, but if we use them nonetheless, we would end up with something like this to implement our Seqinspector cases/dossiers, correct?
And that would ultimately enable a UX like: That is nice. But how do we now, e.g. run the Nanopore specific tools for a Nanopore QC? Is there a difference between How is the withLabel / ext.when priority resolved in that case? I do not know, because afaik Nextflow's config documentation does not clarify this, and it matters, because either the tools run or not. |
I believe there are some misunderstandings both about the content and the scope of this PR: First about the content, this PR only adds a new parameter, Second, about the scope, this PR implements what I believe would be the minimal requirements for a first release when it comes to tool selection. It is not supposed to provide the perfect user experience, just to provide a basic way to select tools. In other words, it is only supposed to be the first step towards tool selection and to serve as a baseline to compare it to future solutions. Because the original idea is slightly more complicated (in particular when it comes to the possibility to combine profiles and how to achieve this), I believe it needs more thoughts before it is implemented, which is what I tried to motivate in the PR's description. That being said, I think being able to define and combine tool profiles is a great feature to have and that we should absolutely do it. I just think it will take some time before we implement something viable and that we should not condition v1 to having this fully implemented. |
Having worked myself into the ground with a previous profile implementation attempt due to my inability and rather diffuse Nextflow requirements, I am very sympathetic towards breaking it down to more manageable tasks. Had you been framing your PR in this way, I would have no objections and happily focused on code review only. But even after rereading this whole discussion multiple times, the misunderstanding on my behalf cannot be remedied. Perhaps I truly overestimate the scope of this PR, but why is almost your whole PR description a plea to drop the profile/case idea entirely, then? I read redundant to have, hard to motivate why, advocate etc. and my impression persists: Your code contributions are tied to a fundamental diversion from the plan that we have agreed on in the development meeting.
I cannot recall any discussion in the developmental meeting regarding zeroing in onto a release. Personally, I perceive this pipeline still ramshackle and not anywhere near the state of a worthy nf-core pipeline. There are so many more mature pipelines than this still in In either way, I have come to the conclusion that I can't ask you to implement a feature that you seemingly consider dispensable. |
Great work! This looks great! @Aratz has left some minor comments for you.
This implementation is definitely simpler than what we previously discussed. I agree that skip_tools is more understandable and easier to grasp than adding another layer of configuration.
I understand the desire for the pipeline to come with default routes that are ready to use. However, as @Aratz mentioned, we could add configurations that users can utilize if they so desire.
I see the challenge in trying to combine different "modes" (avoiding the word profile here), like lean and illumina. In my opinion, we don't necessarily need modes like lean/extensive; the most important thing is that the tools are compatible with the data. Starting with the assumption that one config will suffice, such as the nanopore.config mentioned earlier, makes total sense to me.
I believe this implementation meets our needs, but of course, we can discuss it further at a development meeting before merging.
Additionally, I would really appreciate it if we could all maintain a friendly tone. :)
Thanks, @matrulda, for chiming in. I agree that a fresh perspective is helpful to resolve this conflict.
From my side, I am fine with merging it prior to further discussions. My objections are not rooted in the code itself, but in the dismissal of the whole concept of modes / routes, which I believe should happen in a developmental meeting and not in a PR discussion. Personally, I still think we need to have them, but am the first to admit that it makes things very complicated. I wasted dozens of hours on that, and ultimately failed.
Do you have the impression that this was not the case? As far as I am concerned, I do indeed feel strongly about this topic, given my unfruitful previous work on that subject and my desire for perfectionism, but I also carefully worded my replies to emphasize assumptions and to ensure the subjectivity is clear. I also took note that Adrien did the same, so from my side, I never felt personally attacked, just confused and estranged by our very different perceptions. But if I have misunderstood or exaggerated some statements, I would like to apologize. It was not my intention to discuss anything outside the question of how we deal with profiles/modes/routes, or to imply ad hominem traits.
Glad to hear that :)
I understand that, but I think @Aratz was very clear that he did not dismiss that concept, but that it for now could be put aside.
Yeah, I interpreted your answers as a bit hostile. I'm happy to hear that was not your intention. We all communicate in different ways and nuances don't always get through in text. |
I agree with @matrulda that this is good enough to have something. I was wondering if we would like to have the subsampling involved in the skip_tools parameter as well? Currently, the subsampling is regulated using the |
On a scale from yay to nay, I'm also leaning more towards yay. Establishing a simple and intuitive baseline functionality, as a first step. Not disregarding any possible future implementations. If we arrive at a point where we have a dozen platform-specific tools that are mutually exclusive, it might be helpful to add some example tool configs to the assets dir, or similar. I guess we'd have to create them anyway, to be used for functional testing of the different kinds of data in our test profile. I would like this branch to pull from dev and address the newly added tools, prior to a final review and merge. |
@matrulda @alneberg @kedhammar Thanks for the feedback, I'll first fix the template update, then rebase this PR and address your comments @alneberg unfortunately, the |
Ah, I see. Yeah then we'll keep that as a parameter, and it's not really in the scope of this PR anyway. 👍 |
Hi!
I've been working on tool selection lately and tried to implement the profile-based proposal mentioned in #32 but I've found it hard to:
ext.args
together? Either way this will make debugging harder in many cases.
Hence I would really advocate for a simpler solution that makes better use of functionalities already provided by Nextflow. Tool selection and customization is something many other pipelines do and I think it would be much better to try to stick to the nf-core standard. The precedence between config files is well documented and this will make the user experience better for users who are already familiar with Nextflow and other nf-core pipelines.
This PR provides a very simple solution where the user provides a list of tools to be excluded. This can be provided either through command line arguments or through a config file. As specified in the Nextflow docs, CLI arguments will override values defined in config files. This is similar to what's implemented in nf-core/demultiplex/.
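As a rough illustration of how such an exclusion list could be consumed, here is a minimal Groovy sketch. It is not taken from this PR's diff: the helper name parseSkipTools, the comma-separated format, and the tool name multiqc are assumptions made for the example; only skip_tools and seqfu_stats appear in this discussion.

```groovy
// Hypothetical helper (not from the PR): turn a comma-separated
// skip list into a clean list of tool names.
def parseSkipTools(String raw) {
    // A null or empty parameter means "skip nothing".
    if (!raw) {
        return []
    }
    // Split on commas, trim whitespace, and drop empty entries.
    return raw.split(',').collect { it.trim() }.findAll { it }
}

// In a pipeline this value would come from params.skip_tools (assumption).
def skipTools = parseSkipTools('seqfu_stats, fastqc')

// Each tool invocation would then be guarded by a membership check.
if (!('seqfu_stats' in skipTools)) {
    println 'would run seqfu_stats'
}
if (!('multiqc' in skipTools)) {
    println 'would run multiqc'
}
```

With a guard like this, a tool runs unless it is explicitly named, which matches the "run everything by default" baseline discussed in this thread.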
Essentially this still makes it possible to define custom profiles. For example, one can write a nanopore.config, specify which tools should be skipped, and even add extra arguments to some tools with withName:TOOL statements (as specified in the nf-core documentation: https://nf-co.re/docs/usage/configuration#customising-tool-arguments).
This doesn't mean we should completely abandon the original idea; we can always improve the current solution if we find it too limiting in the future.
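To make the nanopore.config idea more concrete, a minimal sketch could look like the following. The process name TOOL is the placeholder used above and the flag --some-flag is purely hypothetical; only skip_tools and seqfu_stats come from this discussion.

```groovy
// nanopore.config: illustrative sketch only, not shipped with the pipeline.

params {
    // Tools to exclude for this use case (value taken from the example in this thread).
    skip_tools = 'seqfu_stats'
}

process {
    // 'TOOL' is a placeholder process name and '--some-flag' a hypothetical argument.
    withName: 'TOOL' {
        ext.args = '--some-flag'
    }
}
```

Such a file would be passed with Nextflow's -c option, for example nextflow run . -profile test,docker -c nanopore.config --outdir <OUTDIR>. A --skip_tools value given on the command line would still override the one set in the file.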
Closes #32
PR checklist
- nf-core pipelines lint
- nf-test test
- nextflow run . -profile debug,test,docker --outdir <OUTDIR>
- docs/usage.md is updated.
- CHANGELOG.md is updated.