Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRAGEN: use sample name cleaning patterns instead of hardcoded ones #1994

Merged
merged 10 commits into from
Aug 25, 2023

Conversation

vladsavelyev
Copy link
Member

@vladsavelyev vladsavelyev commented Aug 21, 2023

Add fn_clean_exts for DRAGEN sample name cleaning, instead of hardcoding patterns into the code.

Fixes #1865

@vladsavelyev vladsavelyev added the awaits-review Awaiting final review and merge. label Aug 23, 2023
@ewels
Copy link
Member

ewels commented Aug 24, 2023

Suggestion is to have a top level config attribute called dragen_config with the search patterns under there, rather than changing the search pattern config structure.

So something like:

dragen_config:
  filename_suffixes:
    - _tumor

@vladsavelyev
Copy link
Member Author

Actually, there is no need to have _tumor and _normal as the cleaning patterns at all, since s_name = s_name.split(ext["pattern"], 1)[0] strips a suffix even if it's in the middle of a sample name (that feels even more dangerous, though!).

So removing merging the list of DRAGEN cleaning patterns with the main least, and removing _tumor and _normal from it.

@vladsavelyev vladsavelyev requested review from ewels and removed request for ewels August 24, 2023 16:05
@vladsavelyev
Copy link
Member Author

@ewels, good to take another look now.

Copy link
Member

@ewels ewels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks 👌🏻 It was a good discussion earlier, I'm still a little on the fence! Let's keep it in the back of our minds when we do some of the heavier refactoring in v2. Happy to reconsider the best approach.

I think there are a couple which are maybe a bit risky, in that I could imagine them legitimately coming up in some sample names (.time, .gc, .vc) - would it work to shift them to the other config where they're only removed from the end of sample names and not the middle onwards? I've done that to compromise with other similarly risky ones in the past.

multiqc/utils/config_defaults.yaml Outdated Show resolved Hide resolved
@vladsavelyev vladsavelyev merged commit a073ecb into master Aug 25, 2023
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaits-review Awaiting final review and merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DRAGEN changing name is not working
2 participants