docs: Check & update of configuration defaults (first part) #532

Open: wants to merge 5 commits into main


ericblanc20
Contributor

First attempt. Only ngs_mapping & somatic_variant calling are used.

IMPORTANT NOTE: The default config is now out of sync with the STAR environment version. Is it OK to change the STAR version in the wrapper's environment, or should I create another PR?

ericblanc20 requested a review from tedil on July 5, 2024 13:46
ericblanc20 linked an issue on Jul 5, 2024 that may be closed by this pull request
8 tasks

github-actions bot commented Jul 5, 2024

  • Please format your Python code with ruff: make fmt
  • Please check your Python code with ruff: make check
  • Please format your Snakemake code with snakefmt: make snakefmt

You can trigger all lints locally by running make lint

Member

tedil left a comment

I think having all these examples will be really helpful to people getting started with the pipeline (in the CUBI environment)!
The generated config YAMLs will have quite long lines, though ;) But I'd rather have very long comments providing information and context than no information at all.

features:
  path: /data/cephfs-1/work/projects/cubit/current/static_data/annotation/GENCODE/19/GRCh37/gencode.v19.annotation.gtf

# Step Configuration ==============================================================================
Member

While we're at it, we can correct the headers, which all just say "Step Configuration"

Comment on lines 36 to 46
# static_data_config:
#   cosmic:
#     path: /data/cephfs-1/work/projects/cubit/current/static_data/db/COSMIC/v90/GRCh38/CosmicAll.vcf.gz
#   dbnsfp:
#     path: /data/cephfs-1/work/projects/cubit/current/static_data/db/dbNSFP/3.5/GRCh38/dbNSFP.txt.gz
#   dbsnp:
#     path: /data/cephfs-1/work/projects/cubit/current/static_data/db/dbSNP/b147/GRCh38/common_all_20160407.vcf.gz
#   reference:
#     path: /fast/work/groups/cubi/projects/biotools/static_data/reference/GRCh38.d1.vd1/GRCh38.d1.vd1.fa
#   features:
#     path: /fast/work/groups/cubi/projects/biotools/static_data_by_ref/GRCh38/annotation/GENCODE/36/gencode.v36.primary_assembly.annotation.gtf
Member

These are just the GRCh38 version for static_data_config defaults, correct? That should be mentioned in the description.

@@ -43,6 +43,9 @@ class TargetCoverageReportEntry(SnappyModel):
- name: IDT_xGen_V1_0
  pattern: "xGen Exome Research Panel V1\\.0*"
  path: "path/to/targets.bed"

Bed file for many Agilent exome panels can be found in
/fast/work/groups/cubi/projects/biotools/static_data/exome_panel/Agilent
Member

The existing comment (not the one you added) seems to be incorrect, as I'm quite sure the path will be mapped to the name "IDT_xGen_V1_0", not "default" ;)
And I'm also not sure whether the pattern actually matches what the description claims.

Contributor Author

I'm not sure what you mean here: the pattern is a correct (albeit strange) regular expression, and a biomedsheet entry matching such a pattern is associated with the name, which is finally mapped to a path.
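
To make the "correct (albeit strange)" point concrete, here is a minimal sketch of what the pattern from the docstring accepts. The kit names tested are made up for illustration, and the use of re.fullmatch is an assumption: the thread does not show which re function the pipeline actually calls.

```python
import re

# Pattern from the docstring example. In the YAML double-quoted string,
# "\\." becomes a backslash followed by a dot, i.e. a literal dot in the regex.
PATTERN = r"xGen Exome Research Panel V1\.0*"


def kit_matches(kit_name: str) -> bool:
    """Return True if the whole kit name matches the pattern.

    Assumption: whole-string matching (re.fullmatch). With prefix matching
    (re.match), a name such as "... V1.1" would also be accepted, because
    "0*" is allowed to match zero characters.
    """
    return re.fullmatch(PATTERN, kit_name) is not None


# Hypothetical kit names, not taken from any real sample sheet.
candidates = [
    "xGen Exome Research Panel V1.0",
    "xGen Exome Research Panel V1.",
    "xGen Exome Research Panel V1.000",
    "xGen Exome Research Panel V1x0",
]
results = {name: kit_matches(name) for name in candidates}
```

The trailing "0*" means "zero or more zeros", so the pattern accepts "V1.", "V1.0" and "V1.000" alike, which may be the source of the doubt about whether it matches what the description claims.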

path_baits: str
"""
Different exome panels cannot be accommodated here, because the selection method used for coverage is not used.
Member

What exactly does this mean?

Contributor Author

It means that the target_coverage sub-step picks the name of the panel from the sample sheet, and then maps it to actual files using the weird mapping scheme implemented in the __init__.py code. This allows multiple exome kits within the same dataset.
This is not the case for path_baits in the somatic/mbcs bit. There, we point directly to the exome kit BED file, without querying the sample sheet.
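
The difference between the two schemes can be sketched as follows. This is only an illustration under stated assumptions: the function names, the second mapping entry and all paths are hypothetical, and the real lookup in __init__.py is not shown in this thread and may differ.

```python
import re
from typing import Optional

# Hypothetical mapping entries with the name/pattern/path fields of the
# docstring example; the Agilent entry is invented for illustration.
TARGET_MAPPING = [
    {
        "name": "IDT_xGen_V1_0",
        "pattern": r"xGen Exome Research Panel V1\.0*",
        "path": "path/to/targets.bed",
    },
    {
        "name": "Agilent_V6",
        "pattern": r"Agilent SureSelect Human All Exon V6.*",
        "path": "path/to/agilent_v6.bed",
    },
]

# Hypothetical single BED file, standing in for the path_baits setting.
PATH_BAITS = "path/to/baits.bed"


def resolve_targets(library_kit: str) -> Optional[str]:
    """Per-library lookup: the kit name from the sample sheet is matched
    against each pattern, and the first matching entry provides the path."""
    for entry in TARGET_MAPPING:
        if re.match(entry["pattern"], library_kit):
            return entry["path"]
    return None  # no configured panel matches this kit name


def resolve_baits(library_kit: str) -> str:
    """Direct scheme: the sample sheet is never consulted, so every library
    gets the same file regardless of its kit."""
    return PATH_BAITS
```

With the first function, two libraries with different kits in the same dataset resolve to different BED files; with the second, they cannot.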

@coveralls
Coverage Status

coverage: 85.869% (+0.07%) from 85.8% when pulling 4fe223b on 528-check-and-update-config-defaults into 17c1a87 on main.

Successfully merging this pull request may close these issues.

Check and update config defaults