Update branch #311

kliao12 · 2024-08-06T15:40:44Z

No description provided.


The `QC_Report.xlsx` was missing data for the `is_unexpected_replicate` column in the `SAMPLE_QC`, `SAMPLE_CONCORDANCE`, and `SUBJECT_QC` sheets. The issue arose from a column name change in commit `825c8a57e2003e8c5a55aded8759584f950752cc`, where `Unexpected Replicate` was renamed to `is_unexpected_replicate`. The rename produced blank entries at the reindexing step in `qc_report_tables.py`: during reindexing, `is_unexpected_replicate` was not found in the DataFrame because of the earlier renaming, so the column was filled with NaN values. As the pandas docs state, "By default, values in the new index that do not have corresponding records in the DataFrame are assigned NaN." To resolve this, the column names have been reverted to the original title, `Unexpected Replicate`.
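The NaN behavior quoted from the pandas docs can be reproduced in a few lines. This is only an illustrative sketch, not the code in `qc_report_tables.py`: the two-row frame and the `Sample_ID` column are hypothetical stand-ins for the real sample QC table.

```python
import pandas as pd

# Hypothetical miniature of a sample QC table (the real table has many more columns).
sample_qc = pd.DataFrame(
    {"Sample_ID": ["S1", "S2"], "Unexpected Replicate": [True, False]}
)

# Asking reindex for a label that is absent from the frame does not raise;
# pandas creates the column and fills it with NaN.
mismatched = sample_qc.reindex(columns=["Sample_ID", "is_unexpected_replicate"])

# Asking for the label that actually exists keeps the data.
matched = sample_qc.reindex(columns=["Sample_ID", "Unexpected Replicate"])

print(mismatched)
print(matched)
```

Because `reindex` fails silently in this way, a mismatch between the requested column list and the DataFrame's actual labels shows up only as blank cells in the exported report.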


…s Removed" We noticed a discrepancy in the terminology used in Table 4a: it uses "Expected Duplicates Removed", while other parts of the pipeline use "Expected Replicates". We updated Table 4a to "Expected Replicates Removed" for consistency throughout the pipeline.

bugfix: `is_unexpected_replicate` data missing in `QC_Report.xlsx`

- Removed "IdatsInProjectDir" from _SAMPLE_QC_COLUMNS and QC_Report.xlsx.
- The "IdatsInProjectDir" column was empty in QC_Report.xlsx because it is not present in sample_qc.csv.
- The equivalent column "is_missing_idats" is already included in QC_Report.xlsx, containing the same information from cgr_sample_sheet.csv.
- Therefore, the "IdatsInProjectDir" column has been removed to avoid redundancy.

Removes "Expected Replicate" from _SAMPLE_QC_COLUMNS and thus from QC_Report.xlsx. It duplicates the more informative "Replicate IDs" column in QC_Report.xlsx.

- Introduces functionality to calculate the total number of QC issues present in a sample.
- Addresses missing data in the "Count_of_QC_Issue" column of QC_Report.xlsx.
- Calculates the number of "TRUE" occurrences in five boolean columns: "Low Call Rate", "Contaminated", "Expected Replicate Discordance", "Unexpected Replicate", and "Sex Discordant".
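A minimal sketch of the counting step, assuming the five flag columns hold Python booleans and that a missing flag counts as "no issue". The helper name `count_qc_issues` and the sample data are hypothetical; the pipeline's actual implementation may differ.

```python
import pandas as pd

# The five flag columns named in the commit message.
QC_FLAGS = [
    "Low Call Rate",
    "Contaminated",
    "Expected Replicate Discordance",
    "Unexpected Replicate",
    "Sex Discordant",
]


def count_qc_issues(df: pd.DataFrame) -> pd.Series:
    """Count, per row, how many of the QC flag columns are True."""
    # eq(True) maps True -> True and everything else (False, None, NaN) -> False,
    # so missing flags are treated as "no issue" (an assumption of this sketch).
    return df[QC_FLAGS].eq(True).sum(axis=1)


# Hypothetical samples: S1 is clean, S2 has two issues, S3 has one issue and one missing flag.
samples = pd.DataFrame(
    {
        "Low Call Rate": [False, True, False],
        "Contaminated": [False, True, False],
        "Expected Replicate Discordance": [False, False, False],
        "Unexpected Replicate": [False, False, True],
        "Sex Discordant": [False, False, None],
    },
    index=["S1", "S2", "S3"],
)

samples["Count_of_QC_Issue"] = count_qc_issues(samples)
print(samples["Count_of_QC_Issue"])
```

Summing a boolean frame along `axis=1` works because pandas treats `True` as 1 and `False` as 0.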

- Fills in missing data for "Sample Pass QC" column in QC_Report.xlsx.
- Populates the column with a boolean value indicating if a sample passed all QC metrics: "Low Call Rate", "Contaminated", "Expected Replicate Discordance", "Unexpected Replicate", and "Sex Discordant".
- A sample passes QC if `Count_of_QC_Issue` is 0 (no QC issues).
- Otherwise, the sample fails and the column is set to False.
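The pass/fail rule above reduces to a single comparison on the issue count. The following is a sketch under the assumption that `Count_of_QC_Issue` is already populated with integers; the `Sample_ID` column and the values are hypothetical.

```python
import pandas as pd

# Hypothetical report rows with the issue count already computed.
report = pd.DataFrame(
    {"Sample_ID": ["S1", "S2", "S3"], "Count_of_QC_Issue": [0, 2, 1]}
)

# A sample passes QC only when it has zero QC issues.
report["Sample Pass QC"] = report["Count_of_QC_Issue"].eq(0)

print(report)
```

Deriving the flag from the count, rather than re-checking the five columns, keeps the two report columns from disagreeing with each other.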

bugfix: retrieve additional missing data from `QC_Report.xlsx`

Issue 222

Caryn Willis and others added 24 commits May 28, 2024 13:26

Add GTC to VCF to PLINK functionality

42ad0e9

Add GTC to VCF to PLINK functionallity

82096ee

Add VCF entry point

eddd0e7

fix missing allele vcf issue

453a618

switch snp and call rate filters

d3d787a

change graf input to updated 1kg rsid plink files

077e834

docs: update DAG for sample QC workflow after switching sample and lo…

e017542

…ci call rate filters

Merge pull request #305 from NCI-CGR/288-call-rate-filter

3c5793e

switch snp and call rate filters

docs: Update figure text in sample_qc.rst

3425794

Fix minor typos in contamination documentation

937ecc2

Refactored _subject_qc function parameter for clarity

6bc7a69

- Renamed parameter `sample_qc_csv` to `subject_qc_csv` to better reflect its intended use.

Rename "SUBJECT_CONCORDANCE" tab to "SAMPLE_CONCORDANCE" (to match `Q…

b7c098e

…C_Report.xlsx`)

Remove poetryinstall.output cache file

eefa1af

Merge pull request #304 from NCI-CGR/issue-303-empty-qcreport-columns

210e1b8

bugfix: `is_unexpected_replicate` data missing in `QC_Report.xlsx`

Remove redundant "Expected Replicate" column

58c10c6

Removes "Expected Replicate" from _SAMPLE_QC_COLUMNS and thus from QC_Report.xlsx. It duplicates the more informative "Replicate IDs" column in QC_Report.xlsx.

Merge pull request #307 from NCI-CGR/issue-306-handle-missing-data

2bd2e41

bugfix: retrieve additional missing data from `QC_Report.xlsx`

Merge branch 'default' into issue_222

f42623c

issue_222 fix

3ee5b69

Merge pull request #309 from NCI-CGR/issue_222

4015e2c

Issue 222

kliao12 merged commit 1dac0b3 into 302-skip-merging-graf-king-and-plink-results-to-make-the-qc-pipeline-more-efficient Aug 6, 2024
2 checks passed
