
Update branch #311

Conversation

kliao12
Contributor

@kliao12 kliao12 commented Aug 6, 2024

No description provided.

Caryn Willis and others added 24 commits May 28, 2024 13:26
The `QC_Report.xlsx` was missing data for the `is_unexpected_replicate`
column in the `SAMPLE_QC`, `SAMPLE_CONCORDANCE`, and `SUBJECT_QC`
sheets. This issue arose from a column name change in commit
`825c8a57e2003e8c5a55aded8759584f950752cc`, where `Unexpected Replicate`
was renamed to `is_unexpected_replicate`.

The renaming caused blank entries due to the reindexing step in
`qc_report_tables.py`. During reindexing, `is_unexpected_replicate` was
not found in the DataFrame because of the earlier renaming, leading to
NaN values. Pandas docs state, "By default, values in the new index that
do not have corresponding records in the DataFrame are assigned NaN."

To resolve this, the column names have been reverted to their original
title, `Unexpected Replicate`.
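The NaN behavior described above can be reproduced in isolation. The following is a minimal sketch, not the code in `qc_report_tables.py`: the DataFrame contents and the `Sample_ID` column are hypothetical, and it assumes the report columns are selected with `DataFrame.reindex(columns=...)`.

```python
import pandas as pd

# Hypothetical stand-in for the sample QC table; only the column title matters here.
sample_qc = pd.DataFrame(
    {
        "Sample_ID": ["S1", "S2"],
        "Unexpected Replicate": [True, False],
    }
)

# Requesting a column title that is absent from the DataFrame does not raise.
# pandas creates the column and fills it with NaN, which shows up as blank cells.
mismatched = sample_qc.reindex(columns=["Sample_ID", "is_unexpected_replicate"])

# Requesting the title that actually exists keeps the data.
matched = sample_qc.reindex(columns=["Sample_ID", "Unexpected Replicate"])

print(mismatched)
print(matched)
```

Because `reindex` fails silently on unknown labels, a name mismatch like this only becomes visible when the output is inspected.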
- Renamed parameter `sample_qc_csv` to `subject_qc_csv` to better reflect its intended use.
…s Removed"

We noticed a discrepancy in the terminology used in Table 4a.
It uses "Expected Duplicates Removed", while other parts of the pipeline
use "Expected Replicates". So we updated Table 4a to
"Expected Replicates Removed" for consistency throughout the pipeline.
bugfix: `is_unexpected_replicate` data missing in `QC_Report.xlsx`
- Removed "IdatsInProjectDir" from _SAMPLE_QC_COLUMNS and QC_Report.xlsx.
- The "IdatsInProjectDir" column was empty in QC_Report.xlsx because it
  is not present in sample_qc.csv.
- The equivalent column "is_missing_idats" is already included in
  QC_Report.xlsx, containing the same information from cgr_sample_sheet.csv.
- Therefore, the "IdatsInProjectDir" column has been removed to avoid
  redundancy.
Removes "Expected Replicate" from _SAMPLE_QC_COLUMNS and thus from
QC_Report.xlsx. It duplicates the more informative "Replicate IDs"
column in QC_Report.xlsx.
- Introduces functionality to calculate the total number of QC issues
  present in a sample.
- Addresses missing data in the "Count_of_QC_Issue" column of
  QC_Report.xlsx.
- Calculates the number of "TRUE" occurrences in five boolean columns:
    "Low Call Rate",
    "Contaminated",
    "Expected Replicate Discordance",
    "Unexpected Replicate",
    and "Sex Discordant".
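The count described above can be sketched as a row-wise sum over the five flag columns. This is an illustrative sketch, not the pipeline's implementation: the function name `add_qc_issue_count`, the `Sample_ID` column, and the example rows are hypothetical, and it assumes the five columns hold Python/pandas booleans rather than the strings "TRUE"/"FALSE".

```python
import pandas as pd

# The five boolean columns named in the commit message.
QC_FLAG_COLUMNS = [
    "Low Call Rate",
    "Contaminated",
    "Expected Replicate Discordance",
    "Unexpected Replicate",
    "Sex Discordant",
]


def add_qc_issue_count(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a Count_of_QC_Issue column (hypothetical helper)."""
    out = df.copy()
    # eq(True) treats missing values as "not an issue", then each row is summed.
    out["Count_of_QC_Issue"] = out[QC_FLAG_COLUMNS].eq(True).sum(axis=1)
    return out


# Made-up rows for illustration only.
example = pd.DataFrame(
    {
        "Sample_ID": ["S1", "S2", "S3"],
        "Low Call Rate": [False, True, False],
        "Contaminated": [False, True, False],
        "Expected Replicate Discordance": [False, False, False],
        "Unexpected Replicate": [False, False, True],
        "Sex Discordant": [False, False, False],
    }
)

counted = add_qc_issue_count(example)
print(counted[["Sample_ID", "Count_of_QC_Issue"]])
```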
- Fills in missing data for "Sample Pass QC" column in QC_Report.xlsx.
- Populates the column with a boolean value indicating if a sample passed all QC metrics:
       "Low Call Rate",
       "Contaminated",
       "Expected Replicate Discordance",
       "Unexpected Replicate",
       and "Sex Discordant".
- A sample passes QC if `Count_of_QC_Issue` is 0 (no QC issues).
- Otherwise, the sample fails and the column is set to False.
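The pass/fail rule above reduces to a comparison against zero. A minimal sketch, assuming `Count_of_QC_Issue` is already populated as an integer column; the helper name `add_sample_pass_qc` and the example data are hypothetical.

```python
import pandas as pd


def add_sample_pass_qc(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a boolean "Sample Pass QC" column (hypothetical helper)."""
    out = df.copy()
    # A sample passes only when it has no QC issues at all.
    out["Sample Pass QC"] = out["Count_of_QC_Issue"] == 0
    return out


# Made-up counts for illustration only.
qc_counts = pd.DataFrame(
    {"Sample_ID": ["S1", "S2", "S3"], "Count_of_QC_Issue": [0, 2, 1]}
)

with_pass = add_sample_pass_qc(qc_counts)
print(with_pass)
```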
bugfix: retrieve additional missing data from `QC_Report.xlsx`
@kliao12 kliao12 merged commit 1dac0b3 into 302-skip-merging-graf-king-and-plink-results-to-make-the-qc-pipeline-more-efficient Aug 6, 2024
2 checks passed

3 participants