Update branch #311
Merged
kliao12
merged 24 commits into
302-skip-merging-graf-king-and-plink-results-to-make-the-qc-pipeline-more-efficient
from
default
Aug 6, 2024
…ci call rate filters
switch snp and call rate filters
The `QC_Report.xlsx` was missing data for the `is_unexpected_replicate` column in the `SAMPLE_QC`, `SAMPLE_CONCORDANCE`, and `SUBJECT_QC` sheets. This issue arose from a column name change in commit `825c8a57e2003e8c5a55aded8759584f950752cc`, where `Unexpected Replicate` was renamed to `is_unexpected_replicate`. The renaming caused blank entries due to the reindexing step in `qc_report_tables.py`. During reindexing, `is_unexpected_replicate` was not found in the DataFrame because of the earlier renaming, leading to NaN values. Pandas docs state, "By default, values in the new index that do not have corresponding records in the DataFrame are assigned NaN." To resolve this, the column names have been reverted to their original title, `Unexpected Replicate`.
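The reindexing behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the code in `qc_report_tables.py`: the frame, the `Sample_ID` column, and the column list are hypothetical, and which side carried which name in the real pipeline is an assumption made only for illustration.

```python
import pandas as pd

# Hypothetical miniature of a sample QC table that carries one spelling
# of the column name.
sample_qc = pd.DataFrame(
    {"Sample_ID": ["S1", "S2"], "is_unexpected_replicate": [True, False]}
)

# Hypothetical report layout that asks for the other spelling.
report_columns = ["Sample_ID", "Unexpected Replicate"]

# reindex keeps labels it finds and creates the rest filled with NaN,
# so the mismatched column comes back entirely blank without any error.
report = sample_qc.reindex(columns=report_columns)
print(report)
```

Because `reindex` does not raise on unknown labels, a name mismatch like this only shows up as an empty column in the output, which matches the symptom reported for `QC_Report.xlsx`.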
- Renamed parameter `sample_qc_csv` to `subject_qc_csv` to better reflect its intended use.
…s Removed" We noticed a discrepancy in the terminology used in Table 4a. It uses "Expected Duplicates Removed", while other parts of the pipeline use "Expected Replicates". We therefore updated Table 4a to "Expected Replicates Removed" for consistency throughout the pipeline.
bugfix: `is_unexpected_replicate` data missing in `QC_Report.xlsx`
- Removed "IdatsInProjectDir" from _SAMPLE_QC_COLUMNS and QC_Report.xlsx.
- The "IdatsInProjectDir" column was empty in QC_Report.xlsx because it is not present in sample_qc.csv.
- The equivalent column "is_missing_idats" is already included in QC_Report.xlsx, containing the same information from cgr_sample_sheet.csv.
- Therefore, the "IdatsInProjectDir" column has been removed to avoid redundancy.
Removes "Expected Replicate" from _SAMPLE_QC_COLUMNS and thus from QC_Report.xlsx. It duplicates the more informative "Replicate IDs" column in QC_Report.xlsx.
- Introduces functionality to calculate the total number of QC issues present in a sample.
- Addresses missing data in the "Count_of_QC_Issue" column of QC_Report.xlsx.
- Calculates the number of "TRUE" occurrences in five boolean columns: "Low Call Rate", "Contaminated", "Expected Replicate Discordance", "Unexpected Replicate", and "Sex Discordant".
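The count described in this commit can be sketched with pandas as follows. This is an illustration under stated assumptions, not the pipeline's implementation: the helper name `count_qc_issues`, the `Sample_ID` column, and the sample data are hypothetical, and treating a missing flag as "no issue" is an assumption.

```python
import pandas as pd

# The five boolean columns named in the commit message.
QC_FLAG_COLUMNS = [
    "Low Call Rate",
    "Contaminated",
    "Expected Replicate Discordance",
    "Unexpected Replicate",
    "Sex Discordant",
]


def count_qc_issues(df: pd.DataFrame) -> pd.Series:
    """Return the number of True flags per row (hypothetical helper)."""
    # eq(True) maps True to True and everything else, including NaN,
    # to False, so missing flags are not counted (an assumption).
    return df[QC_FLAG_COLUMNS].eq(True).sum(axis=1)


# Hypothetical two-sample table.
samples = pd.DataFrame(
    {
        "Sample_ID": ["S1", "S2"],
        "Low Call Rate": [False, True],
        "Contaminated": [False, True],
        "Expected Replicate Discordance": [False, False],
        "Unexpected Replicate": [False, False],
        "Sex Discordant": [False, False],
    }
)
samples["Count_of_QC_Issue"] = count_qc_issues(samples)
```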
- Fills in missing data for "Sample Pass QC" column in QC_Report.xlsx.
- Populates the column with a boolean value indicating if a sample passed all QC metrics: "Low Call Rate", "Contaminated", "Expected Replicate Discordance", "Unexpected Replicate", and "Sex Discordant".
- A sample passes QC if `Count_of_QC_Issue` is 0 (no QC issues).
- Otherwise, the sample fails and the column is set to False.
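The pass/fail rule stated above reduces to a comparison against zero. A minimal sketch, assuming a frame that already holds `Count_of_QC_Issue`; the `Sample_ID` column and the values are hypothetical and the pipeline's actual code may differ:

```python
import pandas as pd

# Hypothetical table with the issue count already computed.
qc = pd.DataFrame(
    {"Sample_ID": ["S1", "S2", "S3"], "Count_of_QC_Issue": [0, 2, 1]}
)

# A sample passes only when it has no QC issues at all.
qc["Sample Pass QC"] = qc["Count_of_QC_Issue"].eq(0)
```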
bugfix: retrieve additional missing data from `QC_Report.xlsx`
Issue 222
kliao12
merged commit Aug 6, 2024
1dac0b3
into
302-skip-merging-graf-king-and-plink-results-to-make-the-qc-pipeline-more-efficient
2 checks passed
No description provided.