Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNVQ recalibration for flow based reads #8697

Merged

Conversation

ilyasoifer
Copy link
Collaborator

@ilyasoifer ilyasoifer commented Feb 16, 2024

The pull request addresses two issues:

  1. Improved and more robust parsing of FlowBasedReads. Specifically, the code now determines the minimal reportable quality
  2. New tool AddFlowSNVQuality that allows users to convert the flow-based quality format when every base quality reports probability of an insertion or deletion to a conventional format that gives base qualities (total probability of mismatch and probability of each mismatch in separate tags).

We believe that this tool is going to be important for users of the Ultima Genomics data that care about calling SNVs, especially in somatic setting, so the goal was to make documentation more accessible.

Happy to receive feedback about it

dror27 and others added 30 commits January 17, 2024 11:19
xgboost added, initial

xgboost version update

--svn-qual-model added, placeholder code too

FlowFeatureMapper refactor towards AppleSVNQR

ApplySVNQR Integration Test (placeholder)

typos fixed

model now loading

conf file reading added

initial matrix build

move snvqr feature creation to processor

xgboost matrix logging

feature implementation (partial)

non-called-base support

optimize for non-called-base independent features

apply snvqr test file update

all features initial coding

matrix build seems ok

BQ and histograms

lots of changes ...

quality interpolation

Interpolator edge condition handled

additional aliasing

--add-apply-snvqr-features

safeguard against non category boolean features

BQ length (and polarity?) fixed

"quality_interpolation_data" made optional

ApplySNVQR --model optional

FlowFeatureMapper optimization - reduced by ~70%

conf optional when no model

minor performance increase

enforce score 0 in SNVQ

BQ zero warning added

zeroCount report
Also updated the printout of flow matrix to have more digits
@ilyasoifer ilyasoifer requested a review from meganshand March 12, 2024 19:42
@ilyasoifer ilyasoifer marked this pull request as ready for review March 12, 2024 19:43
@meganshand meganshand self-assigned this Mar 12, 2024
@ilyasoifer
Copy link
Collaborator Author

@meganshand - this should be ready for your review, there is a parallel PR in picard that adds tools that can collect statistics from the output of this tool, so would be good to merge this one first
Thanks a lot!


@Advanced
@Hidden
@Argument(fullName="debug-collect-stats-into", doc = "", optional = true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this debug argument here while the debug read name argument is in AddFlowSNVQualityArgumnetCollection?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

private SAMFileGATKReadWriter outputWriter;

@Advanced
@Argument(fullName=INCLUDE_QC_FAILED_READS_FULL_NAME, doc = "include reads with QC failed flag", optional = true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this here and the keepSupplementaryAlignments arg in the collection?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved.

}

// collect input stats
if ( debugCollectStatsInto != null )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add braces to this one so it matches the others

}

private void collectOutputStats(GATKRead read) {
if ( read.hasAttribute("BQ") ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use BASE_QUALITY_ATTRIBUTE_NAME here instead of "BQ" (and below as well)

@Argument(fullName="debug-collect-stats-into", doc = "", optional = true)
public String debugCollectStatsInto = null;

public static final String BASE_QUALITY_ATTRIBUTE_NAME = "BQ";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the BQ tag is defined in the spec as Offset to base alignment quality (BAQ). How much of a pain would it be to change this tag name (not sure how baked into your pipelines this is at this point).

Another option could be to put your updated base qualities in the output as the main base qualities, and move the old base qualities to the OQ tag (original qualities) which is how BQSR has worked in the past. This might make downstream tooling easier without deleting any data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilyasoifer could you.please suggest? This tool seems to be collecting stats on BQ - i.e. it is not the one producing the field. I'm not that familiar with the upstream and downstream tooling,

Copy link
Collaborator Author

@ilyasoifer ilyasoifer Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dror27 - agree with Megan that BQ is a confusing name (somehow I missed this point). We should fix this also in the picard tool in a different PR
So the desired interface is:

  1. By default the tool overrides base qualities
  2. There is a parameter that tells the tool to set a different tag rather than base qualities, but it is unset by default

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. Tool now override base qualities by default. If --output-quality-attribute is provided, then base qualities are preserved and an attribute by the value of the parameter is set.

}
}

public void aux(int v, int auxValue) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these aux functions get used anywhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted

@@ -250,6 +254,33 @@ public FlowBasedRead(final SAMRecord samRecord, final String flowOrder, final in

}

public FlowBasedRead(final FlowBasedRead other, final boolean deepCopy) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used anywhere? If not is it still needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleted

}

// access qual, convert to flow representation
final byte[] quals = samRecord.getBaseQualities();
final byte[] tp = samRecord.getSignedByteArrayAttribute(FLOW_MATRIX_TAG_NAME);
// initialize matrix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spurious comment?

@@ -820,9 +869,9 @@ private void applyFilteringFlowMatrix(){
private void clipProbs() {
for ( int i = 0 ; i < getMaxHmer(); i++ ) {
for ( int j =0; j < getNFlows(); j++) {
if ((flowMatrix[i][j] <= fbargs.probabilityRatioThreshold) &&
if ((flowMatrix[i][j] <= perHmerMinErrorProb*3) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did this *3 come from? Why is it needed here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified the function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to add this file?

@meganshand meganshand assigned dror27 and unassigned meganshand Mar 15, 2024
@dror27
Copy link
Contributor

dror27 commented Mar 25, 2024

@meganshand - thank you for your comments. I have responded and updated. We have just a few left between @ilyasoifer and myself - then you can have it back for another round.

@ilyasoifer
Copy link
Collaborator Author

ilyasoifer commented Mar 28, 2024

@dror27 - I addressed the comments that were assigned to me. There is some documentation task left and that failing test and then we are ready, I think.

@dror27
Copy link
Contributor

dror27 commented Mar 31, 2024

@dror27 - I addressed the comments that were assigned to me. There is some documentation task left and that failing test and then we are ready, I think.

I'm finding out that the failing test is due to different floating point accuracy between the platform on which the expected file is generated and the one running the test. I will change the way the expected and generated files are changed against to accommodate for that

< -5.6956674728026115
< -5.6956674728026115
< -5.6956674728026115
---
> -5.695667472802612
> -5.695667472802612
> -5.695667472802612

@ilyasoifer ilyasoifer requested a review from meganshand March 31, 2024 12:27
@ilyasoifer
Copy link
Collaborator Author

@meganshand - I think that we addressed all your comments, could you take another look please?

@meganshand meganshand self-assigned this Apr 1, 2024
Copy link
Contributor

@meganshand meganshand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good. I just had a few small suggestions for documentation.

/**
* alternate quality attribute to set instead of the usual quality string
*/
@Argument(fullName = OUTPUT_QUALITY_ATTRIBUTE_FULL_NAME, doc = "alternate quality attribute to set instead of the usual quality string.", optional = true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@Argument(fullName = OUTPUT_QUALITY_ATTRIBUTE_FULL_NAME, doc = "alternate quality attribute to set instead of the usual quality string.", optional = true)
@Argument(fullName = OUTPUT_QUALITY_ATTRIBUTE_FULL_NAME, doc = "alternate SAM tag to put original quality scores instead of overwriting the QUAL field. If not used, QUAL will be overwritten.", optional = true)

public SnvqModeEnum snvMode = SnvqModeEnum.Geometric;

/**
* alternate quality attribute to set instead of the usual quality string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* alternate quality attribute to set instead of the usual quality string
* By default this tool overwrites the QUAL field with the new qualities. Setting this argument saves the original qualities in the specified SAM tag.

}

@VisibleForTesting
protected double[] generateHmerBaseErrorProbabilities(final int[] key, final double[][] errorProbBands, final int flow,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking, this comment was added to AddFlowBaseQuality rather than here, right?

import java.util.List;

/**
* Set of arguments for the {@link FlowFeatureMapper}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Set of arguments for the {@link FlowFeatureMapper}
* Set of arguments for {@link AddFlowSNVQuality}

@meganshand meganshand removed their assignment Apr 1, 2024
Copy link
Contributor

@meganshand meganshand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I'll merge when tests pass

@meganshand meganshand merged commit 6739e6d into broadinstitute:master Apr 4, 2024
20 checks passed
@ilyasoifer ilyasoifer deleted the ultima.snv-qual.develop.rebase branch July 26, 2024 09:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants