Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNVQ recalibration for flow based reads #8697

Merged
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
56 commits
Select commit Hold shift + click to select a range
0ce6afe
ApplySNVQR tool squashed
dror27 Nov 14, 2023
bf5d2d8
[BIOIN-1385] FlowBasedRead now can estimate filling_value from the read
ilyasoifer Dec 27, 2023
876a4e6
[BIOIN-1385] Removed probabilityRatioThreshold
ilyasoifer Dec 27, 2023
86d3ae4
[BIOIN-1385] Removed mistaken usage by fillingValue
ilyasoifer Dec 28, 2023
c5e2ad8
[BIOIN-1385] Fixed bug in t0 parsing
ilyasoifer Dec 30, 2023
fcbbd3c
[BIOIN-1385] Reverted the aggressive change in filling probability in…
ilyasoifer Jan 1, 2024
ba29b3c
[BIOIN-1385] Minor update of poorly_modelded_reads
ilyasoifer Jan 1, 2024
ac66953
[BIOIN-1385] FlowBasedRead class cleanup
ilyasoifer Jan 2, 2024
8d9782f
[BIOIN-1385] Updated ground truth scorer for the case when the fillin…
ilyasoifer Jan 2, 2024
a032030
Fixed typing error + missed file
ilyasoifer Jan 3, 2024
626a458
ceil -> round
ilyasoifer Jan 3, 2024
b4d8341
Set phred score for bins with no zeros to maximum rather than zero
ilyasoifer Jan 5, 2024
2d69e85
Updated tests
ilyasoifer Jan 5, 2024
92aab19
Updated tests
ilyasoifer Jan 5, 2024
c564dc6
Updated tests
ilyasoifer Jan 5, 2024
51ebdb5
-10 qual fix
dror27 Jan 18, 2024
a7947b9
Guard against negative qualities
dror27 Jan 21, 2024
5e6da62
(not final) reuse AddFlowBaseQuality code to accelerate
dror27 Jan 22, 2024
fcb1f83
--debug-use-add-flow-base-quality-algorithm flasg
dror27 Jan 23, 2024
8f17ea5
use of add_base_q now default
dror27 Jan 23, 2024
8634c11
ApplySNVQR test data rebuilt
dror27 Jan 23, 2024
9ba77d3
add_base_q integration in progress ... missing mapping function
dror27 Jan 23, 2024
44b6763
allow unsorted/unmapped input to ApplySNVQR when no model specified
dror27 Jan 24, 2024
2f478de
SNVQ of called base adjusted to sum to 1
dror27 Jan 24, 2024
13aa6ee
update called base snvq
dror27 Jan 25, 2024
5850557
--limit-phred-score replaces --limit-score
dror27 Jan 28, 2024
d6707d2
flow flags passed to add base quality
dror27 Jan 29, 2024
d0e4a0f
BQ adjusted
dror27 Jan 29, 2024
f79da59
Fixed boundary case
ilyasoifer Jan 31, 2024
b2337dd
Snvq modes added. see --snvq-mode param
dror27 Feb 4, 2024
78818b4
SNVq corrected (all 80 issue)
dror27 Feb 5, 2024
f081cb5
Add missing X_ADJACENT_REF_DIFF
dror27 Feb 19, 2024
1a8b734
Tool renamed to AddFlowSNVQuality and model/conf removed
dror27 Feb 26, 2024
1461c82
further cleanup tests
dror27 Feb 26, 2024
1170d4b
FlowFeatureMapper changes removed
dror27 Feb 26, 2024
7da3aa6
expected results adjusted
dror27 Feb 26, 2024
2b25b21
snvq-mode removed. Geometric mode is one used
dror27 Mar 3, 2024
3cb409c
some comments addressed
dror27 Mar 3, 2024
94d4fe7
Update src/main/java/org/broadinstitute/hellbender/tools/walkers/feat…
dror27 Mar 3, 2024
4afc042
Update src/main/java/org/broadinstitute/hellbender/tools/walkers/feat…
dror27 Mar 3, 2024
4f8320e
0.0001 to keep consistent with GATK
dror27 Mar 3, 2024
9c17210
Merge branch 'ultima.snv-qual.develop.rebase' of github.com:Ultimagen…
dror27 Mar 3, 2024
5577f79
includeDupReads removed
dror27 Mar 3, 2024
ef7ea5e
additional updates
dror27 Mar 3, 2024
eacc100
DEFAULT_FILLING_VALUE restored to 0.001
dror27 Mar 11, 2024
a2b7f54
snvq-mode restored
dror27 Mar 11, 2024
06284b0
AddFlowSNVQuality integeration test expected data adjusted
Mar 16, 2024
4ee1255
fix comment spec on expected output
dror27 Mar 16, 2024
bc0ff83
some of the comments addressed
dror27 Mar 18, 2024
ff06e36
additional comments addressed
dror27 Mar 21, 2024
403bd29
additional comments addressed
dror27 Mar 25, 2024
a6afd8e
Clarified function
ilyasoifer Mar 28, 2024
516c152
limit scores precision to regain platform independence
dror27 Mar 31, 2024
4ee5b6d
documented generateHmerBaseErrorProbabilities
dror27 Mar 31, 2024
d9c090b
AddFlowSNVQuality now overrides read base quality by default
dror27 Mar 31, 2024
0c73917
additional doc comments addressed
dror27 Apr 4, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
package org.broadinstitute.hellbender.tools.walkers.featuremapping;

import org.broadinstitute.barclay.argparser.Advanced;
import org.broadinstitute.barclay.argparser.Argument;
import org.broadinstitute.barclay.argparser.Hidden;

Expand All @@ -11,6 +12,13 @@
*/
public class AddFlowSNVQualityArgumentCollection implements Serializable{
private static final long serialVersionUID = 1L;
public static final String MAX_PHRED_SCORE_FULL_NAME = "max-phred-score";
public static final String KEEP_SUPPLEMENTARY_ALIGNMENTS_FULL_NAME = "keep-supplementary-alignments";
public static final String INCLUDE_QC_FAILED_READ_FULL_NAME = "include-qc-failed-read";
public static final String SNVQ_MODE_FULL_NAME = "snvq-mode";
public static final String OUTPUT_QUALITY_ATTRIBUTE_FULL_NAME = "output-quality-attribute";
public static final String DEBUG_READ_NAME_FULL_NAME = "debug-read-name";
public static final String DEBUG_COLLECT_STATS_INTO_FULL_NAME = "debug-collect-stats-into";

public enum SnvqModeEnum {
Legacy,
Expand All @@ -23,25 +31,40 @@ public enum SnvqModeEnum {
* maximum value for
* delta in score
**/
@Argument(fullName = "limit-phred-score", doc = "Limit value for phred scores", optional = true)
public double limitPhredScore = Double.NaN;
@Argument(fullName = MAX_PHRED_SCORE_FULL_NAME, doc = "Limit value for phred scores", optional = true)
public double maxPhredScore = Double.NaN;

/**
* keep supplementary alignments?
**/
@Argument(fullName = "keep-supplementary-alignments", doc = "keep supplementary alignments ?", optional = true)
@Argument(fullName = KEEP_SUPPLEMENTARY_ALIGNMENTS_FULL_NAME, doc = "keep supplementary alignments ?", optional = true)
public boolean keepSupplementaryAlignments = true;

@Advanced
@Argument(fullName= INCLUDE_QC_FAILED_READ_FULL_NAME, doc = "include reads with QC failed flag", optional = true)
public boolean includeQcFailedReads = true;

/**
* snvq computation mode
*/
@Argument(fullName = "snvq-mode", doc = "ksnvq computation mode", optional = true)
@Argument(fullName = SNVQ_MODE_FULL_NAME, doc = "snvq calculation mode.", optional = true)
public SnvqModeEnum snvMode = SnvqModeEnum.Geometric;

/**
* alternate quality attribute to set instead of the usual quality string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* alternate quality attribute to set instead of the usual quality string
* By default this tool overwrites the QUAL field with the new qualities. Setting this argument saves the original qualities in the specified SAM tag.

*/
@Argument(fullName = OUTPUT_QUALITY_ATTRIBUTE_FULL_NAME, doc = "alternate quality attribute to set instead of the usual quality string.", optional = true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@Argument(fullName = OUTPUT_QUALITY_ATTRIBUTE_FULL_NAME, doc = "alternate quality attribute to set instead of the usual quality string.", optional = true)
@Argument(fullName = OUTPUT_QUALITY_ATTRIBUTE_FULL_NAME, doc = "alternate SAM tag to put original quality scores instead of overwriting the QUAL field. If not used, QUAL will be overwritten.", optional = true)

public String outputQualityAttribute = null;

/**
* debug read names?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think more comments on all of these arguments would be useful. For example, this doc should be more like:

Read names of reads to output details of as part of debugging. 
Statistics about the reads will be output to xxx filename.

Copy link
Contributor

@dror27 dror27 Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thank you for the texts

**/
@Hidden
@Argument(fullName = "debug-read-name", doc = "debug specific reads?", optional = true)
@Argument(fullName = DEBUG_READ_NAME_FULL_NAME, doc = "Read names of reads to output details of as part of debugging. ", optional = true)
public List<String> debugReadName = null;

@Advanced
@Hidden
@Argument(fullName= DEBUG_COLLECT_STATS_INTO_FULL_NAME, doc = "Statistics about the reads will be output to given filename.", optional = true)
public String debugCollectStatsInto = null;
}
Original file line number Diff line number Diff line change
Expand Up @@ -239,6 +239,25 @@ private static double[][] extractErrorProbBands(final FlowBasedRead flowRead, fi
return result;
}

/**
* The following functions estimate the error probability for an hmer. specifically two error
* probability values are generated: one for the first base of the hmer and another for the
* rest of its bases.
*
* The computation itself is performed in a subsequent function: generateSidedHmerBaseErrorProbability
* It iterates over the possible valid combinations of errors and sums them up.
*
* @param key - key (hmer length) in flow space
* @param errorProbBands - for each flow (position in the key) three error probabilities are provided:
* [0] - for the hmer being one base shorter
* [1] - for the hmer to be at its length
* [2] - for the hmer to be one base longer
* @param flow - the flow (index) for which to generate the probabilities (0 <= flow < key.length)
* @param flowOrderLength - the cycle length of of the flow order (usually 4)
* @return an array of two probabilities:
* [0] - probability for the first base of the hmer
* [1] - probability for the rest of the bases of the hmer
*/
@VisibleForTesting
protected static double[] generateHmerBaseErrorProbabilities(final int[] key, final double[][] errorProbBands, final int flow, final int flowOrderLength) {

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -858,9 +858,9 @@ else if ( hasQ || hasZ ) {
cols.put("ReadName", read.getName());

// haplotypes and reference scores
cols.put("PaternalHaplotypeScore", paternal.score);
cols.put("MaternalHaplotypeScore", maternal.score);
cols.put("RefHaplotypeScore", refScore);
cols.put("PaternalHaplotypeScore", String.format("%.6f", paternal.score));
cols.put("MaternalHaplotypeScore", String.format("%.6f", maternal.score));
cols.put("RefHaplotypeScore", String.format("%.6f", refScore));

// build haplotype keys
final FlowBasedReadUtils.ReadGroupInfo rgInfo = FlowBasedReadUtils.getReadGroupInfo(getHeaderForReads(), read);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -67,21 +67,6 @@ public void add(double v) {
}
}

public void aux(int v, int auxValue) {
aux((double)v, auxValue);
}

public void aux(double v, int auxValue) {

if ( auxBins.containsKey(v) ) {
auxBins.get(v).add(auxValue);
} else {
SeriesStats ss = new SeriesStats();
ss.add(auxValue);
auxBins.put(v, ss);
}
}

public double getLast() {
return last;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -254,33 +254,6 @@ public FlowBasedRead(final SAMRecord samRecord, final String flowOrder, final in

}

public FlowBasedRead(final FlowBasedRead other, final boolean deepCopy) {
super(other.samRecord);
if ( deepCopy ) {
forwardSequence = other.forwardSequence.clone();
key = other.key.clone();
flow2base = other.flow2base.clone();
flowOrder = other.flowOrder.clone();
flowMatrix = other.flowMatrix.clone();
} else {
forwardSequence = other.forwardSequence;
key = other.key;
flow2base = other.flow2base;
flowOrder = other.flowOrder;
flowMatrix = other.flowMatrix;
}
maxHmer = other.maxHmer;
validKey = other.validKey;
direction = other.direction;
baseClipped = other.baseClipped;
trimLeftBase = other.trimLeftBase;
trimRightBase = other.trimRightBase;
fbargs = other.fbargs;
readInsQuals = other.readInsQuals;
readDelQuals = other.readDelQuals;
overallGCP = other.overallGCP;
}

//since the last unclipped flow is uncertain (we give high probabilities to
//also hmers higher than the called hmer)
private void spreadFlowLengthProbsAcrossCountsAtFlow(final int flowToSpread) {
Expand Down Expand Up @@ -332,8 +305,6 @@ private void readFlowMatrix(final String _flowOrder) {
// access qual, convert to flow representation
final byte[] quals = samRecord.getBaseQualities();
final byte[] tp = samRecord.getSignedByteArrayAttribute(FLOW_MATRIX_TAG_NAME);
// initialize matrix


boolean specialTreatmentForZeroCalls = false;
final byte[] t0 = SAMUtils.fastqToPhred(samRecord.getStringAttribute(FLOW_MATRIX_T0_TAG_NAME));
Expand Down Expand Up @@ -864,12 +835,15 @@ private void applyFilteringFlowMatrix(){
}

/**
* clip probability values to fbargs.probabilityRatioThreshold
* clip probability values in a way that probability between perHmerMinErrorProb and 3*perHmerMinErrorProb
* is automatically clipped to perHmerMinErrorProb. This is done to avoid issues with rounding of
* small error probabilities in the basecalling
*/
private void clipProbs() {
double probabilityThreshold = perHmerMinErrorProb*3;
for ( int i = 0 ; i < getMaxHmer(); i++ ) {
for ( int j =0; j < getNFlows(); j++) {
if ((flowMatrix[i][j] <= perHmerMinErrorProb*3) &&
if ((flowMatrix[i][j] <= probabilityThreshold) &&
(key[j]!=i)) {
flowMatrix[i][j] = perHmerMinErrorProb;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -480,4 +480,10 @@ public static CycleSkipStatus getCycleSkipStatus(FlowBasedRead read, ReferenceCo
return status;
}

public static int calcFlowOrderLength(String flowOrder) {

final int i = flowOrder.indexOf(flowOrder.charAt(0), 1);

return (i < 0) ? flowOrder.length() : i;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ public void testBasic() throws IOException {
"-O", outputFile.getAbsolutePath(),
"-I", publicTestDir + FlowTestConstants.ADD_FLOW_SNVQ_DATA_DIR + "/add_flow_snvq_input.bam",
"-L", "chr1:1-15000",
"--limit-phred-score", "50",
"--max-phred-score", "50",
"--debug-read-name", "30020185_2-UGAv3-182-1989782468",
"--verbosity", "INFO"
};
Expand All @@ -45,7 +45,38 @@ public void testBasic() throws IOException {

// walk the output and expected files, compare non-comment lines
if ( !UPDATE_EXACT_MATCH_EXPECTED_OUTPUTS ) {
IntegrationTestSpec.assertEqualTextFiles(outputFile, expectedFile, "@PG");
IntegrationTestSpec.assertEqualTextFiles(outputFile, expectedFile, "@");
}
}

@Test
public void testBasicBQ() throws IOException {

final File outputDir = createTempDir("testAddFlowSNVQTest");
final String filename = "add_flow_snvq_output_bq.sam";
final File expectedFile = new File(testDir + "/" + filename);
final File outputFile = UPDATE_EXACT_MATCH_EXPECTED_OUTPUTS ? expectedFile : new File(outputDir + "/" + filename);

final String[] args = new String[] {
"-R", largeFileTestDir + "/Homo_sapiens_assembly38.fasta.gz",
"-O", outputFile.getAbsolutePath(),
"-I", publicTestDir + FlowTestConstants.ADD_FLOW_SNVQ_DATA_DIR + "/add_flow_snvq_input.bam",
"-L", "chr1:1-15000",
"--max-phred-score", "50",
"--debug-read-name", "30020185_2-UGAv3-182-1989782468",
"--verbosity", "INFO",
"--output-quality-attribute", "BQ"
};

// run the tool
runCommandLine(args); // no assert, just make sure we don't throw

// make sure we've generated the output file
Assert.assertTrue(outputFile.exists());

// walk the output and expected files, compare non-comment lines
if ( !UPDATE_EXACT_MATCH_EXPECTED_OUTPUTS ) {
IntegrationTestSpec.assertEqualTextFiles(outputFile, expectedFile, "@");
}
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
package org.broadinstitute.hellbender.tools.walkers.featuremapping;

import org.broadinstitute.hellbender.testutils.BaseTest;
import org.testng.Assert;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class AddFlowSNVQualityUnitTest extends BaseTest {

@DataProvider(name = "sliceIsValidDataProvider")
Object[][] sliceIsValidDataProvider() {
return new Object[][] {
{ new int[] {0, 0, 0, 0, 0, 0}, 4, false },
{ new int[] {0, 0, 1, 0, 0, 1}, 4, true },
{ new int[] {1, 0, 0, 0, 1, 1}, 4, false },
};
}

@Test(dataProvider = "sliceIsValidDataProvider")
void testSliceIsValid(final int[] slice, final int flowOrderLength, final boolean isValid) {
Assert.assertEquals(AddFlowSNVQuality.sliceIsValidForConsideration(slice, flowOrderLength), isValid);
}

@DataProvider(name = "getSnvqDataProvider")
Object[][] getSnvqDataProvider() {
return new Object[][] {
{ 0.5, 0.2, 0.3, AddFlowSNVQualityArgumentCollection.SnvqModeEnum.Legacy, 0.5 },
{ 0.5, 0.2, 0.3, AddFlowSNVQualityArgumentCollection.SnvqModeEnum.Optimistic, 0.06 },
{ 0.5, 0.2, 0.3, AddFlowSNVQualityArgumentCollection.SnvqModeEnum.Pessimistic, 0.44 },
{ 0.5, 0.2, 0.3, AddFlowSNVQualityArgumentCollection.SnvqModeEnum.Geometric, Math.sqrt(0.06 * 0.44) },
};
}

@Test(dataProvider = "getSnvqDataProvider")
void testGetSnvq(final double sliceP, final double p1, final double p2, AddFlowSNVQualityArgumentCollection.SnvqModeEnum snvMode, final double expected) {
Assert.assertEquals(AddFlowSNVQuality.getSnvq(sliceP, p1, p2, snvMode), expected, 0.0001);
}

}
3 changes: 0 additions & 3 deletions src/test/resources/large/add_flow_snvq/.gitattributes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to add this file?

This file was deleted.

Git LFS file not shown
Git LFS file not shown
4 changes: 2 additions & 2 deletions src/test/resources/large/groundTruth/ground_truth_output.csv
Git LFS file not shown
Git LFS file not shown
Loading