Pe 1832 critical issue in 3rd party picard gcbias update module #1

Open
wants to merge 74 commits into
base: master
from

Conversation

max-l-weaver

@max-l-weaver max-l-weaver commented Sep 5, 2024

This is a straight merge from upstream master into this branch.

The commenting out of R_SCRIPT from previous master has been retained.

I've done a very rudimentary grep for R_SCRIPT in both branches, and they appear to have the same number of results, but someone with more background may need to double-check that R hasn't been introduced somewhere we haven't commented out.
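For anyone repeating that check, here is a minimal sketch of a slightly stricter scan than a plain grep: it counts only the lines where R_SCRIPT appears outside a `//` comment. The class name `RScriptScan`, the default root `src/main/java`, and the "a `//` before the token means commented out" heuristic are all assumptions made for illustration. It is not part of Picard, and it does not handle `/* ... */` block comments or `//` inside string literals.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Picard: counts uncommented mentions of a token.
final class RScriptScan {

    /** True if the line mentions the token and no "//" marker appears before it. */
    public static boolean isLiveReference(String line, String token) {
        int tokenAt = line.indexOf(token);
        if (tokenAt < 0) {
            return false; // token not on this line at all
        }
        int commentAt = line.indexOf("//");
        // Live if there is no line comment, or the comment starts after the token.
        return commentAt < 0 || commentAt > tokenAt;
    }

    /** Number of lines that contain a live (uncommented) reference to the token. */
    public static long countLiveReferences(List<String> lines, String token) {
        return lines.stream().filter(line -> isLiveReference(line, token)).count();
    }

    /** Walks a source tree and sums live references across all .java files (read as UTF-8). */
    public static long scanTree(Path root, String token) throws IOException {
        List<Path> javaFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            javaFiles = paths
                    .filter(p -> p.toString().endsWith(".java"))
                    .collect(Collectors.toList());
        }
        long total = 0;
        for (Path file : javaFiles) {
            total += countLiveReferences(Files.readAllLines(file), token);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default source root; pass a different path as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "src/main/java");
        System.out.println(scanTree(root, "R_SCRIPT"));
    }
}
```

Running this on each branch and comparing the two totals gives a count of uncommented references rather than of all matches. A non-zero result still needs a manual look, since declarations and documentation strings are counted too.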

lbergelson and others added 30 commits April 6, 2022 16:43
* Add a new interval list scatter mode to avoid issue of giant final list
in large joint genotyping scatters
* Fix GenotypeConcordance code that is sensitive to Allele ordering in sets.
* Upgrade to htsjdk 3.0.0.
…roadinstitute#1823)

Add new INPUT_INDEX_MAP flag to allow a table of paths -> index files for use cases when an index is desired but not adjacent to the VCF/SAM files to be processed. Also added REQUIRE_INDEX_FILES flag to force the tool to only run when index files are either adjacent to the input files, or explicitly provided via a map. Tests were added to ensure the tool functions as desired.
…disclosure (broadinstitute#1829)

* vulnerability fix: Temporary Directory Hijacking or Information Disclosure from temp file creation


Co-authored-by: Moderne <[email protected]>
Co-authored-by: Jonathan Leitschuh <[email protected]>
* update htsjdk to 3.0.1 and replace use of a newly deprecated method
* fix mismatching interval_file in tests
…te#1837)

Updated the issue template to link to our support forum
* One of the filters that LiftoverVcf outputs was misspelled:  Should be "Intervals" instead of "Intevals"
  IndelStraddlesMultipleIntevals -> IndelStraddlesMultipleIntervals 
* fixes broadinstitute#1841
* Stop removing periods from the end of output names

* Update CheckFingerprint
* This PR avoids merging fingerprints when there's only one fingerprint to be merged. The merge operation performs a deep copy of the haplotype probabilities array and effectively doubles the memory requirements of the program after having loaded the fingerprints from the sources.
* It also skips merging on the fingerprinting ids though it probably doesn't take much memory.

Co-authored-by: Louis Bergelson <[email protected]>
* replacing the .travis.yml with a Github Actions test script
* This does everything that the travis script did except for coveralls / artifactory upload which can be handled later
…ang3 (broadinstitute#1856)

* Convert references to commons-lang to commons-lang3 and remove the dependency
* This has moved to both htsjdk and barclay and is therefore unnecessary to have a copy of here.
* Downstream users should replace with org.broadinstitute.barclay.argparser.ClassFinder
* The Google Genomics API was turned off years ago. This removes a global
argument (GA4GH_CLIENT_SECRETS) which was used with it and several initialization
steps in CommandLineProgram which are no longer meaningful.
* Updates to CrosscheckFingerprints documentation.
---------

Co-authored-by: Samuel Lee <[email protected]>
Co-authored-by: George Grant <[email protected]>
Co-authored-by: tmelman <[email protected]>
Co-authored-by: kachulis <[email protected]>
…dinstitute#1871)

* Create a (mostly) empty metrics file instead of giving an unhelpful stacktrace if PF_READS == 0.

* Added check for the PF_READS == 0 case.

* Added a corresponding test file.

* Use a PicardException with an informative message when PF_READS == 0 and TOTAL_READS > 0.
* Fix longstanding bug so we use the Intel Inflater when it is requested

*  We have been using the Java Inflater even when the Intel one was available and requested
   due to a bug in the configuration of the default inflater.
   The fix is fairly brittle because it depends on the order of initialization of various things.
*  This will be fixed more generally by samtools/htsjdk#1666
* Add a test
* Fix deprecated uses of newInstance()
* Fix suppression of exception which could cause test to erroneously pass.
broadinstitute#1884)

* Add tests to verify sequence dictionary MD5s respect dos line endings.

* Don't repeat yourself.
lbergelson and others added 28 commits November 14, 2023 15:57
* This adjusts for the lack of picard cloud jar and properly sets the picard version
so it's not listed as "snapshot"
* Update the docker build to use staged builds.
  This reduces the build size from about 1.7gb to about 650mb
…l_list (broadinstitute#1928)

* Add flag to keep zero length intervals when converting bed -> interval_list
* This is a plugin which provides HttpFileSystemProvider which allows connecting to http(s) paths
  through the nio Path and Files interfaces.
* Updating htsjdk to 4.1.0 which improves support for plugins like this in Tribble.
* Adding an extremely simple test to show that http-nio is available
* update to use configuration api
* remove no longer relevant bits about the parser settings
* update to automatically configure javadoc/source jars and the toolchain
* Convert to use gradle java-library plugin

* this more clearly defines the necessary dependencies needed downstream to use picard
* reorganized the libraries a bit to make things clearer

* Remove uses of log4j in picard; it's still required transitively, so
  replace the dependency with a version constraint so we don't import a buggy version
* update the use of setup-gcloud@v0 -> v2 since v0 is deprecated
Support references in the cloud, and cloud-enable a few more tools.
* Add the EXT argument to CollectSamErrorMetrics.

---------

Co-authored-by: Can Kockan <[email protected]>
Update SamToFastq documentation to clarify that the tool works properly for both name-sorted and coordinate-sorted inputs with certain caveats.
…alities close to the end of the read (broadinstitute#1942)

* Fixed a bug in getFlowSumOfBaseQualities

* uncertain start position parameter in MarkDuplicates

* Modified algorithm to determine the best read

* Added a new duplicate selection strategy that looks close to the end of the read

* Test fixed and strategy test added

* added .vscode
…titute#1913)

* No coverage cap on Histogram ArrayList used for statistics on actual coverage metrics

* Use hashmap for sparse array structure

* Test coverage to ensure Fold80 & median coverage are not impacted by Coverage Cap - per documentation

* Using histogram directly

* Normalize Depth Array directly & test

* Create unfiltered, capped array

* Create unfiltered, uncapped histogram for output

* Close TODO of normalizing array directly, to avoid the extra flip between array & histogram

* Remove tests for normalizeDepthArray

* Remove normalizeDepthArray

* Build uncapped data for outputting histograms, keep capped data for theoretical stats, move calculation of min / max depth into base loading, ensure histograms are not sparsely populated

* Remove longstream import

* Clean up extra whitespace

* Remove uncappedunfiltered, it's not needed

* Clean up unneeded properties
…te#1476)

* Make the VCF option in CollectSamErrorMetrics optional

* Some additional checks for no-VCF. One of the tests requires an index that was in the vcf testdata folder but not sam.

---------

Co-authored-by: Can Kockan <[email protected]>
…#1918)

* Reject piped input (/dev/stdin) for BedToIntervalList

* Directly compare INPUT to /dev/stdin instead of PicardHtsPath.isOther()

* Update error message and add a test.

---------

Co-authored-by: Chris Norman <[email protected]>
broadinstitute#1956)

* -fixes a bug in the liftover logic whereby when the alleles couldn't be extended to the "left" because the start of the variant was already at position 1, the resulting VC was corrupt leading to the problem discussed in !1951
* CollectQualityYieldMetricsFlowSpace dev

* add flow read params to test

* Minimal reported error probability now calculated from the read.

* removed unused flow code

* remove FLOW_MODE from CollectQualityYieldMetrics

* FlowSpace -> Flow (simplify tool name)

* histograms added

* histogram update

* Cleaner treatment of fillingValue + fixed a bug in parsing T0

* restore test passing

* Code cleanup + INCLUDE_BQ_HISTOGRAM parameter

* MEAN_READ_NAME

* remove @hidden

* Fixed crash on unaligned reads, reads with no bases and a typing error

* ceil -> round

* PCT_Q20/Q30_FLOWS added

* CollectSNVQualityYieldMetrics tool

* SNVQ stats

* Quals taken from BQ

* Small refactoring - PCT_Q20_FLOWS -> PCT_PF_Q20_FLOWS etc.

* CollectQualityYieldMetricsFlow changes

* CollectQualityYieldMetricsSNVQTest changes

* Typo fix

* zero base BQ values

* Update SeriesStats.java

* Cleaner treatment of fillingValue + fixed a bug in parsing T0

* qual range warnings added

* code cleanup (of unused methods, etc)

* various comments addressed

* Rename short name for USE_BQ_FOR_BASE_QUALITIES

* addressing pull request comments

* flow core code moved to picard

* test (double,double) assertions made compatible with future test framework

* Revert "test (double,double) assertions made compatible with future test framework"

This reverts commit fd8b16d.

* FlowBasedHaplotype removed from picard

---------

Co-authored-by: Ilya Soifer <[email protected]>
…ava (broadinstitute#1977)

* Fix documentation for AT_DROPOUT and GC_DROPOUT in PanelMetricsBase.java
…issue-in-3rd-party-picard-gcbias-update-module
@djmce djmce self-requested a review September 6, 2024 08:25