Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code Review for #8752 #8871

Closed

Conversation

bbimber
Copy link
Contributor

@bbimber bbimber commented Jun 13, 2024

This is responding to code review on #8752

davidbenjamin and others added 20 commits March 25, 2024 10:20
… events (broadinstitute#8717)

* M2 bad haplotype filter does not filter variants that share a haplotype with a germline event

* two ECNT annotations -- haplotype and region -- and clustered events filter looks at both
…37/hg19 conversion (broadinstitute#8758)

Don't print the very long and misleading "The following contigs are present in b37 and
missing in the input VCF sequence dictionary" log message when we're not even doing b37/hg19
conversion.
Set GGVCFs --all-sites GQ0 hom-refs to no-calls
Set regular GGVCFs GQ0 hom-refs to no-calls (any DP, PL) for better AF/AN annotations
Remove PLs in "no data" case where DP=0 for more accurate QUAL score
Several files tracked by git lfs were accidentally reimported as normal files.
This makes them stubs again.
…elimited) (broadinstitute#8771)

* Enable ReblockGVCF to subset AS annotations that aren't "raw" (i.e. pipe-delimited)

* Fix tests by removing AssemblyComplexity from default annotations
* Add MappingQualityReadFilter

* Added additional warnings for mmq

* Fixed doc typo
…adinstitute#8802)

* Add malaria spanning deletion exception regression test with fix

* Disabling codecov.

---------

Co-authored-by: Jonn Smith <[email protected]>
* Fixed a bug that prevented filtering by SOR in many cases
…f in GenomicsDB (broadinstitute#8759)

* Allow for GT to be a nocall if GQ and PL[0] are zero instead of homref in GenomicsDB

* Move to 1.5.3 release from snapshot

---------

Co-authored-by: Nalini Ganapati <[email protected]>
Co-authored-by: Nalini Ganapati <[email protected]>
* Reduced total layers in the GATK docker image from 44 down to 16.

* Reduced GATK base image layers from 20 to 3.

* This might be a better solution than a full squash down to a single layer, because: 

If we are hosting this in a premium ACR, the limit is 10,000 readOps per minute. So with 16 layers, you get around 625 pulls per minute. Also, this will be able to still take advantage of parallel pulls (default is 3, but at most 16 threads in this case, I believe) as opposed to one big layer which will not download in parallel. There's the potential of that being a lot slower and subsequent jobs falling into the same "minute" because others are not done, making it easier to hit that 10k readOps limit. Lastly, people using GATK outside data pipelines will not be able to take advantage of layer caching too.

Resolves broadinstitute#8684
…in VCF header (broadinstitute#8831)

Added a --mask-description argument to VariantFiltration to write a custom description for the mask filter in the VCF header
… with truth VCF (broadinstitute#8836)

* added 20 more Permutect read features

* Permutect test data can, like training data, be annotated with a truth VCF
* [BIOIN-1570] Fixed edge case in variant annotation when the variant is close to the edge of the reference
…rve existing behavior when storeAllVcfSources=false
@bbimber bbimber changed the title Code Review for #8750 Code Review for #8752 Jun 13, 2024
@bbimber
Copy link
Contributor Author

bbimber commented Jun 13, 2024

@lbergelson or @droazen: this is related to your comments on #8752 about preserving the default behavior of just writing the first source. This PR overloads simpleMerge(), meaning code needs to opt-in to the feature of preserving all source names.

I cannot kick off tests - would one of you be able to to that?

@bbimber
Copy link
Contributor Author

bbimber commented Jun 25, 2024

@jamesemery, thanks for kicking off tests. the compile failure is probably due to a bad merge. I pushed a fix - would you be able to restart tests?

jamesemery and others added 2 commits June 25, 2024 12:51
…CNN WDL test failures (broadinstitute#8891)

Replace usage of a deprecated Genomes-in-the-Cloud docker image that was causing the CNN WDL tests to fail with the GATK base image.
* Update no-calls in Gnarly haploid test
* Put back turned off tests
jamesemery and others added 27 commits September 24, 2024 10:08
…omicsDBImport (broadinstitute#8987)

* Switch to logging a warning instead of an exception for intervals in a query that were not part of GenomicsDBImport
* upgrade GenomicsDB 1.5.3 -> 1.5.4
* fixes broadinstitute#8415
* updates

changes for gq0 comparison

more changes

some vcfs have no gqs

clean up

change for NPE

clean up

whitespace

* addressing comments
This commit changes the default --mitochondria-mode parameters to what they are supposed to be in the source code based on the raised question in the forum 

https://gatk.broadinstitute.org/hc/en-us/community/posts/29969305493147-Incorrect-documentation-for-mitochondria-mode-in-Mutect2

This commit also fixes the mitochondria mode flag to its correct naming in the documentation.
…nstitute#9002)

* Adds a new github actions workflow to submit our dependencies for analysis
…nstitute#9001)

Our README was missing some important details on how to properly setup the GATK conda environment locally.
…nstitute#8004)

* Port of the NVIDIA-authored nvscorevariants tool into GATK, with a basic tool frontend

* This is a direct replacement for the legacy tool CNNScoreVariants. It produces results that are almost identical to that tool, but is implemented on top of a more modern ML library, Pytorch.

* The Python code is taken from https://github.com/NVIDIA-Genomics-Research/nvscorevariants, with a few minor modifications necessary to get the tool working on newer versions of the Python libraries.

* Added pytorch-lightning to the GATK conda environment, as it's required by this tool

* Disabled jacoco in build.gradle, as it was causing strange errors related to jacoco trying to parse the new Pytorch model files in the resources directory

---------

Co-authored-by: Louis Bergelson <[email protected]>
…tools (broadinstitute#9008)

* Adding small warning messages to not to feed any GVCF files to these tools.

* Update src/main/java/org/broadinstitute/hellbender/tools/walkers/vqsr/VariantRecalibrator.java

Co-authored-by: Louis Bergelson <[email protected]>

* Update src/main/java/org/broadinstitute/hellbender/tools/walkers/annotator/VariantAnnotator.java

Co-authored-by: Louis Bergelson <[email protected]>

* Update src/main/java/org/broadinstitute/hellbender/tools/walkers/annotator/VariantAnnotator.java

Co-authored-by: Louis Bergelson <[email protected]>

---------

Co-authored-by: Louis Bergelson <[email protected]>
* Rebuild base image to incorporate recent patches
* gatkbase-3.3.0 -> 3.3.1
* Update gradle and build.gradle

* update gradle wrapper 8.2.1 ->  8.10.2
* remove 'versions' plugin because we don't use it
* update gradle plugins to new versions
  * shadow plugin changed maintainers and coordinates
	com.github.johnrengelman.shadow:8.1.1 -> com.gradleup.shadow:8.3.3
  * git-version 0.5.1 -> 3.1.0
  * sonatype scan 2.6.1 -> 2.8.3
  * download 5.4.0 -> 5.6.0
* use tasks.register() which is the newer style
* add gradle.properties to configure daemon memory

* update several vulnerable dependencies to safer versions
* update htsjdk and picard
* replace deprecated override of ThresholdStream.getStream() -> getOutputStream()
…roadinstitute#9009)

Removed the code for these tools, now that NVScoreVariants is merged
and we've moved to a Python environment that is incompatible with these
legacy tools.

Also added the tools to the DeprecatedToolsRegistry.
* Updating dependency management and vulnerable dependencies

* Update dependencies to fix vulnerabilities as reported in broadinstitute#8950
* Update our dependency management to make use of some newish gradle features
  * Add dependency constraints to update transitive dependencies, this allows us to specify versions without making them
    direct dependencies
  * Remove most force expressions and replace them where necessary with version strict requirements
  * Make use of several published bom's to configure consistent dependency versions for platforms like netty and log4j2
  * Remove exclude statements that are now handled by variant dependency resolution (like guava android vs jdk)
* Exclude the org.bouncycastle:bcprov-jdk15on dependency and replace it with bcprov-jdk18onA
  This adds an unnecessary testUtilImplementation level dependency on what is really a transitive, but I couldn't get gradle's explicit version replacement to work.
  replacement logic to work so this is a workaround
…gged Gencode transcripts where possible (broadinstitute#9012)

* Added a '--prefer-mane-transcripts' mode that enforces MANE_Select tagged Gencode transcripts where possible
* fix dependabot alert for jetty-xml
* bom version -> 9.4.56.v2024082
…of + (broadinstitute#9018)

* maven central doesn't like + in version ranges and gradle doesn't translate it
In GATK3, when merging variants, the IDs of all the source VCFs were retained. This code path seems like it intended that, since the variantSources set is generated, but it doesnt get used for anything. This PR will use that set to set the source of the resulting merged VC.
…rve existing behavior when storeAllVcfSources=false
…_merge2

# Conflicts:
#	src/main/java/org/broadinstitute/hellbender/utils/variant/GATKVariantContextUtils.java
@bbimber
Copy link
Contributor Author

bbimber commented Nov 5, 2024

Closed in favor of: #9032

@bbimber bbimber closed this Nov 5, 2024
@bbimber bbimber deleted the lb_source_merge2 branch November 13, 2024 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.