Code Review for #8752 #8871

Closed
76 commits
105b63e
Make M2 haplotype and clustered events filters smarter about germline…
davidbenjamin Mar 25, 2024
5e99907
Add overloaded method
bbimber Mar 28, 2024
724b5bc
Funcotator: suppress a log message about b37 contigs when not doing b…
droazen Apr 1, 2024
6739e6d
SNVQ recalibration tool added for flow based reads (#8697)
ilyasoifer Apr 4, 2024
7cdc985
Several GQ0 cleanup changes: (#8741)
ldgauthier Apr 10, 2024
47c4858
Re-commit large files as lfs stubs (#8769)
lbergelson Apr 11, 2024
986cb15
Enable ReblockGVCF to subset AS annotations that aren't "raw" (pipe-d…
ldgauthier Apr 12, 2024
aed8b1b
Gc getpipeupsummaries use mappingqualityreadfilter (#8781)
gokalpcelik Apr 18, 2024
ec39c37
Add malaria spanning deletion exception regression test with fix (#8802)
ldgauthier May 1, 2024
5c32785
Bug fix in flow allele filtering (#8775)
ilyasoifer May 2, 2024
24f93b5
Allow for GT to be a nocall if GQ and PL[0] are zero instead of homre…
nalinigans May 6, 2024
c6daf7d
Reduced docker layers in GATK image from 44 to 16 (#8808)
kevinpalis May 9, 2024
a3bbfc4
VariantFiltration: added arg to write custom mask filter description …
meganshand May 15, 2024
4ed93fe
Bigger Permutect tensors and Permutect test datasets can be annotated…
davidbenjamin May 17, 2024
d4744f7
[BIOIN-1570] Fixed edge case in variant annotation (#8810)
ilyasoifer Jun 3, 2024
0be44f2
Mutect2 germline resource can have split multiallelic format (#8837)
davidbenjamin Jun 3, 2024
2878ce5
Mutect2 WDL and GetSampleName can handle multiple sample names in BAM…
davidbenjamin Jun 4, 2024
2a420e4
Permutect dataset engine outputs contig and read group indices, not n…
davidbenjamin Jun 4, 2024
402d975
Add overloaded method to GATKVariantContextUtils.simpleMerge to prese…
bbimber Jun 13, 2024
9584d7c
Merge branch 'lb_source_merge' into lb_source_merge2
bbimber Jun 13, 2024
ab98a5d
Fixed a bug in AlleleFiltering that ignored more than a single sample…
ilyasoifer Jun 13, 2024
d633a57
Fixing bug in ReblockGVCFs when removing annotations (#8870)
meganshand Jun 20, 2024
abef8e1
fix for gnarly when PLs are null (#8878)
meganshand Jun 20, 2024
948cd4f
Fix GenotypeGVCFs with mixed ploidy sites (#8862)
meganshand Jun 24, 2024
78644b9
Remove final
bbimber Jun 25, 2024
8c33e31
Remove deprecated genomes in the cloud docker image that was causing …
jamesemery Jun 25, 2024
e600f1c
Restore gnarly tests (#8893)
ldgauthier Jun 25, 2024
baa0dd0
Remove header lines in ReblockGVCFs when we remove FORMAT annotations…
meganshand Jun 26, 2024
b2723d6
Update several dependencies to fix vulnerabilities (#8898)
lbergelson Jun 27, 2024
adbb626
Update http-nio to 1.1.1 (#8889)
droazen Jun 27, 2024
92dc4ae
Inverted SoftClippedReadFilter to conform to the standard filtering l…
jamesemery Jun 28, 2024
ea58e61
Tool to detect CRAM base corruption caused by GATK issue 8768 (#8819)
cmnbroad Jun 29, 2024
64348bc
Update HTSJDK to 4.1.1 and Picard to 3.2.0 (#8900)
droazen Jun 29, 2024
4af2b49
Handle CTX_PP/QQ and CTX_PQ/QP CPX_TYPE values in SVConcordance (#8885)
epiercehoffman Jul 1, 2024
ddaf66f
Updated Python and PyMC, removed TensorFlow, and added PyTorch in con…
samuelklee Jul 9, 2024
23a38ce
Complex SV intervals support (#8521)
mwalker174 Jul 11, 2024
b97448c
Update hdf5-java-bindings to version 1.2.0-hdf5_2.11.0, which removes…
droazen Jul 12, 2024
59c9c1b
Clarify in the README which git lfs files are required to build GATK …
droazen Jul 12, 2024
747df1a
Modified HaplotypeBasedVariantRecaller to support non-flow reads (#8896)
ilyasoifer Jul 24, 2024
3d99f22
X_FILTERED_COUNT semantics adjusted in FlowFeatureMapper (#8894)
dror27 Jul 25, 2024
64dc4b3
adding test to match WARP tests edge case (#8928)
meganshand Jul 30, 2024
096be07
Adds VcfComparator tool (#8933)
meganshand Aug 6, 2024
9f2fbb5
Add docs about citing GATK (#8947)
rickymagner Aug 9, 2024
3c1430d
Issue #7159 Create tool for producing genomic regions (as a BED file…
sanashah007 Aug 29, 2024
f42b7f7
Update the large test CRAM files to CRAM v3.0. (#8832)
cmnbroad Sep 4, 2024
1258174
Update detector output files. (#8971)
cmnbroad Sep 4, 2024
6bb5217
Require both overlap and breakend proximity for depth-only SV cluster…
mwalker174 Sep 11, 2024
2ba3a15
Adds QD and AS_QD emission from VariantAnnotator on GVS input (#8978)
meganshand Sep 12, 2024
6036d67
Added new argument "--variant-output-filtering" to variant walkers fo…
lbergelson Sep 18, 2024
e1b5f95
Swapped mito mode in Mutect to use the mode argument utils, providing…
jamesemery Sep 24, 2024
3401a33
Warn instead of throwing when querying intervals that were not in Gen…
nalinigans Sep 25, 2024
2e459a5
Updates to VcfComparator (#8973)
meganshand Sep 27, 2024
fa3f11c
Update Mutect2.java Documentation (#8999)
gokalpcelik Oct 14, 2024
779489b
Add dependency submission workflow to monitor vulnerabilities (#9002)
lbergelson Oct 16, 2024
a070efc
Add more detailed conda setup instructions to the GATK README (#9001)
droazen Oct 16, 2024
a377b07
Port of nvscorevariants into GATK, with a basic tool frontend (#8004)
droazen Oct 17, 2024
87e9f1f
Re-added `--only-output-calls-starting-in-intervals` as a deprecated …
jamesemery Oct 17, 2024
f5a2256
Adding small warning messages to not to feed any GVCF files to these …
gokalpcelik Oct 18, 2024
07eac31
Update docker base (#9005)
lbergelson Oct 18, 2024
c18f3e2
Added a check for whether files can be created and executed within th…
KevinCLydon Oct 18, 2024
df8e4b7
Update gradle and build.gradle (#8998)
lbergelson Oct 18, 2024
d6acb84
Remove CNNScoreVariants, CNNVariantTrain, and CNNVariantWriteTensors …
droazen Oct 18, 2024
b409f77
Mark NVScoreVariants as a beta feature (#9010)
droazen Oct 18, 2024
d056c32
Additional Dependency updates (#9006)
lbergelson Oct 23, 2024
c4860d4
Added a '--prefer-mane-transcripts' mode that enforces MANE_Select ta…
jamesemery Oct 23, 2024
dffedfb
Use jetty bom to enforce uniform jetty versions (#9016)
lbergelson Oct 23, 2024
02c87bf
Change the kryo version specification to use maven style [,) instead …
lbergelson Oct 24, 2024
6c9ae1b
Retain all source IDs on VariantContext merge
bbimber Mar 25, 2024
46792f3
Add test
lbergelson Mar 26, 2024
a370b85
Add overloaded method
bbimber Mar 28, 2024
e7e60ab
Add overloaded method to GATKVariantContextUtils.simpleMerge to prese…
bbimber Jun 13, 2024
4a7d92b
add a limit
lbergelson Mar 28, 2024
00e04b4
Remove final
bbimber Jun 25, 2024
924ddff
Merge remote-tracking branch 'origin/lb_source_merge2' into lb_source…
bbimber Nov 4, 2024
a131b0f
Enforce maxSourceFieldLength
bbimber Nov 4, 2024
ec1a6d8
Improve docs
bbimber Nov 4, 2024
9 changes: 5 additions & 4 deletions .github/actions/upload-gatk-test-results/action.yml
@@ -40,9 +40,10 @@ runs:
  name: test-results-${{ inputs.is-docker == 'true' && 'docker-' || '' }}${{ matrix.Java }}-${{ matrix.testType }}
  path: build/reports/tests

- - name: Upload to codecov
- run: bash <(curl -s https://raw.githubusercontent.com/broadinstitute/codecov-bash-uploader/main/codecov-verified.bash)
- shell: bash
+ # Disabling codecov because it is timing out and failing builds that otherwise succeed.
+ ## - name: Upload to codecov
+ ## run: bash <(curl -s https://raw.githubusercontent.com/broadinstitute/codecov-bash-uploader/main/codecov-verified.bash)
+ ## shell: bash

  - name: Upload Reports
  if: ${{ inputs.only-artifact != 'true' }}
@@ -91,4 +92,4 @@ runs:
  run: |
  pip install --user PyGithub;
  python scripts/github_actions/Reporter.py ${{ steps.uploadreports.outputs.view_url }};
- shell: bash
+ shell: bash
22 changes: 22 additions & 0 deletions .github/workflows/dependency_submission.yml
@@ -0,0 +1,22 @@
+ name: Dependency Submission
+
+ on:
+   push:
+     branches: [ 'master' ]
+
+ permissions:
+   contents: write
+
+ jobs:
+   dependency-submission:
+     runs-on: ubuntu-latest
+     steps:
+     - name: Checkout sources
+       uses: actions/checkout@v4
+     - name: Setup Java
+       uses: actions/setup-java@v4
+       with:
+         distribution: 'temurin'
+         java-version: 17
+     - name: Generate and submit dependency graph
+       uses: gradle/actions/dependency-submission@v4
8 changes: 1 addition & 7 deletions .github/workflows/gatk-tests.yml
@@ -291,7 +291,7 @@ jobs:
  runs-on: ubuntu-latest
  strategy:
  matrix:
- wdlTest: [ 'RUN_CNV_GERMLINE_COHORT_WDL', 'RUN_CNV_GERMLINE_CASE_WDL', 'RUN_CNV_SOMATIC_WDL', 'RUN_M2_WDL', 'RUN_CNN_WDL', 'RUN_VCF_SITE_LEVEL_FILTERING_WDL' ]
+ wdlTest: [ 'RUN_CNV_GERMLINE_COHORT_WDL', 'RUN_CNV_GERMLINE_CASE_WDL', 'RUN_CNV_SOMATIC_WDL', 'RUN_M2_WDL', 'RUN_VCF_SITE_LEVEL_FILTERING_WDL' ]
  continue-on-error: true
  name: WDL test ${{ matrix.wdlTest }} on cromwell
  steps:
@@ -345,12 +345,6 @@ jobs:
  echo "Running M2 WDL";
  bash scripts/m2_cromwell_tests/run_m2_wdl.sh;

- - name: "CNN_WDL_TEST"
- if: ${{ matrix.wdlTest == 'RUN_CNN_WDL' }}
- run: |
- echo "Running CNN WDL";
- bash scripts/cnn_variant_cromwell_tests/run_cnn_variant_wdl.sh;
-
  - name: "VCF_SITE_LEVEL_FILTERING_WDL_TEST"
  if: ${{ matrix.wdlTest == 'RUN_VCF_SITE_LEVEL_FILTERING_WDL' }}
  run: |
1 change: 1 addition & 0 deletions .gitignore
@@ -44,3 +44,4 @@ funcotator_tmp

  #Test generated dot files
  test*.dot
+ .vscode/
50 changes: 23 additions & 27 deletions Dockerfile
@@ -1,16 +1,18 @@
- ARG BASE_DOCKER=broadinstitute/gatk:gatkbase-3.2.0
+ ARG BASE_DOCKER=broadinstitute/gatk:gatkbase-3.3.1

  # stage 1 for constructing the GATK zip
  FROM ${BASE_DOCKER} AS gradleBuild
  LABEL stage=gatkIntermediateBuildImage
  ARG RELEASE=false

+ RUN ls .
+
  ADD . /gatk
  WORKDIR /gatk

- # Get an updated gcloud signing key, in case the one in the base image has expired
- RUN rm /etc/apt/sources.list.d/google-cloud-sdk.list && \
+ #Download only resources required for the build, not for testing
+ RUN ls . && \
+ rm /etc/apt/sources.list.d/google-cloud-sdk.list && \
  apt update &&\
  apt-key list && \
  curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key --keyring /usr/share/keyrings/cloud.google.gpg add - && \
@@ -19,16 +21,13 @@ RUN rm /etc/apt/sources.list.d/google-cloud-sdk.list && \
  apt-get -y clean && \
  apt-get -y autoclean && \
  apt-get -y autoremove && \
- rm -rf /var/lib/apt/lists/*
- RUN git lfs install --force
-
- #Download only resources required for the build, not for testing
- RUN git lfs pull --include src/main/resources/large
-
- RUN export GRADLE_OPTS="-Xmx4048m -Dorg.gradle.daemon=false" && /gatk/gradlew clean collectBundleIntoDir shadowTestClassJar shadowTestJar -Drelease=$RELEASE
- RUN cp -r $( find /gatk/build -name "*bundle-files-collected" )/ /gatk/unzippedJar/
- RUN unzip -o -j $( find /gatk/unzippedJar -name "gatkPython*.zip" ) -d /gatk/unzippedJar/scripts
- RUN chmod -R a+rw /gatk/unzippedJar
+ rm -rf /var/lib/apt/lists/* && \
+ git lfs install --force && \
+ git lfs pull --include src/main/resources/large && \
+ export GRADLE_OPTS="-Xmx4048m -Dorg.gradle.daemon=false" && /gatk/gradlew clean collectBundleIntoDir shadowTestClassJar shadowTestJar -Drelease=$RELEASE && \
+ cp -r $( find /gatk/build -name "*bundle-files-collected" )/ /gatk/unzippedJar/ && \
+ unzip -o -j $( find /gatk/unzippedJar -name "gatkPython*.zip" ) -d /gatk/unzippedJar/scripts && \
+ chmod -R a+rw /gatk/unzippedJar

  FROM ${BASE_DOCKER}

@@ -47,17 +46,17 @@ RUN chmod -R a+rw /gatk
  COPY --from=gradleBuild /gatk/unzippedJar .

  #Setup linked jars that may be needed for running gatk
- RUN ln -s $( find /gatk -name "gatk*local.jar" ) gatk.jar
- RUN ln -s $( find /gatk -name "gatk*local.jar" ) /root/gatk.jar
- RUN ln -s $( find /gatk -name "gatk*spark.jar" ) gatk-spark.jar
+ RUN ln -s $( find /gatk -name "gatk*local.jar" ) gatk.jar && \
+ ln -s $( find /gatk -name "gatk*local.jar" ) /root/gatk.jar && \
+ ln -s $( find /gatk -name "gatk*spark.jar" ) gatk-spark.jar

  WORKDIR /root

  # Make sure we can see a help message
- RUN java -jar gatk.jar -h
- RUN mkdir /gatkCloneMountPoint
- RUN mkdir /jars
- RUN mkdir .gradle
+ RUN java -jar gatk.jar -h && \
+ mkdir /gatkCloneMountPoint && \
+ mkdir /jars && \
+ mkdir .gradle

  WORKDIR /gatk

@@ -80,16 +79,13 @@ RUN echo "source activate gatk" > /root/run_unit_tests.sh && \
  echo "ln -s /gatkCloneMountPoint/build/ /gatkCloneMountPoint/scripts/docker/build" >> /root/run_unit_tests.sh && \
  echo "cd /gatk/ && /gatkCloneMountPoint/gradlew -Dfile.encoding=UTF-8 -b /gatkCloneMountPoint/dockertest.gradle testOnPackagedReleaseJar jacocoTestReportOnPackagedReleaseJar -a -p /gatkCloneMountPoint" >> /root/run_unit_tests.sh

- WORKDIR /root
- RUN cp -r /root/run_unit_tests.sh /gatk
- RUN cp -r gatk.jar /gatk
- ENV CLASSPATH /gatk/gatk.jar:$CLASSPATH
+ RUN cp -r /root/run_unit_tests.sh /gatk && \
+ cp -r /root/gatk.jar /gatk
+ ENV CLASSPATH=/gatk/gatk.jar:$CLASSPATH PATH=$CONDA_PATH/envs/gatk/bin:$CONDA_PATH/bin:$PATH

  # Start GATK Python environment

- WORKDIR /gatk
- ENV PATH $CONDA_PATH/envs/gatk/bin:$CONDA_PATH/bin:$PATH
- RUN conda env create -n gatk -f /gatk/gatkcondaenv.yml && \
+ RUN conda env create -vv -n gatk -f /gatk/gatkcondaenv.yml && \
  echo "source activate gatk" >> /gatk/gatkenv.rc && \
  echo "source /gatk/gatk-completion.sh" >> /gatk/gatkenv.rc && \
  conda clean -afy && \
38 changes: 30 additions & 8 deletions README.md
@@ -48,6 +48,7 @@ releases of the toolkit.
  * [How to contribute to GATK](#contribute)
  * [Discussions](#discussions)
  * [Authors](#authors)
+ * [Citing GATK](#citing)
  * [License](#license)

  ## <a name="requirements">Requirements</a>
@@ -78,15 +79,31 @@ releases of the toolkit.
  docker client, which can be found on the [docker website](https://www.docker.com/get-docker).
  * Python Dependencies:<a name="python"></a>
  * GATK4 uses the [Conda](https://conda.io/docs/index.html) package manager to establish and manage the
- Python environment and dependencies required by GATK tools that have a Python dependency. This environment also
- includes the R dependencies used for plotting in some of the tools. The ```gatk``` environment
- requires hardware with AVX support for tools that depend on TensorFlow (e.g. CNNScoreVariant). The GATK Docker image
- comes with the ```gatk``` environment pre-configured.
- * At this time, the only supported platforms are 64-bit Linux distributions. The required Conda environment is not
- currently supported on OS X/macOS.
+ Python environment and dependencies required by Python-based GATK tools. This environment also
+ includes the R dependencies used for plotting in some of the tools. The GATK Docker image
+ comes with the ```gatk``` conda environment pre-configured and activated.
  * To establish the environment when not using the Docker image, a conda environment must first be "created", and
  then "activated":
- * First, make sure [Miniconda or Conda](https://conda.io/docs/index.html) is installed (Miniconda is sufficient).
+ * First, make sure [Miniconda or Conda](https://conda.io/docs/index.html) is installed. We recommend installing
+ ```Miniconda3-py310_23.10.0-1``` from [the miniconda download page](https://repo.anaconda.com/miniconda/), selecting the Linux or
+ MacOS version of the installer as appropriate.
+ * This is the same version of ```miniconda``` used by the official GATK docker image.
+ * If you use a different version, you may run into issues.
+ * If you have an ARM-based Mac, you must select the `MacOSX-x86_64` installer, not the `MacOSX-arm64` installer,
+ and rely on Mac OS's built-in x86 emulation.
+ * Set up miniconda:
+ * Install miniconda to a location on your PATH such as ```/opt/miniconda```, and then restart your shell:
+ ```
+ bash Miniconda3-py310_23.10.0-1-[YOUR_OS].sh -p /opt/miniconda -b
+ ```
+ * Disable conda auto-updates, which can cause compatibility issues with GATK:
+ ```
+ conda config --set auto_update_conda false
+ ```
+ * Enable the (much) faster ```libmamba``` solver to greatly speed up creation of the conda environment:
+ ```
+ conda config --set solver libmamba
+ ```
  * To "create" the conda environment:
  * If running from a zip or tar distribution, run the command ```conda env create -f gatkcondaenv.yml``` to
  create the ```gatk``` environment.
@@ -156,7 +173,9 @@ For more details on system packages, see the GATK [Base Dockerfile](scripts/dock

  * This creates a zip archive in the `build/` directory with a name like `gatk-VERSION.zip` containing a complete standalone GATK distribution, including our launcher `gatk`, both the local and spark jars, and this README.
  * You can also run GATK commands directly from the root of your git clone after running this command.
- * Note that you *must* have a full git clone in order to build GATK, including the git-lfs files in src/main/resources. The zipped source code alone is not buildable.
+ * Note that you *must* have a full git clone in order to build GATK, including the git-lfs files in `src/main/resources/large`. The zipped source code alone is not buildable.
+ * The large files under `src/main/resources/large/` are required to build GATK, since they are packaged inside the GATK jar and used by tools at runtime. These include things like ML models and native C/C++ libraries used for acceleration of certain tools.
+ * The large files under `src/test/resources/large/`, on the other hand, are only required by the test suite when running tests, and are not required to build GATK.

  * **Other ways to build:**
  * `./gradlew installDist`
@@ -671,5 +690,8 @@ Thank you for getting involved!
  The authors list is maintained in the [AUTHORS](https://github.com/broadinstitute/gatk/edit/master/AUTHORS) file.
  See also the [Contributors](https://github.com/broadinstitute/gatk/graphs/contributors) list at github.

+ ## <a name="citing">Citing GATK</a>
+ If you use GATK in your research, please see [this article](https://gatk.broadinstitute.org/hc/en-us/articles/360035530852-How-should-I-cite-GATK-in-my-own-publications) for details on how to properly cite GATK.
+
  ## <a name="license">License</a>
  Licensed under the Apache 2.0 License. See the [LICENSE.txt](https://github.com/broadinstitute/gatk/blob/master/LICENSE.TXT) file.
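
The README changes in this diff recommend a specific Miniconda release and state that ARM-based Macs must use the `MacOSX-x86_64` installer. The selection rule can be sketched in shell roughly as follows. This is only an illustration: the function name `miniconda_installer` is hypothetical and not part of GATK, the file names assume the naming pattern used on the miniconda download page, and limiting Linux to x86_64 is an assumption of this sketch rather than something the README says.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): print the Miniconda installer file
# name for a given OS and CPU architecture, following the README guidance.
miniconda_installer() {
    local os="$1" arch="$2"
    local release="Miniconda3-py310_23.10.0-1"   # release recommended in the README
    case "$os" in
        Linux)
            # Assumption of this sketch: only x86_64 Linux is handled.
            if [ "$arch" != "x86_64" ]; then
                echo "unsupported Linux architecture in this sketch: $arch" >&2
                return 1
            fi
            echo "${release}-Linux-x86_64.sh"
            ;;
        Darwin)
            # Per the README, ARM-based Macs also take the x86_64 installer
            # and rely on the built-in x86 emulation.
            echo "${release}-MacOSX-x86_64.sh"
            ;;
        *)
            echo "unsupported OS: $os" >&2
            return 1
            ;;
    esac
}
```

Calling `miniconda_installer "$(uname -s)" "$(uname -m)"` prints the file name to download; it can then be run with the `-p /opt/miniconda -b` options shown in the README.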