Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BedToIntervalList reportedly does not handle stdin properly #1890

Closed
droazen opened this issue Jun 12, 2023 · 2 comments
Closed

BedToIntervalList reportedly does not handle stdin properly #1890

droazen opened this issue Jun 12, 2023 · 2 comments
Assignees

Comments

@droazen
Copy link
Contributor

droazen commented Jun 12, 2023

We have a report from a user that BedToIntervalList produces an empty output when its input is supplied via -I /dev/stdin rather than -I <bed_file> (original post here: https://gatk.broadinstitute.org/hc/en-us/community/posts/16231098055835-BedToIntervalList-from-dev-stdin):

BedToIntervalList behaves differently depending on whether input comes from file or from /dev/stdin. Here is a simple example:

Create a BED file with a single entry:

$ echo -e "chr1\t10000\t20000" > test.bed

Convert this to an interval list:

$ picard BedToIntervalList -SD reference.dict -I test.bed -O foo.interval_list

Do it using input from stdin:

$ cat test.bed | picard BedToIntervalList -SD reference.dict -I /dev/stdin -O bar.interval_list

Now foo.interval_list is as expected, but bar.interval_list does not contain any intervals:

$ grep -v ^@ foo.interval_list | wc -l

1

$ grep -v ^@ bar.interval_list | wc -l

0

Is this the expected behavior? The output is not necessarily empty. I became aware of this when piping a much bigger input, and the output had fewer intervals than expected.

Log from first run (with file):

$ gatk BedToIntervalList -SD reference.dict -I test.bed -O foo.interval_list

Using GATK jar /home/michaelk/miniconda3/envs/test-gatk/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar

Running:

    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/michaelk/miniconda3/envs/test-gatk/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar BedToIntervalList -SD reference.dict -I test.bed -O foo.interval_list

20:17:10.435 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/michaelk/miniconda3/envs/test-gatk/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so

[Fri Jun 09 20:17:10 CEST 2023] BedToIntervalList --INPUT test.bed --SEQUENCE_DICTIONARY reference.dict --OUTPUT foo.interval_list --SORT true --UNIQUE false --DROP_MISSING_CONTIGS false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 2 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false

[Fri Jun 09 20:17:10 CEST 2023] Executing as michaelk@fe-open-01 on Linux 3.10.0-1160.53.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.3-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is available; Picard version: Version:4.4.0.0

INFO 2023-06-09 20:17:10 BedToIntervalList Wrote 1 intervals spanning a total of 10000 bases

[Fri Jun 09 20:17:10 CEST 2023] picard.util.BedToIntervalList done. Elapsed time: 0.00 minutes.

Runtime.totalMemory()=285212672

Tool returned:

0

Log from second run (with /dev/stdin):

$ cat test.bed | picard BedToIntervalList -SD reference.dict -I /dev/stdin -O bar.interval_list

20:18:34.291 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/michaelk/miniconda3/envs/test-gatk/share/picard-3.0.0-1/picard.jar!/com/intel/gkl/native/libgkl_compression.so

[Fri Jun 09 20:18:34 CEST 2023] BedToIntervalList --INPUT /dev/stdin --SEQUENCE_DICTIONARY reference.dict --OUTPUT bar.interval_list --SORT true --UNIQUE false --DROP_MISSING_CONTIGS false --VERBOSITY INFO --QUIET false --VALIDATION_STRINGENCY STRICT --COMPRESSION_LEVEL 5 --MAX_RECORDS_IN_RAM 500000 --CREATE_INDEX false --CREATE_MD5_FILE false --help false --version false --showHidden false --USE_JDK_DEFLATER false --USE_JDK_INFLATER false

[Fri Jun 09 20:18:34 CEST 2023] Executing as michaelk@fe-open-01 on Linux 3.10.0-1160.53.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 17.0.3-internal+0-adhoc..src; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: Version:3.0.0

INFO 2023-06-09 20:18:34 BedToIntervalList Wrote 0 intervals spanning a total of 0 bases

[Fri Jun 09 20:18:34 CEST 2023] picard.util.BedToIntervalList done. Elapsed time: 0.00 minutes.

Runtime.totalMemory()=538968064
@micknudsen
Copy link

micknudsen commented Jun 12, 2023

@droazen Thanks for opening this issue. I wanted to do the same, but the standard template for issue submission told me to first post in the GATK forums. I will be happy to supply any additional information if needed.

I first discovered this behaviour when filtering GnomAD SNPs for somatic CNV calling:

bcftools norm \
    -m- \
    -r $(grep ^@SQ reference.dict | cut -f2 | cut -d':' -f2 | tr '\n' ',') \
    af-only-gnomad.hg38.vcf.gz \
| \
bcftools view \
    -i 'TYPE="snps" & AF>0.05 & AF<0.95' \
| \
grep -v ^# \
| \
cut -f1,2 | while read CHROM POS;
do
    echo -e "${CHROM}\t$(( ${POS} - 1 ))\t${POS}"
done \
| \
gatk BedToIntervalList \
   -SD inputs["fasta_dictionary"] \
   -I /dev/stdin \
   -O gnomad.filtered.interval_list

If I leave out the BedToIntervalList part, the BED file looks as expected, and so does the output if I provide it as input for BedToIntervalList. However, when I use /dev/stdin, several thousand positions are missing.

@kockan
Copy link
Contributor

kockan commented Mar 12, 2024

Closed by #1918

@kockan kockan closed this as completed Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants