Hostile with no options classifying different than --invert #42

Closed
jannikseidelQBiC opened this issue Sep 9, 2024 · 5 comments
Labels
bug Something isn't working

@jannikseidelQBiC

Hi and first, thanks for the great work.

I tried to run Hostile to get the filtered result files and the removed read-pairs (Illumina paired-end data as input). What caught my eye is that the two results do not match:
reads_removed in the first output should be the same as reads_out in the second (and the other combination).

| Mode | reads_removed | reads_out |
|---|---|---|
| no option | 19870638 | 42475288 |
| --invert | 42896358 | 19449568 |
| Difference to 'no option' | 421070 | -421070 |

The commands I used (installation of Hostile 1.1.0 via conda):

hostile clean --fastq1 <file_forward>.fq.gz --fastq2 <file_reverse>.fq.gz --out-dir filtered_1 > log1_filtered.log
hostile clean --fastq1 <file_forward>.fq.gz --fastq2 <file_reverse>.fq.gz --out-dir removed_1 --invert > log1_removed.log

It seems that running with the --invert flag classifies reads differently than running without it. Am I missing an option that I need to set to get the same results?

Thanks in advance!

PS: Here are the log files.

[
    {
        "version": "1.1.0",
        "aligner": "bowtie2",
        "index": "human-t2t-hla",
        "options": [],
        "fastq1_in_name": "<file_forward>.fq.gz",
        "fastq1_in_path": "<path_to_files>/<file_forward>.fq.gz",
        "fastq1_out_name": "<file_forward>.clean_1.fastq.gz",
        "fastq1_out_path": "filtered_1/<file_forward>.clean_1.fastq.gz",
        "reads_in": 62345926,
        "reads_out": 42475288,
        "reads_removed": 19870638,
        "reads_removed_proportion": 0.31872,
        "fastq2_in_name": "<file_reverse>.fq.gz",
        "fastq2_in_path": "<path_to_files>/<file_reverse>.fq.gz",
        "fastq2_out_name": "<file_reverse>.clean_2.fastq.gz",
        "fastq2_out_path": "filtered_1/<file_reverse>.clean_2.fastq.gz"
    }
]
[
    {
        "version": "1.1.0",
        "aligner": "bowtie2",
        "index": "human-t2t-hla",
        "options": [
            "invert"
        ],
        "fastq1_in_name": "<file_forward>.fq.gz",
        "fastq1_in_path": "<path_to_files>/<file_forward>.fq.gz",
        "fastq1_out_name": "<file_forward>.clean_1.fastq.gz",
        "fastq1_out_path": "removed_1/<file_forward>.clean_1.fastq.gz",
        "reads_in": 62345926,
        "reads_out": 19449568,
        "reads_removed": 42896358,
        "reads_removed_proportion": 0.68804,
        "fastq2_in_name": "<file_reverse>.fq.gz",
        "fastq2_in_path": "<path_to_files>/<file_reverse>.fq.gz",
        "fastq2_out_name": "<file_reverse>.clean_2.fastq.gz",
        "fastq2_out_path": "removed_1/<file_reverse>.clean_2.fastq.gz"
    }
]
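
The mismatch can be checked directly from the two logs. The following is a minimal sketch, not part of Hostile: the counts are copied from the logs above, while the dictionary names and the helper functions `is_self_consistent` and `complement_gap` are made up for illustration.

```python
# Counts copied from the two Hostile 1.1.0 logs above.
default_run = {"reads_in": 62345926, "reads_out": 42475288, "reads_removed": 19870638}
invert_run = {"reads_in": 62345926, "reads_out": 19449568, "reads_removed": 42896358}


def is_self_consistent(run):
    """True if kept and removed reads add up to the input within one run."""
    return run["reads_out"] + run["reads_removed"] == run["reads_in"]


def complement_gap(default, inverted):
    """Reads removed by the default run that the inverted run does not emit.

    A result of 0 means the two runs are exact complements of each other.
    """
    return default["reads_removed"] - inverted["reads_out"]


print(is_self_consistent(default_run), is_self_consistent(invert_run))
print(complement_gap(default_run, invert_run))
```

Each run adds up internally, so the script prints `True True`, followed by a gap of `421070`: reads that the default run removes but that the --invert run does not write out.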
@bede
Owner

bede commented Sep 9, 2024

Hi Jannik, thank you, this is interesting. From your data there certainly appears to be a problem with how --invert is implemented. By any chance are you able to send me some (or all) of your test data?

Bede

@bede bede added the bug Something isn't working label Sep 9, 2024
@jannikseidelQBiC
Author

Hi Bede,
I cannot share the dataset. Could you try to reproduce the behavior with another dataset? If it only occurs with this dataset, that would also be highly interesting.

Best,
Jannik

@bede
Owner

bede commented Sep 11, 2024 via email

@bede bede added this to the 1.2.0 milestone Sep 12, 2024
@bede
Owner

bede commented Dec 13, 2024

Please accept my apologies for the delay. I've reproduced the problem and pushed a fix, to be released in the coming days. I had mistakenly assumed that, for paired reads, samtools view -F 12 outputs the inverse of samtools view -f 12. For the inverted paired scenario we now use a Samtools filter expression that combines the bitwise flags 4 and 8 with logical OR, rather than the AND that was previously used incorrectly. This issue only affected --invert mode in the paired read case. A test case has been written. Thank you very much for catching this.

cc8a101
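
To see why the two filters are not complements, recall that samtools view -f keeps records with all of the given flag bits set, while -F keeps records with none of them set. The sketch below models only that bit logic in Python. It is an illustration, not Hostile's code or the expression used in the fix; the constant and function names are made up for this example.

```python
from itertools import product

READ_UNMAPPED = 0x4  # SAM flag bit 4: this read is unmapped
MATE_UNMAPPED = 0x8  # SAM flag bit 8: this read's mate is unmapped
MASK = READ_UNMAPPED | MATE_UNMAPPED  # 12


def passes_f(flag, mask=MASK):
    """Models samtools view -f MASK: keep records with ALL bits of MASK set."""
    return (flag & mask) == mask


def passes_F(flag, mask=MASK):
    """Models samtools view -F MASK: keep records with NONE of the bits set."""
    return (flag & mask) == 0


def true_inverse_of_f(flag, mask=MASK):
    """Logical complement of -f MASK: bit 4 is unset OR bit 8 is unset."""
    return not passes_f(flag, mask)


def classify():
    """Evaluate all three filters for each combination of the two bits."""
    rows = []
    for read_unmapped, mate_unmapped in product((False, True), repeat=2):
        flag = (READ_UNMAPPED if read_unmapped else 0) | (MATE_UNMAPPED if mate_unmapped else 0)
        rows.append((flag, passes_f(flag), passes_F(flag), true_inverse_of_f(flag)))
    return rows


for flag, keep_f, keep_F, keep_inverse in classify():
    print(f"flag&12={flag:2d}  -f 12: {keep_f!s:5}  -F 12: {keep_F!s:5}  inverse of -f 12: {keep_inverse!s:5}")
```

Records with exactly one of the two bits set (values 4 and 8, meaning only one mate of the pair is unmapped) pass neither -f 12 nor -F 12, so they appear in neither output. That is consistent with the --invert run emitting fewer reads than the default run removed.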

@bede
Owner

bede commented Dec 19, 2024

Released in 2.0.0

@bede bede closed this as completed Dec 19, 2024