Salmon output invalid JSON #279

Closed
kurtwheeler opened this issue Aug 24, 2018 · 5 comments

@kurtwheeler
Contributor

Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?
No.

Describe the bug
I ran salmon quant and the lib_format_counts.json file that was produced contained a NaN value which is not valid JSON.

To Reproduce
Steps and data to reproduce the behavior:

Specifically, please provide at least the following information:

  • Which version of salmon was used?

0.9.1

  • How was salmon installed (compiled, downloaded executable, through bioconda)?

From our dockerfile:

# Install Salmon
ENV SALMON_VERSION 0.9.1
RUN wget https://github.com/COMBINE-lab/salmon/releases/download/v${SALMON_VERSION}/Salmon-${SALMON_VERSION}_linux_x86_64.tar.gz
RUN tar -xzf Salmon-${SALMON_VERSION}_linux_x86_64.tar.gz
# Create soft link `/usr/local/bin/salmon` that points to the actual program
RUN ln -sf `pwd`/Salmon-latest_linux_x86_64/bin/salmon /usr/local/bin/
RUN rm -f Salmon-${SALMON_VERSION}_linux_x86_64.tar.gz
# End Salmon installation.
  • Which reference (e.g. transcriptome) was used?

One we prepared. We got the raw transcriptome from ensembl, then prepared it with:
https://github.com/AlexsLemonade/refinebio/blob/dev/workers/data_refinery_workers/processors/transcriptome_index.py

Which produced:
https://s3.amazonaws.com/data-refinery-test-assets/Caenorhabditis_elegans_short_1527089586.tar.gz

  • Which read files were used?

Two read files out of:
https://s3.amazonaws.com/data-refinery-test-assets/salmon_tests.tar.gz

found within that archive at:
test_experiment/raw/reads_1.fastq
and
test_experiment/raw/reads_2.fastq

Unfortunately I am not entirely sure where these reads originally came from.

  • Which program options were used?

The exact invocation of salmon was:

salmon --no-version-check quant -l A --biasSpeedSamp 5 -i /home/user/data_store/processed/TEST/TRANSCRIPTOME_INDEX/index -1 /home/user/data_store/salmon_tests/test_experiment/raw/reads_1.fastq -2 /home/user/data_store/salmon_tests/test_experiment/raw/reads_2.fastq -p 20 -o /home/user/data_store/TEST/test_sample/processed/ --seqBias --gcBias --dumpEq --writeUnmappedNames

Expected behavior
This happened while I was modifying the tests for running salmon. I'm guessing that my code isn't quite right yet, so something going wrong isn't entirely unexpected. However, I would have expected Salmon to report an error rather than produce invalid JSON.

Desktop (please complete the following information):

Our exact environment for running this is described here:
https://github.com/AlexsLemonade/refinebio/blob/dev/workers/dockerfiles/Dockerfile.salmon

The base image is ubuntu:16.04

Additional context
Here are the contents of the lib_format_counts.json file (GitHub won't let me upload it):

{
    "read_files": "( /home/user/data_store/salmon_tests/test_experiment/raw/reads_1.fastq, /home/user/data_store/salmon_tests/test_experiment/raw/reads_2.fastq )",
    "expected_format": "IU",
    "compatible_fragment_ratio": 1.0,
    "num_compatible_fragments": 184,
    "num_assigned_fragments": 184,
    "num_consistent_mappings": 0,
    "num_inconsistent_mappings": 255,
    "strand_mapping_bias": NaN,
    "MSF": 0,
    "OSF": 0,
    "ISF": 0,
    "MSR": 0,
    "OSR": 0,
    "ISR": 0,
    "SF": 126,
    "SR": 129,
    "MU": 0,
    "OU": 0,
    "IU": 0,
    "U": 0
}
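
For anyone consuming this file downstream, here is a minimal sketch of how the problem can be detected in Python. Note that Python's built-in `json` module accepts the bare token `NaN` by default, so a strict check has to go through the `parse_constant` hook. The helper name `load_strict_json` is hypothetical and is not part of salmon; the sample string is a trimmed-down version of the file above.

```python
import json


def _reject_constant(name):
    # json.loads calls this hook for the bare tokens NaN, Infinity and
    # -Infinity, none of which are permitted by the JSON spec (RFC 8259).
    raise ValueError(f"non-standard JSON constant: {name}")


def load_strict_json(text):
    """Parse JSON text, rejecting NaN/Infinity (hypothetical helper)."""
    return json.loads(text, parse_constant=_reject_constant)


# Trimmed-down version of the lib_format_counts.json from this report.
sample = '{"num_consistent_mappings": 0, "strand_mapping_bias": NaN}'

try:
    load_strict_json(sample)
    is_valid = True
except ValueError as err:
    is_valid = False
    print(f"invalid JSON: {err}")
```

Parsers in other languages (for example `JSON.parse` in JavaScript) reject the token outright, which is why the file cannot be treated as portable JSON.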
@rob-p
Collaborator

rob-p commented Aug 25, 2018

Hi @kurtwheeler,

Thanks for the detailed bug report. What seems to be going on here is that none of the reads are mapping concordantly, so that we have a paired-end input library where all of the mappable reads are mapping as orphans. This is messing up the computation of the ratio. Definitely, we should not be outputting NaN, but such a case is an interesting one and I wonder if we should be issuing a special warning if we see this in practice.

@rob-p
Collaborator

rob-p commented Aug 25, 2018

Ok, more interesting information. I just pushed a fix for this that will put 0 instead of NaN and output a warning. But I also ran this sample with the --validateMappings flag (introduced a few versions ago), and with that flag none of the reads map. This means the orphans that are mapping must be doing so poorly, and --validateMappings is taking care of this by getting rid of those reads.
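
The kind of guard described in this comment can be sketched as follows. This is Python for illustration only, not salmon's actual C++ code; the helper name `safe_ratio` is hypothetical, and the exact numerator and denominator salmon uses for `strand_mapping_bias` are not stated in this thread, so the call at the bottom is an assumption.

```python
import math
import warnings


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 with a warning when the
    denominator is zero (hypothetical helper, not salmon's code)."""
    if denominator == 0:
        # An undefined ratio would otherwise end up as NaN in the output
        # file, which is not a valid JSON number.
        warnings.warn("ratio is undefined (denominator is 0); reporting 0.0")
        return 0.0
    return numerator / denominator


# In the report above num_consistent_mappings is 0, so a ratio taken over
# the consistent mappings has a zero denominator (assumed inputs).
bias = safe_ratio(0, 0)
assert not math.isnan(bias)
```

Reporting 0 together with a warning keeps the output parseable while still flagging that the value is not meaningful for this sample.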

@Miserlou

Related request, if you're fixing up the JSON formatting, it'd be nice if

"read_files": "( /home/user/data_store/salmon_tests/test_experiment/raw/reads_1.fastq, /home/user/data_store/salmon_tests/test_experiment/raw/reads_2.fastq )"

was replaced with

"read_files": ["/home/user/data_store/salmon_tests/test_experiment/raw/reads_1.fastq", "/home/user/data_store/salmon_tests/test_experiment/raw/reads_2.fastq"]
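
Until every file is produced by a release that uses the array form, a consumer may need to handle both shapes. A minimal sketch, assuming the old value is always a parenthesised, comma-separated string as shown above; the helper name `normalize_read_files` is hypothetical, and paths that themselves contain commas would be split incorrectly.

```python
def normalize_read_files(value):
    """Return read_files as a list of paths (hypothetical helper)."""
    if isinstance(value, list):
        # New format: already a JSON array of paths.
        return [str(path) for path in value]
    # Old format: "( path1, path2 )" packed into a single string.
    inner = value.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    # Split on commas and drop surrounding whitespace and empty pieces.
    return [part.strip() for part in inner.split(",") if part.strip()]


old_style = "( /data/reads_1.fastq, /data/reads_2.fastq )"
new_style = ["/data/reads_1.fastq", "/data/reads_2.fastq"]
assert normalize_read_files(old_style) == normalize_read_files(new_style)
```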

@rob-p
Collaborator

rob-p commented Aug 27, 2018

Both the originally reported issue (from @kurtwheeler) and the request by @Miserlou are now implemented in develop and will be available in the next release, so I'm going to close this issue for now.

@rob-p closed this as completed Aug 27, 2018
@kurtwheeler
Contributor Author

Wow, thanks for the super quick turnaround @rob-p! This is awesome.
