-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error for some reads when conducting mapping step #42
Comments
Hi Yuqiao Gong The read aligner expects to see similar record id when it processes paired-end reads. For example, the first line of This is mentioned in the log you sent us as well So, the record id of your fastq files have different format than expected for read2tree. This step prints some outputs, please share them with us as well. Thanks! |
Thanks for your prompt response! the content of
However, some reads are successfully mapped(e.g. SRR12980426), but their format is similar to ERR7552499:
It is very weird. Additionally, I followed the tutorial provided by your team to use sradownloader to download reads you used for coronavirus dataset analysis from NCBI, that is python $paper_repo/corona/sradownloader --nogeo --noena --outdir reads data/used_accession.txt --threads 8 the output files are all the SRA reads provided in used_accession.txt (e.g. ERR7552499__1.fastq.gz, ERR7552499__2.fastq.gz, etc) The full output of this step is: > bash jobs/r2t_map_BC512-ERR7552499
--- Generating reference for mapping from folder ---
Re-loading references for mapping from folder: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 82/82 [00:00<00:00, 12925.44 species/s]
--- Mapping of reads to reference sequences ---
Mapping reads to species: 0%| | 0/1 [00:00<?, ? species/s]2023-09-10 14:58:00,160 - read2tree.Mapper - INFO - ERR7552499: --- Mapping of reads to BC512 reference species ---
/home/gurc/mambaforge/envs/evolution/lib/python3.10/subprocess.py:961: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
self.stdout = io.open(c2pread, 'rb', bufsize)
/home/gurc/mambaforge/envs/evolution/lib/python3.10/subprocess.py:966: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
self.stderr = io.open(errread, 'rb', bufsize)
[MAIN] NextGenMap 0.5.5
[MAIN] Startup : x64 (build Jul 3 2020 02:47:43)
[MAIN] Starting time: 2023-09-10.14:58:00
[CONFIG] Parameter: --affine 0 --argos_min_score 0 --bam 1 --bin_size 2 --block_multiplier 2 --broken_pairs 0 --bs_cutoff 6 --bs_mapping 0 --cpu_threads 4 --dualstrand 1 --fast 0 --fast_pairing 0 --force_rlength_check 0 --format 2 --gap_extend_penalty 5 --gap_read_penalty 20 --gap_ref_penalty 20 --hard_clip 0 --keep_tags 0 --kmer 13 --kmer_min 0 --kmer_skip 2 --match_bonus 10 --match_bonus_tc 2 --match_bonus_tt 10 --max_cmrs 2147483647 --max_equal 1 --max_insert_size 1000 --max_polya -1 --max_read_length 0 --min_identity 0.650000 --min_insert_size 0 --min_mq 0 --min_residues 0.500000 --min_score 0.000000 --mismatch_penalty 15 --mode 0 --no_progress 0 --no_unal 1 --ocl_threads 1 --output /tmp/ngm_t6d9clkq/BC512_OGs.fa.bam --overwrite 1 --pair_score_cutoff 0.900000 --paired 1 --parse_all 1 --pe_delimiter / --qry1 ../reads/ERR7552499__1.fastq.gz --qry2 ../reads/ERR7552499__2.fastq.gz --qry_count -1 --qry_start 0 --ref /tmp/ngm_t6d9clkq/BC512_OGs.fa --ref_mode -1 --sensitive 0 --silent_clip 0 --skip_mate_check 0 --skip_save 0 --slam_seq 0 --step_count 4 --strata 0 --topn 1 --trim5 0 --update_check 0 --very_fast 0 --very_sensitive 0
[NGM] Opening for output (BAM): /tmp/ngm_t6d9clkq/BC512_OGs.fa.bam
[SEQPROV] Encoding reference sequence.
[SEQPROV] Size of reference genome 0 Mbp (max. 17179 Mbp)
[SEQPROV] Allocating 15682 (26426) bytes for the reference.
[SEQPROV] BinRef length: 15679ll (elapsed 0.000092)
[SEQPROV] 0 reference sequences were skipped (length < 10).
[SEQPROV] Writing encoded reference to /tmp/ngm_t6d9clkq/BC512_OGs.fa-enc.2.ngm
[SEQPROV] Writing to disk took 0.00s
[PREPROCESS] Building reference table
[PREPROCESS] Allocated 1 hashtable units (tableLocMax=2^32.000000, genomeSize=2^14.936500)
[PREPROCESS] Building RefTable #0 (kmer length: 13, reference skip: 2)
[PREPROCESS] Number of k-mers: 67108865
[PREPROCESS] Counting kmers took 0.04s
[PREPROCESS] Average number of positions per prefix: 1.000799
[PREPROCESS] Index size: 335544325 byte (67108865 x 5)
[PREPROCESS] Generating index took 1.67s
[PREPROCESS] Allocating and initializing prefix Table took 0.00s
[PREPROCESS] Number of prefix positions is 8769 (4)
[PREPROCESS] Size of RefTable is 35076
[PREPROCESS] Number of repetitive k-mers ignored: 0
[PREPROCESS] Overall time for creating RefTable: 1.71s
[PREPROCESS] Writing RefTable to /tmp/ngm_t6d9clkq/BC512_OGs.fa-ht-13-2.3.ngm
[PREPROCESS] Writing to disk took 0.21s
[PREPROCESS] Max. k-mer frequency set so 100!
[INPUT] Input is paired end data.
[INPUT] Opening file ../reads/ERR7552499__1.fastq.gz for reading
[INPUT] Opening file ../reads/ERR7552499__2.fastq.gz for reading
[INPUT] Input is Fastq
[INPUT] Estimating parameter from data
[INPUT] Reads found in files: 1552620
[INPUT] Average read length: 219 (min: 38, max: 222)
[INPUT] Corridor width: 37
[INPUT] Average kmer hits pro read: 0.315271
[INPUT] Max possible kmer hit: 69
[INPUT] Estimated sensitivity: 0.300000
[INPUT] Estimating parameter took 3.592s
[INPUT] Input is Fastq
[INPUT] Input is Fastq
[OPENCL] Available platforms: 2
[OPENCL] AMD Accelerated Parallel Processing
[OPENCL] Selecting OpenCl platform: AMD Accelerated Parallel Processing
[OPENCL] Platform: OpenCL 1.2 AMD-APP (1214.3)
[OPENCL] 1 CPU device found.
[OPENCL] Device 0: Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz (Driver: 1214.3 (sse2,avx))
[OPENCL] 112 CPU cores available.
[INPUT] Error while reading paired end reads.
[INPUT] Names of mates don't match: ERR7552499.1544703 and ERR7552499.1544751.
[INPUT] NextGenMap expects paired end read names with the format: <read name>/<mate nunber> (e.g. @HWI-ST1176_0172:8:1101:1234:1934/1)
[INPUT] Use -d/--pe-delimiter to specify a different delimiter than '/'
[INPUT] If the format of the read names is correct this error might be caused by missing mates in the input file. Please check your input files or use ngm-utils interleave to match mate pairs.
[INPUT] If you are sure that your input files are valid use --skip-mate-check
This error is fatal. Quitting...
, found '[' (at char 0), (line:1, col:1)
2023-09-10 14:58:42,145 - read2tree.wrappers.read_mappers.parser - ERROR - , found '[' (at char 0), (line:1, col:1)
Mapping reads to species: 0%| | 0/1 [00:41<?, ? species/s]
Traceback (most recent call last):
File "/home/gurc/mambaforge/envs/evolution/bin/read2tree", line 16, in <module>
main(sys.argv[1:], exe_name=exe_name(), desc=desc)
File "/home/gurc/mambaforge/envs/evolution/lib/python3.10/site-packages/read2tree/main.py", line 352, in main
Mapper(args, ref_set=reference.ref) # Run the mapping
File "/home/gurc/mambaforge/envs/evolution/lib/python3.10/site-packages/read2tree/Mapper.py", line 80, in __init__
self._map_reads_to_references(ref_set)
File "/home/gurc/mambaforge/envs/evolution/lib/python3.10/site-packages/read2tree/Mapper.py", line 289, in _map_reads_to_references
processed_reads = self._call_wrapper(ref_tmp_file_handle, reads,
File "/home/gurc/mambaforge/envs/evolution/lib/python3.10/site-packages/read2tree/Mapper.py", line 104, in _call_wrapper
ngm = ngm_wrapper()
File "/home/gurc/mambaforge/envs/evolution/lib/python3.10/site-packages/read2tree/wrappers/read_mappers/ngm.py", line 66, in __call__
self.result = self._read_result(error, self.ref_input, self.tmp_folder) # store result
File "/home/gurc/mambaforge/envs/evolution/lib/python3.10/site-packages/read2tree/wrappers/read_mappers/ngm.py", line 117, in _read_result
result = parser.to_dict(outfile, output)
File "/home/gurc/mambaforge/envs/evolution/lib/python3.10/site-packages/read2tree/wrappers/read_mappers/parser.py", line 43, in to_dict
reads_mapped, total_reads, mapping_time = self.parse(stdout)
TypeError: cannot unpack non-iterable NoneType object and the content of #!/bin/bash
read2tree --standalone_path ../marker_genes/ --reads ../reads/ERR7552499__1.fastq.gz ../reads/ERR7552499__2.fastq.gz --output_path ./ --single_mapping BC512 --threads 4 --species_name ERR755249 |
Sorry to bother you again, but I have run some other jobs during this period. I found all reads that cannot be successfully mapped share one thing in common: Just as I said, there are too many 'F' in the quality control line. All the reads files were downloaded following your instructions in the tutorial. Are my reads files different from what you used for coronavirus analysis? If so, could you please tell me where and how you downloaded these reads files so that I can reproduce your work? Thank you for your patient answer. |
Sorry, I forgot to answer there, it just slipped my mind. The list of coronavirus accession IDs are in the supplementary of paper. |
Hello
Thank you for your convenient reed2tree tool and I am very sorry to cause you trouble again.
When I followed the coronavirus dataset analysis tutorial provided by your team, I met some trouble conducting the mapping step:
A
subprocess.CalledProcessError
occurred when conducting this step when mapping the species ERR7552499. So I solely selected a job file, say, r2t_map_BC512-ERR7552499 for running to see what was going on. I got some key information in the output:It seemed like there was something wrong with the paired-end reads file, so I unzipped and checked the content of
ERR7552499__1.fastq
:Is this problem caused by too many 'F' in the quality control line? However, all these reads files are downloaded using the sradownloader as in the tutorial . How can I fix it?
Any help would be greatly appreciated.
Best regards,
Yuqiao Gong
The text was updated successfully, but these errors were encountered: