paired reads have different names #18
Comments
You need to use the sequence.index file (https://github.com/genome-in-a-bottle/giab_data_indexes/blob/master/NA12878/sequence.index.NA12878_Illumina300X_wgs_09252015 in your case) to match R1 and R2 files. For the 300X ILMN raw reads, some R1/R2 files have the same names but are located in different directories, e.g.:

ftp://ftp-trace.ncbi.nih.gov/ReferenceSamples/giab/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/131219_D00360_005_BH814YADXX/Project_RM8398/Sample_U0a/U0a_CGATGT_L001_R1_001.fastq.gz cabfe5b609fb1fe11619fdc72060185c ftp://ftp-trace.ncbi.nih.gov/ReferenceSamples/giab/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/131219_D00360_005_BH814YADXX/Project_RM8398/Sample_U0a/U0a_CGATGT_L001_R2_001.fastq.gz 6f0faed9249c1a850e6ce57c61e26e04 HG001
ftp://ftp-trace.ncbi.nih.gov/ReferenceSamples/giab/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/131219_D00360_006_AH81VLADXX/Project_RM8398/Sample_U0a/U0a_CGATGT_L001_R1_001.fastq.gz cc35b61053fe7505715f93175bbb16c4 ftp://ftp-trace.ncbi.nih.gov/ReferenceSamples/giab/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/131219_D00360_006_AH81VLADXX/Project_RM8398/Sample_U0a/U0a_CGATGT_L001_R2_001.fastq.gz cd12a23c3d71061e1bc673ce8c598dba HG001

Hope this helps.
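As a rough illustration of pairing R1 and R2 through the index rather than by file name, here is a minimal Python sketch. It assumes each data row has five whitespace-separated columns in the order shown in the example rows above (R1 URL, R1 md5, R2 URL, R2 md5, sample); the names `FastqPair`, `parse_index`, and `EXAMPLE_ROWS` are made up for this sketch and are not part of any GIAB tooling.

```python
import posixpath
from typing import Iterable, List, NamedTuple

class FastqPair(NamedTuple):
    r1_url: str
    r1_md5: str
    r2_url: str
    r2_md5: str
    sample: str

def parse_index(lines: Iterable[str]) -> List[FastqPair]:
    """Collect one FastqPair per 5-column data row (assumed layout)."""
    pairs = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and rows that do not start with a URL (e.g. a header).
        if len(fields) != 5 or "://" not in fields[0]:
            continue
        pairs.append(FastqPair(*fields))
    return pairs

# The two example rows quoted in the comment above, rebuilt from their parts.
BASE = ("ftp://ftp-trace.ncbi.nih.gov/ReferenceSamples/giab/data/NA12878/"
        "NIST_NA12878_HG001_HiSeq_300x/")
TAIL = "/Project_RM8398/Sample_U0a/U0a_CGATGT_L001_{}_001.fastq.gz"
RUN_A = BASE + "131219_D00360_005_BH814YADXX" + TAIL
RUN_B = BASE + "131219_D00360_006_AH81VLADXX" + TAIL
EXAMPLE_ROWS = [
    " ".join([RUN_A.format("R1"), "cabfe5b609fb1fe11619fdc72060185c",
              RUN_A.format("R2"), "6f0faed9249c1a850e6ce57c61e26e04", "HG001"]),
    " ".join([RUN_B.format("R1"), "cc35b61053fe7505715f93175bbb16c4",
              RUN_B.format("R2"), "cd12a23c3d71061e1bc673ce8c598dba", "HG001"]),
]

pairs = parse_index(EXAMPLE_ROWS)
for pair in pairs:
    # Same base name in both rows; only the run directory tells them apart.
    run_dir = pair.r1_url.split("/")[-4]
    print(run_dir, posixpath.basename(pair.r1_url), posixpath.basename(pair.r2_url))
```

Keying downloads on the full URL (or the run directory) rather than the base name avoids overwriting one run's R1 with another's.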
Yes, I used the forward and reverse reads for the same run from the same folder, which should be on the same line in the file you linked. In other words, I took both FTP links from a single line, which should correspond to the same run.
In your example, can you post the full paths of the two files you used for mapping? Have you checked the md5s?
Hi,
I have checked the md5s now, and it looks like something went wrong with the file download. I am downloading the files again and will check them and get back to you. Thanks,
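For anyone repeating this check, a minimal Python sketch of comparing a downloaded file against the md5 listed in the index could look like the following. The helper names `md5_of_file` and `matches_index` are hypothetical, and the sketch assumes the index md5 refers to the compressed `.fastq.gz` file as downloaded.

```python
import hashlib

def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_index(path: str, expected_md5: str) -> bool:
    """Compare a local file's md5 with the value copied from the index row."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_index("U0a_CGATGT_L001_R1_001.fastq.gz", "cabfe5b609fb1fe11619fdc72060185c")` would check the first R1 file quoted earlier in the thread, provided the local file comes from that run directory.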
Hi @chunlinxiao,
I have tried to do the alignment process using 2 paired reads from the folder |
Thanks for the update; glad your alignment now runs fine. I also tested your pairs on our side and nothing was wrong, so the paired data is fine. Regarding the md5s: we recently performed a metadata collection/analysis on all FASTQs that involved gunzip/gzip, which can produce different md5s (from a different gz file header if gzip -n is not used). However, the uncompressed FASTQ files are unchanged, with identical md5s. The sequence.index files may need to be updated accordingly.
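The header effect described above can be reproduced in a small Python sketch. It uses the `mtime` argument of `gzip.compress` (available since Python 3.8) as a stand-in for the timestamp that `gzip` stores in the header unless `-n` is given; the toy FASTQ record and variable names are made up for illustration.

```python
import gzip
import hashlib

# A toy single-record FASTQ payload (hypothetical data).
payload = b"@read1\nACGT\n+\nIIII\n"

# Same content, compressed twice with different header timestamps.
first = gzip.compress(payload, mtime=1)
second = gzip.compress(payload, mtime=2)

md5_first = hashlib.md5(first).hexdigest()
md5_second = hashlib.md5(second).hexdigest()

# The compressed files hash differently because the gz headers differ...
print("compressed md5s equal:", md5_first == md5_second)

# ...but the uncompressed content is byte-for-byte identical.
print("uncompressed equal:", gzip.decompress(first) == gzip.decompress(second))
```

So when a `.fastq.gz` md5 does not match the index, comparing the md5 of the decompressed stream is a reasonable second check before assuming the download is corrupt.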
So what do you think of relying on the old FASTQs from 2014? I am running a benchmarking process, so is it fine to use those FASTQs together with the VCFs from the NIST V4 directory?
Hi @Mahmoudbassuoni, all of the files in those directories were generated ~2014. They are probably OK to use for some purposes, but if you want to understand how your methods work on more recent Illumina data, you may want to use data from this publication: https://doi.org/10.1101/2020.12.11.422022.
Hi @Mahmoudbassuoni, the md5s were updated in sequence.index.NA12878_Illumina300X_wgs_09252015_updated (you can follow the link from the table).
Hi, I am trying to run an alignment with bwa mem on the two files "U0a_CGATGT_L001_R1_001.fastq.gz" and "U0a_CGATGT_L001_R2_001.fastq.gz", which I downloaded from the FTP site, against the reference "GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz". The command I am using is:
bwa mem -t 16 -R '@RG\tID:H814YADXX.5.CGATGT.1101\tSM:HG001\tPL:illumina' GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta.gz U0a_CGATGT_L001_R1_001.fastq.gz U0a_CGATGT_L001_R2_001.fastq.gz | samtools view -b - >HG001.GRCh38_no_alt_analysis_set.bam
but I am getting an error with the sequence headers:
I have tried sorting the two files with fastq-sort but still get the same error. Can anyone help?
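One way to narrow down a read-name mismatch before running the aligner is to compare the headers of the two files directly. The Python sketch below is only an illustration: it assumes plain four-line FASTQ records in gzip-compressed files, treats the text up to the first whitespace as the read name, and ignores a trailing `/1` or `/2`. The function names `read_names` and `first_mismatch` are made up for this example.

```python
import gzip
from itertools import zip_longest

def read_names(path):
    """Yield the read name from each header line of a gzipped 4-line FASTQ."""
    with gzip.open(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 != 0 or not line.strip():
                continue
            fields = line[1:].split()  # drop the leading '@' and any comment
            name = fields[0] if fields else ""
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # drop the mate suffix before comparing
            yield name

def first_mismatch(r1_path, r2_path):
    """Return (record_index, r1_name, r2_name) for the first differing pair.

    Returns None when every record pairs up. A None name in the result
    means that file ran out of records first.
    """
    names = zip_longest(read_names(r1_path), read_names(r2_path))
    for index, (name1, name2) in enumerate(names):
        if name1 != name2:
            return index, name1, name2
    return None
```

Calling `first_mismatch("U0a_CGATGT_L001_R1_001.fastq.gz", "U0a_CGATGT_L001_R2_001.fastq.gz")` on a correctly matched, complete pair should return `None`; an early mismatch or a `None` name points to files from different runs or a truncated download.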