Add index sequence to Unmapped read FASTQ headers #2223

Open
andrewkennard opened this issue Oct 3, 2024 · 0 comments

andrewkennard commented Oct 3, 2024

ENHANCEMENT REQUEST:

I've noticed that when I run STAR with --outReadsUnmapped Fastx on Illumina reads, STAR does not simply append the mapping status of the read and its mate to the header. Instead, the final field of the original Illumina header is rewritten in a way that removes the index sequence. It would be very nice to retain the index sequence in the headers of the unmapped reads.

Example of a read in the raw data and in the Unmapped file:

raw data:
@NS500540:129:HKJG2BGX7:4:13402:14458:19861 1:N:0:CGTAAG
unmapped file:
@NS500540:129:HKJG2BGX7:4:13402:14458:19861 0:N:  00

I have only tested this with STAR 2.7.9a but I didn't see anything about this issue in the changelogs of subsequent releases or in the issue tracker.

Why this would be useful: I ran many samples through STAR at the same time, and they were distinguished only by the index sequence. If I had access to the index sequence, I could easily identify which sample each unmapped read came from; right now I can only pinpoint which sequencing run it came from, based on the naming conventions in Illumina FASTQ headers.
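In the meantime, a possible workaround is to look the index sequence up in the raw FASTQ by read ID and append it to the unmapped headers after the fact. The following is only a minimal sketch, not something STAR provides: the function names and the `index:` tag are hypothetical, and it assumes 4-line FASTQ records, that the read ID (the text before the first space) is unchanged between the raw and unmapped files, and that the index sequence is the last colon-separated field of the raw header.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line, which we treat as the end)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def index_by_read_id(raw_handle):
    """Map each read ID in the raw FASTQ to its index sequence (hypothetical helper)."""
    indexes = {}
    for header, _, _, _ in read_fastq(raw_handle):
        # Drop the leading '@', then split the read ID from the comment field.
        read_id, _, comment = header[1:].partition(" ")
        # Assumed comment layout: <read>:<is filtered>:<control number>:<index sequence>
        indexes[read_id] = comment.rsplit(":", 1)[-1] if comment else ""
    return indexes


def restore_index(unmapped_handle, indexes, out_handle):
    """Copy unmapped reads, appending 'index:<sequence>' to headers when known."""
    for header, seq, plus, qual in read_fastq(unmapped_handle):
        read_id = header[1:].split()[0]
        index = indexes.get(read_id)
        if index:
            # 'index:' is an arbitrary tag chosen for this sketch, not a STAR convention.
            header = f"{header} index:{index}"
        out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")


if __name__ == "__main__":
    # Tiny in-memory demo using the headers from this issue (sequences are made up).
    raw = io.StringIO(
        "@NS500540:129:HKJG2BGX7:4:13402:14458:19861 1:N:0:CGTAAG\nACGT\n+\nIIII\n"
    )
    unmapped = io.StringIO(
        "@NS500540:129:HKJG2BGX7:4:13402:14458:19861 0:N:  00\nACGT\n+\nIIII\n"
    )
    out = io.StringIO()
    restore_index(unmapped, index_by_read_id(raw), out)
    print(out.getvalue(), end="")
```

This keeps every read ID from the raw file in memory, so for large runs it may be better to collect the unmapped read IDs first and only store indexes for those. For real data the handles would be opened on the raw FASTQ (with `gzip.open(..., "rt")` if compressed) and on the unmapped reads file written by STAR.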
