FreeBayes requires unique lane IDs #311
Comments
Just to add to this, since I stumbled across the exact same problem. Basically, Sarek currently uses the lane as the read group ID. That does not seem to be a good solution (as evidenced by the error raised by FreeBayes). A lane is commonly understood to be a number between 1 and 4 (at most). The read group ID, by definition, must be a unique identifier for that particular collection of reads. The most unambiguous solution would be to use flowcell ID + library ID + lane. Technically, that information can be derived from the FASTQ input files. A somewhat crummy workaround may be to use the sample ID + lane, which are already present in the TSV format. That is technically not a truly unique ID, but it would suppress the error and, within a given Sarek run, probably not cause any problems.
Maybe to further clarify, the first line of a FASTQ file usually looks like this:
@A00686:168:H3HGMDSX2:3:1101:2501:1000 1:N:0:GTCTAATGGC+CCTGACCACT
That could be translated into a read group ID built from the flowcell ID, the lane, and the barcode. Or, when using a (made-up) library ID (usually part of the FASTQ file name) instead of the barcode:
H3HGMDSX2.3.J0367871
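To make the idea above concrete, here is a minimal Python sketch of deriving such an ID from the first header line of a FASTQ file. It is not Sarek's code: the function name `read_group_id_from_header` and the `library_id` parameter are hypothetical, and it assumes an Illumina-style header in which the third and fourth colon-separated fields are the flowcell ID and the lane.

```python
from typing import Optional


def read_group_id_from_header(header: str, library_id: Optional[str] = None) -> str:
    """Build a 'flowcell.lane.suffix' ID from an Illumina-style FASTQ header.

    Hypothetical helper for illustration only. The suffix is the given
    library ID, or the barcode from the header when no library ID is passed.
    """
    if not header.startswith("@"):
        raise ValueError("expected a FASTQ header line starting with '@'")

    # Split '@instrument:run:flowcell:lane:tile:x:y' from the description part.
    name, _, description = header[1:].strip().partition(" ")
    fields = name.split(":")
    if len(fields) < 7:
        raise ValueError("header does not look like an Illumina read name")

    flowcell, lane = fields[2], fields[3]

    if library_id is None:
        # Assumption: the barcode is the last colon-separated field
        # of the description, e.g. '1:N:0:<barcode>'.
        suffix = description.split(":")[-1] if description else ""
    else:
        suffix = library_id

    # Skip an empty suffix so the ID never ends with a dangling dot.
    return ".".join(part for part in (flowcell, lane, suffix) if part)


header = "@A00686:168:H3HGMDSX2:3:1101:2501:1000 1:N:0:GTCTAATGGC+CCTGACCACT"
print(read_group_id_from_header(header, library_id="J0367871"))
# H3HGMDSX2.3.J0367871
```

Reading only the first header assumes every read in the file comes from the same flowcell and lane, which holds for typical per-lane FASTQ files but not for files merged across lanes.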
Just to note, I also see this with Sarek-generated BAMs when running outside of Sarek (v2.7.1), using Sentieon TNscope:
Relevant read groups in BAM headers:
NOTE: there is a workaround for Sentieon, so this is not horribly pressing, but I'm sure it might bite others a bit more.
So the flowcell ID was previously retrieved here, but only when the FASTQ files were provided as a wildcard (no TSV input). @maxulysse do you recall why you didn't run this piece of code on all FASTQ files? Line 4246 in 68b9930
And then a random number was added for good measure, so in this case the problem should never occur: Line 4150 in 68b9930
Yes, this was only done when we had no idea how many FASTQ pairs we had (so folder input).
Any reason not to always retrieve this info? (Although sampleID-laneID as the read group ID works as expected.)
Fixed in #549
While running the FreeBayes tool, I get the following error message:
I've done some experimentation and consultation and the reason for this problem seems to be my input file configuration:
Strictly speaking, the problem is related to the lane column (5th). After replacing 1, 1, 1, 1 with 1, 2, 3, 4, the pipeline works fine.
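A quick way to spot this situation before launching the pipeline is to scan the input TSV for lane values that repeat within one subject. The sketch below is only an illustration: `duplicated_lanes` is a hypothetical helper, the sample rows and file names are made up, and it assumes the subject is in the 1st column (the lane being the 5th column is taken from the description above).

```python
import csv
import io
from collections import Counter


def duplicated_lanes(tsv_text, subject_col=0, lane_col=4):
    """Return sorted (subject, lane) pairs that occur on more than one row.

    Hypothetical helper; column positions are 0-based and the subject
    column index is an assumption about the TSV layout.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    needed = max(subject_col, lane_col)
    counts = Counter(
        (row[subject_col], row[lane_col])
        for row in reader
        if len(row) > needed  # skip blank or truncated lines
    )
    return sorted(pair for pair, n in counts.items() if n > 1)


# Made-up input: two samples of one subject that both use lane "1".
example = (
    "SUBJ1\tXX\t0\tnormal\t1\tn_R1.fastq.gz\tn_R2.fastq.gz\n"
    "SUBJ1\tXX\t1\ttumor\t1\tt_R1.fastq.gz\tt_R2.fastq.gz\n"
)
print(duplicated_lanes(example))
# [('SUBJ1', '1')]
```

An empty list means every lane value is unique within its subject, which matches the configuration that ran without the error.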
@apeltzer suggested that this may be a feature, not a bug:
Still, we would like to confirm that, because the documentation is not clear about it (at least I haven't found anything that would explain this behavior). A fix in the docs would be really nice too. At the moment my observation is that the FreeBayes tool requires a unique lane column value for each subject. I was told that @maxulysse may be the right person to ask about it.
Big thanks in advance!