Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds a barcode option to FastqToBam #936

Merged
merged 5 commits into from
Sep 11, 2023
Merged

Conversation

bwlang
Copy link
Contributor

@bwlang bwlang commented Sep 9, 2023

No description provided.

Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s leave out the change for platform unit as I don’t think it’s wrong to have the sample barcode as part of the platform unit, and in some cases is useful (we did this at my time at Broad, often)

This reverts commit b960886.

GATK seems to want a sample-barcode in there too

>PU = Platform Unit
>The PU holds three types of information, the {FLOWCELL_BARCODE}.{LANE}.{SAMPLE_BARCODE}. The {FLOWCELL_BARCODE} refers to the unique identifier for a particular flow cell. The {LANE} indicates the lane of the flow cell and the {SAMPLE_BARCODE} is a sample/library-specific identifier. Although the PU is not required by GATK but takes precedence over ID for base recalibration if it is present. In the example shown earlier, two read group fields, ID and PU, appropriately differentiate flow cell lane, marked by .2, a factor that contributes to batch effects.
>(https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups)
Copy link
Member

@nh13 nh13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more question, thanks for your patience!

@@ -96,6 +96,7 @@ class FastqToBam
@arg( doc="Read group ID to use in the file header.") val readGroupId: String = "A",
@arg( doc="The name of the sequenced sample.") val sample: String,
@arg( doc="The name/ID of the sequenced library.") val library: String,
@arg( doc="Library or Sample barcode sequence.") val barcode: Option[String] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this further, how would you feel about making this a Seq[String] with default Seq.empty, then joining them with a hyphen? That way folks don't have to do so themselves?

Suggested change
@arg( doc="Library or Sample barcode sequence.") val barcode: Option[String] = None,
@arg( doc="Library or Sample barcode sequence.", minElements=0) val barcode: Seq[String] = Seq.empty,

@@ -135,6 +136,7 @@ class FastqToBam
val rg = new SAMReadGroupRecord(this.readGroupId)
rg.setSample(sample)
rg.setLibrary(library)
this.barcode.foreach(bc => rg.setBarcodes(util.Arrays.asList(bc)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not in an idea, but this could work:

    this.barcode.foreach(bc => rg.setBarcodes(util.Arrays.asList(bc)))
    if (this.barcode.nonEmpty) rg.setBarcodes(...)

@bwlang
Copy link
Contributor Author

bwlang commented Sep 11, 2023

I tried this
if (this.barcode.nonEmpty) rg.setBarcodes(util.Arrays.asList(this.barcode.mkString("+")))
testing like this:
barcode=Seq("TATA", "GAGA"),
which produces a single barcode "TATA+GAGA" as I expected.

but i was surprised to see that ...
if (this.barcode.nonEmpty) rg.setBarcodes(util.Arrays.asList(this.barcode.mkString("-")))
gets split by htsjdk's getBarcodes on "-"( i guess???) , producing a list of 2 barcodes ["TATA", "GAGA"]

I've done like you asked but it's a bit too much magic for my taste...

@nh13
Copy link
Member

nh13 commented Sep 11, 2023

@bwlang want to revert to the pre-Seq changes then I'll run the CI and merge if it passes?

@bwlang
Copy link
Contributor Author

bwlang commented Sep 11, 2023

It doesn't really matter which way we pass the barcodes in.
I didn't appreciate that until testing with 'TATA-GAGA'... so whichever way you think is clearer.

@nh13
Copy link
Member

nh13 commented Sep 11, 2023

Thank-you for your patience @bwlang ! Making sure unit tests pass here, then I'll merge.

@nh13 nh13 merged commit 2af51ac into fulcrumgenomics:main Sep 11, 2023
2 checks passed
@bwlang bwlang deleted the bc_option branch September 12, 2023 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants