Which barcode-specific bam are used? #33

biofilos · 2022-09-20T07:36:17Z

Hello. First, thank you very much for the pipeline.

I am in the process of implementing your pipeline in WDL (aiming to run it on our Cromwell server via AWS, with infrastructure that requires WDL files). So far, I understand most steps of the pipeline. However, it is not clear to me how the different FASTQ files from the demultiplexing step are used.

As I understand it, after the demultiplexing step (running eclipdemux), I get a list of files, one pair per barcode, of the form *.BC.r1.fq.gz and *.BC.r2.fq.gz, where BC is each of the barcodes.

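From what I can gather in the SOP (https://raw.githubusercontent.com/YeoLab/eclip/master/documentation/eCLIP_analysisSOP_v2.2.docx), the rest of the steps are done starting with each barcode-specific FASTQ (in the SOP, *CO1.r1.fq.gz).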

My question is, should I merge these files at a particular point in the pipeline? Should I merge the files of all the barcodes, or only those for barcodeA and barcodeB?

Thank you

Juan Felipe Ortiz, Ph.D.
GeDaC. Cancer Sciences Institute
National University of Singapore

byee4 · 2022-10-11T08:41:43Z

Hi Juan,

Cool! I'm not very experienced with WDL/Cromwell, but I'd be curious to know how WDL works with AWS. I did hear that AWS was starting to support CWL, although I'm unsure in what capacity.

For paired-end eCLIP, you're correct that the eclipdemux step will produce several files, at which point you will want only the files associated with the expected barcodes (and make sure most of the reads do end up getting binned there). For ENCODE, we did not assign barcodes to size-matched input samples, so all input samples are effectively unassigned (the designation we use is 'NIL'), though this is experiment specific.

You're also correct that these files will be merged after PCR collapsing/deduplication. Then, R2 of the merged BAM files will be used for peak calling with CLIPper. If the size-matched inputs lack inline barcodes, they may not need to be merged.
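
The selection step described above (keeping only the demultiplexed files for the expected barcodes) could be sketched roughly as follows. This is only an illustration, assuming the files follow the *.BC.r1.fq.gz / *.BC.r2.fq.gz naming from the question. The function name select_barcode_fastqs, the sample prefix rep1, and the barcode labels A01, B06, and C01 are hypothetical placeholders, not part of the eCLIP pipeline; substitute the barcodes actually used in your experiment (or NIL for size-matched inputs without inline barcodes).

```python
import re
from collections import defaultdict

# Matches names of the form <prefix>.<BARCODE>.r1.fq.gz or <prefix>.<BARCODE>.r2.fq.gz
# (naming assumed from the question; adjust if your demux output differs).
DEMUX_NAME = re.compile(
    r"^(?P<prefix>.+)\.(?P<barcode>[^.]+)\.r(?P<read>[12])\.fq\.gz$"
)


def select_barcode_fastqs(filenames, expected_barcodes):
    """Return {barcode: (r1_file, r2_file)} for the expected barcodes only.

    Files belonging to any other barcode are ignored. A ValueError is raised
    if an expected barcode is missing its R1 or R2 file.
    """
    found = defaultdict(dict)
    for name in filenames:
        match = DEMUX_NAME.match(name)
        if match is None:
            continue  # not a demultiplexed FASTQ name
        barcode = match.group("barcode")
        if barcode in expected_barcodes:
            found[barcode][match.group("read")] = name

    selected = {}
    for barcode in expected_barcodes:
        reads = found.get(barcode, {})
        if "1" not in reads or "2" not in reads:
            raise ValueError(f"missing R1 or R2 file for barcode {barcode}")
        selected[barcode] = (reads["1"], reads["2"])
    return selected


# Hypothetical demux output for one sample with three barcode bins.
files = [
    "rep1.A01.r1.fq.gz", "rep1.A01.r2.fq.gz",
    "rep1.B06.r1.fq.gz", "rep1.B06.r2.fq.gz",
    "rep1.C01.r1.fq.gz", "rep1.C01.r2.fq.gz",
]

# Keep only the two barcodes expected for this (hypothetical) sample.
selected = select_barcode_fastqs(files, ["A01", "B06"])
```

Each selected pair would then go through the per-barcode steps on its own, with the merge happening only after PCR collapsing/deduplication, as noted in the reply.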
