nuclear RNA #11

lpantano · 2019-07-19T19:30:53Z

Hi,

I am trying the tool for the quantification of nuclear RNA, after my recent tweets with Lior Patcher.

Every commands in the Velocity tutorial seems to work, but I want only the quantification by gene taking into account reads in introns and exons, so I am really don't want the outputs in the velocity tutorial.

I imagine that something like this is the solution after doing correct and sort:

bustools count -o counts -g t2g.txt -e matrix.ec -t transcripts.txt --genecounts output.sort.bus

but I want to make sure that t2g.txt is the correct. I noticed the annotation for introns is TRANSCRIPT.N-I meanwhile the exons is TRANSCRIPT.N. If the t2g.txt file is lie this:

ENST00000456328.1       ENSG00000223972 DDX11L1
ENST00000450305.2       ENSG00000223972 DDX11L1
ENST00000488147.3       ENSG00000227232 WASH7P

It would recognized introns and exons, or I need to add the intron annotation to the file?

Thanks!

The text was updated successfully, but these errors were encountered:

Munfred · 2019-07-19T22:40:15Z

The cDNA_introns.t2g.txt file provided in the tutorial does have both intons and exons in the same txt file. You can take a look at the beginning and end of the file by doing:

$ wget https://github.com/BUStools/getting_started/releases/download/velocity_tutorial/cDNA_introns.t2g.txt.gz

$ gunzip cDNA_introns.t2g.txt.gz

$ head cDNA_introns.t2g.txt
ENST00000456328.1       ENSG00000223972 DDX11L1
ENST00000450305.2       ENSG00000223972 DDX11L1
ENST00000488147.3       ENSG00000227232 WASH7P
ENST00000619216.4       ENSG00000278267 MIR6859-1
ENST00000473358.5       ENSG00000243485 MIR1302-2HG
ENST00000469289.6       ENSG00000243485 MIR1302-2HG
ENST00000607096.7       ENSG00000284332 MIR1302-2
ENST00000417324.8       ENSG00000237613 FAM138A
ENST00000461467.9       ENSG00000237613 FAM138A
ENST00000606857.10      ENSG00000268020 OR4G4P

$ tail cDNA_introns.t2g.txt
ENST00000463325.1056070-I       ENSG00000184319 RPL23AP82
ENST00000463325.1056071-I       ENSG00000184319 RPL23AP82
ENST00000494075.1056072-I       ENSG00000184319 RPL23AP82
ENST00000494075.1056073-I       ENSG00000184319 RPL23AP82
ENST00000494075.1056074-I       ENSG00000184319 RPL23AP82
ENST00000423888.1056075-I       ENSG00000184319 RPL23AP82
ENST00000423888.1056076-I       ENSG00000184319 RPL23AP82
ENST00000423888.1056077-I       ENSG00000184319 RPL23AP82
ENST00000480246.1056078-I       ENSG00000184319 RPL23AP82
ENST00000480246.1056079-I       ENSG00000184319 RPL23AP82

lpantano · 2019-07-22T18:44:47Z

Thanks, so this command should be ok?:

bustools count -o genes/ --genecounts -g cDNA_introns.t2g.txt -e matrix.ec -t transcripts.txt output.correct.sort.bus

Munfred · 2019-07-23T05:25:47Z

Yes.

Just so you know, there is small issue right now in that bustools will not create the subfolder if you give it a multi folder output (e.g. if you want the output on ./sample2/genecounts) so you need to make sure the folder exist in that case. If you'd like to do that, this is how I'd make sure the folder gets created prior to running the command:

mkdir -p ./sample2/genecounts

bustools count \
--output  ./sample2/genecounts/genes \
--genecounts \
--genemap /references/homo_sapiens-ensembl-96/transcripts_to_genes.txt \
--ecmap ./sample2/matrix.ec \
--txnames  ./sample2/transcripts.txt \
./sample2/output.corrected.sorted.bus

lpantano · 2019-07-23T17:34:57Z

Thank you!

Sadly I am having a segmentation fault (with the conda and source code compiled by myself) only for that command, this is the debug output, I don't know if that helps.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x000000000043f9e0 in std::vector<int, std::allocator<int> >::begin (this=0x2aabd9af73f8) at /usr/include/c++/4.8.2/bits/stl_vector.h:548
548           { return const_iterator(this->_M_impl._M_start); }
(gdb) bt
#0  0x000000000043f9e0 in std::vector<int, std::allocator<int> >::begin (this=0x2aabd9af73f8) at /usr/include/c++/4.8.2/bits/stl_vector.h:548
#1  0x000000000045cb53 in intersect_genes_of_ecs (ecs=std::vector of length 1, capacity 100 = {...}, ec2genes=std::vector of length 12317869, capacity 16777216 = {...},
    glist=std::vector of length 0, capacity 100) at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/Common.cpp:174
#2  0x0000000000450ddb in __lambda1::operator() (__closure=0x7fffffffbdb0, v=std::vector of length 6345915, capacity 6400000 = {...})
    at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/bustools_count.cpp:177
#3  0x0000000000451a87 in bustools_count (opt=...) at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/bustools_count.cpp:241
#4  0x000000000043dd35 in main (argc=12, argv=0x7fffffffd868) at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/bustools_main.cpp:1072

The output folder has an empty MTX file.

the other count commands work, so is not this step specifically.

I will try the test data to make sure is not my sample, but I really want to have this working with my data, let me know if I can give you any more information.

If it is the cpp library version, can you tell me the one you have that is working so I can try to compile versus that particular version?

thanks!

Munfred · 2019-07-23T18:33:21Z

Please make sure that all of the files you're giving bustools exist (that the path is correct) and that the bus file is not empty. If the issue persists please try again with test data. Every time we've seen seg faults its because either we were giving it empty files, or paths to files that didn't exist.

…

On Tue, Jul 23, 2019, 10:35 Lorena Pantano ***@***.***> wrote: Thank you! Sadly I am having a segmentation fault (with the conda and source code compiled by myself) only for that command, this is the debug output, I don't know if that helps. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Program received signal SIGSEGV, Segmentation fault. 0x000000000043f9e0 in std::vector<int, std::allocator<int> >::begin (this=0x2aabd9af73f8) at /usr/include/c++/4.8.2/bits/stl_vector.h:548 548 { return const_iterator(this->_M_impl._M_start); } (gdb) bt #0 0x000000000043f9e0 in std::vector<int, std::allocator<int> >::begin (this=0x2aabd9af73f8) at /usr/include/c++/4.8.2/bits/stl_vector.h:548 #1 0x000000000045cb53 in intersect_genes_of_ecs (ecs=std::vector of length 1, capacity 100 = {...}, ec2genes=std::vector of length 12317869, capacity 16777216 = {...}, glist=std::vector of length 0, capacity 100) at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/Common.cpp:174 #2 0x0000000000450ddb in __lambda1::operator() (__closure=0x7fffffffbdb0, v=std::vector of length 6345915, capacity 6400000 = {...}) at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/bustools_count.cpp:177 #3 0x0000000000451a87 in bustools_count (opt=...) at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/bustools_count.cpp:241 #4 0x000000000043dd35 in main (argc=12, argv=0x7fffffffd868) at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/bustools_main.cpp:1072 The output folder has an empty MTX file. the other count commands work, so is not this step specifically. I will try the test data to make sure is not my sample, but I really want to have this working with my data, let me know if I can give you any more information. If it is the cpp library version, can you tell me the one you have that is working so I can try to compile versus that particular version? thanks! — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#11?email_source=notifications&email_token=AC7MY4FFG6TOVIQZLZMFIHLQA46MFA5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2T33AQ#issuecomment-514309506>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AC7MY4DY2GYRRHUCAWFHV73QA46MFANCNFSM4IFLDGNQ> .

lpantano · 2019-07-24T15:25:14Z

Thank you.

files exist and they are not empty:

4.7G    matrix.ec
1.5G    output.correct.sort.bus
30M     transcripts.txt
58M  cDNA_introns.t2g.txt

output folder exists. this is the command I used:

bustools count -o genes/genes --genecounts -g cDNA_introns.t2g.txt -e matrix.ec -t transcripts.txt output.correct.sort.bus

it takes a while to give the error, like 20 min or so. There is an empty file at genes/genes.mtx.

I tried the with the example data: bus_output_06 and the same error appeared. I used only a pair of files when aligning to speed up computation:

06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz 06/10X_17_029_MissingLibrary_1_HL
73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz

I tried in a different cluster machines, and the same seg.fault appeared:

Program received signal SIGSEGV, Segmentation fault.
0x000055555557682b in intersect_genes_of_ecs(std::vector<int, std::allocator<int> > const&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::vector<int, std::allocator<int> >&) ()

I know it is a difficult error to catch up, I am happy to help in any way possible.

cheers

Munfred · 2019-07-25T05:35:16Z

If you can provide all files and the bustools binary you are using we will try to reproduce it and identify a fix

…

On Wed, 24 Jul 2019 at 08:25, Lorena Pantano ***@***.***> wrote: Thank you. files exist and they are not empty: 4.7G matrix.ec 1.5G output.correct.sort.bus 30M transcripts.txt 58M cDNA_introns.t2g.txt output folder exists. this is the command I used: bustools count -o genes/genes --genecounts -g cDNA_introns.t2g.txt -e matrix.ec -t transcripts.txt output.correct.sort.bus it takes a while to give the error, like 20 min or so. There is an empty file at genes/genes.mtx. I tried the with the example data: bus_output_06 and the same error appeared. I used only a pair of files when aligning to speed up computation: 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz 06/10X_17_029_MissingLibrary_1_HL 73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz I tried in a different cluster machines, and the same seg.fault appeared: Program received signal SIGSEGV, Segmentation fault. 0x000055555557682b in intersect_genes_of_ecs(std::vector<int, std::allocator<int> > const&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::vector<int, std::allocator<int> >&) () I know it is a difficult error to catch up, I am happy to help in any way possible. cheers — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#11?email_source=notifications&email_token=AC7MY4BJM7TATBH7JEQLIUTQBBX5XA5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2WWHRI#issuecomment-514679749>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AC7MY4AWYB3XD3GP4QWLITDQBBX5XANCNFSM4IFLDGNQ> .

lpantano · 2019-07-25T18:41:40Z

Thank you! Here are the files:

wget https://lpantano-debug.s3.amazonaws.com/bustools/bustools
wget https://lpantano-debug.s3.amazonaws.com/bustools/cDNA_introns.t2g.txt
wget https://lpantano-debug.s3.amazonaws.com/bustools/matrix.ec
wget https://lpantano-debug.s3.amazonaws.com/bustools/output.correct.sort.bus
wget https://lpantano-debug.s3.amazonaws.com/bustools/transcripts.txt

I installed bustools with conda.

thanks so much

Munfred · 2019-07-26T05:10:44Z

How did you generate this bus file? I converted the 2.6GB bus file you provided into text to inspect it and it expanded to 205GB.

Each entry that should correspond to a UMI has a length of 2573 charachters, and it repeats every 32 characters. I have attached the first 5 lines of the file for you to understand.

If you can provide the detailed steps you used to generate this file we can look further into the issue.

nuclear_rna_first_5_lines.txt

lpantano · 2019-07-26T13:00:31Z

This is for the the velocity tutorial, I just took on of the samples: ``` kallisto bus -i cDNA_introns.idx -o bus_output_06/ -x 10xv2 -t 4 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz bustools correct -w ../../10xv2_whitelist.txt -p output.bus | bustools sort -o output.correct.sort.bus -t 4 - ``` Is this the information you need? I just followed the steps in the tutorial. Thanks for helping with this. On July 26, 2019 at 1:10:45 AM, Eduardo Beltrame ([email protected]) wrote: How did you generate this bus file? I converted the 2.6GB bus file you provided into text to inspect it and it expanded to 205GB. Each entry that should correspond to a UMI has a length of 2573 charachters, and it repeats every 32 characters. I have attached the first 5 lines of the file for you to understand. If you can provide the detailed steps you used to generate this file we can look further into the issue. nuclear_rna_first_5_lines.txt <https://github.com/BUStools/bustools/files/3434254/nuclear_rna_first_5_lines.txt> — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#11?email_source=notifications&email_token=AAML6HHNZUSIGRUK2DXKBALQBKBNLA5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD23QNCI#issuecomment-515311241>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAML6HAZCKJOS4D5TBC546TQBKBNLANCNFSM4IFLDGNQ> .

lpantano · 2019-07-26T13:07:10Z

The fastq files were done by: ``` bamtofastq --reads-per-fastq=500000000 10X_17_029.bam ./06 ``` Can I convert the BUS file to text? I have my sample as well and I could see if that happens with my sample as well. Or I can give it to your. Thanks On July 26, 2019 at 9:00:29 AM, Lorena Pantano ([email protected]) wrote: This is for the the velocity tutorial, I just took on of the samples: ``` kallisto bus -i cDNA_introns.idx -o bus_output_06/ -x 10xv2 -t 4 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz bustools correct -w ../../10xv2_whitelist.txt -p output.bus | bustools sort -o output.correct.sort.bus -t 4 - ``` Is this the information you need? I just followed the steps in the tutorial. Thanks for helping with this. On July 26, 2019 at 1:10:45 AM, Eduardo Beltrame ([email protected]) wrote: How did you generate this bus file? I converted the 2.6GB bus file you provided into text to inspect it and it expanded to 205GB. Each entry that should correspond to a UMI has a length of 2573 charachters, and it repeats every 32 characters. I have attached the first 5 lines of the file for you to understand. If you can provide the detailed steps you used to generate this file we can look further into the issue. nuclear_rna_first_5_lines.txt <https://github.com/BUStools/bustools/files/3434254/nuclear_rna_first_5_lines.txt> — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#11?email_source=notifications&email_token=AAML6HHNZUSIGRUK2DXKBALQBKBNLA5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD23QNCI#issuecomment-515311241>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAML6HAZCKJOS4D5TBC546TQBKBNLANCNFSM4IFLDGNQ> .

Munfred · 2019-07-26T17:23:10Z

Yes you can convert it to text by doing `bustools text -o bus_output_filename.txt bus_input_file.bus` On Fri, 26 Jul 2019 at 06:07, Lorena Pantano <[email protected]> wrote:

…

The fastq files were done by: ``` bamtofastq --reads-per-fastq=500000000 10X_17_029.bam ./06 ``` Can I convert the BUS file to text? I have my sample as well and I could see if that happens with my sample as well. Or I can give it to your. Thanks On July 26, 2019 at 9:00:29 AM, Lorena Pantano ***@***.***) wrote: This is for the the velocity tutorial, I just took on of the samples: ``` kallisto bus -i cDNA_introns.idx -o bus_output_06/ -x 10xv2 -t 4 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz bustools correct -w ../../10xv2_whitelist.txt -p output.bus | bustools sort -o output.correct.sort.bus -t 4 - ``` Is this the information you need? I just followed the steps in the tutorial. Thanks for helping with this. On July 26, 2019 at 1:10:45 AM, Eduardo Beltrame ***@***.*** ) wrote: How did you generate this bus file? I converted the 2.6GB bus file you provided into text to inspect it and it expanded to 205GB. Each entry that should correspond to a UMI has a length of 2573 charachters, and it repeats every 32 characters. I have attached the first 5 lines of the file for you to understand. If you can provide the detailed steps you used to generate this file we can look further into the issue. nuclear_rna_first_5_lines.txt < https://github.com/BUStools/bustools/files/3434254/nuclear_rna_first_5_lines.txt > — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub < #11?email_source=notifications&email_token=AAML6HHNZUSIGRUK2DXKBALQBKBNLA5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD23QNCI#issuecomment-515311241 >, or mute the thread < https://github.com/notifications/unsubscribe-auth/AAML6HAZCKJOS4D5TBC546TQBKBNLANCNFSM4IFLDGNQ > . — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#11?email_source=notifications&email_token=AC7MY4GLJ364NNXCRVAZGV3QBLZH7A5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD24QZHQ#issuecomment-515443870>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AC7MY4FGUUXK7SSKSJUVNUTQBLZH7ANCNFSM4IFLDGNQ> .

lpantano · 2019-07-26T17:57:19Z

yes, the bus file from my sample looks the same than that. Used the same command I posted before to produce the bus file, with kallisto.

Munfred · 2019-07-26T20:06:06Z

Ok, can you confirm that the fastq files you generated using `bamtofastq` look correct? If you can paste here the output of head it would be good. If the fastqs look correct then it might have been an issue with the files given to kallisto, if you can also post the commands you use to call `kallisto bus` and the files used that might solve the mystery

…

On Fri, 26 Jul 2019 at 10:57, Lorena Pantano ***@***.***> wrote: yes, the bus file from my sample looks the same than that. Used the same command I posted before to produce the bus file, with kallisto. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#11?email_source=notifications&email_token=AC7MY4F7TQT46GJOG52FOSTQBM3H7A5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD25JQGI#issuecomment-515545113>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AC7MY4CBIETH6LOXRGEKXA3QBM3H7ANCNFSM4IFLDGNQ> .

lpantano · 2019-07-26T20:16:17Z

Sure.

These are the commands to create the bus file from the bam files:

bamtofastq --reads-per-fastq=500000000 10X_17_029.bam ./06

kallisto bus -i cDNA_introns.idx -o bus_output_06/ -x 10xv2 -t 4 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz

bustools correct -w ../../10xv2_whitelist.txt -p output.bus | bustools sort -o output.correct.sort.bus -t 4 -

The other sample I have is not generated with bamtofastq, they were generated with cellranger mkfasta and the bus file is the same.

R1:

@D00456:228:HL73JBCXY:2:1111:15821:53600 1:N:0:0
CTAGCCCTAACCCTAACCCTAACCCT
+
.GAGA<GGGGAGGGGGGGGAAA.A<G
@D00456:228:HL73JBCXY:2:1213:6403:92501 1:N:0:0
AACCCTAACCCTAACCCTAACCCTAA
+
GGGGAGGGGGAGGAGGGGGG..AGAG
@D00456:228:HL73JBCXY:2:2106:9266:5974 1:N:0:0
ACGTCAAAGCAGCCTCCACAATGTGG

R2:

@D00456:228:HL73JBCXY:2:1111:15821:53600 3:N:0:0
GGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGCTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGGATAGGGTTCGGTCTAGGGATAGG
+
AGGAAA..GG.<<.G<A.G<G..<G<G<GAGG..<AGAG<..<<A..<<GG.<<GGA<..<<<..<...<..<<.<<AA...<...<.<...<<....
@D00456:228:HL73JBCXY:2:1213:6403:92501 3:N:0:0
GGTTAGGGTTAGGGTTAGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGAGTAGGGATGGGGAGCGGGGAGGGGAGAGG
+
GA<G.AAA<GGGAG<AAGGGI<..<GGGGGAAAG..<.<.<.<AGAG.GGG.<.GAG..G<<<...<<....<<......<.<.<.......<..<..
@D00456:228:HL73JBCXY:2:2106:9266:5974 3:N:0:0
GCTAAGAGACAGCAAATACACATGAACAGAAAGAAGAGGTCAAAGAAAAGGCTGACGGCAAGTTAACGAAAAGAAAAATGGTGAATGATACCCGGTGC

In a last note, the velocity tutorial works if I use bustools capture and bustools count to get the two type of matrices. This is only happening when I want just to do bustools count with the matrix.ec file firectly since I want to quantify all reads into genes. (https://www.kallistobus.tools/velocity_tutorial.html)

thanks!

Munfred · 2019-07-26T22:16:50Z

Thanks, this all looks good. One last question - are you by any chance running this on a 32 bit machine?

…

On Fri, Jul 26, 2019, 13:16 Lorena Pantano ***@***.***> wrote: Sure. These are the commands to create the bus file from the bam files: bamtofastq --reads-per-fastq=500000000 10X_17_029.bam ./06 kallisto bus -i cDNA_introns.idx -o bus_output_06/ -x 10xv2 -t 4 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz bustools correct -w ../../10xv2_whitelist.txt -p output.bus | bustools sort -o output.correct.sort.bus -t 4 - The other sample I have is not generated with bamtofastq, they were generated with cellranger mkfasta and the bus file is the same. R1: @D00456:228:HL73JBCXY:2:1111:15821:53600 1:N:0:0 CTAGCCCTAACCCTAACCCTAACCCT + .GAGA<GGGGAGGGGGGGGAAA.A<G @D00456:228:HL73JBCXY:2:1213:6403:92501 1:N:0:0 AACCCTAACCCTAACCCTAACCCTAA + GGGGAGGGGGAGGAGGGGGG..AGAG @D00456:228:HL73JBCXY:2:2106:9266:5974 1:N:0:0 ACGTCAAAGCAGCCTCCACAATGTGG R2: @D00456:228:HL73JBCXY:2:1111:15821:53600 3:N:0:0 GGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGCTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGGATAGGGTTCGGTCTAGGGATAGG + AGGAAA..GG.<<.G<A.G<G..<G<G<GAGG..<AGAG<..<<A..<<GG.<<GGA<..<<<..<...<..<<.<<AA...<...<.<...<<.... @D00456:228:HL73JBCXY:2:1213:6403:92501 3:N:0:0 GGTTAGGGTTAGGGTTAGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGAGTAGGGATGGGGAGCGGGGAGGGGAGAGG + GA<G.AAA<GGGAG<AAGGGI<..<GGGGGAAAG..<.<.<.<AGAG.GGG.<.GAG..G<<<...<<....<<......<.<.<.......<..<.. @D00456:228:HL73JBCXY:2:2106:9266:5974 3:N:0:0 GCTAAGAGACAGCAAATACACATGAACAGAAAGAAGAGGTCAAAGAAAAGGCTGACGGCAAGTTAACGAAAAGAAAAATGGTGAATGATACCCGGTGC In a last note, the velocity tutorial works if I use bustools capture and bustools count to get the two type of matrices. This is only happening when I want just to do bustools count with the matrix.ec file firectly since I want to quantify all reads into genes. ( https://www.kallistobus.tools/velocity_tutorial.html) thanks! — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#11?email_source=notifications&email_token=AC7MY4BGCD4VK3OFTQIRICLQBNLRDA5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD25TMEY#issuecomment-515585555>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AC7MY4EK3CRRD5ULBB7ELXDQBNLRDANCNFSM4IFLDGNQ> .

lpantano · 2019-07-26T22:20:43Z

thank you for confirming. nope, 64 linux machines. thanks!

Munfred · 2019-07-26T22:47:08Z

Ok we're very puzzled as to how the bus file with the 2573 characters in the UMI have been made, and ran out of ideas.

Can you subsample 100k reads from R1 and R2 so that we can try to reproduce the problematic bus file? seqtk is an easy way to subsample them: https://github.com/lh3/seqtk

lpantano · 2019-07-27T00:09:36Z

Thanks. I’ll do that on Monday. I checked the BUS file from kallisto and looks like this: CTGTTTACATACGCTA TAATAAGGAA 1746067 1 CTTAACTAGCTCTCGG ATCGTCCTGG 1746067 1 CTTTGCGAGCGATGAC TCCAGGGTGA 1746067 1 CTTTGCGGTACTCGCG TTAAATCGTG 1746067 1 CTTTGCGGTGTTCGAT CCCGCACGGT 1746067 1 CTTTGCGGTGTTCGAT CCCGCACGGT 1746067 1 GAAGCAGAGGCTCATT TGTTTTGCAC 1746067 1 GAAGCAGAGGCTCATT GTCTTTGATG 1746067 1 GAATGAACATGCTAGT AAATGAACAA 1746067 1 GACAGAGCAATCGAAA CACGGATGTC 1746067 1 But after bustools correct and sort it looks like this (it seems the problem is the correct | sort step): AAAAAAAAAAAAAAAA AAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAA AAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAA AAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACC AAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAA AAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAA CAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAA AAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAA AAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAA AACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAA AAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAA AAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACC AAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAA AAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCA 0 3275751424

lpantano · 2019-08-01T13:25:09Z

Hi,

I uploaded the BUS file before sorting and correction that it seems right. Do you still want the FASTQ files?

https://lpantano-debug.s3.amazonaws.com/bustools/output.bus

As I posted last week, the weird bus appears after the correct command.

Thanks!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nuclear RNA #11

nuclear RNA #11

lpantano commented Jul 19, 2019

Munfred commented Jul 19, 2019 •

edited

Loading

lpantano commented Jul 22, 2019

Munfred commented Jul 23, 2019

lpantano commented Jul 23, 2019

Munfred commented Jul 23, 2019 via email

lpantano commented Jul 24, 2019

Munfred commented Jul 25, 2019 via email

lpantano commented Jul 25, 2019

Munfred commented Jul 26, 2019

lpantano commented Jul 26, 2019 via email

lpantano commented Jul 26, 2019 via email

Munfred commented Jul 26, 2019 via email

lpantano commented Jul 26, 2019

Munfred commented Jul 26, 2019 via email

lpantano commented Jul 26, 2019

Munfred commented Jul 26, 2019 via email

lpantano commented Jul 26, 2019 via email

Munfred commented Jul 26, 2019

lpantano commented Jul 27, 2019 via email

lpantano commented Aug 1, 2019

nuclear RNA #11

nuclear RNA #11

Comments

lpantano commented Jul 19, 2019

Munfred commented Jul 19, 2019 • edited Loading

lpantano commented Jul 22, 2019

Munfred commented Jul 23, 2019

lpantano commented Jul 23, 2019

Munfred commented Jul 23, 2019 via email

lpantano commented Jul 24, 2019

Munfred commented Jul 25, 2019 via email

lpantano commented Jul 25, 2019

Munfred commented Jul 26, 2019

lpantano commented Jul 26, 2019 via email

lpantano commented Jul 26, 2019 via email

Munfred commented Jul 26, 2019 via email

lpantano commented Jul 26, 2019

Munfred commented Jul 26, 2019 via email

lpantano commented Jul 26, 2019

Munfred commented Jul 26, 2019 via email

lpantano commented Jul 26, 2019 via email

Munfred commented Jul 26, 2019

lpantano commented Jul 27, 2019 via email

lpantano commented Aug 1, 2019

Munfred commented Jul 19, 2019 •

edited

Loading