Count generates three matrices (nascent/mature/ambiguous), barcode metadata supported, and updates to bustools correct
bustools count
- bustools count now has a --split option (-s). It takes a file containing a list of transcripts (a subset of the --txnames file) and "splits" the generated count matrix into multiple count matrices, as follows:
- A count matrix (.mtx) will be generated from collapsed UMIs that map solely to transcripts found in the --txnames file but not in the --split file.
- A second count matrix (.2.mtx) will be generated from collapsed UMIs that map to transcripts found in the --split file.
- An ambiguous matrix (.ambiguous.mtx) will be generated from collapsed UMIs that map to multiple transcripts, some of which would be assigned to the .mtx file and others to the .2.mtx file. Such UMIs go into this ambiguous matrix rather than into either of the other two.
Note that, at the gene level, this "splitting" is done after normal UMI collapsing and, by default, matrix assignment only occurs if all transcripts in the collapsed UMI's mapping belong to the same gene (as specified in the t2g file supplied to --genemap). The --split option is useful for generating nascent/mature/ambiguous matrices in workflows that examine splicing.
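The assignment rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in these notes, not the bustools implementation; the function name `assign_matrix` and the transcript names are hypothetical.

```python
def assign_matrix(umi_transcripts, split_transcripts):
    """Return the matrix suffix a collapsed UMI would be counted in.

    umi_transcripts: set of transcripts the collapsed UMI maps to.
    split_transcripts: set of transcripts listed in the --split file.
    """
    if not umi_transcripts:
        raise ValueError("a collapsed UMI must map to at least one transcript")
    in_split = [t in split_transcripts for t in umi_transcripts]
    if not any(in_split):
        return ".mtx"            # maps only to transcripts outside the --split file
    if all(in_split):
        return ".2.mtx"          # maps only to transcripts inside the --split file
    return ".ambiguous.mtx"      # mixture of both kinds of transcripts


# Hypothetical transcript names, for illustration only.
split = {"tx_nascent_1", "tx_nascent_2"}
print(assign_matrix({"tx_mature_1"}, split))
print(assign_matrix({"tx_nascent_1", "tx_nascent_2"}, split))
print(assign_matrix({"tx_mature_1", "tx_nascent_1"}, split))
```

In a nascent/mature workflow, the --split file would typically list the transcripts of one category, so the three outputs correspond to the two categories plus the UMIs that cannot be attributed to either.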
barcode metadata
- BUS records can now have metadata stored in the barcodes column. This metadata may have been generated by the --batch-barcodes option in kallisto bus, which stores sample barcodes in the metadata portion (while cell barcodes occupy the non-metadata portion). When bustools count is called and metadata is detected in the BUS file, a .barcodes.prefix.txt file is generated that contains the metadata (extended to 16 characters because the metadata generated by kallisto bus is 16 characters in length).
- bustools text has a --showAll (-a) option that can expose the metadata in plaintext format.
bustools correct
- The onlist supplied to bustools correct can have multiple columns so that each component of the barcode is corrected independently.
- bustools correct now has a --replace option (-r), which takes a replacement file containing two columns of equal length. Barcodes in the first column are replaced with the barcodes in the corresponding row of the second column. Partial replacements are also possible using an asterisk: for example, a row containing CATCATCC *CATTCCTA means that the end of a barcode, if it is CATCATCC, is replaced with CATTCCTA.
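As a rough illustration of the replacement behaviour described above, the following Python sketch applies whole-barcode and asterisk (end-of-barcode) rules. It is not the bustools implementation: the function name `apply_replacements` is hypothetical, the first rule and the sample barcodes are made up, and the handling of the asterisk is an assumption based solely on the single example given in these notes.

```python
def apply_replacements(barcode, rules):
    """Apply the first matching rule to a barcode.

    rules: list of (old, new) pairs, one per row of the two-column file.
    A `new` value starting with "*" is treated as a partial rule that
    replaces only the end of the barcode (assumed interpretation).
    """
    for old, new in rules:
        if new.startswith("*"):
            # Partial rule: swap the trailing `old` for the text after "*".
            if barcode.endswith(old):
                return barcode[: len(barcode) - len(old)] + new[1:]
        elif barcode == old:
            # Full rule: the whole barcode must match the first column.
            return new
    return barcode  # no rule matched; leave the barcode unchanged


rules = [
    ("AAAACCCC", "GGGGTTTT"),    # made-up full replacement
    ("CATCATCC", "*CATTCCTA"),   # partial replacement from the notes
]
print(apply_replacements("AAAACCCC", rules))
print(apply_replacements("ACGTACGTCATCATCC", rules))
```

With these rules, a barcode ending in CATCATCC keeps its leading part and only its end is rewritten, while a barcode that matches no rule passes through unchanged.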