Select reduced transcriptome from clusters #18

mpaya · 2017-10-12T21:27:02Z

Hi,
I am comparing clustering results from CD-HIT and RapClust. One characteristic of CD-HIT is that it selects one representative transcript per cluster, while RapClust doesn't. Would it be reasonable to also select the longest transcript from each RapClust cluster as its representative, in order to generate an assembly with reduced redundancy?
Thank you

rob-p · 2017-10-13T14:02:00Z

Hi @mpaya,

The clustering methodology of CD-HIT is considerably different from that of RapClust. Specifically, in CD-HIT, selecting a single cluster member as a representative is often reasonable because the clusters are formed from sequences that are generally very similar. RapClust, however, aims to cluster together multiple transcript isoforms of the same gene, which can vary considerably in length and sequence composition (e.g. through the inclusion or exclusion of alternatively spliced exons). Hence, selecting a single representative sequence from a cluster isn't as straightforward. It is true that the longest transcript is likely to contain much of the sequence in the cluster, but it is not necessarily pairwise-similar to all cluster members.
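For reference, the naive "longest transcript per cluster" selection being discussed could look roughly like the sketch below. It is only an illustration under assumptions: the function names are hypothetical, the cluster assignments are taken as plain `(cluster_id, transcript_id)` pairs (the column order of your actual cluster file should be checked before parsing it), and ties are broken by keeping the first transcript seen. It does nothing to address the caveat above, namely that the longest isoform need not resemble every other member of its cluster.

```python
def read_fasta_lengths(lines):
    """Return {transcript_id: sequence_length} from an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def longest_per_cluster(cluster_pairs, lengths):
    """Pick the longest transcript in each cluster.

    cluster_pairs: iterable of (cluster_id, transcript_id) tuples
                   (assumed layout, not taken from any tool's documentation).
    lengths:       {transcript_id: length}, e.g. from read_fasta_lengths().
    Transcripts missing from `lengths` are skipped; ties keep the first seen.
    """
    best = {}
    for cluster_id, transcript_id in cluster_pairs:
        if transcript_id not in lengths:
            continue
        current_best = best.get(cluster_id)
        if current_best is None or lengths[transcript_id] > lengths[current_best]:
            best[cluster_id] = transcript_id
    return best


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    fasta = [">t1 isoform A", "ACGT", ">t2", "ACGTAC", "GT", ">t3", "AC"]
    pairs = [("c0", "t1"), ("c0", "t2"), ("c1", "t3")]
    print(longest_per_cluster(pairs, read_fasta_lengths(fasta)))
```

The resulting transcript IDs can then be used to subset the original assembly FASTA into a reduced one.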

More generally, how you select a representative might depend on which type of analysis you hope to do. One approach to representative generation that is compatible with RapClust is the Lace method from the Oshlack group. It's probably worth taking a look at that paper, if you're not already familiar with it, to see whether it suits your needs.

mpaya · 2017-10-13T16:39:13Z

Hi @rob-p,

For the current project, the analysis I was planning was just a comparison of the CD-HIT and RapClust results. After selecting a single representative per cluster, the purpose is to run Transrate, Transdecoder and BUSCO on the reduced assemblies and compare the results. So our concern was whether this naive representative selection on RapClust clusters, used to generate an artificial reduced assembly, would be acceptable. I wasn't familiar with Lace. Would you recommend using its output instead for this purpose?

Thank you for your kind help
