Class project for CSE 6411: Computational Biology, April 2019 semester, at BUET.
We apply word-embedding techniques to codons (preferably) or k-mers to obtain a better representation of DNA sequences for clustering. The embedding is built by counting the codon pairs that co-occur within a window and adding each count to an initialized matrix. The matrix is then flattened and normalized to give the embedding.
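The counting step can be sketched as follows. This is a minimal illustration, not the project's actual code. Several details are assumptions, since the description above does not fix them: codons are read in non-overlapping triplets from the first base, the matrix is 64 x 64 and starts at zero, pairs are ordered (left codon, right codon), the window counts the codons to the right of the current one, and normalization divides by the total count. The names `CODONS`, `INDEX`, `split_codons`, and `embed` are hypothetical.

```python
from itertools import product

# All 64 codons in a fixed order (this ordering is an assumption).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {codon: i for i, codon in enumerate(CODONS)}


def split_codons(seq):
    """Split a DNA string into non-overlapping codons, starting at the first base.

    Trailing bases that do not fill a codon are ignored, and codons
    containing characters other than A, C, G, T are dropped.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return [c for c in codons if c in INDEX]


def embed(seq, window=2):
    """Return a flattened, normalized codon-pair count vector of length 4096."""
    # Zero-initialized 64 x 64 matrix (assumed initialization).
    counts = [[0] * 64 for _ in range(64)]
    codons = split_codons(seq)

    # Count each codon together with the next `window` codons to its right.
    for i, left in enumerate(codons):
        for right in codons[i + 1:i + 1 + window]:
            counts[INDEX[left]][INDEX[right]] += 1

    # Flatten row by row, then divide by the total so the entries sum to 1.
    flat = [value for row in counts for value in row]
    total = sum(flat)
    if total == 0:
        return [0.0] * len(flat)
    return [value / total for value in flat]
```

Calling `embed(sequence)` on each DNA sequence gives fixed-length vectors that can be passed to any clustering routine. Using k-mers instead of codons would change only the vocabulary and the splitting step.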
Please contact the author at [email protected] for any details.