Next-generation sequencing (NGS) technologies have enabled affordable sequencing of billions of short DNA fragments at high throughput, paving the way for population-scale genomics. Genomics data analytics at this scale requires overcoming performance bottlenecks, such as searching for short DNA sequences over long reference sequences. In this paper, we introduce LISA (Learned Indexes for Sequence Analysis), a novel learning-based approach to DNA sequence search. As a first proof of concept, we focus on accelerating one of the most essential flavors of the problem, called exact search. LISA builds on and extends FM-index, which is the state-of-the-art technique widely deployed in genomics tool-chains. Initial experiments with human genome datasets indicate that LISA achieves up to a factor of 4X performance speedup against its traditional counterpart.
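For readers unfamiliar with the baseline, exact search over an FM-index can be sketched as follows. This is the textbook backward-search procedure, not LISA's learned variant, and it is deliberately naive: the suffix array is built by sorting full suffixes and the occurrence table is stored uncompressed. The function names `build_fm_index` and `exact_search` are made up for this illustration.

```python
def build_fm_index(text):
    """Build a toy FM-index (suffix array, C table, Occ table) for `text`."""
    text = text + "$"  # sentinel, assumed smaller than every other symbol
    # Naive suffix array: fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))

    # C[c] = number of characters in text that are strictly smaller than c.
    counts = {c: 0 for c in alphabet}
    for ch in text:
        counts[ch] += 1
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += counts[c]

    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def exact_search(pattern, sa, C, occ):
    """Return the sorted start positions of all exact matches of `pattern`."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    # Backward search: consume the pattern from its last character to its first.
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, C, occ = build_fm_index("ACGTACGT")
print(exact_search("CGT", sa, C, occ))
```

Each pattern character costs two lookups into the occurrence table, so the work is proportional to the query length. That per-character loop is the part the abstract says LISA builds on and extends.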
Although not strictly deep learning, this paper presents an interesting application of machine learning to improving the running time of DNA sequence search algorithms. I think this is interesting because algorithms and data structures are important components of bioinformatics research, but they have not yet seen significant applications of ML. This paper, which is inspired by The Case for Learned Index Structures, signals a change in that regard. Along with the Sapling paper, it is one of the first applications of ML to this aspect of bioinformatics, and we may well continue to see similar work. As such, the review might benefit from some careful speculation about whether deep learning could similarly benefit these fields, or whether simpler ML models outperform deep learning here. It is entirely possible that deep learning is a poor fit because of running time constraints, and that might be worth noting.
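To make the running time concern concrete, here is a minimal sketch of the general learned-index idea: a model predicts where a key sits in a sorted array, and a bounded local search corrects the prediction. It assumes a single least-squares line as the model, which is a simplification chosen for this example and is not the architecture used in LISA or Sapling. The names `fit_learned_index` and `lookup` are hypothetical.

```python
from bisect import bisect_left


def fit_learned_index(keys):
    """Fit position ~ slope * key + intercept over a sorted list of numeric keys.

    Also records the worst prediction error seen on the keys themselves, which
    bounds the window that a lookup has to search.
    """
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    slope = cov / var if var else 0.0
    intercept = mean_p - slope * mean_k
    max_err = max(
        abs(int(round(slope * k + intercept)) - i) for i, k in enumerate(keys)
    )
    return slope, intercept, max_err


def lookup(keys, model, key):
    """Return the index of `key` in `keys`, or -1 if it is absent."""
    slope, intercept, max_err = model
    guess = int(round(slope * key + intercept))
    # Only the window [guess - max_err, guess + max_err] can contain the key.
    lo = max(0, guess - max_err)
    hi = min(len(keys), guess + max_err + 1)
    if lo >= hi:
        return -1
    i = bisect_left(keys, key, lo, hi)
    return i if i < hi and keys[i] == key else -1


keys = [2, 3, 5, 8, 13, 21, 34, 55]
model = fit_learned_index(keys)
print(lookup(keys, model, 13))
```

The prediction here is one multiply and one add, so it can compete with a few steps of binary search. A deep network in the same position would have to make up for its inference cost with a much tighter error window, which is the trade-off the speculation above is about.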
https://arxiv.org/abs/1910.04728