lpmarques/portfolio

Who I am

Twenty-four years old with a twenty-year-old interest in computers

Former academic with a degree in evolutionary biology

Bioinformatician, data freak, self-starter developer

Ever eager to learn (and make) more from the digital world to enhance lives in the physical one

Development history

Work tools

My first contact with software development arose from the need to automate data manipulation routines for research: simple, functional bioinformatics that grew into a library of scripts.

A handful of those tools, written for accessing, filtering and cleaning DNA sequences, can be found here. They include a script that converts the (often confusing) multigb file format (obtained from GenBank database searches) to a cleaner multi-sequence fasta file, as well as one that runs a complete data cleaning job on sequence alignments. A tool for automatic concatenation of multiple sequence alignments is also included.
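As an illustration of the multigb-to-fasta conversion described above, here is a minimal Python sketch. It is not the script from this repository: the function names `parse_multigb` and `to_fasta` are hypothetical, and it assumes plain GenBank flat-file text in which each record starts with a `LOCUS` line, holds its sequence under `ORIGIN`, and ends with `//`. Annotations and features are simply ignored.

```python
from typing import Iterable, Iterator, Tuple


def parse_multigb(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (locus_name, sequence) for each record in GenBank flat-file text."""
    name = None
    seq_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            # The locus name is the second whitespace-separated field.
            name = line.split()[1]
            seq_parts = []
            in_origin = False
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: emit what was collected and reset the state.
            if name is not None:
                yield name, "".join(seq_parts).upper()
            name, seq_parts, in_origin = None, [], False
        elif in_origin:
            # Sequence lines mix position numbers, spaces and bases; keep letters only.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (name, sequence) pairs as multi-sequence fasta text."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


example = """LOCUS       SEQ1          12 bp    DNA
ORIGIN
        1 acgtacgtac gt
//
LOCUS       SEQ2           4 bp    DNA
ORIGIN
        1 ttgg
//
"""
fasta_text = to_fasta(parse_multigb(example))
print(fasta_text)
```

Using the locus name as the fasta header is a design choice of this sketch; an accession number or organism name could be used instead.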

Molecular scientists are also frequently interested in the base composition of their DNA sequence alignments. For that purpose, there are scripts here that calculate the total proportion of (often undesirable) gap characters, or report detailed base composition for every column of an alignment.
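The two measurements mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the repository: the alignment is assumed to be a dictionary mapping sequence names to equal-length strings, `-` is assumed to be the gap character, and the function names `gap_proportion` and `column_composition` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def gap_proportion(alignment: Dict[str, str], gap_chars: str = "-") -> float:
    """Return the fraction of all alignment cells that are gap characters."""
    total = sum(len(seq) for seq in alignment.values())
    gaps = sum(seq.count(g) for seq in alignment.values() for g in gap_chars)
    return gaps / total if total else 0.0


def column_composition(alignment: Dict[str, str]) -> List[Counter]:
    """Return one Counter of characters per alignment column."""
    seqs = [seq.upper() for seq in alignment.values()]
    if len({len(seq) for seq in seqs}) > 1:
        raise ValueError("all sequences in an alignment must have the same length")
    # zip(*seqs) transposes rows (sequences) into columns (sites).
    return [Counter(column) for column in zip(*seqs)]


alignment = {"taxon_a": "AC-T", "taxon_b": "A--T", "taxon_c": "ACGT"}
print(gap_proportion(alignment))
for position, counts in enumerate(column_composition(alignment), start=1):
    print(position, dict(counts))
```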

Those interested in macro-evolutionary questions could also make good use of a few scripts I developed to automate analyses of large evolutionary trees. These include a simple function to unroot trees and a more complex one that detects differences between the partitions of two trees and returns what is usually called the “topological distance”.
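One common way to compare the partitions of two trees is the Robinson-Foulds distance, which counts the bipartitions present in one tree but not in the other. The Python sketch below assumes that this is the kind of distance meant above; it is not the repository's implementation. Trees are represented as nested tuples of leaf names rather than parsed from Newick, and the function names are hypothetical. Because it works on bipartitions, the position of the root does not affect the result.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions (splits) induced by internal branches."""
    all_leaves = leaves(tree)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        other = all_leaves - below
        # Skip trivial splits that separate a single leaf (or nothing) from the rest.
        if len(below) > 1 and len(other) > 1:
            found.add(frozenset([below, other]))
        return below

    walk(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Count splits found in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("both trees must contain the same set of leaves")
    return len(splits(tree_a) ^ splits(tree_b))


tree_1 = (("A", "B"), ("C", ("D", "E")))
tree_2 = (("A", "C"), ("B", ("D", "E")))
tree_1_rerooted = ("A", ("B", ("C", ("D", "E"))))

print(rf_distance(tree_1, tree_2))
print(rf_distance(tree_1, tree_1_rerooted))
```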

Projects

Although I wrote my first pieces of code in Python 3 years ago, my main working language for the last 2 years has been Perl, a highly efficient language for handling text-like data, with a clear implementation of regular expressions. Moreover, as my postgraduate research project involved large amounts of data analysis, I soon became experienced in R too. Building Shell pipelines to integrate multiple software tools on complex tasks became routine as well. A piece of that work can be seen here.

Currently, I am developing phyTest, a tool that performs various statistical tests on evolutionary tree models. Although still in alpha, it is already the most flexible publicly accessible software in terms of the diversity of approaches it offers for assessing statistical confidence in phylogenies. The latest version of phyTest can always be found here (an English-language version of the manual will be available soon).

Competitive Programming

Recently, I also took part in the 2018 edition of the international Bioinformatics Contest. Some of the Python scripts that I developed (and found most interesting) while qualifying for the final round can be found here. These include a heuristic solution that approximates the most cost-efficient balance of glucose and oxygen moles to produce ATP via fermentation or aerobic respiration, and an efficient algorithm that locates the longest tandem repeat in a sequence within a user-provided maximum number of character insertions, deletions or substitutions.
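To give a flavour of the tandem repeat problem, here is a deliberately simplified brute-force sketch in Python. It is not the contest solution: it only looks for two adjacent copies of a unit, it tolerates substitutions but not insertions or deletions, and it makes no attempt at efficiency. The function name `longest_tandem_repeat` and the parameter `max_mismatches` are hypothetical.

```python
from typing import Optional, Tuple


def longest_tandem_repeat(seq: str, max_mismatches: int = 0) -> Optional[Tuple[int, int]]:
    """Return (start, unit_length) of the longest two-copy tandem repeat.

    The two copies may differ in at most `max_mismatches` positions
    (substitutions only). Returns None when no repeat is found.
    """
    n = len(seq)
    # Try the longest possible unit first so the first hit is the longest repeat.
    for unit in range(n // 2, 0, -1):
        for start in range(0, n - 2 * unit + 1):
            first = seq[start:start + unit]
            second = seq[start + unit:start + 2 * unit]
            mismatches = sum(1 for a, b in zip(first, second) if a != b)
            if mismatches <= max_mismatches:
                return start, unit
    return None


print(longest_tandem_repeat("GGACGTACGTTT"))
print(longest_tandem_repeat("GGACGTACGTTT", max_mismatches=1))
```

With no mismatches allowed, the example finds the exact repeat `ACGT ACGT` starting at index 2; allowing one mismatch moves the reported start to index 0, because `GGAC GTAC` then qualifies and is encountered first.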

Where to find me

About

Lucas Marques - Development Portfolio
