PCA Analysis on alignment fasta #40

Closed
necrolyte2 opened this issue Aug 11, 2015 · 12 comments

@necrolyte2
Member

Input: a fasta file that has already been aligned by the user. It will contain multiple datasets (for now just 2).
Output: a 3D PCA graphic showing the differences between the datasets.

We need to determine the best way to let the user supply the different datasets to be compared. Right now, all sequence names share a common name identifying their dataset.
See https://github.com/VDBWRAIR/bio_pieces/blob/pca/tests/testinput/aln1.fasta
This file has 2 datasets that need to be compared. Essentially, the graphic needs to use 2 colors to distinguish them.

Currently the script aln_pca builds what I think is a PCA graphic (https://github.com/VDBWRAIR/bio_pieces/blob/pca/docs/_static/pca.png); however, I am not 100% confident it is done correctly, as the matplotlib and scikit-learn PCA graphics look different.

You can also check out www.jalview.org, as it builds a PCA graphic that we are trying to semi-replicate, but with better colors, axes, and so on.

The pca branch also includes a hacked-together ipython notebook for experimenting with this. It has both the matplotlib and the scikit-learn PCA graphics, which you can see do not look like the manual one I built using the tutorial from a different website (linked in that file).

@mmelendrez
@averagehat
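
For whoever picks this up, here is a minimal sketch (not the aln_pca implementation) of one way to do it with scikit-learn: one-hot encode the alignment columns, run PCA, and color points by a dataset label taken from the sequence name. The underscore-prefix convention and the encoding scheme are assumptions, not something aln_pca currently does.

from Bio import SeqIO
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: registers the 3d projection
from sklearn.decomposition import PCA

records = list(SeqIO.parse('tests/testinput/aln1.fasta', 'fasta'))
labels = [rec.id.split('_')[0] for rec in records]  # assumed dataset prefix

# One-hot encode every alignment column so PCA has numeric input
alphabet = 'ACGT-N'
def encode(seq):
    return [1.0 if seq[i] == base else 0.0
            for i in range(len(seq)) for base in alphabet]

X = np.array([encode(str(rec.seq).upper()) for rec in records])
coords = PCA(n_components=3).fit_transform(X)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# One color per dataset (just 2 for now)
for label, color in zip(sorted(set(labels)), ('red', 'blue')):
    idx = [i for i, lab in enumerate(labels) if lab == label]
    ax.scatter(coords[idx, 0], coords[idx, 1], coords[idx, 2],
               c=color, label=label)
ax.legend()
fig.savefig('pca_by_dataset.png')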

@necrolyte2 necrolyte2 self-assigned this Aug 11, 2015
@necrolyte2
Member Author

Here is some info I dug up previously for @mmelendrez, in an ipython notebook from when @mmelendrez and I first discussed this:
https://gist.github.com/necrolyte2/f0a42035debdce0d3dc2

Link to the PCA-in-Python tutorial that I used to build aln_pca:
http://sebastianraschka.com/Articles/2014_pca_step_by_step.html#sklearn_pca
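
For reference, the manual approach in that tutorial amounts to centering the data and taking an eigendecomposition of the covariance matrix. A rough numpy sketch of that procedure (variable names are mine, not from aln_pca):

import numpy as np

def manual_pca(X, n_components=3):
    # X is a samples x features matrix (e.g. the one-hot encoded alignment)
    X_centered = X - X.mean(axis=0)              # center each feature
    cov = np.cov(X_centered, rowvar=False)       # feature covariance matrix
    eig_vals, eig_vecs = np.linalg.eigh(cov)     # symmetric matrix, so eigh
    order = np.argsort(eig_vals)[::-1]           # sort by descending variance
    components = eig_vecs[:, order[:n_components]]
    return X_centered.dot(components)            # projected coordinates

If this disagrees with sklearn's PCA output, sign flips on individual axes are expected and harmless; centering and component ordering are the usual culprits when the plots look genuinely different.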

@averagehat
Contributor

@necrolyte2
Member Author

Emperor looks pretty neat

@necrolyte2
Member Author

It sure would be cool if they explained how they generate the coordinates and mapping files used as input to Emperor.

@necrolyte2
Member Author

pca_skbio

# Quick attempt with scikit-bio: build a pairwise distance matrix from the
# alignment and run principal coordinates analysis (PCoA) on it.
from skbio import Alignment
from skbio.stats.ordination import PCoA

fasta = 'tests/testinput/aln1.fasta'

alignment = Alignment.read(fasta)
distance_matrix = alignment.distances()  # pairwise distances between sequences
pcoa = PCoA(distance_matrix)
scores = pcoa.scores()

[attached image: scores.png, the resulting PCoA plot]
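
Continuing from that snippet, the two datasets could be pulled apart for plotting along these lines (a sketch; scores.site and scores.site_ids are assumed attribute names for the OrdinationResults object in that skbio version, and the underscore prefix is the same naming assumption as above):

# Hedged sketch: scores.site (ndarray of coordinates) and scores.site_ids
# are assumed to be what this skbio version exposes on the PCoA result.
labels = [sid.split('_')[0] for sid in scores.site_ids]  # assumed dataset prefix
coords = scores.site[:, :3]  # first three principal coordinate axes
# coords and labels can then go into the same 3D matplotlib scatter as in
# the scikit-learn sketch earlier in the issue, one color per dataset.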

@necrolyte2 necrolyte2 assigned averagehat and unassigned necrolyte2 Aug 12, 2015
@averagehat
Contributor

I made a little test for make_pca here just to check for build errors. But Travis says:

Running setup.py install for scipy

No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.

The build has been terminated

I don't think we benefit very much from that test, and the make_pca.py script really doesn't do much. However, this brings up the issue that scikit-* takes a long time to install, and we want scikit-bio and emperor in the requirements file so people can use make_pca.

One way around this would be to use conda, which is very fast and avoids complicated build issues. module load bio_pieces could then load a conda "virtualenv".

@necrolyte2
Member Author

One concern I have is that the pathdiscov project depends on bio_pieces, which means it would then depend on conda as well. That would change that pipeline a bit, and I'm not sure we want to do that.

More discussion is needed

@averagehat
Contributor

Created a separate issue for the build discussion: #41

@necrolyte2
Member Author

I'm on the fence about whether it is important to have tests that more or less just check that a script is executable and was installed via setup.py. I've thought about this with our other projects before and always just wrote the test, but now I feel it may be easier to leave those tests out and let any issues surface through bug reports or our own internal testing.

Thoughts?
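
For context, the kind of test being discussed would be little more than this (a sketch; the make_pca console-script name and a --help flag that exits 0 are assumptions):

import subprocess
import unittest

class TestMakePcaInstalled(unittest.TestCase):
    def test_entry_point_runs(self):
        # Only verifies that setup.py installed a runnable console script.
        ret = subprocess.call(['make_pca', '--help'])
        self.assertEqual(ret, 0)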

@averagehat
Contributor

I agree; in this case, testing doesn't produce much information, as the script interface is so simple.
If it becomes more complex and I decide it needs any tests, I can mock out the emperor import or something.
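
If it comes to that, stubbing the module in sys.modules before the import is one way to do it (a sketch; the bio_pieces.make_pca module path is an assumption, and on Python 2 the external mock package would stand in for unittest.mock):

import sys
import unittest
from unittest import mock

class TestMakePcaImport(unittest.TestCase):
    def test_import_without_emperor(self):
        # Stub emperor out so importing the script does not require the
        # heavy dependency to actually be installed.
        with mock.patch.dict(sys.modules, {'emperor': mock.MagicMock()}):
            import bio_pieces.make_pca  # noqa: F401 -- import should not raise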

@mmelendrez
Member

agreed.


@averagehat
Contributor

Closed by #43
