PCA Analysis on alignment fasta #40

Closed
necrolyte2 opened this issue Aug 11, 2015 · 12 comments

@necrolyte2
Member

Input: a fasta file that has already been aligned by the user. It will contain multiple datasets (for now just 2).
Output: a 3D PCA graphic showing the differences between the datasets.

We need to determine the best way to let the user supply the different datasets to be compared. Right now, all sequence names share a common name identifying their dataset.
See https://github.com/VDBWRAIR/bio_pieces/blob/pca/tests/testinput/aln1.fasta
This file has 2 datasets that need to be compared. Essentially, the graphic needs to use 2 colors to distinguish them.

Currently the script aln_pca builds what I think is a PCA graphic (https://github.com/VDBWRAIR/bio_pieces/blob/pca/docs/_static/pca.png); however, I am not 100% confident it is done correctly, as the matplotlib and scikit-learn PCA graphics look different.

You can also check out www.jalview.org, as it builds a PCA graphic that we are trying to semi-replicate, but with better colors, axes, and so on.

The pca branch also includes a hacked-together ipython notebook for experimenting with this. It has both the matplotlib and the scikit-learn PCA graphics, which you can see do not look like the manual one I built using the tutorial from a different website (linked in that file).

@mmelendrez
@averagehat
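
For whoever picks this up, here is a minimal sketch (not the aln_pca implementation) of one way to do it with scikit-learn: one-hot encode the alignment columns, run PCA, and color points by a dataset label taken from the sequence name. The underscore-prefix convention and the encoding scheme are assumptions, not something aln_pca currently does.

from Bio import SeqIO
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: registers the 3d projection
from sklearn.decomposition import PCA

records = list(SeqIO.parse('tests/testinput/aln1.fasta', 'fasta'))
labels = [rec.id.split('_')[0] for rec in records]  # assumed dataset prefix

# One-hot encode every alignment column so PCA has numeric input
alphabet = 'ACGT-N'
def encode(seq):
    return [1.0 if seq[i] == base else 0.0
            for i in range(len(seq)) for base in alphabet]

X = np.array([encode(str(rec.seq).upper()) for rec in records])
coords = PCA(n_components=3).fit_transform(X)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# One color per dataset (just 2 for now)
for label, color in zip(sorted(set(labels)), ('red', 'blue')):
    idx = [i for i, lab in enumerate(labels) if lab == label]
    ax.scatter(coords[idx, 0], coords[idx, 1], coords[idx, 2],
               c=color, label=label)
ax.legend()
fig.savefig('pca_by_dataset.png')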

@necrolyte2 necrolyte2 self-assigned this Aug 11, 2015
@necrolyte2
Member Author

Here is some info I dug up previously for @mmelendrez, in an ipython notebook from when @mmelendrez and I first discussed this:
https://gist.github.com/necrolyte2/f0a42035debdce0d3dc2

Link to the PCA-in-Python tutorial that I used to build aln_pca:
http://sebastianraschka.com/Articles/2014_pca_step_by_step.html#sklearn_pca
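
For reference, the manual approach in that tutorial amounts to centering the data and taking an eigendecomposition of the covariance matrix. A rough numpy sketch of that procedure (variable names are mine, not from aln_pca):

import numpy as np

def manual_pca(X, n_components=3):
    # X is a samples x features matrix (e.g. the one-hot encoded alignment)
    X_centered = X - X.mean(axis=0)              # center each feature
    cov = np.cov(X_centered, rowvar=False)       # feature covariance matrix
    eig_vals, eig_vecs = np.linalg.eigh(cov)     # symmetric matrix, so eigh
    order = np.argsort(eig_vals)[::-1]           # sort by descending variance
    components = eig_vecs[:, order[:n_components]]
    return X_centered.dot(components)            # projected coordinates

If this disagrees with sklearn's PCA output, sign flips on individual axes are expected and harmless; centering and component ordering are the usual culprits when the plots look genuinely different.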

@averagehat
Contributor

@necrolyte2
Member Author

Emperor looks pretty neat

@necrolyte2
Member Author

It sure would be cool if they explained how they generate the coordinates and mapping files used as input to Emperor.

@necrolyte2
Member Author

pca_skbio

# Quick attempt with scikit-bio: build a pairwise distance matrix from the
# alignment and run principal coordinates analysis (PCoA) on it.
from skbio import Alignment
from skbio.stats.ordination import PCoA

fasta = 'tests/testinput/aln1.fasta'

alignment = Alignment.read(fasta)
distance_matrix = alignment.distances()  # pairwise distances between sequences
pcoa = PCoA(distance_matrix)
scores = pcoa.scores()

[attached image: scores.png, the resulting PCoA plot]
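
Continuing from that snippet, the two datasets could be pulled apart for plotting along these lines (a sketch; scores.site and scores.site_ids are assumed attribute names for the OrdinationResults object in that skbio version, and the underscore prefix is the same naming assumption as above):

# Hedged sketch: scores.site (ndarray of coordinates) and scores.site_ids
# are assumed to be what this skbio version exposes on the PCoA result.
labels = [sid.split('_')[0] for sid in scores.site_ids]  # assumed dataset prefix
coords = scores.site[:, :3]  # first three principal coordinate axes
# coords and labels can then go into the same 3D matplotlib scatter as in
# the scikit-learn sketch earlier in the issue, one color per dataset.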

@necrolyte2 necrolyte2 assigned averagehat and unassigned necrolyte2 Aug 12, 2015
@averagehat
Contributor

I made a little test for make_pca here just to check for build errors. But Travis says:

Running setup.py install for scipy

No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself.

The build has been terminated

I don't think we benefit very much from that test, and the make_pca.py script really doesn't do much. However, this brings up the issue that scikit-* takes a long time to install, and we want scikit-bio and emperor in the requirements file so people can use make_pca.

One way around this would be to use conda, which is very fast and avoids complicated build issues. module load bio_pieces could then load a conda "virtualenv".

@necrolyte2
Member Author

One concern I have is that the pathdiscov project depends on bio_pieces, which means it would then depend on conda as well. That would change that pipeline a bit, and I'm not sure we want to do that.

More discussion is needed

@averagehat
Contributor

Created a separate issue for the build discussion: #41

@necrolyte2
Member Author

I'm on the fence about whether it is important to have tests that more or less just check that a script is executable and was installed via setup.py. I've thought about this with our other projects before and always just wrote the test, but now I feel it may be easier to leave those tests out and let any issues surface through bug reports or our own internal testing.

Thoughts?
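
For context, the kind of test being discussed would be little more than this (a sketch; the make_pca console-script name and a --help flag that exits 0 are assumptions):

import subprocess
import unittest

class TestMakePcaInstalled(unittest.TestCase):
    def test_entry_point_runs(self):
        # Only verifies that setup.py installed a runnable console script.
        ret = subprocess.call(['make_pca', '--help'])
        self.assertEqual(ret, 0)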

@averagehat
Contributor

I agree; in this case, testing doesn't produce much information, as the script interface is so simple.
If it becomes more complex and I decide it needs any tests, I can mock out the emperor import or something.
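
If it comes to that, stubbing the module in sys.modules before the import is one way to do it (a sketch; the bio_pieces.make_pca module path is an assumption, and on Python 2 the external mock package would stand in for unittest.mock):

import sys
import unittest
from unittest import mock

class TestMakePcaImport(unittest.TestCase):
    def test_import_without_emperor(self):
        # Stub emperor out so importing the script does not require the
        # heavy dependency to actually be installed.
        with mock.patch.dict(sys.modules, {'emperor': mock.MagicMock()}):
            import bio_pieces.make_pca  # noqa: F401 -- import should not raise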

@mmelendrez
Member

agreed.


@averagehat
Contributor

Closed by #43
