PCA Analysis on alignment fasta #40
Comments
Here is some info I dug up earlier for @mmelendrez, in an IPython notebook from when we first discussed this. Link to the PCA-in-Python tutorial that I used to build aln_pca |
The plot reminds me of emperor which supports 3D. scikit-bio provides support for PCoA, which is a little bit different |
Emperor looks pretty neat |
Sure would be cool if they said how they generate the coordinates and mapping files for input to emperor |
I made a little test for
I don't think we benefit very much from that test, and the One way around this would be to use conda, which is very fast, and avoids complicated build issues. |
One concern I would have is that the pathdiscov project depends on bio_pieces which means that it would then depend on conda. This would change that pipeline a bit as well and I'm not sure we want to do that. More discussion is needed |
Created a separate issue for the build discussion: #41 |
I'm on the fence about whether it is important to have tests that more or less just check that a script is executable and was installed via setup.py. I've thought about it with our other projects before and always just wrote the test, but now I feel it may be easier to leave these tests out and let any issues surface through bug reports or our own internal testing. Thoughts? |
I agree; in this case, testing doesn't produce much information, as the script interface is so simple. |
Agreed.
Melanie Melendrez, Ph.D. |
Closed by #43 |
Input: a FASTA file that the user has already aligned. It will contain multiple datasets (for now, just 2).
Output: a 3D PCA graphic showing the differences between the datasets.
We need to determine the best way to let the user supply the different datasets to be compared. Right now, all sequence names within a dataset share a common name.
See https://github.com/VDBWRAIR/bio_pieces/blob/pca/tests/testinput/aln1.fasta
This file has 2 datasets that need to be compared. Essentially, the graphic needs 2 colors to distinguish them.
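One possible way to get a color per dataset, as a minimal sketch: read the sequence names and derive a dataset label from each one. The helper names (`read_fasta`, `dataset_label`, `assign_colors`) and the example records are made up for illustration, and the rule that the label is everything before the first underscore is an assumption, not something taken from aln1.fasta.

```python
from itertools import cycle


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def dataset_label(name, sep="_"):
    # ASSUMPTION: the dataset name is the part before the first separator.
    return name.split(sep, 1)[0]


def assign_colors(names, palette=("red", "blue", "green", "orange")):
    """Return one color per sequence name, reusing a color within a dataset."""
    colors = {}
    pool = cycle(palette)
    result = []
    for name in names:
        label = dataset_label(name)
        if label not in colors:
            colors[label] = next(pool)
        result.append(colors[label])
    return result


# Hypothetical aligned records, not the contents of aln1.fasta.
example = """>setA_1
ACGT-A
>setA_2
ACGTTA
>setB_1
AC-TTA
""".splitlines()

records = list(read_fasta(example))
names = [name for name, _ in records]
point_colors = assign_colors(names)
print(point_colors)
```

The resulting list could be passed as the per-point color argument of a scatter plot. If the real names do not follow a prefix convention, a user-supplied mapping file would be an alternative.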
Currently the script aln_pca builds what I think is a PCA graphic (https://github.com/VDBWRAIR/bio_pieces/blob/pca/docs/_static/pca.png). However, I am not 100% confident it is done correctly, as the matplotlib and scikit-learn PCA graphics look different. You can also check out www.jalview.org, which builds a PCA graphic that we are trying to semi-replicate, but with better colors, axes, and such.
The pca branch also includes a hacked-together IPython notebook for experimenting. It contains the matplotlib and scikit-learn PCA graphics, which, as you can see, do not look like the manual one I built by following a tutorial from a different website (listed in that file).
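For comparison, here is a minimal sketch of PCA on an alignment using only numpy. It is not what aln_pca does: the one-hot encoding, the `ACGT-` alphabet, and the function names are assumptions chosen for illustration. Two things commonly make PCA plots from different tools look different even when both are correct: the sign of each component is arbitrary, and some implementations standardize each column before the decomposition while others (scikit-learn's PCA, for example) only center the data.

```python
import numpy as np

ALPHABET = "ACGT-"  # ASSUMPTION: nucleotide alignment with gaps


def one_hot(seqs, alphabet=ALPHABET):
    """Encode aligned sequences as a (n_seqs, length * len(alphabet)) matrix."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    length = len(seqs[0])
    X = np.zeros((len(seqs), length * len(alphabet)))
    for row, seq in enumerate(seqs):
        if len(seq) != length:
            raise ValueError("sequences must be aligned to the same length")
        for col, ch in enumerate(seq.upper()):
            if ch in index:  # unknown characters (e.g. N) stay all-zero
                X[row, col * len(alphabet) + index[ch]] = 1.0
    return X


def pca(X, n_components=3):
    """Project rows of X onto the top components (centered, not scaled)."""
    centered = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, len(S))
    scores = U[:, :k] * S[:k]  # same as centered @ Vt[:k].T
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:k]


# Hypothetical aligned sequences, not taken from aln1.fasta.
seqs = ["ACGTAA", "ACGTTA", "AC-TTA", "TCGTTA"]
coords, ratio = pca(one_hot(seqs))
print(coords.shape)
```

Each row of `coords` gives the x, y, z position of one sequence for a 3D scatter plot. Flipping the sign of a column gives an equally valid result, so plots should be compared up to reflection.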
@mmelendrez
@averagehat