PCA percentage of variance explained is wrong #750

Open
lambdamoses opened this issue Sep 6, 2023 · 1 comment
Labels
bug Something isn't working

Comments

lambdamoses commented Sep 6, 2023

In scRNA-seq and Visium data, the first 50 PCs usually only explain a small proportion of total variance, but I saw this code used to make the scree plot:

eigs = slot(pca_obj, 'misc')$eigenvalues
# variance explained
var_expl = eigs/sum(eigs)*100

This is incorrect. The variance explained by a PC normally means its eigenvalue divided by the total variance, that is, the sum of all eigenvalues. Here the denominator is the sum of only the eigenvalues that were computed. If I compute 50 eigenvalues, the sum of those 50 is used, even though there should be thousands of eigenvalues and the sum of the first 50 is only about 25% of the total. Showing 100% cumulative variance explained, as in the plot below, when it should really be more like 25% is very misleading.

[Image: scree plot in which cumulative variance explained reaches 100%]
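
To make the difference concrete, here is a minimal sketch in base R on simulated data. It is not Giotto code: the matrix `x`, the cutoff `k`, and all variable names are illustrative assumptions. It relies on the fact that the total variance (the sum of all eigenvalues of the covariance matrix) equals the sum of the per-feature variances, so the denominator can be obtained without computing every eigenvalue.

```r
set.seed(1)

# Toy data: 200 observations x 100 features, standing in for a scaled
# expression matrix (an assumption for illustration only)
x <- matrix(rnorm(200 * 100), nrow = 200, ncol = 100)
x <- scale(x, center = TRUE, scale = TRUE)

# Run PCA and keep only the first k eigenvalues, mimicking what a
# truncated solver would return
k <- 10
pca <- prcomp(x, center = FALSE, scale. = FALSE)
eigs <- pca$sdev[seq_len(k)]^2

# Total variance = sum of per-feature variances = sum of ALL eigenvalues
total_var <- sum(apply(x, 2, var))

# Normalizing by the computed eigenvalues always sums to 100
var_expl_truncated <- eigs / sum(eigs) * 100

# Normalizing by the total variance gives the share actually explained
var_expl_total <- eigs / total_var * 100

sum(var_expl_truncated)
sum(var_expl_total)
```

With the first normalization the cumulative value is 100 regardless of `k`; with the second it stays below 100 whenever eigenvalues beyond the first `k` are nonzero.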

lambdamoses added the bug label Sep 6, 2023
Collaborator

RubD commented Sep 12, 2023

Hi @lambdamoses, thanks for the feedback. We will update our documentation soon and provide additional clarification and suggestions to the users. I'm not sure that 50 PCs typically explain only a small proportion of the total variance. That would be surprising, but we will also test this empirically in the near future.
