Question about the propr select argument #30

Open
JemmaSun opened this issue Oct 30, 2021 · 3 comments
Labels
helpful (marked as potentially helpful to others), question

Comments

@JemmaSun

Hi!
I am trying to use propr for a co-occurrence analysis. I noticed that the log-ratio values calculated by propr (the @logratio slot of the output) are sometimes quite different from those calculated by log(x[i]/exp(mean(log(x)))), where x is a list of counts, as described in your paper. Could you please tell me which R function you used to calculate the log-ratio? If I have misunderstood the formula, please let me know. Thank you in advance!

@tpq
Owner

tpq commented Oct 30, 2021

Hi Jemma, thanks for your interest in propr. For a data set where rows are samples and columns are features, the CLR should be performed row-wise. So in your case, are you calculating log(x[i]/exp(mean(log(x)))) where x is a single sample (one row)? They seem to match for me :-)

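library(propr)

# Toy data: 5 samples (rows) by 6 features (columns), all strictly positive
dat <- matrix(runif(30), 5, 6)

# Log-ratios as computed by propr
pr <- propr(dat)
A <- pr@logratio

# CLR computed by hand, row-wise (one geometric mean per sample)
B <- t(apply(dat, 1, function(x) log(x / exp(mean(log(x))))))
A
B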
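
For anyone who wants to check the row-wise point without installing anything, here is a minimal base R sketch. It does not call propr at all; `clr_rows` is a hypothetical helper name made up for this example, and the data are random. It only illustrates that a row-wise CLR sums to zero within each sample and that taking the geometric mean per column instead gives different numbers.

```r
# Base R only; `clr_rows` is a hypothetical helper, not part of propr.
clr_rows <- function(m) {
  # Subtracting each row's mean log value is the same as dividing
  # by that row's geometric mean before taking the log.
  lg <- log(m)
  sweep(lg, 1, rowMeans(lg), "-")
}

set.seed(1)
dat <- matrix(runif(30), 5, 6)   # 5 samples (rows) x 6 features (columns)

by_row <- clr_rows(dat)          # one geometric mean per sample
by_col <- t(clr_rows(t(dat)))    # one geometric mean per feature (not the CLR meant here)

# Same formula as in the comment above, written with apply()
by_hand <- t(apply(dat, 1, function(x) log(x / exp(mean(log(x))))))

# Row-wise CLR values sum to (numerically) zero within every sample
row_sums <- rowSums(by_row)
print(round(row_sums, 12))

# TRUE if the two row-wise versions agree, FALSE if row-wise and column-wise agree
print(isTRUE(all.equal(by_row, by_hand)))
print(isTRUE(all.equal(by_row, by_col)))
```

If a hand calculation disagrees with the package output, checking whether the hand-computed values sum to zero per row is a quick way to see which orientation was used.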

@JemmaSun
Author

Hi tpq,

Thanks for your quick reply.
Yes, the x I meant was a list of counts for a certain sample. My current dataset has 12 samples/rows and over 10,000 OTUs/columns.

I have used propr on smaller datasets before, and the log-ratio values given by propr were exactly the same as those given by log(x/exp(mean(log(x)))). This time, however, I got totally different log-ratios with my current dataset. I attached one of my samples and its counts (they are not true counts, more like relative abundances, though not exactly those either) in the file "SRR_counts.csv". Using this sample as an example, the geometric mean is 0.06017464, so the log-ratio for the first OTU, "Root; d__Archaea", should be log(336.5/0.06017464) = 8.629102, whereas the log-ratio value calculated by propr for this OTU was 11.56008.

Wait... I think I found the reason. Before running perb(), I applied a filter to keep only the columns that are abundant in at least 2 samples. When I then ran perb(data, select=keep), the log-ratio values were calculated from the untrimmed data set, whereas I was calculating the geometric mean from the trimmed data. I think I've figured it out :-D Sorry for the trouble.

SRR_counts.csv

@tpq
Owner

tpq commented Nov 22, 2021

Yes, that'd be it!
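
For later readers, the mismatch can be reproduced in base R. This is only a sketch under assumptions: it does not call propr, the data are random, and `clr`, `keep`, and the OTU names are hypothetical. It takes the behaviour described in this thread as given (log-ratios computed from the full data, then subset) and compares that with a CLR computed on already-trimmed data.

```r
set.seed(42)

# Hypothetical data: 12 samples (rows) x 50 features (columns), strictly positive
dat <- matrix(rexp(12 * 50), nrow = 12, ncol = 50)
colnames(dat) <- paste0("otu", seq_len(ncol(dat)))

# Row-wise CLR: log of each value over its sample's geometric mean
clr <- function(m) {
  lg <- log(m)
  lg - rowMeans(lg)
}

# Hypothetical filter result: pretend only the first 10 features pass
keep <- colnames(dat)[1:10]

# (a) CLR on the full data, then keep the selected columns
full_then_subset <- clr(dat)[, keep]

# (b) Trim first, then CLR (the hand calculation that did not match)
subset_then_clr <- clr(dat[, keep])

# The two differ, because each uses a different geometric mean per sample
print(isTRUE(all.equal(full_then_subset, subset_then_clr)))

# Within a sample the difference is one constant: the log of the ratio
# between the trimmed and the full geometric means
offset <- full_then_subset - subset_then_clr
spread_per_sample <- apply(offset, 1, function(r) diff(range(r)))
print(round(spread_per_sample, 12))
```

So to reproduce the package values by hand in this situation, compute the geometric mean from all columns of the sample, not only from the columns passed to select.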

@tpq added the "helpful" and "question" labels Nov 22, 2021
@tpq changed the title from "Question about how propr calculates log-ratio values" to "Question about how propr calculates log-ratio values with select argument" Nov 22, 2021
@tpq changed the title from "Question about how propr calculates log-ratio values with select argument" to "Question about the propr select argument" Nov 22, 2021