t-test and p-value for partial and semi-partial correlation #171

anthonybritto · 2021-05-21T15:35:49Z

As per the documentation of the R package ppcor : https://dx.doi.org/10.5351%2FCSAM.2015.22.6.665, the correct degrees of freedom for the calculation of the t-statistic and p-values is n - g- 2 where g is the number of covariates.

I believe that the subtraction of g from df has not been implemented in pingouin, as the three screenshots below show.

Here, I simply pull the p-value from partial_corr to show that I haven’t altered anything.

I now compute what I believe to be the correct p-value, with df=n-(k-1)-2 since I control for all predictors but one.

Finally, I show how by using the incorrect df=n-2, I am able to reproduce pingouin’s results.

The text was updated successfully, but these errors were encountered:

raphaelvallat · 2021-05-21T23:09:47Z

Thanks @anthonybritto for spotting this critical issue. I need to fix this ASAP. Please see below for more details.

The problem

pingouin.partial_corr does not adjust the degrees of freedom by the number of covariates, which means that the p-value, confidence interval, power and Bayes factor are wrong. The correlation coefficient is unaffected by this issue. The plot below shows how the p-value is affected as a function of the correlation coefficient, sample size and number of covariates. In short, the p-values currently returned by Pingouin are smaller compared to the exact p-values, and this gets worse with a low sample size and/or high number of covariates and/or weak correlation coefficient. The smaller the sample size, the worse the problem is. If the sample size is large enough, and/or the correlation is high, then the p-value is largely unaffected.

How to fix / testing

The calculation of the p-value, confidence interval, power must be done within the pingouin.partial_corr, and not by calling pingouin.corr as is currently the case.

The correlation coefficient and p-value of a (semi)-partial correlation can be calculated with the ppcor package.
The formula to calculate the confidence intervals of a partial correlation can be found here
I believe the power of the correlation can be calculated by just subtracting the number of covariates to n before calling pingouin.power_corr(r, n)
The adjusted R^2 should be calculated with: adj_r2 = 1 - (((1 - r2) * (n - 1)) / (n - k - 2)) where k is the number of covariates (reference?).
A formula for the calculation of the Bayes Factor of a partial correlation is given in the appendix of Wetzels and Wagenmakers 2012. An R implementation can also be found here, though it seems to use a different method. Note that the default method for calculating the Bayes Factor of a correlation coefficient in Pingouin is based on the analytical solution provided by Ly 2016, but I did not find any mention of partial or semi-partial correlation in the paper. I am therefore not sure whether we can just subtract the number of covariates to n to get the correct BF.

anthonybritto · 2021-05-25T13:40:19Z

Your fixes look good, far as I can tell! I unfortunately don't know anything about the Bayes Factor. I will look at the sources you reference at some point this week to see if I can help.

raphaelvallat · 2021-05-27T22:19:33Z

@anthonybritto I just released a new stable version of Pingouin (v0.3.12) which fixes the p-value and confidence intervals of the partial correlation. Please see the full list of modifications here: https://pingouin-stats.org/changelog.html

raphaelvallat · 2021-07-16T19:24:18Z

FYI I have just pushed a commit on the develop branch (81d1aaf) to re-implement the partial correlation using the same method as ppcor:

pingouin/pingouin/correlation.py

Lines 781 to 822 in 81d1aaf

    
           # Calculate the partial corrrelation matrix - similar to pingouin.pcorr() 
        
           if method == "spearman": 
        
               # Convert the data to rank, similar to R cov() 
        
               V = data.rank(na_option='keep').cov() 
        
           else: 
        
               V = data.cov() 
        
           Vi = np.linalg.pinv(V)  # Inverse covariance matrix 
        
           Vi_diag = Vi.diagonal() 
        
           D = np.diag(np.sqrt(1 / Vi_diag)) 
        
           pcor = -1 * (D @ Vi @ D)  # Partial correlation matrix 
        
           if covar is not None: 
        
               r = pcor[0, 1] 
        
           else: 
        
               # Semi-partial correlation matrix 
        
               with np.errstate(divide='ignore'): 
        
                   spcor = pcor / \ 
        
                       np.sqrt(np.diag(V))[..., None] / \ 
        
                       np.sqrt(np.abs(Vi_diag - Vi ** 2 / Vi_diag[..., None])).T 
        
               if y_covar is not None: 
        
                   r = spcor[0, 1]  # y_covar is removed from y 
        
               else: 
        
                   r = spcor[1, 0]  # x_covar is removed from x 
        
           if np.isnan(r): 
        
               # Correlation failed. Return NaN. When would this happen? 
        
               return pd.DataFrame({'n': n, 'r': np.nan, 'CI95%': np.nan, 
        
                                    'p-val': np.nan}, index=[method]) 
        
           # Compute the two-sided p-value and confidence intervals 
        
           # https://online.stat.psu.edu/stat505/lesson/6/6.3 
        
           pval = _correl_pvalue(r, n, k) 
        
           ci = compute_esci( 
        
               stat=r, nx=(n - k), ny=(n - k), eftype='r', decimals=6) 
        
           # Create dictionnary 
        
           stats = { 
        
               'n': n, 
        
               'r': r, 
        
               'CI95%': [ci], 
        
               'p-val': pval if tail == 'two-sided' else .5 * pval, 
        
           }

We're now getting exactly the same results as ppcor for partial/semi-partial correlations (either Pearson or Spearman). I think this will be included in the next stable release of Pingouin.

anthonybritto · 2021-07-20T07:56:33Z

@raphaelvallat Thanks for keeping me updated.

I see that you have updated the p-value calculation to use the beta-distribution: I am not familiar with this approach, so I'm afraid I can't comment.

If you are however reproducing the ppcor results, I should reiterate that the p-value of the semi-partial correlation is incorrect. For the record, I attach a preprint of the communication that I wish to submit to the journal in which ppcor was published; it includes what I believe are the correct formulas for the t-statistics of the betas, and the partial and semi-partial correlations .

ppcor_correction.pdf

raphaelvallat · 2021-07-20T15:34:54Z

Hi @anthonybritto,

The formula for the p-value calculation can be found on Wikipedia and is used in the scipy.stats.pearsonr function. It gives similar results to the method based on the T distribution, but faster.
Did you hear back from the authors of the ppcor package? When the paper is peer-reviewed and accepted, I think it would be great to make a PR to integrate your suggested edits. For now however, I'd prefer to keep it consistent with the R output as it is what most users would expect.

Thanks,
Raphael

raphaelvallat · 2021-10-28T22:15:20Z

Closing this issue but feel free to reopen.

raphaelvallat self-assigned this May 21, 2021

raphaelvallat added bug 💥 Something isn't working invalid 🚩 This doesn't seem right labels May 21, 2021

IndrajeetPatil mentioned this issue Aug 3, 2021

feature request: supporting semi-partial correlations easystats/correlation#204

Open

raphaelvallat closed this as completed Oct 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

t-test and p-value for partial and semi-partial correlation #171

t-test and p-value for partial and semi-partial correlation #171

anthonybritto commented May 21, 2021

raphaelvallat commented May 21, 2021

anthonybritto commented May 25, 2021 •

edited

Loading

raphaelvallat commented May 27, 2021

raphaelvallat commented Jul 16, 2021

anthonybritto commented Jul 20, 2021 •

edited

Loading

raphaelvallat commented Jul 20, 2021

raphaelvallat commented Oct 28, 2021

t-test and p-value for partial and semi-partial correlation #171

t-test and p-value for partial and semi-partial correlation #171

Comments

anthonybritto commented May 21, 2021

raphaelvallat commented May 21, 2021

anthonybritto commented May 25, 2021 • edited Loading

raphaelvallat commented May 27, 2021

raphaelvallat commented Jul 16, 2021

anthonybritto commented Jul 20, 2021 • edited Loading

raphaelvallat commented Jul 20, 2021

raphaelvallat commented Oct 28, 2021

anthonybritto commented May 25, 2021 •

edited

Loading

anthonybritto commented Jul 20, 2021 •

edited

Loading