Some phenotypes not being included in regression #372

Open
LCPerry opened this issue Nov 27, 2024 · 0 comments
LCPerry commented Nov 27, 2024

Hello,

I'm having a problem with PRSice including only a small subset of the phenotypes I've provided in the regression. My pheno file contains 1721 samples, and I have genotype data for 6898 samples. However, PRSice is including only 286 phenotyped samples in the regression.

Looking at my output, it seems PRSice has read all 6898 samples from my genotype file and correctly flagged that 5177 samples (6898 - 1721) are without phenotypes. But it says only 286 have valid phenotypes. So there are 1435 samples (1721 - 286) that it appears to recognize as having phenotypes but considers invalid?

Inspecting the phenotype values of the samples that were and were not included in the regression has been uninformative: a value of -0.0458746230390881 is seemingly not a valid phenotype, but a value of -0.0488835983227899 is, so I'm struggling to see a difference. Can you give any insight into this behaviour? What might cause PRSice to regard a phenotype as not valid and exclude it from the regression?
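For anyone trying to reproduce the sample accounting above, here is a minimal Python sketch that counts how the sample IDs in the genotype, phenotype, and covariate files overlap. It is only a diagnostic aid, not PRSice's own logic. The helper names `read_ids` and `account_for_samples` are hypothetical. The assumption that every file is whitespace-delimited with FID and IID in the first two columns is mine and may not match the real `pheno.txt` or `COV.txt`.

```python
def read_ids(lines, id_cols=(0, 1), skip_header=False):
    """Collect (FID, IID) pairs from whitespace-delimited lines.

    ASSUMPTION: the ID columns are the first two fields; adjust id_cols
    if the real files are laid out differently.
    """
    ids = set()
    for i, line in enumerate(lines):
        if skip_header and i == 0:
            continue
        fields = line.split()
        if len(fields) <= max(id_cols):
            continue  # skip blank or truncated lines
        ids.add(tuple(fields[c] for c in id_cols))
    return ids


def account_for_samples(geno_ids, pheno_ids, cov_ids=None):
    """Count how genotyped samples split by phenotype/covariate presence."""
    with_pheno = geno_ids & pheno_ids
    summary = {
        "genotyped": len(geno_ids),
        "without_phenotype": len(geno_ids - pheno_ids),
        "with_phenotype": len(with_pheno),
        "phenotyped_not_genotyped": len(pheno_ids - geno_ids),
    }
    if cov_ids is not None:
        # Samples that have a phenotype AND appear in the covariate file.
        summary["with_phenotype_and_covariates"] = len(with_pheno & cov_ids)
    return summary


if __name__ == "__main__":
    # Toy data standing in for the .fam, phenotype, and covariate files.
    geno = read_ids(["F1 A 0 0 1 -9", "F2 B 0 0 2 -9", "F3 C 0 0 1 -9"])
    pheno = read_ids(["F1 A 0.1", "F2 B -0.2", "F9 Z 0.3"])
    cov = read_ids(["FID IID PC1", "F1 A 0.01"], skip_header=True)
    print(account_for_samples(geno, pheno, cov))
```

With the real files, the lines could come from `open("QC2.fam")`, `open("pheno.txt")`, and `open("COV.txt")`. If `with_phenotype_and_covariates` comes out close to 286, incomplete covariate data would be a plausible explanation worth checking, though the log alone does not confirm it.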

log:

PRSice 2.3.3 (2020-08-05) 
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2024-11-25 22:42:38
./PRSice_linux \
    --a1 A1 \
    --a2 A2 \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base GWAS.txt \
    --beta  \
    --binary-target F \
    --bp BP \
    --chr CHR \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov COV.txt \
    --interval 5e-05 \
    --lower 5e-08 \
    --num-auto 22 \
    --out OUT \
    --pheno pheno.txt \
    --pvalue P \
    --seed 4082496553 \
    --snp SNP \
    --stat BETA \
    --target QC2 \
    --thread 1 \
    --upper 0.5

Initializing Genotype file: QC2 (bed) 

Start processing GWAS
================================================== 

Base file: GWAS.txt 
Header of file is: 
SNP CHR BP MAF A1 A2 lhs op rhs free BETA SE Z_Estimate P 
chisq chisq_df chisq_pval AIC Q_SNP Q_SNP_df Q_SNP_pval 
error warning Z_smooth
 

1506687 variant(s) observed in base file, with: 
280037 ambiguous variant(s) excluded 
1226650 total variant(s) included from base file 

Loading Genotype info from target 
================================================== 

6898 people (0 male(s), 0 female(s)) observed 
4271 founder(s) included 

6425348 variant(s) not found in previous data 
825371 variant(s) included 

Phenotype file: pheno.txt 
Column Name of Sample ID: B1045723+3999347013_R01C01 
Note: If the phenotype file does not contain a header, the 
column name will be displayed as the Sample ID which is 
expected. 

There are a total of 1 phenotype to process 

Start performing clumping 

Number of variant(s) after clumping : 76134 

Processing the 1 th phenotype 

Phenotype is a continuous phenotype 
5177 sample(s) without phenotype 
286 sample(s) with valid phenotype
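
The log above reports 4271 founders out of 6898 samples, so one further thing that may be worth ruling out is whether many of the phenotyped samples are non-founders. The sketch below splits a PLINK `.fam` file into founders and non-founders and counts the phenotyped samples in each group. It assumes the standard `.fam` column layout and the PLINK convention that a founder has both parental IDs set to `0`. PRSice's exact founder rule may differ, and the helper names are hypothetical.

```python
def split_founders(fam_lines):
    """Split .fam records into founder and non-founder (FID, IID) sets.

    ASSUMPTION: standard PLINK .fam layout (FID IID PAT MAT SEX PHENO),
    with a founder defined as having both parental IDs equal to "0".
    """
    founders, nonfounders = set(), set()
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        fid, iid, father, mother = fields[:4]
        is_founder = father == "0" and mother == "0"
        (founders if is_founder else nonfounders).add((fid, iid))
    return founders, nonfounders


def phenotyped_by_founder_status(fam_lines, pheno_ids):
    """Count phenotyped samples among founders and non-founders."""
    founders, nonfounders = split_founders(fam_lines)
    return {
        "founders_with_phenotype": len(founders & pheno_ids),
        "nonfounders_with_phenotype": len(nonfounders & pheno_ids),
    }


if __name__ == "__main__":
    # Toy .fam records: one founder, one child with both parents listed.
    fam = ["F1 A 0 0 1 -9", "F1 B A C 2 -9"]
    pheno_ids = {("F1", "A"), ("F1", "B")}
    print(phenotyped_by_founder_status(fam, pheno_ids))
```

If most phenotyped samples turn out to be non-founders, the PRSice documentation describes a `--nonfounders` option for including them in the regression; whether that applies here is untested.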
