reinitialising
jeffersonfparil committed Dec 29, 2023
1 parent 5a83cf1 commit 5ad78b3
Showing 44 changed files with 102,546 additions and 31 deletions.
22 changes: 22 additions & 0 deletions .github/workflows/r.yml
@@ -0,0 +1,22 @@
name: 🚀
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: r-lib/actions/setup-r@v2
      - name: Install dependencies on Ubuntu
        run: |
          sudo apt install -y libcurl4-openssl-dev libharfbuzz-dev libfribidi-dev
      - name: Install dependencies
        run: |
          install.packages(c("devtools", "rextendr", "testthat"))
        shell: Rscript {0}
      - name: Tests
        run: |
          Rscript tests/tests.R
14 changes: 7 additions & 7 deletions .gitignore
@@ -6,11 +6,11 @@
/tests/apple.vcf
/tests/soybean.vcf
/misc/
-/res/eval/*.csv
-/res/eval/*.rds
-/res/eval/*.sync
-/res/eval/*.out
-/res/eval/*.tmp
-/res/eval/*.vcf
-/res/eval/*BK*
+/res/*.csv
+/res/*.rds
+/res/*.sync
+/res/*.out
+/res/*.tmp
+/res/*.vcf
+/res/*BK*
/.vscode/
14 changes: 14 additions & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
Package: imputef
Title: Imputing allele frequencies for individual polyploid genotype data and pools of individuals or population genotype data
Version: 0.0.0.1
Authors@R:
    person("Jeff", "Paril", , "[email protected]", role = c("aut", "cre", "mai"),
           comment = c(ORCID = "0000-0002-5693-4123"))
Description: Imputation of genotype data from sequencing of more than 2 sets of genomes, i.e. polyploid individuals, population samples, or pools of individuals. This library can also perform simple genotype data filtering prior to imputation. Two imputation methods are available: (1) mean value imputation, which uses the arithmetic mean of the locus across non-missing pools (`?imputef::mvi`); and (2) adaptive linkage-informed k-nearest neighbour imputation (`?imputef::aldknni`). The latter is an attempt to extend the [LD-kNNi method of Money et al, 2015, i.e. LinkImpute](https://doi.org/10.1534/g3.115.021667), which was itself an extension of the [kNN imputation of Troyanskaya et al, 2001](https://doi.org/10.1093/bioinformatics/17.6.520). Similar to LD-kNNi, LD is estimated using Pearson's product moment correlation across loci per pair of samples, but instead of computing this across all the loci, we divide the genome into windows which respect chromosomal/scaffold boundaries. We use Euclidean distance, which accommodates continuous allele frequencies, instead of the taxicab (Manhattan) distance over genotype classes used in LD-kNNi. The algorithm adapts when the data are too sparse, which results in either:
    - a completely undefined correlation (LD) matrix, in which case we use all the loci to compute distances between samples; or
    - all k-nearest neighbours missing at the locus which needs to be imputed, in which case we increase `k` until at least one of the neighbours has data to be used for weighted imputation of the missing allele.
License: GPL (>= 3)
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Config/rextendr/version: 0.3.1
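
The adaptive behaviour described in the `Description` field can be illustrated with a minimal R sketch. This is not the package's `aldknni()` implementation, whose source is not rendered in this commit view. The function name `knn_impute_one`, the parameters `min_corr` and `k`, the inverse-distance weights, and the averaging of squared differences over shared loci are all assumptions made for illustration. Genome windows are omitted for brevity.

```R
# Hypothetical sketch, NOT the package's `aldknni()`.
# freqs: samples x loci matrix of allele frequencies, NA = missing.
# i, j:  row (sample) and column (locus) of the value to impute.
knn_impute_one <- function(freqs, i, j, min_corr = 0.9, k = 1) {
  stopifnot(is.matrix(freqs), is.na(freqs[i, j]))
  # LD proxy: absolute Pearson correlation between locus j and every locus.
  corr <- suppressWarnings(
    abs(as.vector(cor(freqs[, j], freqs, use = "pairwise.complete.obs")))
  )
  corr[j] <- NA  # never use the target locus itself
  linked <- which(!is.na(corr) & corr >= min_corr)
  # Adaptive step 1: no usable correlations, so fall back to all other loci.
  if (length(linked) == 0) linked <- setdiff(seq_len(ncol(freqs)), j)
  # Euclidean-type distance from sample i to each other sample,
  # averaged over the linked loci that both samples have data for.
  others <- setdiff(seq_len(nrow(freqs)), i)
  dists <- vapply(others, function(o) {
    d <- freqs[i, linked] - freqs[o, linked]
    if (all(is.na(d))) NA_real_ else sqrt(mean(d^2, na.rm = TRUE))
  }, numeric(1))
  ord <- order(dists, na.last = NA)  # nearest first, undefined distances dropped
  others <- others[ord]
  dists <- dists[ord]
  # Adaptive step 2: grow k until a neighbour has data at locus j.
  while (k < length(others) && all(is.na(freqs[others[seq_len(k)], j]))) {
    k <- k + 1
  }
  nn <- others[seq_len(min(k, length(others)))]
  w <- 1 / (dists[seq_along(nn)] + 1e-6)  # illustrative inverse-distance weights
  ok <- !is.na(freqs[nn, j])
  if (!any(ok)) return(NA_real_)
  sum(w[ok] * freqs[nn[ok], j]) / sum(w[ok])
}

# Toy data: locus_2 tracks locus_1, locus_3 is uninformative.
freqs <- rbind(
  s1 = c(NA,   0.12, 0.80),
  s2 = c(0.10, 0.10, 0.30),
  s3 = c(0.50, 0.50, 0.30),
  s4 = c(0.90, 0.90, 0.30)
)
colnames(freqs) <- paste0("locus_", 1:3)
imputed_value <- knn_impute_one(freqs, i = 1, j = 1)
```

In this toy matrix the nearest neighbour of `s1` on the linked locus is `s2`, so its value at `locus_1` is used. If `s2` were also missing at `locus_1`, the loop would widen `k` to reach `s3`.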
5 changes: 5 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# Generated by roxygen2: do not edit by hand

export(aldknni)
export(mvi)
useDynLib(imputef, .registration = TRUE)
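
`NAMESPACE` exports `mvi` and `aldknni`. As a rough illustration of what the `DESCRIPTION` calls mean value imputation (the arithmetic mean of the locus across non-missing pools), here is a small R sketch. The name `mvi_sketch`, the pools-by-loci matrix layout and the example values are assumptions. The exported `mvi()` is called with a file name in the README, and its internals are not shown in this commit view.

```R
# Hypothetical helper, NOT the package's `mvi()`.
# freqs: pools x loci matrix of allele frequencies, NA = missing.
mvi_sketch <- function(freqs) {
  stopifnot(is.matrix(freqs), is.numeric(freqs))
  for (j in seq_len(ncol(freqs))) {
    missing <- is.na(freqs[, j])
    # Skip loci with nothing missing, or with no observed pool to average.
    if (any(missing) && !all(missing)) {
      freqs[missing, j] <- mean(freqs[!missing, j])
    }
  }
  freqs
}

# Three pools, two loci; the matrix is filled column by column.
freqs <- matrix(
  c(0.2, NA, 0.4,
    0.9, 0.7, NA),
  nrow = 3, ncol = 2,
  dimnames = list(paste0("pool_", 1:3), c("locus_1", "locus_2"))
)
imputed <- mvi_sketch(freqs)
```

Loci where every pool is missing are left as `NA` in this sketch, since there is nothing to average.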
17 changes: 17 additions & 0 deletions R/extendr-wrappers.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Generated by extendr: Do not edit by hand

# nolint start

#
# This file was created with the following call:
# .Call("wrap__make_imputef_wrappers", use_symbols = TRUE, package_name = "imputef")

#' @docType package
#' @usage NULL
#' @useDynLib imputef, .registration = TRUE
NULL

impute <- function(fname, imputation_method, min_coverage, min_allele_frequency, max_missingness_rate_per_locus, pool_sizes, min_depth_below_which_are_missing, max_depth_above_which_are_missing, frac_top_missing_pools, frac_top_missing_loci, window_size_bp, min_loci_per_window, min_loci_corr, max_pool_dist, optimise_for_thresholds, optimise_n_steps_corr, optimise_n_steps_dist, optimise_n_reps, n_threads, fname_out_prefix) .Call(wrap__impute, fname, imputation_method, min_coverage, min_allele_frequency, max_missingness_rate_per_locus, pool_sizes, min_depth_below_which_are_missing, max_depth_above_which_are_missing, frac_top_missing_pools, frac_top_missing_loci, window_size_bp, min_loci_per_window, min_loci_corr, max_pool_dist, optimise_for_thresholds, optimise_n_steps_corr, optimise_n_steps_dist, optimise_n_reps, n_threads, fname_out_prefix)


# nolint end
234 changes: 234 additions & 0 deletions R/imputef.R

Large diffs are not rendered by default.

48 changes: 24 additions & 24 deletions README.md
@@ -1,10 +1,10 @@
-# imputepolypools
+# imputef

Reduce genotype data sparsity through imputation of genotype classes or allele frequencies of individual polyploids or pools of individuals or populations.

|**Build Status**|**License**|
|:--------------:|:---------:|
-| <a href="https://github.com/jeffersonfparil/imputepolypools/actions"><img src="https://github.com/jeffersonfparil/imputepolypools/actions/workflows/r.yml/badge.svg"></a> | [![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0) |
+| <a href="https://github.com/jeffersonfparil/imputef/actions"><img src="https://github.com/jeffersonfparil/imputef/actions/workflows/r.yml/badge.svg"></a> | [![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0) |

## Manual installation and development tools

@@ -13,9 +13,9 @@ Reduce genotype data sparsity through imputation of genotype classes or allele f
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh ./Miniconda3-latest-Linux-x86_64.sh
# Download the repo
-git clone https://jeffersonfparil:<API_KEY>@github.com/jeffersonfparil/imputepolypools.git some_branch
+git clone https://jeffersonfparil:<API_KEY>@github.com/jeffersonfparil/imputef.git some_branch
# Create the development environment
-conda env create -n rustenv --file imputepolypools/tests/rustenv.yml
+conda env create -n rustenv --file imputef/tests/rustenv.yml
conda activate compare_genomes
```

@@ -24,16 +24,16 @@ conda activate compare_genomes
```R
usethis::use_git_config(user.name="USERNAME", user.email="[email protected]")
credentials::set_github_pat() ### Enter access token
-remotes::install_github("jeffersonfparil/imputepolypools")
+remotes::install_github("jeffersonfparil/imputef")
```

## Usage

```R
-?imputepolypools::mvi
-?imputepolypools::aldknni
-imputepolypools::mvi(fname="tests/test.vcf")
-imputepolypools::aldknni(fname="tests/test.vcf")
+?imputef::mvi
+?imputef::aldknni
+imputef::mvi(fname="tests/test.vcf")
+imputef::aldknni(fname="tests/test.vcf")
```

### Functions
@@ -168,71 +168,71 @@ This is used for genotype classes, i.e., binned allele frequencies: $g = {{1 \ov

### Autotetraploid (Lucerne) mean absolute error

-![mae_barplots](./res/eval/lucerne-Mean_absolute_error.svg)
+![mae_barplots](./res/lucerne-Mean_absolute_error.svg)

### Pool (Soybean pools) mean absolute error

-![mae_barplots](./res/eval/soybean-Mean_absolute_error.svg)
+![mae_barplots](./res/soybean-Mean_absolute_error.svg)

### Diploid (Zucchini) mean absolute error

-![mae_barplots](./res/eval/zucchini-Mean_absolute_error.svg)
+![mae_barplots](./res/zucchini-Mean_absolute_error.svg)

### Diploid (Apple) mean absolute error

-![mae_barplots](./res/eval/apple-Mean_absolute_error.svg)
+![mae_barplots](./res/apple-Mean_absolute_error.svg)

### Diploid (Grape) mean absolute error

-![mae_barplots](./res/eval/grape-Mean_absolute_error.svg)
+![mae_barplots](./res/grape-Mean_absolute_error.svg)

------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------

### Autotetraploid (Lucerne) concordance of observed and imputed genotype classes

-![concordance_genotype_classes_barplots](./res/eval/lucerne-Concordance.svg)
+![concordance_genotype_classes_barplots](./res/lucerne-Concordance.svg)

### Pool (Soybean pools) concordance of observed and imputed genotype classes

-![concordance_genotype_classes_barplots](./res/eval/soybean-Concordance.svg)
+![concordance_genotype_classes_barplots](./res/soybean-Concordance.svg)

### Diploid (Zucchini) concordance of observed and imputed genotype classes

-![concordance_genotype_classes_barplots](./res/eval/zucchini-Concordance.svg)
+![concordance_genotype_classes_barplots](./res/zucchini-Concordance.svg)

### Diploid (Apple) concordance of observed and imputed genotype classes

-![concordance_genotype_classes_barplots](./res/eval/apple-Concordance.svg)
+![concordance_genotype_classes_barplots](./res/apple-Concordance.svg)

### Diploid (Grape) concordance of observed and imputed genotype classes

-![concordance_genotype_classes_barplots](./res/eval/grape-Concordance.svg)
+![concordance_genotype_classes_barplots](./res/grape-Concordance.svg)

------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------

### Autotetraploid (Lucerne) coefficient of determination

-![r2_barplots](./res/eval/lucerne-Coefficient_of_determination.svg)
+![r2_barplots](./res/lucerne-Coefficient_of_determination.svg)

### Pool (Soybean pools) coefficient of determination

-![r2_barplots](./res/eval/soybean-Coefficient_of_determination.svg)
+![r2_barplots](./res/soybean-Coefficient_of_determination.svg)

### Diploid (Zucchini) coefficient of determination

-![r2_barplots](./res/eval/zucchini-Coefficient_of_determination.svg)
+![r2_barplots](./res/zucchini-Coefficient_of_determination.svg)

### Diploid (Apple) coefficient of determination

-![r2_barplots](./res/eval/apple-Coefficient_of_determination.svg)
+![r2_barplots](./res/apple-Coefficient_of_determination.svg)

### Diploid (Grape) coefficient of determination

-![r2_barplots](./res/eval/grape-Coefficient_of_determination.svg)
+![r2_barplots](./res/grape-Coefficient_of_determination.svg)
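
The three groups of figures above report mean absolute error, concordance of genotype classes, and coefficient of determination. A minimal R sketch of how such metrics could be computed from paired observed and imputed allele frequencies follows. The helper name `evaluate_imputation`, the binning of frequencies to the nearest multiple of `1/ploidy`, and the use of `1 - SS_res/SS_tot` for the coefficient of determination are assumptions; the README's own definitions are truncated in this diff and may differ.

```R
# Hypothetical helper, not taken from the repository.
# observed, imputed: numeric vectors of allele frequencies in [0, 1].
evaluate_imputation <- function(observed, imputed, ploidy = 2) {
  stopifnot(length(observed) == length(imputed), ploidy >= 1)
  # Assumed binning: nearest multiple of 1/ploidy.
  to_class <- function(q) round(q * ploidy) / ploidy
  ss_res <- sum((observed - imputed)^2)
  ss_tot <- sum((observed - mean(observed))^2)
  c(
    mae = mean(abs(observed - imputed)),
    concordance = mean(to_class(observed) == to_class(imputed)),
    r2 = 1 - ss_res / ss_tot
  )
}

observed <- c(0.0, 0.5, 1.0, 0.5)
imputed <- c(0.1, 0.4, 0.9, 0.8)
metrics <- evaluate_imputation(observed, imputed, ploidy = 2)
```

Here three of the four imputed values fall in the same diploid genotype class as the observed ones, while the last (0.8 against 0.5) does not.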


