last updates pre notebook
jsstevenson committed May 31, 2024
1 parent ea83b7d commit 0587708
Showing 8 changed files with 439 additions and 30,092 deletions.
5 changes: 3 additions & 2 deletions .gitignore
@@ -162,5 +162,6 @@ cython_debug/

*.pickle

# BLAT query file directory
mavedb_mapping/tmp/*
# mapping data/output
notebooks/analysis/analysis_files
notebooks/analysis/mavedb_files
26 changes: 21 additions & 5 deletions notebooks/analysis/README.md
@@ -2,9 +2,9 @@

Code for data analysis and figure generation for ["Mapping MAVE data for use in human genomics applications" (Arbesfeld et. al.)](https://www.biorxiv.org/content/10.1101/2023.06.20.545702v1):

* `mavedb_mapping_analysis.ipynb`: This notebook applies the mapping algorithm to a set of 209 examined score sets from MaveDB, successfully creating mappings for ~2.5 million variant pairs across 207 score sets.
* `mapping_analysis.ipynb`: This notebook computes reference sequence concordance across the generated VRS mapping pairs. The notebook also computes the number of unique pre-mapped and post-mapped variants.
* `mavedb_scoreset_breakdown.ipynb`: This notebook generates the summary statistics that are described in the manuscript.
* [`mavedb_mapping.ipynb`](mavedb_mapping.ipynb): This notebook applies the mapping algorithm to a set of 209 examined score sets from MaveDB, successfully creating mappings for ~2.5 million variant pairs across 207 score sets.
* [`mapping_analysis.ipynb`](mapping_analysis.ipynb): This notebook computes reference sequence concordance across the generated VRS mapping pairs. The notebook also computes the number of unique pre-mapped and post-mapped variants.
* [`mavedb_scoreset_breakdown.ipynb`](mavedb_scoreset_breakdown.ipynb): This notebook generates the summary statistics that are described in the manuscript.

## Environment

@@ -23,6 +23,22 @@ Then install all requirements in `requirements.txt`:
python3 -m pip install -r requirements.txt
```

## Layout
## Directory layout

TODO
After executing mapping code, this directory will contain working and output data in the following locations:

```
├── README.md
├── analysis_files
│ ├── mappings
│ │ └── <mapping output files>
│ └── <mapping checkpoint files>
├── experiment_scoresets.txt
├── mapping_analysis.ipynb
├── mave_mapping_fig_3b.R
├── mavedb_files
│ └── <Scoreset records and metadata from MaveDB>
├── mavedb_mapping.ipynb
├── mavedb_scoreset_breakdown.ipynb
└── requirements.txt
```
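The tree above marks `analysis_files/` (with its `mappings/` subdirectory) and `mavedb_files/` as directories that appear only after the mapping code has run. They match the two paths listed under `# mapping data/output` in the `.gitignore` hunk. As a hedged illustration that is not part of the repository, a short Python script could confirm that these generated directories exist before the analysis notebooks are opened. The functions `check_generated_layout` and `list_mapping_outputs` are hypothetical names, and the script assumes it is pointed at `notebooks/analysis`.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Directories the README's layout tree describes as produced by the mapping
# code (hypothetical checker; names taken from the tree, not from project code).
GENERATED_DIRS = ("analysis_files", "analysis_files/mappings", "mavedb_files")


def check_generated_layout(root: str | Path) -> dict[str, bool]:
    """Map each expected generated directory to whether it exists under root."""
    base = Path(root)
    return {rel: (base / rel).is_dir() for rel in GENERATED_DIRS}


def list_mapping_outputs(root: str | Path) -> list[str]:
    """Return sorted file names in analysis_files/mappings, or [] if it is absent."""
    mappings = Path(root) / "analysis_files" / "mappings"
    if not mappings.is_dir():
        return []
    return sorted(p.name for p in mappings.iterdir() if p.is_file())


if __name__ == "__main__":
    # Assumed default: run from the repository root.
    target = sys.argv[1] if len(sys.argv) > 1 else "notebooks/analysis"
    for rel, present in check_generated_layout(target).items():
        print(f"{'ok' if present else 'missing':7} {rel}")
    print(f"{len(list_mapping_outputs(target))} mapping output file(s)")
```

Running it from the repository root with no arguments reports each generated directory as `ok` or `missing`. It only inspects directories, so it makes no assumptions about the names or formats of the mapping output or checkpoint files.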