last updates pre notebook

ave-dcd · May 31, 2024 · 0587708
1 parent ea83b7d

Showing 8 changed files with 439 additions and 30,092 deletions.
diff --git a/.gitignore b/.gitignore
@@ -162,5 +162,6 @@ cython_debug/
 
 *.pickle
 
-# BLAT query file directory
-mavedb_mapping/tmp/*
+# mapping data/output
+notebooks/analysis/analysis_files
+notebooks/analysis/mavedb_files
diff --git a/notebooks/analysis/README.md b/notebooks/analysis/README.md
@@ -2,9 +2,9 @@
 
 Code for data analysis and figure generation for ["Mapping MAVE data for use in human genomics applications" (Arbesfeld et. al.)](https://www.biorxiv.org/content/10.1101/2023.06.20.545702v1):
 
-* `mavedb_mapping_analysis.ipynb`: This notebook applies the mapping algorithm to a set of 209 examined score sets from MaveDB, successfully creating mappings for ~2.5 million variant pairs across 207 score sets.
-* `mapping_analysis.ipynb`: This notebook computes reference sequence concordance across the generated VRS mapping pairs. The notebook also computes the number of unique pre-mapped and post-mapped variants.
-* `mavedb_scoreset_breakdown.ipynb`: This notebook generates the summary statistics that are described in the manuscript.
+* [`mavedb_mapping.ipynb`](mavedb_mapping.ipynb): This notebook applies the mapping algorithm to a set of 209 examined score sets from MaveDB, successfully creating mappings for ~2.5 million variant pairs across 207 score sets.
+* [`mapping_analysis.ipynb`](mapping_analysis.ipynb): This notebook computes reference sequence concordance across the generated VRS mapping pairs. The notebook also computes the number of unique pre-mapped and post-mapped variants.
+* [`mavedb_scoreset_breakdown.ipynb`](mavedb_scoreset_breakdown.ipynb): This notebook generates the summary statistics that are described in the manuscript.
 
 ## Environment
 
@@ -23,6 +23,22 @@ Then install all requirements in `requirements.txt`:
 python3 -m pip install -r requirements.txt
 ```
 
-## Layout
+## Directory layout
 
-TODO
+After executing mapping code, this directory will contain working and output data in the following locations:
+
+```
+├── README.md
+├── analysis_files
+│   ├── mappings
+│   │   └── <mapping output files>
+│   └── <mapping checkpoint files>
+├── experiment_scoresets.txt
+├── mapping_analysis.ipynb
+├── mave_mapping_fig_3b.R
+├── mavedb_files
+│   └── <Scoreset records and metadata from MaveDB>
+├── mavedb_mapping.ipynb
+├── mavedb_scoreset_breakdown.ipynb
+└── requirements.txt
+```
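
A quick way to confirm the layout described in the new README section is to check for the working and output directories after the mapping code has run. The following is a minimal sketch, not part of this commit: the helper name `missing_layout_dirs` is hypothetical, the directory names are taken from the tree above, and it assumes the script is run from `notebooks/analysis`.

```python
from pathlib import Path

# Directories the README says will exist after the mapping code has run.
# Names are copied from the directory tree in the README diff.
EXPECTED_DIRS = (
    "analysis_files",
    "analysis_files/mappings",
    "mavedb_files",
)


def missing_layout_dirs(root):
    """Return the expected directories that do not exist under `root`.

    Hypothetical helper for illustration; it is not defined in the repository.
    """
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Assumption: the current working directory is notebooks/analysis.
    missing = missing_layout_dirs(".")
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means all three directories are present. It says nothing about whether the mapping output or checkpoint files inside them are complete.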